Old 01-19-2008, 07:21 PM   #1
Deputy-Dawg
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
Help writing profile to get RSS feed

I am in the throes of learning to program in Python. I have very nearly completed a profile to capture the RSS feed of my local newspaper. I am having a problem returning the print versions of the feeds. I know that there is a corresponding print format for each article.

Each article has the format:

http://www.nwaonline.net/articles/20...boozmanxna.txt

The corresponding article has the format:

http://www.nwaonline.net/articles/20...boozmanxna.prt

i.e., I need only to replace the extension .txt with the extension .prt.

But try as I may, I just can't seem to do it. Clearly I have a blind spot. Can anyone please help?
Old 01-19-2008, 08:24 PM   #2
kovidgoyal
creator of calibre
Posts: 44,551
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Code:
url = original_url.rpartition('.')[0] + '.prt'
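
For anyone reusing the one-liner above, here is a slightly more defensive sketch. The function name print_url and the example.com address are made up for illustration; they are not part of libprs500. The only difference is that a URL with no '.' in it is returned unchanged instead of becoming '.prt'.

```python
def print_url(original_url):
    # rpartition splits on the LAST '.', giving (before, '.', after).
    base, dot, _extension = original_url.rpartition('.')
    # With no '.' present, rpartition returns ('', '', original_url);
    # in that case leave the URL alone rather than returning '.prt'.
    if not dot:
        return original_url
    return base + '.prt'


# Hypothetical URL, used only to show the call.
example = print_url('http://www.example.com/articles/story.txt')
```

Note that this keys on the last '.' anywhere in the URL, so it assumes the article URL really does end in an extension.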
Old 01-20-2008, 05:25 PM   #3
Deputy-Dawg
Thanks, that was the leg up I needed. I have a bit more to do on the profile. When I am done, is there any way that I can integrate it into the GUI? I am so darn clumsy in typing! Being 74 with Parkinson's does make life a bit more complicated.

On the other hand the 'need' to learn yet another language is stimulating.
Old 01-20-2008, 05:28 PM   #4
kovidgoyal
Not at the moment, it's on my TODO list. And wow, I hope I'm capable of learning a new language at 74!

In the meantime, if you post the profile here, I'll add it to the GUI so that it will be available in the next release of libprs500.
Old 01-20-2008, 06:25 PM   #5
Deputy-Dawg
I've attached the one that I have working currently. There are still a couple of gotchas, including how to add some of their other feeds (aside from hard coding, that is) and what the optimum number of files to download is.

That being said, I am now trying to create a profile for the other major newspaper in the area, "The Arkansas Democrat Gazette". They use one strange site for their RSS feed. When you access it from their RSS information page

http://www.nwanews.com/feeds/

by clicking on the link 'NWAnews.com (all daily "News" sections)', it takes you to

feed://feeds.feedburner.com/nwanewsall

and, of course, web2lrf does not recognize a URL beginning with 'feed'. If you manually enter the address in the address window of Safari you get there, and if you enter

http://feeds.feedburner.com/nwanewsall

you are redirected. But neither approach seems to work with web2lrf.
Attached Files
File Type: zip nwa2.py.zip (995 Bytes, 607 views)

Last edited by Deputy-Dawg; 01-21-2008 at 04:13 PM. Reason: replaced the file
Old 01-20-2008, 06:29 PM   #6
kovidgoyal
You can just have the get_feeds function return the feed URL like this:

Code:
def get_feeds(self):
  return [('NWANews', 'http://feeds.feedburner.com/nwanewsall')]
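
If a feed address only turns up in the 'feed://' form, one option is to rewrite it to plain http before returning it from get_feeds. The sketch below is only an illustration: normalize_feed_url is a made-up helper name, not something libprs500 provides, and it assumes the same host and path also answer over http, as they do for the FeedBurner address in this thread.

```python
def normalize_feed_url(url):
    # 'feed://' is how some browsers and feed readers label a feed link;
    # the content itself is normally fetched over http.
    prefix = 'feed://'
    if url.startswith(prefix):
        return 'http://' + url[len(prefix):]
    # Anything else is passed through untouched.
    return url


fixed = normalize_feed_url('feed://feeds.feedburner.com/nwanewsall')
```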
Old 01-21-2008, 03:51 PM   #7
Deputy-Dawg
Thanks, again...
I am appending a newer version of the profile to get the Morning News. Much to my surprise, a number of the print files contain references to images, which web2lrf was resolving, making a bit of a mess of the files. I have added a line of code which seems to have fixed the problem.

The profile for the Democrat Gazette is another thing. The call to the file (the one that would be displayed on your monitor with all the ads and other BS - the url in the "href=" statement) is in the form of:

http://feeds.feedburner.com/~r/nwanewsall/~3/219845886/

which is somewhere resolved to:

http://www.nwanews.com/adg/News/214246/

and I of course want:

http://www.nwanews.com/adg/News/214246/print/

but if you append 'print/' to the originally called url giving you:

http://feeds.feedburner.com/~r/nwane...9845886/print/

it too is resolved to:

http://www.nwanews.com/adg/News/214246/

and although the desired URL is embedded in the first called file, I have yet to come up with code that will extract it without harming the print file. (This is because the print file and the web file are structurally identical in the area in which we are interested.)

If you have a moment, take a look and see if you can suggest an approach. I should also note that even to begin to attempt to extract and use the URL from the display file, it is necessary to increase the amount of recursion to 3, which introduces its own set of difficulties.

Sigh!!!! Programming is such fun.
Old 01-21-2008, 09:48 PM   #8
Deputy-Dawg
No need to respond to the last question. I found a source for the desired URLs in the document. Sometimes you really do have to read the code quite literally. In any event, here is a profile for the Arkansas Democrat Gazette and several wholly owned subsidiaries.

Again, thanks. Once I got a feel for the syntax being used, it made climbing onto that new bike a bit easier. Now I have to learn to deal with the editor, or get a new one (I am using BBEdit 8.7). Sometimes, indeed more often than not, Python will complain about an indent error even when there is none by visual inspection of the code or by BBEdit's format checker. The only fix seems to be to delete the offending code and re-enter it. I am sure this can be automated; I just have not figured it out as yet.
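
A common cause of indent errors that cannot be seen by eye is a mix of tabs and spaces; whether that is what is happening here is only a guess. The sketch below reports, for each indented line, whether its leading whitespace is tabs, spaces, or a mixture. The function name indent_styles and the sample text are invented for illustration. Python's standard tabnanny module performs a related check on whole files.

```python
def indent_styles(source):
    """Map 1-based line numbers to 'tabs', 'spaces' or 'mixed'."""
    styles = {}
    for number, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if not indent:
            continue  # unindented lines cannot cause the problem
        if ' ' in indent and '\t' in indent:
            styles[number] = 'mixed'
        elif '\t' in indent:
            styles[number] = 'tabs'
        else:
            styles[number] = 'spaces'
    return styles


# Two functions that look identical in many editors but are indented differently.
sample = "def f():\n\treturn 1\ndef g():\n    return 2\n"
report = indent_styles(sample)
```

If the report shows more than one style in the same file, converting everything to spaces in the editor is usually the simplest cure.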
Attached Files
File Type: zip dem_gaz.py.zip (1.0 KB, 575 views)
Old 01-23-2008, 12:00 AM   #9
Deputy-Dawg
I am working on another profile and am running into a rather different problem, or at least think I am. The url that I need returned is:

http://www.fides.org/aree/news/newsd...=11302&lan=eng

When I invoke the profile I get the following message:

Macintosh-3:books billc$ web2lrf --verbose --user-profile Agenzia_Fides.py
[ERROR] __init__.pyo:210: Error parsing article:
<item rdf:about="http://www.fides.org/aree/news/newsdet.php?idnews=11302&amp;lan=eng">
<dc:format>text/html</dc:format>
<dc:date>2008-01-21T14:00:00+01:00</dc:date>
<dc:source>http://www.fides.org</dc:source>
<dc:creator>Fides Service</dc:creator>
<title>VATICAN - The Pope's Angelus: &#x201C;The Church's evangelising mission is part of her ecumenical path&#x201D;; &#x201C;I am bound to the university world by love for the quest for truth, for discussion, frank dialogue, respectful of reciprocal positions. All this is also part of the Church's mission &#x201D;</title>
<link>http://www.fides.org/aree/news/newsdet.php?idnews=11302&amp;lan=eng</link>
<description>&lt;b&gt;VATICAN - The Pope's Angelus: &#x201C;The Church's evangelising mission is part of her ecumenical path&#x201D;; &#x201C;I am bound to the university world by love for the quest for truth, for discussion, frank dialogue, respectful of reciprocal positions. All this is also part of the Church's mission &#x201D;&lt;/b&gt;&lt;br&gt;&lt;br&gt;
Vatican City (Agenzia Fides) - On Sunday 20 January the Holy Father Pope Benedict XVI dedicated his midday Angelus reflection to the issue of ecumenism, this being the Week of Prayer for Christian Unity, and to his planned and then cancelled visit...</description>
</item>
Traceback (most recent call last):
File "libprs500/ebooks/lrf/web/profiles/__init__.pyo", line 197, in parse_feeds
File "libprs500/ebooks/lrf/web/profiles/__init__.pyo", line 269, in strptime
KeyError: u'2008-01-21T14:00:0'
[ERROR] __init__.pyo:210: Error parsing article:
<item rdf:about="http://www.fides.org/aree/news/newsdet.php?idnews=11303&amp;lan=eng">
<dc:format>text/html</dc:format>
<dc:date>2008-01-21T14:00:00+01:00</dc:date>
<dc:source>http://www.fides.org</dc:source>
<dc:creator>Fides Service</dc:creator>
<title>VATICAN - Pope Benedict XVI visits Capranica College: &#x201C;Without friendship with Jesus it is impossible for a Christian, and even more so for a priest, to bring to completion the mission entrusted by the Lord &#x201D;</title>
<link>http://www.fides.org/aree/news/newsdet.php?idnews=11303&amp;lan=eng</link>
<description>&lt;b&gt;VATICAN - Pope Benedict XVI visits Capranica College: &#x201C;Without friendship with Jesus it is impossible for a Christian, and even more so for a priest, to bring to completion the mission entrusted by the Lord &#x201D;&lt;/b&gt;&lt;br&gt;&lt;br&gt;
Vatican City (Agenzia Fides) - &#x201C;Under various circumstances I have reminded seminarians and priests of the urgency of nurturing a profound interior life, personal and continual contact with Christ in prayer and contemplation, and genuine striving for...</description>
</item>


The only line in the source file that contains anything that resembles the URL is:

<a href="http://www.fides.org/aree/news/newsdet.php?idnews=11302&amp;lan=eng">

which, if I am reading the error message correctly, web2lrf cannot parse. I suspect that the problem is in the '&amp;' representation of the '&' in the URL, and if that is the case I see no way that I can code anything in the profile to deal with it.
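
For reference, '&amp;' is the normal way to write an ampersand inside XML, and a standard XML parser turns it back into '&' when the element is read. The sketch below shows this with Python's xml.etree.ElementTree on the link element from the feed. It only illustrates the decoding; how web2lrf itself parses the feed is not shown in this thread and may differ.

```python
import xml.etree.ElementTree as ET

# The <link> element exactly as it appears in the feed, entity included.
raw_link = ('<link>http://www.fides.org/aree/news/newsdet.php'
            '?idnews=11302&amp;lan=eng</link>')

element = ET.fromstring(raw_link)
# .text holds the decoded character data: '&amp;' has become '&'.
url = element.text
```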
Old 01-23-2008, 12:59 AM   #10
kovidgoyal
No, the problem is the weird date format:
Code:
2008-01-21T14:00:00+01:00
The simple way to fix it is to set
Code:
use_pubdate = False
The more correct way to fix it is to override the strptime function

Code:
def strptime(self, raw):
   return calendar.timegm(time.strptime('%Y-%m-%dT%H:%M:%S+01:00', raw))-3600
You might have to play with the above strptime to get it to parse the date correctly.

Last edited by kovidgoyal; 01-23-2008 at 01:03 AM.
Old 01-24-2008, 01:36 AM   #11
Deputy-Dawg
I have added the following to my profile:

import calendar
import time

def strptime(self, raw):
    return calendar.timegm(time.strptime('%Y-%m-%dT%H:&M:%S+01:00', raw))-3600



When I run the profile in web2lrf I get the following error message:

Traceback (most recent call last):
File "libprs500/ebooks/lrf/web/profiles/__init__.pyo", line 197, in parse_feeds
File "/Users/billc/Desktop/Books/ag.py", line 34, in strptime
return calendar.timegm(time.strptime('%Y-%m-%dT%H:&M:%S+01:00', raw))-3600
File "_strptime.pyo", line 331, in strptime
ValueError: time data did not match format: data=%Y-%m-%dT%H:&M:%S+01:00 fmt=2008-01-21T14:00:00+01:00


To validate the code, I inserted it into a profile (nwa2.py) which I knew worked and ran it and, of course, it failed with a similar error message (i.e., about the formats not matching). I then altered the string to match the one given, using the symbols from Python's documentation, and lo...... it works.

Finally I added

use_pubdate = False

and that too works. There is an error in the string, but I sure don't see it! Is there any debug code that would permit me to look at the parameters and data that are being passed? As I read the code, the string should match:

%Y = Decimal year with century prepended
%m = Decimal month
%d = Decimal day
%H = Decimal Hour (24 hour notation)
%M = Decimal Minutes
%S = Decimal Seconds

The remaining characters (i.e., those within the quotes), "-", ":", "T", "1", "0", "2", "4", "8", represent themselves.

But it does not.
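
One way to look at exactly what is being passed is to wrap the call and print repr() of both arguments before parsing. This is only a sketch: checked_strptime is a made-up helper name, not part of libprs500. It relies on the fact that time.strptime expects the data string first and the format string second.

```python
import time


def checked_strptime(raw, fmt):
    # repr() makes stray whitespace or non-printing characters visible.
    print('data=%r fmt=%r' % (raw, fmt))
    # Argument order matters: the string to parse, then the format.
    return time.strptime(raw, fmt)


parsed = checked_strptime('2008-01-21T14:00:00+01:00',
                          '%Y-%m-%dT%H:%M:%S+01:00')
```

Comparing the printed data= and fmt= values with the ones in the ValueError message is a quick check that each argument is landing where it should.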

BTW, the only way to get the profile Dem_Gaz.py to run is to set use_pubdate = False because, apparently, the files have no publication date, or that is what the error message says.

Got to go to bed. Work on it some more tomorrow.
Old 01-24-2008, 03:56 AM   #12
kovidgoyal
&M should be %M in the format string

Incidentally the next release of libprs500 will have the ability to add user created profiles to the GUI (it's already implemented in svn).
Old 01-24-2008, 10:58 AM   #13
Deputy-Dawg
Yes, it should be. And it was in the original file. I retyped it and made a typo. That being said, when the correct string is used (I hope I typed it correctly this morning) I still get the following error message:

[ERROR] __init__.pyo:210: Error parsing article:
<item rdf:about="http://www.fides.org/aree/news/newsdet.php?idnews=11338&amp;lan=eng">
<dc:format>text/html</dc:format>
<dc:date>2008-01-22T14:00:00+01:00</dc:date>
<dc:source>http://www.fides.org</dc:source>
<dc:creator>Fides Service</dc:creator>
<title>ASIA/HOLY LAND - Caritas Jerusalem: calls for an end to humanitarian crisis in Gaza and assistance for Palestinian children</title>
<link>http://www.fides.org/aree/news/newsdet.php?idnews=11338&amp;lan=eng</link>
<description>&lt;b&gt;ASIA/HOLY LAND - Caritas Jerusalem: calls for an end to humanitarian crisis in Gaza and assistance for Palestinian children&lt;/b&gt;&lt;br&gt;&lt;br&gt;
Jerusalem (Agenzia Fides) - Caritas Jerusalem has called for the block of persons and goods which is causing the humanitarian crisis in Gaza to be lifted. It joined major international humanitarian organisations in warning of a serious human and soci...</description>
</item>
Traceback (most recent call last):
File "libprs500/ebooks/lrf/web/profiles/__init__.pyo", line 197, in parse_feeds
File "/Users/billc/Desktop/Books/ag.py", line 34, in strptime
return calendar.timegm(time.strptime('%Y-%m-%dT%H:%M:%S+01:00', raw))-3600
File "_strptime.pyo", line 331, in strptime
ValueError: time data did not match format: data=%Y-%m-%dT%H:%M:%S+01:00 fmt=2008-01-22T14:00:00+01:00


I have examined the value in the line:

<dc:date>2008-01-22T14:00:00+01:00</dc:date>

in a hex editor to see if there were any 'strange' characters in it. There are none. I assume that this is the value that is being passed to strptime. If that is the case, I don't understand what is not being matched.
Old 01-24-2008, 01:18 PM   #14
kovidgoyal
Oops, my mistake. It should be:

Code:
time.strptime(raw, '%Y-%m-%dT%H:%M:%S+01:00')
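
Putting the corrected call into a complete function, and handling offsets other than the hard-coded +01:00, might look like the sketch below. parse_feed_date is a made-up standalone name; in a profile the same body would go inside strptime(self, raw). It assumes every date has the exact shape YYYY-MM-DDTHH:MM:SS followed by +HH:MM or -HH:MM.

```python
import calendar
import time


def parse_feed_date(raw):
    """Convert 'YYYY-MM-DDTHH:MM:SS+HH:MM' to seconds since the epoch (UTC)."""
    # The first 19 characters are the local date and time.
    stamp, sign, offset = raw[:19], raw[19], raw[20:]
    # Data string first, format string second.
    parsed = time.strptime(stamp, '%Y-%m-%dT%H:%M:%S')
    hours, minutes = offset.split(':')
    offset_seconds = int(hours) * 3600 + int(minutes) * 60
    if sign == '-':
        offset_seconds = -offset_seconds
    # timegm() treats the fields as UTC, so take the local offset back out.
    return calendar.timegm(parsed) - offset_seconds


stamp_utc = parse_feed_date('2008-01-21T14:00:00+01:00')
```

For the +01:00 dates in this feed the result is the same as subtracting 3600 seconds, as in the original suggestion.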
Old 01-25-2008, 10:49 AM   #15
The Old Man
Fanatic
Posts: 525
Karma: 1300001
Join Date: Jan 2008
Location: Keene, New Hampshire
Device: iPad Mini
Well, I have been reading this thread and I have learned one thing.
I will never be able to learn how to add feeds. - My fault, not yours.

Any chance of adding a feed from the Jerusalem Post http://www.jpost.com/
to the next version of libprs500?
Thanks