12-16-2007, 06:08 PM | #121 |
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
|
New Profile - Dilbert
Just to let everyone know, I posted a profile for "Dilbert", the daily comic strip, on Kovid's wiki.
https://libprs500.kovidgoyal.net/wiki/UserProfiles Thanks to Stenis - it is his favourite feed. |
12-17-2007, 04:37 AM | #122 |
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
|
Thanks for the Dilbert profile.
What a great idea! |
12-17-2007, 03:56 PM | #123 |
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
|
|
01-09-2008, 10:32 PM | #124 |
Enthusiast
Posts: 26
Karma: 11777
Join Date: Jun 2007
Location: Brooklyn
Device: PRS-500,Treo 750, Archos 605 Wifi
|
Profile for the TheNation.com
Hello
I'm in the process of developing a profile to log in and download articles from thenation.com. The Nation doesn't have an RSS feed for their monthly articles. They have feeds for Most Emailed, Top Stories, etc., but I want to download the current month's "Magazine."
What's helpful is that the month's articles (those included in print AND the web-only articles) are listed at http://www.thenation.com/issue/YYYYMMDD
The individual articles are located at http://www.thenation.com/doc/YYYYMMDD/author_name
So I was able to scrape out all the URLs for the articles. Then, in trying to figure out what to do next, I decided to take those URLs and create an RSS XML file on my local drive (c:\program files\libprs500\nation.xml), which I then returned at the end of the profile: return [('feed1','file:///c:/program%20files/libprs500/nation.xml')]
It worked! Now I need to figure out how to extract the article titles and descriptions and make the proper replacements to get the print versions of the articles instead.
But the main reason I'm posting is to ask whether creating and accessing the local RSS file is the way to go. This would be a lot more convenient for anyone interested if the profile script didn't have to worry about generating files and directory structures.
I just started looking at this a few days ago and it's the first time I've tried my hand at Python, so thanks in advance for any help. |
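For readers following along, here is a minimal sketch of the local-RSS approach described in the post above. It is not the poster's actual script: the function name build_rss, the sample issue date, and the sample article are made up for illustration, and it is written for current Python rather than the Python of 2008. It only builds the XML text; writing it to disk and returning the file:// URL from the profile is left as in the post.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, articles):
    """Return a small RSS 2.0 document as a string.

    `articles` is a list of (title, url) pairs. This helper is hypothetical
    and not part of libprs500.
    """
    rss = ET.Element('rss', version='2.0')
    channel = ET.SubElement(rss, 'channel')
    ET.SubElement(channel, 'title').text = channel_title
    ET.SubElement(channel, 'link').text = channel_link
    ET.SubElement(channel, 'description').text = channel_title
    for title, url in articles:
        item = ET.SubElement(channel, 'item')
        ET.SubElement(item, 'title').text = title
        ET.SubElement(item, 'link').text = url
    return ET.tostring(rss, encoding='unicode')


# Made-up sample data that follows the URL pattern given in the post.
articles = [
    ('Example article', 'http://www.thenation.com/doc/20080121/author_name'),
]
feed_xml = build_rss('The Nation',
                     'http://www.thenation.com/issue/20080121',
                     articles)
```

The resulting string could be saved to a file such as the nation.xml mentioned in the post and handed back as a file:// feed URL.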
01-09-2008, 11:06 PM | #125 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Creating an XML file will work; it is the least Python-intensive solution. However, you can also just override the parse_feeds() function. It should return a list of dictionaries. Each dictionary should be of the form
Code:
{
    'title'       : 'article title',
    'url'         : 'URL of print version',
    'date'        : 'the publication date of the article, as a string',
    'description' : 'a summary of the article',
} |
01-10-2008, 02:47 AM | #126 |
Enthusiast
Posts: 26
Karma: 11777
Join Date: Jun 2007
Location: Brooklyn
Device: PRS-500,Treo 750, Archos 605 Wifi
|
Hello
Instead of overriding get_feeds, I've attempted to override the parse_feeds function. I create the list of dictionaries and return it. Now I get this message:
File "convert_from.py", line 198, in <module>
File "convert_from.py", line 192, in main
File "convert_from.py", line 131, in process_profile
File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 93, in __init__
File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 127, in build_index
AttributeError: 'list' object has no attribute 'keys'
Thank you |
01-10-2008, 11:19 AM | #127 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Oh, I'm sorry: what needs to be returned is a dictionary whose keys are feed titles (like Business, National News, etc.) and whose values are the lists of dictionaries I mentioned before.
|
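To make the corrected return value concrete, here is a small sketch of the structure described in the reply above: a dictionary keyed by feed title, whose values are lists of article dictionaries. The function name build_feed_index, the feed title 'Magazine', and the sample article are assumptions for illustration only. In a real profile this would presumably be the return value of the overridden parse_feeds(), whose exact signature the thread does not show.

```python
def make_article(title, url, date, description):
    """Build one article entry with the four keys named in the thread."""
    return {
        'title': title,
        'url': url,
        'date': date,
        'description': description,
    }


def build_feed_index():
    """Return {feed title: [article dict, ...]}.

    Hypothetical stand-in for the body of an overridden parse_feeds().
    """
    magazine = [
        make_article(
            'Example article',                                    # made-up title
            'http://www.thenation.com/doc/20080121/author_name',  # made-up date
            '2008-01-21',
            'A one-line summary of the article.',
        ),
    ]
    return {'Magazine': magazine}


index = build_feed_index()
# A bare list has no .keys(), which matches the AttributeError reported
# earlier in the thread; a dictionary does.
feed_titles = list(index.keys())
```

Returning the bare list instead of this dictionary is what would trigger "'list' object has no attribute 'keys'".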
01-10-2008, 12:15 PM | #128 |
Junior Member
Posts: 2
Karma: 19
Join Date: Jan 2008
Location: Hamburg / Germany
Device: Axim x51v and div. other / Sony PRS 505 / Nokia E51
|
Hi there
Here is a quick-and-dirty snippet from me for the German Heise Newsticker. It's working fine for me.
Code:
import re
from libprs500.ebooks.lrf.web.profiles import DefaultProfile

class heise(DefaultProfile):
    title = 'Heise Newsticker'
    max_recursions = 2
    use_pubdate = False
    no_stylesheets = True
    max_articles_per_feed = 30

    # Each entry is (pattern, replacement function); every pattern is
    # compiled case-insensitively and allowed to match across line breaks.
    preprocess_regexps = [
        (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
        [
            (r'<!-- Site Navigation Bar -->.*?<title>', lambda match : '<title>'),
            (r'</title>.*?</head>', lambda match : '</title> </head>'),
            (r'<!-- allgemeine obere Navigation -->.*?</heisetext>', lambda match : ''),
            (r'<table.*?</table>', lambda match : ''),
            (r'<br clear="all".*?</body>', lambda match : '</div> </body>'),
        ]
    ]

    def get_feeds(self):
        return [('Heise Newsticker', 'http://www.heise.de/newsticker/heise.rdf')]

    def print_version(self, url):
        # Point article URLs at the print-friendly page.
        return url.replace('http://www.heise.de/newsticker/meldung/',
                           'http://www.heise.de/newsticker/meldung/print/')

Stefan
Last edited by shempe; 01-11-2008 at 12:09 PM. |
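As a rough illustration of what the profile above relies on, the sketch below applies (compiled pattern, replacement function) pairs to a page one after another and rewrites an article URL to its print version. How libprs500 applies preprocess_regexps internally is not shown in the thread, so the apply_regexps helper is an assumption, and the sample HTML and article number are made up. It runs without libprs500.

```python
import re

# Two of the rules from the Heise profile, compiled the same way.
preprocess_regexps = [
    (re.compile(pattern, re.IGNORECASE | re.DOTALL), func)
    for pattern, func in [
        (r'<table.*?</table>', lambda match: ''),
        (r'<br clear="all".*?</body>', lambda match: '</div> </body>'),
    ]
]


def apply_regexps(html, rules):
    """Apply each (pattern, function) pair in order (assumed behaviour)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


def print_version(url):
    """Same URL rewrite as in the profile, as a plain function."""
    return url.replace('http://www.heise.de/newsticker/meldung/',
                       'http://www.heise.de/newsticker/meldung/print/')


# Made-up page fragment: a navigation table and a footer after the text.
sample = ('<body><div><p>Text</p><table><tr><td>nav</td></tr></table>'
          '<br clear="all"><p>footer</p></body>')
cleaned = apply_regexps(sample, preprocess_regexps)
print_url = print_version('http://www.heise.de/newsticker/meldung/12345')
```

Because the patterns use non-greedy matching with re.DOTALL, each rule removes only up to the first closing tag it finds, even when the markup spans several lines.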
01-10-2008, 12:22 PM | #129 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You should add it to https://libprs500.kovidgoyal.net/wiki/UserProfiles so other people can find and use it. You'll need to create an account and let me know the user name so I can give you write permission for the wiki.
|
01-10-2008, 01:43 PM | #130 | |
Enthusiast
Posts: 26
Karma: 11777
Join Date: Jun 2007
Location: Brooklyn
Device: PRS-500,Treo 750, Archos 605 Wifi
|
The Nation
Quote:
Finally being able to read The Nation every month and get the New York Times every morning adds so much value to my Sony Reader (I might be able to convince others to buy one). Thanks for all your work and help. |
|
01-11-2008, 11:42 AM | #131 |
Junior Member
Posts: 2
Karma: 19
Join Date: Jan 2008
Location: Hamburg / Germany
Device: Axim x51v and div. other / Sony PRS 505 / Nokia E51
|
New Profile Golem and Heise Updated
I posted a new profile for German Golem News and updated my Heise Newsticker profile.
look at: https://libprs500.kovidgoyal.net/wiki/UserProfiles Stefan |
01-11-2008, 04:05 PM | #132 | |
Zealot
Posts: 123
Karma: 446460
Join Date: Jul 2007
Device: Inkpalm 5 Mini
|
Quote:
I know nothing of Python or HTML and have tried experimenting, but I realize I need to see a working example of a non-RSS feed profile. Otherwise I think it should be quite simple, because the layout of the text version of the paper is already very Sony Reader friendly. I don't have my Sony Reader yet. I ordered it yesterday (shipping to Australia) but figure trying to sort this out is a good way to pass my waiting time. |
|
01-14-2008, 04:25 PM | #133 | |
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
|
Quote:
Keep it up! :-) Would you like to have a go at Sueddeutsche.de... or at fscklog.com or mactechnews.de... |
|
01-16-2008, 06:39 AM | #134 |
Member
Posts: 16
Karma: 10
Join Date: Sep 2007
Device: PRS-500
|
Hi All!
I have a problem converting one RSS feed. The problem is with < and > (the feed is full of them). I tried to write a regex like this:
Code:
(r'(<)(.*?>)', lambda match : '<code>' + match.group(1) + match.group(2) + '</code>'),
Can anyone help me with that? kovidgoyal - big thanks for your work on this program! |
01-16-2008, 12:43 PM | #135 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What's the problem with < and >? Are they not being converted correctly?
|