web2lrf - Page 9

StDo · 12-16-2007, 05:08 PM

Just to let everyone know, I posted a profile for "Dilbert", the daily comic strip, on Kovid's wiki.
https://libprs500.kovidgoyal.net/wiki/UserProfiles

Thanks to Stenis - it is his favourite feed.

JTravers · 12-17-2007, 03:37 AM

Thanks for the Dilbert profile.
What a great idea!

StDo · 12-17-2007, 02:56 PM

Quote:

Originally Posted by JTravers

Thanks for the Dilbert profile.
What a great idea!

You are welcome.

Btw, let the karma grow!

secretsubscribe · 01-09-2008, 09:32 PM

Hello
I'm in the process of developing a profile to log in and download articles from thenation.com.
The Nation doesn't have an RSS feed for their monthly articles. They have feeds for Most Emailed, Top Stories, etc., but I want to download the current month's "Magazine."
Helpfully, the month's articles (both those included in print and the web-only ones) are listed at http://www.thenation.com/issue/YYYYMMDD
The individual articles are located at http://www.thenation.com/doc/YYYYMMDD/author_name.

So I was able to scrape out all the URLs for the articles.
Then, in trying to figure out what to do next, I decided to take those URLs and create an RSS XML file on my local drive (c:\program files\libprs500\nation.xml),
which I then returned at the end of the profile:
return [('feed1','file:///c:/program%20files/libprs500/nation.xml')]

It worked!
Now I need to figure out how to extract the article titles and descriptions and make the proper replacements to get the print versions of the articles instead.

But the main reason I'm posting is to ask whether creating and accessing the local RSS file is the way to go. The profile would be a lot more convenient for anyone interested if the script didn't have to worry about generating files and directory structures.
I just started looking at this a few days ago, and it's the first time I've tried my hand at Python, so thanks in advance for any help.
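
For readers following along, the local-file approach described above could look roughly like the sketch below. It is not the poster's actual code: the helper name `write_local_feed`, the temporary-directory location, and the sample article are assumptions for illustration, and only the standard library is used.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_local_feed(articles, path, feed_title="The Nation",
                     feed_link="http://www.thenation.com/"):
    """Write a minimal RSS 2.0 file and return its file:// URL.

    `articles` is a list of dicts with 'title', 'url' and an optional
    'description' key (a hypothetical shape chosen for this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Locally generated feed"
    for article in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["url"]
        ET.SubElement(item, "description").text = article.get("description", "")
    path = Path(path).resolve()
    ET.ElementTree(rss).write(str(path), encoding="utf-8", xml_declaration=True)
    # as_uri() percent-encodes spaces, e.g. "program files" -> "program%20files"
    return path.as_uri()


# Hypothetical usage: one scraped article, written to the temp directory.
feed_path = Path(tempfile.gettempdir()) / "nation.xml"
feed_url = write_local_feed(
    [{"title": "Example article",
      "url": "http://www.thenation.com/doc/YYYYMMDD/author_name"}],
    feed_path,
)
feeds = [("feed1", feed_url)]  # same shape as the list returned in the post
```

Writing to a temporary directory rather than the program folder avoids depending on a fixed install path, though the profile still has to create a file on disk.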

kovidgoyal · 01-09-2008, 10:06 PM

Creating an XML file will work; it is the least Python-intensive solution. However, you can also just override the parse_feeds() function. It should return a list of dictionaries. Each dictionary should be of the form:

Code:

{
            'title'       : article title,
            'url'         : URL of print version,
            'date'        : The publication date of the article as a string,
            'description' : A summary of the article
}

secretsubscribe · 01-10-2008, 01:47 AM

Hello
Instead of overriding get_feeds, I've attempted to override the parse_feeds function.
I create the list of dictionaries and return it.
Now I get this message:
File "convert_from.py", line 198, in <module>
File "convert_from.py", line 192, in main
File "convert_from.py", line 131, in process_profile
File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 93, in __init__
File "libprs500\ebooks\lrf\web\profiles\__init__.pyo", line 127, in build_index
AttributeError: 'list' object has no attribute 'keys'

thank you

kovidgoyal · 01-10-2008, 10:19 AM

Oh, I'm sorry. What needs to be returned is a dictionary whose keys are feed titles (like Business, National News, etc.) and whose values are the lists of dictionaries I mentioned before.
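
As an illustration of the corrected return shape, here is a minimal standalone sketch. The helper name `build_feed_index` and the sample data are hypothetical, and the exact signature of parse_feeds() in libprs500 is not shown in this thread, so this only builds the data structure that such an override would presumably return.

```python
def build_feed_index(articles_by_feed):
    """Return {feed title: [article dict, ...]}.

    Each article dict carries the four keys listed earlier in the thread:
    'title', 'url', 'date' and 'description'.
    """
    index = {}
    for feed_title, articles in articles_by_feed.items():
        index[feed_title] = [
            {
                "title": article["title"],
                "url": article["url"],  # URL of the print version
                "date": article.get("date", ""),  # publication date as a string
                "description": article.get("description", ""),
            }
            for article in articles
        ]
    return index


# Hypothetical scraped data for a single feed called "Magazine".
scraped = {
    "Magazine": [
        {"title": "Example article",
         "url": "http://www.thenation.com/doc/YYYYMMDD/author_name",
         "date": "January 2008"},
    ]
}
index = build_feed_index(scraped)
```

Returning only the inner list is what produces the AttributeError above: a list has no keys() method, while the outer dictionary does.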

shempe · 01-10-2008, 11:15 AM

Hi there

Here is a quick-and-dirty snippet from me

for the German Heise Newsticker.

It's working fine for me.

Code:

import re

from libprs500.ebooks.lrf.web.profiles import DefaultProfile

class heise (DefaultProfile):

    title = 'Heise Newsticker'
    max_recursions = 2
    use_pubdate = False
    no_stylesheets = True
    max_articles_per_feed = 30
    
    
    preprocess_regexps = [ (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in [
    (r'<!-- Site Navigation Bar -->.*?<title>', lambda match : '<title>'),
    (r'</title>.*?</head>', lambda match : '</title> </head>'),
    (r'<!-- allgemeine obere Navigation -->.*?</heisetext>', lambda match : ''),
    (r'<table.*?</table>', lambda match : ''),
    (r'<br clear="all".*?</body>', lambda match : '</div> </body>')
    ] ]

    def get_feeds(self):
        return [ ('Heise Newsticker', 'http://www.heise.de/newsticker/heise.rdf') ]

    def print_version(self, url):
        # Rewrite the article URL to point at Heise's print-friendly page.
        return url.replace('http://www.heise.de/newsticker/meldung/', 'http://www.heise.de/newsticker/meldung/print/')

have fun
Stefan
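
For anyone adapting this profile, the URL rewrite and the regex cleanup can be tried outside libprs500 first. The sketch below copies two of the patterns from the profile above; the `preprocess` helper, the article number, and the sample HTML are made up for illustration, and it assumes the patterns are applied in list order with `sub`, which the thread does not confirm.

```python
import re

ARTICLE_PREFIX = 'http://www.heise.de/newsticker/meldung/'
PRINT_PREFIX = 'http://www.heise.de/newsticker/meldung/print/'


def print_version(url):
    """Standalone copy of the profile's URL rewrite."""
    return url.replace(ARTICLE_PREFIX, PRINT_PREFIX)


# Two of the (pattern, replacement) pairs from the profile, compiled the same way.
preprocess_regexps = [(re.compile(p, re.IGNORECASE | re.DOTALL), f) for p, f in [
    (r'</title>.*?</head>', lambda match: '</title> </head>'),
    (r'<table.*?</table>', lambda match: ''),
]]


def preprocess(html):
    """Apply each pattern in order (assumed behaviour, see the note above)."""
    for pattern, replacement in preprocess_regexps:
        html = pattern.sub(replacement, html)
    return html


# Hypothetical inputs for a quick check.
url = print_version('http://www.heise.de/newsticker/meldung/12345')
sample = ('<head><title>News</title><meta name="x"></head>'
          '<body><table><tr><td>nav</td></tr></table><p>Text</p></body>')
cleaned = preprocess(sample)
print(url)
print(cleaned)
```

Checking the patterns against a saved copy of a real article page before running a full conversion makes it easier to see which expression removes too much or too little.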

kovidgoyal · 01-10-2008, 11:22 AM

You should add it to https://libprs500.kovidgoyal.net/wiki/UserProfiles so other people can find and use it. You'll need to create an account and let me know the user name so I can give you write permission for the wiki.

secretsubscribe · 01-10-2008, 12:43 PM

Quote:

Originally Posted by kovidgoyal

Oh, I'm sorry. What needs to be returned is a dictionary whose keys are feed titles (like Business, National News, etc.) and whose values are the lists of dictionaries I mentioned before.

Fantastic! It works. I just need to polish a few things as well as I currently can, and then I'll post the profile.

Finally being able to read The Nation every month and get the New York Times every morning adds so much value to my Sony Reader (I might be able to convince others to buy one).

Thanks for all your work and help.

shempe · 01-11-2008, 10:42 AM

I posted a new profile for the German Golem News and updated my Heise Newsticker profile.

Look at:

https://libprs500.kovidgoyal.net/wiki/UserProfiles

Stefan

cartz · 01-11-2008, 03:05 PM

Quote:

Originally Posted by secretsubscribe

Fantastic! It works. I just need to polish a few things as well as I currently can, and then I'll post the profile.

I look forward to your post so I can use it as a template for a newspaper I'd like to get working. It has a text-only edition with an index page, and every article is a single link away from it: http://www.theage.com.au/text/

I know nothing of Python or HTML and have tried experimenting, but I realize I need to see a working example of a non-RSS profile. Otherwise I think it should be quite simple, because the layout of the text version of the paper is already very Sony Reader friendly.

I don't have my Sony Reader yet. I ordered it yesterday (shipping to Australia), but I figure trying to sort this out is a good way to pass the waiting time.

StDo · 01-14-2008, 03:25 PM

Quote:

Originally Posted by shempe

I posted a new profile for the German Golem News and updated my Heise Newsticker profile.

Look at:

https://libprs500.kovidgoyal.net/wiki/UserProfiles

Stefan

Great!

Keep it up! :-)

Would you like to take a shot at Sueddeutsche.de...

Or at fscklog.com or mactechnews.de...

slav · 01-16-2008, 05:39 AM

Hi All!

I have a problem converting one RSS feed: the problem is with < and > (the feed is full of them).

I tried to write a regex like:

Code:

(r'(&lt;)(.*?&gt;)', lambda match : '<code>' + match.group(1) + match.group(2) + '</code>'),

but it doesn't work (I'm not a regex wizard :-) ).

Can anyone help me with that?

kovidgoyal - big thanks for your work on this program!
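
The thread does not say what exactly fails, so the following is only a way to check the pattern in isolation, outside any profile. The sample string is made up, and the flags are assumed to be the ones used in the Heise profile earlier in the thread.

```python
import re

# The pattern and replacement from the post, compiled on their own.
pattern = re.compile(r'(&lt;)(.*?&gt;)', re.IGNORECASE | re.DOTALL)


def wrap_in_code(match):
    """Wrap an escaped tag such as &lt;b&gt; in <code>...</code>."""
    return '<code>' + match.group(1) + match.group(2) + '</code>'


# Hypothetical feed text containing an escaped tag.
sample = 'Use &lt;b&gt; for bold text.'
result = pattern.sub(wrap_in_code, sample)
print(result)  # Use <code>&lt;b&gt;</code> for bold text.
```

If the pattern behaves as expected on text like this, the cause may lie elsewhere, for example in whether the feed text still contains the escaped entities by the time the preprocessing expressions run.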

kovidgoyal · 01-16-2008, 11:43 AM

What's the problem with < and >? Are they not being converted correctly?
