06-12-2008, 08:12 PM | #16 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
This is based on the published WSJ profile.
I have PM'ed you my login name and password; feel free to use them for testing/reading.
PHP Code:
|
06-12-2008, 08:23 PM | #17 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
return [('Todays newspaper', articles)] |
|
06-12-2008, 10:26 PM | #18 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
I started reading it this year (being able to read it on the Sony was a big factor for me), so I cannot compare before and after.
|
07-04-2008, 08:52 PM | #19 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
|
07-05-2008, 12:39 PM | #20 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Your return statement should be:
Code:
return [('Today\'s Paper', articles)] |
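For anyone following along, here is a minimal sketch of the shape that return statement produces: a list holding one (feed title, articles) tuple, where each article is a dict with the keys used in this thread's recipe. The helper names `make_article` and `make_index` and the sample URL are hypothetical, used only for illustration; nothing here imports calibre.

```python
# Sketch only: plain dicts and tuples, no calibre import needed.
def make_article(title, url, date='', description=''):
    """Build one article entry with the keys used in this thread's recipe."""
    return {'title': title, 'url': url, 'date': date, 'description': description}


def make_index(feed_title, articles):
    """Wrap the articles as a list containing one (feed title, articles) tuple."""
    return [(feed_title, list(articles))]


# Hypothetical sample data, not a real article.
articles = [make_article('Example headline', 'http://example.com/article_print/1.html')]
index = make_index("Today's Paper", articles)

for feed_title, feed_articles in index:
    print(feed_title, len(feed_articles))
```

Using double quotes around the title avoids the backslash escape for the apostrophe.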
|
07-05-2008, 11:33 PM | #21 | |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Quote:
I tried it and got a new error:
Traceback (most recent call last):
  File "convert_from.py", line 61, in <module>
  File "convert_from.py", line 42, in main
  File "calibre\web\feeds\main.pyo", line 128, in run_recipe
  File "calibre\web\feeds\news.pyo", line 825, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys'
I put in a few print statements to track the flow; it never gets into this loop:
for item in soup.findAll('a', attrs={'class':'bold80'}):
I checked the web page and nothing has changed there. Articles are identified correctly. Here is a link from the page source:
<a class="bold80" href="/article/SB121521047990229423.html?mod=todays_us_page_one">
Kovid, your help is very much appreciated. Thanks in advance. |
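One way to check the selector separately from the login and download steps is to run it offline against the quoted link. The sketch below is an assumption-laden stand-in: it uses the standard library's `html.parser` rather than calibre's bundled BeautifulSoup, the class name `Bold80Links` is hypothetical, and the link text and the second anchor in the sample are made up.

```python
from html.parser import HTMLParser


class Bold80Links(HTMLParser):
    """Collect href values of <a class="bold80"> tags.

    Stdlib stand-in for soup.findAll('a', attrs={'class': 'bold80'}).
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Keep only anchors with the expected class and a non-empty href.
        if tag == 'a' and attr_map.get('class') == 'bold80' and attr_map.get('href'):
            self.hrefs.append(attr_map['href'])


# First anchor is the one quoted in the post; the rest is hypothetical filler.
sample = ('<a class="bold80" '
          'href="/article/SB121521047990229423.html?mod=todays_us_page_one">Headline</a>'
          '<a class="other" href="/ignored.html">Other</a>')

parser = Bold80Links()
parser.feed(sample)
parser.close()
print(parser.hrefs)
```

If the list comes back non-empty for the saved page source but the recipe's loop never runs, the page fetched by the recipe (for example a login page) probably differs from the one viewed in the browser.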
|
07-06-2008, 12:21 AM | #22 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use the command feeds2lrf, not web2lrf.
|
07-06-2008, 01:15 AM | #23 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
The error is from feeds2lrf (I have calibre 0.4.76):
C:\Temp\News>feeds2lrf --debug wsjNew.py --username=xxx --password=xxx
Fetching feeds...
Sat Jul 05 22:12:09 2008
Traceback (most recent call last):
  File "convert_from.py", line 61, in <module>
  File "convert_from.py", line 42, in main
  File "calibre\web\feeds\main.pyo", line 128, in run_recipe
  File "calibre\web\feeds\news.pyo", line 825, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys' |
07-06-2008, 11:36 AM | #24 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Delete the line
Code:
from calibre.ebooks.lrf.web.profiles import DefaultProfile
|
07-06-2008, 07:19 PM | #25 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
The same error:
Sun Jul 06 16:14:26 2008
Traceback (most recent call last):
  File "convert_from.py", line 61, in <module>
  File "convert_from.py", line 42, in main
  File "calibre\web\feeds\main.pyo", line 128, in run_recipe
  File "calibre\web\feeds\news.pyo", line 825, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
  File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 204, in build_index
AttributeError: 'list' object has no attribute 'keys' |
07-07-2008, 02:08 PM | #26 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The attached recipe works for me with the command line
Code:
feeds2lrf test.py
Code:
## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

import time
import re
## from libprs500.ebooks.lrf.web.profiles import DefaultProfile
## from libprs500.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class WallStreetJournalPaper(BasicNewsRecipe):
    import time
    import re
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.lrf.web.profiles import DefaultProfile
    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    title = 'Wall Street Print Edition'
    __author__ = 'Kovid Goyal'
    simultaneous_downloads = 1
    max_articles_per_feed = 200
    INDEX = 'http://online.wsj.com/page/2_0133.html'
    timefmt = ' [%a, %b %d, %Y]'
    no_stylesheets = False
    html2lrf_options = [('--ignore-tables')]
    issue_date = time.ctime()
    print issue_date

    ## Don't grab articles more than 7 days old
    oldest_article = 7

    def get_browser(self):
        br = DefaultProfile.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://online.wsj.com/login')
            br.select_form(name='login_form')
            br['user'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    preprocess_regexps = [(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
        [
        ## Remove anything before the body of the article.
        (r'<body.*?<!-- article start', lambda match: '<body><!-- article start'),
        ## Remove any insets from the body of the article.
        (r'<div id="inset".*?</div>.?</div>.?<p', lambda match : '<p'),
        ## Remove anything after the end of the article.
        (r'<!-- article end.*?</body>', lambda match : '</body>'),
        ]
    ]

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        issue_date = time.ctime()
        for item in soup.findAll('a', attrs={'class':'bold80'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = item['href']
                url = 'http://online.wsj.com'+url.replace('/article', '/article_print')
                title = self.tag_to_string(item)
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })
        return [('Todays Paper', articles)] |
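To see what the three preprocess_regexps patterns in the recipe do to a page, here is a small standalone sketch. It assumes that applying each pattern in order with `re.sub` approximates how the recipe's list is used; the `preprocess` helper and the sample page are hypothetical, written only to contain the markers the patterns look for.

```python
import re

# Same three patterns as in the recipe, compiled with the same flags.
PREPROCESS = [(re.compile(pattern, re.IGNORECASE | re.DOTALL), repl) for pattern, repl in [
    # Remove anything before the body of the article.
    (r'<body.*?<!-- article start', lambda match: '<body><!-- article start'),
    # Remove any insets from the body of the article.
    (r'<div id="inset".*?</div>.?</div>.?<p', lambda match: '<p'),
    # Remove anything after the end of the article.
    (r'<!-- article end.*?</body>', lambda match: '</body>'),
]]


def preprocess(html):
    """Apply each (regex, replacement) pair in order (assumed behaviour)."""
    for regex, repl in PREPROCESS:
        html = regex.sub(repl, html)
    return html


# Hypothetical page, not real WSJ markup.
page = ('<html><body><div>nav</div><!-- article start --><p>One</p>'
        '<div id="inset"><div>ad</div>\n</div>\n<p>Two</p>'
        '<!-- article end --><div>footer</div></body></html>')

cleaned = preprocess(page)
print(cleaned)
```

The navigation, the inset, and the footer are dropped, leaving only the two paragraphs between the article markers.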
07-08-2008, 02:17 AM | #27 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Thank you, Kovid!
Your recipe ran fine from the command line, but the output was an empty file. I think that is related to my login: they block access after a few logins from different computers. I'll try again tomorrow. |
07-09-2008, 10:21 AM | #28 |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
No luck with WSJ so far.
When I use the posted recipe, I get an empty file. It does find articles (a = item.find('a')), but doesn't pass this condition: "if a and a.has_key('href'):". When I remove this condition, it gets articles (I print titles and see all of them from the web page), but fails at the end:
Traceback (most recent call last):
  File "convert_from.py", line 61, in <module>
  File "convert_from.py", line 42, in main
  File "calibre\web\feeds\main.pyo", line 134, in run_recipe
  File "calibre\web\feeds\news.pyo", line 472, in download
  File "calibre\web\feeds\news.pyo", line 578, in build_index
  File "c:\docume~1\davidd~1\locals~1\temp\calibre_0.4.76 _j-dnk5_recipes\recipe0 .py", line 89, in parse_index
    print title
  File "encodings\cp437.pyo", line 12, in encode
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2026' in position 5: character maps to <undefined> |
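A possible reading of both symptoms, offered as a guess rather than a confirmed diagnosis: each `item` in the loop is already the `<a>` tag, so the href can be checked on the item itself, and the final error comes from printing a title containing U+2026 to a cp437 console. The sketch below uses only the standard library; `to_print_url`, `console_safe`, and `article_from_anchor` are hypothetical helper names, and a plain dict stands in for the soup tag's attributes.

```python
def to_print_url(href, base='http://online.wsj.com'):
    """Mirror the recipe's rewrite of an article href into its print-view URL."""
    return base + href.replace('/article', '/article_print')


def console_safe(text, encoding='cp437'):
    """Replace characters the console code page cannot encode with '?'."""
    return text.encode(encoding, 'replace').decode(encoding)


def article_from_anchor(attrs, title):
    """Build an article dict from the anchor's own attributes.

    attrs is a plain dict standing in for the tag (assumption for this sketch).
    Returns None when the anchor has no href.
    """
    href = attrs.get('href')
    if not href:
        return None
    return {'title': title, 'url': to_print_url(href), 'date': '', 'description': ''}


# The href is the one quoted earlier in the thread; the title is made up.
anchor = {'class': 'bold80',
          'href': '/article/SB121521047990229423.html?mod=todays_us_page_one'}
article = article_from_anchor(anchor, 'Today\u2026')
print(article['url'])
print(console_safe(article['title']))
```

Debug prints that go through something like `console_safe` should no longer abort the run on titles with an ellipsis, while the article dict keeps the original title.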
07-09-2008, 11:06 AM | #29 |
creator of calibre
Posts: 44,397
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Can you send me your WSJ username and password again? I need them to debug further.
|
07-09-2008, 12:23 PM | #30 | |
Addict
Posts: 274
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Quote:
I have logged out of the site, so you should be able to log in. If I run the calibre recipe a few times in a row, they lock the account, and it then takes 5-6 hours to regain access, which makes testing changes painful. Thanks in advance. |
|
|