03-23-2004, 01:35 PM | #1 |
Is papyrophobic!
Posts: 1,926
Karma: 1009999
Join Date: Aug 2003
Location: USA
Device: Dell Axim
|
The Economist Scoop
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam
# Version 1.0
# Date updated: 9 Jan 2004
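Each setting in a site file like this is a "Key: value" line, and lines starting with "#" are comments. To make that structure concrete, here is a minimal Python sketch of reading such lines. The helper parse_site_header is hypothetical and not part of Sitescooper. It collects repeated keys (ImageURL appears twice in fuller scoops) into lists, and it does not handle the multi-line { ... } Perl blocks.

```python
def parse_site_header(text):
    """Parse simple 'Key: value' lines from a Sitescooper-style site file.

    Hypothetical helper for illustration only: lines starting with '#'
    are skipped as comments, repeated keys are collected into a list,
    and multi-line { ... } Perl blocks are NOT handled.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # Split on the first colon only, so URLs like http://... stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a Key: value line
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


if __name__ == "__main__":
    sample = "URL: http://www.economist.com/\nName: Economist\n# Version 1.0"
    print(parse_site_header(sample))
```

Splitting on the first colon only is what keeps "http://www.economist.com/" whole as the URL value.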
03-23-2004, 01:47 PM | #2 |
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
|
This looks great! Economist.com seems to be having some problems just now, but I'm looking forward to adding this to my sites list. Thanks, Morpheus.
|
03-24-2004, 12:16 AM | #3 |
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
|
I've been testing this and the site seems to shut me out: I get an HTTP GET error with the message "Automatic downloading forbidden". Now that's just not neighborly. Has anyone solved this?
|
03-24-2004, 03:59 AM | #4 |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Hm, I've been studying Sitescooper for the past few days (on your advice, ignatz!), but I don't see any option to slow down the HTML download process. I think the Economist has some kind of mod_bandwidth module running.
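Outside Sitescooper, slowing the download process just means waiting between requests. Here is a minimal Python sketch of that idea, assuming a fixed pause is enough; whether the Economist really throttles by bandwidth is only the guess above. The class name ThrottledFetcher and the 5-second default are hypothetical, and the clock, sleep and opener parameters are injectable so the timing logic can be checked without touching the network.

```python
import time
import urllib.request


class ThrottledFetcher:
    """Fetch URLs one at a time, leaving at least `delay` seconds between requests.

    A hypothetical sketch, not a Sitescooper feature. `clock`, `sleep` and
    `opener` default to the real time and urllib functions.
    """

    def __init__(self, delay=5.0, clock=time.monotonic, sleep=time.sleep, opener=None):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.opener = opener or urllib.request.urlopen
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep only for the part of the delay that has not already elapsed."""
        now = self.clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

    def fetch(self, url):
        """Wait if needed, then download `url` and return the body as bytes."""
        self.wait()
        with self.opener(url) as resp:
            return resp.read()
```

Sleeping only for the remaining part of the delay means slow pages do not get an extra pause on top of their own download time.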
|
03-28-2004, 06:49 AM | #5 | |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
Is it possible to set the user-agent ID to something like:
Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)
Perhaps that is the problem?
-S.
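For testing the idea by hand, Python's standard urllib lets you send any User-Agent header. This minimal sketch uses the AvantGo string suggested above; whether the Economist server actually decides on User-Agent is an untested assumption, and build_request and fetch are hypothetical helper names.

```python
import urllib.request

# The AvantGo-style string suggested in the post. Whether the server keys
# on User-Agent at all is an untested assumption.
AVANTGO_UA = "Mozilla/3.0 (compatible; AvantGo 5.2; FreeBSD)"


def build_request(url, user_agent=AVANTGO_UA):
    """Return a urllib Request that will send a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=AVANTGO_UA, timeout=30):
    """Download `url` with the given User-Agent and return the body as bytes."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()
```

Note that urllib stores header names capitalized as "User-agent", so that is the spelling to use with Request.get_header when inspecting the request.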
|
|
03-31-2004, 01:33 PM | #6 | |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Quote:
|
|
04-03-2004, 03:42 AM | #7 |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
I've improved the original Economist scoop somewhat so that it excludes unwanted content.
Code:
URL: http://www.economist.com/
Name: Economist
Description: Economist
AuthorName: Goh Boon Nam

# General Settings
Active: 1
SizeLimit: 2000
Levels: 2

# Image Settings
ImageURL: http://www.economist.com/images/dingbats/e5.gif
ImageURL: http://www.economist.com/images/\d+/.*
UseAltTagForURL: 0

# Content Settings
ContentsStart: <td colspan="7" width="447" valign="top">
ContentsEnd: <a href="/diversions/quiz/">
ContentsUseTableSmarts: 0

# Story Settings
StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
StoryURL: http://www.economist.com/(.*?)/PrinterFriendly.cfm(.*?)

# PreProcess Settings
ContentsHTMLPreProcess: {
  # remove ads...hope that's not killing it when layout changes
  s,<div align="center">[^<]<a href="/printedition/">.*<td width="209" valign="top" height="1700">,</font>,gim;
  # remove the 'More from...' Links
  s,<div align="right"><b><a href="[^"]*"><font[^>]*>[^/]*</font></a></b></div><br>,,gim;
  # remove the 'More reviews...' Links
  s,<div align="right"><font[^>]*><b><a href="[^"]*"><font color="[^"]*">More reviews</font></a></b></div>,,gim;
  # gfx -> txt headers
  s,<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>,<hr>Section: $1<br>,gim;
  # gfx -> txt header "markets2
  s,<p><a href="[^"]*"><img alt="MARKETS" border="0" src="/images/sections/m-d\.gif" width="207" height="19"></a></p>,<hr>Section: markets<br>,gim;
  # remove links to pay-content
  s,<a href="[^"]*">([^<]*)</a></b>\s<img alt="E\+" width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
  # remove links to pay-content
  s,<a href="[^"]*">([^<]*)</a></b>\s<img src="/images/dingbats/e5\.gif" alt="" />,$1<font size=1>(pay-content)</font></b><img width="17" height="10" border="0" src="/images/dingbats/e5\.gif">,gim;
  # remove Also-on-the-site... column
  s,<img alt="also on the site ...".*</td></tr></table><br>,,gim;
}
StoryHTMLPreProcess: {
  # remove 'get article background...'
  s,<p>.*<a target="background"[^>]*">.*background</b></font></a></font></p><!--back-->,,gim;
  s/align="right"//gim;
  s/align="center"//gim;
  s/align=right//gim;
  s/align=center//gim;
}

Alex
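The Perl substitutions in this scoop can be tried outside Sitescooper before trusting them on a live run. Here is a minimal Python sketch of three of them: StoryToPrintableSub, the "gfx -> txt headers" rule, and the four align= removals folded into one pattern. It assumes Python's re behaves like Perl for these simple patterns (Perl's g flag is the default for re.sub, i maps to re.IGNORECASE); the function names and the sample URL and HTML are made up for illustration.

```python
import re

# StoryToPrintableSub: s!displayStory.cfm!PrinterFriendly.cfm!
# (no g flag in the Perl, so only the first match is replaced; the dot is
# escaped here, whereas the Perl version lets it match any character)
PRINTABLE = re.compile(r"displayStory\.cfm")

# "gfx -> txt headers": a section banner image becomes a text heading.
SECTION_IMG = re.compile(
    r'<a href="[^"]*"><img src="/images/sections/(\w+)\.gif"[^>]*></a><br><br>',
    re.IGNORECASE,
)

# StoryHTMLPreProcess: s/align="right"//, s/align="center"//,
# s/align=right//, s/align=center// combined into one alternation.
ALIGN_ATTRS = re.compile(r'align=(?:"right"|"center"|right|center)', re.IGNORECASE)


def to_printable(url):
    """Rewrite a story URL to its printer-friendly form."""
    return PRINTABLE.sub("PrinterFriendly.cfm", url, count=1)


def section_headers(html):
    """Replace section banner images with '<hr>Section: name<br>'."""
    return SECTION_IMG.sub(r"<hr>Section: \1<br>", html)


def strip_align(html):
    """Remove align=right / align=center attributes, quoted or not."""
    return ALIGN_ATTRS.sub("", html)
```

Perl's $1 in a replacement corresponds to \1 in Python, which is the only syntax change needed for the section-header rule.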
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Sony Reader Daily Edition - SCOOP! | Nate the great | News | 303 | 11-03-2009 03:57 PM |
eReader SCOOP!!! on TeleRead | Robotech_Master | News | 29 | 12-10-2008 08:19 AM |
E-books on cellphones: what's the scoop? | mreames | Alternative Devices | 3 | 01-08-2007 03:23 AM |