03-13-2004, 09:02 AM | #1 |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
Sitescooper German scoops
Hello,
here is my collection of German site files. Each file starts with a comment line giving its name, like: # de_*.site

This is the list of files:
de_aldi-nord.site
de_bild.site
de_bvb.site
de_cert.site
de_cyberkino.site
de_digitalkamera.site
de_digitv_premiere.site
de_eetimes.site
de_gazette.site
de_heise.site
de_heisec.site
de_heise_aktuell.site
de_heise_mobil.site
de_heise_tp.site
de_heise_tr_aktuell.site
de_heute.site
de_klack-channel.site
de_menshealth.site
de_mobile2day.site
de_palmfaq.site
de_pdassi_news.site
de_pdassi_software.site
de_rn_do.site
de_spiegel.site
de_spiegel_schlagzeilen.site
de_stern.site
de_sz_kultur.site
de_tagesschau.site
de_teltarif.site
de_tvspielfilm.site
de_wortfilter.site
de_yahoo_bvb.site

# de_aldi-nord.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 17.10.03
URL: http://aldi-nord.de/OFFER_D/home.htm
Description: Aldi Nord Angebote
Name: Aldi-Nord
Levels: 3
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.JPG
UseTableSmarts: 0
IssueLinksStart: <!-- Beginn des eigentlichen Fensters 800 x 600 -->
IssueLinksEnd: <!-- Ende -->
ContentsCachable: 0
StoryCachable: 0
MinPages: 1
ContentsURL: http://aldi-nord.de/OFFER_D/OFFER_\d+/AA_LISTE\.HTM
StoryURL: http://aldi-nord.de/OFFER_D/OFFER_\d+/OFF\d+\.HTM
ContentsUseTableSmarts: 0
StoryUseTableSmarts: 0
TableRender: keep

# de_bild.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 14.10.03
URL: http://mobile.bild.t-online.de/index.jsp
Name: Bild.de
Description: German Bild newspaper
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 3
ContentsDiff: 1
ImageURL: .*\.jpg
ContentsCachable: 0
StoryCachable: 1

# de_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 17.02.04
URL: http://borussia-dortmund.lycos.de/?Z%1B%E7%F4%9D
Description: Borussia Dortmund News (Soccer)
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.jpg
ContentsStart: <!-- CONTENT ANFANG -->
ContentsEnd: <!-- CONTENT ENDE -->
StoryStart: inhalt_header
StoryEnd: d_oben.gif
Name: BvB
Levels: 2

# de_cert.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 12.2.03
URL: http://cert.uni-stuttgart.de/ticker/sidebar.php
Description: German CERT Infos
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Name: CERT RUS
Levels: 2
StoryURL: http://cert.uni-stuttgart.de/ticker/...e.php\?mid=\d+
StoryStart: <FONT SIZE="+2">
StoryEnd: Copyright © 2003 RUS-CERT, Universität Stuttgart
ContentsDiff: 1
# remove CENTER
StoryPostProcess: {
  s/center//gi;
}

# de_cyberkino.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 7.5.03
URL: http://www.cyberkino.de/entertainment/kino/monate.html
Description: German Cinema Infos
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
ImageURL: http://www.cyberkino.de/.*\.jpg
Name: Cyberkino
Levels: 2

# de_digitalkamera.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 11.9.03
URL: http://www.digitalkamera.de/Info/
Name: German Digitalcamera.de News
Description: German Digitalcamera News
Levels: 2
ContentsStart: weiter zur nächsten Seite
ContentsEnd: Diese Seite wurde redaktionell von
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalkamera.de/Info/News/\d+/\d+\.htm
ImageURL: http://images.digitalkamera.de/.*\.jpg
StoryStart: <h3>
StoryEnd: "PurpleText" preview="End-Text"
StoryCacheable: 1
StoryLifetime: 2

# de_digitv_premiere.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.8, 16.02.04 14:20
URL: http://www.digitalfernsehen.de/tv-se...ndex_1687.html
Name: Digitv premiere news
Description: German Premiere Infos
Levels: 2
ContentsStart: <!-- Linke Navigation ENDE -->
ContentsEnd: <!-- Premiere News Snippet Ende -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalfernsehen.de/news/news\_\d+\.html
StoryStart: <!-- Newsbeitrag start -->
StoryEnd: onClick="return printwindow();
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.digitalfernsehen.de/news/img/.+\.gif

# de_eetimes.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 18.02.04 10:45
URL: http://www.eetimes.de/hr
Name: EE Times.de
Description: Weltweiter Industrie-Nachrichtendienst für Elektronikingenieure
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsStart: <!-- TOP STORY -->
ContentsEnd: <!-- BLACK TOP BORDER -->
ContentsDiff: 1
StoryURL: .+/news/.+
StoryStart: <!-- TOP STORY -->
StoryEnd: </STORY>
StoryCacheable: 1
# remove javascript pseudo links and <center>
StoryPostProcess: {
  s/a href=.javascript://gi;
  s/<center>//gi;
}

# de_gazette.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.2.03
URL: http://gazette.de/
Name: Die Gazette
Description: German politics magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ImageURL: .*.jpg
StoryURL: [A-Za-z]\S+\.html
ContentsStart: bordercolor="#CCCCCC"
ContentsEnd: /Archiv/Newsletter.html
StoryToPrintableSub: {
  s,([A-Z].+)(\.html),\1-print\2,
  s,[A-Z],[a-z],
}
StoryPostProcess: {
  s/<center>//gi;
}

# de_heise.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.9, 16.2.03
# this version is with pictures.
URL: http://www.heise.de/
Name: Heise Newsticker
Description: German Heise IT-news
Levels: 2
ContentsStart: </HEISETEXT>
ContentsEnd: <!-- MITTE (NEWS-UEBERBLICK) -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/newsticker/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </body>
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.heise.de/bilder/.+
# StoryHeadline: <HEISETEXT>\s+<b>(.+)</b>
StoryToPrintableSub: s,/newsticker/meldung/(\d+),/newsticker/meldung/print/\1,

# de_heisec.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 17.02.04 11:32
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <!-- Titel -->
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,

# de_heise_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 16.02.04 10:38
URL: http://www.heise.de/ct/aktuell/
Name: Heise Aktuell
Description: German Heise Mobil-news
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
# StoryURL: http://www.heise.de/ct/aktuell/meldung/\d+
StoryURL: http://www.heise.de/ct/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*
StoryToPrintableSub: s,/ct/aktuell/meldung/(\d+),/ct/aktuell/meldung/print/\1,

# de_heise_mobil.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 23.5.03
URL: http://heise.de/mobil/
Name: Heise Mobil
Description: German Heise Mobil-news
Levels: 2
# Ignore the ticker:
ContentsStart: Themen
ContentsEnd: <!-- MITTE+RECHTS -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://heise.de/mobil/.*/
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
# remove small font commands
StoryPostProcess: {
  s/<font size=1>//gi;
}

# de_heise_tp.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler and Carsten Clasohm, Version 0.6, 20.11.03 09:38
# this version is with pictures.
URL: http://www.heise.de/tp/pdanews/default.html
Name: Heise Telepolis
Levels: 2
ContentsDiff: 1
ImageURL: .*\.gif
ImageURL: .*\.jpg
}

# de_heise_tr_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 16.02.04 10:37
URL: http://www.heise.de/tr/aktuell/
Name: Heise TR Aktuell
Description: German Heise Technology Review
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/tr/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*
StoryToPrintableSub: s,/tr/aktuell/meldung/(\d+),/tr/aktuell/meldung/print/\1,

# de_heute.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 4.4.04
URL: http://www.heute.t-online.de/ZDFheut...HOME-4,00.html
Name: heute
Description: German "heute" news
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
StoryURL: http://www.heute.t-online.de/ZDFheut...ticle/.+\.html
ImageURL: .*\.(gif|jpg).*
ContentsCachable: 1
TableRender: list
#SizeLimit: 1000
# remove table commands
# StoryPostProcess: {
# s/<table.+>//gi;
#}

# de_klack-channel.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 17.2.03
URL: http://www.klack-channel.de/channelTipps.php3?DAY=[[YYYY]][[MM]][[DD]]&USER=
Description: German TV Tipps
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Name: Klack Tagestipps
Levels: 1
# remove CENTER
StoryPostProcess: {
  s/<?center>?//gi;
}

# de_menshealth.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 24.09.03 10:19
URL: http://www.menshealth.de/avantgo/
Description: German Men Magazin
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Name: Menshealth
Levels: 3
ContentsDiff: 1
ImageURL: .*\.jpg
ContentsStart: HOME</a>
ContentsURL: http://www.menshealth.de/sixcms/deta...=d_mh_av_home_..
StoryURL: http://www.menshealth.de/.*/\d+/d_mh_av_detail
# remove small font commands
StoryPostProcess: {
  s/<font size=\"?\+?\d\"?>//gi;
}

# de_mobile2day.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 14.10.03
URL: http://www.mobile2day.de/pdanews_all...ext=&isLimit=1
Name: mobile2day
Description: German PDA-News
AuthorName: Stefan /at/ Schwingeler.de
ContentsDiff: 1
Levels: 2
StoryPostProcess: {
  s/<CENTER>//gi;
  s/size=\"?\d\"?//gi;
}

# de_palmfaq.site
URL: http://palmfaq.de
Name: PalmFAQ.de
Levels: 2
ContentsDiff: 1
StoryCacheable: 1

# de_pdassi_news.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/news1.php
Name: pdassi News
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://pdassi.de/images/.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
  s/<small>//gi;
}

# de_pdassi_software.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/wcf/newuploads.php
AddURL: http://pdassi.de/wcf/newupdates.php
AddURL: http://pdassi.de/wcf/newprc.php
Name: pdassi Software
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://.*/util/screenshot.php\?pid=\d+.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
  s/align="center"//gi;
  s/<small>//gi;
}

# de_rn_do.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 12.06.2003
URL: http://www.westline.de/lokal/main.php?link=do%2F%2Fln
Name: RN Do
Levels: 2
ContentsDiff: 1
ImageURL: .*\.gif
ContentsStart: <!--beginn hauptteil-->
ContentsEnd: <!--ende hauptteil-->
StoryURL: http://www.westline.de/lokal/mono.php.*
StoryStart: <!--beginn hauptteil-->
StoryEnd: <!--ende hauptteil-->

# de_Spiegel.site
# This is a sitescooper site file. see http://sitescooper.cx/
# by Stefan Schwingeler, Version 0.6, 6.2.03
# History:
# "fixed" by L****n Wulff, L****n@multimediaconnection.de
# rewritten with new PDA-link (no pics) by Stefan Schwingeler
URL: http://www.spiegel.de/dertag/pda/ava...r140=1,00.html
Name: Der Spiegel
Description: German news magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
StoryURL: http://www.spiegel.de/dertag/pda/ava...tikel/.*\.html

# de_spiegel_schlagzeilen.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 18.2.03
URL: http://www.spiegel.de/schlagzeilen/
Name: Der Spiegel Schlagzeilen
Description: German news magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsStart: mark a:visited
ContentsEnd: OAS_RICH('Right');
StoryURL: http://www.spiegel.de/.+/\d+,\d+,\d+,\d+\.html
StoryURL: http://www.spiegel.de/.+/\d+,\d+,druck-\d+,\d+\.html
ImageURL: http://www.spiegel.de/img/\d+,\d+,\d+,\d+\.jpg
StoryToPrintableSub: s:^(\S+/?\S+/0,\d+,)(\d+,\d+\.html):\1druck-\2:

# de_stern.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 5.3.04
URL: http://www.stern.de/pda/
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=politik
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=wirtschaft
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=sport
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=kultur
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=computer
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=campus
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=wissenschaft
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=lifestyle
Name: Stern
Levels: 2
ContentsDiff: 1
ImageURL: .*\.jpg
# ContentsURL: http://www.stern.de/pda/\?pda=1\&rubrik=.*
ContentsStart: <strong>Lifestyle</strong>
ContentsEnd: <!-- FOOTER START -->
StoryURL: http://www.stern.de/.*/index.html\?id=\d+\&pda=1
StoryStart: Beginn des Artikels
StoryEnd: <!-- FOOTER START -->
# StoryHeadline: <div id="artikelKopf1">(.*?)</div>
StoryHeadline: <h1>(.*?)</h1>
# remove bigfont in <h2>
StoryPostProcess: {
  s/<h2>//gi;
}

# de_sz_kultur.site
# This is a sitescooper site file. see http://sitescooper.org/
# Stefan Schwingeler 2.2.04
URL: http://www.sueddeutsche.de/kultur/ticker/
Name: SZ Kultur
Description: Ressort Münchner Kultur der Süddeutschen Zeitung
Levels: 2
ContentsStart: <!-- beginn content -->
ContentsEnd: <!--ende weiterethemen-->
StoryURL: http://www.sueddeutsche.de/kultur/artikel/.+
StoryStart: <!-- beginn content -->
StoryEnd: <!-- ende content -->
ImageURL: .*\.jpg

# de_tagesschau.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.2.03
URL: http://www.tagesschau.de/mobileTS
Name: Tagesschau Mobil
Description: German news show
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 3
ImageURL: .*/image/.*\.jpg
SizeLimit: 1000
Level2Cachable: 0
Level3Cachable: 0
Level4Cachable: 0
ContentsCachable: 0

# de_teltarif.site
# This is a sitescooper site file. see http://sitescooper.tsx.org/
# by Stefan Schwingeler, Version 0.3, 24.02.04 11:12
URL: http://www.teltarif.de/arch/woche.html
Name: Teltarif
Levels: 2
ContentsDiff: 1
ContentsStart: <!-- Add Ad End -->
StoryURL: http://www.teltarif.de/arch/\d\d\d\d/kw\d+/s\d+\.html
ImageURL: http://www.teltarif.de/arch/\d\d\d\d/kw\d+/.+\.jpg
StoryStart: <!-- Add Ad End -->
StoryEnd: Ihre Meinungen und Erfahrungen

# de_tvspielfilm.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler and Carsten Clasohm, Version 1.1, 28.11.03 11:09
# modified by Stefan Schwingeler 25.04.01 11:26:
UseTableSmarts: 0
URL: http://www.tomorrow-newmedia.de/mobi.../tvs/tipps.php
Name: TV-Spielfilm
Levels: 2
ContentsDiff: 0
StoryCachable: 0
StoryURL: http://www.tomorrow-newmedia.de/mobi...tgo/tvs/gen/.*
ImageURL: .+\.gif
StoryUseTableSmarts: 0

# de_wortfilter.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 18.02.04
URL: http://www.wortfilter.de/news.html
Name: wortfilter.de
Description: Deutsche eBay Infos von wortfilter.de
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsDiff: 1
StoryURL: http://www.wortfilter.de/News/news\d+.html
StoryStart: <h1>
StoryEnd: alt="Vorherige Meldung"
# remove center commands
StoryPostProcess: {
  s/align="center"//gi;
}

# de_yahoo_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 24.09.03 10:19
URL: http://de.sports.yahoo.com/foot/germ/t/dort
Name: Yahoo BvB
Levels: 2
Description: Yahoo Bvb News
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.*.html
ContentsStart: >Verein</a>
ContentsEnd: Durchsuchen Sie das Archiv
ContentsDiff: 1
StoryStart: Tageshöhepunkte
StoryEnd: >Diskutieren Sie über Fußball</a>
# remove small font commands
StoryPostProcess: {
  s/<size=-?\+?\d>//gi;
  s/<center>//gi;
}
#eof
|
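Since every file in the post above opens with a `# de_<name>.site` marker, the dump can be cut back into individual files mechanically. The following is only a rough Python sketch of that idea, not part of sitescooper; the function name `split_site_files` and the character class in the marker pattern are assumptions made for illustration.

```python
import re

# Marker that opens each file in the post, e.g. "# de_bild.site".
# The allowed characters are an assumption based on the names listed above.
HEADER = re.compile(r"#\s*(de_[A-Za-z0-9_-]+\.site)\b")

def split_site_files(dump):
    """Return {filename: text} for a dump of concatenated site files."""
    files = {}
    matches = list(HEADER.finditer(dump))
    for i, m in enumerate(matches):
        # Each file runs from its own marker up to the next marker (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(dump)
        files[m.group(1)] = dump[m.start():end].strip()
    return files

sample = (
    "# de_bild.site\n"
    "URL: http://mobile.bild.t-online.de/index.jsp\n"
    "Name: Bild.de\n"
    "# de_palmfaq.site\n"
    "URL: http://palmfaq.de\n"
    "Name: PalmFAQ.de\n"
)

for name, body in split_site_files(sample).items():
    print(name, len(body.splitlines()), "lines")
```

Because the pattern requires the leading `#`, the plain list of file names at the top of the post is not mistaken for file boundaries.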
03-13-2004, 10:00 AM | #2 |
mechanoholic
Posts: 582
Karma: 1000217
Join Date: Mar 2004
Location: Sarasota, FL
Device: Nook STR/iPhone 4S/EVO 4G
|
stobs, thanks for posting. It was some of your early posts that helped me figure out how to get started with sitescooper. I hope we can get a good discussion going here...
|
03-13-2004, 03:06 PM | #3 | |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
That's great to hear.
Perhaps Justin will turn up on this board soon. I can't seem to reach him by email; perhaps his SpamAssassin is too good. -S.
|
|
03-17-2004, 05:36 AM | #4 | |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
heise.de updated their style, so I updated de_heise.site.
# de_heise.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 1.0, 16.3.03
# this version is with pictures.
URL: http://www.heise.de/
Name: Heise Newsticker
Description: German Heise IT-news
Levels: 2
ContentsStart: <!-- Liste der Meldungen -->
ContentsEnd: <!-- Überblick -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/newsticker/(meldung/print|meldung)/\d+
StoryStart: <!-- Meldung -->
StoryEnd: <!-- untere News-Navigation -->
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.heise.de/bilder/.+
# StoryHeadline: <HEISETEXT>\s+<b>(.+)</b>
StoryToPrintableSub: s,/newsticker/meldung/(\d+),/newsticker/meldung/print/\1,
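To see what the `StoryToPrintableSub` line in this file does to a story link, the Perl substitution can be replayed in Python. This is only an illustrative sketch: the article number is made up, `to_printable` is a hypothetical helper, and treating `StoryURL` as a whole-URL match is an assumption about how sitescooper applies it.

```python
import re

# StoryURL pattern copied verbatim from de_heise.site.
STORY_URL = re.compile(r"http://www.heise.de/newsticker/(meldung/print|meldung)/\d+")

def to_printable(url):
    """Python equivalent of s,/newsticker/meldung/(\\d+),/newsticker/meldung/print/\\1,"""
    return re.sub(r"/newsticker/meldung/(\d+)", r"/newsticker/meldung/print/\1", url)

story = "http://www.heise.de/newsticker/meldung/45678"  # hypothetical article number
printable = to_printable(story)

print(printable)
# Both the normal and the print form are accepted by the StoryURL pattern
# (assuming the pattern has to cover the whole URL).
print(bool(STORY_URL.fullmatch(story)), bool(STORY_URL.fullmatch(printable)))
# Applying the rewrite to a print URL leaves it unchanged, because the
# pattern needs digits directly after "meldung/".
print(to_printable(printable) == printable)
```

This is why the `StoryURL` alternation lists `meldung/print` as well as `meldung`: the rewritten link still has to pass the story filter.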
|
|
03-17-2004, 10:58 AM | #5 |
Uebermensch
Posts: 2,583
Karma: 1094606
Join Date: Jul 2003
Location: Italy
Device: Kindle
|
Stefan,
thanks for sharing your scoops. I can definitely use them as templates for my own scoops, which, of course, I will also make public. |
03-17-2004, 01:49 PM | #6 | |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
Hi TadW,
they have been published since "ever": http://sitescooper.org/dist/site_sam...ional_germany/ My only problem is getting them updated by jmason (he has a very effective spam filter), so I am posting them here. It would be best if they were updated at sitescooper.org rather than splitting the collection of site files across two places; this thread is only useful for collecting input and then sending it to jmason (or checking it in with CVS directly).
|
|
02-15-2005, 04:57 AM | #7 |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
I updated some German scoops; they are attached as a zip file as well.
de_bvb.site
de_digitalkamera.site
de_digitv_premiere.site
de_gms_reise.site
de_heisec.site
de_heise_aktuell.site
de_heise_mobil.site
de_heise_tp.site
de_heise_tr_aktuell.site
de_mobile2day.site
de_pdassi_news.site
de_pdassi_software.site
de_plockmag.site
de_wortfilter.site
de_yahoo_bvb.site
de_yahoo_golf.site
de_yahoo_hot.site
-stobs.
<code>
# de_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 26.11.04
URL: http://borussia-dortmund.lycos.de/?Z%1B%E7%F4%9D
Description: Borussia Dortmund News (Soccer)
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.jpg
ContentsStart: <!-- CONTENT ANFANG -->
ContentsEnd: <!-- CONTENT ENDE -->
StoryStart: inhalt_header
StoryEnd: d_oben.gif
Name: BvB
Levels: 2
#EOF

# de_digitalkamera.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 11.9.03
URL: http://www.digitalkamera.de/Info/
Name: German Digitalcamera.de News
Description: German Digitalcamera News
Levels: 2
ContentsStart: weiter zur nächsten Seite
ContentsEnd: Diese Seite wurde redaktionell von
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalkamera.de/Info/News/\d+/\d+\.htm
ImageURL: http://images.digitalkamera.de/.*\.jpg
StoryStart: <h3>
StoryEnd: "PurpleText" preview="End-Text"
StoryCacheable: 1
StoryLifetime: 2
#EOF

# de_digitv_premiere.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.8, 16.02.04 14:20
URL: http://www.digitalfernsehen.de/tv-se...ndex_1687.html
Name: Digitv premiere news
Description: German Premiere Infos
Levels: 2
ContentsStart: <!-- Linke Navigation ENDE -->
ContentsEnd: <!-- Premiere News Snippet Ende -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalfernsehen.de/news/news\_\d+\.html
StoryStart: <!-- Newsbeitrag start -->
StoryEnd: onClick="return printwindow();
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.digitalfernsehen.de/news/img/.+\.gif
#EOF

# de_gms_reise
# by geoffreynz, see https://www.mobileread.com/forums/showthread.php?t=2020
# Version 0.1
URL: http://www.ikz-online.de/ikz/ikz.rei...ueberblick.php
Levels: 2
Name: gms Reise
ContentsStart: header.berichte2.gif
ContentsEnd: <!-- Ende - Z_2sp_dpa_Uebers_Fortl_SQL -->
StoryURL: http://www.ikz-online.de/.*
StoryStart: <!-- Ende - Z_2sp_Multicom_Lang_SQL -->
StoryEnd: <span class="contentfliess">
ImageURL: http://www.ikz-online.de/includes/bi....php?.*.nitf.*
#EOF

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 17.02.04 11:32
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <!-- Titel -->
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,
#EOF

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.5, 06.07.04 10:08
URL: http://www.heise.de/ct/aktuell/
Name: Heise Aktuell
Description: German Heise Mobil-news
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/ct/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
# pix missing ?!
ImageURL: http://www.heise.de/bilder/.*
StoryToPrintableSub: s,/ct/aktuell/meldung/(\d+),/ct/aktuell/meldung/print/\1,
# EOF

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 14.2.05
URL: http://www.heise.de/mobil/
Name: Heise Mobil
Description: German Heise Mobil-news
Levels: 2
# Ignore the ticker:
ContentsStart: zur News-\Übersicht
ContentsEnd: <!-- MITTE+RECHTS -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/mobil/artikel/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.+
# remove small font commands
StoryPostProcess: {
  s/<font size=1>//gi;
}
#EOF

# de_heise_tp.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.7, 23.11.04 09:38
# this version is with pictures.
URL: http://www.heise.de/tp/news-xl.rdf
Name: Telepolis News
Description: Telepolis News
ContentsFormat: rss
StoryURL: http://www.telepolis.de/.+/artikel/.+\.html
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
ImageURL: .*\.gif
ImageURL: .*\.jpg
# (This is a sitescooper site file. see http://sitescooper.org/
# It was generated from the site's RSS by rss-to-site.pl 1.1.)
#EOF

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 16.02.04 10:37
URL: http://www.heise.de/tr/aktuell/
Name: Heise TR Aktuell
Description: German Heise Technology Review
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/tr/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*
StoryToPrintableSub: s,/tr/aktuell/meldung/(\d+),/tr/aktuell/meldung/print/\1,
#EOF

# de_mobile2day.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 14.10.03
URL: http://www.mobile2day.de/pdanews_all...ext=&isLimit=1
Name: mobile2day
Description: German PDA-News
AuthorName: Stefan /at/ Schwingeler.de
ContentsDiff: 1
Levels: 2
StoryPostProcess: {
  s/<CENTER>//gi;
  s/size=\"?\d\"?//gi;
}
#EOF

# de_pdassi_news.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/news1.php
Name: pdassi News
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://pdassi.de/images/.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
  s/<small>//gi;
}
#EOF

# de_pdassi_software.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/wcf/newuploads.php
AddURL: http://pdassi.de/wcf/newupdates.php
AddURL: http://pdassi.de/wcf/newprc.php
Name: pdassi Software
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://.*/util/screenshot.php\?pid=\d+.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
  s/align="center"//gi;
  s/<small>//gi;
}
#EOF

# de_plockmag.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 12.09.04 15:40
URL: http://www.plockmag.de/
Name: Plock! German Golf Magazin
Description: German Golf Magazin
Levels: 2
ContentsStart: <!-- END PARTNER PROGRAM -->
# ContentsEnd:
StoryURL: http://www.plockmag.de/(news|hintergrund|kolumnen)/.+\.html
StoryStart: <!-- END PARTNER PROGRAM -->
# StoryEnd:
StoryCacheable: 1
ImageURL: http://www.plockmag.de/img/.+\.jpg
#EOF

# de_wortfilter.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 18.02.04
URL: http://www.wortfilter.de/news.html
Name: wortfilter.de
Description: Deutsche eBay Infos von wortfilter.de
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsDiff: 1
StoryURL: http://www.wortfilter.de/News/news\d+.html
StoryStart: <h1>
StoryEnd: alt="Vorherige Meldung"
# remove center commands
StoryPostProcess: {
  s/align="center"//gi;
}
#EOF

# de_yahoo_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 24.09.03 10:19
URL: http://de.sports.yahoo.com/foot/germ/t/dort
Name: Yahoo BvB
Levels: 2
Description: Yahoo Bvb News
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.*.html
ContentsStart: >Verein</a>
ContentsEnd: Durchsuchen Sie das Archiv
ContentsDiff: 1
StoryStart: Tageshöhepunkte
StoryEnd: >Diskutieren Sie über Fußball</a>
# remove small font commands
StoryPostProcess: {
  s/<size=-?\+?\d>//gi;
  s/<center>//gi;
}
#EOF

# de_yahoo_golf.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.09.04
URL: http://de.sports.yahoo.com/go/
Name: Yahoo Golf
Levels: 2
Description: Yahoo Golf News
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.+\.html
ContentsStart: Mehr News</div>
ContentsEnd: Mehr News...</a>
ContentsDiff: 1
StoryStart: <h3>
StoryEnd: Per Mail senden</a>
# remove small font commands
StoryPostProcess: {
  s/<size=-?\+?\d>//gi;
  s/<center>//gi;
}
#EOF

# de_yahoo_hot.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 18.1.05
URL: http://de.fc.yahoo.com/s/sexualitaet.html
Name: Yahoo hot
Levels: 2
Description: Yahoo Sexualität
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.news.yahoo.com/\d+/\d+/.*.html
ContentsStart: <b> Top Meldung</b>
ContentsEnd: Meistverschickte Artikel und Fotos
ContentsDiff: 1
StoryStart: <font face=verdana size=-2>
StoryEnd: Mein Yahoo</a>
# remove small font commands
StoryPostProcess: {
  s/<font face=.+>//gi;
  s/<center>//gi;
}
#EOF
</code>
|
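A side note on the `StoryURL` patterns in these files: some escape the dot before `html` (`.+\.html`) and some do not (`news\d+.html`). In a regular expression an unescaped `.` matches any character, so the loose form accepts slightly more than intended. The Python sketch below illustrates the difference with the wortfilter pattern; the second URL is made up, and whether sitescooper anchors the pattern to the whole URL is an assumption here.

```python
import re

# Pattern as posted in de_wortfilter.site (dots unescaped).
loose = re.compile(r"http://www.wortfilter.de/News/news\d+.html")
# Same pattern with every literal dot escaped.
strict = re.compile(r"http://www\.wortfilter\.de/News/news\d+\.html")

good = "http://www.wortfilter.de/News/news123.html"
odd = "http://www.wortfilter.de/News/news123xhtml"  # hypothetical URL, no real dot

for url in (good, odd):
    print(url, bool(loose.fullmatch(url)), bool(strict.fullmatch(url)))
```

In practice the loose form rarely causes trouble, since such look-alike URLs are unlikely to exist on the site, but the escaped form states the intent more precisely.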
11-05-2005, 06:42 PM | #8 |
Connoisseur
Posts: 62
Karma: 72
Join Date: Oct 2002
Location: Germany
Device: nook
|
Hi all,
my updated site-files:
de_handhirn.site
de_heisec.site
de_heise_tp.site (new: with pictures included)
de_plockmag.site
--------------snipp--------------
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 29.09.05 16:42
URL: http://www.handhirn.de/pda/news.php
Name: Handhirn news
Description: German Palm Mobil-news
Levels: 2
# EOF

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 1.0, 4.11.05
# from rss back to title-page, because of pictures
URL: http://www.heise.de/tp/
Name: Heise Telepolis
Levels: 2
ContentsStart: <!--OAS AD="Left2"-->
ContentsEnd: <!--OAS AD="Bottom"-->
ContentsDiff: 1
StoryURL: http://www.heise.de/(tp/.*/\d+/\d.html|bin/tp/issue/r4/dl-artikel.*)
StoryCacheable: 1
StoryLifetime: 2
# StoryToPrintableSub: s,/tp/.*/(\d+)/\d\.html,/bin/tp/issue/r4/dl-artikel2.cgi?artikelnr=\1&mode=print,
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
ImageURL: .*\.gif
ImageURL: .*\.jpg
StoryPostProcess: {
  s/<font size="\+1">([^<]+)<\/font>/<b>$1<\/b>/sgi;
  s/<font size="\+2" ?>([^<]+)<\/font><br>/<h2>$1<\/h2>/sgi;
}
#eof

# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 23.9.05
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <h3>Top-Themen</h3>
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(meldung|artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,
#EOF

# de_plockmag.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 31.03.05
URL: http://www.plockmag.de/
Name: Plockmag
Description: Plock! German Golf Magazin
Levels: 2
ContentsStart: <!-- END PARTNER PROGRAM -->
# ContentsEnd:
StoryURL: http://www.plockmag.de/(news|hintergrund|kolumnen)/.+\.html
StoryStart: <!-- END PARTNER PROGRAM -->
# StoryEnd:
StoryCacheable: 1
ImageURL: http://www.plockmag.de/img/.+\.jpg
#EOF
-S.
|
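The `StoryPostProcess` block in the Telepolis file above turns the site's `<font size="+1">` and `<font size="+2">` markup into `<b>` and `<h2>` tags. As a rough illustration of what those two Perl substitutions do, here is an equivalent in Python; the sample HTML and the function name `post_process` are invented for the example and do not come from the real site.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # counterparts of Perl's /i and /s; /g is re.sub's default

def post_process(html):
    """Apply the two font-tag rewrites from the Telepolis StoryPostProcess block."""
    # <font size="+1">text</font>  ->  <b>text</b>
    html = re.sub(r'<font size="\+1">([^<]+)</font>', r"<b>\1</b>", html, flags=FLAGS)
    # <font size="+2">text</font><br>  ->  <h2>text</h2>
    html = re.sub(r'<font size="\+2" ?>([^<]+)</font><br>', r"<h2>\1</h2>", html, flags=FLAGS)
    return html

sample = '<font size="+2">Titel</font><br><font size="+1">Vorspann</font>'  # made-up snippet
print(post_process(sample))
```

Because `[^<]+` stops at the next tag, a font element that wraps further markup is left untouched by both rewrites.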
|