Old 03-13-2004, 09:02 AM   #1
stobs
Sitescooper German scoops

Hello,

Here is my collection of German site files.
I started each file with the name of the file, like:
# de_*.site

This is the list of files:
de_aldi-nord.site
de_bild.site
de_bvb.site
de_cert.site
de_cyberkino.site
de_digitalkamera.site
de_digitv_premiere.site
de_eetimes.site
de_gazette.site
de_heise.site
de_heisec.site
de_heise_aktuell.site
de_heise_mobil.site
de_heise_tp.site
de_heise_tr_aktuell.site
de_heute.site
de_klack-channel.site
de_menshealth.site
de_mobile2day.site
de_palmfaq.site
de_pdassi_news.site
de_pdassi_software.site
de_rn_do.site
de_spiegel.site
de_spiegel_schlagzeilen.site
de_stern.site
de_sz_kultur.site
de_tagesschau.site
de_teltarif.site
de_tvspielfilm.site
de_wortfilter.site
de_yahoo_bvb.site
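
Since every file in this post starts with a `# de_*.site` comment line, the listing can be cut back into individual files mechanically. Here is a minimal Python sketch, assuming that header convention holds for every file you paste in. The function name `split_site_files` and the shortened sample text are made up for illustration and are not part of sitescooper.

```python
import re

# Header convention used in this post: a comment line such as "# de_bild.site".
HEADER = re.compile(r"^#\s*(de_[\w-]+\.site)\s*$", re.IGNORECASE)


def split_site_files(text):
    """Split a concatenated listing into {filename: file contents}."""
    files = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header line starts a new file.
            current = match.group(1)
            files[current] = []
        if current is not None:
            files[current].append(line)
    # Drop trailing blank lines and end each file with one newline.
    return {name: "\n".join(lines).strip() + "\n" for name, lines in files.items()}


# Shortened sample in the same layout as the post (not the full files).
sample = """# de_bild.site
URL: http://mobile.bild.t-online.de/index.jsp
Name: Bild.de

# de_bvb.site
Name: BvB
Levels: 2
"""

files = split_site_files(sample)
print(sorted(files))
```

Lines before the first header are ignored, so the forum text around the listing does not end up in any file.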

# de_aldi-nord.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 17.10.03
URL: http://aldi-nord.de/OFFER_D/home.htm

Description: Aldi Nord Angebote
Name: Aldi-Nord
Levels: 3
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.JPG
UseTableSmarts: 0

IssueLinksStart: <!-- Beginn des eigentlichen Fensters 800 x 600 -->
IssueLinksEnd: <!-- Ende -->

ContentsCachable: 0
StoryCachable: 0
MinPages: 1
ContentsURL: http://aldi-nord.de/OFFER_D/OFFER_\d+/AA_LISTE\.HTM

StoryURL: http://aldi-nord.de/OFFER_D/OFFER_\d+/OFF\d+\.HTM

ContentsUseTableSmarts: 0
StoryUseTableSmarts: 0
TableRender: keep

# de_bild.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 14.10.03
URL: http://mobile.bild.t-online.de/index.jsp

Name: Bild.de
Description: German Bild newspaper
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 3
ContentsDiff: 1
ImageURL: .*\.jpg
ContentsCachable: 0
StoryCachable: 1

# de_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 17.02.04
URL: http://borussia-dortmund.lycos.de/?Z%1B%E7%F4%9D

Description: Borussia Dortmund News (Soccer)
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.jpg
ContentsStart: <!-- CONTENT ANFANG -->
ContentsEnd: <!-- CONTENT ENDE -->
StoryStart: inhalt_header
StoryEnd: d_oben.gif

Name: BvB
Levels: 2

# de_cert.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 12.2.03
URL: http://cert.uni-stuttgart.de/ticker/sidebar.php

Description: German CERT Infos
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de

Name: CERT RUS
Levels: 2
StoryURL: http://cert.uni-stuttgart.de/ticker/...e.php\?mid=\d+
StoryStart: <FONT SIZE="+2">
StoryEnd: Copyright © 2003 RUS-CERT, Universität Stuttgart
ContentsDiff: 1

# remove CENTER
StoryPostProcess: {
s/center//gi;
}

# de_cyberkino.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 7.5.03
URL: http://www.cyberkino.de/entertainment/kino/monate.html

Description: German Cinema Infos
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
ImageURL: http://www.cyberkino.de/.*\.jpg

Name: Cyberkino
Levels: 2

# de_digitalkamera.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 11.9.03
URL: http://www.digitalkamera.de/Info/
Name: German Digitalcamera.de News
Description: German Digitalcamera News
Levels: 2
ContentsStart: weiter zur nächsten Seite
ContentsEnd: Diese Seite wurde redaktionell von
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalkamera.de/Info/News/\d+/\d+\.htm
ImageURL: http://images.digitalkamera.de/.*\.jpg
StoryStart: <h3>
StoryEnd: "PurpleText" preview="End-Text"
StoryCacheable: 1
StoryLifetime: 2

# de_digitv_premiere.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.8, 16.02.04 14:20
URL: http://www.digitalfernsehen.de/tv-se...ndex_1687.html
Name: Digitv premiere news
Description: German Premiere Infos
Levels: 2
ContentsStart: <!-- Linke Navigation ENDE -->
ContentsEnd: <!-- Premiere News Snippet Ende -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalfernsehen.de/news/news\_\d+\.html
StoryStart: <!-- Newsbeitrag start -->
StoryEnd: onClick="return printwindow();
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.digitalfernsehen.de/news/img/.+\.gif


# de_eetimes.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 18.02.04 10:45

URL: http://www.eetimes.de/hr
Name: EE Times.de
Description: Weltweiter Industrie-Nachrichtendienst für Elektronikingenieure
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsStart: <!-- TOP STORY -->
ContentsEnd: <!-- BLACK TOP BORDER -->
ContentsDiff: 1
StoryURL: .+/news/.+
StoryStart: <!-- TOP STORY -->
StoryEnd: </STORY>
StoryCacheable: 1

# remove javascript pseudo links and <center>
StoryPostProcess: {
s/a href=.javascript://gi;
s/<center>//gi;
}

# de_gazette.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.2.03
URL: http://gazette.de/
Name: Die Gazette
Description: German politics magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ImageURL: .*\.jpg
StoryURL: [A-Za-z]\S+\.html
ContentsStart: bordercolor="#CCCCCC"
ContentsEnd: /Archiv/Newsletter.html

StoryToPrintableSub: {
s,([A-Z].+)(\.html),\1-print\2,
s,[A-Z],[a-z],
}

StoryPostProcess: {
s/<center>//gi;
}
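
The two `StoryToPrintableSub` rules in de_gazette.site are Perl substitutions, and the second one is easy to misread. The Python sketch below translates both rules literally to show what they do to a URL. The example URL is invented, and how sitescooper itself evaluates the block is not shown here, so treat this as an illustration of the regex behaviour only.

```python
import re


def to_printable(url):
    # Rule 1: insert "-print" before the final ".html" of the story URL.
    url = re.sub(r"([A-Z].+)(\.html)", r"\1-print\2", url, count=1)
    # Rule 2, translated literally: the replacement side of s/// is plain text,
    # so the first capital letter is replaced by the five characters "[a-z]".
    url = re.sub(r"[A-Z]", "[a-z]", url, count=1)
    return url


example = "http://gazette.de/Archiv/Gazette-Januar.html"  # hypothetical story URL
print(to_printable(example))
```

If the second rule was meant to lowercase the URL (an assumption on my part), Perl's `tr/A-Z/a-z/` would be the usual way to write it.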

# de_heise.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.9, 16.2.03
# this version is with pictures.
URL: http://www.heise.de/
Name: Heise Newsticker
Description: German Heise IT-news
Levels: 2
ContentsStart: </HEISETEXT>
ContentsEnd: <!-- MITTE (NEWS-UEBERBLICK) -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/newsticker/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </body>
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.heise.de/bilder/.+
# StoryHeadline: <HEISETEXT>\s+<b>(.+)</b>

StoryToPrintableSub: s,/newsticker/meldung/(\d+),/newsticker/meldung/print/\1,
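
To see what the `StoryToPrintableSub` rule above does, here is a small Python sketch of the same substitution. It also checks the rewritten URL against the `StoryURL` pattern, which is presumably why that pattern lists both `meldung/print` and `meldung`. The article number is made up, and the idea that sitescooper matches the rewritten URL against `StoryURL` is my assumption.

```python
import re

# StoryURL pattern copied from de_heise.site.
STORY_URL = re.compile(r"http://www.heise.de/newsticker/(meldung/print|meldung)/\d+")


def heise_to_printable(url):
    # Same rewrite as the site file: /meldung/<id> becomes /meldung/print/<id>.
    return re.sub(r"/newsticker/meldung/(\d+)", r"/newsticker/meldung/print/\1", url, count=1)


url = "http://www.heise.de/newsticker/meldung/12345"  # made-up article number
printable = heise_to_printable(url)
print(printable)
print(bool(STORY_URL.fullmatch(url)), bool(STORY_URL.fullmatch(printable)))
```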

# de_heisec.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 17.02.04 11:32
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <!-- Titel -->
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg

StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,

# de_heise_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 16.02.04 10:38
URL: http://www.heise.de/ct/aktuell/
Name: Heise Aktuell
Description: German Heise Mobil-news
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
# StoryURL: http://www.heise.de/ct/aktuell/meldung/\d+
StoryURL: http://www.heise.de/ct/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*

StoryToPrintableSub: s,/ct/aktuell/meldung/(\d+),/ct/aktuell/meldung/print/\1,

# de_heise_mobil.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 23.5.03
URL: http://heise.de/mobil/
Name: Heise Mobil
Description: German Heise Mobil-news
Levels: 2
# Ignore the ticker:
ContentsStart: &nbsp;Themen
ContentsEnd: <!-- MITTE+RECHTS -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://heise.de/mobil/.*/
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg

# remove small font commands
StoryPostProcess: {
s/<font size=1>//gi;
}

# de_heise_tp.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler and Carsten Clasohm, Version 0.6, 20.11.03 09:38
# this version is with pictures.

URL: http://www.heise.de/tp/pdanews/default.html
Name: Heise Telepolis
Levels: 2
ContentsDiff: 1
ImageURL: .*\.gif
ImageURL: .*\.jpg


# de_heise_tr_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 16.02.04 10:37
URL: http://www.heise.de/tr/aktuell/
Name: Heise TR Aktuell
Description: German Heise Technology Review
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/tr/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*

StoryToPrintableSub: s,/tr/aktuell/meldung/(\d+),/tr/aktuell/meldung/print/\1,

# de_heute.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 4.4.04
URL: http://www.heute.t-online.de/ZDFheut...HOME-4,00.html

Name: heute
Description: German "heute" news
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
StoryURL: http://www.heute.t-online.de/ZDFheut...ticle/.+\.html
ImageURL: .*\.(gif|jpg).*
ContentsCachable: 1
TableRender: list
#SizeLimit: 1000

# remove table commands
# StoryPostProcess: {
# s/<table.+>//gi;
#}

# de_klack-channel.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 17.2.03
URL: http://www.klack-channel.de/channelTipps.php3?DAY=[[YYYY]][[MM]][[DD]]&USER=

Description: German TV Tipps
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de

Name: Klack Tagestipps
Levels: 1

# remove CENTER
StoryPostProcess: {
s/<?center>?//gi;
}
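
The `[[YYYY]][[MM]][[DD]]` markers in the de_klack-channel.site URL are date placeholders that are filled in with the date of the run. A minimal Python sketch of that expansion follows, assuming a four-digit year and zero-padded month and day. The helper name `expand_date` is made up, and sitescooper's own expansion code is not shown here.

```python
import datetime

# URL copied from de_klack-channel.site.
URL_TEMPLATE = "http://www.klack-channel.de/channelTipps.php3?DAY=[[YYYY]][[MM]][[DD]]&USER="


def expand_date(template, day):
    """Replace the date markers in a URL template with the given date."""
    replacements = {
        "[[YYYY]]": f"{day.year:04d}",
        "[[MM]]": f"{day.month:02d}",
        "[[DD]]": f"{day.day:02d}",
    }
    for marker, value in replacements.items():
        template = template.replace(marker, value)
    return template


print(expand_date(URL_TEMPLATE, datetime.date(2004, 3, 13)))
```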

# de_menshealth.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 24.09.03 10:19
URL: http://www.menshealth.de/avantgo/

Description: German Men Magazin
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de

Name: Menshealth
Levels: 3
ContentsDiff: 1
ImageURL: .*\.jpg
ContentsStart: HOME</a>
ContentsURL: http://www.menshealth.de/sixcms/deta...=d_mh_av_home_..
StoryURL: http://www.menshealth.de/.*/\d+/d_mh_av_detail

# remove small font commands
StoryPostProcess: {
s/<font size=\"?\+?\d\"?>//gi;
}

# de_mobile2day.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 14.10.03
URL: http://www.mobile2day.de/pdanews_all...ext=&isLimit=1
Name: mobile2day
Description: German PDA-News
AuthorName: Stefan /at/ Schwingeler.de
ContentsDiff: 1
Levels: 2
StoryPostProcess: {
s/<CENTER>//gi;
s/size=\"?\d\"?//gi;
}

# de_palmfaq.site
URL: http://palmfaq.de
Name: PalmFAQ.de
Levels: 2
ContentsDiff: 1
StoryCacheable: 1

# de_pdassi_news.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/news1.php
Name: pdassi News
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://pdassi.de/images/.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
s/<small>//gi;
}

# de_pdassi_software.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/wcf/newuploads.php
AddURL: http://pdassi.de/wcf/newupdates.php
AddURL: http://pdassi.de/wcf/newprc.php
Name: pdassi Software
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://.*/util/screenshot.php\?pid=\d+.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
s/align="center"//gi;
s/<small>//gi;
}


# de_rn_do.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 12.06.2003
URL: http://www.westline.de/lokal/main.php?link=do%2F%2Fln

Name: RN Do
Levels: 2
ContentsDiff: 1
ImageURL: .*\.gif

ContentsStart: <!--beginn hauptteil-->
ContentsEnd: <!--ende hauptteil-->
StoryURL: http://www.westline.de/lokal/mono.php.*
StoryStart: <!--beginn hauptteil-->
StoryEnd: <!--ende hauptteil-->

# de_Spiegel.site
# This is a sitescooper site file. see http://sitescooper.cx/
# by Stefan Schwingeler, Version 0.6, 6.2.03
# History:
# "fixed" by by L****n Wulff, L****n@multimediaconnection.de
# rewritten with new PDA-link (no pics) by Stefan Schwingeler

URL: http://www.spiegel.de/dertag/pda/ava...r140=1,00.html

Name: Der Spiegel
Description: German news magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de

Levels: 2
StoryURL: http://www.spiegel.de/dertag/pda/ava...tikel/.*\.html

# de_spiegel_schlagzeilen.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 18.2.03

URL: http://www.spiegel.de/schlagzeilen/

Name: Der Spiegel Schlagzeilen
Description: German news magazine
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de

Levels: 2
ContentsStart: mark a:visited
ContentsEnd: OAS_RICH('Right');
StoryURL: http://www.spiegel.de/.+/\d+,\d+,\d+,\d+\.html
StoryURL: http://www.spiegel.de/.+/\d+,\d+,druck-\d+,\d+\.html
ImageURL: http://www.spiegel.de/img/\d+,\d+,\d+,\d+\.jpg

StoryToPrintableSub: s:^(\S+/?\S+/0,\d+,)(\d+,\d+\.html):\1druck-\2:
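
The `StoryToPrintableSub` rule for the Spiegel headlines is harder to read than the Heise ones, so here is a Python sketch of the same substitution. It also shows why there are two `StoryURL` lines: the first matches the normal story URL, the second the `druck-` (print) form. The example URL is invented to fit the patterns.

```python
import re

# Patterns copied from de_spiegel_schlagzeilen.site.
PRINTABLE_SUB = re.compile(r"^(\S+/?\S+/0,\d+,)(\d+,\d+\.html)")
STORY_URLS = [
    re.compile(r"http://www.spiegel.de/.+/\d+,\d+,\d+,\d+\.html"),
    re.compile(r"http://www.spiegel.de/.+/\d+,\d+,druck-\d+,\d+\.html"),
]


def spiegel_to_printable(url):
    # Insert "druck-" in front of the third number group of the URL.
    return PRINTABLE_SUB.sub(r"\1druck-\2", url, count=1)


url = "http://www.spiegel.de/politik/0,1518,123456,00.html"  # made-up story URL
printable = spiegel_to_printable(url)
print(printable)
```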

# de_stern.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 5.3.04
URL: http://www.stern.de/pda/
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=politik
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=wirtschaft
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=sport
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=kultur
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=computer
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=campus
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=wissenschaft
#AddURL: http://www.stern.de/pda/?pda=1&rubrik=lifestyle

Name: Stern
Levels: 2
ContentsDiff: 1
ImageURL: .*\.jpg

# ContentsURL: http://www.stern.de/pda/\?pda=1\&rubrik=.*
ContentsStart: <strong>Lifestyle</strong>
ContentsEnd: <!-- FOOTER START -->
StoryURL: http://www.stern.de/.*/index.html\?id=\d+\&pda=1

StoryStart: Beginn des Artikels
StoryEnd: <!-- FOOTER START -->

# StoryHeadline: <div id="artikelKopf1">(.*?)</div>
StoryHeadline: <h1>(.*?)</h1>

# remove bigfont in <h2>
StoryPostProcess: {
s/<h2>//gi;
}

# de_sz_kultur.site
# This is a sitescooper site file. see http://sitescooper.org/
# Stefan Schwingeler 2.2.04
URL: http://www.sueddeutsche.de/kultur/ticker/
Name: SZ Kultur
Description: Ressort Münchner Kultur der Süddeutschen Zeitung
Levels: 2
ContentsStart: <!-- beginn content -->
ContentsEnd: <!--ende weiterethemen-->
StoryURL: http://www.sueddeutsche.de/kultur/artikel/.+
StoryStart: <!-- beginn content -->
StoryEnd: <!-- ende content -->
ImageURL: .*\.jpg

# de_tagesschau.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.2.03
URL: http://www.tagesschau.de/mobileTS

Name: Tagesschau Mobil
Description: German news show
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de


Levels: 3
ImageURL: .*/image/.*\.jpg
SizeLimit: 1000
Level2Cachable: 0
Level3Cachable: 0
Level4Cachable: 0
ContentsCachable: 0

# de_teltarif.site
# This is a sitescooper site file. see http://sitescooper.tsx.org/
# by Stefan Schwingeler, Version 0.3, 24.02.04 11:12
URL: http://www.teltarif.de/arch/woche.html
Name: Teltarif
Levels: 2
ContentsDiff: 1
ContentsStart: <!-- Add Ad End -->
StoryURL: http://www.teltarif.de/arch/\d\d\d\d/kw\d+/s\d+\.html
ImageURL: http://www.teltarif.de/arch/\d\d\d\d/kw\d+/.+\.jpg
StoryStart: <!-- Add Ad End -->
StoryEnd: Ihre Meinungen und Erfahrungen

# de_tvspielfilm.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler and Carsten Clasohm, Version 1.1, 28.11.03 11:09
# modified by Stefan Schwingeler 25.04.01 11:26: UseTableSmarts: 0

URL: http://www.tomorrow-newmedia.de/mobi.../tvs/tipps.php
Name: TV-Spielfilm
Levels: 2
ContentsDiff: 0
StoryCachable: 0
StoryURL: http://www.tomorrow-newmedia.de/mobi...tgo/tvs/gen/.*
ImageURL: .+\.gif
StoryUseTableSmarts: 0

# de_wortfilter.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 18.02.04
URL: http://www.wortfilter.de/news.html

Name: wortfilter.de
Description: Deutsche eBay Infos von wortfilter.de
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsDiff: 1
StoryURL: http://www.wortfilter.de/News/news\d+\.html
StoryStart: <h1>
StoryEnd: alt="Vorherige Meldung"

# remove center commands
StoryPostProcess: {
s/align="center"//gi;
}

# de_yahoo_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 24.09.03 10:19
URL: http://de.sports.yahoo.com/foot/germ/t/dort
Name: Yahoo BvB
Levels: 2
Description: Yahoo Bvb News
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.*\.html
ContentsStart: >Verein</a>
ContentsEnd: Durchsuchen Sie das Archiv
ContentsDiff: 1
StoryStart: Tageshöhepunkte
StoryEnd: >Diskutieren Sie über Fußball</a>

# remove small font commands
StoryPostProcess: {
s/<size=-?\+?\d>//gi;
s/<center>//gi;
}

#eof
Old 03-13-2004, 10:00 AM   #2
ignatz
stobs, thanks for posting. It was some of your early posts that helped me figure out how to get started with sitescooper. I hope we can get a good discussion going here...
Old 03-13-2004, 03:06 PM   #3
stobs
That's great to hear. Perhaps we can find Justin on this board soon.

I can't seem to reach him by email; perhaps his SpamAssassin is too good.

-S.

Quote:
Originally Posted by ignatz
stobs, thanks for posting. It was some of your early posts that helped me figure out how to get started with sitescooper. I hope we can get a good discussion going here...
Old 03-17-2004, 05:36 AM   #4
stobs
heise.de updated their style, so I updated de_heise.site.

# de_heise.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 1.0, 16.3.03
# this version is with pictures.
URL: http://www.heise.de/
Name: Heise Newsticker
Description: German Heise IT-news
Levels: 2
ContentsStart: <!-- Liste der Meldungen -->
ContentsEnd: <!-- &Uuml;berblick -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/newsticker/(meldung/print|meldung)/\d+
StoryStart: <!-- Meldung -->
StoryEnd: <!-- untere News-Navigation -->
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.heise.de/bilder/.+
# StoryHeadline: <HEISETEXT>\s+<b>(.+)</b>

StoryToPrintableSub: s,/newsticker/meldung/(\d+),/newsticker/meldung/print/\1,


Quote:
Originally Posted by stobs
Here is my collection of German site files.
I started each file with the name of the file, like:
# de_*.site
Old 03-17-2004, 10:58 AM   #5
TadW
Stefan,

thanks for sharing your scoops. I can definitely use them as templates for my own scoops, which, of course, I will also make public.
Old 03-17-2004, 01:49 PM   #6
stobs
Hi TadW,

They have been published for ages:
http://sitescooper.org/dist/site_sam...ional_germany/

My only problem is getting them updated by jmason (he has a very effective spam filter).

So I am posting them here. It would be best if they were updated at sitescooper.org rather than splitting the collection of site files; this thread is only useful for collecting input and then sending it to jmason (or checking it in with CVS directly).

Quote:
Originally Posted by TadW
thanks for sharing your scoops. I can definitely use them as templates for my own scoops, which, of course, I will also make public.
Old 02-15-2005, 04:57 AM   #7
stobs
I updated some German scoops; they're attached as a zip file as well.
de_bvb.site
de_digitalkamera.site
de_digitv_premiere.site
de_gms_reise.site
de_heisec.site
de_heise_aktuell.site
de_heise_mobil.site
de_heise_tp.site
de_heise_tr_aktuell.site
de_mobile2day.site
de_pdassi_news.site
de_pdassi_software.site
de_plockmag.site
de_wortfilter.site
de_yahoo_bvb.site
de_yahoo_golf.site
de_yahoo_hot.site

-stobs.

<code>
# de_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 26.11.04
URL: http://borussia-dortmund.lycos.de/?Z%1B%E7%F4%9D

Description: Borussia Dortmund News (Soccer)
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ImageURL: .*\.jpg
ContentsStart: <!-- CONTENT ANFANG -->
ContentsEnd: <!-- CONTENT ENDE -->
StoryStart: inhalt_header
StoryEnd: d_oben.gif

Name: BvB
Levels: 2
#EOF

# de_digitalkamera.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 11.9.03
URL: http://www.digitalkamera.de/Info/
Name: German Digitalcamera.de News
Description: German Digitalcamera News
Levels: 2
ContentsStart: weiter zur nächsten Seite
ContentsEnd: Diese Seite wurde redaktionell von
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalkamera.de/Info/News/\d+/\d+\.htm
ImageURL: http://images.digitalkamera.de/.*\.jpg
StoryStart: <h3>
StoryEnd: "PurpleText" preview="End-Text"
StoryCacheable: 1
StoryLifetime: 2
#EOF

# de_digitv_premiere.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.8, 16.02.04 14:20
URL: http://www.digitalfernsehen.de/tv-se...ndex_1687.html
Name: Digitv premiere news
Description: German Premiere Infos
Levels: 2
ContentsStart: <!-- Linke Navigation ENDE -->
ContentsEnd: <!-- Premiere News Snippet Ende -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.digitalfernsehen.de/news/news\_\d+\.html
StoryStart: <!-- Newsbeitrag start -->
StoryEnd: onClick="return printwindow();
StoryCacheable: 1
StoryLifetime: 2
ImageURL: http://www.digitalfernsehen.de/news/img/.+\.gif
#EOF

# de_gms_reise.site
# by geoffreynz, see https://www.mobileread.com/forums/showthread.php?t=2020
# Version 0.1
URL: http://www.ikz-online.de/ikz/ikz.rei...ueberblick.php
Levels: 2
Name: gms Reise
ContentsStart: header.berichte2.gif
ContentsEnd: <!-- Ende - Z_2sp_dpa_Uebers_Fortl_SQL -->
StoryURL: http://www.ikz-online.de/.*
StoryStart: <!-- Ende - Z_2sp_Multicom_Lang_SQL -->
StoryEnd: <span class="contentfliess">
ImageURL: http://www.ikz-online.de/includes/bi....php?.*.nitf.*
#EOF

# de_heisec.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 17.02.04 11:32
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <!-- Titel -->
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,
#EOF

# de_heise_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.5, 06.07.04 10:08
URL: http://www.heise.de/ct/aktuell/
Name: Heise Aktuell
Description: German Heise Mobil-news
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/ct/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
# pix missing ?!
ImageURL: http://www.heise.de/bilder/.*
StoryToPrintableSub: s,/ct/aktuell/meldung/(\d+),/ct/aktuell/meldung/print/\1,
# EOF

# de_heise_mobil.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 14.2.05
URL: http://www.heise.de/mobil/
Name: Heise Mobil
Description: German Heise Mobil-news
Levels: 2
# Ignore the ticker:
ContentsStart: zur News-\&Uuml;bersicht
ContentsEnd: <!-- MITTE+RECHTS -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/mobil/artikel/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.+

# remove small font commands
StoryPostProcess: {
s/<font size=1>//gi;
}
#EOF

# de_heise_tp.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.7, 23.11.04 09:38
# this version is with pictures.

URL: http://www.heise.de/tp/news-xl.rdf
Name: Telepolis News
Description: Telepolis News
ContentsFormat: rss

StoryURL: http://www.telepolis.de/.+/artikel/.+\.html

StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
ImageURL: .*\.gif
ImageURL: .*\.jpg

# (This is a sitescooper site file. see http://sitescooper.org/
# It was generated from the site's RSS by rss-to-site.pl 1.1.)
#EOF

# de_heise_tr_aktuell.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 16.02.04 10:37
URL: http://www.heise.de/tr/aktuell/
Name: Heise TR Aktuell
Description: German Heise Technology Review
Levels: 2
ContentsStart: <HEISETEXT>
ContentsEnd: </HEISETEXT>
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/tr/aktuell/(meldung/print|meldung)/\d+
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
StoryCacheable: 1
ImageURL: http://www.heise.de/bilder/.*

StoryToPrintableSub: s,/tr/aktuell/meldung/(\d+),/tr/aktuell/meldung/print/\1,
#EOF

# de_mobile2day.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 14.10.03
URL: http://www.mobile2day.de/pdanews_all...ext=&isLimit=1
Name: mobile2day
Description: German PDA-News
AuthorName: Stefan /at/ Schwingeler.de
ContentsDiff: 1
Levels: 2
StoryPostProcess: {
s/<CENTER>//gi;
s/size=\"?\d\"?//gi;
}
#EOF

# de_pdassi_news.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/news1.php
Name: pdassi News
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://pdassi.de/images/.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
s/<small>//gi;
}
#EOF

# de_pdassi_software.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 5.3.03
URL: http://pdassi.de/wcf/newuploads.php
AddURL: http://pdassi.de/wcf/newupdates.php
AddURL: http://pdassi.de/wcf/newprc.php
Name: pdassi Software
Description: German Palm site
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
ContentsDiff: 1
Levels: 2
ImageURL: http://.*/util/screenshot.php\?pid=\d+.*
StoryToPrintableSub: s/SID=[a-z0-9]+/SID=1/
StoryPostProcess: {
s/align="center"//gi;
s/<small>//gi;
}
#EOF

# de_plockmag.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 12.09.04 15:40
URL: http://www.plockmag.de/
Name: Plock! German Golf Magazin
Description: German Golf Magazin
Levels: 2
ContentsStart: <!-- END PARTNER PROGRAM -->
# ContentsEnd:
StoryURL: http://www.plockmag.de/(news|hintergrund|kolumnen)/.+\.html
StoryStart: <!-- END PARTNER PROGRAM -->
# StoryEnd:
StoryCacheable: 1
ImageURL: http://www.plockmag.de/img/.+\.jpg
#EOF

# de_wortfilter.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 18.02.04
URL: http://www.wortfilter.de/news.html
Name: wortfilter.de
Description: Deutsche eBay Infos von wortfilter.de
AuthorName: Stefan Schwingeler
AuthorEMail: stobs /at/ web . de
Levels: 2
ContentsDiff: 1
StoryURL: http://www.wortfilter.de/News/news\d+\.html
StoryStart: <h1>
StoryEnd: alt="Vorherige Meldung"
# remove center commands
StoryPostProcess: {
s/align="center"//gi;
}
#EOF

# de_yahoo_bvb.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 24.09.03 10:19
URL: http://de.sports.yahoo.com/foot/germ/t/dort
Name: Yahoo BvB
Levels: 2
Description: Yahoo Bvb News
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.*\.html
ContentsStart: >Verein</a>
ContentsEnd: Durchsuchen Sie das Archiv
ContentsDiff: 1
StoryStart: Tageshöhepunkte
StoryEnd: >Diskutieren Sie über Fußball</a>

# remove small font commands
StoryPostProcess: {
s/<size=-?\+?\d>//gi;
s/<center>//gi;
}
#EOF

# de_yahoo_golf.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 6.09.04
URL: http://de.sports.yahoo.com/go/
Name: Yahoo Golf
Levels: 2
Description: Yahoo Golf News
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.sports.yahoo.com/\d+/\d+/.+\.html
ContentsStart: Mehr News</div>
ContentsEnd: Mehr News...</a>
ContentsDiff: 1
StoryStart: <h3>
StoryEnd: Per Mail senden</a>

# remove small font commands
StoryPostProcess: {
s/<size=-?\+?\d>//gi;
s/<center>//gi;
}
#EOF

# de_yahoo_hot.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.3, 18.1.05
URL: http://de.fc.yahoo.com/s/sexualitaet.html
Name: Yahoo hot
Levels: 2
Description: Yahoo Sexualität
AuthorName: Stefan Schwingeler
ContentsDiff: 1
ImageURL: .*\.jpg
StoryURL: http://de.news.yahoo.com/\d+/\d+/.*\.html
ContentsStart: <b>&nbsp;Top Meldung</b>
ContentsEnd: Meistverschickte Artikel und Fotos
ContentsDiff: 1
StoryStart: <font face=verdana size=-2>
StoryEnd: Mein Yahoo</a>

# remove small font commands
StoryPostProcess: {
s/<font face=.+>//gi;
s/<center>//gi;
}
#EOF
</code>
Attached Files
File Type: zip sites.zip (7.6 KB, 1389 views)
Old 11-05-2005, 06:42 PM   #8
stobs
Hi all,

my updated site-files:

de_handhirn.site
de_heisec.site
de_heise_tp.site (new: with pictures included)
de_plockmag.site

--------------snipp--------------
# de_handhirn.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.1, 29.09.05 16:42
URL: http://www.handhirn.de/pda/news.php
Name: Handhirn news
Description: German Palm Mobil-news
Levels: 2
# EOF

# de_heise_tp.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 1.0, 4.11.05
# from rss back to title-page, because of pictures

URL: http://www.heise.de/tp/
Name: Heise Telepolis
Levels: 2
ContentsStart: <!--OAS AD="Left2"-->
ContentsEnd: <!--OAS AD="Bottom"-->
ContentsDiff: 1
StoryURL: http://www.heise.de/(tp/.*/\d+/\d.html|bin/tp/issue/r4/dl-artikel.*)
StoryCacheable: 1
StoryLifetime: 2
# StoryToPrintableSub: s,/tp/.*/(\d+)/\d\.html,/bin/tp/issue/r4/dl-artikel2.cgi?artikelnr=\1&mode=print,
StoryStart: <HEISETEXT>
StoryEnd: </HEISETEXT>
ImageURL: .*\.gif
ImageURL: .*\.jpg
StoryPostProcess: {
s/<font size="\+1">([^<]+)<\/font>/<b>$1<\/b>/sgi;
s/<font size="\+2" ?>([^<]+)<\/font><br>/<h2>$1<\/h2>/sgi;
}
#eof
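
The `StoryPostProcess` block in the Telepolis file turns `<font>` size markup into structural tags. Below is a Python sketch of the same two substitutions so the effect can be seen on a small input. The sample HTML is invented, and the Python flags `re.I | re.S` stand in for Perl's `/i` and `/s` modifiers (`/g` is the default behaviour of `re.sub`).

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL


def post_process(html):
    # <font size="+1">text</font>  ->  <b>text</b>
    html = re.sub(r'<font size="\+1">([^<]+)</font>', r"<b>\1</b>", html, flags=FLAGS)
    # <font size="+2">text</font><br>  ->  <h2>text</h2>
    html = re.sub(r'<font size="\+2" ?>([^<]+)</font><br>', r"<h2>\1</h2>", html, flags=FLAGS)
    return html


sample = '<font size="+2">Headline</font><br><font size="+1">Teaser text</font>'  # invented markup
print(post_process(sample))
```

Because the captured group is `[^<]+`, a font element that contains nested tags is left untouched by both rules.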

# de_heisec.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.4, 23.9.05
URL: http://www.heise.de/security/
Name: Heise Security
Description: German Heise Security news
Levels: 2
ContentsStart: <h3>Top-Themen</h3>
# ContentsEnd: <!-- Kaesten -->
ContentsCachable: 0
ContentsDiff: 1
StoryURL: http://www.heise.de/security/(meldung|artikel/print|artikel)/\d+
StoryStart: <HEISETEXT>
StoryEnd: <!-- news-steuerung anfang -->
StoryCacheable: 1
ImageURL: http://heise.de/mobil/artikel/.*/aufmacher\.jpg
StoryToPrintableSub: s,/security/artikel/(\d+),/security/artikel/print/\1,
#EOF


# de_plockmag.site
# This is a sitescooper site file. see http://sitescooper.org/
# by Stefan Schwingeler, Version 0.2, 31.03.05
URL: http://www.plockmag.de/
Name: Plockmag
Description: Plock! German Golf Magazin
Levels: 2
ContentsStart: <!-- END PARTNER PROGRAM -->
# ContentsEnd:
StoryURL: http://www.plockmag.de/(news|hintergrund|kolumnen)/.+\.html
StoryStart: <!-- END PARTNER PROGRAM -->
# StoryEnd:
StoryCacheable: 1
ImageURL: http://www.plockmag.de/img/.+\.jpg
#EOF


-S.