07-26-2006, 06:06 AM | #1
Plucker: Help needed with spidering
Hi,
I have been having trouble spidering the following website:
http://southseas.nla.gov.au/refs/falc/contents.html

At first I had several filters on, and the spider couldn't get past the first page. For testing purposes, I removed all the filters and set the maximum depth to 2, but it still could not get past the first page. Here is the progress text:

---------------------------------------------------------------------
Initializing Plucker spidering engine...
-----------------------------------------------------------
Updating channel: falconer...
-----------------------------------------------------------
Pluckerdir is 'C:\Program Files\Plucker'...
Using proxy '' with authentication for user ''...
ZLib compression turned on
Using exclusion list C:\Program Files\Plucker\exclusionlist.txt
Using exclusion list C:\Program Files\Plucker\exclusionlist.txt
---- 0 collected, 1 to do ----
Processing http://southseas.nla.gov.au/refs/falc/contents.html...
Retrieved ok.
Parsed ok.
---- all 1 pages retrieved and parsed ----
Writing out collected data...
Writing document 'falconer' to file C:\Program Files\Plucker\channels/falconer/falconer.pdb
Converting http://southseas.nla.gov.au/refs/falc/contents.html...
Converted 2: http://southseas.nla.gov.au/refs/falc/contents.html
Default charset is MIBenum 2252 (windows-1252)
New document <PluckerIndexDocument 'plucker:/~special~/index' at 9611924> added
Converted 1: plucker:/~special~/index
New document <PluckerMetadataDocument 'plucker:/~special~/metadata' at 9568372> added
Converted 5: plucker:/~special~/metadata
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= http://southseas.nla.gov.au/refs/falc/contents.html
Wrote 5 <= plucker:/~special~/metadata
Unknown items encountered:
  </tbody>: ['http://southseas.nla.gov.au/refs/falc/contents.html']
  <tbody>: ['http://southseas.nla.gov.au/refs/falc/contents.html']
Done!
Installing channel output to destinations...
Setting new due date...
Tasks completed for all channels.
---------------------------------------------------------------------

If anyone could point out what I have been doing wrong, I'd be much obliged.

UPD: Well, I have succeeded in spidering the site after downloading Sunrise XP, with the minor setback that Sunrise turned out to be a sneaky son of a bitch: its regexp filters defaulted to "exclude", so I spent an hour trying to download the entire internet (I got about 18% done, according to the progress bar). So that problem ceased to be, but another one arose before me: thread removal, which I, sadly, failed to solve.

Last edited by goybert; 07-26-2006 at 07:44 AM.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
plucker to something | eksor | Other formats | 2 | 09-18-2009 04:52 AM |
plucker help | richasta | Reading and Management | 1 | 05-04-2008 10:10 PM |
Plucker V1.6.2 out | TadW | Reading and Management | 2 | 01-11-2004 03:02 PM |
Plucker V1.6.1 out! | Alexander Turcic | Reading and Management | 0 | 11-09-2003 12:14 PM |