04-14-2010, 11:25 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: android
|
ISBN scraping out of PDF
Hi, all.
I'm wondering if any work has been done on scraping the ISBN out of PDF content. I'm trying to get a big ebook collection into calibre, but setting the ISBN by hand, one book at a time, would take more than a lifetime... so I thought of scraping the ISBN from the PDF content with a regex. I started to write a plugin for calibre, but then thought of asking here whether something like this has already been done or talked about. Anyone? |
04-14-2010, 11:32 PM | #2 |
creator of calibre
Posts: 44,380
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There isn't any existing work, but basically all you need to do is the following:
Modify the PDF metadata reading code (in calibre.ebooks.metadata.pdf).
calibre contains a nice library for PDF reflow that converts PDF to XML.
Use that and then search for the ISBN in the XML.
Basically:
Code:
with CurrentDir(temp_dir):
    pdfreflow.reflow(stream.read()) |
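The "search for the ISBN" step suggested above could look roughly like the sketch below. It is not calibre code: the names `ISBN_RE`, `is_valid_isbn10`, `is_valid_isbn13` and `find_isbns` are hypothetical, and it assumes the reflowed XML (or any other extracted text) is already available as a plain string. It only considers numbers preceded by the literal "ISBN", and it validates the check digit so that arbitrary digit runs are not picked up.

```python
import re

# Heuristic candidate pattern (an assumption, not calibre's): the literal
# "ISBN", an optional "-10"/"-13" suffix and colon, then 10 to 17 characters
# of digits, hyphens and spaces, ending in a digit or X.
ISBN_RE = re.compile(
    r"ISBN(?:-1[03])?:?\s*([0-9][0-9\- ]{8,15}[0-9Xx])", re.IGNORECASE
)


def is_valid_isbn10(digits):
    """ISBN-10 check: weights 10..1, total divisible by 11, X means 10."""
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    if not (digits[9].isdigit() or digits[9] == "X"):
        return False
    total = sum(
        (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits)
    )
    return total % 11 == 0


def is_valid_isbn13(digits):
    """ISBN-13 check: alternating weights 1 and 3, total divisible by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
    return total % 10 == 0


def find_isbns(text):
    """Return the valid ISBNs found in text, normalized, in order of appearance."""
    found = []
    for match in ISBN_RE.finditer(text):
        # Drop hyphens and spaces, and normalize a trailing "x" to "X".
        digits = re.sub(r"[\- ]", "", match.group(1)).upper()
        if (is_valid_isbn10(digits) or is_valid_isbn13(digits)) and digits not in found:
            found.append(digits)
    return found
```

For example, `find_isbns("ISBN-13: 978-0-306-40615-7")` returns `["9780306406157"]`. Because the pattern allows spaces inside the number, an ISBN immediately followed by other digits (a year, say) can be swallowed into one over-long candidate and rejected, so treat this as a starting point rather than a finished matcher.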
04-14-2010, 11:47 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: android
|
Ok... I've just found tickets 3013 and 4113 in calibre's trac.
Too bad I haven't found a PDF on which the feature actually worked. I'm willing to help improve that code. If I can find it first! |
04-14-2010, 11:51 PM | #4 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: android
|
So the code mentioned in those tickets isn't there anymore?
Anyway, Kovid, thanks for the pointer, and for calibre by the way! I'll try to go that way... and report back on my findings. |
04-15-2010, 12:58 AM | #5 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: android
|
Where can I find the source or API documentation of pdfreflow?
|
04-15-2010, 01:01 AM | #6 |
creator of calibre
Posts: 44,380
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
See ebooks/pdf/main.cpp
|