06-20-2023, 03:57 PM | #1 |
Connoisseur
Posts: 80
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
How to extract footnotes from pdf?
My problem is how to extract footnotes from a PDF, or even from an EPUB:
The footnotes at the bottom of each page disrupt ebook production from PDF, or even from FR. Is there a way to find and extract the footnotes and collect them in a Word document, and after this operation place the collected footnotes as a single list at the end of the book? I have done that manually many times, successfully. But when the number of footnotes is very large, it is exhausting and impractical. Any ideas about what one can do? (I posted the thread here because I found no better place.) |
06-25-2023, 11:22 AM | #2 |
Connoisseur
Posts: 80
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
09-04-2021, 10:47 PM
Trenchant Edges [New Plugin Development Plan] Extracting footnotes/endnotes and Indexing dates Link to this thread by Trenchant Edges: https://www.mobileread.com/forums/sh...ract+footnotes Does anybody know what happened with that project? Was there any success or solution? |
07-07-2023, 03:14 AM | #3 | ||||
Wizard
Posts: 2,303
Karma: 12126963
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
ABBYY Finereader. Then a whole lot of elbow grease, tools, and Regular Expressions. I've written about the process I use many times over the years, for example: I've used those methods to digitize over 700 books (dense, heavily footnoted, mostly Non-Fiction). For more info, also see many other topics describing the footnote/endnote problems and steps on how to mitigate the issues:
For EPUB Footnotes? Every single one is going to have uniquely messy code, which you have to reverse engineer and come up with Regular Expressions in order to correct. - - - If you want even more fun, there was: where I described:
- - - Quote:
Quote:
In it, I also described the common OCR problem of:
This is super common with multi-page footnotes—but can randomly happen ANYWHERE in ANY book! ... and there's absolutely no way to correct this stuff without a keen eye and elbow grease. Partial Solution? Rip and Pull + Renumber Footnotes! To solve this problem and massively speed-up my workflow... I've since created 2 separate footnote helper programs for myself:
Program 1's general steps are:
Program 2's general steps are:
Program 1 will help:
Program 2 will help:
because, depending on which tools you use, the original footnote's numbering will disappear (or get mangled)! So you'll get stuff like:
where:
So you'll have to rearrange/renumber your footnotes/endnotes to:
and then relink everything from scratch. - - - Side Note: For more Program 1 and 2 info/code examples, and common footnote problems/patterns I've noticed across books, see: - - - Quote:
- - - Side Note: Another ultimate footnote topic to read is: which described the different messes you'll run across, and how to code footnotes cleanly/properly to save yourself mountains of pain/headaches in the future. Side Note #2: Anyway, if you want even more information, I highly recommend typing this into your favorite search engine:
I've written over 200 topics over the years discussing every single aspect of this digitizing-footnotes-in-ebooks problem. Last edited by Tex2002ans; 07-07-2023 at 05:05 AM. |
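To make the "renumber, then relink everything from scratch" step above more concrete, here is a minimal Python sketch. It is not the code of Program 1 or Program 2, which are not shown in this thread. It assumes a hypothetical plain-text convention (inline markers like `[3]` in the body, note lines starting with `[3] `), assumes references and notes appear in the same order, and uses made-up `ref`/`fn` ids for the links. The function name `renumber_and_relink` is also just an illustration.

```python
import re

# Hypothetical conventions, not taken from the thread:
REF = re.compile(r"\[(\d+)\]")             # inline marker in the body, e.g. "[3]"
NOTE = re.compile(r"^\[(\d+)\]\s*(.*)$")   # start of a note line, e.g. "[3] Ibid."


def renumber_and_relink(body, notes):
    """Renumber markers 1..N by order of appearance and emit linked HTML.

    The original (possibly duplicated or mangled) numbers are ignored;
    the i-th reference in the body is paired with the i-th note.
    """
    parsed = []
    for line in notes:
        line = line.strip()
        m = NOTE.match(line)
        if m:
            parsed.append(m.group(2))
        elif parsed and line:
            # A line without a marker is treated as a continuation
            # of the previous note (e.g. a note broken across lines).
            parsed[-1] += " " + line

    refs = REF.findall(body)
    if len(refs) != len(parsed):
        # Counts that disagree need a human eye before any renumbering.
        raise ValueError(
            f"{len(refs)} references but {len(parsed)} notes; fix by hand first"
        )

    counter = 0

    def link(_match):
        nonlocal counter
        counter += 1
        return f'<a id="ref{counter}" href="#fn{counter}">[{counter}]</a>'

    new_body = REF.sub(link, body)
    new_notes = [
        f'<p id="fn{i}"><a href="#ref{i}">[{i}]</a> {text}</p>'
        for i, text in enumerate(parsed, 1)
    ]
    return new_body, new_notes


if __name__ == "__main__":
    body = "First claim.[1] Second claim.[1] Third.[7]"
    notes = ["[1] Smith 1999.", "[1] Jones 2001,", "p. 4.", "[7] Ibid."]
    new_body, new_notes = renumber_and_relink(body, notes)
    print(new_body)
    print("\n".join(new_notes))
```

The count check is deliberate: when the number of references and notes disagree, the sketch stops instead of guessing, since a missed or merged footnote would silently shift every later link.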
||||
07-10-2023, 04:33 PM | #4 |
Connoisseur
Posts: 80
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
That's very interesting and, I hope, very helpful for me.
I apologize for not reading your reply until today. I will try to understand it and work with it, step by step. We will see what kind of results come out of that. Thank you so much for your fantastic commitment to EPUBs. I value that very highly. |
|