07-07-2024, 08:35 AM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2024
Device: desktop
|
Conversion from pdf to txt yields (near) empty file
Hi there,
I'm faced with a small problem: I am trying to convert a pdf to txt, and that pdf allows me to select and copy-paste (it's not just images). I first tried:
Code:
pdftotext source.pdf source.txt
Code:
ebook-convert source.pdf source.txt
Does anybody know if there are options I could try in this case, or is it likely hopeless (e.g. because of the complexity of the pdf)? Using ebook-convert (calibre 7.13.0). Thanks a lot in advance! |
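One way to narrow this down is to measure how much text each extraction mode actually produces, rather than eyeballing the output file. The sketch below assumes poppler's pdftotext is on the PATH and tries its -layout and -raw options next to the default mode. The helper names (visible_chars, extract, compare_modes) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import subprocess


def visible_chars(text: str) -> int:
    """Count characters that are not whitespace (form feeds count as whitespace)."""
    return sum(1 for ch in text if not ch.isspace())


def extract(pdf_path: str, *options: str) -> str:
    """Run pdftotext and return its output; '-' as the text file means stdout."""
    result = subprocess.run(
        ["pdftotext", *options, pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def compare_modes(pdf_path: str) -> dict:
    """Return the visible character count for each extraction mode."""
    modes = {"default": (), "layout": ("-layout",), "raw": ("-raw",)}
    return {
        name: visible_chars(extract(pdf_path, *opts))
        for name, opts in modes.items()
    }


# Hypothetical usage, assuming source.pdf is in the current directory:
#     print(compare_modes("source.pdf"))
```

If every mode reports a count close to zero, the text layer is probably not something pdftotext can read, and changing options is unlikely to help.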
07-07-2024, 11:07 AM | #2 |
creator of calibre
Posts: 44,551
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you can select, you can possibly select all and copy paste to get the text out.
|
07-07-2024, 11:25 AM | #3 |
Zealot
Posts: 111
Karma: 642206
Join Date: Mar 2021
Device: Kindle Voyage
|
If copy/paste does not work, then you could try OCR.
Mind you, it didn't work for me when I tried it. I had the same issue with a PDF of a really old public domain book, and when I took a closer look at it, I came to the conclusion that it was just a collection of photographs of the pages of a physical book. Some of the pages were not exactly straight and some were more faded than others. Because I prefer using my Kindle, and because reading such PDFs on it is such a pain, I tried a few things to get an EPUB text version, including OCR. But although the images were good enough to be easily readable as a PDF, they were evidently not good enough for OCR, so I basically gave up and read the PDF on my computer. |
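For scans with crooked pages like the ones described above, one thing that could be tried is OCRmyPDF, which can straighten pages before recognition and write the recognized text to a separate file. This is only a sketch: it assumes the ocrmypdf command is installed, it uses that tool's --deskew, -l and --sidecar options, and the function and file names are hypothetical. It may well fail on faded pages for the same reason the OCR attempt above did.

```python
import subprocess


def build_ocr_command(src_pdf: str, out_pdf: str, sidecar_txt: str,
                      language: str = "eng") -> list:
    """Build an ocrmypdf command that deskews pages and saves the text separately."""
    return [
        "ocrmypdf",
        "--deskew",                  # straighten crooked page images before OCR
        "-l", language,              # OCR language (a Tesseract language code)
        "--sidecar", sidecar_txt,    # also write the recognized text to this file
        src_pdf,
        out_pdf,
    ]


def run_ocr(src_pdf: str, out_pdf: str, sidecar_txt: str) -> None:
    """Run the OCR step; raises CalledProcessError if ocrmypdf reports failure."""
    subprocess.run(build_ocr_command(src_pdf, out_pdf, sidecar_txt), check=True)


# Hypothetical usage:
#     run_ocr("scan.pdf", "scan_ocr.pdf", "scan.txt")
```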
07-07-2024, 11:37 AM | #4 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2024
Device: desktop
|
I tried, but somehow only the text of the current page gets copied (both in calibre and in macOS Preview). After selecting all, the text of the current page is selected, but the other pages only seem to be selected as whole pages (I'm unsure how to describe this), so their content is not captured by the copy.
|
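To check whether the other pages carry any extractable text at all, the output of pdftotext can be split per page: pdftotext separates pages with a form feed character. The sketch below only does the counting and assumes the text was already obtained, for example with "pdftotext source.pdf -". The function name chars_per_page is made up for this example.

```python
def chars_per_page(text: str) -> list:
    """Return the number of non-whitespace characters on each page.

    Pages are assumed to be separated by form feeds, as in pdftotext output.
    """
    pages = text.split("\f")
    # The last page is also followed by a form feed, which leaves an
    # empty trailing chunk after splitting; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [sum(1 for ch in page if not ch.isspace()) for page in pages]


# Hypothetical usage with text read from a previous pdftotext run:
#     with open("source.txt", encoding="utf-8", errors="replace") as fh:
#         for number, count in enumerate(chars_per_page(fh.read()), start=1):
#             print(number, count)
```

Pages that report a count of zero have no text that pdftotext could extract, which would match the behaviour where only some pages can be copied.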