07-07-2024, 08:35 AM   #1
jchwenger
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2024
Device: desktop
Conversion from pdf to txt yields (near) empty file

Hi there,

I'm facing a small problem: I am trying to convert a PDF to TXT. The PDF lets me select and copy-paste text, so it is not just scanned images.

I first tried:

Code:
pdftotext source.pdf source.txt
That yielded a file containing nothing but ^L (form feed) page breaks. Then I tried:

Code:
ebook-convert source.pdf source.txt
The result contains the large headings, but nothing else.

Does anybody know whether there are options I could try in this case, or is it likely hopeless (e.g. because of the complexity of the PDF)?

Using ebook-convert (calibre 7.13.0).

Thanks a lot in advance!
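
To check what a conversion like the pdftotext one above actually wrote, one option is to count the form feeds and the non-whitespace characters in the output file. This is a minimal sketch assuming a POSIX shell; `txt_summary` is a hypothetical helper name, not part of poppler or calibre, and the sample file only stands in for a real source.txt:

```shell
# Hypothetical helper: summarise a text file produced by a PDF-to-text tool.
# Prints "<form_feeds> <non_whitespace_bytes>".
txt_summary() {
    # Count form feed (^L) bytes; pdftotext typically writes one per page.
    pages=$(tr -cd '\f' < "$1" | wc -c | tr -d ' ')
    # Count bytes left after removing all whitespace (form feeds included).
    chars=$(tr -d '[:space:]' < "$1" | wc -c | tr -d ' ')
    echo "$pages $chars"
}

# Demo on a stand-in file that holds three page breaks and no text.
sample=$(mktemp)
printf '\f\f\f' > "$sample"
txt_summary "$sample"   # prints "3 0"
rm -f "$sample"
```

On a real file this would be run as `txt_summary source.txt`. A second number of 0 means the file holds page breaks only, which would suggest that the text layer could not be decoded at all rather than that a formatting option was wrong.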
07-07-2024, 11:07 AM   #2
kovidgoyal
creator of calibre
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If you can select, you can possibly select all and copy paste to get the text out.
07-07-2024, 11:25 AM   #3
sgmoore
Zealot
Posts: 111
Karma: 642206
Join Date: Mar 2021
Device: Kindle Voyage
If copy/paste does not work, then you could try OCR.

Mind you, it didn't work for me when I tried it.

I had the same issue with a PDF of a really old public domain book. When I took a closer look at the PDF, I concluded that it was just a collection of photographs of the pages of a physical book. Some of the pages were not quite straight, and some were more faded than others.

Because I prefer my Kindle, and reading such PDFs on it is a pain, I tried a few things to get an EPUB text version, including OCR. Although the image quality was good enough to read comfortably as a PDF, it evidently was not good enough for OCR, so I gave up and read the PDF on my computer.
07-07-2024, 11:37 AM   #4
jchwenger
Quote:
Originally Posted by kovidgoyal
If you can select, you can possibly select all and copy paste to get the text out.
I tried, but only the text of the current page gets copied (both in calibre and in macOS Preview). After selecting all, the text on the current page is selected, but the other pages appear to be selected only as whole pages (I'm not sure how to describe it), so their content is not captured when copying.