08-29-2024, 01:26 PM   #1
Shohreh
[SOLVED] [OCR] Extract text layer, fix errors, re-import?

Hello,

I noticed some typos in the text layer that OCR added to a "bitmap" PDF, i.e. one whose pages are actually scanned images.

I first tried opening the EPUB generated by ABBYY FineReader. LibreOffice couldn't open it at all. Sigil could, after showing an error message, but as far as I can tell it lacks a French dictionary to do the spell-checking.

As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back after I have fixed the typos?
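
For reference, the extraction half could be scripted roughly like this. It is only a sketch: it assumes Poppler's pdftotext is installed and on the PATH, the function names and the file names are made up for illustration, and it only pulls the text out. It does not write anything back into the PDF.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, first_page=None, last_page=None):
    """Build the argument list for Poppler's pdftotext.

    -layout keeps the physical layout of the page as far as possible,
    -enc UTF-8 forces UTF-8 output (useful for accented French text),
    -f / -l optionally restrict the page range.
    """
    cmd = ["pdftotext", "-layout", "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text_layer(pdf_path, txt_path, first_page=None, last_page=None):
    """Run pdftotext; raise if the tool is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    cmd = pdftotext_command(pdf_path, txt_path, first_page, last_page)
    subprocess.run(cmd, check=True)
```

Calling extract_text_layer("scan.pdf", "scan.txt") (hypothetical file names) would leave the OCR text in scan.txt, ready for proofreading.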

Thank you.

--
Edit: An easy solution is to convert the PDF to EPUB using ABBYY FineReader, and then run the HTML files inside the EPUB through a spellchecker.
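
A minimal sketch of that EPUB route, in case someone wants to script it. An EPUB is a ZIP archive, so the Python standard library can read the HTML files directly. The word list is an assumption (any plain list of known French words would do), the function name is made up, and the check is deliberately naive: it flags every word that is not in the list, without suggesting corrections.

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the visible text of an (X)HTML file, skipping script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


# Runs of letters only. Unicode-aware, so accented letters are kept;
# apostrophes and hyphens split words ("l'été" -> "l", "été").
WORD_RE = re.compile(r"[^\W\d_]+")


def unknown_words(epub, known_words):
    """Return {html_file_name: sorted list of words missing from known_words}.

    `epub` is a path or a binary file object; `known_words` is any iterable
    of correctly spelled words (hypothetical word list, compared in lowercase).
    """
    known = {w.lower() for w in known_words}
    report = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            parser = _TextExtractor()
            parser.feed(archive.read(name).decode("utf-8", errors="replace"))
            parser.close()
            text = " ".join(parser.chunks)
            misses = {w for w in WORD_RE.findall(text) if w.lower() not in known}
            if misses:
                report[name] = sorted(misses)
    return report
```

With a word list loaded from a text file (one word per line), unknown_words("book.epub", words) returns the suspicious words per HTML file, which can then be fixed in Sigil or any text editor. "book.epub" is a placeholder name.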

Last edited by Shohreh; 08-30-2024 at 04:28 AM.