[OCR] Extract text layer, fix errors, re-import?

Shohreh · 08-29-2024, 01:26 PM

Hello,

I noticed some typos in the text layer that an OCR tool added to a "bitmap" PDF, i.e. a PDF whose pages are actually scanned images.

I first tried opening the EPUB generated by ABBYY FineReader. LibreOffice couldn't open it at all. Sigil could, after showing an error message, but it lacks a French dictionary to run the spellcheck (as far as I can tell).

As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back after I have fixed the typos?
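
For reference, the extraction half of that idea could be scripted roughly as below. This is only a sketch: the helper names (`pdftotext_command`, `mutool_command`, `extract_text`) are made up for illustration, and it assumes `pdftotext` (Poppler) or `mutool` (MuPDF) is installed and on the PATH. It only gets the text out; it does not address putting the corrected text back.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path):
    # -layout asks pdftotext to keep the physical layout of the page text.
    return ["pdftotext", "-layout", pdf_path, txt_path]


def mutool_command(pdf_path, txt_path):
    # -F text selects plain-text output, -o names the output file.
    return ["mutool", "convert", "-F", "text", "-o", txt_path, pdf_path]


def extract_text(pdf_path, txt_path, tool="pdftotext"):
    """Run the chosen tool; raises CalledProcessError if it fails."""
    if tool == "pdftotext":
        cmd = pdftotext_command(pdf_path, txt_path)
    elif tool == "mutool":
        cmd = mutool_command(pdf_path, txt_path)
    else:
        raise ValueError(f"unknown tool: {tool}")
    subprocess.run(cmd, check=True)
    return txt_path
```

Calling `extract_text("scan.pdf", "scan.txt")` (placeholder file names) would leave the OCR text in `scan.txt`, ready for proofreading.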

Thank you.

--
Edit: An easy solution is to convert the PDF to EPUB using ABBYY FineReader, and then run the HTML files inside the EPUB through a spellchecker.
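
A rough sketch of the "HTML files inside the EPUB" step, using only the Python standard library. An EPUB is a ZIP archive, so the script lists the (X)HTML members and strips the markup to get plain text for a spellchecker. The function and class names are hypothetical, and it assumes the files are UTF-8 encoded.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def epub_text_by_file(epub_path):
    """Return {member name: plain text} for each HTML/XHTML file in the EPUB."""
    texts = {}
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parser = _TextCollector()
                # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                texts[name] = " ".join(parser.chunks)
    return texts
```

The text of each file can then be piped to any spellchecker that has a French dictionary, for instance `hunspell -d fr_FR -l` if Hunspell and that dictionary happen to be installed, which lists the words it does not recognise.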

Last edited by Shohreh; 08-30-2024 at 04:28 AM.
