08-08-2019, 03:30 AM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2019
Device: Kindle Paperwhite
|
Cropping PDFs for EPUB conversion using BRISS, Ghostscript and/or Calibre
Hello! I'm new to this so please forgive me if this is basic knowledge.
I have a PDF file that has been OCRed, and I would like to convert it to EPUB. The main problem is that I'd like to crop the PDF so that I don't get duplicate headers or page numbers in the EPUB. I first tried OS X's Preview, then Briss, for the cropping. I then ran the result through calibre's EPUB conversion. It didn't work. I then used Ghostscript to extract the text:
Code:
gs -sDEVICE=txtwrite -o extractedText%d.txt input.pdf
Then I read on here that "If you run the Briss PDF output through Ghostscript to generate a new PDF, I believe it will permanently get rid of the cropped-out material so that it won't come back in calibre." That user suggested this command:
Code:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
What am I missing here? This can't be so difficult, can it?
08-09-2019, 12:36 AM | #2 |
Enthusiast
Posts: 35
Karma: 14720
Join Date: Mar 2016
Device: kindle voyage, Kobo Forma, Kobo Aura One
|
Have you tried ScanTailor? It is free and open source. I have a Mac, so I use ScanTailor via CrossOver, though if you have MacPorts installed, ScanTailor is easy to install. Unfortunately, Homebrew does not have a cask for it yet.
http://scantailor.org/
It is designed as a preprocessing tool, so it works on batches of scanned images. If you already have a PDF, simply export the pages as images and load them into ScanTailor. Then use the various settings to crop the headers and page numbers, deskew, set margins, etc. It will output in TIFF format. I have not found an easy one-click method to batch-crop extraneous material out of scanned images.
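For the "export the pages as images" step, one possible route is Ghostscript itself, since it is already in use earlier in the thread. This is only a rough sketch: the function name export_pages, the GS override variable, the pages/ directory and the 300 dpi default are all my own illustrative choices, not anything from the ScanTailor documentation. It uses Ghostscript's tiffgray device to write one grayscale TIFF per page.

```shell
#!/bin/sh
# Sketch: rasterize each PDF page to a grayscale TIFF for ScanTailor.
# Assumptions (not from the thread): Ghostscript is installed as "gs",
# and 300 dpi is a reasonable default for scanned text.
# GS is a hypothetical override so the command can be swapped or dry-run.
export_pages() {
    pdf=$1
    outdir=$2
    dpi=${3:-300}
    mkdir -p "$outdir" || return 1
    # -o sets the output file pattern and implies -dBATCH -dNOPAUSE;
    # %03d is replaced by the page number (page-001.tif, page-002.tif, ...).
    ${GS:-gs} -q -sDEVICE=tiffgray -r"$dpi" -o "$outdir/page-%03d.tif" "$pdf"
}
```

Something like `export_pages input.pdf pages` should then leave a pages/ folder of TIFFs that can be loaded into ScanTailor as a batch. Color scans would need a different device (for example tiff24nc) instead of tiffgray.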
08-09-2019, 01:04 PM | #3 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
One, "cropping" tools like Briss don't delete anything. They just set a new page size for viewing. The old data is still there; it's just off the page and out of view.
Two, the PDF was OCR'd before it was cropped. The headers and similar "junk" are still in the text layer from the OCR process and still "visible" to the format converter, so they end up in the ePub.
You might be more successful if you "crop" the PDF first and then do the OCR. This might prevent the OCR process from "seeing" the parts that were trimmed.
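One way to check whether the headers really are still in the text layer is to extract the text page by page (the txtwrite command from the first post does this) and look for lines that recur on many pages. The following is a minimal diagnostic sketch, not a fix: the function name repeated_lines and the threshold argument are hypothetical, and it assumes one text file per page. It will catch running headers but not page numbers, since those differ on every page.

```shell
#!/bin/sh
# Sketch: list lines that appear in at least MIN of the given per-page
# text files, with the number of pages each one appears on.
# Usage: repeated_lines MIN file1.txt file2.txt ...
repeated_lines() {
    min=$1
    shift
    awk -v min="$min" '
        FNR == 1 { split("", seen) }   # new file: forget lines seen so far
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)       # trim surrounding blanks
            if (line == "" || (line in seen)) next  # count once per page
            seen[line] = 1
            pages[line]++
        }
        END {
            for (l in pages)
                if (pages[l] >= min) print pages[l] "\t" l
        }
    ' "$@" | sort -rn
}
```

For example, after running the txtwrite command on the cropped PDF, `repeated_lines 10 extractedText*.txt` would print any line found on ten or more pages. If the book title or chapter headers show up in that list, the crop did not remove them from the text layer.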
|
Tags |
briss, conversion from .pdf, ghostscript, pdf and calibre |