02-03-2023, 09:20 AM | #1 |
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
|
Flip (*.jpg) to PDF?
Hello,
After downloading the JPGs that make up an online flip book, I'd like to merge them into a single PDF. Two issues:
1. ImageMagick's convert is slow*.
2. I get a read-only PDF, so I will then have to run it through OCR to get a "selectable/copyable" PDF.
Is there an open-source solution that is faster and includes OCR? Unless I missed it, neither cpdf, Mutool, nor qpdf can do it. Thank you.
Code:
*c:\ImageMagick\convert.exe -quality 100 *.jpg merged.pdf
Edit: jpeg2pdf takes care of job #1. Just make sure the files are in the right order, e.g. 1.jpg needs to be renamed 001.jpg if there are more than 99 files. I will then have to run the result through OCR.
Last edited by Shohreh; 02-03-2023 at 09:47 AM. |
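The renaming step mentioned in the edit above (1.jpg becoming 001.jpg) can be scripted. The following is only a sketch, assuming the page number is the first run of digits in each file name; the function names are made up for illustration and do not belong to jpeg2pdf or any other tool in this thread.

```python
import re
from pathlib import Path


def padded_name(name: str, width: int) -> str:
    """Zero-pad the first run of digits in a file name to `width` characters."""
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), name, count=1)


def plan_renames(names: list[str]) -> dict[str, str]:
    """Map each name to its zero-padded form, using the longest number as the width."""
    numbers = [m.group(0) for n in names if (m := re.search(r"\d+", n))]
    width = max((len(d) for d in numbers), default=0)
    return {n: padded_name(n, width) for n in names}


def rename_in_folder(folder: str) -> None:
    """Rename every *.jpg in `folder` so that alphabetical order equals page order."""
    paths = sorted(Path(folder).glob("*.jpg"))
    for old, new in plan_renames([p.name for p in paths]).items():
        if old != new:
            (Path(folder) / old).rename(Path(folder) / new)
```

Calling plan_renames first shows what would change without touching any files; rename_in_folder then applies the same mapping on disk.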
02-03-2023, 01:17 PM | #2 |
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
GIMP: Free on Linux, Mac, and Windows.
Change the setting to have a single window. There's a bit of a learning curve because it's very powerful. Put each image on a new layer. Use k2pdfopt afterwards for OCR, or run the JPGs directly through Tesseract. If the top layer is the first page, then when you Export as PDF, select the option to reverse the layer/page order. |
02-03-2023, 02:21 PM | #3 | |
Grand Sorcerer
Posts: 5,535
Karma: 100606001
Join Date: Apr 2011
Device: pb360
|
ImageMagick uses libjpeg from the Independent JPEG Group. Using -quality 100 will be slow and generate large files with almost no better quality than -quality 95.
From the cjpeg man page: https://linux.die.net/man/1/cjpeg Quote:
https://www.imagemagick.org/Usage/formats/ |
|
02-03-2023, 02:42 PM | #4 |
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I missed that. For line art and text, 95 is loads of quality, and photo images can be fine at 70 to 80 if there are no sharp, high-contrast edges.
|
02-04-2023, 07:13 AM | #5 |
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
|
@Shohreh
AFAIK, Calibre supports .cbz comic book archives. (To create a .cbz file, all you have to do is zip up the image folder and change the extension from .zip to .cbz.) However, since Calibre doesn't support OCR, you'll have to OCR the file afterwards.
AFAIK, willus's k2pdfopt converter supports image-to-PDF conversion with OCR. You might want to ask @willus about it in the k2pdfopt forum. |
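The zip-and-rename step for a .cbz can be done in one go. This is a rough sketch using Python's standard zipfile module; the function name make_cbz and the choice to store the files uncompressed are assumptions made for illustration, not something Calibre requires.

```python
import zipfile
from pathlib import Path


def make_cbz(image_dir: str, cbz_path: str) -> list[str]:
    """Zip the JPGs in `image_dir` into `cbz_path`; return the stored names in order."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    # JPEG data is already compressed, so store the files without recompressing.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for image in images:
            # Store only the file name so the archive has no folder prefix.
            archive.write(image, arcname=image.name)
    return [image.name for image in images]
```

The pages are added in sorted name order, so the same zero-padding caveat applies: 10.jpg sorts before 2.jpg unless the names are padded.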
02-04-2023, 09:24 PM | #6 |
Fuzzball, the purple cat
Posts: 1,286
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
k2pdfopt at this time won't concatenate multiple image files into a single PDF (no reason why it couldn't...I'll have to add that option). I use ImageMagick to do that, as has been discussed. After that, k2pdfopt can add OCR. Again, unfortunately, it will re-render each bitmapped page and re-compress.
If you want to recreate the bitmaps as faithfully as possible:
k2pdfopt -dpi 300 -mode copy -ocr t -g 1 -s- -cmax -1 -bpc 8 sourcefile.pdf
You'll want to choose the dpi to match your existing dpi. Default compression is flate (.png). If you want jpeg compression, add -jpg <quality> |
02-10-2023, 07:42 AM | #7 |
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
|
Thanks everyone.
|
02-14-2023, 01:00 AM | #8 | |
Wizard
Posts: 3,016
Karma: 18765431
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Quote:
The OCR results seemed pretty good judging by some random searches I tried. I would like to OCR all of the PDFs I created if I could keep them close to the original file size. Last edited by rkomar; 02-14-2023 at 01:02 AM. |
|
02-14-2023, 11:31 PM | #9 | |
Fuzzball, the purple cat
Posts: 1,286
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
Alternatively, there's a way to use cpdf to move a text layer from one PDF into another, but I don't have access to how I did that until tomorrow. I'll post the solution then. |
|
02-15-2023, 11:18 AM | #10 |
Fuzzball, the purple cat
Posts: 1,286
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
As promised--here is a Windows batch file which uses cpdf to keep the original bitmaps but add a text layer with k2pdfopt's OCR. Should be easy to convert to linux.
Code:
rem
rem Step 1. Do the OCR
rem         Typically set dpi to at least 300 for best results
rem         -ocrd p sets detection at the page level. This means
rem         that the Tesseract algorithm will be used to
rem         find text on the page rather than the k2pdfopt
rem         algorithm.
rem         You may also wish to add -g, -cmax, or -s option
rem         adjustments to improve the bitmap contrast or
rem         sharpness and resulting OCR quality.
rem
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf
rem
rem Step 2. Replace the bitmap in the result with a very low density,
rem         low res bitmap (which will later be ignored / made invisible),
rem         but keep the text layer.
rem
k2pdfopt -mode copy -dpi 5 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
del /q temp1.pdf
rem
rem Step 3. Pair the text layer in temp2.pdf with the bitmaps in src.pdf.
rem         Put the result in src_searchable.pdf.
rem
cpdf -draft temp2.pdf -o temp3.pdf
del /q temp2.pdf
cpdf -combine-pages src.pdf temp3.pdf -o src_searchable.pdf
del /q temp3.pdf
|
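For anyone not on Windows, the batch file above could be driven from a small cross-platform script instead. The sketch below copies the k2pdfopt and cpdf arguments verbatim from the batch file; the function names and the cleanup approach are illustrative assumptions, and it presumes both programs are on the PATH.

```python
import subprocess
from pathlib import Path


def cpdf_commands(src: str, out: str) -> list[list[str]]:
    """Return the command sequence from the batch file as argument lists."""
    return [
        # Step 1: OCR at 300 dpi with page-level (Tesseract) text detection.
        ["k2pdfopt", "-mode", "copy", "-dpi", "300", "-ocr", "t", "-ocrd", "p",
         src, "-o", "temp1.pdf"],
        # Step 2: swap in a very low-resolution bitmap but keep the text layer.
        ["k2pdfopt", "-mode", "copy", "-dpi", "5", "-bpc", "1", "-g", "100",
         "-cmax", "-100", "-s-", "temp1.pdf", "-o", "temp2.pdf"],
        # Step 3: pair the text layer with the original bitmaps.
        ["cpdf", "-draft", "temp2.pdf", "-o", "temp3.pdf"],
        ["cpdf", "-combine-pages", src, "temp3.pdf", "-o", out],
    ]


def run_pipeline(src: str, out: str) -> None:
    """Run every step, then delete the temporary PDFs even if a step fails."""
    try:
        for command in cpdf_commands(src, out):
            subprocess.run(command, check=True)
    finally:
        for name in ("temp1.pdf", "temp2.pdf", "temp3.pdf"):
            Path(name).unlink(missing_ok=True)
```

A call such as run_pipeline("src.pdf", "src_searchable.pdf") would mirror the batch file; passing the arguments as lists avoids shell quoting differences between Windows and Linux.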
02-15-2023, 01:24 PM | #11 |
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
For a given definition of easy
These are the important bits and don't need changed on Linux?
Code:
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf
k2pdfopt -mode copy -dpi 5 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
del /q temp1.pdf
Code:
cpdf -draft temp2.pdf -o temp3.pdf
del /q temp2.pdf
cpdf -combine-pages src.pdf temp3.pdf -o src_searchable.pdf
del /q temp3.pdf
The cpdf step combines PDFs. It's not on my Linux. But the Linux tools I know merge PDFs sequentially. I think even qpdf does append merges rather than layer merges? |
02-16-2023, 01:37 PM | #13 |
Wizard
Posts: 3,016
Karma: 18765431
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
This looks like more work than I was hoping for, but I'll try it out when I have some time this weekend. I already have qpdf on my system, so I'll play with that first. Thanks to both of you for the tips!
|
02-17-2023, 12:37 PM | #14 |
Fuzzball, the purple cat
Posts: 1,286
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
I had not seriously used qpdf in quite a while. It has many more options than it did even four years ago, including an underlay option that works perfectly for this. So you can also use qpdf--in fact, it's simpler. Ignoring the commands to clean up temporary files:
Code:
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf
k2pdfopt -mode copy -dpi 50 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
qpdf src.pdf --underlay temp2.pdf -- out.pdf
|
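The qpdf variant, including the temporary-file cleanup that was left out above, might be wrapped up like this. It is a sketch only: the command arguments are taken from the post, while the function names and the use of a temporary directory are assumptions. It also assumes k2pdfopt accepts a full path after -o, which should hold as long as the path contains no % characters.

```python
import subprocess
import tempfile
from pathlib import Path


def qpdf_commands(src: str, out: str, workdir: str) -> list[list[str]]:
    """Return the three commands from the post, with temp files kept in `workdir`."""
    temp1 = str(Path(workdir) / "temp1.pdf")
    temp2 = str(Path(workdir) / "temp2.pdf")
    return [
        # OCR pass: produces a PDF with a text layer.
        ["k2pdfopt", "-mode", "copy", "-dpi", "300", "-ocr", "t", "-ocrd", "p",
         src, "-o", temp1],
        # Low-resolution pass: shrinks the bitmap, keeps the text layer.
        ["k2pdfopt", "-mode", "copy", "-dpi", "50", "-bpc", "1", "-g", "100",
         "-cmax", "-100", "-s-", temp1, "-o", temp2],
        # Put the text-layer PDF underneath the original page images.
        ["qpdf", src, "--underlay", temp2, "--", out],
    ]


def make_searchable(src: str, out: str) -> None:
    """Run the pipeline; the temporary directory is removed automatically."""
    with tempfile.TemporaryDirectory() as workdir:
        for command in qpdf_commands(src, out, workdir):
            subprocess.run(command, check=True)
```

Keeping the intermediate files in a temporary directory means nothing is left behind if a step fails, and two runs in the same folder cannot overwrite each other's temp1.pdf.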
02-18-2023, 11:52 AM | #15 |
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I had qpdf already but didn't realise it could merge layers rather than append documents!
|