MobileRead Forums > E-Book Formats > PDF
02-03-2023, 09:20 AM   #1
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Flip (*.jpg) to PDF?

Hello,

After downloading the JPGs that make up an online Flip book, I'd like to merge them into a single PDF.

Two issues:
1. ImageMagick's convert is slow*, and
2. I get an image-only PDF, so I will then have to run it through OCR to get a PDF with selectable/copyable text.

Is there an open-source solution that is faster and includes OCR?

Unless I missed it, neither cpdf, Mutool, nor qpdf can do it.

Thank you.

Code:
*c:\ImageMagick\convert.exe -quality 100 *.jpg merged.pdf
---
Edit: jpeg2pdf takes care of issue #1. Just make sure the files sort in the right order, e.g. 1.jpg needs to be renamed 001.jpg if there are more than 99 files.

I will then still have to run the result through OCR.

Last edited by Shohreh; 02-03-2023 at 09:47 AM.
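A wildcard such as *.jpg is usually expanded in plain alphabetical order, which puts 10.jpg before 2.jpg. The renaming described in the edit above can be scripted; below is a minimal bash sketch, assuming the page files sit in one folder and are named with bare numbers (1.jpg, 2.jpg and so on). The function name pad_page_names is hypothetical, and the sketch has not been tried together with jpeg2pdf.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename purely numeric page files in directory $1
# (e.g. 7.jpg) to zero-padded names (007.jpg) so that alphabetical order
# matches page order. $2 = number of digits (default 3, enough for 999 pages).
pad_page_names() {
    local dir="${1:-.}" width="${2:-3}" f base n new
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        base=$(basename "$f" .jpg)
        case $base in
            ''|*[!0-9]*) continue ;;             # leave cover.jpg etc. alone
        esac
        # Strip leading zeros so printf does not read "08" as octal.
        n=${base#"${base%%[!0]*}"}
        [ -n "$n" ] || n=0
        new=$(printf "%0${width}d.jpg" "$n")
        # -n: never overwrite an existing file that already has the padded name.
        [ "$f" = "$dir/$new" ] || mv -n -- "$f" "$dir/$new"
    done
}
```

Run it as `pad_page_names ./pages` (or `pad_page_names ./pages 4` for more than 999 pages) before pointing jpeg2pdf at the folder. Names that are not purely numeric, such as cover.jpg, are left alone and sort after the numbered pages.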
02-03-2023, 01:17 PM   #2
Quoth
the rook, bossing Never.
Posts: 12,360
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
GIMP: free on Linux, Mac and Windows.
Change the setting to single-window mode.
There's a bit of a learning curve because it's very powerful.

Put each image on a new layer. If the top layer is the 1st page, then when you Export as PDF, select the option to reverse the layer order / pages.

Use k2pdfopt afterwards for OCR, or run Tesseract directly on the JPGs.
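Tesseract can also do the merging itself: given a text file that lists one image per line, its `pdf` output option writes a single searchable PDF (the page images plus an invisible text layer). A sketch, assuming a reasonably recent Tesseract with English language data on the PATH; the two function names are hypothetical, and GNU `sort -V` (version sort) is used so that 2.jpg comes before 10.jpg without renaming anything.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative only).

# Write the .jpg files in directory $1 to list file $2, one path per line,
# in natural numeric order (1, 2, 10) via GNU "sort -V".
make_page_list() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' | sort -V > "$2"
}

# OCR every image named in list file $1 into one searchable PDF.
# Tesseract appends ".pdf" to the output base name $2 itself.
# $3 is the Tesseract language code (default "eng").
ocr_list_to_pdf() {
    tesseract "$1" "$2" -l "${3:-eng}" pdf
}
```

For example, `make_page_list ./pages pages.txt && ocr_list_to_pdf pages.txt book` should produce book.pdf. How Tesseract stores the page images, and therefore the final file size, is worth checking on a few pages first.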
02-03-2023, 02:21 PM   #3
j.p.s
Grand Sorcerer
Posts: 5,527
Karma: 100606001
Join Date: Apr 2011
Device: pb360
ImageMagick uses libjpeg from the Independent JPEG Group. Using -quality 100 will be slow and generate large files with almost no better quality than -quality 95.

From the cjpeg man page:
https://linux.die.net/man/1/cjpeg
Quote:
The -quality switch lets you trade off compressed file size against quality of the reconstructed image: the higher the quality setting, the larger the JPEG file, and the closer the output image will be to the original input. Normally you want to use the lowest quality setting (smallest file) that decompresses into something visually indistinguishable from the original image. For this purpose the quality setting should be between 50 and 95; the default of 75 is often about right. If you see defects at -quality 75, then go up 5 or 10 counts at a time until you are happy with the output image. (The optimal setting will vary from one image to another.)

-quality 100 will generate a quantization table of all 1's, minimizing loss in the quantization step (but there is still information loss in subsampling, as well as roundoff error). This setting is mainly of interest for experimental purposes. Quality values above about 95 are not recommended for normal use; the compressed file size goes up dramatically for hardly any gain in output image quality.
See also:
https://www.imagemagick.org/Usage/formats/
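One way to apply that advice is to encode a single representative page at a few settings and compare the sizes, and the pages themselves by eye, before converting everything. A small sketch, assuming ImageMagick's `convert` is on the PATH (for ImageMagick 7, set IM=magick); the function name and the `_qNN` file naming are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-encode sample page $1 at several JPEG -quality
# settings and print "quality<TAB>bytes" for each trial file (page_q95.jpg).
# Uses ImageMagick's "convert" unless IM is set (e.g. IM=magick for IM7).
quality_sweep() {
    local src="$1" q out
    for q in 75 85 95 100; do
        out="${src%.*}_q$q.jpg"
        "${IM:-convert}" "$src" -quality "$q" "$out" || return 1
        printf '%s\t%s\n' "$q" "$(wc -c < "$out" | tr -d ' ')"
    done
}
```

For example, `quality_sweep page001.jpg` leaves page001_q75.jpg through page001_q100.jpg next to the original; the printed byte counts should show how quickly the size grows at the top of the scale.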
02-03-2023, 02:42 PM   #4
Quoth
the rook, bossing Never.
Posts: 12,360
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
I missed that. For line art/text, 95 is plenty of quality, and photographic images can be fine at 70 to 80 if they have no high-contrast sharp edges.
02-04-2023, 07:13 AM   #5
Doitsu
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
@Shohreh

AFAIK, Calibre supports .cbz comic book archives. (To create a .cbz file, all you have to do is zip up the image folder and change the extension from .zip to .cbz.)
However, since Calibre doesn't support OCR, you'll have to OCR the file afterwards.
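The zip-and-rename step can be done in one command, since zip keeps a .cbz name as given. A minimal sketch, assuming Info-ZIP's `zip` and GNU `sort -V` are available; `make_cbz` is a hypothetical name. The pages are added in natural numeric order and stored without recompression, as JPEGs are already compressed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pack the .jpg pages of directory $1 into comic-book
# archive $2 (a .cbz is just a zip file). zip options: -j store bare file
# names without the folder, -0 no compression, -q quiet, -@ names from stdin.
make_cbz() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' | sort -V | zip -j -0 -q "$2" -@
}
```

For example, `make_cbz ./pages book.cbz`; calibre's command-line converter (`ebook-convert book.cbz book.pdf`) can then produce the PDF, which still has to be OCR'd afterwards.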

AFAIK, willus's k2pdfopt converter supports image-to-PDF conversion with OCR. You might want to ask @willus about it in the k2pdfopt forum.
02-04-2023, 09:24 PM   #6
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
k2pdfopt at this time won't concatenate multiple image files into a single PDF (no reason why it couldn't...I'll have to add that option). I use ImageMagick to do that, as has been discussed. After that, k2pdfopt can add OCR. Again, unfortunately, it will re-render each bitmapped page and re-compress.

If you want to recreate the bitmaps as faithfully as possible:

k2pdfopt -dpi 300 -mode copy -ocr t -g 1 -s- -cmax -1 -bpc 8 sourcefile.pdf

You'll want to choose the dpi to match your existing dpi. Default compression is flate (.png). If you want jpeg compression, add -jpg <quality>
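To pick a dpi that matches the existing images, poppler's `pdfimages -list` prints one row per embedded image, including its resolution. A sketch, assuming poppler-utils is installed and that x-ppi is the 13th column of that listing (check the header line of your version's output); `detect_ppi` and `ocr_faithful` are hypothetical names. The k2pdfopt flags are willus's from the command above, with `-o` added to name the output file as in his later commands.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative only).

# Print the most common horizontal resolution (x-ppi) among the images in
# PDF $1. Assumes pdfimages -list prints two header lines, then one row
# per image with x-ppi in column 13.
detect_ppi() {
    pdfimages -list "$1" | awk 'NR > 2 { print $13 }' | sort | uniq -c |
        sort -rn | awk 'NR == 1 { print $2 }'
}

# willus's "recreate the bitmaps as faithfully as possible" OCR command.
# $3 = dpi (default: detect_ppi of the source); $4 = optional JPEG quality,
# which adds "-jpg <quality>" instead of the default flate compression.
ocr_faithful() {
    local src="$1" out="$2" dpi="${3:-$(detect_ppi "$1")}" q="${4:-}"
    k2pdfopt -dpi "$dpi" -mode copy -ocr t -g 1 -s- -cmax -1 -bpc 8 \
        ${q:+-jpg "$q"} "$src" -o "$out"
}
```

For example, `ocr_faithful scan.pdf scan_ocr.pdf` uses the detected dpi with flate compression, while `ocr_faithful scan.pdf scan_ocr.pdf 300 90` forces 300 dpi and adds `-jpg 90`.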
02-10-2023, 07:42 AM   #7
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Thanks everyone.
02-14-2023, 01:00 AM   #8
rkomar
Wizard
Posts: 3,015
Karma: 18765431
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Quote:
Originally Posted by willus
Default compression is flate (.png). If you want jpeg compression, add -jpg <quality>
I tried a PDF I created earlier where the text pages were in JBIG2 format. After doing the OCR with k2pdfopt, the output PDF was about 10x larger and the images were in some other format. I did not see an option to keep the original images in the output PDF. Is it not possible to do that?

The OCR results seemed pretty good judging by some random searches I tried. I would like to OCR all of the PDFs I created if I could keep them close to the original file size.

Last edited by rkomar; 02-14-2023 at 01:02 AM.
02-14-2023, 11:31 PM   #9
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by rkomar
I tried a PDF I created earlier where the text pages were in JBIG2 format. After doing the OCR with k2pdfopt, the output PDF was about 10x larger and the images were in some other format. I did not see an option to keep the original images in the output PDF. Is it not possible to do that?

The OCR results seemed pretty good judging by some random searches I tried. I would like to OCR all of the PDFs I created if I could keep them close to the original file size.
Correct--at this time it is not possible for k2pdfopt to just add a text layer but keep the original image formats (I mentioned that in my previous post in this thread). I hope to add that capability at some point (I'm not entirely sure how to code it yet). You can probably get that 10x down to a more reasonable number with strategic selection of the output dpi and number of bits per pixel.

Alternatively, there's a way to use cpdf to move a text layer from one PDF into another, but I can't check how I did that until tomorrow. I'll post the solution then.
02-15-2023, 11:18 AM   #10
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
As promised--here is a Windows batch file which uses cpdf to keep the original bitmaps but add a text layer with k2pdfopt's OCR. It should be easy to convert to Linux.
Code:
rem
rem Step 1.  Do the OCR
rem          Typically set dpi to at least 300 for best results
rem          -ocrd p sets detection at the page level.  This means
rem                  that the Tesseract algorithm will be used to
rem                  find text on the page rather than the k2pdfopt
rem                  algorithm.
rem          You may also wish to add -g, -cmax, or -s option
rem          adjustments to improve the bitmap contrast or
rem          sharpness and resulting OCR quality.
rem
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf
rem
rem Step 2.  Replace the bitmap in the result with a very low density,
rem          low res bitmap (which will later be ignored / made invisible),
rem          but keep the text layer.
rem
k2pdfopt -mode copy -dpi 5 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
del /q temp1.pdf
rem
rem Step 3.  Pair the text layer in temp2.pdf with the bitmaps in src.pdf.
rem          Put the result in src_searchable.pdf.
rem
cpdf -draft temp2.pdf -o temp3.pdf
del /q temp2.pdf
cpdf -combine-pages src.pdf temp3.pdf -o src_searchable.pdf
del /q temp3.pdf
02-15-2023, 01:24 PM   #11
Quoth
the rook, bossing Never.
Posts: 12,360
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
For a given definition of easy

These are the important bits, and they don't need to be changed on Linux?
Code:
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf

k2pdfopt -mode copy -dpi 5 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
del /q temp1.pdf
This needs to be changed:
Code:
cpdf -draft temp2.pdf -o temp3.pdf
del /q temp2.pdf
cpdf -combine-pages src.pdf temp3.pdf -o src_searchable.pdf
del /q temp3.pdf
On Linux, rm is used instead of del.
cpdf combines the PDFs, but it isn't installed on my Linux system.

The Linux tools I know merge PDFs sequentially (appending pages).
I think even qpdf appends documents rather than merging them as layers?
02-15-2023, 10:44 PM   #12
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Correct. In Linux, "rem" can be replaced by "#" and "del /q" by "rm" or "rm -f". Cpdf binaries are here.
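With those substitutions, the batch file from post #10 could look like this as a bash function. It is an untested sketch: the k2pdfopt and cpdf commands and flags are copied unchanged from willus's version, while the function name, the `mktemp -d` scratch directory and the default output name are assumptions. cpdf itself has to be downloaded separately, as noted above.

```bash
#!/usr/bin/env bash
# Linux version of willus's batch file: "rem" became "#", "del /q" became
# "rm". Function name, scratch directory and default output name are
# assumptions. Usage: add_text_layer_cpdf src.pdf [output.pdf]
add_text_layer_cpdf() {
    local src="$1" out="${2:-${1%.pdf}_searchable.pdf}" tmp rc
    tmp=$(mktemp -d) || return 1
    # Step 1: OCR at 300 dpi; -ocrd p lets Tesseract find the text on the page.
    k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p "$src" -o "$tmp/temp1.pdf" &&
    # Step 2: keep the text layer, replace the bitmap with a 5 dpi throw-away.
    k2pdfopt -mode copy -dpi 5 -bpc 1 -g 100 -cmax -100 -s- \
        "$tmp/temp1.pdf" -o "$tmp/temp2.pdf" &&
    # Step 3: -draft strips the throw-away bitmaps, then the text-only pages
    # are combined page by page with the original pages of $src.
    cpdf -draft "$tmp/temp2.pdf" -o "$tmp/temp3.pdf" &&
    cpdf -combine-pages "$src" "$tmp/temp3.pdf" -o "$out"
    rc=$?
    rm -rf -- "$tmp"    # replaces the three "del /q" lines
    return "$rc"
}
```

For example, `add_text_layer_cpdf book.pdf` should leave book_searchable.pdf beside the original; the temporary files are removed even if a step fails.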
02-16-2023, 01:37 PM   #13
rkomar
Wizard
Posts: 3,015
Karma: 18765431
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
This looks like more work than I was hoping for, but I'll try it out when I have some time this weekend. I already have qpdf on my system, so I'll play with that first. Thanks to both of you for the tips!
02-17-2023, 12:37 PM   #14
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
I had not seriously used qpdf in quite a while. It has many more options than it did even four years ago, including an underlay option that works perfectly for this. So you can also use qpdf--in fact, it's simpler. Ignoring the commands to clean up temporary files:
Code:
k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p src.pdf -o temp1.pdf
k2pdfopt -mode copy -dpi 50 -bpc 1 -g 100 -cmax -100 -s- temp1.pdf -o temp2.pdf
qpdf src.pdf --underlay temp2.pdf -- out.pdf
I bumped up the dpi in the throw-away file to 50 to improve the placement of the text layer.
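For rkomar's case of OCRing a whole set of existing PDFs, the qpdf variant is easy to wrap with the clean-up added back in. Another untested sketch: the three commands are copied from the post above, while the function names, the scratch directory and the `_ocr.pdf` suffix are assumptions. It needs k2pdfopt and a qpdf new enough to have `--underlay`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names and the _ocr.pdf suffix are illustrative).

# OCR $1 with k2pdfopt and write $2: the original pages of $1 with the
# OCR text layer placed underneath them by qpdf --underlay.
add_text_layer_qpdf() {
    local src="$1" out="$2" tmp rc
    tmp=$(mktemp -d) || return 1
    k2pdfopt -mode copy -dpi 300 -ocr t -ocrd p "$src" -o "$tmp/temp1.pdf" &&
    k2pdfopt -mode copy -dpi 50 -bpc 1 -g 100 -cmax -100 -s- \
        "$tmp/temp1.pdf" -o "$tmp/temp2.pdf" &&
    qpdf "$src" --underlay "$tmp/temp2.pdf" -- "$out"
    rc=$?
    rm -rf -- "$tmp"    # the clean-up left out of the post above
    return "$rc"
}

# Process every PDF in the current directory, skipping earlier outputs.
ocr_all_pdfs() {
    local f
    for f in *.pdf; do
        case $f in *_ocr.pdf) continue ;; esac
        [ -e "$f" ] || continue
        add_text_layer_qpdf "$f" "${f%.pdf}_ocr.pdf" || echo "failed: $f" >&2
    done
}
```

Running `ocr_all_pdfs` in a folder writes name_ocr.pdf next to each name.pdf. Note that qpdf exits with status 3 when it succeeds but prints warnings; this sketch reports that case as a failure, which may be too strict for files with harmless warnings.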
02-18-2023, 11:52 AM   #15
Quoth
the rook, bossing Never.
Posts: 12,360
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
I had qpdf already but didn't realise it could merge layers rather than append documents!
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.