ghostscript ccitt with pdfwrite

icq70610 · 01-02-2023, 01:12 PM

Hi all

Another, kind of specific, question ...

I'm currently (OK, for 3-4 years now) digitizing my whole childhood library, and I have become kind of obsessed with scanning procedures.

My current workflow is like this:

Scan --> scantailor --> lots of clicking --> tiff --> mogrify stuff --> ps --> pdf

The results are really good and normally I'm quite fond of them. For file size and quality at 300x300 dpi I would give them an 8-9 out of 10. My "trouble" starts with books from like-minded people who are making similar efforts. An example doc may look like this:

-----
Creator: PDF-XChange Editor 5.5.xxx
Producer: PDF-XChange PDF Core API (5.5.xxx)
CreationDate: xxx xxx
ModDate: xxx xxx
Custom Metadata: no
Metadata Stream: yes
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 164
Encrypted: no
Page size: 372 x 559.68 pts
Page rot: 0
File size: 8100115 bytes
Optimized: no
PDF version: 1.2
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1550  2332  rgb     3   8  jpeg   no      170  0   300   300  371K 3.5%
   2     1 image    1550  2332  index   1   1  ccitt  no      172  0   300   300   17B 0.0%
   3     2 image    1550  2332  index   1   1  ccitt  no      174  0   300   300 9123B 2.0%
   4     3 image    1550  2332  index   1   1  ccitt  no      176  0   300   300 7332B 1.6%
   5     4 image    1550  2332  index   1   1  ccitt  no      178  0   300   300 36.8K 8.3%
   6     5 image    1550  2332  index   1   1  ccitt  no      180  0   300   300 42.6K 9.7%

----
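The listing above looks like the output of poppler's pdfinfo and pdfimages -list (an assumption, since the post does not name the tools). To compare two PDFs quickly, the "enc" column can be tallied. A minimal sketch, where count_encodings is a hypothetical helper name and the 9th whitespace-separated field is taken to be "enc", as in the header above:

```shell
#!/bin/sh
# Tally the "enc" column (9th field) of a `pdfimages -list` listing read from stdin.
# The first two lines (column header and dashed rule) are skipped.
count_encodings() {
  awk 'NR > 2 { n[$9]++ } END { for (e in n) print e, n[e] }' | sort
}
```

Usage would be something like: pdfimages -list book.pdf | count_encodings. A file where the page images show up as "image" rather than "ccitt" did not get CCITT encoding.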

As you can see, that is 371K for the cover at 300x300 dpi and around 42K for a "traditional" black-and-white (1-bit) page. The cover is an RGB JPEG and the rest is CCITT (TIFF G4) encoded.
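The "ratio" column can be checked by hand: it is roughly the stored size divided by the uncompressed size (width x height x components x bits per component / 8). A small sketch of that arithmetic, using the figures from the listing. The helper names raw_bytes and ratio_pct are made up for illustration, and the exact formula the listing tool uses is an assumption:

```shell
#!/bin/sh
# Uncompressed size in bytes of a width x height image
# with `comp` components of `bpc` bits each.
raw_bytes() {
  echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# Stored size as a percentage of raw size, one decimal place.
ratio_pct() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 100 * s / r }'
}

raw_bytes 1550 2332 1 1                      # 1-bit page  -> 451825
raw_bytes 1550 2332 3 8                      # RGB cover   -> 10843800
ratio_pct 9123 "$(raw_bytes 1550 2332 1 1)"  # page 3      -> 2.0
```

So a 1-bit page at this size is about 441K raw, and the 9123-byte CCITT page is about 2% of that, which matches the listing.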

Does anybody have an idea how to instruct the Ghostscript command line to achieve similar encodings?

When I encode them, it looks like this:

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1550  2332  icc     3   8  jpeg   no        9  0   300   300  389K 3.7%
   2     1 image    1550  2332  index   1   1  image  no       16  0   300   300  461B 0.1%
   3     2 image    1550  2332  index   1   1  image  no       22  0   300   300 19.7K 4.5%
   4     3 image    1550  2332  index   1   1  image  no       28  0   300   300 14.4K 3.3%
   5     4 image    1550  2332  index   1   1  image  no       34  0   300   300 64.4K  15%
   6     5 image    1550  2332  index   1   1  image  no       40  0   300   300 74.3K  17%
   7     6 image    1550  2332  index   1   1  image  no       46  0   300   300 76.3K  17%
   8     7 image    1550  2332  index   1   1  image  no       52  0   300   300 76.5K  17%
   9     8 image    1550  2332  index   1   1  image  no       58  0   300   300 75.7K  17%

As you can see, with Ghostscript I'm not getting CCITT encoding. Does anyone know the correct parameter for gs, even just to encode a single page in CCITT with pdfwrite as the device?

\Pete
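For reference, pdfwrite does have distiller parameters that govern monochrome image compression. The sketch below only shows how they would be passed. It is not a confirmed fix for this case: CCITTFaxEncode is, as far as the Ghostscript documentation goes, already the default mono filter, and pages stored as 1-bit Indexed colour (as in the listings above) may not be treated as "mono" images at all, so they can still come out Flate-encoded. The function name recompress_mono_ccitt is hypothetical.

```shell
#!/bin/sh
# Re-distill a PDF and ask pdfwrite to CCITT-encode monochrome images.
# Assumption: Ghostscript classifies the page images as mono images;
# if it does not, these switches have no visible effect.
recompress_mono_ccitt() {
  local in_pdf="$1" out_pdf="$2"
  gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
     -dEncodeMonoImages=true \
     -dMonoImageFilter=/CCITTFaxEncode \
     -dDownsampleMonoImages=false \
     -sOutputFile="$out_pdf" "$in_pdf"
}
```

Calling recompress_mono_ccitt in.pdf out.pdf and then listing the images of out.pdf shows whether the encoding actually changed.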

rkomar · 01-02-2023, 06:06 PM

TIFF format is more about the wrapping than the encoding. You can compress using many different methods within a TIFF wrapper. I would suggest that you produce your black-and-white images as CCITT Group 4 encoded TIFF files before calling gs to create the PDF file (i.e. use the option "-compress Group4" when running mogrify/convert). I'm not familiar with scantailor, but maybe it offers that option out of the box.

When I was first scanning my books, I would use convert to produce the TIFF files with CCITT Group 4 compression. Then I would use tiffcp to combine the separate TIFF files into a single multi-page TIFF file. I would then use either tumble or tiff2pdf to convert the multi-page TIFF file into a PDF file. Then I would use gs as the last step to add pdfmarks to the PDF file.
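The convert / tiffcp / tiff2pdf chain described above could be sketched roughly as follows. This is an illustration, not rkomar's actual script: the function name images_to_g4_pdf and the temporary file names are made up, the inputs are assumed to be black-and-white already, and on ImageMagick 7 the convert command may need to be spelled magick. The final gs pdfmark step is left out.

```shell
#!/bin/sh
# Usage: images_to_g4_pdf OUT.pdf PAGE1 PAGE2 ...
images_to_g4_pdf() {
  local out_pdf="$1" work img n=0
  shift
  work=$(mktemp -d) || return 1

  # 1. One CCITT Group 4 TIFF per page, numbered so they sort in page order.
  for img in "$@"; do
    n=$((n + 1))
    convert "$img" -compress Group4 \
      "$(printf '%s/%04d.tif' "$work" "$n")" || return 1
  done

  # 2. Combine the single-page TIFFs into one multi-page TIFF.
  tiffcp "$work"/[0-9]*.tif "$work/book.tiff" || return 1

  # 3. Wrap the multi-page TIFF in a PDF.
  tiff2pdf -o "$out_pdf" "$work/book.tiff" || return 1
  rm -rf "$work"
}
```

For example: images_to_g4_pdf book.pdf out/*.tif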

Nowadays I use pdfbeads, but that has become more complicated than my old way because the program is no longer maintained and is very difficult to get working on a modern system. I use my old copy of pdfbeads within an old Linux distro running inside VirtualBox.

icq70610 · 01-03-2023, 02:00 PM

Quote:

Originally Posted by rkomar

TIFF format is more about the wrapping than the encoding. You can compress using many different methods within a TIFF wrapper. I would suggest that you produce your black-and-white images as CCITT Group 4 encoded TIFF files before calling gs to create the PDF file (i.e. use the option "-compress Group4" when running mogrify/convert). I'm not familiar with scantailor, but maybe it offers that option out of the box.

When I was first scanning my books, I would use convert to produce the TIFF files with CCITT Group 4 compression. Then I would use tiffcp to combine the separate TIFF files into a single multi-page TIFF file. I would then use either tumble or tiff2pdf to convert the multi-page TIFF file into a PDF file. Then I would use gs as the last step to add pdfmarks to the PDF file.

Nowadays I use pdfbeads, but that has become more complicated than my old way because the program is no longer maintained and is very difficult to get working on a modern system. I use my old copy of pdfbeads within an old Linux distro running inside VirtualBox.

Thank you for the quick answer. pdfbeads is an interesting idea (in a VM, I get it :-) ). As for the other points above, that's exactly what I'm currently doing, and I wrote a crude bash wrapper for tryouts.

But basically it's the following:

qpdf --> split all PDFs into single-page PDFs
gs --> convert each PDF page to TIFF (b/w tiffg4):
gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg4 -r300x300 -dFirstPage=1 -dLastPage=1 -sOutputFile=111.tif page-111.pdf
then loop over the TIFFs, tif -> pdf, with img2pdf, a "raw" wrapper that embeds the image without re-encoding: https://gitlab.mister-muffin.de/josch/img2pdf
and then bulk all the PDFs together into one combined PDF.
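Put together, the steps listed above might look roughly like this. It is a sketch, not the actual wrapper from the post: the function name pdf_to_g4_pdf and the temporary file names are invented, and every page is rendered to 1-bit, so a colour cover would have to be handled separately.

```shell
#!/bin/sh
# Usage: pdf_to_g4_pdf IN.pdf OUT.pdf
pdf_to_g4_pdf() {
  local in_pdf="$1" out_pdf="$2" work page num
  work=$(mktemp -d) || return 1

  # 1. qpdf: one PDF per page; %d becomes the zero-padded page number.
  qpdf --split-pages "$in_pdf" "$work/page-%d.pdf" || return 1

  for page in "$work"/page-*.pdf; do
    num=${page##*/page-}
    num=${num%.pdf}
    # 2. gs: render the page to a 300 dpi CCITT Group 4 TIFF.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg4 -r300x300 \
       -sOutputFile="$work/$num.tif" "$page" || return 1
    # 3. img2pdf: wrap the TIFF in a single-page PDF.
    img2pdf "$work/$num.tif" -o "$work/g4-$num.pdf" || return 1
  done

  # 4. qpdf: concatenate the single-page PDFs in page order.
  qpdf --empty --pages "$work"/g4-*.pdf -- "$out_pdf" || return 1
  rm -rf "$work"
}
```

img2pdf also accepts several input images in one call, so steps 3 and 4 could probably be merged into a single img2pdf invocation over all the TIFFs.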

Rather crude, but I achieve good compression results on b/w images with minimal effort and quite reasonable quality. (Please be aware that the input images should already be b/w. If they are grey, the tiffg4 encode sometimes gives funky results.)

Thought I'd share. Topic closed on my end, but it was a hell of a frustrating thing :-) to get some grip on.
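One way to make sure a page image really is black-and-white before Group 4 encoding is to threshold it explicitly with ImageMagick. A sketch under assumptions: the 50% cutoff is an arbitrary starting point, not a value from this thread, binarize_to_g4 is a hypothetical helper name, and on ImageMagick 7 convert may need to be spelled magick.

```shell
#!/bin/sh
# Usage: binarize_to_g4 IN_IMAGE OUT.tif [CUTOFF]
# Forces a grey or colour scan to pure black-and-white, then writes
# a CCITT Group 4 TIFF. CUTOFF defaults to 50% (an assumed value;
# tune it per scan).
binarize_to_g4() {
  local in_img="$1" out_tif="$2" cutoff="${3:-50%}"
  convert "$in_img" -colorspace Gray -threshold "$cutoff" \
    -type bilevel -compress Group4 "$out_tif"
}
```

A hard threshold avoids dither patterns on grey input, at the cost of losing any real grey tones in photos or illustrations.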

\Pete

Quoth · 01-03-2023, 06:07 PM

ImageMagick, or the GIMP (import as layers)
