01-02-2023, 01:12 PM | #1 |
Enthusiast
Posts: 49
Karma: 510
Join Date: Sep 2008
Device: PSR505
|
ghostscript ccitt with pdfwrite
Hi all
Another rather specific question. For the last 3 to 4 years I have been digitizing my whole childhood library, and I have become somewhat obsessed with scanning procedures. My current workflow is:

Scan --> scantailor --> lots of clicking --> tiff --> mogrify stuff --> ps --> pdf

The results are really good and I am normally quite happy with them. For file size and quality at 300x300 dpi I would give them 8 to 9 out of 10.

My "trouble" starts with books from like-minded people who are making similar efforts. An example document looks like this:

-----
Creator: PDF-XChange Editor 5.5.xxx
Producer: PDF-XChange PDF Core API (5.5.xxx)
CreationDate: xxx xxx
ModDate: xxx xxx
Custom Metadata: no
Metadata Stream: yes
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 164
Encrypted: no
Page size: 372 x 559.68 pts
Page rot: 0
File size: 8100115 bytes
Optimized: no
PDF version: 1.2

page num type  width height color comp bpc enc   interp object ID x-ppi y-ppi size  ratio
-----------------------------------------------------------------------------------------
1    0   image 1550  2332   rgb   3    8   jpeg  no     170    0  300   300   371K  3.5%
2    1   image 1550  2332   index 1    1   ccitt no     172    0  300   300   17B   0.0%
3    2   image 1550  2332   index 1    1   ccitt no     174    0  300   300   9123B 2.0%
4    3   image 1550  2332   index 1    1   ccitt no     176    0  300   300   7332B 1.6%
5    4   image 1550  2332   index 1    1   ccitt no     178    0  300   300   36.8K 8.3%
6    5   image 1550  2332   index 1    1   ccitt no     180    0  300   300   42.6K 9.7%
-----

As you can see, the cover takes 371K at 300x300 dpi, and a "traditional" grey image takes around 42K. The cover is an RGB JPEG and the rest is CCITT (TIFF G4) encoded. Does anybody have an idea how to instruct Ghostscript on the command line to achieve similar encodings?
When I encode them myself, it looks like this:

page num type  width height color comp bpc enc   interp object ID x-ppi y-ppi size  ratio
-----------------------------------------------------------------------------------------
1    0   image 1550  2332   icc   3    8   jpeg  no     9      0  300   300   389K  3.7%
2    1   image 1550  2332   index 1    1   image no     16     0  300   300   461B  0.1%
3    2   image 1550  2332   index 1    1   image no     22     0  300   300   19.7K 4.5%
4    3   image 1550  2332   index 1    1   image no     28     0  300   300   14.4K 3.3%
5    4   image 1550  2332   index 1    1   image no     34     0  300   300   64.4K 15%
6    5   image 1550  2332   index 1    1   image no     40     0  300   300   74.3K 17%
7    6   image 1550  2332   index 1    1   image no     46     0  300   300   76.3K 17%
8    7   image 1550  2332   index 1    1   image no     52     0  300   300   76.5K 17%
9    8   image 1550  2332   index 1    1   image no     58     0  300   300   75.7K 17%

As you can see, with Ghostscript I am not getting CCITT encoding. Does anyone know the correct parameters for gs, even just to encode a single page in CCITT with pdfwrite as the device?

\Pete
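To compare two PDFs like this without reading the tables by eye, the `enc` column can be tallied. The following is only a sketch: it assumes the listings come from poppler's `pdfimages -list` (the post does not name the tool), and `tally_encodings` and `SAMPLE` are names made up for this illustration.

```python
from collections import Counter


def tally_encodings(listing):
    """Count images per value of the 'enc' column in a pdfimages -list style table."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with two integers (page, num); the header and the
        # dashed separator line do not. 'enc' is the 9th column:
        # page num type width height color comp bpc enc ...
        if len(fields) >= 9 and fields[0].isdigit() and fields[1].isdigit():
            counts[fields[8]] += 1
    return counts


# First rows of the Ghostscript-produced listing from the post.
SAMPLE = """\
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------
1 0 image 1550 2332 icc 3 8 jpeg no 9 0 300 300 389K 3.7%
2 1 image 1550 2332 index 1 1 image no 16 0 300 300 461B 0.1%
3 2 image 1550 2332 index 1 1 image no 22 0 300 300 19.7K 4.5%
"""

print(tally_encodings(SAMPLE))

# To run it against a real file (book.pdf is a placeholder name):
#   import subprocess
#   text = subprocess.run(["pdfimages", "-list", "book.pdf"],
#                         capture_output=True, text=True, check=True).stdout
#   print(tally_encodings(text))
```

A result with no `ccitt` entries, as here, means none of the pages ended up CCITT encoded.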
01-02-2023, 06:06 PM | #2 |
Wizard
Posts: 3,012
Karma: 18765431
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
TIFF format is more about the wrapping than the encoding. You can compress using many different methods within a TIFF wrapper. I would suggest that you produce your black-and-white images as CCITT Group 4 encoded TIFF files before calling gs to create the PDF file (i.e. use the option "-compress Group4" when running mogrify/convert). I'm not familiar with scantailor, but maybe it offers that option out of the box.
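A minimal sketch of that suggestion, building the ImageMagick command line for one page. Assumptions: the input pages are already black and white, the `g4` output directory and the function name `group4_command` are made up here, and on ImageMagick 7 the executable may be `magick` rather than `convert`.

```python
import subprocess
from pathlib import Path


def group4_command(src, dst_dir="g4"):
    """Build the convert call that rewrites one page as a Group 4 TIFF."""
    out = Path(dst_dir) / (Path(src).stem + ".tif")
    # "-compress Group4" is the option named in the post.
    return ["convert", src, "-compress", "Group4", out.as_posix()]


def run(cmd):
    """Run one command and raise if the tool reports an error."""
    subprocess.run(cmd, check=True)


# Example (file names are placeholders; the g4 directory must exist):
#   for tif in sorted(Path("out").glob("*.tif")):
#       run(group4_command(tif.as_posix()))
```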
When I was first scanning my books, I would use convert to produce the TIFF files with CCITT Group 4 compression. Then I would use tiffcp to combine the separate TIFF files into a single multi-page TIFF file. I would then use either tumble or tiff2pdf to convert the multi-page TIFF file into a PDF file. Then I would use gs as the last step to add pdfmarks to the PDF file. Nowadays I use pdfbeads, but that has become more complicated than my old way because the program is no longer maintained and is very difficult to get working on a modern system. I use my old copy of pdfbeads within an old Linux distro running inside VirtualBox.
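The tiffcp and tiff2pdf steps of that older workflow could look roughly like this. It is a sketch under assumptions: the file names and the function `multipage_pdf_commands` are placeholders, the tumble alternative and the final gs pdfmark step are left out, and as far as I know tiff2pdf passes Group 4 data through without re-encoding where it can.

```python
import subprocess


def multipage_pdf_commands(pages, combined_tiff="book.tif", pdf="book.pdf"):
    """Return the two libtiff tool calls: merge pages, then wrap them in a PDF."""
    # tiffcp takes the input files first and the output file last.
    tiffcp = ["tiffcp", *pages, combined_tiff]
    # tiff2pdf writes to stdout unless -o names an output file.
    tiff2pdf = ["tiff2pdf", "-o", pdf, combined_tiff]
    return [tiffcp, tiff2pdf]


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example (page names are placeholders):
#   run_all(multipage_pdf_commands(["g4/page-001.tif", "g4/page-002.tif"]))
```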
01-03-2023, 02:00 PM | #3 | |
Enthusiast
Posts: 49
Karma: 510
Join Date: Sep 2008
Device: PSR505
|
Quote:
but basically it's the following:

qpdf --> explode all PDFs into single-page PDFs
gs --> convert each PDF page to TIFF (b/w tiffg4):

    gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg4 -r300x300 -dFirstPage=1 -dLastPage=1 -sOutputFile=111.tif page-111.pdf

Then loop over the TIFFs and turn each one into a PDF with img2pdf, a "raw" wrapper that does not re-encode: https://gitlab.mister-muffin.de/josch/img2pdf

And then bulk all the PDFs together into one combined PDF.

Rather crude, but I achieve good compression results on b/w images with minimal effort and quite reasonable quality. (Please be aware that the input images should be b/w already; if they are grey, the tiffg4 encode sometimes gives funky results.)

Thought I'd share. Topic closed on my end, but it was hell of frustrating :-) to get some grip on that.

\Pete
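For anyone wanting to script the steps above, here is a rough sketch that only builds the command lines. The gs call is copied from the post; everything else is an assumption on my part: the `work` directory, the `g4-` prefix, the function names, and the use of `qpdf --split-pages` and `qpdf --empty --pages` for the explode and merge steps (the post only says qpdf is used to explode, and does not say which tool merges).

```python
import subprocess
from pathlib import Path


def split_command(src_pdf, workdir="work"):
    """qpdf call that writes one PDF per page; %d is replaced by the page number."""
    return ["qpdf", "--split-pages", src_pdf, f"{workdir}/page-%d.pdf"]


def per_page_commands(page_pdf):
    """gs (PDF -> Group 4 TIFF) and img2pdf (TIFF -> PDF) calls for one page."""
    p = Path(page_pdf)
    tif = p.with_suffix(".tif").as_posix()
    g4_pdf = p.with_name("g4-" + p.name).as_posix()
    gs = ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=tiffg4", "-r300x300",
          "-dFirstPage=1", "-dLastPage=1", f"-sOutputFile={tif}", page_pdf]
    wrap = ["img2pdf", tif, "-o", g4_pdf]
    return [gs, wrap], g4_pdf


def merge_command(page_pdfs, out_pdf="combined.pdf"):
    """qpdf call that concatenates the given PDFs in the order listed."""
    return ["qpdf", "--empty", "--pages", *page_pdfs, "--", out_pdf]


def run(cmd):
    subprocess.run(cmd, check=True)


# Example (book.pdf is a placeholder; the work directory must exist):
#   run(split_command("book.pdf"))
#   done = []
#   for page in sorted(Path("work").glob("page-*.pdf")):
#       cmds, out = per_page_commands(page.as_posix())
#       for c in cmds:
#           run(c)
#       done.append(out)
#   run(merge_command(done))
```

The converted pages get a different prefix so that a second run of the glob does not pick them up as input.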
|
01-03-2023, 06:07 PM | #4 |
the rook, bossing Never.
Posts: 12,322
Karma: 90943357
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
ImageMagick, or the GIMP (import as layers)