Old 05-12-2023, 03:22 PM   #1
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Question [SOLVED] Hard trimming PDFs?

Hello,

I need to hard-trim PDFs, i.e. the stuff outside the mediabox should really be gone from the output file.

I tried the following, but they only perform visual trimming, i.e. it's displayed as expected but the data's actually still in the file:

Code:
cpdf.exe -crop "0 0 400pt 600pt" input.pdf 1-50 -o output.pdf

pdfcpu.exe box add -- "media:[0 0 400 600]" input.pdf output.pdf

mutool.exe trim -b mediabox -o output.pdf input.pdf
Is there a tool, preferably open-source, that supports hard-trimming?

Thank you.

Last edited by Shohreh; 05-14-2023 at 09:49 AM.
Old 05-12-2023, 04:48 PM   #2
jackie_w
Grand Sorcerer
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
With the caveat that I haven't attempted to crop PDFs for more than 5 years ...

At that time I seem to remember that Ghostscript could (command line only), using your terms, "hard trim" a PDF which had previously been "visually trimmed" with some other utility (I used to use briss for the visual trimming).

Here's an old example I noted at the time:
Code:
gswin64c.exe -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=hard_trim.pdf visual_trim.pdf
It may be too old to be useful, but I offer it as an option to look into if you wish.
Old 05-12-2023, 05:52 PM   #3
Shohreh
Calibre still displays the cropped data in the EPUB, so it's still in the file, but I'll look into GS.

Briss got stuck on that ~400 page PDF, which is partly why I tried CLI apps.

Thank you.

--
Edit:
Code:
mutool.exe pages soft.cropped.pdf 20
soft.cropped.pdf:
<page pagenum="20">
<MediaBox l="0" b="0" r="424" t="600" />
<CropBox l="0" b="0" r="424" t="600" />
<Rotate v="0" />
</page>

mutool.exe pages hard_trim.pdf 20
hard_trim.pdf:
<page pagenum="20">
<MediaBox l="0" b="0" r="424" t="600" />
<CropBox l="0" b="0" r="424" t="600" />
<Rotate v="0" />
</page>

Last edited by Shohreh; 05-12-2023 at 06:18 PM.
Old 05-13-2023, 06:37 AM   #4
rjwse@aol.com
Addict
Posts: 304
Karma: 2228060
Join Date: Dec 2013
Location: LaVernia, Texas
Device: kindle epub readers on android
pdfjam works well for me on Linux. It is command-line driven in a terminal; there is no GUI.
Old 05-13-2023, 09:09 AM   #5
Shohreh
How would you use it to permanently remove (not just hide) headers and footers?

https://github.com/rrthomas/pdfjam

What about adding "redaction annotations" in each page, and then have those sections entirely removed from the PDF?

https://pspdfkit.com/guides/processo...tion/overview/

Yet another possibility: What about a script in PyMuPDF that would read each page, create a new one that's cropped, and save that into a new PDF?

https://pypdf2.readthedocs.io/en/3.0...nsforming.html

Last edited by Shohreh; 05-13-2023 at 09:12 AM.
Old 05-13-2023, 09:43 AM   #6
Quoth
the rook, bossing Never.
Posts: 12,379
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
I've used K2pdfopt, ImageMagick and also the GIMP.
Old 05-13-2023, 01:07 PM   #7
Shohreh
I'd rather convert the PDF to EPUB with Calibre since my e-reader doesn't handle PDFs very well. I tried k2pdfopt, and didn't like it.

Besides, if the only thing is to remove the headers and footers, it's worth investigating.
Old 05-13-2023, 02:11 PM   #8
Quoth
PDFs vary in how well they convert. If a PDF is convertible at all, Word, Writer or other tools are far better than Calibre for the job. Then convert the resulting docx to EPUB.
Unless you OCR, all you can do with an image-based PDF is crop, resize, and adjust contrast/brightness/bit depth.

I'd only use Calibre to catalogue existing PDFs and transfer them as PDFs to e-readers or tablets that can manage them.
Old 05-13-2023, 03:46 PM   #9
Shohreh
The book I'm playing with converts just fine to EPUB.

The only thing I'd need to get a near perfect EPUB is removing headers and footers… which is the perfect occasion to dig and understand why people bother with regex in the HTML at all if you can just remove the data from the source PDF before running Calibre.

There's got to be a way to either remove everything that's outside the mediabox, or mark some sections as redaction annotations and remove them all.
Old 05-14-2023, 09:48 AM   #10
Shohreh
PyMuPDF to the rescue…

Code:
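#https://artifex.com/blog/advanced-text-manipulation-using-pymupdf
import fitz  # PyMuPDF

doc = fitz.open("original.pdf")
page = doc[18]  #page indexes are 0-based, so this is the 19th page

#print(page.get_text())
#header band to remove: left,top,right,bottom
rect = fitz.Rect(0,0,424,50)
page.add_redact_annot(rect)
#apply_redactions() deletes the content under the annotation instead of hiding it
page.apply_redactions()

doc.save("redacted.pdf")

#ebook-convert.exe redacted.pdf redacted.epub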
Hard to believe no ready-to-use command-line tool can add and delete redaction annotations.

Another useful tool would be a PDF viewer that lets the user select a rectangle and display its coordinates, ready to be copy-pasted into the command line.
Attached thumbnail: 0A672E69-5381-4EAD-81D4-AA3CCE0654D2.png

Last edited by Shohreh; 05-14-2023 at 10:11 AM.
Old 05-14-2023, 02:37 PM   #11
Shohreh
Parse the whole PDF, ignoring the first page of each chapter:

Code:
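import fitz  # PyMuPDF

doc = fitz.open("original.pdf")
#header band: left,top,right,bottom
rect = fitz.Rect(0,0,424,50)
#unpack the range: a range object nested in a list never matches a page index
exclude = {*range(1, 14), 17,24,97,155,186,232,258,297,322,343,404}
#doc[index] is 0-based, so the last valid index is doc.page_count - 1
for index in range(1,doc.page_count):
	if index not in exclude:
		page = doc[index]
		page.add_redact_annot(rect)
		page.apply_redactions()
doc.save("redacted.pdf")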
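
One detail of the exclusion list is worth isolating: Python's `in` on a list compares each element as a whole, so a `range` object stored inside a list matches only itself, never the page numbers it spans. A small sketch of flattening such a specification into a set; the helper name `expand_pages` is hypothetical.

```python
def expand_pages(*specs):
    """Flatten ints and range objects into one set of page indexes."""
    pages = set()
    for spec in specs:
        if isinstance(spec, range):
            pages.update(spec)
        else:
            pages.add(int(spec))
    return pages


# A range stored inside a list is compared as one object, not by its members.
nested = [range(1, 14), 17, 24]
print(5 in nested)    # False: 5 is not equal to the range object itself

flat = expand_pages(range(1, 14), 17, 24)
print(5 in flat)      # True
print(14 in flat)     # False: range(1, 14) stops at 13
```

A set also makes each membership test constant-time, which is a small bonus on a book of several hundred pages.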
Old 05-14-2023, 05:45 PM   #12
Shohreh
And do the same for the footer on all the pages:

Code:
import fitz  # PyMuPDF

doc = fitz.open("original.pdf")

"""
#Until a PDF viewer comes along… here's how to find where to draw a box around the header+footer
#left,top,right,bottom
rect = fitz.Rect(0,560,424,600)
page = doc[13]
shape = page.new_shape()
shape.draw_rect(rect)  #drawRect() is the deprecated spelling
shape.finish(color=None)
shape.commit()
"""

rect_header = fitz.Rect(0,0,424,50)
rect_footer = fitz.Rect(0,560,424,600)
#unpack the range: a range object nested in a list never matches a page index
exclude = {*range(1, 14), 17,24,97,155,186,232,258,297,322,343,404}
for index in range(1,doc.page_count):
	print(index)
	page = doc[index]

	#remove header only on non-chapter pages
	if index not in exclude:
		page.add_redact_annot(rect_header)

	#remove footer on every page visited (doc[0] is skipped by the loop)
	page.add_redact_annot(rect_footer)
	page.apply_redactions()

doc.save("redacted.pdf")
Old 05-14-2023, 06:07 PM   #13
Kromaa
Junior Member
Posts: 6
Karma: 55624
Join Date: May 2023
Location: France
Device: Kobo by Fnac Nia 6" 8 Go
Quote:
Originally Posted by Shohreh
Hello,

I need to hard-trim PDFs, i.e. the stuff outside the mediabox should really be gone from the output file.

I tried the following, but they only perform visual trimming, i.e. it's displayed as expected but the data's actually still in the file:

Code:
cpdf.exe -crop "0 0 400pt 600pt" input.pdf 1-50 -o output.pdf

pdfcpu.exe box add -- "media:[0 0 400 600]" input.pdf output.pdf

mutool.exe trim -b mediabox -o output.pdf input.pdf
Is there a tool, preferably open-source, that supports hard-trimming?

Thank you.
I'm not sure I understand. Are you looking to cut out pieces of text from the pdf, or just select a new dimension?
Old 05-14-2023, 06:41 PM   #14
Shohreh
Cutting out.

Looks like adding and deleting redaction annotations is the way to go.

The commands above do work… but only on the screen: The data's still in the file, and thus included in the EPUB generated by Calibre.
Old 05-16-2023, 03:09 PM   #15
Kromaa
Quote:
Originally Posted by Shohreh
Cutting out.

Looks like adding and deleting redaction annotations is the way to go.

The commands above do work… but only on the screen: The data's still in the file, and thus included in the EPUB generated by Calibre.
Have you tried resizing the code to make it converge with what you are explaining? I think the problem might be that you have a slightly old device. Or maybe you didn't upgrade... I'd really like to know how it's going