Yet another PDF cropping tool

sjvr767 · 06-28-2008, 07:51 AM

Hi,

This is my first post and I thought I'd share a script I use for cropping PDFs, so that they'll display better on my iLiad.

I'm pretty sure a lot of people on here have done this or something similar, but I wrote the script a month or two ago after finding that some journal papers couldn't be cropped with the normal pdfcrop tool.

The script is based heavily on the cropping section of the example on the pyPdf homepage.

To run the script, you will need Python and pyPdf (available for most Linux distros). With those installed, just copy the code below into a file and make it executable.

Code:

#! /usr/bin/python

import getopt, sys
from pyPdf import PdfFileWriter, PdfFileReader

def usage ():
    print """sjvr767\'s PDF Cropping Script.
Example:
my_pdf_crop.py -s -p 0.5 -i input.pdf -o output.pdf
my_pdf_crop.py --skip --percent 0.5 --input input.pdf --output output.pdf
\n
REQUIRED OPTIONS:
-p\t--percent
The factor by which to crop. Must be positive and less than or equal to 1.

-i\t--input
The path to the file to be cropped.
\n
OPTIONAL:
-s\t--skip
Skip the first page. Output file will not contain the first page of the input file.

-o\t--output
Specify the name and path of the output file. If none specified, the script appends \'cropped\' to the file name.
"""
    sys.exit(0)

def cut_length(dictionary, key, factor):
	cut_factor = 1-factor
	cut = dictionary[key]*cut_factor
	cut = cut / 4
	return cut
	
def new_coords(dictionary, key, cut):
	return abs(dictionary[key]-cut)

try:
	opts, args = getopt.getopt(sys.argv[1:], "sp:i:o:s", ["skip", "percent=", "input=", "output="])
except getopt.GetoptError, err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)

skipone = 0

for a in opts[:]:
	if a[0] == '-s' or a[0]=='--skip':
		skipone = 1

factor = 0.8 #default scaling factor

for a in opts[:]:
	if a[0] == '-p' or a[0]=='--percent': #getopt reports the long option as --percent
		if a[1] != None:
			try:
				factor = float(a[1])
			except ValueError: #float() raises ValueError for non-numeric text
				print "Factor must be a number."
				sys.exit(2) #exit if the factor is not a number

input_file = None #no default input file
		
for a in opts[:]:
	if a[0] == '-i' or a[0]=='--input':
		if a[1] != None:
			try:
				if a[1][-4:]=='.pdf':
					input_file = a[1]
				else:
					print "Input file must be a PDF."
					sys.exit(2) #exit if no appropriate input file
			except TypeError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
			except IndexError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
		else:
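			print "Please specify an input file."
			sys.exit(2) #exit if no appropriate input file

if input_file is None: #no -i/--input option was given at all
	print "Please specify an input file."
	sys.exit(2)

output_file = "%s_cropped.pdf" %input_file[:-4] #default output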

for a in opts[:]:
	if a[0] == '-o' or a[0]=='--output': 
		if a[1]!= None:
			try:
				if a[1][-4:]=='.pdf':
					output_file = a[1]
				else:
					print "Output file must be a PDF."
			except TypeError:
				print "Output file must be a PDF."
			except IndexError:
				print "Output file must be a PDF."


input1 = PdfFileReader(file(input_file, "rb"))

output = PdfFileWriter()
outputstream = file(output_file, "wb")

pages = input1.getNumPages()

top_right = {'x': input1.getPage(1).mediaBox.getUpperRight_x(), 'y': input1.getPage(1).mediaBox.getUpperRight_y()}
top_left = {'x': input1.getPage(1).mediaBox.getUpperLeft_x(), 'y': input1.getPage(1).mediaBox.getUpperLeft_y()}
bottom_right = {'x': input1.getPage(1).mediaBox.getLowerRight_x(), 'y': input1.getPage(1).mediaBox.getLowerRight_y()}
bottom_left = {'x': input1.getPage(1).mediaBox.getLowerLeft_x(), 'y': input1.getPage(1).mediaBox.getLowerLeft_y()}

cut = cut_length(top_right, 'x', factor)

new_tr = (new_coords(top_right, 'x', cut), new_coords(top_right, 'y', cut))
new_br = (new_coords(bottom_right, 'x', cut), new_coords(bottom_right, 'y', cut))
new_tl = (new_coords(top_left, 'x', cut), new_coords(top_left, 'y', cut))
new_bl = (new_coords(bottom_left, 'x', cut), new_coords(bottom_left, 'y', cut))

if skipone == 0:
	for i in range(0, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)
else:
	for i in range(1, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)

output.write(outputstream)
outputstream.close()

If you paste it into a file called "my_pdfcrop.py" then you can crop a file by approximately 20% using the command,

Code:

./my_pdfcrop.py -p 0.8 -i input.pdf

Hope someone finds this useful and let me know if you find a bug or have a suggestion.

haridasi · 08-16-2008, 11:08 AM

This is interesting. Could you add the code as a file attachment?

haridasi · 08-24-2008, 06:39 AM

I have now tried to crop a pdf, but it doesn't crop the left side of the document. Furthermore, it takes some time guessing the correct percentage.

sjvr767 · 08-29-2008, 04:21 AM

Hi,

Sorry that I haven't put up a file yet; I have a few other things that need to be addressed immediately (i.e. my dissertation).

Yes, the percentage is tricky (since it isn't exactly a true percentage). Personally, I only use this script on files which do not crop using Heiko Oberdiek's "pdfcrop" (found or available on most Linux systems via the command "pdfcrop"). These tend to be papers from JSTOR, hence the skip first page option.

Thank you for trying it though, I will try my best to address the "left-crop" issue as soon as I have time.
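
To see why the "percentage" is not a true percentage, here is a small worked sketch of the arithmetic in the script's cut_length(). The helper name effective_crop is made up for illustration, and the sketch assumes a page whose lower-left corner is at (0, 0), so that the upper-right x equals the page width.

```python
def effective_crop(width, height, factor):
    """Mirror cut_length(): a single cut, derived from the page width,
    is taken off each of the four sides."""
    cut = width * (1 - factor) / 4.0
    new_width = width - 2 * cut    # left and right both lose `cut`
    new_height = height - 2 * cut  # top and bottom lose the same amount
    return cut, new_width, new_height


# US Letter page (612 x 792 points) with the default factor of 0.8
cut, new_w, new_h = effective_crop(612.0, 792.0, 0.8)
print("cut per side: %.1f pt" % cut)
print("width kept:  %.1f%%" % (100 * new_w / 612.0))
print("height kept: %.1f%%" % (100 * new_h / 792.0))
```

Since each side loses width * (1 - factor) / 4, a factor of 0.8 trims about 10% of the width in total rather than 20%, and the height shrinks by the same absolute amount, not the same proportion.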

sjvr767 · 09-23-2008, 11:07 AM

Quote:

Originally Posted by haridasi

I have now tried to crop a pdf, but it doesn't crop the left side of the document. Furthermore, it takes some time guessing the correct percentage.

Hi there, I had a few minutes to spare and changed the way the new coordinates are determined. It should solve the "left-side" issue. This is more of a hack than a significant change, but I hope it helps. Code at the end of the document.

Before I give the code, I'd like to say that when I get time I will do a proper update of this. There are a few features I want to implement, such as splitting pages in half and then scaling those to A4. That should enlarge the doc quite a bit.

Here is the code:

Code:

#! /usr/bin/python

import subprocess
import getopt, sys
import find_lines #local helper module (provides find_hline); not included in this post
from pyPdf import PdfFileWriter, PdfFileReader

def usage ():
    print """sjvr767\'s PDF Cropping Script.
Example:
my_pdf_crop.py -s -p 0.5 -i input.pdf -o output.pdf
my_pdf_crop.py --skip --percent 0.5 --input input.pdf --output output.pdf
\n
REQUIRED OPTIONS:
-p\t--percent
The factor by which to crop. Must be positive and less than or equal to 1.

-i\t--input
The path to the file to be cropped.
\n
OPTIONAL:
-s\t--skip
Skip the first page. Output file will not contain the first page of the input file.

-o\t--output
Specify the name and path of the output file. If none specified, the script appends \'cropped\' to the file name.
"""
    sys.exit(0)

def cut_length(dictionary, key, factor):
	cut_factor = 1-factor
	cut = dictionary[key]*cut_factor
	cut = cut / 4
	return cut
	
def new_coords(dictionary, key, cut):
	return abs(dictionary[key]-cut)
	
def new_coords2(ty, lx, rx, by, cut):
	new_ty = ty - cut
	new_by = by + cut
	new_lx = lx + cut
	new_rx = rx - cut
	top_left = {'x': new_lx, 'y': new_ty}
	bottom_left = {'x': new_lx, 'y': new_by}
	bottom_right = {'x': new_rx, 'y': new_by}
	top_right = {'x': new_rx, 'y': new_ty}
	return {'tr': top_right, 'tl': top_left, 'bl': bottom_left, 'br': bottom_right}

try:
	opts, args = getopt.getopt(sys.argv[1:], "sp:i:o:sch", ["skip", "percent=", "input=", "output=", "column", "half"])
except getopt.GetoptError, err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)

skipone = 0

for a in opts[:]:
	if a[0] == '-s' or a[0]=='--skip':
		skipone = 1

factor = 0.8 #default scaling factor

for a in opts[:]:
	if a[0] == '-p' or a[0]=='--percent': #getopt reports the long option as --percent
		if a[1] != None:
			try:
				factor = float(a[1])
			except ValueError: #float() raises ValueError for non-numeric text
				print "Factor must be a number."
				sys.exit(2) #exit if the factor is not a number

input_file = None #no default input file
		
for a in opts[:]:
	if a[0] == '-i' or a[0]=='--input':
		if a[1] != None:
			try:
				if a[1][-4:]=='.pdf':
					input_file = a[1]
				else:
					print "Input file must be a PDF."
					sys.exit(2) #exit if no appropriate input file
			except TypeError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
			except IndexError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
		else:
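			print "Please specify an input file."
			sys.exit(2) #exit if no appropriate input file

if input_file is None: #no -i/--input option was given at all
	print "Please specify an input file."
	sys.exit(2)

output_file = "%s_cropped.pdf" %input_file[:-4] #default output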

for a in opts[:]:
	if a[0] == '-o' or a[0]== '--output': 
		if a[1]!= None:
			try:
				if a[1][-4:]=='.pdf':
					output_file = a[1]
				else:
					print "Output file must be a PDF."
			except TypeError:
				print "Output file must be a PDF."
			except IndexError:
				print "Output file must be a PDF."

col = 0

for a in opts[:]:
	if a[0] == '-c' or a[0]=='--column':
		col = 1

half = 0

for a in opts[:]:
	if a[0] == '-h' or a[0]=='--half':
		half = 1


input1 = PdfFileReader(file(input_file, "rb"))

output = PdfFileWriter()
outputstream = file(output_file, "wb")

pages = input1.getNumPages()

top_right = {'x': input1.getPage(1).mediaBox.getUpperRight_x(), 'y': input1.getPage(1).mediaBox.getUpperRight_y()}

ty = input1.getPage(1).mediaBox.getUpperLeft_y()
lx = input1.getPage(1).mediaBox.getUpperLeft_x()
rx = input1.getPage(1).mediaBox.getLowerRight_x()
by = input1.getPage(1).mediaBox.getLowerRight_y()
print ty, lx, rx, by

cut = cut_length(top_right, 'x', factor)

newCoords = new_coords2(ty, lx, rx, by, cut)
new_tr = (newCoords['tr']['x'], newCoords['tr']['y'])
new_tl = (newCoords['tl']['x'], newCoords['tl']['y'])
new_br = (newCoords['br']['x'], newCoords['br']['y'])
new_bl = (newCoords['bl']['x'], newCoords['bl']['y'])

print new_tl[1], new_tl[0], new_bl[1], new_bl[0]

if skipone == 0 and col == 0 and half == 0:
	for i in range(0, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)
elif skipone == 0 and col == 0 and half == 1:
	for i in range(0, pages-2):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		temp_output = PdfFileWriter()
		temp_output.addPage(page)
		tos = file("temp.pdf", "wb")
		temp_output.write(tos)
		tos.close()
		cmd = 'convert temp.pdf -density 8400 -colorspace Gray -contrast -contrast -contrast -colors 16 temp.gif'
		subprocess.call(cmd, shell=True)
		height = find_lines.find_hline('temp.gif', 5, 80)
		page1 = input1.getPage(i)
		page1.mediaBox.upperLeft = new_tl
		page1.mediaBox.upperRight = new_tr
		page1.mediaBox.lowerLeft = (new_tl[0], new_tl[1]-height)
		page1.mediaBox.lowerRight = (new_tr[0], new_tr[1]-height)
		output.addPage(page1)
		page2 = input1.getPage(i)
		page2.mediaBox.upperLeft = (new_tl[0], new_tl[1]-height)
		page2.mediaBox.upperRight = (new_tr[0], new_tr[1]-height)
		page2.mediaBox.lowerLeft = new_bl
		page2.mediaBox.lowerRight = new_br
		output.addPage(page2)

elif skipone == 1 and col == 0 and half == 0:
	for i in range(1, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)

output.write(outputstream)
outputstream.close()

ashkulz · 09-23-2008, 11:21 AM

sjvr767: I was going to work on this very idea, but you beat me to it.

First, there are two very good projects which already implement this: pdfcrop and pdfcrop.pl (the latter has a very good fork at pdfcrop2). All of them have the same disadvantage: they detect the bounding box using ghostscript (which is very good and accurate) but then they don't update the PDF in-place: they re-create the PDF using pdftex or other software.

I'd already done a proof-of-concept that it worked using pyPdf [I've contributed to it in the past], but other projects (notably ebookutils) took up my time.

Would you be interested in taking it further using gs? The command line to generate a bbox is

Code:

gs -dBATCH -dSAFER -dNOPAUSE -dUseCropBox -sDEVICE=bbox <input.pdf>

You can capture the output using the subprocess module and then use it for setting the cropbox.
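
A minimal sketch of that idea follows. The function names parse_bboxes and page_bboxes are made up for illustration, and the sketch assumes gs prints one "%%BoundingBox: llx lly urx ury" line per page (the bbox device normally writes these to stderr, so stderr is merged into stdout here).

```python
import re
import subprocess

# Matches the integer "%%BoundingBox:" lines only; the accompanying
# "%%HiResBoundingBox:" lines start differently and are skipped.
BBOX_RE = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE)


def parse_bboxes(gs_output):
    """Return one (llx, lly, urx, ury) tuple per page found in the text."""
    return [tuple(int(v) for v in m.groups())
            for m in BBOX_RE.finditer(gs_output)]


def page_bboxes(pdf_path):
    """Run gs with the bbox device and collect the per-page boxes."""
    cmd = ["gs", "-dBATCH", "-dSAFER", "-dNOPAUSE", "-dUseCropBox",
           "-sDEVICE=bbox", pdf_path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return parse_bboxes(out.decode("latin-1"))
```

Each tuple could then be assigned to a page box in the same way the script above assigns to page.mediaBox, for example lowerLeft = (llx, lly) and upperRight = (urx, ury).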

EDIT: just saw that you posted this much earlier. My apologies.

sjvr767 · 09-28-2008, 11:29 AM

Quote:

Originally Posted by ashkulz

First, there are two very good projects which already implement this: pdfcrop and pdfcrop.pl (the latter has a very good fork at pdfcrop2). All of them have the same disadvantage: they detect the bounding box using ghostscript (which is very good and accurate) but then they don't update the PDF in-place: they re-create the PDF using pdftex or other software.

Thanks for the links, the pdfcrop2 project looks quite interesting. When I get a chance, I'll give detecting a bounding box via gs a go.

The main thing I want to do is create a programme that will take a page and cut it into two pages. Those "half-pages" can then be rescaled using a PDF printer (probably as landscape). Think how well that would display on the iLiad: half an A4 page is about the same size as the iLiad's screen, so it should work nicely.

The trick, however, is to cut the page in such a way that you do not cut through a sentence.

I tried converting PDF pages to images and then using Python Image Library to analyze the color composition of areas at and near the middle. If the area was mostly white, then it was fine to cut there...

It almost worked, but the results were quite inconsistent: some pages were cut cleanly near the middle, while others were cut about a third of the way down.
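
The whitespace search described above might look something like the sketch below. This is not the original PIL code: it assumes the page has already been converted to rows of 0-255 grayscale values, and the function name find_cut_row and the white and window parameters are invented for the example. Limiting the search to a band around the middle is one possible way to avoid cuts that land a third of the way down the page.

```python
def find_cut_row(pixels, white=200, window=0.25):
    """Pick the whitest row within `window` of the page middle.

    pixels: list of rows, each a list of 0-255 grayscale values.
    Ties between equally white rows go to the row nearest the middle.
    """
    height = len(pixels)
    middle = height // 2
    span = int(height * window)
    best_row, best_score = middle, None
    for row in range(max(0, middle - span), min(height, middle + span + 1)):
        line = pixels[row]
        whiteness = sum(1 for p in line if p >= white) / float(len(line))
        # Higher whiteness wins first, then the smaller distance to the middle
        score = (whiteness, -abs(row - middle))
        if best_score is None or score > best_score:
            best_row, best_score = row, score
    return best_row


# Tiny synthetic "page": 20 dark rows with blank lines at rows 3, 8 and 13
page = [[0] * 10 for _ in range(20)]
for r in (3, 8, 13):
    page[r] = [255] * 10
print(find_cut_row(page))
```

On a page with no blank row inside the window, the function simply falls back to the middle row, so a real tool would probably want to widen the window or refuse to split such pages.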

The idea that I have now is to export each page of the PDF as a SVG file. Since SVG is an XML-based format, one can then simply copy elements with y coordinates above or below a certain value to separate SVG files. Then print those files as PDFs, and merge all of them back into one PDF.

Unfortunately I haven't had the time to really sit and code the above. I've never worked with parsing XML in Python, so I first have to learn how to do that...
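
As a starting point for the SVG idea, here is a rough sketch using the standard library's xml.etree.ElementTree. It rests on strong assumptions: only direct children of the root are considered, each carries a plain numeric y attribute, and no transforms are involved, which exported SVG files may well violate. The function name split_svg is made up for the example.

```python
import copy
import xml.etree.ElementTree as ET


def split_svg(svg_text, cut_y):
    """Split an SVG document into (top, bottom) halves at y = cut_y.

    SVG y coordinates grow downwards, so "top" keeps elements with
    y < cut_y. Elements without a y attribute stay in both halves.
    """
    root = ET.fromstring(svg_text)
    top = copy.deepcopy(root)
    bottom = copy.deepcopy(root)
    for half, keep_above in ((top, True), (bottom, False)):
        for child in list(half):  # copy the list, since we remove while looping
            y = child.get("y")
            if y is None:
                continue
            if (float(y) < cut_y) != keep_above:
                half.remove(child)
    return ET.tostring(top), ET.tostring(bottom)


svg = ('<svg xmlns="http://www.w3.org/2000/svg" height="800">'
       '<text x="10" y="100">first</text>'
       '<text x="10" y="400">second</text>'
       '<text x="10" y="700">third</text></svg>')
top_half, bottom_half = split_svg(svg, 400)
```

The two results are serialized SVG documents that could be written to separate files and then printed to PDF, as described above.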

Any suggestions would be welcome.

BTW, I'm not a programmer... I only code for fun.

sjvr767 · 02-14-2009, 07:04 AM

Hi there,

Sorry this has taken so long. I have a lot going on in my life right now, but I managed to do a bit of code clean-up (not much) and I added the ability to specify manual cropping in addition to the proportional cropping.

Therefore, you can now tweak the cropping slightly. For example, I have a paper called "systemic_risk.pdf",

Code:

./my_crop.py -s -p 0.7 -i systemic_risk.pdf -o systemic_risk2.pdf

and after checking the output file (systemic_risk2.pdf), I see that I'd like to crop the left, top and bottom side a bit more... Then I can go,

Code:

./my_crop.py -s -p 0.7 -i systemic_risk.pdf -o systemic_risk2.pdf -m "15 50 0 50"

This will crop the left side by an extra 15, the top by 50, the right by 0 and the bottom by 50, on top of the proportional crop.
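
To make the combined effect concrete, here is a small sketch of how the proportional cut and the manual margins add up. The helper names parse_margin and cropped_box are made up for illustration; the sketch follows the arithmetic of new_coords() in the script but leaves out its abs() calls and assumes a page whose lower-left corner is at (0, 0).

```python
def parse_margin(text):
    """Turn the -m argument, e.g. "15 50 0 50", into a dict of floats."""
    left, top, right, bottom = [float(v) for v in text.strip('"').split()]
    return {"l": left, "t": top, "r": right, "b": bottom}


def cropped_box(box, factor, margin):
    """box is (left, bottom, right, top); returns the cropped box."""
    left, bottom, right, top = box
    # Same proportional cut as cut_length(): based on the upper-right x
    cut = right * (1 - factor) / 4.0
    return (left + cut + margin["l"],
            bottom + cut + margin["b"],
            right - cut - margin["r"],
            top - cut - margin["t"])


margin = parse_margin("15 50 0 50")
# US Letter page, factor 0.7 plus the manual margins from the example
print(cropped_box((0.0, 0.0, 612.0, 792.0), 0.7, margin))
# Factor 1 disables the proportional cut, leaving only the manual margins
print(cropped_box((0.0, 0.0, 612.0, 792.0), 1, margin))
```

With a factor of 1 the proportional cut is zero, so the result is the page box moved in by exactly the four margin values.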

BTW, the script now outputs the dimension of the first page... You can use that in order to give yourself an idea as to how much to crop manually. Also, you can do pure manual cropping by specifying -p 1.

If you have pyPdf, just cut and paste the following code into a file called "my_crop.py" and make it executable:

Code:

#! /usr/bin/python

import getopt, sys
from pyPdf import PdfFileWriter, PdfFileReader

def usage ():
    print """sjvr767\'s PDF Cropping Script.
Example:
my_pdf_crop.py -s -p 0.5 -i input.pdf -o output.pdf
my_pdf_crop.py --skip --percent 0.5 --input input.pdf --output output.pdf
\n
REQUIRED OPTIONS:
-p\t--percent
The factor by which to crop. Must be positive and less than or equal to 1.

-i\t--input
The path to the file to be cropped.
\n
OPTIONAL:
-s\t--skip
Skip the first page. Output file will not contain the first page of the input file.

-o\t--output
Specify the name and path of the output file. If none specified, the script appends \'cropped\' to the file name.

-m\t--margin
Specify additional absolute cropping, for fine tuning results.
\t-m "left top right bottom"
"""
    sys.exit(0)

def cut_length(dictionary, key, factor):
	cut_factor = 1-factor
	cut = float(dictionary[key])*cut_factor
	cut = cut / 4
	return cut
		
def new_coords(dictionary, key, cut, margin, code = "tl"):
	if code == "tl":
		if key == "x":
			return abs(float(dictionary[key])+(cut+margin["l"]))
		else:
			return abs(float(dictionary[key])-(cut+margin["t"]))
	elif code == "tr":
		if key == "x":
			return abs(float(dictionary[key])-(cut+margin["r"]))
		else:
			return abs(float(dictionary[key])-(cut+margin["t"]))
	elif code == "bl":
		if key == "x":
			return abs(float(dictionary[key])+(cut+margin["l"]))
		else:
			return abs(float(dictionary[key])+(cut+margin["b"]))
	else:
		if key == "x":
			return abs(float(dictionary[key])-(cut+margin["r"]))
		else:
			return abs(float(dictionary[key])+(cut+margin["b"]))

try:
	opts, args = getopt.getopt(sys.argv[1:], "sp:i:o:m:", ["skip", "percent=", "input=", "output=", "margin="])
except getopt.GetoptError, err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)

skipone = 0

for a in opts[:]:
	if a[0] == '-s' or a[0]=='--skip':
		skipone = 1

factor = 0.8 #default scaling factor

for a in opts[:]:
	if a[0] == '-p' or a[0]=='--percent': #getopt reports the long option as --percent
		if a[1] != None:
			try:
				factor = float(a[1])
			except ValueError: #float() raises ValueError for non-numeric text
				print "Factor must be a number."
				sys.exit(2) #exit if the factor is not a number

input_file = None #no default input file
		
for a in opts[:]:
	if a[0] == '-i' or a[0]=='--input':
		if a[1] != None:
			try:
				if a[1][-4:]=='.pdf':
					input_file = a[1]
				else:
					print "Input file must be a PDF."
					sys.exit(2) #exit if no appropriate input file
			except TypeError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
			except IndexError:
				print "Input file must be a PDF."
				sys.exit(2) #exit if no appropriate input file
		else:
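			print "Please specify an input file."
			sys.exit(2) #exit if no appropriate input file

if input_file is None: #no -i/--input option was given at all
	print "Please specify an input file."
	sys.exit(2)

output_file = "%s_cropped.pdf" %input_file[:-4] #default output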

for a in opts[:]:
	if a[0] == '-o' or a[0]=='--output': 
		if a[1]!= None:
			try:
				if a[1][-4:]=='.pdf':
					output_file = a[1]
				else:
					print "Output file must be a PDF."
			except TypeError:
				print "Output file must be a PDF."
			except IndexError:
				print "Output file must be a PDF."

margin = {"l": 0, "t": 0, "r": 0, "b": 0}

for a in opts[:]:
	if a[0] == '-m' or a[0]=='--margin':
		if a[1]!= None:
			m_temp = a[1].strip("\"").split()
			margin["l"] = float(m_temp[0])
			margin["t"] = float(m_temp[1])
			margin["r"] = float(m_temp[2])
			margin["b"] = float(m_temp[3])
		else:
			print "Error"

input1 = PdfFileReader(file(input_file, "rb"))

output = PdfFileWriter()
outputstream = file(output_file, "wb")

pages = input1.getNumPages()

top_right = {'x': input1.getPage(1).mediaBox.getUpperRight_x(), 'y': input1.getPage(1).mediaBox.getUpperRight_y()}
top_left = {'x': input1.getPage(1).mediaBox.getUpperLeft_x(), 'y': input1.getPage(1).mediaBox.getUpperLeft_y()}
bottom_right = {'x': input1.getPage(1).mediaBox.getLowerRight_x(), 'y': input1.getPage(1).mediaBox.getLowerRight_y()}
bottom_left = {'x': input1.getPage(1).mediaBox.getLowerLeft_x(), 'y': input1.getPage(1).mediaBox.getLowerLeft_y()}

print('Page dim.\t%f by %f' %(top_right['x'], top_right['y']))

cut = cut_length(top_right, 'x', factor)

new_tr = (new_coords(top_right, 'x', cut, margin, code = "tr"), new_coords(top_right, 'y', cut, margin, code = "tr"))
new_br = (new_coords(bottom_right, 'x', cut, margin, code = "br"), new_coords(bottom_right, 'y', cut, margin, code = "br" ))
new_tl = (new_coords(top_left, 'x', cut, margin, code = "tl"), new_coords(top_left, 'y', cut, margin, code = "tl"))
new_bl = (new_coords(bottom_left, 'x', cut, margin, code = "bl"), new_coords(bottom_left, 'y', cut, margin, code = "bl"))

if skipone == 0:
	for i in range(0, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)
else:
	for i in range(1, pages):
		page = input1.getPage(i)
		page.mediaBox.upperLeft = new_tl
		page.mediaBox.upperRight = new_tr
		page.mediaBox.lowerLeft = new_bl
		page.mediaBox.lowerRight = new_br
		output.addPage(page)

output.write(outputstream)
outputstream.close()
