Batch DRM/Password detection

rfog · 07-12-2020, 08:38 AM

All!

Over time I have accumulated a lot of purchased PDFs, and I would like to know whether there is a way to check if they have DRM, are password protected, or carry copy/print/whatever restrictions.

I know I can go PDF by PDF checking it, but when the number is about two thousand...

It does not matter whether the method has to run on macOS, Windows or Linux.

Thanks in advance.

(And no, I'm not asking for a way to *remove* DRM; I want to collect my DRM-protected PDFs.)

JSWolf · 07-12-2020, 11:11 AM

Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.

j.p.s · 07-12-2020, 12:49 PM

Quote:

Originally Posted by JSWolf

Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.

Of course there is a way.

In a bash terminal in linux with pdftk installed:

Code:

for file in *.pdf
  do pdftk "$file" dump_data > /dev/null 2>> encrypted_list.txt
done

The file encrypted_list.txt will contain a list of encrypted files (and any other errors that turn up).

Doitsu · 07-12-2020, 12:55 PM

It's also relatively easy to check for password-protected files with the PyPDF2 Python library:

1. Install Python 3.x and the PyPDF2 library.
2. Save the following lines as a text file with a *.py extension.
(Make sure to copy it verbatim; in Python, indentation matters. Missing or extra spaces will cause the script to fail.)

Code:

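#!/usr/bin/env python3
# Note: PyPDF2 3.x renamed these to PdfReader and reader.is_encrypted.
import sys, os, glob
from PyPDF2 import PdfFileReader

def main():
    # Scan the folder containing this script, including subfolders.
    current_dir = os.path.dirname(os.path.abspath(__file__))
    pdf_files = glob.glob(os.path.join(current_dir, '**', '*.pdf'), recursive=True)
    for pdf_file in pdf_files:
        # Skip files that a previous run already renamed.
        if pdf_file.endswith('.encrypted.pdf'):
            continue
        with open(pdf_file, 'rb') as fh:
            encrypted = PdfFileReader(fh).isEncrypted
        # Rename only after the file handle has been closed.
        if encrypted:
            os.rename(pdf_file, pdf_file + '.encrypted.pdf')

if __name__ == "__main__":
    sys.exit(main())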

3. Copy the *.py file to a folder with *.pdf files in it and double-click it.

If the script worked, all password-protected files should have an *.encrypted.pdf extension. If it didn't, open a command prompt/terminal window, execute the file and post the error messages.

j.p.s · 07-12-2020, 01:03 PM

^ If renaming is acceptable, that is an elegant solution.
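
If renaming is not acceptable, a report-only variant might look like the sketch below. It is not the PyPDF2 approach from the post above: it uses only the Python standard library and a cruder heuristic, namely that an encrypted PDF carries an /Encrypt entry in its trailer (or cross-reference stream dictionary). The function names looks_encrypted and list_encrypted are made up for illustration, and a raw byte search can in principle give false positives, so treat the result as a shortlist to confirm with one of the tools in this thread.

```python
import os

def looks_encrypted(path):
    """Heuristic check: True if the file looks like a PDF and contains
    an /Encrypt key anywhere in its raw bytes (assumption: this key is
    only present when the document is encrypted)."""
    with open(path, 'rb') as fh:
        data = fh.read()
    # The %PDF- header is normally at the very start; allow some leading junk.
    return b'%PDF-' in data[:1024] and b'/Encrypt' in data

def list_encrypted(root):
    """Walk root recursively and return a sorted list of *.pdf paths
    that look encrypted. Nothing is renamed or modified."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith('.pdf'):
                full = os.path.join(dirpath, name)
                if looks_encrypted(full):
                    hits.append(full)
    return sorted(hits)
```

Calling list_encrypted('.') from the folder that holds the PDFs returns the list of paths, which can then be printed or written to a text file.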

rfog · 07-12-2020, 03:33 PM

Wow!

Thanks a lot! I will test all of this tomorrow.

j.p.s · 07-16-2020, 01:48 PM

I did a bit of looking around and found a couple more ways to do it.

1. qpdf gives a bit cleaner results.

Code:

for f in *.pdf; do qpdf --show-encryption "$f" > /dev/null; done

2. For those like me who find Perl easier to read and write than Python:

Code:

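#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

# Report every encrypted PDF in the current directory.
while (my $file = glob "*.pdf") {
  my $pdf = PDF::API2->open($file);
  print "$file is encrypted.\n" if $pdf->isEncrypted();
}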

PDF::API2 was not included by default on any of my systems, but neither was PyPDF2, not even in a very large Anaconda install of Python at work.
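
On the copy/print restrictions from the first post: in a PDF that uses the standard security handler, those restrictions are stored as bit flags in an integer called /P in the encryption dictionary. Assuming you have already obtained that integer with one of the tools above (how it is extracted is outside this sketch), decoding it could look roughly like this. The names PERMISSION_BITS, decode_permissions and denied are illustrative only.

```python
# Bit values for the /P entry of the standard security handler.
# The PDF specification numbers bits from 1, so "bit 3" is 1 << 2.
PERMISSION_BITS = {
    'print': 1 << 2,                       # bit 3
    'modify': 1 << 3,                      # bit 4
    'copy': 1 << 4,                        # bit 5
    'annotate': 1 << 5,                    # bit 6
    'fill_forms': 1 << 8,                  # bit 9
    'extract_for_accessibility': 1 << 9,   # bit 10
    'assemble': 1 << 10,                   # bit 11
    'print_high_quality': 1 << 11,         # bit 12
}

def decode_permissions(p):
    """Map each permission name to True (allowed) or False (denied).
    /P is usually written as a negative signed 32-bit integer, so it
    is first reduced to its unsigned 32-bit form."""
    p &= 0xFFFFFFFF
    return {name: bool(p & bit) for name, bit in PERMISSION_BITS.items()}

def denied(p):
    """Return the sorted names of the permissions that /P withholds."""
    return sorted(name for name, ok in decode_permissions(p).items() if not ok)
```

For example, denied(-1852) lists everything except 'print' and 'print_high_quality', while denied(-4) is empty. These flags are honoured by the viewer, so they describe what the file asks for rather than what is technically enforced.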

rfog · 07-17-2020, 01:45 PM

Wow!!!

I thought it was more complex to do.

Thanks a lot to all.

Now comes the second part: is there any way to check whether those PDFs with DRM contain real text? I've sometimes found that copying and pasting a passage for a citation produced garbage or nonsense, and I've had to type the text manually.

Is there any automated way to detect those PDFs?

willus · 07-18-2020, 10:57 AM

You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.

rfog · 07-19-2020, 03:42 AM

Quote:

Originally Posted by willus

You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.

Ho Ho.

Impressive. Even faster if I add -p 10-20 (for example) to extract the text of only some pages and see whether they contain text or garbage.
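
Checking two thousand *_text.txt files by eye is still tedious, so a rough automatic triage might help. The sketch below is only a heuristic under stated assumptions: it treats a file as real text when most whitespace-separated tokens are purely alphabetic words, which will misjudge number-heavy pages and will not catch garbage that happens to consist of letters. The names wordlike_ratio, classify and classify_text_file and the thresholds (200 characters, 0.6) are arbitrary choices for illustration.

```python
import string

# Characters stripped from the edges of a token before testing it.
EDGE_CHARS = string.punctuation + '\u201c\u201d\u2018\u2019'

def wordlike_ratio(text):
    """Fraction of whitespace-separated tokens that are plain words
    (alphabetic only, after trimming surrounding punctuation)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = 0
    for tok in tokens:
        core = tok.strip(EDGE_CHARS)
        if core and core.isalpha() and len(core) <= 25:
            good += 1
    return good / len(tokens)

def classify(text, min_chars=200, min_score=0.6):
    """Return 'no text', 'text' or 'garbage' for an extracted text layer."""
    if len(text.strip()) < min_chars:
        return 'no text'
    return 'text' if wordlike_ratio(text) >= min_score else 'garbage'

def classify_text_file(path):
    """Classify one extracted text file; undecodable bytes become
    replacement characters, which count against the score."""
    with open(path, encoding='utf-8', errors='replace') as fh:
        return classify(fh.read())
```

Running classify_text_file over every *_text.txt file and sorting by result should leave only the 'garbage' and 'no text' candidates to inspect by hand.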

So many tools, and so little time...
