07-12-2020, 08:38 AM | #1 |
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Batch DRM/Password detection
All!
Over time I have purchased a lot of PDFs, and I wonder whether there is a way to check if they have DRM, are password protected, or carry copy/print/other restrictions. I know I can check them one by one, but with about two thousand files... It does not matter whether the method runs on macOS, Windows or Linux. Thanks in advance. (And no, I'm not asking for a way to *remove* DRM; I want to collect my DRM-protected PDFs.) |
07-12-2020, 11:11 AM | #2 |
Resident Curmudgeon
Posts: 76,122
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.
|
07-12-2020, 12:49 PM | #3 | |
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
|
Quote:
In a bash terminal on Linux with pdftk installed:
Code:
for file in *.pdf; do
    pdftk "$file" dump_data > /dev/null 2>> encrypted_list.txt
done
Last edited by j.p.s; 07-12-2020 at 01:28 PM. |
|
07-12-2020, 12:55 PM | #4 |
Grand Sorcerer
Posts: 5,637
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
|
It's also relatively easy to check for password-protected files with the PyPDF2 Python library:
1. Install Python 3.x and the PyPDF2 library.
2. Save the following lines as a text file with a *.py extension. (Make sure to copy it verbatim; in Python, indentation matters. Missing/extra spaces will cause the script to fail.)
Code:
#!/usr/bin/env python
# Written for PyPDF2 1.x/2.x; PyPDF2 3.x renamed PdfFileReader to PdfReader
# and isEncrypted to is_encrypted.
import sys, os, glob
from PyPDF2 import PdfFileReader

def main():
    # Search the script's own folder and all of its subfolders.
    current_dir = os.path.dirname(os.path.abspath(__file__))
    pdf_files = glob.glob(os.path.join(current_dir, '**', '*.pdf'), recursive=True)
    for pdf_file in pdf_files:
        if pdf_file.endswith('.encrypted.pdf'):
            continue  # already renamed by an earlier run
        with open(pdf_file, 'rb') as fh:
            encrypted = PdfFileReader(fh).isEncrypted
        # Rename only after the file is closed; Windows cannot rename an open file.
        if encrypted:
            os.rename(pdf_file, pdf_file + '.encrypted.pdf')

if __name__ == "__main__":
    sys.exit(main())
If the script worked, all password-protected files should have an *.encrypted.pdf extension. If it didn't, open a command prompt/terminal window, execute the file and post the error messages. |
07-12-2020, 01:03 PM | #5 |
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
|
^ If renaming is acceptable, that is an elegant solution.
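Where renaming is not acceptable, a list can be built instead. The following is only a rough sketch using the Python standard library: it assumes that an encrypted PDF carries an /Encrypt key in its trailer and simply scans the raw bytes for that key. It is a heuristic, not a PDF parser, so a false positive is possible if the same bytes happen to appear elsewhere in a file. The function names `looks_encrypted` and `list_encrypted` are made up for this example.

```python
from pathlib import Path


def looks_encrypted(path):
    """Heuristic: True if the raw bytes of the file contain an /Encrypt key."""
    return b"/Encrypt" in Path(path).read_bytes()


def list_encrypted(root):
    """Return the sorted paths of all *.pdf files under root that look encrypted."""
    return sorted(
        str(p) for p in Path(root).rglob("*.pdf") if looks_encrypted(p)
    )


# Example use (hypothetical folder name):
#     for name in list_encrypted("my_pdfs"):
#         print(name)
```

Nothing is renamed or modified; the files are only read. Anything flagged this way can then be confirmed with one of the tools in this thread.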
|
07-12-2020, 03:33 PM | #6 |
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Wow!
Thanks a lot! I will test all of this tomorrow. |
07-16-2020, 01:48 PM | #7 |
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
|
I did a bit of looking around and found a couple more ways to do it.
1. qpdf gives a bit cleaner results.
Code:
for f in *.pdf; do qpdf --show-encryption "$f" > /dev/null; done
2. Perl with PDF::API2.
Code:
#!/usr/bin/perl
use PDF::API2;

while (glob "*.pdf") {
    $pdf = PDF::API2->open($_);
    print "$_ is encrypted.\n" if $pdf->isEncrypted();
}
|
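To tell password-protected files apart from files that are encrypted only to enforce copy/print restrictions, qpdf's exit codes can be used. This is a hedged sketch, assuming a qpdf release recent enough to offer the `--requires-password` option (0 = a password is required, 2 = not encrypted, 3 = encrypted but opens without a password). The `classify` helper and the `run` parameter are invented for this example; `run` exists only so the function can be tried without qpdf installed.

```python
import subprocess

# Exit codes documented for `qpdf --requires-password`.
STATUS = {
    0: "password required",
    2: "not encrypted",
    3: "encrypted, opens without a password",
}


def classify(path, run=subprocess.run):
    """Classify one PDF by the exit code of `qpdf --requires-password`."""
    result = run(
        ["qpdf", "--requires-password", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Any other exit code (damaged file, older qpdf) is reported as an error.
    return STATUS.get(result.returncode, f"error (exit {result.returncode})")


# Example use (hypothetical folder layout):
#     import glob
#     for name in sorted(glob.glob("*.pdf")):
#         print(name, "->", classify(name))
```

Files reported as "encrypted, opens without a password" are the likely candidates for copy/print restrictions.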
07-17-2020, 01:45 PM | #8 |
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Wow!!!
I thought it would be more complex. Thanks a lot to all. Now comes the second part: is there any way to check whether those DRM-protected PDFs contain real text? I've sometimes found that copying and pasting a citation produced garbage or nonsense text, and I had to type the text manually. Is there an automated way to detect those PDFs? |
07-18-2020, 10:57 AM | #9 |
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
You can do something like this for batch text extraction:
k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100
For every file, e.g. myfile.pdf, this will create myfile_text.txt, which will contain the extracted text layer. |
07-19-2020, 03:42 AM | #10 | |
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Quote:
Impressive. Even faster if I add -p 10-20 (for example) to get the text of only some pages and see whether they contain text or garbage. So many tools, and so little time...
Last edited by rfog; 07-19-2020 at 03:47 AM. |
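Once the *_text.txt files exist, sorting text from garbage can be roughly automated as well. The sketch below is only a heuristic under stated assumptions: it treats a token as a "word" if it consists of 2 to 20 letters, and calls a file real text if at least 60% of its tokens are words. The function names, the 0.6 threshold and the 20-token minimum are arbitrary choices for illustration and would need tuning on real files.

```python
import re

WORD = re.compile(r"[^\W\d_]{2,20}")  # letters only (Unicode-aware), 2 to 20 long
PUNCT = ".,;:!?()[]\"'"               # punctuation stripped from token edges


def tokens_of(text):
    """Split on whitespace and strip common punctuation from each token."""
    stripped = (t.strip(PUNCT) for t in text.split())
    return [t for t in stripped if t]


def word_ratio(text):
    """Fraction of tokens that look like ordinary words (0.0 if no tokens)."""
    tokens = tokens_of(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if WORD.fullmatch(t)) / len(tokens)


def looks_like_text(text, threshold=0.6, min_tokens=20):
    """True if there are enough tokens and most of them look like words."""
    return len(tokens_of(text)) >= min_tokens and word_ratio(text) >= threshold


# Example use on the files written by k2pdfopt:
#     import glob
#     for name in sorted(glob.glob("*_text.txt")):
#         with open(name, encoding="utf-8", errors="replace") as fh:
#             if not looks_like_text(fh.read()):
#                 print(name, "has little or no usable text")
```

Garbage that happens to consist of letter-like glyphs will slip through, so the flagged list is a starting point for manual checking rather than a final verdict.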
|