#1 |
Guru
Posts: 696
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Batch DRM/Password detection
All!
Over time I have purchased a lot of PDFs, and I would like to know whether there is a way to check if they have DRM, are password protected, or carry copy/print/other restrictions. I know I can check them one by one, but with about two thousand files... It does not matter whether the solution runs on macOS, Windows or Linux. Thanks in advance. (And no, I'm not asking for a way to *remove* DRM; I want to collect my DRM-protected PDFs.) |
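On the copy/print restrictions mentioned in the question: in PDFs protected with the standard password security handler, these restrictions are stored as bit flags in an integer called /P inside the encryption dictionary (ISO 32000-1, Table 22). The following is only a minimal sketch of decoding such a value. The names `PERMISSION_BITS`, `decode_permissions` and `restricted` are made up for illustration, and how the /P value is read out of a given file (for example with a PDF library) is left open here. Third-party DRM schemes do not necessarily use these flags.

```python
# Permission flags of the PDF standard security handler (ISO 32000-1, Table 22).
# The spec numbers bits from 1, so "bit 3" has the value 1 << 2.
PERMISSION_BITS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
    "fill_forms": 1 << 8,
    "extract_for_accessibility": 1 << 9,
    "assemble": 1 << 10,
    "print_high_quality": 1 << 11,
}


def decode_permissions(p_value):
    """Map a /P integer to {permission name: True if allowed}."""
    return {name: bool(p_value & bit) for name, bit in PERMISSION_BITS.items()}


def restricted(p_value):
    """Return the sorted names of the operations that the /P value denies."""
    flags = decode_permissions(p_value)
    return sorted(name for name, allowed in flags.items() if not allowed)
```

A set bit means the operation is allowed, so a value such as -4 (all flag bits set) denies nothing, while a value with the "copy" bit cleared indicates that text extraction is restricted.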
#2 |
Resident Curmudgeon
Posts: 79,657
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.
|
#3 | |
Grand Sorcerer
Posts: 5,779
Karma: 103362673
Join Date: Apr 2011
Device: pb360
|
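In a bash terminal in Linux with pdftk installed:
Code:

    # Normal output is discarded; pdftk's error messages, such as those for
    # files it cannot read without a password, are appended to encrypted_list.txt.
    for file in *.pdf
    do
        pdftk "$file" dump_data > /dev/null 2>> encrypted_list.txt
    done

Last edited by j.p.s; 07-12-2020 at 01:28 PM. |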
|
#4 |
Grand Sorcerer
Posts: 5,725
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
It's also relatively easy to check for password-protected files with the PyPDF2 Python library:
1. Install Python 3.x and the PyPDF2 library.
2. Save the following lines as a text file with a *.py extension. (Make sure to copy it verbatim; in Python, indentation matters. Missing/extra spaces will cause the script to fail.)
Code:

    #!/usr/bin/env python
    # Written for PyPDF2 before 3.0; later releases renamed these to PdfReader and is_encrypted.
    import sys, os, glob
    from PyPDF2 import PdfFileReader

    def main():
        # Search the folder containing this script, including subfolders.
        current_dir = os.path.dirname(os.path.abspath(__file__))
        pdf_files = glob.glob(os.path.join(current_dir, '**', '*.pdf'), recursive=True)
        for pdf_file in pdf_files:
            # Skip files already renamed by an earlier run.
            if pdf_file.endswith('.encrypted.pdf'):
                continue
            with open(pdf_file, 'rb') as fh:
                encrypted = PdfFileReader(fh).isEncrypted
            # Rename only after the file is closed (renaming an open file fails on Windows).
            if encrypted:
                os.rename(pdf_file, pdf_file + '.encrypted.pdf')

    if __name__ == "__main__":
        sys.exit(main())

If the script worked, all password-protected files should have an *.encrypted.pdf extension. If it didn't, open a command prompt/terminal window, execute the file and post the error messages. |
#5 |
Grand Sorcerer
Posts: 5,779
Karma: 103362673
Join Date: Apr 2011
Device: pb360
|
^ If renaming is acceptable, that is an elegant solution.
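If renaming is not acceptable, one possible alternative is to only list the suspect files. The sketch below is not the PyPDF2 approach from the post above: it swaps in a cruder, standard-library-only heuristic that looks for the `/Encrypt` key, which an encrypted PDF carries in its trailer or cross-reference stream dictionary. The function names `looks_encrypted` and `find_encrypted` are invented for this example, and the check can give a false positive if that byte sequence happens to appear elsewhere in a file.

```python
import os


def looks_encrypted(path, chunk_size=1 << 20):
    """Heuristic: True if the raw bytes of the file contain the /Encrypt key."""
    needle = b"/Encrypt"
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            data = tail + chunk
            if needle in data:
                return True
            # Keep the last few bytes so a match split across two reads is not missed.
            tail = data[-(len(needle) - 1):]


def find_encrypted(root):
    """Walk root recursively and return the sorted paths of PDFs that look encrypted."""
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                path = os.path.join(folder, name)
                if looks_encrypted(path):
                    hits.append(path)
    return sorted(hits)
```

Calling `find_encrypted(".")` and printing the result gives a plain list that can be reviewed or fed to another tool, and no file on disk is modified.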
|
#6 |
Guru
Posts: 696
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Wow!
Thanks a lot! I will test all of this tomorrow. |
#7 |
Grand Sorcerer
Posts: 5,779
Karma: 103362673
Join Date: Apr 2011
Device: pb360
|
I did a bit of looking around and found a couple more ways to do it.
1. qpdf gives a bit cleaner results.
Code:

    for f in *.pdf; do qpdf --show-encryption "$f" > /dev/null; done

2. Perl with the PDF::API2 module.
Code:

    #!/usr/bin/perl
    use PDF::API2;
    # Print the name of every encrypted PDF in the current directory.
    while (glob "*.pdf") {
        my $pdf = PDF::API2->open($_);
        print "$_ is encrypted.\n" if $pdf->isEncrypted();
    }

|
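For sorting a large collection into categories, the qpdf check could also be driven from a script. This is a rough sketch that assumes a reasonably recent qpdf (the `--requires-password` option, which I believe arrived around version 9.1, so check `qpdf --help` first). The exit codes in the table are the ones documented for that option; the names `classify_exit_code` and `check_pdf` and the injectable `run` parameter are invented for this example.

```python
import subprocess

# Exit codes documented for `qpdf --requires-password` (assumes a qpdf with this option).
STATUS_BY_EXIT_CODE = {
    0: "password required",
    2: "not encrypted",
    3: "encrypted, opens without a password",
}


def classify_exit_code(code):
    """Translate a qpdf exit code into a short status string."""
    return STATUS_BY_EXIT_CODE.get(code, "error or unreadable (exit code %d)" % code)


def check_pdf(path, run=subprocess.run):
    """Run qpdf on one file and return its status string.

    `run` is injectable only so the function can be tried without qpdf installed.
    """
    result = run(
        ["qpdf", "--requires-password", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify_exit_code(result.returncode)
```

Looping `check_pdf` over the files and grouping by the returned string separates files that need a password from files that are encrypted but open freely, which is where restriction-only protection typically ends up.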
#8 |
Guru
Posts: 696
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Wow!!!
I thought it would be more complex to do. Thanks a lot to all. Now comes the second part: is there any way to check whether those PDFs with DRM contain real text? I've sometimes found that copying and pasting a passage for a citation produces garbage or nonsense text, and I've had to type the text manually. Is there an automated way to detect those PDFs? |
#9 |
Fuzzball, the purple cat
Posts: 1,302
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
You can do something like this for batch text extraction:

    k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt, which will contain the extracted text layer. |
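Once the `*_text.txt` files exist, a rough way to flag the suspicious ones might look like the sketch below. It is only a heuristic under stated assumptions: the thresholds (`min_chars=200`, `min_quality=0.6`) are arbitrary starting points, the function names are invented for this example, and text that is scrambled into other real letters will still pass as "text".

```python
import glob
import os
import re

# Runs of two or more letters (any alphabet); digits, symbols and control characters do not count.
WORD_RE = re.compile(r"[^\W\d_]{2,}")


def text_quality(text):
    """Return the share of non-space characters that belong to letter runs."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    in_words = sum(len(w) for w in WORD_RE.findall(text))
    return in_words / len(visible)


def classify_text(text, min_chars=200, min_quality=0.6):
    """Label extracted text as 'no text layer', 'garbage' or 'text'."""
    if len(text.strip()) < min_chars:
        return "no text layer"
    if text_quality(text) < min_quality:
        return "garbage"
    return "text"


def scan_text_files(folder):
    """Classify every *_text.txt file in folder; returns {file name: label}."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*_text.txt"))):
        with open(path, encoding="utf-8", errors="replace") as fh:
            results[os.path.basename(path)] = classify_text(fh.read())
    return results
```

Files labelled "garbage" or "no text layer" are the ones worth opening by hand; the rest most likely have a usable text layer.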
#10 | |
Guru
Posts: 696
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
|
Impressive. It's even faster if I add -p 10-20 (for example) to extract the text of only some pages and see whether they contain text or garbage.
So many tools, and so little time...
Last edited by rfog; 07-19-2020 at 03:47 AM. |
|