
MobileRead Forums > E-Book Formats > PDF

Old 07-12-2020, 08:38 AM   #1
rfog
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
Batch DRM/Password detection

All!

Over time I have purchased a lot of PDFs, and I would like to know whether there is a way to check if they have DRM, are password protected, or carry copy/print/other restrictions.

I know I can check them one by one, but with about two thousand files...

It doesn't matter whether the solution runs on macOS, Windows or Linux.

Thanks in advance.

(And no, I'm not asking for a way to *remove* DRM; I just want to gather my DRM-protected PDFs.)
Old 07-12-2020, 11:11 AM   #2
JSWolf
Resident Curmudgeon
Posts: 76,122
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.
Old 07-12-2020, 12:49 PM   #3
j.p.s
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by JSWolf View Post
Sorry, there is no way to check for DRM on PDF in batches. You have to do it one by one.
Of course there is a way.

In a bash terminal on Linux with pdftk installed:
Code:
# Quote "$file" so names containing spaces are passed as one argument
for file in *.pdf
  do pdftk "$file" dump_data > /dev/null 2>> encrypted_list.txt
done
The file encrypted_list.txt will contain a list of encrypted files (and any other errors that turn up).
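
A small Python wrapper can make the same kind of loop report file names explicitly, since an error log on its own may not say which file a message belongs to. This is only a sketch: `files_that_fail` and the `{}` placeholder are names invented here, and it assumes the tool exits with a non-zero status when it cannot process a protected file, which is worth verifying on a known-protected PDF first.

```python
import subprocess


def files_that_fail(command, paths):
    """Run `command` once per path and return (path, stderr) for non-zero exits.

    Every "{}" inside an argument of `command` is replaced by the current path.
    """
    failed = []
    for path in paths:
        # Build the argument list for this file; no shell is involved,
        # so spaces in file names need no quoting.
        args = [arg.replace("{}", str(path)) for arg in command]
        result = subprocess.run(
            args,
            stdout=subprocess.DEVNULL,  # discard normal output
            stderr=subprocess.PIPE,     # keep the error message
            text=True,
            errors="replace",
        )
        if result.returncode != 0:
            failed.append((str(path), result.stderr.strip()))
    return failed
```

With pdftk installed, something like `files_that_fail(["pdftk", "{}", "dump_data"], glob.glob("*.pdf"))` (after `import glob`) should then give one entry per file the tool refused.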

Last edited by j.p.s; 07-12-2020 at 01:28 PM.
Old 07-12-2020, 12:55 PM   #4
Doitsu
Grand Sorcerer
Posts: 5,637
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
It's also relatively easy to check for password-protected files with the PyPDF2 Python library:

1. Install Python 3.x and the PyPDF2 library.
2. Save the following lines as a text file with a *.py extension.
(Make sure to copy it verbatim; in Python, indentations matter. Missing/extra spaces will cause the script to fail.)

Code:
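#!/usr/bin/env python3
# Renames every encrypted PDF below this script's folder to <name>.encrypted.pdf.
# Uses the PdfReader / is_encrypted names of recent PyPDF2 releases;
# older releases called them PdfFileReader / isEncrypted.
import sys, os, glob
from PyPDF2 import PdfReader

def main():
    current_dir = os.path.dirname(os.path.abspath(__file__))
    pdf_files = glob.glob(os.path.join(current_dir, '**', '*.pdf'), recursive=True)
    for pdf_file in pdf_files:
        # Skip files already renamed by an earlier run
        if pdf_file.endswith('.encrypted.pdf'):
            continue
        with open(pdf_file, 'rb') as fh:
            encrypted = PdfReader(fh).is_encrypted
        # Rename only after the file handle is closed
        if encrypted:
            os.rename(pdf_file, pdf_file + '.encrypted.pdf')

if __name__ == "__main__":
    sys.exit(main())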
3. Copy the *.py file to a folder with *.pdf files in it and double-click it.

If the script worked, all password-protected files should have an *.encrypted.pdf extension. If it didn't, open a command prompt/terminal window, run the script there and post the error messages.
Old 07-12-2020, 01:03 PM   #5
j.p.s
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
^ If renaming is acceptable, that is an elegant solution.
Old 07-12-2020, 03:33 PM   #6
rfog
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
Wow!

Thanks a lot! I will test all of this tomorrow.
Old 07-16-2020, 01:48 PM   #7
j.p.s
Grand Sorcerer
Posts: 5,472
Karma: 100408738
Join Date: Apr 2011
Device: pb360
I did a bit of looking around and found a couple more ways to do it.

1. qpdf gives a bit cleaner results.
Code:
for f in *.pdf; do qpdf --show-encryption "$f" > /dev/null; done
2. For those like me who find Perl easier to read and write than Python:
Code:
#!/usr/bin/perl
use PDF::API2;

while (glob "*.pdf") {
  $pdf = PDF::API2->open($_);
  print "$_ is encrypted.\n" if $pdf->isEncrypted();
}
PDF::API2 was not included by default on any of my systems, but neither was PyPDF2, even on a very large Anaconda install of Python at work.
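
If installing an extra library is a hurdle, a rough check is possible with the Python standard library alone. The sketch below relies on the fact that a PDF using the standard encryption mechanism carries an `/Encrypt` entry in its trailer (or cross-reference stream dictionary), and simply searches the raw bytes for that name. It is a heuristic, not a parser: the function names are made up for this example, it could in principle flag an unencrypted file that happens to contain those bytes, and it will not notice protection schemes that wrap the file in some other way.

```python
from pathlib import Path


def looks_encrypted(path):
    """Heuristic: True if the file looks like a PDF and mentions /Encrypt."""
    data = Path(path).read_bytes()
    # The %PDF- header should sit at (or very near) the start of the file.
    is_pdf = b"%PDF-" in data[:1024]
    return is_pdf and b"/Encrypt" in data


def list_encrypted(folder="."):
    """Return the *.pdf files below `folder` that look encrypted, sorted."""
    return sorted(
        str(p) for p in Path(folder).rglob("*.pdf") if looks_encrypted(p)
    )
```

Printing `"\n".join(list_encrypted("."))` gives a plain list without renaming anything; files it flags can then be double-checked with one of the tools above.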
Old 07-17-2020, 01:45 PM   #8
rfog
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
Wow!!!

I thought it was more complex to do.

Thanks a lot to all.

Now comes the second part: is there any way to check whether those DRM-protected PDFs contain real text? I've sometimes found that copying and pasting for a citation produced garbage or nonsense text, and I had to type the text manually.

Is there any automated way to detect those PDFs?
Old 07-18-2020, 10:57 AM   #9
willus
Fuzzball, the purple cat
Posts: 1,283
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.
Old 07-19-2020, 03:42 AM   #10
rfog
Guru
Posts: 695
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
Quote:
Originally Posted by willus View Post
You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.
Ho Ho.

Impressive. It's even faster if I add -p 10-20 (for example) to extract the text of only a few pages and see whether they contain text or garbage.
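
Telling text from garbage can itself be roughly automated once the `*_text.txt` files exist. The sketch below scores each file by the share of tokens that contain a run of at least three letters and flags the low scorers. The function names, the ASCII-letter rule and the 0.5 threshold are all assumptions made for illustration; they suit mostly Latin-alphabet text and would need tuning on a few known-good and known-bad files.

```python
import re
from pathlib import Path

# A token counts as "word-like" if it contains 3 or more consecutive ASCII letters.
WORD = re.compile(r"[A-Za-z]{3,}")


def wordlike_ratio(text):
    """Fraction of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0  # an empty text layer scores as low as pure garbage
    return sum(1 for t in tokens if WORD.search(t)) / len(tokens)


def suspicious_text_files(folder=".", threshold=0.5):
    """Return (file name, ratio) for *_text.txt files scoring below threshold."""
    flagged = []
    for path in sorted(Path(folder).glob("*_text.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        ratio = wordlike_ratio(text)
        if ratio < threshold:
            flagged.append((path.name, ratio))
    return flagged
```

Files with no text layer at all end up in the same list, which is probably what is wanted here, since they cannot be copied from either.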



So many tools, and so little time...

Last edited by rfog; 07-19-2020 at 03:47 AM.