Old 04-25-2007, 09:02 AM   #1
ashkulz
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
PDFRead 1.7 released

UPDATE: PDFRead 1.7 has been released. The changes for 1.7 and the batch conversion instructions come first, followed by the initial release announcement for 1.6.

I've released PDFRead 1.7, which has minor bug fixes and enhancements. Changes in this release:
  • add a "landscape-half" mode which splits a page into two even halves (gdxf's suggestion)
  • if the output document does not have the proper file extension, then append it automatically.
  • remove imagemagick and use pngnq for color reduction.
  • fix the problems if the PDF has an incorrect TOC referring to an invalid page. Also added option --no-toc to disable TOC generation.
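The automatic extension handling can be pictured with a small sketch. This is not PDFRead's actual code: the helper name ensure_extension is made up for illustration, and the case-insensitive comparison is an assumption.

```python
import os


def ensure_extension(path, ext):
    """Return path with '.ext' appended unless it already ends with it.

    Hypothetical helper, not taken from the PDFRead source.
    """
    wanted = "." + ext.lstrip(".").lower()
    # os.path.splitext splits "book.lrf" into ("book", ".lrf")
    current = os.path.splitext(path)[1].lower()
    if current == wanted:
        return path
    return path + wanted


print(ensure_extension("sample", "lrf"))      # sample.lrf
print(ensure_extension("sample.lrf", "lrf"))  # sample.lrf (unchanged)
```

Note that a name with a different extension is extended rather than rewritten in this sketch, so "sample.pdf" becomes "sample.pdf.lrf".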

Also, batch conversion can now be done on Windows for all PDFs in a folder.
  1. Download the file attached to the linked post and rename it to pdfread-batch.bat
  2. Open the renamed file and change the set OPT= line to use the appropriate profile. If you have installed to a non-default location, change the set LOC= line too.
  3. Copy the batch file into the directory containing the PDFs you want to convert, and double-click it. Do not put the directory anywhere on the Desktop or in My Documents, as that can cause problems; put it somewhere in the root of a drive (C:, D:).
  4. The filename will be used as the book title, so be sure to name files properly. Please ensure that the filename does not contain special characters not present in UTF-8. An ebook will be created with the same name but with the given extension (e.g. sample.pdf => sample.lrf).
In case you want to customize further:
  1. Do a normal conversion with your custom params for a single file and copy the command line options to a text file. Some advice on how to copy the options from the window:
    Quote:
    Originally Posted by alex_d
    To copy text from a CMD window, right-click on the title bar (the bar that has the X and minimize buttons), choose Properties, and then enable QuickEdit mode. This lets you highlight text and copy it by right-clicking on it. Copy everything, even if you have to scroll up.
  2. Copy the command line parameters and replace the set OPT= mentioned above. Do NOT include the input filename, the title (-t option) or the call to pdfread, just the options. The value should be valid command line options.

People on OS X/Linux can hack together a similar script very easily, so I won't bother to post it. If you do want such a script, let me know.
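For OS X/Linux, a rough equivalent of the batch file might look like the sketch below. It is only a sketch under stated assumptions: the names PDFREAD, OPTIONS and EXT mirror the LOC/OPT/EXT idea of the batch file but are made up here, the location of pdfread.py is a guess, and only the -p, -o and -t options quoted elsewhere in this thread are used.

```python
import os
import subprocess

# Assumptions: adjust the path to pdfread.py and paste in the options
# printed by a normal single-file conversion.
PDFREAD = "pdfread.py"
OPTIONS = ["-p", "prs500"]
EXT = "lrf"


def build_command(pdf_path):
    """Build the argument list for converting one PDF."""
    base = os.path.splitext(pdf_path)[0]
    title = os.path.basename(base)  # the file name doubles as the book title
    return (["python", PDFREAD] + OPTIONS +
            ["-o", base + "." + EXT, "-t", title, pdf_path])


def convert_folder(folder, dry_run=True):
    """Convert every PDF in folder; with dry_run only print the commands."""
    commands = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf"):
            cmd = build_command(os.path.join(folder, name))
            commands.append(cmd)
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.call(cmd)
    return commands
```

Calling convert_folder(".") prints the commands it would run; pass dry_run=False once they look right. Passing the arguments as a list avoids shell quoting problems with spaces in file names.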


Original announcement follows


After a long wait, PDFRead 1.6 has been released. You can download it from PDFRead @ SourceForge.

The focus of this release has been rewriting the code for better maintainability. It can now be easily integrated into other tools. PDFRead now has a plugin-based architecture, which allows new features to be added easily, as I've already done for this release.

Lots of new image processing options have been added to PDFRead. unpaper integration ensures that bad scans will be cleaned up properly. The new cropping algorithm removes whitespace very aggressively, even from the middle of the page, without any loss of content. All images are now run through an edge-enhancement filter, which is the same one used by both rbmake and RasterFarian.

Support for the TIFF and IMGLIST input formats has been added. The IMGLIST format is a simple text file containing a list of images which are to be considered as a single document.
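Such a list can be generated rather than typed by hand. The sketch below assumes one image path per line and that PNG/TIFF files are acceptable entries; neither detail is confirmed here, and the function names are made up for illustration.

```python
import os

# Assumption: which image types an IMGLIST may reference is a guess.
IMAGE_EXTENSIONS = (".png", ".tif", ".tiff")


def imglist_entries(filenames, image_dir=""):
    """Return the image paths in sorted order, skipping non-image files."""
    names = sorted(n for n in filenames
                   if n.lower().endswith(IMAGE_EXTENSIONS))
    return [os.path.join(image_dir, n) for n in names]


def write_imglist(image_dir, list_path):
    """Write one image path per line (assumed IMGLIST layout)."""
    entries = imglist_entries(os.listdir(image_dir), image_dir)
    with open(list_path, "w") as handle:
        handle.write("\n".join(entries) + "\n")
    return entries
```

Sorting is alphabetical, so zero-padded names (page001.png, page002.png) keep the pages in reading order.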

Batch support is not directly present for Windows, but can be achieved via a batch file. The command line used to convert each book (using the current settings) is printed before conversion. You can then copy this to tweak your conversion settings. Users of Linux/OS X are assumed to be familiar with the command-line, and the batch support can be achieved by scripting.

You can also specify a range of pages for conversion. This has the side-effect of giving a preview feature, as specifying the same page as the start and end page will run the processing only for that page.
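As a sketch of the preview trick: --last-page appears in a command line quoted later in this thread, while --first-page is assumed here by analogy and should be checked against the program's help output. The function names are made up.

```python
def page_range_options(first, last):
    """Return options restricting conversion to pages first..last.

    --first-page is an assumed option name; --last-page is quoted
    in this thread.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return ["--first-page=%d" % first, "--last-page=%d" % last]


def preview_options(page):
    """A one-page range acts as a preview of that page."""
    return page_range_options(page, page)


print(" ".join(preview_options(12)))
```

These options would be appended to the rest of the command line used for a normal conversion.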

The Windows GUI has been revamped: there are now tooltips everywhere, and there is no "advanced" page anymore. If you do want to control those parameters, please use the command line directly.

Lots of other minor tweaks have gone into this release.

The detailed changelog for this release:
  • revamped the Windows GUI: added tooltips, preview feature and show the command line options when executed (useful for batch execution).
  • add support for TIFF and a list of page images for input.
  • add unpaper support for image cleanup.
  • add extremely aggressive whitespace detection, even in the middle of the page text.
  • added an edge-enhancement filter, similar to rbmake and RasterFarian.
  • allow all processing stages to be selectively disabled.
  • allow a page range to be specified for conversion.
  • tweak the prs-500 profile to rotate right instead of left (thanks gdxf)
  • add an optional step to optimize generated PNG images via OptiPNG.
  • removed the dependency on xpdf.
  • removed the autocontrast and ghostscript cropping features (no longer useful).
  • fix problem where the IMP file was not created if the latest eBook Publisher was not installed.
  • complete overhaul of the code for better maintainability.
Some screenshots of the effect of the various image processing options are also attached.
Attached Thumbnails
  • dilation_before.png (35.9 KB)
  • dilation_after.png (36.8 KB)
  • crop_before.png (21.2 KB)
  • crop_after.png (19.6 KB)
  • unpaper_before.png (89.9 KB)
  • unpaper_after.png (78.1 KB)
  • edge_enhance_before.png (47.3 KB)
  • edge_enhance_after.png (36.8 KB)

Last edited by ashkulz; 04-30-2007 at 02:19 AM.
Old 04-25-2007, 10:27 AM   #2
Azayzel
Cache Ninja!
Posts: 643
Karma: 1002300
Join Date: Jan 2007
Location: Tokyo, Japan
Device: PRS-500, HTC Shift, iPod Touch, iPaq 4150, TC1100, Panasonic WordsGear
Wow, you've been quite busy! Have to give this one a whirl and see how things turn out. I have a watermarked PDF I hope PDFRead works well on, we'll see. Thanks!
Old 04-25-2007, 01:11 PM   #3
kovidgoyal
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool now that I've finished the HTML,TXT -> LRF converters, I can look into integrating PDFRead into libprs500. There is one concern: http://www.py2exe.org/index.cgi/Py2E...ssInteractions

Would you be willing to fix that in your code?

EDIT: More information http://sourceforge.net/tracker/index...70&atid=105470

Last edited by kovidgoyal; 04-25-2007 at 01:27 PM.
Old 04-25-2007, 01:21 PM   #4
ashkulz
Addict
Quote:
Originally Posted by kovidgoyal
Cool now that I've finished the HTML,TXT -> LRF converters, I can look into integrating PDFRead into libprs500. There is one concern: http://www.py2exe.org/index.cgi/Py2E...ssInteractions

Would you be willing to fix that in your code?
I don't see how it affects pdfread. I use explicit pipes when I'm calling other executables (gs, convert, etc.), so there shouldn't be a problem in pdfread. If you mean that you may have a problem when calling pdfread as a console application, I'd suggest not doing it that way. Just import pdfread and call the convert() function, and you're set to go. You should also replace the variable P_STREAM in common.py with any valid stream. It currently points to sys.stdout, so that's the only place where you need to replace streams.
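The stream replacement could look roughly like this. The sketch runs without PDFRead installed: `common` below is a stand-in object for PDFRead's common.py, and fake_convert is a placeholder, since the real arguments of convert() are not given in this thread.

```python
import io
import types

# Stand-in for PDFRead's common.py; in real use you would `import common`
# from the PDFRead sources instead.
common = types.SimpleNamespace(P_STREAM=None)


def fake_convert():
    """Placeholder for pdfread.convert(); writes progress to P_STREAM."""
    common.P_STREAM.write("Page 1/1: EXTRACT RASTERIZE\n")


def run_with_captured_output(func):
    """Point common.P_STREAM at a buffer while func runs, then restore it."""
    buffer = io.StringIO()
    previous = common.P_STREAM
    common.P_STREAM = buffer
    try:
        func()
    finally:
        common.P_STREAM = previous
    return buffer.getvalue()


log = run_with_captured_output(fake_convert)
print(log, end="")
```

One caveat with this pattern in general: rebinding the attribute only works for code that looks up common.P_STREAM at call time, not for modules that copied the name with a from-import.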
Old 04-25-2007, 01:39 PM   #5
kovidgoyal
creator of calibre
OK...anyway I just realized that this bug has been squashed in python 2.5.1
Old 04-26-2007, 10:04 AM   #6
ashkulz
Addict
For those on Windows, there's a quick way to convert all PDFs in a folder.
  1. Download the attached file and rename it as pdfread-batch.bat
  2. Open up the renamed file, and change the set OPT= line to use the appropriate profile. You may also have to change the EXT= line if you are using a different profile. In case you have installed in a non-default location, change the set LOC= line too.
  3. Copy the batch file into the directory containing the PDFs you want to convert, and double-click it. The filename will be used as the book title, so be sure to name files properly. An ebook will be created with the same name but with the given extension (e.g. sample.pdf => sample.lrf).
In case you want to customize further:
  1. Do a normal conversion with your custom params for a single file and copy the command line options to a text file. Some advice on how to copy the options from the window:
    Quote:
    Originally Posted by alex_d
    To copy text from a CMD window, right-click on the title bar (the bar that has the X and minimize buttons), choose Properties, and then enable QuickEdit mode. This lets you highlight text and copy it by right-clicking on it. Copy everything, even if you have to scroll up.
  2. Copy the command line parameters and replace the set OPT= mentioned above. Do NOT include the input filename, the title (-t option) or the call to pdfread, just the options. The value should be valid command line options.

People on OS X/Linux can hack together a similar script very easily, so I won't bother to post it. If you do want such a script, let me know.
Attached Files
File Type: txt pdfread-batch_bat.txt (602 Bytes, 1406 views)

Last edited by ashkulz; 04-26-2007 at 10:23 AM.
Old 04-26-2007, 10:43 AM   #7
Gravitas
Muppet
Posts: 123
Karma: 107
Join Date: Apr 2007
Location: Nottingham, England, UK
Device: Zen Vision :M / Nokia 5800 musicXpress / Sony PRS500
I'm being such a muppet (so much so that I changed my title and avatar to match), but I was trying to use this software last night and couldn't get any LRF files from it. I also couldn't get the files it did produce (PNG) into the folder I specified in my output path; they all went into a temp folder, even when I used the .lrf file extension in the name of the book.

I'm not usually such a muppet IT-wise (thank god, as I'm an IT Manager looking after a MPLS Citrix network over 36 sites with 600 users) so I reckon it's the excitement of finally getting my hands on my Reader tomorrow, that is shortcircuiting my brain.

Any idea what I'm doing wrong? I have every confidence that you guys will get me using this stuff properly, as you all sorted me out using BD.

oh, and I'm using Windows
Old 04-26-2007, 12:15 PM   #8
ashkulz
Addict
Gravitas: did you use the prs500 or the prs500-l profile? The LRF is produced only if the profile is one of the above two. Otherwise, depending on the profile it will produce output targeted for another device.

If it still doesn't work, can you post a screenshot of the settings before pressing Convert and the Explorer view of the output folder?

Don't worry, we all have those days every now and then
Old 04-26-2007, 12:19 PM   #9
Gravitas
Muppet
I was using the prs500 profile. I'll have another go when I get home and post some screenies.

EDIT

OK, here are my screenies. I'm sure I've done something blindingly obviously wrong.


[Attached screenshots]

Last edited by Gravitas; 04-26-2007 at 06:45 PM. Reason: added screenies
Old 04-26-2007, 06:15 PM   #10
kovidgoyal
creator of calibre
Works pretty well for me. Minor point:
the spelling is 'portrait', not 'potrait' (-m option)
Old 04-26-2007, 07:22 PM   #11
kovidgoyal
creator of calibre
Hmm problems
The following command line causes an exception:
Code:
python pdfread.py -p prs500 -o /home/kovid/temp/test.lrf -t 'Guide to NumPy' -a 'Travis Oliphant' -f lrf -i pdf -m potrait  /home/kovid/documents/text/notes/NumPy/numpybook.pdf --last-page=2

Creating BBeB file ... Traceback (most recent call last):
  File "/home/kovid/build/pdfread-1.6/pdfread.py", line 204, in <module>
    main()
  File "/home/kovid/build/pdfread-1.6/pdfread.py", line 90, in main
    delete = output.generate(input.toc)
  File "/home/kovid/build/pdfread-1.6/output.py", line 211, in generate
    imagenum = toc_map[int(page_)]
KeyError: 12
Probably because the TOC refers to pages not included.
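To illustrate the suspected failure: with only the first two pages converted, a TOC entry pointing at page 12 has no image to map to. A defensive lookup might look like the sketch below. The name toc_map comes from the traceback, but the shapes assumed here (TOC as (title, page) pairs, toc_map as a dict from page number to image number) are guesses, not PDFRead's real data structures.

```python
def map_toc_to_images(toc, toc_map):
    """Keep only TOC entries whose page was actually converted.

    Assumed shapes: toc is a list of (title, page) pairs, toc_map a dict
    from page number to image number.
    """
    kept, skipped = [], []
    for title, page in toc:
        imagenum = toc_map.get(int(page))  # .get avoids the KeyError
        if imagenum is None:
            skipped.append((title, page))
        else:
            kept.append((title, imagenum))
    return kept, skipped


# Pages 1-2 were converted, but the TOC also points at page 12.
toc_map = {1: 0, 2: 1}
toc = [("Preface", 1), ("Arrays", 2), ("Methods", 12)]
kept, skipped = map_toc_to_images(toc, toc_map)
print(kept)     # [('Preface', 0), ('Arrays', 1)]
print(skipped)  # [('Methods', 12)]
```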

Also, this is my first time rasterizing a PDF (I usually have access to the LaTeX sources). Is the font rasterization always so bad? I've attached samples to show you what I mean.
Attached Files
File Type: lrf test.lrf (31.1 KB, 1395 views)
File Type: pdf numpy_tutorial [miex.org].pdf (25.6 KB, 2601 views)
Old 04-27-2007, 12:43 AM   #12
gdxf
Enthusiast
Posts: 48
Karma: 27
Join Date: Oct 2006
Device: Sony Reader PRS-500
I followed the batch mode instructions to run a batch conversion on Windows, but got this prompt on the command line:

"Unable to determine total number of pages in document
Please enter number of pages: "

When I put in a page number, it results in a blank lrf file.

Here is what the screen says:

"Unable to determine total number of pages in document
Please enter number of pages: 1

Temporary directory: c:\docume~1........

Page 1/1: EXTRACT RASTERIZE BLANK

Creating BBeB file ... done.
Unable to determine total number of pages in document
Please enter number of pages: 1

Temporary directory: c:\docume~1\.........

Page 1/1: EXTRACT RASTERIZE BLANK

Creating BBeB file ... done.
Press any key to continue . . ."



Quote:
Originally Posted by ashkulz
For those on Windows, there's a quick way to convert all PDFs in a folder.
Old 04-27-2007, 05:40 AM   #13
ashkulz
Addict
Okay, I've discovered the problem that bit Gravitas and kovidgoyal. The PDF file is incorrect, as it contains a TOC reference for a page that doesn't exist. I've fixed that, and will be making another release tomorrow.
Old 04-27-2007, 06:07 AM   #14
Gravitas
Muppet
Quote:
Originally Posted by ashkulz
Okay, I've discovered the problem that bit Gravitas and kovidgoyal. The PDF file is incorrect, as it contains a TOC reference for a page that doesn't exist. I've fixed that, and will be making another release tomorrow.
What a star
Old 04-27-2007, 12:06 PM   #15
ashkulz
Addict
Okay, I've released 1.7. Changes in this release:
  • add a "landscape-half" mode which splits a page into two even halves (gdxf's suggestion)
  • if the output document does not have the proper file extension, then append it automatically.
  • remove imagemagick and use pngnq for color reduction.
  • fix the problems if the PDF has an incorrect TOC referring to an invalid page. Also added option --no-toc to disable TOC generation.
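For the curious, splitting a page into two even halves boils down to computing two crop boxes. The sketch below assumes a top/bottom split and is not PDFRead's code; half_boxes is a made-up name. The boxes use the (left, top, right, bottom) ordering that PIL's Image.crop expects.

```python
def half_boxes(width, height):
    """Return crop boxes for the top and bottom halves of a page.

    The top/bottom direction is an assumption. With an odd height the
    bottom half gets the extra pixel row.
    """
    middle = height // 2
    return (0, 0, width, middle), (0, middle, width, height)


top, bottom = half_boxes(600, 800)
print(top)     # (0, 0, 600, 400)
print(bottom)  # (0, 400, 600, 800)
```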

If you are on OS X or Linux, please recheck the installation instructions -- there have been changes since the last release.

EDIT: I'm going away for the weekend (it's a long weekend), so I may not respond quickly for a few days

Last edited by ashkulz; 04-27-2007 at 12:09 PM.