05-26-2010, 03:35 AM | #1 |
Guru
Posts: 750
Karma: 1323
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi, Kindle Voyage, Nexus 6, Razr HD
|
Advice for scanned PDF
I have scanned PDFs with very light fonts. Zooming into the page brings the text up to a readable size, but it is still very grayish. I could extract the single pages, increase the contrast of every page and put them back together as a PDF. Unfortunately, the auto-contrast adjustment uses each page's average contrast values, which can differ from page to page depending on the graphics and charts on a page, so the outcome isn't really consistent.
Is there a program which can increase the contrast of, or somehow re-render, the text in a scanned PDF without having to deal with single pages? |
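One way around the page-to-page inconsistency might be to skip auto-contrast and apply the same fixed black and white points to every page. A rough sketch with ImageMagick's -level operator follows; the function names and the 15%/85% defaults are assumptions to tune on a sample page, not values taken from this thread.

```shell
#!/bin/sh
# Sketch: use identical black/white points on every page so the result is consistent.
# The 15% and 85% defaults are guesses, not values from this thread.

# Build the argument for -level, e.g. "15%,85%"
level_arg() {
    printf '%s%%,%s%%\n' "${1:-15}" "${2:-85}"
}

# $1 = input image, $2 = output image, $3/$4 = optional black/white points in percent
stretch_page() {
    convert "$1" -level "$(level_arg "$3" "$4")" "$2"
}
```

For example, stretch_page page.png page_fixed.png 10 90 maps everything darker than 10% to black and everything lighter than 90% to white, whatever else is on the page.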
05-26-2010, 08:18 AM | #2 |
PRS+ author
Posts: 1,637
Karma: 2446233
Join Date: Dec 2007
Device: Sony PRS-300, 505, 600, 650, 950
|
Did you try this?
https://www.mobileread.com/forums/showthread.php?t=13135 |
05-26-2010, 10:10 AM | #3 |
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
|
scantailor when color/grayscale output is selected
Regards |
05-26-2010, 11:45 AM | #4 |
Nameless Being
|
I've yet to view a scanned PDF that was very readable on a reader. Unless you can use OCR software, most of which does a lousy job of converting scanned images to reflowable text, you will generally have crap on your screen. If you can use OCR software and create reflowable text, and then if you spend hours editing to correct the copious mistakes of the OCR software and to format the book so that images and tables appear where and as they should, you might wind up with a decent ePub. It is a lot of work that might take as long as reading the doc in printed format. Notebook paper sized PDF files are not designed to be viewed on a book reader. They are designed to be printed. They really don't even work that well on large computer monitors.
Bottom line: If you can create truly reflowable text from the scanned doc, then with some work you can create a very readable ebook. If your scanned doc looks like a copy of a copy of a copy of a copy of a copy, one that is rather fuzzy and difficult to read even when printed, then you probably won't be able to create a usable ebook. |
05-26-2010, 03:23 PM | #5 |
Guru
Posts: 750
Karma: 1323
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi, Kindle Voyage, Nexus 6, Razr HD
|
I just gave this one a try, but it looks like all pages have to be single pictures. I tried it on a picture I had in my files, but I'm not sure what the program did. I made a project, applied some different settings and saved the project. How do I save the result of my efforts?
On the other hand, processing all the pictures of a file is tedious, even with batch processing. I would also have to extract all pages from my PDFs first, so it's quite a bit of work. As stated in my first post, my PDFs are mostly readable. I had hoped to increase the contrast of the text, but I'm not too eager to jump through hoops as long as the files are sufficiently readable. It is obvious that PDFs made for letter-sized prints or monitors are not the best thing for readers. Still, having the portability of the files on my reader is just plain awesome. I'm not looking for the 100% solution. I was just checking whether I could raise my current 80% to somewhere between 85% and 90% without too much effort. Anyway, thanks a lot for the help |
05-27-2010, 07:13 AM | #6 | |
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
|
I apologize for the forgotten issues. I thought that you had begun with single images, not with a PDF. Anyway, pdftk can "burst" a multipage PDF into single-page PDFs. Then with "convert" (www.imagemagick.org) you can convert them to png, jpg ... and process them with scantailor. After processing you will obtain a set of TIFF files. From these a PDF can be obtained with tiff2ps and ps2pdf or, if you want, you can build a cbr (just rar the whole set, after png or jpg conversion, and rename it) and filter it through calibre or pdflrf. I have done this with a badly scanned comic with good results, but not with text. I will try to do some experiments and post them to this thread. Regards |
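The first half of that route (burst, then convert each page to an image for scantailor) could look roughly like the sketch below. It assumes pdftk and ImageMagick's convert are installed; the helper names png_name and pages_to_png are made up for illustration. The scantailor step and the tiff2ps/ps2pdf reassembly are left out.

```shell
#!/bin/sh
# Sketch: split a PDF into single pages and turn each page into a PNG for scantailor.
# Assumes pdftk and ImageMagick's convert are installed; helper names are illustrative.

# Map a burst page name to an image name, e.g. pg_0001.pdf -> pg_0001.png
png_name() {
    printf '%s.png\n' "${1%.pdf}"
}

pages_to_png() {
    pdftk "$1" burst || return 1          # writes pg_0001.pdf, pg_0002.pdf, ...
    for page in pg_*.pdf; do
        convert "$page" "$(png_name "$page")" || return 1
    done
}

# Run only when an input file is given, e.g.: sh pages_to_png.sh Sample.pdf
if [ "$#" -ge 1 ]; then
    pages_to_png "$1"
fi
```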
|
05-27-2010, 12:36 PM | #7 |
Guru
Posts: 750
Karma: 1323
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi, Kindle Voyage, Nexus 6, Razr HD
|
Having a comparison would be awesome. I would appreciate your input.
Still, it sounds tedious!?!? |
05-28-2010, 05:43 AM | #8 | |
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
|
Yes, it is tedious. Let's begin with a PDF, Sample.pdf, in a Linux environment. In the end I managed to do the whole thing without scantailor.
Obtain the individual pages with pdftk:
    pdftk Sample.pdf burst
This writes pg_0001.pdf, pg_0002.pdf and so on to the same directory; in this case it ends with the third page.
Adjust the contrast of each page individually and apply some filtering:
    for a in $(seq -w 1 3); do convert -contrast -enhance pg_000$a.pdf eq_pg_000$a.pdf; done
Replace 3 with the last page number. Take care with the number of zeroes: if there are more than 9 pages and fewer than 100, use pg_00$a.pdf as input and eq_pg_00$a.pdf as output instead, and so on. The -w in seq means pad with zeroes on the left so that all numbers have the same width.
After this we have eq_pg_0001.pdf, eq_pg_0002.pdf and eq_pg_0003.pdf in the working directory. Now we want to build a new PDF from the processed pages:
    pdftk eq*.pdf cat output eqSample.pdf
And that's all. convert is a cross-platform tool: http://www.imagemagick.org/script/index.php. I don't know how to write scripts in MSDOS. I could try good old Digital DCL in VMS :-)) Hope this helps, regards. |
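The zero counting in the loop above can probably be avoided by looping over the files pdftk actually wrote instead of generating the numbers with seq. This is only a sketch under the same assumptions (pdftk and ImageMagick's convert installed); the function names eq_name and enhance_pdf are invented for the example.

```shell
#!/bin/sh
# Sketch: boost contrast on every page of a scanned PDF without counting zeroes.
# Assumes pdftk and ImageMagick's convert are installed; function names are illustrative.

# Map a burst page name to its processed name, e.g. pg_0001.pdf -> eq_pg_0001.pdf
eq_name() {
    printf 'eq_%s\n' "$1"
}

enhance_pdf() {
    src=$1   # input, e.g. Sample.pdf
    out=$2   # output, e.g. eqSample.pdf
    pdftk "$src" burst || return 1        # writes pg_0001.pdf, pg_0002.pdf, ...
    for page in pg_*.pdf; do              # zero-padded names sort in page order
        convert -contrast -enhance "$page" "$(eq_name "$page")" || return 1
    done
    pdftk eq_pg_*.pdf cat output "$out"
}

# Run only when an input file is given, e.g.: sh enhance.sh Sample.pdf eqSample.pdf
if [ "$#" -ge 1 ]; then
    enhance_pdf "$1" "${2:-eq_$1}"
fi
```

One thing to watch: ImageMagick rasterizes PDF pages at 72 dpi unless told otherwise, so putting something like -density 300 before the input name may keep the text sharper. The 300 is my guess, not a value from this thread.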
|