10-13-2010, 12:31 PM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Best method of converting PDFs (maintaining paragraphs)
Hey all,
So far my method (on a Mac) has been to use Stanza to convert PDFs to .azw and then transfer them over USB. The problem with this method is that all paragraph breaks are lost. Is there a workflow that preserves paragraph breaks? |
10-13-2010, 12:51 PM | #2 |
Enjoying the show....
Posts: 14,270
Karma: 10462843
Join Date: Jun 2008
Location: Arizona
Device: A K1, Kindle Paperwhite, an Ipod, IPad2, Iphone, an Ipad Mini & macAir
|
Welcome to MobileRead, sebastianman
Someone should step in soon with suggestions... |
10-13-2010, 01:20 PM | #3 |
Member
Posts: 23
Karma: 1074640
Join Date: Jun 2008
Device: Kindle 3G
|
|
10-13-2010, 03:51 PM | #4 |
Confused
Posts: 402
Karma: 5538
Join Date: Oct 2010
Location: Bay Area
Device: Kindle DXG
|
I've found that virtually every PDF converter program on the net uses the same open source program, pdftohtml: http://pdftohtml.sourceforge.net/
It uses Ghostscript to extract images, and it operates in one of two modes:
1) Extract all images and dump them inline to the file, without preserving tables.
- Text comes out in paragraphs with random line breaks and looks very ugly.
- Tables are not preserved.
2) Extract each page background as a whole image, and create each page as a table.
- All formatting is preserved.
- The HTML document looks almost identical to the PDF.
Method 2 looks good, but won't work for ebooks because of the static background page size (no reflow). Method 1 is used instead (but no tables are preserved). This is the same method that Acrobat 9 uses to export HTML 3.0.
Now, if your document has limited tables, and you have a simple PDF with a few columns you want to reflow or change, you can use method 2. Use http://pdftohtml.sourceforge.net/ without images enabled. Then convert the HTML to EPUB with tables enabled. You should get yourself a very respectable document, with intact, flowing/reflowing paragraphs that span multiple pages. |
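For anyone who wants to script the "without images enabled" step, here is a rough Python sketch. It assumes a `pdftohtml` binary is installed and on the PATH, and that `-i` (ignore images) and `-noframes` (one HTML file instead of a frameset) are the options you want; the function names and `book.pdf` are made up for illustration, and the output file name is left to the tool.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path):
    """Build a pdftohtml call for the simple mode, with images skipped.

    -i         ignore images
    -noframes  write a single HTML file instead of a frameset
    """
    return ["pdftohtml", "-i", "-noframes", pdf_path]


def convert(pdf_path):
    """Run pdftohtml if it is installed; return True when it was run."""
    if shutil.which("pdftohtml") is None:
        return False  # binary not found, nothing was converted
    subprocess.run(pdftohtml_command(pdf_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file name, only prints the command that would be run
    print(" ".join(pdftohtml_command("book.pdf")))
```

The resulting HTML can then go into whatever HTML-to-EPUB converter you prefer.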
10-14-2010, 07:19 AM | #5 |
Groupie
Posts: 170
Karma: 1010944
Join Date: Oct 2010
Location: The African bush
Device: Kindle 3
|
I have found consistently that the freebie Mobipocket Reader for PC does a far better conversion from PDF than Calibre. You can just drag and drop the PDF (or indeed a DOC or HTML file) into the Mobi reader and it produces a PRC file, which can then be used for a straightforward lossless conversion to MOBI (or indeed any other ebook format) in Calibre.
|
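The second half of that workflow (PRC to MOBI in Calibre) can also be run from the command line. A small sketch, assuming Calibre's `ebook-convert` tool is installed and on the PATH; the function names and `book.prc` are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def calibre_command(src, target_ext="mobi"):
    """Build an ebook-convert call that writes next to the source file."""
    src = Path(src)
    out = src.with_suffix("." + target_ext)
    return ["ebook-convert", str(src), str(out)]


def convert(src, target_ext="mobi"):
    """Run ebook-convert if it is installed; return True when it was run."""
    if shutil.which("ebook-convert") is None:
        return False  # Calibre command-line tools not found
    subprocess.run(calibre_command(src, target_ext), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file name, only prints the command that would be run
    print(" ".join(calibre_command("book.prc")))
```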
10-14-2010, 08:43 AM | #6 |
Yaabbaa dabba doo
Posts: 76
Karma: 5348
Join Date: Sep 2010
Location: India
Device: Kindle Touch wifi, Kindle 3 Wifi, Asus Transformer TF101
|
The tech journals that I have converted so far are suitable only for casual reading or when you are too tired of a computer screen. If you create your own pdf with proper formatting, it will convert to other formats fine. I have tried that. So, it really depends on how your pdf was initially formatted.
|
10-18-2010, 11:54 AM | #7 |
Confused
Posts: 402
Karma: 5538
Join Date: Oct 2010
Location: Bay Area
Device: Kindle DXG
|
Oh, I made a post about this the other day. Acrobat Professional's export to HTML (or RTF) works well. If you don't have Pro, you can export to TXT from the free Reader version, but it won't do images. Images that have text combined with them cause some slight issues (graphs, for example, will have the axis labels stripped from the images). However, text will flow from columns and be very readable.
http://www.adobe.com/products/acroba...linetools.html Adobe lets you convert for free online, though for me the online version won't do images. Perhaps if you email it they will send you a document with images; I have not tried. There are other programs out there. Another I have found that does essentially the same thing as Adobe's is "Easy PDF to HTML converter". It is probably based on the same code, because the output files are nearly identical. http://www.pdf-to-html-word.com/ Note that there is a Word version and an HTML version; get the HTML one, because ebook programs generally don't like to deal with Word, they like HTML. There are other programs just like it, with the same GUI and back end, which makes me think it's an open source program that someone compiled and sold to make money. I didn't pay for it. I wouldn't pay for it, because I think it's dishonest to sell software you didn't code yourself. And I don't like buying software that does niche things. If you look hard enough you can probably find the "free version" out there, though it is probably command line based. It may be SourceForge pdftohtml, but I am not sure of this. |
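If the exported TXT still has hard line breaks inside paragraphs, a simple heuristic can stitch them back together. This is only a sketch under my own assumptions, not what any of the tools above do: a blank line always ends a paragraph, and so does a line that ends a sentence and is clearly shorter than the widest line. The function name and the `short_ratio` threshold are made up.

```python
def rejoin_paragraphs(text, short_ratio=0.75):
    """Merge hard-wrapped lines back into paragraphs (heuristic)."""
    lines = [ln.strip() for ln in text.splitlines()]
    widths = [len(ln) for ln in lines if ln]
    if not widths:
        return []
    full_width = max(widths)  # stand-in for the wrap width

    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # Blank line: definite paragraph break
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith((".", "!", "?"))
        is_short = len(ln) < short_ratio * full_width
        if ends_sentence and is_short:
            # Short line ending a sentence: probably the end of a paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = (
        "This is the first line of a paragraph that\n"
        "runs on to a second line.\n"
        "A new paragraph starts here and it also\n"
        "wraps once.\n"
    )
    for p in rejoin_paragraphs(sample):
        print(p)
        print()
```

It will guess wrong on headings, lists and paragraphs whose last line happens to fill the full width, so check the result before converting.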
10-18-2010, 12:04 PM | #8 |
Ebook Reader
Posts: 605
Karma: 3205128
Join Date: Nov 2009
Location: Texas
Device: Kindle 3, HTC Evo, HTC View
|
Mobipocket Creator!!!!!!!!!!!!! And it's free.
|
10-18-2010, 06:02 PM | #9 |
Confused
Posts: 402
Karma: 5538
Join Date: Oct 2010
Location: Bay Area
Device: Kindle DXG
|
Mobipocket Creator uses pdftohtml from SourceForge and does an awful job.
|
|