07-30-2007, 05:56 PM | #1 |
creator of calibre
Posts: 44,542
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
pdf2lrf
Part of libprs500 v0.3.81. It extracts the text from PDF files and converts them to LRF. Preserves bold and italics. See attached demo.
It doesn't support embedded images, and results are not going to be satisfactory for complex PDF files. But for converting simple novels, it works great.
Linux users: If you want support for PDF links, you need to install poppler from CVS.
To use:
Code:
pdf2lrf "mybook.pdf"
Last edited by kovidgoyal; 07-30-2007 at 06:02 PM. |
07-30-2007, 06:54 PM | #2 |
Resident Curmudgeon
Posts: 76,404
Karma: 136564696
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Would it be possible to have PDF2HTML so we can then edit the text/book how we want and then use html2lrf to create a properly formatted book?
And this is a great step forward for PDF conversion without the need for Acrobat. |
07-30-2007, 07:05 PM | #3 |
creator of calibre
|
pdftohtml is on your path in Windows.
Code:
pdftohtml mybook.pdf |
07-30-2007, 07:07 PM | #4 |
Resident Curmudgeon
|
|
08-03-2007, 04:02 PM | #5 |
creator of calibre
|
Incidentally, is there some reason this thread isn't being made a sticky?
|
08-03-2007, 06:05 PM | #6 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
Are there any instructions on how to use this feature? Some sort of help or FAQ?
|
08-03-2007, 06:08 PM | #7 |
creator of calibre
|
Start up a terminal (Start->Run and type cmd.exe).
Change to the directory of your PDF file, then run pdf2lrf on it:
Code:
cd "c:\my directory"
pdf2lrf mybook.pdf |
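For anyone with a whole folder of PDFs, the two steps above can be scripted. This is only a rough sketch in Python, not part of libprs500: the helper names `build_commands` and `convert_all` are made up for illustration, it assumes `pdf2lrf` is on your PATH and takes the PDF file name as its only argument (the only usage shown in this thread), and it assumes a non-zero exit code means the conversion failed.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(directory):
    """Return one pdf2lrf command (as an argument list) per PDF in `directory`."""
    pdfs = sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    # The thread only shows `pdf2lrf mybook.pdf`, so no extra options are assumed.
    return [["pdf2lrf", p.name] for p in pdfs]


def convert_all(directory):
    """Run pdf2lrf on every PDF in `directory`; return the names that failed."""
    if shutil.which("pdf2lrf") is None:
        raise RuntimeError("pdf2lrf was not found on the PATH")
    failed = []
    for cmd in build_commands(directory):
        # cwd=directory mirrors the `cd` step from the post above.
        result = subprocess.run(cmd, cwd=directory)
        # Assumption: a non-zero exit code signals a failed conversion.
        if result.returncode != 0:
            failed.append(cmd[1])
    return failed
```

Calling `convert_all(r"c:\my directory")` would then try each PDF in turn and hand back the ones that did not convert, so a single bad file does not stop the rest.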
08-11-2007, 03:22 PM | #8 |
Resident Curmudgeon
|
|
08-11-2007, 03:27 PM | #9 |
creator of calibre
|
thanks.
|
08-18-2007, 12:19 PM | #10 |
Member
Posts: 13
Karma: 10
Join Date: Aug 2007
Device: Kindle 4
|
foreign characters
Could you explain how to get correct non-English characters from a PDF? I get strange results with Polish.
The word "CZĘŚĆ" is converted into: <b>CZ </b><br> <b>E ´S ´</b><br> <b>C I</b><br> |
08-18-2007, 12:24 PM | #11 |
creator of calibre
|
Try the -enc switch of pdftohtml?
|
08-18-2007, 03:17 PM | #12 |
Member
|
Thanks, I tried it, but I can only get the error message:
Error: Couldn't find unicodeMap file for the 'iso-8859-2' encoding
Is there a list of encoding names? |
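Lacking a list of encoding names, one way to probe is to try a few names with the -enc switch and watch for the error quoted above. A hedged sketch in Python follows: the helper names are hypothetical, the candidate names you pass in are guesses rather than a documented list, and it assumes pdftohtml writes this error to stderr.

```python
import re
import subprocess

# Pattern taken from the error message quoted in the post above.
_MISSING_MAP = re.compile(
    r"Couldn't find unicodeMap file for the '([^']+)' encoding")


def pdftohtml_command(pdf_path, encoding):
    """Build a pdftohtml call that requests `encoding` via the -enc switch."""
    return ["pdftohtml", "-enc", encoding, pdf_path]


def missing_unicode_map(stderr_text):
    """Return the encoding name pdftohtml rejected, or None if no such error."""
    match = _MISSING_MAP.search(stderr_text)
    return match.group(1) if match else None


def first_accepted_encoding(pdf_path, candidates):
    """Try each candidate name; return the first one that is not rejected."""
    for encoding in candidates:
        # Raises FileNotFoundError if pdftohtml is not on the PATH.
        result = subprocess.run(pdftohtml_command(pdf_path, encoding),
                                capture_output=True, text=True)
        if missing_unicode_map(result.stderr) is None:
            return encoding
    return None
```

Something like `first_accepted_encoding("mybook.pdf", ["iso-8859-2", "UTF-8"])` would report which name, if any, gets past the unicodeMap check. An accepted name only means pdftohtml knows the encoding. It does not guarantee that the characters in the PDF come out right.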
08-18-2007, 03:27 PM | #13 |
creator of calibre
|
I don't know, you'll have to contact the author of pdftohtml.
|
08-18-2007, 04:43 PM | #14 |
Member
|
Thanks for your input. I discovered that my PDF has an embedded font without a Unicode map, which may be the reason for all the problems, and there is no easy way of fixing it :-(
|
04-02-2008, 04:38 PM | #15 |
Evangelist
Posts: 415
Karma: 510423
Join Date: Nov 2006
Device: Sony PRS-505
|
This is a MESS.
Line breaks are ignored. Page breaks come after 1-2 lines on a page, IN THE MIDDLE of a sentence. |
|