09-09-2007, 08:40 PM | #1 |
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
|
New conversion method: txt->rst->html->lrf
Hi all,
I've just finished my first e-book creation experiment. I was looking for an easy way to convert Project Gutenberg (PG) txt files to Reader format. reStructuredText (rst) is a simple markup format designed to be readable as plain text and also processable into other formats automatically. It's the format used by the Python Docutils package. One program included with that package, rst2html, can convert lightly modified PG text files into HTML. I tried it out with Anna Karenina (by Tolstoy). Any feedback on the process is welcome, but I'm happy with the result so far. (Of course, I'm only about 50 pages in on the Reader...) If you'd like to view the results, check the Reader downloads page.

The process turned out to be pretty easy, but with a book as large as this one, the Table of Contents (TOC) is difficult to navigate (it runs to many pages). So I went for a compromise: I split the original text file into a separate file for each part and had rst2html automatically generate a TOC for each part. I then created a page of links to the part pages, ran the whole collection through rst2html to generate HTML pages, and used html2lrf to convert that into an e-book. I think the results are quite nice.

The keys for this are: comfort with a good text editor (I use Emacs); a full Python install (I installed Cygwin on my PC and use the Python that came with it); Docutils (search Google for the installer, then add it to your Python distribution); and comfort using the command line, where I do all my conversion work. I'll post detailed instructions after I've done a couple of additional books. Phrodod |
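For anyone who wants to try the split-into-parts approach, here is a minimal Python sketch of the txt->rst step. It is an illustration, not Phrodod's actual procedure: it assumes part lines look like "PART ONE" and chapter lines like "Chapter 1" (real PG files vary, and body text may still need light editing, e.g. removing leading indentation that rst would treat as a block quote). The function names and the part_one.html style file names are hypothetical. Each part gets a `.. contents::` directive so rst2html builds a per-part TOC, and a separate index page links to the parts.

```python
import re

# Assumed heading patterns for a lightly edited Project Gutenberg text:
# part lines like "PART ONE" and chapter lines like "Chapter 1".
# Real PG files vary, so adjust these for the book you are converting.
PART_RE = re.compile(r"^PART\s+(\w+)$")
CHAPTER_RE = re.compile(r"^Chapter\s+(\d+)$")


def heading(title: str, char: str) -> str:
    """An rst section title: blank line, the text, and an underline of equal length."""
    return f"\n{title}\n{char * len(title)}\n"


def txt_to_rst_parts(text: str) -> dict[str, str]:
    """Split plain text into one rst document per part.

    Each part gets a title ("="), a ``.. contents::`` directive so rst2html
    builds a per-part TOC, and one "-" section per chapter. Anything before
    the first part heading (front matter) is dropped.
    """
    parts: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        part = PART_RE.match(stripped)
        chapter = CHAPTER_RE.match(stripped)
        if part:
            current = f"Part {part.group(1).title()}"
            parts[current] = [heading(current, "="), ".. contents::\n"]
        elif current is None:
            continue  # still in the front matter
        elif chapter:
            parts[current].append(heading(f"Chapter {chapter.group(1)}", "-"))
        else:
            parts[current].append(line)
    return {name: "\n".join(lines) for name, lines in parts.items()}


def index_rst(part_names: list[str], title: str = "Contents") -> str:
    """A page of links to each part's HTML file (file names are hypothetical)."""
    lines = [heading(title, "=")]
    for name in part_names:
        slug = name.lower().replace(" ", "_")
        lines.append(f"* `{name} <{slug}.html>`_")
    return "\n".join(lines)
```

Each returned string could then be saved to its own .rst file and run through rst2html, with html2lrf converting the resulting HTML, as described above. Keeping the per-part TOC inside each part file is what keeps any single TOC page list short.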
09-09-2007, 09:18 PM | #2 |
creator of calibre
Posts: 44,530
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Umm, txt2lrf already supports a lightweight text markup language, namely Markdown.
|
|
09-10-2007, 01:44 PM | #3 |
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
|
Thanks. I didn't know that. I'll go read up on it!
|
09-12-2007, 03:53 PM | #4 | |
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
|
On a separate note, does html2lrf have a way to make nested Reader TOCs? I notice that Sony's Operations Guide has that, and I'd find it much simpler to get to individual chapters using the number buttons and the page buttons than the joystick. But if they all go onto a single, huge TOC page, I'm looking at something on the order of 200 chapters (or 20 pages of TOC entries). I'd love to make that simpler to navigate by putting each part on its own page. Part One has 34 chapters already, so the sub-TOC for Part One would STILL take 4 pages! Thanks. Phrodod |
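To make those page counts concrete, here is a small Python calculation. It assumes roughly 10 TOC entries fit on one Reader page, a figure inferred from the numbers in the post (200 chapters -> 20 pages) rather than from any Sony or LRF specification; the function name is just for illustration.

```python
import math

# Assumption: about 10 TOC entries per Reader page, inferred from the
# post's own numbers, not from any Sony or LRF specification.
ENTRIES_PER_PAGE = 10


def toc_pages(entries: int, per_page: int = ENTRIES_PER_PAGE) -> int:
    """Pages needed for a flat TOC with `entries` items (rounded up)."""
    return math.ceil(entries / per_page)


print(toc_pages(200))  # whole book in one flat TOC -> 20
print(toc_pages(34))   # Part One on its own TOC page -> 4
```

Splitting by part shrinks each TOC, but as noted above, a 34-chapter part still spills onto several pages.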
|
09-12-2007, 04:17 PM | #5 |
creator of calibre
Posts: 44,530
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Considering that you can embed HTML in markdown, I find it hard to believe it's less capable. Some examples?
The Operations Guide is a PDF. As far as I know, the LRF format doesn't support defining a hierarchical TOC. |
|
09-12-2007, 10:10 PM | #6 | |
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
|
I appreciate having Markdown support, and I may use it in the future for simple documents (where a single-level TOC is sufficient!), but in this case I found it insufficient. One other thing I discovered: if an HTML document has multiple H2s with identical text, html2lrf adds only the first one to the TOC. So when I first attempted to convert Anna Karenina, I ended up with Part 1, Chapter 1, ..., Chapter 34, Part 2, Chapter 35, Part 3, Part 4, ... Part 2 has 35 chapters, but 1-34 are named identically to Part 1's 34 chapters, so they never showed up in the Reader's TOC menu. OTOH, they showed up fine in the in-line TOC in the book. Thanks for all your hard work! Phrodod |
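One possible workaround for the dropped entries is to make every heading unique before conversion, for example by prefixing each chapter title with its part. Below is a minimal Python sketch, assuming the heading titles are available as a list in document order and that part headings start with "Part ". The function names are hypothetical, and the sketch does not reflect how html2lrf builds its TOC internally; it only finds and removes the repeated titles that went missing.

```python
def duplicate_titles(titles: list[str]) -> list[str]:
    """Return titles that repeat an earlier title (the entries that went missing)."""
    seen: set[str] = set()
    dups: list[str] = []
    for title in titles:
        if title in seen:
            dups.append(title)
        seen.add(title)
    return dups


def qualify_titles(titles: list[str], part_prefix: str = "Part ") -> list[str]:
    """Prefix each non-part heading with the most recent part heading."""
    current_part = None
    result = []
    for title in titles:
        if title.startswith(part_prefix):
            current_part = title
            result.append(title)
        elif current_part is not None:
            result.append(f"{current_part}, {title}")
        else:
            result.append(title)  # heading before any part: leave unchanged
    return result


headings = ["Part 1", "Chapter 1", "Chapter 2", "Part 2", "Chapter 1", "Chapter 2"]
print(duplicate_titles(headings))                  # ['Chapter 1', 'Chapter 2']
print(duplicate_titles(qualify_titles(headings)))  # []
```

The renamed titles ("Part 2, Chapter 1" and so on) would be applied in the source text before running rst2html, so every H2 in the generated HTML is distinct.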
|
09-12-2007, 10:32 PM | #7 |
creator of calibre
Posts: 44,530
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, that's a bug. Open a report and I'll fix it as soon as I get some time.
|
09-13-2007, 03:50 AM | #8 |
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
|
|
|