02-04-2011, 07:53 AM | #1 |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
Text to HTML (or any e-book format, really) program that detects chapters?
I have many old text files that are broken up into chapters by simple labels: Chapter 1 (or Chapter I) and so forth.
I've been just loading them into Sigil, but then I have to go through each one and mark every chapter by hand. While this isn't that big a deal (it only takes about 10 minutes each), it does get kind of old. It seems like there must be a way to automate it, and then save the result as an HTML file with the chapters labeled so I can import that into Sigil. Any ideas? (Sadly, ten years ago I could probably have written a program to do this, but I don't even have a programming language installed on my computer anymore...) |
02-04-2011, 09:08 AM | #2 |
Zealot
Posts: 118
Karma: 36978
Join Date: Sep 2010
Location: Johannesburg, South Africa
Device: Kindle Android, Kindle 3 Wi-Fi
|
Hello Jeremy, would it be possible for you to use Calibre to do this?
|
02-04-2011, 10:17 AM | #3 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Regular expression find and replace?
It would depend on the particular system/editor. In Vim I would search for:
^\s*Chapter\s*[0-9XVI]*\s*$
That breaks down like this:
^ = start of line
\s* = any amount (including none) of whitespace (spaces, tabs, etc.)
Chapter = the word Chapter
\s* = any amount of whitespace again
[0-9XVI]* = a string of any length consisting only of 0-9, X, V, and I
\s* = any amount of whitespace again
$ = the end of the line
And replace with:
<h1>&<\/h1>
which is the matched text (& in Vim) with <h1> HTML tags around it. Not sure if Sigil supports something like that (I don't use it), but it might. If not, you can use any advanced text editor. |
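For anyone without Vim, the same search and replace can be sketched in Python. This is only an illustration of the pattern above, not something from the thread: the function name tag_chapters is made up, and [ \t]* stands in for \s* because Python's \s also matches newlines, which Vim's does not.

```python
import re

# Same idea as the Vim pattern: a line holding only "Chapter" plus an
# optional Arabic or Roman numeral. [ \t]* replaces \s* so that a match
# can never run across a line break (Python's \s includes "\n").
CHAPTER_LINE = re.compile(
    r"^[ \t]*(Chapter[ \t]*[0-9XVI]*)[ \t]*$",
    re.MULTILINE,
)


def tag_chapters(text: str) -> str:
    """Wrap each chapter label line in <h1> tags (hypothetical helper)."""
    # \1 is the label without the surrounding whitespace; Vim's & would
    # keep that whitespace inside the tags.
    return CHAPTER_LINE.sub(r"<h1>\1</h1>", text)


if __name__ == "__main__":
    sample = "Chapter 1\nIt was a dark night.\n  Chapter IV  \nMore text.\n"
    print(tag_chapters(sample))
```

Lines that merely begin with the word Chapter, such as "Chapter of accidents", are left alone because the pattern must reach the end of the line.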
02-04-2011, 09:50 PM | #4 |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
|
02-04-2011, 09:51 PM | #5 | |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
Quote:
|
|
02-05-2011, 01:58 PM | #6 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
|
02-05-2011, 11:15 PM | #7 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
calibre has an entire heuristic processing section. One of the heuristics detects chapter headings and applies the appropriate style to them. The TXT input defaults will try to auto-detect the paragraph type and the formatting used. If Textile or Markdown is not detected, the default is to enable the majority of the heuristic processing options.
|
02-06-2011, 12:31 PM | #8 | |
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
|
Quote:
It's cross-platform, free, and open source. The best Windows version is probably ActiveState's: http://www.activestate.com/activeperl/downloads They sell supported commercial versions, but a free Community edition is available, with source. You can also look at Text2HTML, an open source conversion program built on Perl. _______ Dennis |
|
02-10-2011, 07:05 PM | #9 | |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
Unless I am reading it wrong (and that is quite possible), it only searches things that have been tagged. But my problem is, as it's plain text, it hasn't been tagged with anything.
Quote:
Indeed, not only does it not detect chapters, it runs most of the paragraphs together, making the end result unreadable. So really, Calibre doesn't enter into this process. I am just looking for something to apply before loading the file into Sigil (which detects the paragraphs just fine). |
|
02-10-2011, 08:04 PM | #10 | |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
Quote:
|
|
02-10-2011, 09:16 PM | #11 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
I think Calibre could search for something like the word Chapter, if all your chapters start with that word. Have you tried just pasting the text onto an empty Sigil page? Dale |
|
02-10-2011, 10:29 PM | #12 |
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
|
Thanks for the help everyone (esp. frabjous), I've got it now.
Sigil will do it, sort of. It mostly supports regex. So I searched for
CHAPTER [0-9XVI]+
and then replaced with
<h3>\0</h3>
or, if you want it to add chapter break marks,
<hr class="sigilChapterBreak" /><h3>\0</h3> |
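If you would rather do this before the file ever reaches Sigil, here is a rough Python sketch of the same replacement applied to a whole text file. The heading pattern and the sigilChapterBreak markup come from the post above; everything else is an assumption for illustration: the function name text_to_html, the case-insensitive match, and the rule that blank lines separate paragraphs.

```python
import html
import re

# Heading pattern from the post, matched against a whole stripped line.
# IGNORECASE is an added assumption so "Chapter 2" matches like "CHAPTER 2".
HEADING = re.compile(r"chapter [0-9XVI]+", re.IGNORECASE)
CHAPTER_BREAK = '<hr class="sigilChapterBreak" />'


def text_to_html(text: str) -> str:
    """Convert plain text to HTML body markup (hypothetical helper)."""
    out = []
    paragraph = []  # lines of the paragraph currently being collected

    def flush() -> None:
        # Emit the collected lines as one <p> element, escaping &, <, >.
        if paragraph:
            out.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if HEADING.fullmatch(line):
            flush()
            out.append(CHAPTER_BREAK + "<h3>" + html.escape(line) + "</h3>")
        elif line:
            paragraph.append(line)
        else:
            flush()  # assumption: a blank line ends the paragraph
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    print(text_to_html("CHAPTER I\nFirst line\nsecond line.\n\nNext paragraph.\n"))
```

The output is only the body markup, so it still needs to be pasted into Sigil's code view or wrapped in a full HTML page. The sketch also puts a chapter break before the very first heading, which you may want to remove by hand.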
|