02-25-2011, 03:43 AM | #1 |
Addict
Posts: 234
Karma: 40
Join Date: Apr 2010
Device: The Nook, iPad
|
Help! txt to epub conversion markup: linked notes
Hi, I have a huge txt file that contains hundreds of short stories, each with a notes section, in a format like the example below. Can someone kindly explain how I can easily add markup for the notes in a text editor (hopefully something that can be done by global search and replace) so that, when converted to epub in Calibre, each note number and its corresponding note are linked to each other? The file restarts the note numbering at [1] under each story, so how can I make sure that note no. 1 for the word "queen" doesn't get mixed up with the words "shoe" and "alone", which are also labeled as note no. 1 (but in other stories)? Thank you so much! Following is a watered-down version of the text format (the real stories are much, much longer and usually have about 100 notes each):
## Story One:
"My God," said the Queen[1], "I'm pregnant[2]. I wonder who the father is."
Notes:
[1]queen: a female monarch.
[2]pregnant: with child or young as a woman or female mammal.

## Story Two:
For sale: baby shoes[1], never worn[2].
Notes:
[1]shoe: an external covering for the human foot.
[2]worn: past participle of wear - to carry or have (a garment, etc).

## Story Three:
The last man on Earth sat alone[1] in a room. There was a knock[2] on the door.
Notes:
[1]alone: apart, or isolated from others.
[2]knock: the sound of knocking, especially a rap, as at a door. |
02-25-2011, 07:27 AM | #2 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
It's going to be quite a bit of work.
A few questions: What text editor do you use? Can it search within a selection only? How many individual short stories are there? Does it have to be in Markdown? Does your text have any other markup apart from headers (###), i.e. bold/italics? The reason I ask is that if we convert it to Textile, it will be easier to do the footnotes. Do the footnotes need to link back to the place they were linked from? Some readers don't have a back button and need a link back. I can see how I'd do it in Textile (my preferred txt format), using two different search and replaces: do each story's text, then its footnotes, then the next story's text, then its footnotes. |
02-25-2011, 07:57 AM | #3 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Interesting problem. Are you familiar with regular expressions? If so, you could use the search & replace feature. I could see something along the lines of
Code:
(?s)(?P<number>\[\d+\])(?P<word>\w+:)(?P<text>.*?)(?P=number)(?P=word)
as the search expression, and
Code:
\g<number>\g<word>\g<text>\g<number>\g<word>
as the replacement. The caveat here is that this is just off the top of my head. This may work, or it may fail catastrophically. |
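A quick way to try the expression above before running it on the whole file is Python's `re` module, which accepts the same named-group syntax. The sketch below is only a check against part of the sample from the first post; the `sample` string is retyped here, with one note per line as an assumption about the real layout. `re.subn` reports how many replacements were made. On this sample it makes none, because the markers in the story text follow the word (`Queen[1]`), while the expression looks for `[1]queen:` occurring twice.

```python
import re

# First two sample stories, retyped; one note per line is an assumption.
sample = '''## Story One:
"My God," said the Queen[1], "I'm pregnant[2]. I wonder who the father is."
Notes:
[1]queen: a female monarch.
[2]pregnant: with child or young as a woman or female mammal.
## Story Two:
For sale: baby shoes[1], never worn[2].
Notes:
[1]shoe: an external covering for the human foot.
[2]worn: past participle of wear - to carry or have (a garment, etc).
'''

# Search and replace expressions exactly as posted.
search = r"(?s)(?P<number>\[\d+\])(?P<word>\w+:)(?P<text>.*?)(?P=number)(?P=word)"
replace = r"\g<number>\g<word>\g<text>\g<number>\g<word>"

# subn returns the new text and the number of substitutions made.
result, count = re.subn(search, replace, sample)
print(count)
```

If the count is zero, the search expression would need adapting (for example, to look for the word before the bracketed number in the story text) before any link markup is added to the replacement.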
02-25-2011, 09:03 AM | #4 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
The BIG problem is that all the footnotes are re-using the same numbers.
Manichean, One of the problems is that Markdown can't do id tags. (I suppose we could add html for those tags, then it would be a reasonably simple regex) |
02-25-2011, 09:39 AM | #5 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Right, I've done it with the small example given.
It takes three search and replaces: two on the original Markdown text file, then, once converted to epub, one S&R done in Sigil. Also, the footnotes have to be separated by blank lines, instead of being on consecutive lines as in the example.
Open the original Markdown txt file.
Search
Code:
^\[(\d+)\](\S+)
Replace
Code:
<sup>\1</sup><a id="fn\1" href="#fnr\1">\2<a>
Then search
Code:
(\S+)\[(\d+)\]
Replace
Code:
\1<sup><a id="#fnr\2" href="#fn\2">\2</a></sup>
Save the file and convert it to epub with calibre, using page breaks in structure detection with the default XPath expression, so that each story ends up in its own file.
Then open it in Sigil and search
Code:
../Text/index_split_000.html
Edit: replace all occurrences with nothing in Code View, using replace in all files.
Save, and you should then have the correctly done file. Last edited by Perkin; 02-25-2011 at 09:41 AM. Reason: Added info |
02-25-2011, 12:04 PM | #6 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Huh? First off, I'm suggesting a regex in the search and replace field using named backreferences, I don't understand your Markdown comment... Second, I use the number and the word to identify the footnote, so reusing numbers shouldn't be a problem. I think that using what I suggested you could do in one S&R pass what you suggested doing using multiple passes plus Sigil.
|
02-25-2011, 12:43 PM | #7 |
Addict
Posts: 234
Karma: 40
Join Date: Apr 2010
Device: The Nook, iPad
|
Thanks, Perkin and Manichean.
I am using UltraEdit version 15. I tried Perkin's replacements, but they do not work: all note numbers reference the first story's notes, and I do need the notes to link back to the original text. The original text has about 500 stories, each varying in length from a few sentences to a few pages. I am still trying to understand Manichean's suggestion, as I am not very good at regular expressions. Will give it a try once I understand this better. |
02-25-2011, 12:45 PM | #8 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Sorry, just the way I worded it. (I'm not very clear when trying to explain things)
I tried to say that the BIG problem is that the OP's file uses the same numbers for each of the stories, so any link reference written in the replace would link to the first footnote with that number in the converted file. Hence the need to use Sigil and the replace of the '../Text/index_split_000.html': that way the links only point to footnotes in their own file rather than all jumping to the one in the first file. Since Markdown has no markup for giving a link target an id, which an html link such as <a href="#idname"> needs, I had the replace insert what the html would be in the final output instead of Markdown. Going from the OP's sample, it seemed easier to use the two S&Rs to do the task, although using your one regex would work just as well. The fiddly bit was working out what would work in the final output; that's why I broke it into two, working each one out separately. Hope it's clearer (note I said clearer and not clear) |
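An alternative to relying on separate files, not proposed in the thread, is to make the ids themselves unique per story by prefixing them with a story counter. The following is a minimal sketch in Python under several assumptions: every story starts with a line beginning `## `, every note starts its own line, and the function name `link_notes` and the `s<N>` prefix are made up for illustration.

```python
import re

def link_notes(text):
    """Add per-story footnote links so that repeated numbers do not collide."""
    # Split in front of every "## " header line; the header stays with its story.
    # Anything before the first header (possibly empty) becomes chunk 0.
    chunks = re.split(r"(?m)^(?=## )", text)
    out = []
    for n, chunk in enumerate(chunks):
        prefix = f"s{n}"  # hypothetical id prefix, e.g. s1, s2, ...
        # Note lines: "[1]queen: ..." becomes the link target plus a back link.
        chunk = re.sub(
            r"(?m)^\[(\d+)\](\S+)",
            rf'<sup>\1</sup><a id="{prefix}fn\1" href="#{prefix}fnr\1">\2</a>',
            chunk,
        )
        # Markers in the story text: "Queen[1]" becomes a link to the note.
        chunk = re.sub(
            r"(\S+)\[(\d+)\]",
            rf'\1<sup><a id="{prefix}fnr\2" href="#{prefix}fn\2">\2</a></sup>',
            chunk,
        )
        out.append(chunk)
    return "".join(out)

sample = (
    "## Story One:\n"
    "said the Queen[1].\n\n"
    "[1]queen: a female monarch.\n\n"
    "## Story Two:\n"
    "baby shoes[1].\n\n"
    "[1]shoe: an external covering for the human foot.\n"
)
linked = link_notes(sample)
print(linked)
```

Because each story gets its own prefix, note 1 of the first story and note 1 of the second end up with different ids, whether or not the stories are later split into separate files.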
02-25-2011, 12:51 PM | #9 | |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Quote:
Then the S&R in Sigil, which replaces the links' 'jump' reference by removing the '../Text/index_split_000.html', makes them link only to the footnote links in their own file (story). Edit: I tried it on the 'Three story' sample, and it did work: afterwards the epub's footnote links only linked to their correct places, and so did the back links. Last edited by Perkin; 02-25-2011 at 12:55 PM. |
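The effect of that last step can be shown on a single link. The sketch below uses Python only to illustrate the string change; the exact shape of the `href` that the conversion produces is an assumption here, inferred from the file name mentioned above.

```python
# A link as it might look after conversion (assumed shape): the href
# points into the first split file instead of the current file.
page = ('<p>said the Queen<sup><a id="fnr1" '
        'href="../Text/index_split_000.html#fn1">1</a></sup></p>')

# Removing the file name leaves a fragment-only href, which resolves
# inside whichever file the link itself lives in.
fixed = page.replace("../Text/index_split_000.html", "")
print(fixed)
```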
|
02-25-2011, 01:07 PM | #10 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Perkin, now I understand what you mean.
Please note that if you decide to go with my method, you'd still need to adapt the replace text to include whatever Markdown uses for links. I don't know Markdown, so I only wrote down what ought to reproduce the input. |
02-25-2011, 01:14 PM | #11 | |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Quote:
Last edited by Perkin; 02-25-2011 at 01:17 PM. |
|
02-25-2011, 01:25 PM | #12 |
Addict
Posts: 234
Karma: 40
Join Date: Apr 2010
Device: The Nook, iPad
|
Thanks, Perkin, I see what you mean and am getting closer. But how do you force a split on chapters in Calibre? I guess that's what I am missing in the conversion.
|
02-25-2011, 01:34 PM | #13 | ||
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
For the conversions I did:
In the 'Conversion - Structure detection' section, for Detect chapters (default) Quote:
For Insert page breaks before, I've got Quote:
|
||
02-25-2011, 02:09 PM | #14 |
Addict
Posts: 234
Karma: 40
Join Date: Apr 2010
Device: The Nook, iPad
|
Thanks, Perkin. It only works one way (from the main story to the notes); the back link doesn't work. I guess your first S&R has a syntax error: why does it end with <a> and not </a>? And I am guessing the second S&R has an issue too: why are there two # in it? Shouldn't one of them be without a #? I am not familiar with regular expressions, but it looks fishy. I will edit and recompile to see how it works. Many thanks for the help.
Last edited by Sylver; 02-25-2011 at 02:21 PM. |
02-25-2011, 02:32 PM | #15 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Sorry
You're right on both counts, add the / to the <a> in the first replace statement and remove the first # (in the id tag) in the second replace. Next time I'll copy and paste them rather than type them out. Apologies. |
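With both corrections applied, the two text-file replacements can be tried outside the editor. The following is a minimal sketch in Python, assuming its `re` module treats these patterns the same way the editor's regex engine does. The `story` string is the first sample story with its footnotes separated by blank lines, and `re.MULTILINE` is needed so that `^` matches at the start of every line.

```python
import re

story = (
    '## Story One:\n'
    '"My God," said the Queen[1], "I\'m pregnant[2]. I wonder who the father is."\n'
    '\n'
    'Notes:\n'
    '\n'
    '[1]queen: a female monarch.\n'
    '\n'
    '[2]pregnant: with child or young as a woman or female mammal.\n'
)

# First S&R (note lines), with the closing tag corrected to </a>.
linked = re.sub(
    r'^\[(\d+)\](\S+)',
    r'<sup>\1</sup><a id="fn\1" href="#fnr\1">\2</a>',
    story,
    flags=re.MULTILINE,
)

# Second S&R (markers in the story text), with the # removed from the id.
linked = re.sub(
    r'(\S+)\[(\d+)\]',
    r'\1<sup><a id="fnr\2" href="#fn\2">\2</a></sup>',
    linked,
)
print(linked)
```

The order matters: the note lines are rewritten first, so they no longer contain a bracketed number by the time the second pattern runs.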
|