02-01-2009, 09:25 AM | #1 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
line formatting question
We are all aware of those pesky text documents that have a hard return/line break at the end of each line. Fortunately, there are usually two line breaks at the end of a paragraph so it is relatively easy to format out the line breaks at the end of each line and yet keep the paragraph structure.
Here's my problem: I have a book in lit format that I converted to lrf in Calibre and the line breaks were scattered all over the place and it looked awful. I then converted the lit file with ConvertLit to html and then to text/rtf. There is a hard line break at the end of every single line, and there are not two breaks at the end of paragraphs. If I remove the line breaks I have a 400 page document with no structure at all. I have also tried to convert in BookDesigner with the same problem -- no paragraph structure. Any ideas? |
02-01-2009, 02:11 PM | #2 |
creator of calibre
Posts: 44,377
Karma: 23764838
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The algorithm to use in such a case is based on average line length. First calculate the average line length; then, wherever a line is significantly shorter than that, don't remove the break after it. That will handle most of the breaks correctly.
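A minimal sketch of that idea in Python, in case it helps. The function name `unwrap_by_line_length` and the `short_ratio` cutoff of 0.75 are assumptions made for illustration, not anything calibre is known to use; the cutoff will probably need tuning per book.

```python
from statistics import mean


def unwrap_by_line_length(text, short_ratio=0.75):
    """Join hard-wrapped lines, keeping a break after 'short' lines.

    short_ratio is a hypothetical tuning knob: a line shorter than
    short_ratio * (average line length) is treated as a paragraph end.
    """
    lines = [ln.rstrip() for ln in text.splitlines()]
    lengths = [len(ln) for ln in lines if ln]
    if not lengths:
        return ""
    threshold = mean(lengths) * short_ratio

    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # An existing blank line always ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln.strip())
        if len(ln) < threshold:
            # Significantly shorter than average: keep this break.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Short lines inside a paragraph (dialogue, for example) will be split wrongly by this heuristic, so expect to clean up a few breaks by hand.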
|
02-01-2009, 02:17 PM | #3 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
I will try working on that... thanks.
|
02-01-2009, 02:52 PM | #4 |
Wizard
Posts: 3,454
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
For such books I use a Vim script (www.vim.org - a very powerful text editor).
You can write a command in Vim saying "find every line NOT ending with a dot, question mark, exclamation point or closing quote, optionally followed by a space character, and join it with the next line":
:vglobal/[.!?"']\s*$/join
I often abbreviate the above command this way:
:v/[.!?"']\s*$/j
That is it.
You can also say:
"find every line ending with .!?" and enter an empty line after it"
"find every line shorter than (let's say) 50 characters and enter an empty line after it"
"find two empty lines and replace them with one empty line"
"Join paragraphs"
"delete empty lines"
That should take care of 99 percent of the excessive newline characters. You have to tweak the above steps for a particular book, because every single misformatted book is unique.
You can also have a look at the html file and try to distinguish between wanted and unwanted line breaks. Most often, unfortunately, the html file is generated by MSWord. MSWord is THE most horrible tool for producing html format. You can also try to process the html file with the program html_tidy http://www.w3.org/People/Raggett/tidy/ |
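For anyone without Vim, here is a rough Python sketch of the same join rule. It is not a line-for-line port of Vim's :vglobal behaviour: it keeps joining until a line ends a sentence, and it drops blank lines while doing so. The name `join_unfinished_lines` is made up for this example; the regular expression is the one from the Vim command above.

```python
import re

# Same pattern as in the Vim command: the line ends with . ! ? " or '
# optionally followed by whitespace.
SENTENCE_END = re.compile(r"""[.!?"']\s*$""")


def join_unfinished_lines(text):
    """Join every line that does not end a sentence with the lines after it."""
    joined, pending = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            pending.append(stripped)
        if SENTENCE_END.search(line):
            # The line ends with sentence punctuation: close the run here.
            joined.append(" ".join(pending))
            pending = []
    if pending:
        # Trailing text that never reached sentence-ending punctuation.
        joined.append(" ".join(pending))
    return "\n".join(joined)
```

As with the Vim version, a wrapped line that happens to end in a full stop mid-paragraph stays split, so the result still needs a per-book look.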
02-01-2009, 03:37 PM | #5 |
Resident Curmudgeon
Posts: 75,975
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
This does not sound like it is a legal LIT file. Where did this LIT come from?
|
02-01-2009, 05:00 PM | #6 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
@kacir... very interesting. I will need to look into this program.
@JSWolf... this is actually a lit book that I legally purchased several years ago. I am trying to format it for my Sony 505. I have actually run across several lit books that do not convert easily. |
02-01-2009, 05:06 PM | #7 |
creator of calibre
Posts: 44,377
Karma: 23764838
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
These are LIT books that embed TXT files instead of HTML files.
|
02-01-2009, 05:27 PM | #8 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
|
02-02-2009, 03:04 AM | #9 |
Wizard
Posts: 3,454
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
|
02-06-2009, 11:47 AM | #10 |
Resident Curmudgeon
Posts: 75,975
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
With the wonky LIT, use lit2oeb to convert the LIT into its component parts. Then you can fix it up so it converts the way you want.
|
|