06-07-2009, 10:03 PM | #1 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Removing Line-breaks / Preserving Paragraphs
Please find attached repartee, a Python script that, I believe, should do a fairly good job of automatically removing line-breaks without interfering with paragraph-breaks.
I just finished throwing it together, so it doubtless leaves much to be desired. However, I would be grateful if people could either test it or point me to some unorthodoxly line-broken/paragraph-broken files on which I could try the program myself.
The script doesn't touch the input file (unless you purposely specify the input file's name as the output file as well) and is programmed not to output anything if it doesn't think it can tell line-breaks apart from paragraph-breaks. If you find a file that the script should fix (i.e., it has both line-breaks and paragraph-breaks) but it refuses, saying "Unable to find a clear and/or consistent line break / paragraph break pattern.", please send the file (or a portion thereof) my way for analysis.
Keep in mind, though, that the script is meant to be used on full-size plaintext novels or reasonably long short stories. It is more likely to break on very short pieces of text, almost certainly won't do anything useful with flash fiction, and may behave erratically with complexly formatted text files (e.g., language textbooks and other similarly non-novel material).
- Ahi
P.S.: In particular, I would be grateful, Gideon, if you tried it on the file you recently had trouble with and let me know the results. |
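For readers without the attachment, here is a minimal sketch of the general idea described in the post above: join hard-wrapped lines, keep paragraph breaks, and refuse to produce output when no clear pattern is found. It is not repartee's actual code. The function name `unwrap` and the assumption that paragraphs are separated by blank lines are illustrative only.

```python
import re


def unwrap(text):
    """Join hard-wrapped lines while keeping paragraph breaks.

    Assumption (not taken from repartee): paragraphs are separated by one
    or more blank lines. Returns None when the text does not show both
    paragraph breaks and line breaks, mirroring the "refuse rather than
    guess" behaviour described in the post.
    """
    text = text.replace("\r\n", "\n")
    # Split on blank (or whitespace-only) lines to get candidate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n[ \t]*\n+", text) if p.strip()]
    # A fixable file has at least two paragraphs, and at least one of them
    # is wrapped over several lines.
    has_line_breaks = any("\n" in p for p in paragraphs)
    if len(paragraphs) < 2 or not has_line_breaks:
        return None
    # Join the lines of each paragraph with single spaces.
    joined = [" ".join(line.strip() for line in p.split("\n")) for p in paragraphs]
    return "\n\n".join(joined)
```

Calling `unwrap(open("novel.txt").read())` (a hypothetical file name) and writing the result to a different file only when it is not `None` would leave the input untouched, as the post describes.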
06-08-2009, 01:44 AM | #2 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Here's the updated version, with the parsing/identification logic fixed so that it no longer trips up on files that have two spaces after punctuation and other similar quirks.
- Ahi
P.S.: It should be noted that repartee does not abuse poems and other similar text, so long as each line is indented by a few spaces at the beginning (which sets it apart from other lines). |
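The indentation rule mentioned above can be illustrated with a short sketch. This is an assumption about one way to do it, not repartee's implementation: the name `unwrap_keep_indented` and the `min_indent` threshold are hypothetical, and only leading spaces (not tabs) are counted as indentation.

```python
def unwrap_keep_indented(text, min_indent=2):
    """Join wrapped prose lines but pass indented lines through unchanged.

    Assumptions (illustrative only): blank lines separate paragraphs, and
    any line starting with at least `min_indent` spaces is verse or other
    preformatted text that must keep its own line.
    """
    out = []  # finished output lines
    buf = []  # pieces of the prose paragraph currently being joined

    def flush():
        # Emit the buffered prose lines as one joined line.
        if buf:
            out.append(" ".join(buf))
            buf.clear()

    for line in text.split("\n"):
        stripped = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        if not stripped:
            flush()
            out.append("")             # keep the paragraph break
        elif indent >= min_indent:
            flush()
            out.append(line.rstrip())  # indented line: leave as-is
        else:
            buf.append(stripped)       # ordinary prose: join later
    flush()
    return "\n".join(out)
```

With this approach a poem indented by four spaces keeps one verse per line, while the prose paragraphs around it are joined.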
06-08-2009, 02:00 AM | #3 |
Wearer of Pants
Posts: 1,050
Karma: 7634
Join Date: Jan 2008
Location: Norman, OK
Device: Amazon Kindle DX / iPhone
|
I gave it a quick go with a file that was like what my notorious file from the other day was supposed to be (it really did have no spaces after each line, just a paragraph). Most paragraphs did begin with a tab, however. But the script just said nothing could be identified, so nothing could be done.
|
06-08-2009, 02:07 AM | #4 | |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Quote:
- Ahi
P.S.: Or, post the output (assuming you are using the zip file from the second post). That one gives some numbers as to what it identifies as word space, line-break, et al.
Last edited by ahi; 06-08-2009 at 02:11 AM. |
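The thread does not show how repartee arrives at those numbers. As a rough, assumed illustration of what such a report could be based on, the sketch below counts every distinct run of whitespace in a text; the function name `whitespace_census` is hypothetical.

```python
import re
from collections import Counter


def whitespace_census(text):
    """Count each distinct run of whitespace found between words.

    In a typical hard-wrapped file one would expect " " to be the most
    common run (word space), "\n" the next (line-break), and something
    rarer such as "\n\n" or "\n\t" to mark paragraph-breaks. This is an
    assumption for illustration, not repartee's documented logic.
    """
    return Counter(re.findall(r"\s+", text.strip()))


if __name__ == "__main__":
    census = whitespace_census("one two\nthree four\n\tfive six\nseven")
    for run, count in census.most_common():
        print(repr(run), count)
```

A file whose paragraphs begin with a tab, like the test file described in post #3, would show up here as a distinct `"\n\t"` run alongside the plain `"\n"` line-breaks.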
|
06-08-2009, 02:12 AM | #5 |
Wearer of Pants
Posts: 1,050
Karma: 7634
Join Date: Jan 2008
Location: Norman, OK
Device: Amazon Kindle DX / iPhone
|
This was just something I put together to test it. But I can post it. I only created five paragraphs, though.
|
06-08-2009, 02:22 AM | #6 | |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Quote:
Generally speaking, though, it's more likely to deal well with longer text.
- Ahi
Last edited by ahi; 06-08-2009 at 02:27 AM. |
|