06-03-2007, 11:56 PM | #1 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2007
Device: Amazon Kindle Voyage
|
Quick Reformatting of Terrible E-Books
I've been thinking about this, and since I don't know Perl I can't write it myself, but a script that did the following would be amazingly useful for formatting eBooks for viewing on mobile devices.
Scan all files in a directory (and its subdirectories, hey, why not) and replace every <newline> that is not immediately followed by another <newline> or a <tab> with a single space.
Reasoning behind this: I've seen entirely too many eBooks that use manual line breaks instead of relying on word wrap, and when they are transferred to a mobile device you end up with fragmented lines: "This is a bunch of text serving as example of incorrect word wrap due to stupid formatting of eBooks. I really wish there was some way to fix it, because it's almost impossible to read this terribly formatted text."
Can anyone think of any files they have that this command would damage? I wouldn't want to run it on poetry, but other than that, it seems this script could be safely run on normally formatted eBooks without changing anything. |
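For anyone who wants to try the idea, here is a minimal Python sketch of the rule described above. It is not a tested tool. The names `unwrap` and `unwrap_tree`, the `*.txt` filter, and the Latin-1 decoding (used only so that every byte round-trips unchanged) are all assumptions of this sketch. It also adds one condition the post does not state: a newline that directly follows another newline is left alone, so blank lines between paragraphs survive.

```python
import re
from pathlib import Path

# A newline is treated as a "soft" wrap when it is not preceded by a newline
# (assumption added here) and is followed by something other than a newline or tab.
SOFT_BREAK = re.compile(r"(?<!\n)\n(?=[^\n\t])")


def unwrap(text):
    """Join hard-wrapped lines; keep blank lines and tab-indented paragraph starts."""
    text = text.replace("\r\n", "\n")  # Windows line endings become plain newlines
    return SOFT_BREAK.sub(" ", text)


def unwrap_tree(root, pattern="*.txt"):
    """Rewrite matching files under root in place; return how many were changed."""
    changed = 0
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Latin-1 maps every byte to one character, so nothing is lost on re-encoding.
        original = path.read_bytes().decode("latin-1")
        fixed = unwrap(original)
        if fixed != original:
            path.write_bytes(fixed.encode("latin-1"))
            changed += 1
    return changed
```

Calling `unwrap_tree("books")` on a copy of a folder (the folder name is hypothetical) would rewrite its text files in place. As the post says, poetry and anything else that relies on line breaks will be flattened, so work on copies.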
06-04-2007, 12:00 AM | #2 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2007
Device: Amazon Kindle Voyage
|
Note to self: Check previous posts before making new post. Your question may already have been discussed.
|
06-04-2007, 12:15 AM | #3 |
eNigma
Posts: 503
Karma: 1335
Join Date: Dec 2006
Location: The Philippines
Device: HTC G1 Android FBReader
|
Previous threads on scripting
Sometimes it is hard to know what to search for. I am familiar with this thread and this one too that discuss scripting and handling the text-formatting problem that concerns you.
I hope this helps. |
06-05-2007, 11:46 AM | #4 |
Resident Curmudgeon
Posts: 75,244
Karma: 133361584
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
WYSIWYG Editor is broken
Last edited by JSWolf; 06-05-2007 at 11:49 AM. Reason: WYSIWYG Editor is broken |
06-05-2007, 11:48 AM | #5 | |
Resident Curmudgeon
Posts: 75,244
Karma: 133361584
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
06-21-2007, 11:13 AM | #6 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
|
With Project Gutenberg books in text file format, I just paste them into a Word document, then run Stingo's Macro. This only takes a couple of minutes and solves the hard carriage returns.
|
06-21-2007, 02:59 PM | #7 |
Resident Curmudgeon
Posts: 75,244
Karma: 133361584
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
If there is an HTML version available, I go for that one. You'll get images if there are any, and italics. It's not hard to work with the HTML in Book Designer. If you use the text file instead, you lose whatever attributes and images there might be. So please use the HTML when one exists.
|
06-21-2007, 04:46 PM | #8 |
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
|
Stingo's macro just looks for double paragraph marks, doesn't it? It won't help if you have a file that doesn't have an extra line between paragraphs (as often happens with files that have been through a PDF stage somewhere in their history). I've been thinking of writing a Perl script to make a "best guess" based on line length. I'll be doing some Perl work this summer, and may have a chance to slip it in then. I'll post it somewhere on MobileRead (in the wiki, maybe) if I get a reasonable version working.
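One way such a "best guess" could work, sketched in Python rather than Perl: treat the median line length as the wrap width and assume that any noticeably shorter line ends a paragraph. The function name `guess_paragraphs` and the 0.8 threshold are assumptions made for illustration, not anything posted in this thread, and the heuristic will misfire on dialogue, headings, and verse.

```python
from statistics import median
from typing import List


def guess_paragraphs(text: str, ratio: float = 0.8) -> List[str]:
    """Rebuild paragraphs by assuming a short line marks the end of a paragraph."""
    lines = [ln.strip() for ln in text.splitlines()]
    lengths = [len(ln) for ln in lines if ln]
    if not lengths:
        return []
    width = median(lengths)  # assumed to approximate the original wrap width
    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        if len(ln) < ratio * width:
            # A line well short of the wrap width is taken as a paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

The threshold would need tuning per file. A last line that happens to fill the full width is indistinguishable from a wrapped line, so some paragraphs will still run together.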
|
06-21-2007, 08:47 PM | #9 |
Resident Curmudgeon
Posts: 75,244
Karma: 133361584
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
With the HTML from PG, there is no need to reformat it to remove the extra line spaces. It works just fine in BD as is. And if there are line spaces, they are meant to be there.
|
06-21-2007, 11:22 PM | #10 |
eNigma
Posts: 503
Karma: 1335
Join Date: Dec 2006
Location: The Philippines
Device: HTC G1 Android FBReader
|
When designing scripts to deal with hard carriage returns, it is good to be able to actually see which character codes are causing the problem. A programmer's editor is the tool to start with for your basic research. You can read more here.
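If no programmer's editor is at hand, a few lines of Python give a similar view. This sketch counts every byte that is a control character or falls outside 7-bit ASCII and lists the counts in hex. The function names and the `$xx` output format are assumptions chosen for illustration.

```python
from collections import Counter


def suspicious_bytes(data: bytes) -> Counter:
    """Count every byte that is a control character or outside printable ASCII."""
    return Counter(b for b in data if b < 0x20 or b > 0x7E)


def report(data: bytes) -> str:
    """Format the counts as one '$xx  xN' line per byte value, in ascending order."""
    counts = suspicious_bytes(data)
    return "\n".join(f"${b:02x}  x{n}" for b, n in sorted(counts.items()))
```

Feeding it the raw contents of a file, for example `report(open("book.txt", "rb").read())` with a hypothetical filename, shows at a glance whether the line breaks are `$0d$0a` pairs, bare `$0a`, or something stranger.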
|
06-22-2007, 08:32 AM | #11 | |
Uebermensch
Posts: 2,583
Karma: 1094606
Join Date: Jul 2003
Location: Italy
Device: Kindle
|
If you deal with pre-formatting PG books, also check out their FAQ, which provides some useful tips.
Quote:
|
|
06-22-2007, 11:17 AM | #12 |
eNigma
Posts: 503
Karma: 1335
Join Date: Dec 2006
Location: The Philippines
Device: HTC G1 Android FBReader
|
Dealing with ugly line spacing
Let me give an example:
My favorite file format for the Reader is plain old ASCII text. The title on the Reader turns out to be the same as the filename. I like that. I can make text files from many other formats. The middle font size on the Reader is right for normal reading, and I can go one bigger if the lighting is bad. I don't have to experiment a lot when I am in a hurry to read something.
So I got a book in lit format and converted it to lrf. The resultant line formatting was just plain ugly. There were sentence fragments everywhere and way too many spaces between lines. I decided to tighten it up.
I used Amber lit converter (abclit) to convert the original lit file to text. Then I opened the file in PSPad (see earlier post for source). I used the hex display mode to examine the character structure of the ugliness. I noticed that there were $0d$0a pairs everywhere. That is a carriage return line feed combination. But at the beginning of every real paragraph there was an $a0 character. That is a space character with the high order bit set; I don't know why anybody put that character in there. It is not common. But I liked it because it gave me a way to reformat everything easily.
First I used search and replace to find all the $0d$0a pairs and replace them with $20 (space). Then I replaced all the $a0 characters with $0d$0a pairs. The result was pure beauty! The paragraphs all flowed well and there were no unwanted line spaces. It took five minutes!
Last edited by mogui; 06-22-2007 at 11:20 AM. |
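The same two replacements can be scripted. Below is a minimal Python sketch, assuming the file is in a single-byte encoding such as Latin-1 or Windows-1252, where $a0 is the non-breaking space. In a UTF-8 file the byte $a0 can appear inside multi-byte characters, and this would corrupt them. The function name `reflow` is hypothetical.

```python
def reflow(data: bytes) -> bytes:
    """Apply the two search-and-replace passes from the post, in the same order."""
    data = data.replace(b"\r\n", b" ")      # pass 1: each $0d$0a pair becomes $20
    return data.replace(b"\xa0", b"\r\n")   # pass 2: each $a0 becomes a $0d$0a pair
```

The order matters: doing the second pass first would create new $0d$0a pairs that the first pass would then turn back into spaces. Note that this only works for files that happen to carry a marker like $a0 at each paragraph start.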
06-22-2007, 11:39 AM | #13 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
While MS Word (and OpenOffice) are nice tools for documents, nothing beats UltraEdit for major work on raw text files. I use it frequently when preparing the Harvard Classics series of books. It is a commercial product, but for me it has been worth it.
|
08-03-2007, 10:20 PM | #14 | ||
Member
Posts: 10
Karma: 3650
Join Date: Dec 2004
Device: Tungsten TC
|
Quote:
Quote:
http://www.simtel.net/product.php%5B...t_page%5D76296 |
||
08-03-2007, 11:15 PM | #15 | |
Resident Curmudgeon
Posts: 75,244
Karma: 133361584
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
I've been thinking of trying to find a better text editor than Notepad. |
|
|