11-04-2014, 02:39 AM | #1 |
Guru
Posts: 869
Karma: 2676800
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
Some ebook creators just can't be helped...
So I'm not really sure if this is the most appropriate place to put this. Anyway...
So, I got a book from Amazon, imported it into Calibre, then converted it to EPUB as you do. The first sign that something was amiss was that it took Calibre over two and a half minutes to convert the book; normal conversions only take around 10 seconds. Naturally I wanted to see what the problem was, so I opened the book in the Editor. I let loose a string of expletives. Then I let loose a few more for good measure. I then opened the original AZW3 in the ebook editor. The editor reports that the HTML file is 6.5 MB. This is not the Bible, mind you, just a novel of run-of-the-mill length. So, what does the body of the text look like? Here is a very small sample:
Code:
<p class="MsoNormal" style="text-align:justify;text-indent:.25in"><span style="font-size:0.92rem">"<span style="letter-spacing:-.05pt">W</span>h<span style="letter-spacing:-.05pt">a</span>t<span style="letter-spacing:1.4pt"> </span><span style="letter-spacing:-.1pt">s</span><span style="letter-spacing:.05pt">a</span>y<span style="letter-spacing:1.25pt"> </span><span style="letter-spacing:-.3pt">y</span><span style="letter-spacing:.05pt">o</span>u<span style="letter-spacing:.05pt">?</span>"<span style="letter-spacing:1.5pt"> </span>
This happens throughout the entire book. Sorry about the rant; I just felt I had to get it out.
EDIT: This bit of regex solved the problem...
Code:
<span style="letter-spacing[^>]+>([^<]+)</span>
Last edited by sherman; 11-04-2014 at 02:46 AM.
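For anyone who wants to run the same cleanup outside Calibre's editor, here's a minimal Python sketch that applies the regex above with `re.sub`, replacing each match with the captured text (in Calibre's Find & Replace in Regex mode, the equivalent replacement should be `\1`). Like the regex itself, it assumes the letter-spacing spans never contain nested tags; a span wrapping `<i>` or similar is left untouched. The function name is just for illustration.

```python
import re

# The regex from the post: a <span> whose style attribute begins with
# "letter-spacing", capturing the plain text inside it (no nested tags).
LETTER_SPACING_SPAN = re.compile(r'<span style="letter-spacing[^>]+>([^<]+)</span>')


def strip_letter_spacing(html: str) -> str:
    """Replace every letter-spacing span with just the text it wraps."""
    return LETTER_SPACING_SPAN.sub(r"\1", html)


# Fragment of the sample above (the outer span/p are unclosed, as in the post).
sample = (
    '<span style="font-size:0.92rem">"'
    '<span style="letter-spacing:-.05pt">W</span>h'
    '<span style="letter-spacing:-.05pt">a</span>t'
    '<span style="letter-spacing:1.4pt"> </span>'
    '<span style="letter-spacing:-.1pt">s</span>'
    '<span style="letter-spacing:.05pt">a</span>y'
    '<span style="letter-spacing:1.25pt"> </span>'
    '<span style="letter-spacing:-.3pt">y</span>'
    '<span style="letter-spacing:.05pt">o</span>u'
    '<span style="letter-spacing:.05pt">?</span>"'
    '<span style="letter-spacing:1.5pt"> </span>'
)

print(strip_letter_spacing(sample))
```

The font-size span is kept because its style doesn't start with `letter-spacing`; that one can be dealt with separately (or by CSS) once the per-letter spans are gone.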
11-04-2014, 02:48 AM | #2 |
Wizard
Posts: 1,154
Karma: 3252017
Join Date: Jan 2008
Location: Germany
Device: Pocketbook Touch Lux (623)
|
OK, that is... special. Even for MS Word's atrocious HTML export.
11-04-2014, 02:51 AM | #3 |
Guru
Posts: 869
Karma: 2676800
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
|
11-04-2014, 02:58 AM | #4 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
That's usually the only time you'll see that much cruft, coupled with the MsoNormal body style. I'm not a bettor, but that's my best (and fairly educated) guess: Adobe is in the mix, either that or some other faux convert-a-book tool, exporting a print-layout PDF into "Word." FWIW.
Hitch
|
11-04-2014, 03:05 AM | #5 |
Guru
Posts: 869
Karma: 2676800
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
Wherever the original source came from, this was definitely an eye-opener for me.
As an aside, I have found that Word can output acceptable (filtered) HTML. However, that relies on the author knowing how to use styles...
11-04-2014, 03:16 AM | #6 |
Guru
Posts: 869
Karma: 2676800
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
Grrr... Another small rant.
One would think that in this day and age, authors would know what a page break is.
11-04-2014, 04:36 AM | #7 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
There is simply no way of getting around it: someone, somewhere, has to apply styles. Whether the author does it initially (in which case they mostly don't need bookmakers), or the bookmaker cleans it all up and applies them, either in Word or ??? or in HTML via CSS...it's all the same thing. It's the same even in print layout. There's no alternative. IME, which is not small, Word frankly does a better job of outputting reasonably decent styles in filtered HTML than anything else, period. It's faster to clean up than almost anything else, and there are more tools, macros, etc. out there to assist. {shrug} Lots of folks will argue that "markup is better," and of course, in some ideal parallel universe, that's true...but no author is going to write in MARKUP; it's just daft. Of the remaining choices, programs, etc., Word is the best option, all told.
Hitch
|
11-04-2014, 05:15 AM | #8 |
Fanatic
Posts: 563
Karma: 403106
Join Date: Aug 2014
Device: PRS-T1
|
I believe the word processing software did that during conversion, through a built-in whatever-my-format-to-ePub (AZW) export. I haven't used enough word processors/editors to recognize the "style", but it looks like a pro system, I mean a typesetting (DTP) one like PageMaker, QuarkXPress and the like. Why not (La)TeX?
|
11-04-2014, 05:27 AM | #9 |
The Grand Mouse 高貴的老鼠
Posts: 72,255
Karma: 309000000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
IIRC, Amazon charges the author delivery fees based on the file size of the book under the 70% royalty option. So the author is probably losing some income because of the ebook's large size. It might be a kindness to mention this to the author.
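To put rough numbers on that, here's a back-of-the-envelope Python sketch. It assumes the 70% royalty is 70% of (list price minus delivery cost), ignores VAT, and assumes a delivery rate of US$0.15 per MB, which is the figure I remember for the US store; check KDP's current terms before relying on it. The prices and file sizes are hypothetical, not the size of the book in this thread (the 6.5 MB in the first post is the uncompressed HTML, not the delivered file).

```python
# Assumed US delivery rate in dollars per MB; check KDP's current terms.
ASSUMED_RATE_PER_MB = 0.15


def delivery_cost(file_size_mb: float, rate_per_mb: float = ASSUMED_RATE_PER_MB) -> float:
    """Per-sale delivery cost: file size times the assumed per-MB rate."""
    return file_size_mb * rate_per_mb


def royalty_70(list_price: float, file_size_mb: float,
               rate_per_mb: float = ASSUMED_RATE_PER_MB) -> float:
    """Assumed formula: 70% of (list price - delivery cost), VAT ignored."""
    return 0.70 * (list_price - delivery_cost(file_size_mb, rate_per_mb))


# Hypothetical: a $2.99 novel delivered as a bloated 5 MB file
# versus the same text cleaned up to 0.5 MB.
bloated = royalty_70(2.99, 5.0)
clean = royalty_70(2.99, 0.5)
print(f"bloated: ${bloated:.2f}  clean: ${clean:.2f}  lost per sale: ${clean - bloated:.2f}")
```

Under those assumptions the loss scales linearly with file size, so every megabyte of span cruft costs the author the same fixed amount on every sale.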
|
11-04-2014, 06:58 AM | #10 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Like Hitch said, Word in itself is acceptable if used correctly. However, the myriad "free automated conversion tools" aren't helping. They are all labeled as "wonderful", but are in reality a nightmare. The output may look all right from afar, but it is actually atrocious.
For documents in general, the credo "keep it clean, keep it simple" works wonders. If you really want to knock yourself out with styling and the like, do it as the final step, not an intermediate one.
11-04-2014, 09:04 AM | #11 | |
temp. out of service
Posts: 2,799
Karma: 24285242
Join Date: May 2010
Location: Duisburg (DE)
Device: PB 623
|
Quote:
(size optimisation of ebooks) If you are right, then it'd be more than a pet-peeve fix; it would be really useful to find as many of these opportunities as I can. I'll do my best.
|
11-04-2014, 10:24 AM | #12 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
This looks like one of these "amazing PDF to HTML conversion" results.
|
11-04-2014, 12:52 PM | #13 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
We are, in fact, dealing with this issue right now with an illustrated story (not a comic); it's coming out at 78 MB. Obviously, we first have to get it under 50 MB (Amazon's file-size limit for self-published books through KDP), and second, there's the delivery-fee issue.
Hitch
|
11-04-2014, 02:39 PM | #14 | |
The Grand Mouse 高貴的老鼠
Posts: 72,255
Karma: 309000000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
11-04-2014, 02:49 PM | #15 |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|