05-17-2009, 03:39 PM | #31 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Some of the comments, and particularly the special focus on quotation marks, make me wonder if my original "dream" of a single script to take a plaintext document from A to Z is misguided. (Even though, as I stated earlier, it seems to me GutenMark does a reasonable job of that.)
Perhaps a better way would be to write small utilities, each of which focuses on just one aspect of the document cleanup/conversion/fix-up process. I myself might play around with a quotation-mark-fixing utility when I get a chance over the next week or so. Some other utilities I can think of:
- metadata recognizer (i.e., figures out title, author, chapter titles, etc.)
- paragraph normalizer (removes manual line breaks between lines, keeps only one between paragraphs)
- emphasis normalizer (converts the myriad ways of indicating emphasis into a single standard [and ideally simple to parse accurately] markup)

I think of all of these utilities as command-line tools that do as much as possible without human intervention, but ask for human/manual arbitration when up against a case that requires a judgement call (or, rather, actually understanding the text).

Does anything like this exist? Is the idea kind of crazy, or kind of sensible?

Sincerely,
AHI |
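As a rough illustration of how small such a single-purpose tool could be, here is a minimal Python sketch of the paragraph normalizer. It assumes plain text in which paragraphs are separated by blank lines; the function name `normalize_paragraphs` is hypothetical, and unlike the tools described above it does not yet stop to ask for human arbitration on hard cases (poetry, tables, indented paragraphs with no blank line, words hyphenated across lines).

```python
import re


def normalize_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph; keep one blank line between paragraphs.

    Assumes paragraphs are separated by one or more blank (or whitespace-only) lines.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split wherever a blank or whitespace-only line occurs.
    blocks = re.split(r"\n[ \t]*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Remove the manual line breaks (and stray spaces) within one paragraph.
        lines = [line.strip() for line in block.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:  # runs of several blank lines produce empty blocks; drop them
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)


sample = "It was a dark\nand stormy night.\n\n\nThe end\ncame soon.\n"
print(normalize_paragraphs(sample))
```

Keeping each step this narrow is what makes it practical to decide, per file, whether a given tool should run at all.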
05-17-2009, 04:01 PM | #32 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
It is sensible, I am just a bit doubtful about it being realistic :-)
I was thinking along these lines, but really, even with same-source files there are so many variations that writing a fully automatic utility is too much effort. I just keep a collection of useful regexps (or plain search-and-replaces) and decide case by case which ones to use, and in which order. |
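A sketch of that "collection of regexps" approach in Python, assuming the fixes are stored by name and applied in an order chosen by hand for each file. The patterns and names (`FIXES`, `apply_fixes`) are illustrative only, not anyone's actual collection.

```python
import re

# Illustrative named fixes: (pattern, replacement). Not an exhaustive set.
FIXES = {
    "collapse_spaces": (r"[ \t]{2,}", " "),
    "strip_trailing_ws": (r"[ \t]+$", ""),
    "double_dash_to_emdash": (r"(?<=\w)--(?=\w)", "\u2014"),
    "underscore_emphasis": (r"_([^_\n]+)_", r"<em>\1</em>"),
}


def apply_fixes(text: str, names: list[str]) -> str:
    """Apply the chosen fixes in the given order; the order can change the result."""
    for name in names:
        pattern, replacement = FIXES[name]
        # MULTILINE so that ^ and $ match at every line boundary, not just the ends.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


sample = "Wait--what?  _Really_?  "
result = apply_fixes(sample, ["collapse_spaces", "strip_trailing_ws",
                              "double_dash_to_emdash", "underscore_emphasis"])
print(result)
```

The human still chooses the list of names per file; the code only saves retyping the patterns.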
05-17-2009, 04:07 PM | #33 | |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
It definitely is a tall order... but broken down into its components like this, it might be more achievable. Like I said, I might try my hand at the quotation stuff in the near future. |
|
05-18-2009, 05:11 AM | #34 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Unfortunately (as far as I know) there is no separate character for a curly apostrophe; we have to use a right single quote (' is a straight apostrophe, just as " is a straight double quote). That's why I use &rsquo; and &#8217; for right single quotes and apostrophes respectively: they are exactly the same character with two different names, but at least I can keep them distinct in the source file.
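A quick way to confirm that the two references really name the same character, using Python's standard `html.unescape`:

```python
import html

named = html.unescape("&rsquo;")    # named entity reference
numeric = html.unescape("&#8217;")  # decimal numeric character reference

# Both decode to the same code point: U+2019 RIGHT SINGLE QUOTATION MARK.
print(named == numeric == "\u2019")  # True
print(f"U+{ord(named):04X}")         # U+2019
```

This also suggests why the trick only helps in the source: once a tool decodes or normalizes the entities, the apostrophe/quote distinction is likely gone.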
|
05-18-2009, 05:20 AM | #35 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Hmmm, but that's for presentation, right? Is there a way to hack that with CSS? Calling an apostrophe a quote makes for inaccurate meta.
(Okay, now that I've begun thinking about it, CSS is becoming like a magical pony, bringing gifts and hopes to all the children.) m a r |
05-18-2009, 06:01 AM | #36 | |
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
|
05-18-2009, 06:11 AM | #37 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I agree that's somewhat undesirable... maybe another solution would be keeping the ' in the source and postprocessing it to a curly apostrophe when creating a particular end format, but I don't think that can be done with CSS (at least not with the CSS2 used in ePUB).
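A minimal sketch of that postprocessing step in Python, assuming plain-text input that keeps straight ' in the source. Only an apostrophe with a letter on both sides (don't, o'clock) is converted automatically; every remaining straight quote is reported by position for a human to decide, since cases like 'tis or the boys' hats can't be settled by pattern alone. The name `curl_apostrophes` is hypothetical.

```python
import re

# A straight apostrophe with a letter on both sides: don't, o'clock, Smith's.
# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
INNER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def curl_apostrophes(text: str) -> tuple[str, list[int]]:
    """Convert unambiguous apostrophes to U+2019; return the positions of leftovers.

    Any straight ' that remains (quotes, 'tis, the boys' hats) is ambiguous and
    is reported by index so that a human can decide.
    """
    converted = INNER_APOSTROPHE.sub("\u2019", text)
    ambiguous = [i for i, ch in enumerate(converted) if ch == "'"]
    return converted, ambiguous


text, todo = curl_apostrophes("Don't say 'hi' to the boys' dog")
print(text)  # Don’t say 'hi' to the boys' dog
print(todo)  # [10, 13, 26]
```

Run only when building a particular end format, this leaves the source untouched, as suggested above.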
|
05-18-2009, 08:02 AM | #38 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Can you name your own entities with CSS2?
@sweetpea: yeah, I'm starting to get that... m a r |
05-18-2009, 10:41 AM | #39 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Hm... good question.
I don't think you can with CSS2, but maybe there are other means. See an example here (view the source). Now the question is whether this can be used with ePUB. |
05-18-2009, 12:53 PM | #40 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Dale |
|
05-18-2009, 01:00 PM | #41 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Dale |
|
05-18-2009, 03:36 PM | #42 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
|
05-18-2009, 11:56 PM | #43 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
So that means you could rename them as well, right?
Such that for problems like &apos; you could use the common entity name, and then anyone who wanted to could redefine it as &rsquo; or whatever they like. Still, it'd be better if it could be called from outside the ebook file, since it's really a presentation issue, no? m a r |
05-19-2009, 06:08 AM | #44 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I don't know if you can actually redefine &apos;, since it's one of the predefined XML entities, but it looks like you can redefine &mdash; and others. In any case, it's possible to create a new ≈ entity and make it look however you like. I'll do some tests with this...
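To try the custom-entity idea outside a reader, here is a small Python sketch that declares a hypothetical `&myapos;` entity in the document's internal DTD subset and parses it with the standard library's ElementTree. Changing the one declaration changes every occurrence, which is the kind of redefinition discussed above; whether a given ePUB reading system accepts an internal DTD subset at all is a separate question this sketch does not answer.

```python
import xml.etree.ElementTree as ET


def render(body: str, apostrophe: str = "&#8217;") -> str:
    """Parse <p>body</p> with a hypothetical &myapos; entity and return its text.

    `apostrophe` is the replacement text for &myapos;, written as it would
    appear in the DTD (e.g. a character reference). It must not contain '"'.
    """
    doc = (
        "<!DOCTYPE p [\n"
        f'  <!ENTITY myapos "{apostrophe}">\n'
        "]>\n"
        f"<p>{body}</p>"
    )
    # The expat-based parser behind ElementTree expands internal entities
    # declared in the internal subset.
    return ET.fromstring(doc).text


print(render("don&myapos;t"))       # curly apostrophe (U+2019)
print(render("don&myapos;t", "'"))  # straight apostrophe
```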
|
05-19-2009, 07:43 AM | #45 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
I was just looking at the CSS tutorial at w3schools.
I'm wondering if a different <link> tag in the head might allow you to call some customizations? I'm just spit-balling here -- I have no idea how the <link> tag works; but if the CSS file is read in the head, it makes me wonder if you could use one of the other rel="" attribute values to call a set of entity redefinitions? Like maybe a "section" or a "bookmark"? Anyway, I'll leave it to you experts. m a r
Last edited by rogue_ronin; 05-19-2009 at 07:44 AM. Reason: Add Link
Tags |
conversion, typography |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Kindle Typography | ChaoZ | Amazon Kindle | 21 | 08-14-2010 12:50 PM |
Is there hope for better ebook typography? | tomsem | Amazon Kindle | 0 | 08-12-2010 10:44 PM |
Typography on the iPad | LDBoblo | Apple Devices | 1 | 04-14-2010 03:33 PM |
French Typography | ahi | Workshop | 14 | 09-16-2009 02:22 PM |
Chinese Typography | ahi | Workshop | 81 | 09-14-2009 09:34 AM |