02-07-2009, 07:20 PM | #1 |
Zealot
Posts: 106
Karma: 3566
Join Date: Aug 2008
Location: London UK
Device: iPhone 5, Kindle K3, Kindle Voyage
|
A tool for converting to curly quotes
I've noticed recently that a surprising number of my ebooks have a mixture of curly quotes and "straight" quotes. Most of the start-of-para quotes are open-curly (“), along with a few in-para open-curly. However, most of the close-quotes are the straight version, rather than the close-curly (”). This is clearly the result of a broken conversion algorithm and bad (or no) copy-editing, and is quite distracting: once you have spotted it, you notice every instance.
So let's defy R.W. Emerson and have a little consistency, please. It's easy to do a global replace of all quotes to the straight version, but I quite like curly quotes. The reverse conversion is trickier, and not really doable with a normal programmer's editor. MS Word 2003 apparently can do it, but I don't have that. So I wrote a quick-and-dirty, brute-force Perl script to do it for me. It seems to work quite well, so I've cleaned it up a bit and attached it for anybody to use/modify/hack. On a very large book like The Count of Monte Cristo, it runs through in 25 seconds on a second-hand, five-year-old machine.
This is what a typical run looks like:
Code:
D:\E-Library\work\processing>..\_bin\curly.pl Readin The_Count_of_Monte_Cristo.html
Input file renamed to The_Count_of_Monte_Cristo.html.curlybak
STATS
-----
lines=12372
double-quote count=30595
lines with unmatched double-quotes=211
single-quote count=4218, lsquotes output=654, rsquotes output=3564
html tag count=25395
lines with broken tags=0
processing time 25 seconds
total time taken 26 seconds
D:\E-Library\work\processing>
Good luck
Snowman |
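For readers who want to see the general idea without opening the attachment, here is a minimal Python sketch of one way a pass like this could work. It is not the attached Perl script: the function name `curl_double_quotes` and the simple alternate-open-and-close rule are assumptions made for illustration. It skips anything inside HTML tags, lets existing curly quotes set the open/closed state, and reports whether a line ends with a quote still open, roughly what a "lines with unmatched double-quotes" count would need.

```python
import re

LDQUO, RDQUO = "\u201c", "\u201d"
# The capturing group keeps the tags in the split result.
TAG_SPLIT = re.compile(r"(<[^>]*>)")


def curl_double_quotes(line):
    """Return (converted_line, still_open) for one line of HTML-ish text.

    Assumption: straight double quotes outside tags simply alternate
    between opening and closing. Curly quotes already present update the
    state, so half-converted input stays consistent.
    """
    out = []
    is_open = False
    for part in TAG_SPLIT.split(line):
        if part.startswith("<"):
            out.append(part)  # leave attribute quotes inside tags untouched
            continue
        for ch in part:
            if ch == LDQUO:
                is_open = True
            elif ch == RDQUO:
                is_open = False
            elif ch == '"':
                ch = RDQUO if is_open else LDQUO
                is_open = not is_open
            out.append(ch)
    return "".join(out), is_open


if __name__ == "__main__":
    fixed, still_open = curl_double_quotes('\u201cHello," he said.')
    print(fixed, still_open)
```

A line that ends with `still_open` set to true is either a genuine error or the start of a multi-paragraph quotation, so it is better flagged for a human than fixed automatically.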
02-08-2009, 03:40 AM | #2 | ||
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Quote:
I think I'll continue using partially manual search and replace with vim. I also try to distinguish between closing single quotes (&rsquo;) and curly apostrophes (&#8217;). They are both the same character (glyph), but using different codes in the source HTML allows me to easily exchange single and double quotes without affecting apostrophes, if needed. I usually first replace every instance of ([letter]'[letter]) with the apostrophe, then search for (s') and put in the apostrophe if needed, then search one by one for all (") or (') and replace each with an opening or closing single or double quote or an apostrophe. Each case is attached to a single key, so it's relatively quick and easy, and I can keep track of nesting levels or multi-paragraph quotes. As a bonus, I can also detect many cases of missing or wrong quote marks! Last edited by Jellby; 02-08-2009 at 08:39 AM. |
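The first two steps of that routine can be sketched in Python rather than vim. This is only an illustration under stated assumptions: the function name `mark_apostrophes` is made up, "letter" is taken to mean any Unicode letter, and the (s') cases are only reported, not changed, since they need a human decision.

```python
import re

APOSTROPHE = "&#8217;"  # numeric entity kept for apostrophes only

# A straight single quote with a letter on both sides is taken to be an
# apostrophe. [^\W\d_] matches a letter (word character, not digit or _).
INNER = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")

# A word-final s' may be a plural possessive or a closing quote, so it is
# only reported for manual review.
S_FINAL = re.compile(r"s'(?![^\W\d_])")


def mark_apostrophes(text):
    """Return (converted_text, offsets of s' candidates in converted_text)."""
    converted = INNER.sub(APOSTROPHE, text)
    candidates = [m.start() for m in S_FINAL.finditer(converted)]
    return converted, candidates


if __name__ == "__main__":
    print(mark_apostrophes("I don't know where the boys' hats are."))
```

Whatever straight quotes remain after this pass are the ones to step through one by one, as described above.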
||
02-08-2009, 04:10 AM | #3 | ||
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
Quote:
|
||
02-08-2009, 07:02 AM | #4 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
|
02-08-2009, 07:53 AM | #5 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
How is this solved in ePub? The real solution ought to be to mark up the text so that all the different variants can be generated depending on country, language and the reader's wishes.
So you would have something like: <q>This is a nested <q>example</q>.</q> which would be translated to, e.g.: ``This is a nested `example'.'' |
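A rough Python sketch of that idea: quotations are marked with nested <q> tags, and the actual marks are chosen at output time from a per-style table. The `STYLES` table, its keys and the function name `render_quotes` are hypothetical, and real markup would need a proper parser rather than a regex that only recognises bare <q> and </q>.

```python
import re

# (open, close) pairs indexed by nesting depth. Hypothetical style table.
STYLES = {
    "en-US": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
    "en-GB": [("\u2018", "\u2019"), ("\u201c", "\u201d")],
    "tex": [("``", "''"), ("`", "'")],
}

Q_TAG = re.compile(r"<(/?)q>")


def render_quotes(text, style="en-US"):
    """Replace <q>/</q> with the marks for the current nesting depth."""
    pairs = STYLES[style]
    depth = 0

    def repl(match):
        nonlocal depth
        if match.group(1):  # closing tag
            depth -= 1
            return pairs[depth % len(pairs)][1]
        mark = pairs[depth % len(pairs)][0]
        depth += 1
        return mark

    return Q_TAG.sub(repl, text)


if __name__ == "__main__":
    sample = "<q>This is a nested <q>example</q>.</q>"
    for name in STYLES:
        print(name, render_quotes(sample, name))
```

Because the marks alternate by depth, swapping single-first for double-first quoting is a change to the table and not to the book's text.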
02-08-2009, 08:46 AM | #6 | ||
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Quote:
|
||
02-08-2009, 10:43 AM | #7 |
Zealot
Posts: 106
Karma: 3566
Join Date: Aug 2008
Location: London UK
Device: iPhone 5, Kindle K3, Kindle Voyage
|
Thanks for the replies; I agree with what you say. There is no way such a simple-minded tool could cope with the many different 'house' styles around, especially the older ones. And I'm only using it on those ebooks where a half-baked conversion has been done, leaving a nasty mix of straight and curly quotes. These are usually conversions to etext from books published from around 1970 onwards (often those with the title and author centered between rows of hyphens on page 1).
The single quote section was a bit of an afterthought, and I'm still dithering about whether I will retain it. I used 'Cristo' as the example as this happened to be the largest book I had conveniently available, and it gave a good idea of the runtime. The usual book runs through in less than 5 secs. I clearly misnamed the thread: 'checker' or 'correction' would have conveyed the intention better than 'converting'. Still, there it is. I leave it to the reader whether to use it as another tool in the armoury, as a framework to build something better, or as an awful example of how not to do it. regards Snowman |
02-08-2009, 12:22 PM | #8 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
|
I find that whatever I use, it is still essential to check all instances of quotation marks after emdashes; and a good idea to check quotation marks before emdashes just in case.
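That check is easy to semi-automate. Below is a small Python sketch that lists every quotation mark sitting directly next to an em dash, with a little context for proofreading. It assumes the em dashes are literal U+2014 characters and not entities or double hyphens, and the name `quotes_near_emdash` is made up for the example.

```python
import re

QUOTES = "\"'\u2018\u2019\u201c\u201d"  # straight and curly, single and double
EMDASH = "\u2014"

# A quote immediately before or immediately after an em dash.
NEAR_DASH = re.compile(f"[{QUOTES}]{EMDASH}|{EMDASH}[{QUOTES}]")


def quotes_near_emdash(text, context=15):
    """Return (offset, snippet) pairs for each quote adjacent to an em dash."""
    hits = []
    for m in NEAR_DASH.finditer(text):
        start = max(0, m.start() - context)
        hits.append((m.start(), text[start:m.end() + context]))
    return hits


if __name__ == "__main__":
    sample = "He began\u2014\u201dbut stopped. \u201cWait\u2014\u201d she said."
    for offset, snippet in quotes_near_emdash(sample):
        print(offset, repr(snippet))
```

It only finds the places to look at; deciding whether the mark there should open or close still takes a reader.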
|
|