Old 01-28-2011, 10:34 AM   #136
Patricia29
Junior Member
Posts: 4
Karma: 10
Join Date: May 2010
Device: Sony Reader
I just finished 'Fall of Giants' by Ken Follett. In every instance in the ebook, the country house in Wales is referred to as T? Gwyn. I believe it should be Ty Gwyn.
This was a major irritation, since it appears countless times throughout the book and there is no excuse for it!
Old 01-28-2011, 10:36 AM   #137
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by bizzybody View Post
Since comers is such a rarely used and archaic word in English, almost always with the word all before it, any time English OCR software thinks it sees "comers" it should be flagged, tagged and bagged as corners unless all is right before it. http://www.thefreedictionary.com/All+comers
I don't know about you, but I'd really rather not have the word "newcomers" (a relatively common word) converted into "newcorners".
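To make the objection concrete, here is a minimal Python sketch. It assumes the "fix" is applied as a blind substring replacement, and compares that with a more cautious rule that only flags a standalone "comers" not preceded by "all ". The function names and the sample sentence are made up for illustration, and the flagging rule is only one possible reading of the suggestion above.

```python
import re


def naive_fix(text: str) -> str:
    # Blind substring replacement: also rewrites the inside of longer words.
    return text.replace("comers", "corners")


# Whole word "comers", not immediately preceded by "all " (case-insensitive).
_SUSPECT = re.compile(r"(?<!all )\bcomers\b", re.IGNORECASE)


def flag_suspect_comers(text: str) -> list[int]:
    """Return character offsets of 'comers' that a human should review."""
    return [m.start() for m in _SUSPECT.finditer(text)]


sample = "The newcomers sat in the comers of the room, open to all comers."
print(naive_fix(sample))            # damages "newcomers" as well
print(flag_suspect_comers(sample))  # only the standalone, suspicious hit
```

Flagging for review rather than replacing automatically keeps the final decision with a proofreader.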
Old 01-28-2011, 10:45 AM   #138
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Patricia29 View Post
I just finished 'Fall of Giants' by Ken Follett. In every instance in the ebook, the country house in Wales is referred to as T? Gwyn. I believe it should be Ty Gwyn.
This was a major irritation, since it appears countless times throughout the book and there is no excuse for it!
There is actually a sensible reason for that. The word would almost certainly have been "Tŷ", that is, "Ty" where the "y" has a circumflex accent. This is a Unicode character which is absent from many fonts. The "?" is the Sony's way of saying "this is a character which isn't present in my font".
Old 01-30-2011, 04:39 AM   #139
bizzybody
Addict
Posts: 294
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
Which is why e-books for platforms that don't do Unicode need to have the text converted to extended ASCII. That has all the characters to handle most languages which use 'english' style characters.

Even better would be for the reader software to implement its own Unicode support using its own fonts.
Old 01-30-2011, 10:54 AM   #140
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by bizzybody View Post
Which is why e-books for platforms that don't do Unicode need to have the text converted to extended ASCII. That has all the characters to handle most languages which use 'english' style characters.
Is the "y with a circumflex" character present in extended ASCII?

It's not really relevant to this case, though, given that ePub is a Unicode-based standard.
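One quick way to check the question is to try encoding the character in a few common 8-bit code pages. The Python sketch below uses only the standard library; the helper name `can_encode` and the choice of code pages are my own assumptions about what "extended ASCII" might mean here.

```python
import unicodedata

CHAR = "\u0177"  # the "y with a circumflex" discussed above


def can_encode(ch: str, codec: str) -> bool:
    """Return True if the codec has a byte sequence for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(unicodedata.name(CHAR))
for codec in ("ascii", "cp437", "latin-1", "cp1252", "iso8859_14", "utf-8"):
    print(f"{codec:12} {can_encode(CHAR, codec)}")
```

Of the 8-bit sets tried here, only ISO 8859-14 (the "Celtic" Latin-8 set, which was designed with Welsh in mind) should report the character as present.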
Old 01-30-2011, 11:23 AM   #141
Jellby
frumious Bandersnatch
Posts: 7,522
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by HarryT View Post
Is the "y with a circumflex" character present in extended ASCII?
I think he/she meant that it should be "degraded" to something in extended ASCII, i.e., Tŷ -> Ty

Quote:
It's not really relevant to this case, though, given that ePub is a Unicode-based standard.
And the Mobipocket format should support it too, if properly encoded. The problem is only the default font in the device not having the required character, nothing related to the format.
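For what it's worth, the "degrading" step (Tŷ -> Ty) can be sketched in a few lines of Python. This is just one common approach, not necessarily what any converter actually does: decompose each character and drop the combining marks. The function name is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("T\u0177 Gwyn"))
```

Note that this only helps with letters that decompose into a base letter plus an accent; characters such as "ø" or "ß" have no such decomposition and pass through unchanged.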
Old 01-30-2011, 11:27 AM   #142
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Jellby View Post
And the Mobipocket format should support it too, if properly encoded. The problem is only the default font in the device not having the required character, nothing related to the format.
Yes, that was my meaning: that this isn't a typo in the book.
Old 01-30-2011, 11:58 AM   #143
Andrew H.
Grand Master of Flowers
Posts: 2,201
Karma: 8389072
Join Date: Oct 2010
Location: Naptown
Device: Kindle PW, Kindle 3 (aka Keyboard), iPhone, iPad 3 (not for reading)
I downloaded a sample on K4PC and found "Ty Gwyn." FYI.
Old 01-30-2011, 12:05 PM   #144
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Andrew H. View Post
I downloaded a sample on K4PC and found "Ty Gwyn." FYI.
That in itself is really a "cop-out", because "Tŷ" is a Welsh word ("House": "Tŷ Gwyn" means "White House"), whereas "Ty" isn't a word at all. Given, though, that "ŷ" isn't an ASCII character, most Welsh speakers are probably used to seeing "ŷ" written as "y".
Old 01-30-2011, 09:26 PM   #145
bizzybody
Addict
Posts: 294
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
All these characters are in the extended ASCII set, or Windows 1252 which is pretty much the same thing. The extended ASCII set with line drawing characters is a creation of IBM.

I had to leave the semicolons off the character codes (strictly, these are HTML numeric character references rather than UTF-8) because the forum software is not set up to leave *everything* between the code tags exactly as entered. With the semicolons after the numbers, the bleeping forum "helpfully" converts the codes to the characters.

Any e-book conversion software that can convert to formats for which there is a reader on non-Unicode platforms should have an option to use extended ASCII or Windows 1252 encoding, including converting all these codes (with the semicolon, of course) to their single-byte equivalents instead of to their Unicode equivalents.

The result looks exactly the same, but the file size can be significantly smaller.

Code:
&#033
!
&#034
"
&#035
#
&#036
$
&#037
%
&#038
&
&#039
'
&#040
(
&#041
)
&#042
*
&#043
+
&#044
,
&#045
-
&#046
.
&#047
/
&#048
0
&#049
1
&#050
2
&#051
3
&#052
4
&#053
5
&#054
6
&#055
7
&#056
8
&#057
9
&#058
:
&#059
;
&#060
<
&#061
=
&#062
>
&#063
?
&#064
@
&#065
A
&#066
B
&#067
C
&#068
D
&#069
E
&#070
F
&#071
G
&#072
H
&#073
I
&#074
J
&#075
K
&#076
L
&#077
M
&#078
N
&#079
O
&#080
P
&#081
Q
&#082
R
&#083
S
&#084
T
&#085
U
&#086
V
&#087
W
&#088
X
&#089
Y
&#090
Z
&#091
[
&#092
\
&#093
]
&#094
^
&#095
_
&#096
`
&#097
a
&#098
b
&#099
c
&#100
d
&#101
e
&#102
f
&#103
g
&#104
h
&#105
i
&#106
j
&#107
k
&#108
l
&#109
m
&#110
n
&#111
o
&#112
p
&#113
q
&#114
r
&#115
s
&#116
t
&#117
u
&#118
v
&#119
w
&#120
x
&#121
y
&#122
z
&#123
{
&#124
|
&#125
}
&#126
~
&#128
€
&#130
‚
&#131
ƒ
&#132
„
&#133
…
&#134
†
&#135
‡
&#136
ˆ
&#137
‰
&#138
Š
&#139
‹
&#140
Œ
&#142
Ž
&#145
‘
&#146
’
&#147
“
&#148
”
&#149
•
&#150
–
&#151
—
&#152
˜
&#153
™
&#154
š
&#155
›
&#156
œ
&#158
ž
&#159
Ÿ
&#160
&nbsp
&#161
¡
&#162
¢
&#163
£
&#164
¤
&#165
¥
&#166
¦
&#167
§
&#168
¨
&#169
©
&#170
ª
&#171
«
&#172
¬
&#173
*
&#174
®
&#175
¯
&#176
°
&#177
±
&#178
²
&#179
³
&#180
´
&#181
µ
&#182
¶
&#183
·
&#184
¸
&#185
¹
&#186
º
&#187
»
&#188
¼
&#189
½
&#190
¾
&#191
¿
&#192
À
&#193
Á
&#194
Â
&#195
Ã
&#196
Ä
&#197
Å
&#198
Æ
&#199
Ç
&#200
È
&#201
É
&#202
Ê
&#203
Ë
&#204
Ì
&#205
Í
&#206
Î
&#207
Ï
&#208
Ð
&#209
Ñ
&#210
Ò
&#211
Ó
&#212
Ô
&#213
Õ
&#214
Ö
&#215
×
&#216
Ø
&#217
Ù
&#218
Ú
&#219
Û
&#220
Ü
&#221
Ý
&#222
Þ
&#223
ß
&#224
à
&#225
á
&#226
â
&#227
ã
&#228
ä
&#229
å
&#230
æ
&#231
ç
&#232
è
&#233
é
&#234
ê
&#235
ë
&#236
ì
&#237
í
&#238
î
&#239
ï
&#240
ð
&#241
ñ
&#242
ò
&#243
ó
&#244
ô
&#245
õ
&#246
ö
&#247
÷
&#248
ø
&#249
ù
&#250
ú
&#251
û
&#252
ü
&#253
ý
&#254
þ
&#255
ÿ
&#338
Œ
&#339
œ
&#352
Š
&#353
š
&#376
Ÿ
&#402
ƒ
&#8211
–
&#8212
—
&#8216
‘
&#8217
’
&#8218
‚
&#8220
“
&#8221
”
&#8222
„
&#8224
†
&#8225
‡
&#8226
•
&#8230
…
&#8240
‰
&#8364
€
&#8482
™
Like I said earlier, the *best* thing would be for e-book reader software to include its own Unicode support on platforms without native support. Unless someone else does it, though, that will never happen for Mobipocket, since Amazon bought it for use on Kindle. Failing that, the only thing one can do when converting to formats for any Palm reader app or other non-Unicode platform is to pre-convert the source to de-Unicode it, unless you like common punctuation replaced by spaces, blank boxes or 'weird' characters, or simply removed altogether.
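A rough Python sketch of that pre-conversion step, assuming the target is Windows 1252 and the source contains properly terminated numeric character references. The function name and the fallback policy (strip the accent, else emit "?") are illustrative choices, not taken from any existing converter.

```python
import html
import unicodedata


def to_cp1252(text: str) -> bytes:
    """Resolve character references, then encode to Windows-1252 with fallbacks."""
    text = html.unescape(text)  # e.g. "&#8217;" becomes a right single quote
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Not in the code page: try the unaccented base letter instead.
            base = "".join(
                c for c in unicodedata.normalize("NFD", ch)
                if not unicodedata.combining(c)
            )
            # Anything still unencodable becomes "?".
            out += base.encode("cp1252", errors="replace")
    return bytes(out)


print(to_cp1252("T&#375; Gwyn"))
```

Curly quotes, dashes and the euro sign survive as single bytes, while a character like the Welsh "ŷ" falls back to a plain "y".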

If there's anyone here who knows the C# programming language, I posted a program with source code on the forum. It's a text string replacer that works if the list of text strings to swap is kept short enough. It still has some bugs: it can't handle the full list of codes and characters above, and it corrupts the list by replacing much of it with the unknown-character box. Once fixed to handle long enough lists, it would be very useful for very fast replacement of any strings of text with any other strings of text.
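Not the C# program itself, but a small Python sketch of the same idea, in case it helps anyone: every search string is replaced in a single pass by building one regular expression from the whole table. The function name and the sample table are hypothetical.

```python
import re


def replace_many(text: str, table: dict[str, str]) -> str:
    """Replace every key of `table` found in `text` with its value, in one pass."""
    if not table:
        return text
    # Longest keys first, so a short key cannot pre-empt a longer one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


swaps = {"&#8220;": '"', "&#8221;": '"', "&#8230;": "..."}
print(replace_many("&#8220;Wait&#8230;&#8221;", swaps))
```

Doing it in one pass means a replacement is never re-scanned, so swapping "a" for "b" and "b" for "a" in the same table works as expected.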
Old 01-30-2011, 09:33 PM   #146
JSWolf
Resident Curmudgeon
Posts: 75,157
Karma: 132820308
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
With the ePub version, there is no reason not to embed a font that covers the required Unicode characters, so that the word would display correctly.
Old 01-31-2011, 07:59 AM   #147
Jellby
frumious Bandersnatch
Posts: 7,522
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by bizzybody View Post
All these characters are in the extended ASCII set, or Windows 1252 which is pretty much the same thing.
You are assuming there's one "extended ASCII", but that's not true; there are many. Which one should be chosen? The one you like best? Windows 1252 is comparable to ISO-8859-1, which is fine for Western European languages, but why not any other variant?

Most current devices are perfectly capable of showing a wide range of Unicode characters, there's no need to "downgrade" to some limited 8-bit encoding. If some characters are not properly displayed it's because the device is lacking a good font (and the possibility of using a custom one). Should ebooks be created with typos just so they show sort-of-OK in defective devices? I don't think so. What if next month a software upgrade fixes the devices? Now suddenly all those books would be of reduced quality for no reason (and don't hope a "corrected" version of every book will be released).
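To illustrate the "many extended ASCIIs" point, here is a short Python sketch that decodes the same single byte under several 8-bit code pages. The helper name and the particular byte are arbitrary choices for the example.

```python
def decode_with_each(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Decode the same bytes under several encodings and collect the results."""
    return {codec: raw.decode(codec) for codec in codecs}


results = decode_with_each(b"\xa4", ["cp1252", "latin-1", "iso8859-15", "cp437"])
for codec, char in results.items():
    print(f"{codec:12} U+{ord(char):04X} {char}")
```

The same byte comes out as a generic currency sign, a euro sign or an n with tilde depending on which "extended ASCII" the reader assumes, which is exactly why the file has to declare (and the reader has to honour) an encoding.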
Old 01-31-2011, 08:07 AM   #148
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Jellby View Post
Most current devices are perfectly capable of showing a wide range of Unicode characters, there's no need to "downgrade" to some limited 8-bit encoding. If some characters are not properly displayed it's because the device is lacking a good font (and the possibility of using a custom one). Should ebooks be created with typos just so they show sort-of-OK in defective devices? I don't think so. What if next month a software upgrade fixes the devices? Now suddenly all those books would be of reduced quality for no reason (and don't hope a "corrected" version of every book will be released).
The book I'm currently proof-reading, John Buchan's "The Courts of the Morning", is about a revolution in a fictional South American country. Unfortunately, the eBook source is an ASCII version, so a lot of the work of proof-reading consists of restoring the accents to all the Spanish words that should have them.
Old 01-31-2011, 10:12 PM   #149
elcreative
Wizard
Posts: 2,888
Karma: 5875940
Join Date: Dec 2007
Device: PRS505, 600, 350, 650, Nexus 7, Note III, iPad 4 etc
Proofreading and proofreaders were one of the first things to bite the dust when we went electronic with books being produced on computers from author to printer... they weren't eliminated exactly but the process was made lazier by the assumption (from people who didn't know better) that spellchecking (and grammar checking!!!) could automate most of the process so reducing the need for proper proofing...


Quote:
Originally Posted by Quexos View Post
I see.
However this remains unforgivable. All that is needed after a text has been OCR'd is for someone to proof-read it. Now don't tell me publishers can't afford that or did not think of that.
Old 02-01-2011, 08:20 PM   #150
jrlewis
SF/F Author
Posts: 160
Karma: 349656
Join Date: Oct 2010
Location: USA
Device: Adobe Digital Editions
As a technical exercise, the typos in a commercially published ebook do make sense. Publishers usually follow a long and linear process of converting text from one format to another until the *final* version only exists in a special formatting program like InDesign or FrameMaker, not a standard text file like Word or Rich Text.

So when they export to a non-print-ready format, like Epub or Mobi, they're going to get all sorts of administrative formatting garbage in the file. I would hope that they would have a proofer check specifically for this issue, but apparently they aren't doing a great job of it right now.