08-08-2023, 11:41 AM | #1 |
Evangelist
Posts: 461
Karma: 7897546
Join Date: Aug 2013
Location: Hamden, CT
Device: Kindle Paperwhite (11th gen), Scribe
|
Spell checking...every word is always shown at least once
Here's a sample of the spell check results on an eBook. This is just a sample, as this happens with all books.
Note that words that are obviously spelled correctly ("as", "at", "be", "bed", etc.) are listed, even though "Show only misspelled words" is checked. In addition, those words appear much more often than the listed count, but it looks like the spell checker thinks that only one instance of the word is misspelled. The eBook has the language set to "en" (no qualifiers like "en-US") in both the OPF and each HTML page. In "Manage Dictionaries", "United States" is set as the preferred variant for the English language. Is there any other config I should look for that might be the culprit? |
08-08-2023, 12:40 PM | #2 |
creator of calibre
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you are using the built-in English dictionary and your books actually have their language specified as English, then it will work. So one of those conditions is not as you think.
|
08-08-2023, 04:22 PM | #3 |
Well trained by Cats
Posts: 30,454
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You may have conflicting languages set somewhere:
The Library view may say English, but the book's OPF (<dc:language>en</dc:language>) or the individual HTML files may say something else. Code:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en"> |
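A minimal sketch of how one might check for that kind of mismatch outside calibre, assuming the book is an ordinary EPUB (a zip archive). The function name `declared_languages` is hypothetical, and the script uses regular expressions instead of a real XML parser, so treat it as a rough check and not as a description of how calibre itself decides a book's language.

```python
import re
import zipfile

# Matches <dc:language>...</dc:language> in an OPF file.
DC_LANG = re.compile(r"<dc:language[^>]*>\s*([^<\s]+)\s*</dc:language>", re.I)
# Matches the opening <html ...> tag so its attributes can be inspected.
HTML_TAG = re.compile(r"<html\b[^>]*>", re.I)
# Matches lang="..." and xml:lang="..." (single or double quotes).
LANG_ATTR = re.compile(r"""(?<![\w:-])((?:xml:)?lang)\s*=\s*["']([^"']*)["']""", re.I)


def declared_languages(epub_path):
    """Return {file name: [(source, language), ...]} for OPF and (X)HTML files."""
    found = {}
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            lower = name.lower()
            if not lower.endswith((".opf", ".html", ".xhtml", ".htm")):
                continue
            text = book.read(name).decode("utf-8", errors="replace")
            if lower.endswith(".opf"):
                langs = [("dc:language", value) for value in DC_LANG.findall(text)]
            else:
                tag = HTML_TAG.search(text)
                langs = LANG_ATTR.findall(tag.group(0)) if tag else []
            found[name] = langs
    return found
```

Calling `declared_languages("book.epub")` (the file name is only a placeholder) and printing the result makes it easy to spot a file whose `lang` or `xml:lang` differs from the OPF's `dc:language`.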
08-08-2023, 07:17 PM | #4 |
null operator (he/him)
Posts: 21,008
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Look for spurious variants of 'are', 'at', 'been', etc. in Tools->Reports->Words. I've occasionally seen something like this due to convoluted markup.
BR |
08-09-2023, 10:41 AM | #5 |
Evangelist
Posts: 461
Karma: 7897546
Join Date: Aug 2013
Location: Hamden, CT
Device: Kindle Paperwhite (11th gen), Scribe
|
This is from a book covered by copyright, but I don't think the metadata I'm posting violates the rules...if it does, I'm sorry.
Header on each HTML page: Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
  <title>Chasing the Dime</title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>
Code:
<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="uid" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Chasing the Dime</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="uid">3897963789</dc:identifier>
    <dc:creator>Connelly, Michael</dc:creator>
    <dc:publisher>Little, Brown and Company</dc:publisher>
    <dc:subject>Fiction / Thrillers / General</dc:subject>
    <dc:date opf:event="publication">2002-10-15</dc:date>
    <dc:rights>Copyright © 2002 by Hieronymus, Inc.</dc:rights>
    <meta name="output encoding" content="utf-8"/>
    <meta name="primary-writing-mode" content="horizontal-lr"/>
    <meta name="Sigil version" content="1.9.30"/>
    <dc:date opf:event="modification" xmlns:opf="http://www.idpf.org/2007/opf">2023-08-03</dc:date>
  </metadata>
Code:
<p class="para-indent">“Well, it’s occupied at the moment but it might not be for long.”</p>
Also note that only the menu item for spell check shows the word as spelled incorrectly. The editor does not purple underline the word.
Last edited by nabsltd; 08-09-2023 at 10:44 AM. |
08-09-2023, 04:42 PM | #6 |
Wizard
Posts: 1,366
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
@nabsltd
In the Calibre Editor, move the cursor through each individual letter of the misspelt word ("be" in your example). Watch the bottom right corner and see if there are any spurious characters in the word. I know it sounds silly, but there are hidden characters that can be added. I was able to show "be" as misspelt by adding a word joiner character. See image below... |
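Stepping through every letter by hand gets tedious in a long book, so here is a small sketch of the same idea in Python. It assumes the hidden character is a Unicode "format" character (category Cf), which is the category the word joiner (U+2060) belongs to; other kinds of stray characters would need a different test. The function name `hidden_characters` is made up for this example.

```python
import unicodedata


def hidden_characters(text):
    """Yield (index, code point, name) for invisible format characters in text."""
    for index, char in enumerate(text):
        # Category "Cf" covers format characters such as U+2060 WORD JOINER,
        # U+200B ZERO WIDTH SPACE and U+00AD SOFT HYPHEN.
        if unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "UNKNOWN")
            yield index, f"U+{ord(char):04X}", name


# A word joiner hidden inside "be", as in the experiment described above.
sample = "it might not b\u2060e for long"
for hit in hidden_characters(sample):
    print(hit)
```

Running it over the text of each HTML file would list the position and name of every such character, which can then be removed in the editor.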
08-10-2023, 10:38 AM | #7 | |
Evangelist
Posts: 461
Karma: 7897546
Join Date: Aug 2013
Location: Hamden, CT
Device: Kindle Paperwhite (11th gen), Scribe
|
Quote:
Before: Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
  <title>Chasing the Dime</title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>
After: Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
<head>
  <title>Chasing the Dime</title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css"/>
</head> |
|
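To see exactly which header lines differ between the two versions quoted above, a line-by-line diff helps. This is only a sketch using Python's standard `difflib`; the helper name `header_diff` is hypothetical, and the two strings are the headers from this post with one element per line. It compares text, so it also reports cosmetic differences such as quote style and attribute order.

```python
import difflib


def header_diff(before, after):
    """Return unified-diff lines between two file headers, ignoring indentation."""
    before_lines = [line.strip() for line in before.strip().splitlines()]
    after_lines = [line.strip() for line in after.strip().splitlines()]
    return list(difflib.unified_diff(
        before_lines, after_lines,
        fromfile="before", tofile="after", lineterm="",
    ))


BEFORE = """\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>Chasing the Dime</title>
<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>
"""

AFTER = """\
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
<head>
<title>Chasing the Dime</title>
<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
"""

for line in header_diff(BEFORE, AFTER):
    print(line)
```

Lines prefixed with `-` exist only in the "before" header, which makes the dropped DOCTYPE and `<meta>` lines stand out from the lines that were merely rewritten.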
08-10-2023, 02:05 PM | #8 | |
Bibliophagist
Posts: 40,603
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
08-10-2023, 02:13 PM | #9 | |
creator of calibre
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
08-10-2023, 02:27 PM | #10 | ||
Bibliophagist
Posts: 40,603
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
When I last looked at the epub2 documentation and dug into the supporting documents, they referenced the XHTML 1.1 documentation which states: Quote:
|
||
08-11-2023, 11:55 AM | #11 | |
Evangelist
Posts: 461
Karma: 7897546
Join Date: Aug 2013
Location: Hamden, CT
Device: Kindle Paperwhite (11th gen), Scribe
|
Quote:
Those are my headers, and I did not ask the Calibre editor to change HTML tags. I asked it to replace content within an HTML tag. |
|
08-11-2023, 10:03 PM | #12 |
creator of calibre
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Feel free to not use the editor in that case.
|
08-13-2023, 03:07 PM | #13 | ||
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Quote:
|
||
08-14-2023, 12:26 AM | #14 | |
creator of calibre
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
EPUB is not HTML; it is XHTML, and it is not rendered directly by browsers. In XHTML served with the correct XHTML MIME type, the doctype is not required:
XHTML
If you serve your page as XHTML using the application/xhtml+xml MIME type in the Content-Type HTTP header, you do not need a DOCTYPE to enable standards mode, as such documents always use 'full standards mode'.
https://developer.mozilla.org/en-US/...rds_Mode#xhtml
And even epubcheck agrees with me. It does not warn about a missing DOCTYPE in versions of EPUB more modern than EPUB 2. |
|
08-14-2023, 11:59 AM | #15 |
Evangelist
Posts: 461
Karma: 7897546
Join Date: Aug 2013
Location: Hamden, CT
Device: Kindle Paperwhite (11th gen), Scribe
|
Do you actually think silently deleting headers during a spell check replace is acceptable behavior?
No other search and replace in the Calibre editor does this. Only "Fix HTML" and "Beautify files" make these sorts of header changes; there the user would be expecting such changes, and they can be reverted by using "See what changed". I'd argue that neither of these should change valid headers, either, but that's a different issue. This behavior definitely does not follow the principle of least astonishment. |