11-17-2023, 07:39 PM | #1 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Spell check question
In the sample text
Quote:
Same for i.e. and e.g. (just the "i.e" and "e.g" part). I assume it's because spell check assumes that the last period is the end of a sentence. I was going to just add "a.m" to the dictionary, but got concerned that sometimes there would be an "a.m" that really was not correct. Any suggestions or ideas?
|
11-17-2023, 07:51 PM | #2 |
creator of calibre
Posts: 44,380
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, that's basically a tokenization issue: the splitting of text into words by ICU doesn't handle these. There's no real solution, I'm afraid; you just ignore them. Or use AM and PM, and "that is" and "for example", instead of the abbreviations.
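To illustrate why only the "a.m" part (without the final period) gets flagged, here is a rough Python sketch. It is an approximation, not ICU's actual code: it assumes the Unicode word-break behaviour (UAX #29, rules WB6/WB7) where letters on both sides of a single period stay in one word, while a trailing period with no letter after it is split off as punctuation. The `words` helper is hypothetical.

```python
import re

# Rough approximation (assumption, not ICU's real implementation) of the
# UAX #29 rule that keeps letters joined across a single internal "." or "'".
# A trailing "." has no letter after it, so it is left out of the word.
WORD = re.compile(r"[A-Za-z]+(?:[.'][A-Za-z]+)*")


def words(text: str) -> list[str]:
    """Return the word tokens a spell checker would look up."""
    return WORD.findall(text)


print(words("We open at 9 a.m. and close at 6 p.m., i.e. late."))
# ['We', 'open', 'at', 'a.m', 'and', 'close', 'at', 'p.m', 'i.e', 'late']
```

The token that reaches the dictionary lookup is "a.m", which matches what the first post describes.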
|
11-17-2023, 07:51 PM | #3 |
Bibliophagist
Posts: 39,732
Karma: 154147706
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
You are going to need to decide which one is more important to you. Personally, I tend to add a.m., p.m., etc. to the exceptions since I consider the chance of, for example, having a. followed by a stray m to be fairly low.
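A minimal sketch of that trade-off, assuming a spell checker that looks tokens up in a dictionary plus a user exception list. The word lists and the `flagged` helper are hypothetical, and the tokenizer is a simple regex that keeps "a.m" together. Once "a.m" is an exception, a genuine slip such as a missing final period passes silently too.

```python
import re

# Simple tokenizer: letters joined by a single internal "." stay one token.
WORD = re.compile(r"[A-Za-z]+(?:[.'][A-Za-z]+)*")

# Hypothetical tiny dictionary and user exception list, for illustration only.
DICTIONARY = {"we", "open", "at", "and", "close", "late"}
EXCEPTIONS = {"a.m", "p.m", "i.e", "e.g"}


def flagged(text: str) -> list[str]:
    """Return tokens that are neither dictionary words nor listed exceptions."""
    return [
        w for w in WORD.findall(text)
        if w.lower() not in DICTIONARY and w.lower() not in EXCEPTIONS
    ]


print(flagged("We open at 9 a.m. and close late."))  # []
print(flagged("We open at 9 a.m and close late."))   # [] (missing period unnoticed)
print(flagged("We opne at 9 a.m. and close late."))  # ['opne']
```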
|
11-17-2023, 08:37 PM | #4 |
null operator (he/him)
Posts: 20,946
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I accept the limitation; I've seen too many am.s and i.es to try second-guessing.
For the UK, you could use "From 8:00am to 6:00pm Monday through Saturday". IIRC, CMS (the Chicago Manual of Style) allows closed-up small caps AM and PM.
11-18-2023, 09:12 AM | #5 |
the rook, bossing Never.
Posts: 12,267
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Style guides variously have all of the possible options!
AM PM
AM PM in small caps
am pm
A.M. P.M.
A.M. P.M. in small caps
a.m. p.m.
AD, CE, BC and BCE have fewer variations (small caps is common, the "." usage rare and lower case very rare). The "am" and "pm" form is pretty common in British English, usually with a space (perhaps a small space in print). Consistency is important.
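Since consistency is the point, one quick way to see which of these styles a book actually mixes is to count them. This is a minimal sketch, assuming plain text where each time is preceded by a digit; the pattern names and `count_styles` are hypothetical. Small caps is a presentation choice (e.g. CSS), so it cannot be detected this way.

```python
import re
from collections import Counter

# Hypothetical patterns for the plain-text styles listed above. Each requires a
# digit (optionally followed by one whitespace character) before it, so the verb
# "am" is not counted. Small caps is styling and is invisible here.
VARIANTS = {
    "AM PM": re.compile(r"\d\s?[AP]M\b"),
    "am pm": re.compile(r"\d\s?[ap]m\b"),
    "A.M. P.M.": re.compile(r"\d\s?[AP]\.M\."),
    "a.m. p.m.": re.compile(r"\d\s?[ap]\.m\."),
}


def count_styles(text: str) -> Counter:
    """Count how many times each time-of-day style appears in the text."""
    return Counter({name: len(p.findall(text)) for name, p in VARIANTS.items()})


print(count_styles("Open 8:00am to 6:00 pm; closed at 9 a.m. Sunday."))
```

More than one non-zero count means the book mixes styles and is worth a manual pass.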
11-18-2023, 10:31 AM | #6 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Everyone
Thanks for the information. I had thought about a RegEx to replace 'a.m.' with just 'am' etc., but then I thought that if the 'a.m.' was at the end of a sentence, I'd only make things worse.
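A short sketch of exactly that failure, using a hypothetical naive pattern: when "a.m." ends a sentence, its final period is doing double duty as the full stop, so a blanket replacement removes the sentence's full stop along with it.

```python
import re

# A naive "Replace All": strip both periods from every a.m./p.m.
NAIVE = re.compile(r"\b([ap])\.m\.")


def naive_replace(text: str) -> str:
    return NAIVE.sub(r"\1m", text)


print(naive_replace("We met at 9 a.m. Next they left at 5 p.m."))
# We met at 9 am Next they left at 5 pm
```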
11-18-2023, 11:13 AM | #7 |
Well trained by Cats
Posts: 30,397
Karma: 58055234
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
There are cases, like the one you mention, where you just never use Replace All.
Search, then <eyeball:yes> Replace & Find, or <eyeball:no> Search (a skip). Chances are that pattern will only appear a dozen or so times. It takes a whole minute to do it this way.
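The same Search / eyeball / Replace-or-skip workflow can be sketched in Python for anyone scripting outside the editor. `review_replace` and its `decide` callback are hypothetical; in practice `decide` would show the snippet to a person (e.g. via `input()`), and taking it as a callback just keeps the sketch testable.

```python
import re
from typing import Callable


def review_replace(text: str, pattern: str, repl: str,
                   decide: Callable[[str], bool], context: int = 15) -> str:
    """Replace matches one at a time, asking `decide` about each in context.

    `decide` plays the role of the eyeball: it sees the surrounding text and
    returns True to replace this match or False to skip it.
    """
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        snippet = text[max(0, m.start() - context):m.end() + context]
        out.append(text[last:m.start()])
        out.append(m.expand(repl) if decide(snippet) else m.group(0))
        last = m.end()
    out.append(text[last:])
    return "".join(out)


# Interactive use (not run here):
# review_replace(text, r"\b([ap])\.m\.", r"\1m",
#                lambda s: input(f"...{s}...  replace? [y/n] ") == "y")
```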
11-19-2023, 09:38 AM | #8 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
In the book I'm currently editing, chapter 1 had 20+ "a.m."s:
Some in the middle of a sentence: "... at 9 a.m. they ..."
Some at the end of a sentence: "... at 9 a.m. Next they ..."
Some at the end of a clause: "... at 9 a.m., and then they ..."
Replace All would really foul that up.
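The three cases differ only in what follows the abbreviation, so a script can at least sort the hits for review. This is a minimal, hypothetical heuristic (`classify` is not from any tool). A capital letter after "a.m." usually means a new sentence, but "9 a.m. Mr. Smith" would land in the same bucket, so a human still has to check those.

```python
import re

# Hypothetical heuristic: classify each "a.m."/"p.m." by what follows it.
ABBR = re.compile(r"\b[ap]\.m\.")


def classify(text: str) -> list[str]:
    """Label each a.m./p.m. as 'clause end', 'sentence end?' or 'mid-sentence'."""
    labels = []
    for m in ABBR.finditer(text):
        after = text[m.end():]
        if re.match(r"[,;:]", after):
            labels.append("clause end")
        elif re.match(r"\s*$|\s+[A-Z]", after):
            # Capital letter or end of text: probably a sentence end, but
            # proper nouns after the time look the same, hence the "?".
            labels.append("sentence end?")
        else:
            labels.append("mid-sentence")
    return labels


print(classify("at 9 a.m. they left; at 9 a.m. Next they left; at 9 a.m., and then"))
# ['mid-sentence', 'sentence end?', 'clause end']
```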
11-19-2023, 10:40 AM | #9 | ||
the rook, bossing Never.
Posts: 12,267
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
You'd need to detect a.m.<space><Capital> and a.m.<end of paragraph> at least.
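Those two cases can be sketched as a two-pass conversion from "a.m."/"p.m." to "am"/"pm" that keeps a full stop when the abbreviation is followed by whitespace plus a capital, or by the end of the paragraph. This is a hypothetical sketch (`convert_paragraph` is not an existing tool), and it assumes the text is processed one paragraph at a time so that "end of paragraph" is the end of the string.

```python
import re

# Pass 1: a.m. followed by <space><Capital> or end of paragraph keeps a full stop.
END_OF_SENTENCE = re.compile(r"\b([ap])\.m\.(?=\s+[A-Z]|\s*$)")
# Pass 2: every remaining a.m./p.m. just loses its periods.
ANYWHERE = re.compile(r"\b([ap])\.m\.")


def convert_paragraph(para: str) -> str:
    para = END_OF_SENTENCE.sub(r"\1m.", para)  # "a.m. Next" -> "am. Next"
    return ANYWHERE.sub(r"\1m", para)           # "a.m. they" -> "am they"


print(convert_paragraph("At 9 a.m. they met, at 10 a.m., and then at 5 p.m. Next day."))
# At 9 am they met, at 10 am, and then at 5 pm. Next day.
```

It still gets "9 a.m. Mr. Smith arrived" wrong (it produces "am. Mr."), which is the kind of combination that makes a search-and-eyeball pass the safer choice.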
I'd search and then replace manually, as I'd not trust myself to think of all of the combinations and then write a correct regex.
I had an ebook with messed-up chapter headings (up to 13, but there was no 5), and the final edit needed a search and manual edit. As well as crazy spans, the title of each chapter was on the line above "Chapter <n>", which IMO is the wrong way round. There was also no CSS at all, no system ToC, and the entire ebook was one file. I added a CSS file, replaced the styles with classes, and then let a Calibre conversion produce one file per chapter. It also used multiple spaces (I deleted them all and did the indents etc. with CSS) and multiple empty paragraphs for layout (I added CSS to a new class for the headings).
||
11-20-2023, 05:44 PM | #10 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
They discussed lots of helpful regexes, edge cases, and tips on how to catch/fix/normalize these types of issues.
- - -
Side Note: And Kovid is right. It's a hard problem with no real solution. Just Right-Click > Ignore the red squigglies in these few cases, and come up with a few regexes to check for the common edge cases, like:
Search: \b[APap]\.[Mm],
Search: \b[APap][Mm]\.
which would check for a.m. + p.m. missing a period (the first search only where it's followed by a comma). If you make use of Saved Searches, this can be as simple as a single run of a Group.
- - -
Side Note #2: If you want extreme details on "sentence-ending periods" and why you don't want to enable spellchecking of periods at the end of words, see my discussion in:
Sigil 1.9.10+ made that change, and I was STRONGLY opposed to it. The amount of clutter and mess it introduced into the Spellcheck Lists was immense, heavily outweighing the handful of acronyms like "a.m." + "p.m." you'd have to check. In Post #21, I even showed graphs of "Sentence-Enders" vs. "Acronyms", where 0.2% of hits were "corrected", but 99.8% of hits were made much worse.
Last edited by Tex2002ans; 11-20-2023 at 06:12 PM.
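For anyone who wants to run those two searches over plain text outside the editor, here is a minimal sketch that runs them one after another, roughly like a group of saved searches. `CHECKS` and `run_checks` are hypothetical; the two patterns use only basic syntax (`\b`, character classes, escaped dots) that Python's `re` treats the same way.

```python
import re

# The two searches quoted above, run one after another.
CHECKS = [
    re.compile(r"\b[APap]\.[Mm],"),
    re.compile(r"\b[APap][Mm]\."),
]


def run_checks(text: str) -> list[tuple[str, str]]:
    """Return (pattern, matched text) for every hit, grouped by pattern."""
    return [(p.pattern, m.group(0)) for p in CHECKS for m in p.finditer(text)]


for pattern, hit in run_checks("Meet at 9 a.m, leave at 5 pm. Yes, I am."):
    print(f"{pattern!r:24} {hit}")
```

Note that the second search also hits the verb in "I am." at the end of a sentence, so these are checks to eyeball, not patterns for Replace All.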
|