05-01-2016, 10:03 AM | #1 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
RegEx or RE Function to apply [Change Case] Capitalize?
Is there a RegEx or RegEx Function that will apply [Change Case] Capitalize to the upper-case words at the beginning of a paragraph (usually the first paragraph of each chapter)?
It's purely personal opinion, but on my Kindle I find the 'paper book' typographic style disconcerting, and if there were an easy way to change it, it would save a lot of manual work. I'd like to change the first <p> to the second <p>. It won't be 100% accurate, since there could be proper names or acronyms in the UPPER CASE TEXT, but I can clean those up manually.

Code:
<body>
<p>ALLCAPS ALLCAPS ALLCAPS lower case lower case lower case lower case lower case </p>
<p>Allcaps allcaps allcaps lower case lower case lower case lower case lower case </p>
</body>

Thanks
Last edited by phossler; 05-01-2016 at 11:15 AM. Reason: spelling
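For reference, here is a minimal stand-alone Python sketch of the conversion asked for above. Assumptions: the pattern `LEADING_CAPS` and the helper names are hypothetical, a "leading run" means capitals, digits, spaces and simple punctuation up to the first lower-case word, and proper names and acronyms are deliberately not handled (they come out lower-cased, as the post expects).

```python
import re

# Hypothetical pattern: an opening <p> tag (group 1) followed by a run of
# upper-case words and punctuation (group 2) that stops just before the first
# lower-case word of the paragraph.
LEADING_CAPS = re.compile(r"(<p\b[^>]*>)([A-Z][A-Z0-9\s'’,;:.\-]*)(?=\s+[a-z])")


def to_sentence_case(run: str) -> str:
    """Lower-case the whole run, then upper-case its first character."""
    lowered = run.lower()
    # Keep the pronoun "I" upper case ("I HAD" -> "I had", not "i had").
    lowered = re.sub(r"\bi\b", "I", lowered)
    return lowered[:1].upper() + lowered[1:]


def fix_leading_caps(html: str) -> str:
    def repl(m: re.Match) -> str:
        run = m.group(2)
        # Leave ordinary openings such as "<p>A quiet day" alone: only touch
        # runs that contain at least two consecutive capitals.
        if not re.search(r"[A-Z]{2}", run):
            return m.group(0)
        return m.group(1) + to_sentence_case(run)

    return LEADING_CAPS.sub(repl, html)


print(fix_leading_caps("<p>ALLCAPS ALLCAPS ALLCAPS lower case lower case</p>"))
# <p>Allcaps allcaps allcaps lower case lower case</p>
```

A paragraph that is entirely upper case has no following lower-case word, so it is left unchanged rather than guessed at.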
05-01-2016, 11:44 AM | #2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
There are builtin functions to replace captured text as capitalized/lowercased/titlecased...
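To get a feel for the three kinds of conversion, here is what Python's own string methods do. This is only an illustration; calibre's built-in functions are separate code and may treat edge cases differently. Note how `str.title()` treats the apostrophe as a word break:

```python
text = "CARSTAIR'S WAS IN THE ARRIVALS HALL"

print(text.capitalize())  # Carstair's was in the arrivals hall
print(text.lower())       # carstair's was in the arrivals hall
print(text.title())       # Carstair'S Was In The Arrivals Hall
```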
05-01-2016, 01:24 PM | #3 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
05-01-2016, 01:33 PM | #4 |
Well trained by Cats
Posts: 30,410
Karma: 58055234
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
05-01-2016, 02:13 PM | #5 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Sorry, but I'm not sure I'm following. Calibre has the manual Change Case, so are you saying that with the calibre RE engine there's no way to change case? If so, I guess I'll have to keep doing it manually, chapter by chapter.
05-01-2016, 02:29 PM | #6 | |
Well trained by Cats
Posts: 30,410
Karma: 58055234
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Or use Sigil if you don't write in Python.
05-01-2016, 03:17 PM | #7 |
creator of calibre
Posts: 44,413
Karma: 23977332
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
05-01-2016, 04:47 PM | #8 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
@kovidgoyal
Thanks, I use that all the time to fix things like

Code:
<h1>CHAPTER 1 - THE BEGINNING</h1>

Code:
<body>
(before)
<p>ALLCAPS ALLCAPS ALLCAPS lower case lower case lower case lower case lower case </p>
(after)
<p>Allcaps allcaps allcaps lower case lower case lower case lower case lower case </p>
</body>

I did try using the Capitalize text function with a Find of <p[^>]*>.+?</p>. That works well for the ALLCAPS text I was using as a test, but it also gets applied to every paragraph. I was thinking there might be some clever (i.e. beyond me) RegEx way to take the first whole word after a <p> and make it Title Case, and then make the (say) next 4 words lower case. Stepping through with Find or Replace & Find would then be a lot easier.
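A sketch of that idea written as a calibre function-mode replace function. Assumptions: the Find expression `FIND` is hypothetical; the search has case sensitivity switched on (otherwise `[A-Z]` also matches lower-case letters); and the parameter list follows calibre's function-mode `replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs)` convention, of which only `match` is used. Unlike `<p[^>]*>.+?</p>`, the pattern only matches a paragraph whose first word is at least two capitals long, then takes at most four further all-caps words:

```python
import re

# Hypothetical Find expression: tag (group 1), first all-caps word (group 2),
# then up to four more all-caps words with their leading whitespace (group 3).
FIND = r"(<p\b[^>]*>)([A-Z]{2}[A-Z'’\-]*\b)((?:\s+[A-Z][A-Z'’\-]*\b){0,4})"


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Only `match` is used; the other parameters exist to fit the
    # function-mode calling convention.
    tag, first, rest = match.group(1), match.group(2), match.group(3)
    # First word: ALLCAPS -> Allcaps. Following words: ALLCAPS -> allcaps.
    # rest.lower() keeps the original whitespace between the words.
    return tag + first.capitalize() + rest.lower()


# Outside calibre, the function can be exercised with re.sub and placeholder
# values for the unused arguments.
html = "<p>ALLCAPS ALLCAPS ALLCAPS lower case</p><p>The end.</p>"
print(re.sub(FIND, lambda m: replace(m, 0, "ch01.xhtml", None, None, {}, {}), html))
# <p>Allcaps allcaps allcaps lower case</p><p>The end.</p>
```

Known gaps: a one-letter first word ("I HAD A GREAT TIME") is not matched, and acronyms and proper names in the run are lower-cased along with everything else.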
05-01-2016, 04:57 PM | #9 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I still don't understand the problem.
Going by your example in post #1, capture the first three words and apply the default, built-in regex function. That said, using both the word 'allcaps' and capital letters to indicate your intent, then arguing with yourself, isn't helping.
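Taking "capture the first three words" literally for the post #1 example, here is a plain-Python sketch of why the capture matters. The pattern `THREE_CAPS` and both helper names are hypothetical. Transforming only the captured words works; transforming the whole match, which begins with the `<` of the tag, gives no word an initial capital (and would also lower-case any capitals inside the tag's attributes):

```python
import re

# Hypothetical pattern: the tag (group 1) and exactly three upper-case words
# right after it (group 2).
THREE_CAPS = re.compile(r"(<p\b[^>]*>)([A-Z]+\s+[A-Z]+\s+[A-Z]+)\b")


def capitalize_captured(html: str) -> str:
    # Only group 2 goes through str.capitalize(); the tag is copied unchanged.
    return THREE_CAPS.sub(lambda m: m.group(1) + m.group(2).capitalize(), html)


def capitalize_whole_match(html: str) -> str:
    # For contrast: the whole match starts with "<", so capitalize() just
    # lower-cases every word.
    return THREE_CAPS.sub(lambda m: m.group(0).capitalize(), html)


sample = "<p>ALLCAPS ALLCAPS ALLCAPS lower case</p>"
print(capitalize_captured(sample))     # <p>Allcaps allcaps allcaps lower case</p>
print(capitalize_whole_match(sample))  # <p>allcaps allcaps allcaps lower case</p>
```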
05-01-2016, 05:33 PM | #10 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Yes, but if there are 100 chapters in the book that all have the upper-case lead-in for the first paragraph, then I'd have to select the text in each chapter manually.
I was only wondering if there was a way to find a group of upper-case words at the beginning of a paragraph and do all 100 at once.
05-01-2016, 06:27 PM | #11 |
null operator (he/him)
Posts: 20,957
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@phossler - as I understand it, you want a one-click something that marches through a book detecting all the 'first paragraphs' (some books format the paragraph following a scene break as a 'first paragraph') that have leading upper-case words, and then changes those leading upper-case words to sentence case.
I've yet to find a case changer that deals with proper names correctly. So 'SMILEY FLEW TO VIENNA; CARSTAIR'S was in the arrivals hall to meet him...' inevitably becomes 'Smiley flew to vienna; carstair's was in the arrivals hall to meet him...'

Some errors may be picked up by a spellchecker, but not if it's 'SMILEY FLEW TO DODGE CITY; CAPABILITY BROWN was in the arrivals hall to meet him...'

I suspect a goodly number of 'first paragraphs' in novels, biographies, and histories contain a proper name in the first sentence. So even after a one-click button waved its magic wand, you'd have to eyeball each 'first paragraph' to sort the wand's curses from its blessings.

I always found the capitalisation of leading words in print irritating, especially as it's often arbitrary, with no regard to the author's phrasing. To me they're a printer's affectation. Getting rid of them in e-books, whilst tempting, usually requires more effort than I'm willing to expend. Unless, of course, the book is in need of serious tin-bashing, in which case selecting the text, changing the case, and correcting proper names may be worth the effort.

BR
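One partial mitigation for proper names is a hand-maintained list of words to re-capitalize after lower-casing. A minimal sketch, assuming a hypothetical `proper_names` set that the user fills in per book; as the example shows, a list cannot tell a name from an ordinary word, so the output still needs eyeballing:

```python
import re


def sentence_case(run: str, proper_names=frozenset()) -> str:
    """Sentence-case an upper-case run, re-capitalizing any word whose
    lower-case stem is in `proper_names` (a hypothetical, hand-kept list)."""
    first = True

    def fix(m: re.Match) -> str:
        nonlocal first
        word = m.group(0).lower()
        stem = re.split(r"['’]", word)[0]  # "carstair's" -> "carstair"
        if first or stem in proper_names:
            word = word[:1].upper() + word[1:]
        first = False
        return word

    # Letters plus apostrophes count as one word; punctuation passes through.
    return re.sub(r"[A-Za-z][A-Za-z'’]*", fix, run)


print(sentence_case("SMILEY FLEW TO VIENNA; CARSTAIR'S"))
# Smiley flew to vienna; carstair's
print(sentence_case("SMILEY FLEW TO VIENNA; CARSTAIR'S", {"vienna", "carstair"}))
# Smiley flew to Vienna; Carstair's
print(sentence_case("CARS HAD TO DODGE THE CITY BUS", {"dodge", "city"}))
# Cars had to Dodge the City bus   (the list can't tell the verb from the name)
```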
05-01-2016, 07:19 PM | #12 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
@BR -- agree 100%
I wasn't even thinking of limiting it to just first paragraphs; I was going for any <p> tag. I was thinking that if the first word after a <p> is all caps, it would be capitalized, and if the second, third, and fourth (maybe more) words after a <p> were all caps, they would be lower-cased. Even stepping through with [Find] or [Replace, Find Next] to catch 'SMILEY FLEW TO DODGE CITY; CAPABILITY BROWN was in the arrivals hall to meet him...' and fix it manually to 'Smiley flew to Dodge City; Capability Brown was in the arrivals hall to meet him...' would be a lot faster than going to each file, highlighting the offending text, right-clicking to change the case, etc. I tried two RegExes to do it in two passes, but never got even the first one to work. I think theducks' comment about differences in RegEx engines means that there is no way to automate the process.
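Since the plan is to step through the candidates and fix proper names by hand anyway, a dry run that only lists what would change may be the most useful first pass. A minimal sketch, assuming the chapters are available as a hypothetical `{file name: XHTML text}` dictionary and using a hypothetical pattern for a leading run of capitals (at least two characters long, ending on a letter or apostrophe):

```python
import re
from typing import Dict, Iterator, Tuple

# Hypothetical pattern: a <p> tag, then a run of upper-case words (group 1)
# ending just before the first lower-case word.
LEADING_CAPS = re.compile(r"<p\b[^>]*>([A-Z][A-Z\s'’,;:.\-]*[A-Z'’])(?=\s+[a-z])")


def preview(files: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (file name, original run, proposed run) without changing
    anything, so each candidate can be checked by eye first."""
    for name, html in files.items():
        for m in LEADING_CAPS.finditer(html):
            run = m.group(1)
            lowered = run.lower()
            yield name, run, lowered[:1].upper() + lowered[1:]


chapters = {
    "ch01.xhtml": "<p>SMILEY FLEW TO DODGE CITY; CAPABILITY BROWN was in the arrivals hall</p>",
    "ch02.xhtml": "<p>A quiet day.</p>",
}
for name, before, after in preview(chapters):
    print(f"{name}: {before!r} -> {after!r}")
```

Only ch01.xhtml is reported; its proposal ("Smiley flew to dodge city; capability brown") is exactly the kind of result that needs the manual fix described above.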
05-01-2016, 09:04 PM | #13 |
null operator (he/him)
Posts: 20,957
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
The only instances I've seen of upper casing the first few words of a paragraph have been in the first paragraph of a chapter or scene - hence 'first paragraph'.
I doubt there's a regex engine that will differentiate between 'LAS VEGAS IS CITY ONE SHOULD DODGE' and 'I HAD A GREAT TIME IN DODGE CITY.'

I just opened the first three 'real' books in my Test library and looked at words 2-6 in the first paragraph of the first chapter. A factual history had no proper nouns, a Regency novel had 'HYDE PARK', and an international banking exposé had 'LONG-TERM CAPITAL MANAGEMENT'. Of those six words the only one that would even be flagged as a spelling error is 'hyde'; the other five ('park', 'long', 'term', 'capital', and 'management') are valid in lower case, yet they would all be patently wrong in the context in which they are used.

Whilst ever there are no multi-lingual, deep-intelligence-enabled regex engines capable of further deep learning, us humans will always have something to do.

BR
05-01-2016, 09:39 PM | #14 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
05-01-2016, 10:56 PM | #15 |
creator of calibre
Posts: 44,413
Karma: 23977332
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, you won't. Simply run the S&R on "All text files"; it will end up title-casing all paragraphs in your book. But that is no problem, since you want all paragraphs to have the same casing behavior.
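Inside calibre, the "All text files" scope covers the whole book. For anyone working outside calibre, here is a rough stand-alone equivalent using only Python's standard library. Assumptions: the XHTML files are UTF-8, the pattern `LEADING_CAPS` and the function name `fix_epub` are hypothetical, and the output is not validated beyond keeping `mimetype` as the first, uncompressed entry, which the EPUB container format requires. Paragraphs that don't open with a run of capitals are left untouched, so running it over every file is safe in that sense:

```python
import re
import zipfile

# Hypothetical pattern: a <p> tag (group 1) followed by a run of upper-case
# words (group 2) ending just before the first lower-case word.
LEADING_CAPS = re.compile(r"(<p\b[^>]*>)([A-Z][A-Z\s'’,;:.\-]*[A-Z'’])(?=\s+[a-z])")


def _sentence_case(m: re.Match) -> str:
    run = m.group(2).lower()
    return m.group(1) + run[:1].upper() + run[1:]


def fix_epub(src: str, dst: str) -> int:
    """Apply the replacement to every (X)HTML file in the EPUB at `src`,
    write the result to `dst`, and return the number of replacements."""
    total = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # "mimetype" must come first and be stored without compression.
        zout.writestr("mimetype", zin.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for info in zin.infolist():
            if info.filename == "mimetype":
                continue
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".xhtml", ".html", ".htm")):
                text, n = LEADING_CAPS.subn(_sentence_case, data.decode("utf-8"))
                data, total = text.encode("utf-8"), total + n
            zout.writestr(info.filename, data, compress_type=zipfile.ZIP_DEFLATED)
    return total
```

Writing to a new file rather than in place keeps the original EPUB as a fallback if the result needs more hand-fixing than expected.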
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
RegEx Function: Title Case | phossler | Editor | 29 | 07-04-2020 10:52 AM
Regex Function about «» and “” | senhal | Editor | 8 | 04-06-2016 02:12 AM
Regex Function - Split unknown word | Paulie_D | Editor | 19 | 12-07-2014 05:12 AM
Change Case with Regex Problem | nqk | Editor | 4 | 07-25-2014 10:38 PM
Regex for Title Case or Sentence case? | Turtle91 | Sigil | 3 | 01-19-2013 01:36 PM