05-01-2016, 10:03 AM   #1
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
RegEx or RE Function to apply [Change Case] Capitalize?

Is there a RegEx or RegEx Function that will apply [Change Case] Capitalize to the upper-case words at the beginning of a paragraph? It's usually the first paragraph of each chapter.

It's purely personal opinion, but on my Kindle I find the 'paper book' typographic style disconcerting, and if there were an easy way to change it, it'd save a lot of manual work.

I'd like to change the first <p> to the second <p>. It won't be 100% accurate, since there could be proper names or acronyms in the UPPER CASE TEXT, but I can clean those up manually.


Code:
<body>

<p>ALLCAPS ALLCAPS ALLCAPS lower case  lower case  lower case  lower case  lower case </p>

<p>Allcaps allcaps allcaps lower case  lower case  lower case  lower case  lower case </p>

</body>

Thanks

Last edited by phossler; 05-01-2016 at 11:15 AM. Reason: spelling
05-01-2016, 11:44 AM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
There are builtin functions to replace captured text as capitalized/lowercased/titlecased...
05-01-2016, 01:24 PM   #3
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by eschwartz View Post
There are builtin functions to replace captured text as capitalized/lowercased/titlecased...
Yes -- Select + Right Click > [Change Case] Capitalize is what I've been doing.

I was looking for a less manual way.
05-01-2016, 01:33 PM   #4
theducks
Well trained by Cats
Posts: 30,115
Karma: 57259780
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by phossler View Post
Yes -- Select + Right Click > [Change Case] Capitalize is what I've been doing.

I was looking for a less manual way.
You have run into a difference between regex engines.

Sigil uses PCRE. That engine can change case (\L, \U) directly in the Search & Replace replacement text.
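For comparison, a rough Python illustration of what a Sigil-style replacement such as \1\2\L\3 does. It is only a sketch: it uses Python's standard re module (calibre itself uses the third-party regex module, which I'm assuming behaves the same for this), and since Python replacement strings have no \L or \U escapes, the case change has to be done in a function passed to re.sub:

```python
import re

# Matches an opening <p> tag followed by an all-caps word of 2+ letters.
# Group 1: the tag, group 2: the first letter, group 3: the remaining capitals.
PATTERN = re.compile(r'(<p[^>]*>)([A-Z])([A-Z]+)\b')


def lower_rest(match):
    # Rough equivalent of a Sigil-style replacement "\1\2\L\3":
    # keep the tag and the first capital, lowercase the rest of the word.
    return match.group(1) + match.group(2) + match.group(3).lower()


print(PATTERN.sub(lower_rest, '<p>ALLCAPS ALLCAPS lower case</p>'))
# prints: <p>Allcaps ALLCAPS lower case</p>
```

Only the first word after the <p> changes, because the pattern is anchored to the tag. A function like lower_rest is roughly the kind of thing calibre's Function Mode lets you plug into its S&R.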
05-01-2016, 02:13 PM   #5
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by theducks View Post
You have run into a difference between regex engines.

Sigil uses PCRE. That engine can change case (\L, \U) directly in the Search & Replace replacement text.

Sorry, but I'm not sure I'm following that.

Calibre has the manual change case, so are you saying that with the Calibre RE engine, there's no way to change case?

If so, then I guess I'll have to keep doing it manually, chapter by chapter.
05-01-2016, 02:29 PM   #6
theducks
Well trained by Cats
Posts: 30,115
Karma: 57259780
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by phossler View Post
Sorry, but I'm not sure I'm following that.

Calibre has the manual change case, so are you saying that with the Calibre RE engine, there's no way to change case?

If so, then I guess I'll have to keep doing it manually, chapter by chapter.
There is a Function Mode for S&R in the editor (look it up in the editor's Help). (I am currently on a system that does not have the editor.)

Or use Sigil if you don't write Python.
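Following up on Function Mode: as I understand the calibre manual, a Function Mode function is ordinary Python named replace, which the editor calls once per match of the Find expression, using its return value as the replacement text. A minimal sketch; the lowercasing is just an example, and the stand-alone harness at the bottom (including the chapter01.xhtml file name) is made up for testing it outside calibre:

```python
import re


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # match: the regex match for the current hit of the Find expression
    # number: sequence number of this match; file_name: the file being processed
    # The returned string replaces the matched text; here it is simply lowercased.
    return match.group().lower()


if __name__ == '__main__':
    # Stand-alone check outside calibre, with placeholder values for the extra arguments.
    html = '<h1>CHAPTER 1 - THE BEGINNING</h1>'
    print(re.sub(r'(?<=<h1>)[^<]+',
                 lambda m: replace(m, 1, 'chapter01.xhtml', None, None, {}, None),
                 html))
    # prints: <h1>chapter 1 - the beginning</h1>
```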
05-01-2016, 03:17 PM   #7
kovidgoyal
creator of calibre
Posts: 44,149
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
http://manual.calibre-ebook.com/func...n-the-document
05-01-2016, 04:47 PM   #8
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
@kovidgoyal

Thanks, I use that all the time to fix things like:

Code:
<h1>CHAPTER 1 - THE BEGINNING</h1>
What I was really looking for is an automatic way to change the upper-case words at the start of a paragraph to lower case:

Code:
<body>

(before)

<p>ALLCAPS ALLCAPS ALLCAPS lower case  lower case  lower case  lower case  lower case </p>

(after)

<p>Allcaps allcaps allcaps lower case  lower case  lower case  lower case  lower case </p>

</body>

I did try using the Capitalize text function with a Find of <p[^>]*>.+?</p>

That works well for the ALLCAPS text I was using as a test, but it also gets applied to every paragraph.

I was thinking there might be some clever (i.e. beyond me) RegEx way to take the first whole word after a <p> and make it Title Case, and then make the next (say) 4 words lowercase. Then stepping through with Find, or Replace & Find, would be a lot easier.
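One possible way to narrow the Find so it only stops on paragraphs that open with a run of capitals, instead of every paragraph. This is a sketch of my own pattern, not a calibre built-in, tested here with Python's re; it assumes plain ASCII capitals, no inline tag (such as a small-caps span) between the <p> and the text, at least two capitalized words, and it takes the whole run of capitals rather than a fixed 4 words:

```python
import re

# One all-caps word: capitals, optionally joined by an apostrophe or hyphen
# (CARSTAIR'S, LONG-TERM). The negative lookahead stops it from grabbing the
# capital of an ordinary word such as "The" or "O'Neil".
WORD = r"[A-Z]+(?:['-][A-Z]+)*(?![\w'-])"

# <p ...> followed by at least two all-caps words.
# Group 1 is the tag, group 2 is the run of capitals.
FIND = r"(<p[^>]*>)(" + WORD + r"(?:\s+" + WORD + r")+)"

sample = '''<p>ALLCAPS ALLCAPS ALLCAPS lower case lower case</p>
<p>The next paragraph starts normally.</p>
<p>I went home.</p>'''

print([m.group(2) for m in re.finditer(FIND, sample)])
# prints: ['ALLCAPS ALLCAPS ALLCAPS']
```

Assembled for the editor's Find box (regex mode, with Case sensitive ticked, otherwise [A-Z] will also match lower case), the pattern is:

(<p[^>]*>)([A-Z]+(?:['-][A-Z]+)*(?![\w'-])(?:\s+[A-Z]+(?:['-][A-Z]+)*(?![\w'-]))+)

Requiring two or more words means a paragraph that starts with a lone acronym or 'I' is skipped.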
05-01-2016, 04:57 PM   #9
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I still don't understand the problem.

According to your example in post #1, capture the first three words and apply the default builtin regex function.

Although using both the word 'allcaps' and capital letters to indicate your intent, then arguing with yourself, isn't helping.
05-01-2016, 05:33 PM   #10
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Yes, but if there are 100 chapters in the book that all have the upper-case lead-in for the first paragraph, then I'd have to select the text in each chapter manually.

I was only wondering if there was a way to find a group of upper-case words at the beginning of a paragraph and do all 100 at once.

Quote:
Although using both the word 'allcaps' and capital letters to indicate your intent, then arguing with yourself, isn't helping.
I'll try to avoid arguing with myself.
05-01-2016, 06:27 PM   #11
BetterRed
null operator (he/him)
Posts: 20,771
Karma: 27405072
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@phossler - as I understand it, you want a one-click something that marches through a book, detects all the 'first paragraphs' (some books format the paragraph following a scene break as a 'first paragraph') that have leading upper-case words, and then changes those leading words to sentence case.

I've yet to find a Case changer that deals with proper names correctly. So 'SMILEY FLEW TO VIENNA; CARSTAIR'S was in the arrivals hall to meet him...' inevitably becomes 'Smiley flew to vienna; carstair's was in the arrivals hall to meet him...'

Some errors may be picked up by a Spellchecker. But not if 'SMILEY FLEW TO DODGE CITY; CAPABILITY BROWN was in the arrivals hall to meet him...'

I suspect a goodly number of 'first paragraphs' in novels, biographies, and histories contain a proper name in the first sentence. So even after a One-Click button waved its magic wand, you'd have to eyeball each 'first paragraph' to sort the wand's curses from its blessings.
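To make the proper-name problem concrete, the usual half-measure is a keep-list of names to re-capitalize after lowercasing. A Python sketch, where KNOWN_NAMES is a hypothetical list you would have to build by hand for each book:

```python
# Hypothetical per-book keep-list of all-caps words that are really names.
KNOWN_NAMES = {"SMILEY", "VIENNA", "CARSTAIR'S"}


def sentence_case(run, known=KNOWN_NAMES):
    """Lowercase a run of all-caps words, keeping the first word and any
    word found in `known` capitalized (trailing punctuation is ignored)."""
    out = []
    for i, word in enumerate(run.split(' ')):
        if i == 0 or word.strip(';,.:!?') in known:
            out.append(word.capitalize())
        else:
            out.append(word.lower())
    return ' '.join(out)


print(sentence_case("SMILEY FLEW TO VIENNA; CARSTAIR'S"))
# prints: Smiley flew to Vienna; Carstair's
```

It only helps for names you already know about: CAPABILITY BROWN still comes out as 'capability brown' unless both words are on the list, and a word like DODGE can be a name or a verb, so a list alone can't decide.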

I always found the capitalisation of leading words in print irritating, especially as it's often arbitrary with no regard as to the author's phrasing. To me they're a printer's affectation.

Getting rid of them in e-books, whilst tempting, usually requires more effort than I'm willing to expend. Unless of course the book is in need of serious tin-bashing, in which case selecting the text, changing the case, and correcting proper names may be worth the effort.

BR
05-01-2016, 07:19 PM   #12
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
@BR -- agree 100%

I wasn't even thinking of limiting it to just first paragraphs. I was going for any <p> tag.

I was thinking that ...

if the first word after a <p> is all caps, then it would be capitalized

if the second, third, and fourth (maybe more) words after a <p> were all caps, then they would be lower cased
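That rule maps fairly directly onto a Function Mode replace function. A sketch, assuming the Find expression in the code (regex mode, case sensitive) and Replace All over all text files; it treats the whole leading run of capitals the same way instead of counting four words, and it will happily lowercase names, so the output still needs eyeballing:

```python
import re

# Assumed Find expression: group 1 is the opening <p> tag, group 2 is a run
# of two or more all-caps words at the start of the paragraph.
FIND = r"(<p[^>]*>)([A-Z]+(?:['-][A-Z]+)*(?![\w'-])(?:\s+[A-Z]+(?:['-][A-Z]+)*(?![\w'-]))+)"


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    tag, run = match.group(1), match.group(2)
    # First letter stays a capital, everything else in the run is lowercased:
    # "ALLCAPS ALLCAPS ALLCAPS" -> "Allcaps allcaps allcaps"
    return tag + run[0] + run[1:].lower()


if __name__ == '__main__':
    # Stand-alone check outside calibre, with placeholder values for the extra arguments.
    html = ('<p class="first">SMILEY FLEW TO VIENNA; he was met.</p>\n'
            '<p>The next one is untouched.</p>')
    print(re.sub(FIND, lambda m: replace(m, 1, '', None, None, {}, None), html))
```

In the test paragraph, VIENNA comes out as 'vienna', which is exactly the proper-name problem BR described, so stepping through with Find and Replace rather than Replace All may still be the safer way to use it.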

Even stepping through with [Find] or [Replace, Find Next] to catch:

'SMILEY FLEW TO DODGE CITY; CAPABILITY BROWN was in the arrivals hall to meet him...'

to be fixed manually would be a lot faster than manually going to each file, highlighting the 'offending text,' and right-clicking to change the case, etc.

'Smiley flew to Dodge City; Capability Brown was in the arrivals hall to meet him...'

I tried 2 RegExes to do it in two passes, but never got even the first one to work.

I think theducks' comment about differences in RegEx engines means that there is no way to automate the process.
05-01-2016, 09:04 PM   #13
BetterRed
null operator (he/him)
Posts: 20,771
Karma: 27405072
Join Date: Mar 2012
Location: Sydney Australia
Device: none
The only instances I've seen of upper casing the first few words of a paragraph have been in the first paragraph of a chapter or scene - hence 'first paragraph'.

I doubt there's a regex engine that will differentiate between 'LAS VEGAS IS A CITY ONE SHOULD DODGE' and 'I HAD A GREAT TIME IN DODGE CITY.'

I just opened the first three 'real' books in my Test library and looked at words 2-6 in the first paragraph of the first chapter. A factual history had no proper nouns, a Regency novel had 'HYDE PARK', and an international banking exposé had 'LONG-TERM CAPITAL MANAGEMENT'. Of those six words the only one that would even be flagged as a spelling error is 'hyde'; the other five ('park', 'long', 'term', 'capital', and 'management') are valid in lower case, yet they would all be patently wrong in the context they are used.

Whilst ever there are no multi-lingual, deep-intelligence enabled regex engines which are capable of further deep-learning, us humans will always have something to do

BR
05-01-2016, 09:39 PM   #14
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Whilst ever there are no multi-lingual, deep-intelligence enabled regex engines which are capable of further deep-learning, us humans will always have something to do
Roger that
05-01-2016, 10:56 PM   #15
kovidgoyal
creator of calibre
Posts: 44,149
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by phossler View Post
Yes, but if there are 100 chapters in the book that all have the upper-case lead-in for the first paragraph, then I'd have to select the text in each chapter manually.
No, you won't. Simply run the S&R on "All text files"; it will end up title-casing all paragraphs in your book, but that is no problem since you want all paragraphs to have the same casing behavior.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.