Old 06-03-2019, 06:30 AM   #1
Holden100
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
small roman numerals in header – regex help

I am trying to search and replace while converting a PDF to EPUB, but I am stuck on how to handle the lowercase Roman numerals (i to xxxi) that appear as page headers in the prologue. On even pages the header is xii<br>prologue<br> etc., while on odd pages it is prologue<br>xix<br> etc.

Clearly [xvi] does not help.
Old 06-03-2019, 09:11 AM   #2
theducks
Well trained by Cats
Posts: 30,073
Karma: 57259778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
You are looking for a sequence of characters contained in the SET [xvi]

Find: [xvi]{1,3}
Old 06-03-2019, 10:37 AM   #3
Holden100
Quote:
Originally Posted by theducks View Post
You are looking for a sequence of characters contained in the SET [xvi]

Find: [xvi]{1,3}
Searching for [xvi]{1,3} also matches every word in which i, x or v appears.

Actually, it's a book, and the lowercase Roman numerals are used only for the prologue pages, which run from 1 to 31, or i to xxxi.

The string [xvi]{1,3} ends up finding over 34,000 occurrences. I am only looking to find around 31.

I don't know regex at all, but clearly it would need something like \b to narrow the search down to whole words made up only of these letters: one letter (i, v, x), two (ii, iv, vi, ix, xi, xx), three (iii, vii, xii, xiv, xvi), four (viii, xiii, xvii, xxii, xxiv, xxvi, xxix), five (xviii, xxiii, xxvii), or at most six (xxviii).
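
To illustrate the word-boundary idea (written \b, with a backslash, in regex syntax), here is a minimal Python sketch. The sample string is made up for illustration, and it assumes the search tool uses Python-style regular expressions, as calibre does; it is not taken from the actual book.

```python
import re

# Hypothetical sample text standing in for the converted book.
sample = "xii<br>prologue<br>The vixen arrived in six minutes."

# A bare character class matches these letters anywhere, even inside words.
loose = re.findall(r"[xvi]{1,3}", sample)

# \b anchors the match to word boundaries, so only whole "words"
# made up entirely of i, v and x are found.
bounded = re.findall(r"\b[ivx]+\b", sample)

print(len(loose), "loose matches:", loose)
print(len(bounded), "bounded matches:", bounded)
```

The loose pattern picks up fragments of "vixen", "arrived", "in", "six" and "minutes", which is why the count balloons on a full book; the bounded pattern only finds the header numeral.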
Old 06-03-2019, 10:45 AM   #4
Holden100
So basically I am looking for words composed only of the letters x, v and i, in any combination, from one to six letters long!
Old 06-03-2019, 11:34 AM   #5
theducks
Quote:
Originally Posted by Holden100 View Post
So basically I am looking for words composed only of the letters x, v and i, in any combination, from one to six letters long!
I changed the {1,6} to {1,3} due to faulty thinking.
The search I actually used included the tags that existed in my document.
Again, I failed by oversimplifying.

BTW I only make changes S&R using the editor. I don't try and do this kind of stuff during conversion
Old 06-03-2019, 11:38 AM   #6
Holden100
But clearly something else is also required, as simply using [xvi]{1,6} still shows over 34,000 results, when it should only be finding about 31 or so.

In fact, the search is locating almost ANY word containing the letter x, v or i, whereas we need to be looking ONLY for words composed of these letters and no others.
Old 06-03-2019, 12:54 PM   #7
lumpynose
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by theducks View Post
BTW I only make changes S&R using the editor. I don't try and do this kind of stuff during conversion
Converting from PDF is often a minefield where lots of things go wrong, so this suggestion is worth paying attention to. Try doing a conversion with nothing else going on (no replacements), then open the EPUB in the editor and see how much damage was done. Other EPUB editors might also be helpful; I use Sigil, but I don't know the pros and cons of Sigil vs calibre's editor.
Old 06-03-2019, 03:02 PM   #8
Holden100
But surely there must be some way of writing a regular expression for small roman numerals from i to xxxi?
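
One possible answer, offered as a sketch and not as a tested calibre recipe: a pattern that accepts only the numerals i through xxxi as whole words. The names ROMAN_1_TO_31 and to_roman and the sample string are hypothetical, introduced only for this illustration; the helper exists just to check the pattern against every numeral in range.

```python
import re

# Matches only the lowercase numerals i (1) through xxxi (31) as whole words:
#   x{0,2} followed by a units part  -> 1-9, 11-19, 21-29
#   x{1,3}                           -> 10, 20, 30
#   xxxi                             -> 31
ROMAN_1_TO_31 = re.compile(
    r"\b(?:x{0,2}(?:i{1,3}|iv|vi{0,3}|ix)|x{1,3}|xxxi)\b"
)

def to_roman(n: int) -> str:
    """Lowercase Roman numeral for 1 <= n <= 39 (checking helper only)."""
    units = ["", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]
    return "x" * (n // 10) + units[n % 10]

# Hypothetical sample: one real header plus words that merely contain i, v or x.
sample = "prologue<br>xix<br>six vixen xl"
found = ROMAN_1_TO_31.findall(sample)

# Every numeral from 1 to 31 should match in full; 32 to 39 should not.
all_valid = all(ROMAN_1_TO_31.fullmatch(to_roman(n)) for n in range(1, 32))
none_above = not any(ROMAN_1_TO_31.fullmatch(to_roman(n)) for n in range(32, 40))

print(found, all_valid, none_above)
```

A stricter pattern like this still matches a lone "i", "v" or "x" that appears as a word in the body text, so tying the match to the surrounding header markup remains the safer route.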
Old 06-03-2019, 04:19 PM   #9
Holden100
Sorry, not clear. Please help.

I am trying to search and replace while converting a PDF to EPUB, but I am stuck on how to handle the lowercase Roman numerals (i to xxxi) that appear as page headers in the prologue. On even pages the header is "ii<br>prologue<br>" etc., while on odd pages it is "prologue<br>xix<br>" etc.

Clearly [xvi]+ does not help as it ends up highlighting all words which have any of these letters.

Actually, it's a book, and the lowercase Roman numerals are used only for the prologue pages, which run from 1 to 31, or i to xxxi.

The string [xvi] ends up finding over 34,000 occurrences. I am only looking to find around 31.

I don't know regex at all, but clearly it would need something like \b to narrow the search down to whole words made up only of these letters: one letter (i, v, x), two (ii, iv, vi, ix, xi, xx), three (iii, vii, xii, xiv, xvi), four (viii, xiii, xvii, xxii, xxiv, xxvi, xxix), five (xviii, xxiii, xxvii), or at most six (xxviii).
Old 06-03-2019, 05:06 PM   #10
BetterRed
null operator (he/him)
Posts: 20,755
Karma: 27405072
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Moderator Notice

Please don't post the same issue in multiple threads - See #3 - Duplicates in MR Guidelines

I moved your post in Blurr's thread into here

BR

Old 06-03-2019, 09:26 PM   #11
davidfor
Grand Sorcerer
Posts: 24,906
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Try:

Code:
\b[ivx]+\b
The "\b" matches a word boundary. The only issue I can think of is if one of those letters appears on its own somewhere else in the book. If I read what you are saying correctly, the following should fix that:

Code:
\b[ivx]+<br
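
As a rough illustration of the difference between the two patterns, here is a minimal Python sketch. The text fragment is made up, and the final substitution assumes the headers appear exactly as "numeral<br>prologue<br>" and "prologue<br>numeral<br>" as described earlier in the thread; the real markup after conversion may differ (for example "<br/>"), so treat the removal pattern as an assumption to adapt.

```python
import re

# Made-up fragment: two running headers plus body text containing a stray "x".
text = (
    "ii<br>prologue<br>Mark the spot with an x.<br>"
    "prologue<br>xix<br>More text."
)

# Whole-word match: also catches the lone "x" in the body text.
whole_word = re.findall(r"\b[ivx]+\b", text)

# Requiring "<br" right after the numeral limits matches to the headers.
before_br = re.findall(r"\b[ivx]+<br", text)

# Assumed header shapes, removed in one pass (even-page form | odd-page form).
header = re.compile(r"\b[ivx]+<br>prologue<br>|prologue<br>[ivx]+<br>")
cleaned = header.sub("", text)

print(whole_word)
print(before_br)
print(cleaned)
```

Testing the pattern in the editor's search on a copy of the book before replacing anything is a cheap way to confirm the match count is close to the expected 31.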
Old 06-04-2019, 12:58 AM   #12
Holden100
Quote:
Originally Posted by davidfor View Post
Try:

Code:
\b[ivx]+\b
The "\b" matches a word boundary. The only issue I can think of is if one of those letters appears on its own somewhere else in the book. If I read what you are saying correctly, the following should fix that:

Code:
\b[ivx]+<br
Thank you so much! I think this works!
Old 06-04-2019, 01:27 AM   #13
Holden100
Quote:
Originally Posted by BetterRed View Post

Please don't post the same issue in multiple threads - See #3 - Duplicates in MR Guidelines

I moved your post in Blurr's thread into here

BR
Apologies