06-03-2019, 06:30 AM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
small roman numerals in header – regex help
I am trying to search and replace while converting a PDF to EPUB, but I'm stuck on how to handle the lowercase Roman numerals (from i to xxxi) that appear as page headers in the prologue. On even pages they look like "xii<br>prologue<br>", while on odd pages they look like "prologue<br>xix<br>". Clearly [xvi] does not help.
06-03-2019, 09:11 AM | #2 |
Well trained by Cats
Posts: 30,416
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You are looking for a sequence of characters drawn from the SET [xvi].
Find: [xvi]{1,3}
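A quick sketch of why the unanchored set over-matches, using Python's standard `re` module as a stand-in for the editor's search (an assumption: calibre's engine is not `re`, but character sets, `{m,n}` and `\b` behave alike in common regex flavors). The sample string is made up for illustration.

```python
import re

# Hypothetical sample: one real header followed by ordinary prose.
sample = "xix<br>prologue<br>The vivid exit revived him."

# Unanchored set pattern: any run of 1-3 letters from {x, v, i}, even mid-word.
loose = re.findall(r"[xvi]{1,3}", sample)

# Same set, but \b forces the run to be a whole word on its own.
whole_words = re.findall(r"\b[xvi]{1,3}\b", sample)

print(loose)        # ['xix', 'viv', 'i', 'xi', 'viv', 'i']
print(whole_words)  # ['xix']
```

Without `\b`, fragments of "vivid", "exit", "revived" and "him" all count as hits, which is how a whole book adds up to tens of thousands of matches.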
06-03-2019, 10:37 AM | #3 | |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
Actually, it's a book, and the lowercase Roman numerals are used only for the prologue pages, which run from 1 to 31 (i to xxxi). The pattern [xvi]{1,3} finds over 34,000 occurrences, but I'm only looking for around 31. I don't know regex at all, but clearly it needs something like \b to narrow the search down to words made only of these letters, e.g. one letter (i, v, x), two (ii, iv, vi, ix, xi, xx), three (iii, vii, xii, xiv, xvi), four (viii, xiii, xvii, xxii, xxiv, xxvi, xxix) and five (xviii, xxiii, xxvii), up to at most six letters (xxviii).
|
06-03-2019, 10:45 AM | #4 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
So basically I am looking for words composed entirely of the letters [xvi], in any combination or repetition, from one to six letters long!
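What this post describes (whole words of one to six letters drawn only from i, v and x) can be written as `\b[ivx]{1,6}\b`. A minimal check in Python on a made-up line, again using `re` as a stand-in for the editor's engine:

```python
import re

# Whole words of 1-6 letters drawn only from i, v and x.
# \b stops a match from starting or ending inside a longer word.
ROMAN_LIKE = re.compile(r"\b[ivx]{1,6}\b")

text = "ii<br>prologue<br>He gave xxviii coins to vivid Vix."
print(ROMAN_LIKE.findall(text))                # ['ii', 'xxviii']

# Only letters and length are checked, not numeral validity.
print(ROMAN_LIKE.fullmatch("ivxi") is not None)  # True
```

"gave", "coins" and "vivid" are skipped because the letters sit inside longer words, and "Vix" is skipped because the match is case-sensitive. The last line shows the limitation: a non-numeral such as "ivxi" still passes.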
|
06-03-2019, 11:34 AM | #5 | |
Well trained by Cats
Posts: 30,416
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
The search I actually used included the tags that existed in my document. Again, I failed by oversimplifying. BTW, I only make S&R changes using the editor; I don't try to do this kind of thing during conversion.
|
06-03-2019, 11:38 AM | #6 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
But clearly something else is also required, as simply using [xvi]{1,6} still returns over 34,000 results, when it should find only about 31.
In fact, the search matches almost ANY word containing the letter x, v or i, whereas we need to match ONLY words composed entirely of these letters.
06-03-2019, 12:54 PM | #7 |
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
|
Converting from PDF is often a minefield, with lots of things going wrong, so this suggestion is worth paying attention to: try doing a conversion with nothing else going on (no replacements), then open the EPUB in the editor and see how much damage was done. Other EPUB editors might also be helpful; I use Sigil, but I don't know the pros and cons of Sigil vs. calibre's editor.
|
06-03-2019, 03:02 PM | #8 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
But surely there must be some way of writing a regular expression for lowercase Roman numerals from i to xxxi?
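One possible answer, sketched in Python: a pattern that accepts only well-formed Roman numerals from i to xxxi as whole words. `PROLOGUE_NUMERAL` and the helper `to_roman` are illustrative names, not from the thread, and the pattern is untested in calibre itself; it only uses `\b`, `(?:...)` and `{m,n}`, which common regex flavors share.

```python
import re

# Lowercase Roman numerals 1..31 as whole words. Alternatives, in order:
# xxx/xxxi (30-31); one or two x plus optional units (10-29); units alone (1-9).
# Every alternative consumes at least one letter, so no empty matches.
PROLOGUE_NUMERAL = re.compile(
    r"\b(?:xxxi?|x{1,2}(?:ix|iv|vi{0,3}|i{0,3})|ix|iv|vi{0,3}|i{1,3})\b"
)

def to_roman(n: int) -> str:
    """Convert 1..39 to a lowercase Roman numeral (enough for this check)."""
    tens, units = divmod(n, 10)
    unit_forms = ["", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]
    return "x" * tens + unit_forms[units]

# Every numeral from i to xxxi should match as a whole word.
matches_all = all(PROLOGUE_NUMERAL.fullmatch(to_roman(n)) for n in range(1, 32))
print(matches_all)  # True

# Out-of-range numerals, malformed numerals and ordinary words are rejected.
rejected = [w for w in ["xxxii", "xl", "iiii", "vv", "ivx", "vivid"]
            if PROLOGUE_NUMERAL.search(w)]
print(rejected)  # []

print(PROLOGUE_NUMERAL.findall("xii<br>prologue<br>The vivid exit"))  # ['xii']
```

The design choice worth noting: a shorter form like `\bx{0,3}(?:ix|iv|v?i{0,3})\b` can match an empty string at the start of any word, so the units are split into alternatives that each require at least one letter.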
|
06-03-2019, 04:19 PM | #9 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
Sorry, that wasn't clear. Please help.
I am trying to search and replace while converting a PDF to EPUB, but I'm stuck on how to handle the lowercase Roman numerals (from i to xxxi) that appear as page headers in the prologue. On even pages they look like "ii<br>prologue<br>", while on odd pages they look like "prologue<br>xix<br>". Clearly [xvi]+ does not help, as it highlights every word that contains any of these letters. It's a book, and the lowercase Roman numerals are used only for the prologue pages, which run from 1 to 31 (i to xxxi). The pattern [xvi] finds over 34,000 occurrences, but I'm only looking for around 31. I don't know regex at all, but clearly it needs something like \b to narrow the search down to words made only of these letters, e.g. one letter (i, v, x), two (ii, iv, vi, ix, xi, xx), three (iii, vii, xii, xiv, xvi), four (viii, xiii, xvii, xxii, xxiv, xxvi, xxix) and five (xviii, xxiii, xxvii), up to at most six letters (xxviii).
06-03-2019, 05:06 PM | #10 |
null operator (he/him)
Posts: 20,959
Karma: 27620690
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Moderator Notice
Please don't post the same issue in multiple threads. See #3 (Duplicates) in the MR Guidelines. I have moved your post from Blurr's thread into this one.
BR
06-03-2019, 09:26 PM | #11 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Try:
Code:
\b[ivx]+\b
or:
Code:
\b[ivx]+<br
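A sketch of what the second pattern hits in the two header layouts from the first post, plus one possible way to strip the whole header. The sample HTML and the `header` pattern are assumptions; real converted markup may carry other tags around the numerals, so check the matches in the editor before doing a Replace All.

```python
import re

# Hypothetical fragment in the two layouts described in the first post:
# even pages put the numeral first, odd pages put it after the running head.
html = (
    "end of a sentence xii<br>prologue<br>The vivid story goes on "
    "and on prologue<br>xix<br>until the exit."
)

# The suggested pattern: a whole-word run of i/v/x directly followed by "<br".
print(re.findall(r"\b[ivx]+<br", html))  # ['xii<br', 'xix<br']

# Remove the whole header in either layout ("prologue" taken from the post).
header = re.compile(r"\b[ivx]+<br>prologue<br>|prologue<br>[ivx]+<br>")
print(header.sub("", html))
# end of a sentence The vivid story goes on and on until the exit.
```

Anchoring on the following `<br` is what makes the second pattern so selective: prose words almost never end right before a tag, whereas the page headers always do.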
06-04-2019, 12:58 AM | #12 |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
Thank you so much! I think this works!
|
06-04-2019, 01:27 AM | #13 | |
Member
Posts: 10
Karma: 10
Join Date: Jun 2018
Device: Kindle
|
|
|
Tags |
regex |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
regex-function convert roman numerals | weberr | Editor | 11 | 09-22-2021 05:15 PM |
Roman numerals in series | ownedbycats | Calibre | 2 | 10-21-2018 05:51 PM |
Using Roman Numerals in Chapter Titles | navyo | Editor | 5 | 06-30-2016 06:24 AM |
Convert Roman numerals to Arabic? | Peter W | Sigil | 2 | 04-09-2012 11:55 AM |
regex search for roman numerals | Blurr | Calibre | 2 | 12-16-2009 05:55 PM |