03-01-2023, 06:48 PM | #1 |
Bookworm
Posts: 4
Karma: 10
Join Date: Mar 2023
Location: Germany
Device: Kindle Keyboard + Paperwhite 1 + Paperwhite 3
|
Calibre RegExp search: unexpected results
Hi there...
Sorry to jump in with this rather esoteric question in my first post to this forum, but everything less complicated has already had solutions, so there was never a need to register... [My thanks for that!]
Caveat: I'm using a localized German version of Calibre 6.13, so my English names for certain calibre functions and dialogs may not be fully accurate.
I'm trying to select a subset of my library with a regexp search in order to then process the remaining books with a "metadata/search&replace" operation.
Context: When importing books, the title often contains the series name and the series number. It would be convenient to separate these attributes using the regexp available at "add books/read metadata". However, these attributes are formatted in so many different ways that I was unable to come up with a single regexp that catches at least most of them. As this dialog has no way to use more than one regexp, I have to do that myself. Additionally, I want to shorten series information like "A ... series book 15" to "... 15" while letting "A ... book 23" stand as "A ... 23", because in the former case the "A" is not part of the series title. Obviously, after extracting the series name, I will also extract the series number, and then remove the series name from the title...
One of my regexps to search for a specific class of titles is this:
Code:
title:"~\((?:(An?|The)\s+)(?P<sname>[^\)]*?)(?:[,-:]?\s*)(?:(Small\s+Town|Trilogy|Series|Roman(ce|tic)|Cozy|Crime|Thrillers?|Suspense|Myster(y|ies))([\s:,]*))*Series\s*(?:(No.?|Number|Volume|Book|(Book\s*)\#)\s*)\#?(?P<sno>\d+([.,]\d+)?)\)"
Code:
(The ... Series Book 1)
(A ... Series Book 2)
(An ... Series Book 3)
Code:
... #
Code:
Once Upon A Death (Days Of Death Series Book 1)
BloodGifted: The Dantonville Legacy Series Book 1 (A Paranormal Romance)
Poor Boy Road: A Gritty Hard-Hitting Thriller Series Book # 1 (JAKE CALDWELL)
Alexa O'Brien Huntress Series Book 1-4 Box Set
The Trouble with Bree: The Spotlight Series Book 1.5
#2 + #3 do not have a series number in front of the closing bracket. #4 + #5 have no brackets at all. When I test my expression against the names found by calibre, those names (name classes) are correctly not matched.
Can anyone help me to understand what's going wrong here?
Tnx, Tillomar
|
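For anyone who wants to reproduce the "tested outside calibre" part of the post, here is a minimal sketch in plain Python. It assumes that dropping the calibre-specific `title:"~..."` wrapper and compiling the rest with the standard `re` module is a fair stand-in for the poster's own test; the `parse` helper and the "Example"/"Sample"/"Other" series names are made up for illustration and are not from the post.

```python
import re

# The expression from the post, without the title:"~..." search wrapper.
# IGNORECASE is an assumption made to mimic a case-insensitive search.
PATTERN = re.compile(
    r"\((?:(An?|The)\s+)(?P<sname>[^\)]*?)(?:[,-:]?\s*)"
    r"(?:(Small\s+Town|Trilogy|Series|Roman(ce|tic)|Cozy|Crime|Thrillers?"
    r"|Suspense|Myster(y|ies))([\s:,]*))*"
    r"Series\s*(?:(No.?|Number|Volume|Book|(Book\s*)\#)\s*)\#?"
    r"(?P<sno>\d+([.,]\d+)?)\)",
    re.IGNORECASE,
)

def parse(title):
    """Hypothetical helper: return (series name, series number) or None."""
    m = PATTERN.search(title)
    return (m.group("sname"), m.group("sno")) if m else None

# Titles calibre returned although they should not match.
unexpected = [
    "Once Upon A Death (Days Of Death Series Book 1)",
    "BloodGifted: The Dantonville Legacy Series Book 1 (A Paranormal Romance)",
    "Poor Boy Road: A Gritty Hard-Hitting Thriller Series Book # 1 (JAKE CALDWELL)",
    "Alexa O'Brien Huntress Series Book 1-4 Box Set",
    "The Trouble with Bree: The Spotlight Series Book 1.5",
]

# Made-up titles of the three intended forms.
intended = [
    "(The Example Series Book 1)",
    "(A Sample Series Book 2)",
    "(An Other Series Book 3)",
]

for title in unexpected + intended:
    print(parse(title), "<-", title)
```

Under these assumptions the expression itself rejects all five listed titles and accepts the three intended forms, which points away from the regexp and toward how the search box hands the text to the regexp engine.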
03-03-2023, 08:54 PM | #2 |
Bookworm
Posts: 4
Karma: 10
Join Date: Mar 2023
Location: Germany
Device: Kindle Keyboard + Paperwhite 1 + Paperwhite 3
|
simplified...
I simplified my regexp to gain insight into what is causing the problem (one of several); so far, this is the simplest expression that still finds titles without any round bracket at all:
Code:
title:"~\([^()]*?\s+\d+\)"
Currently, I feel that this must be some bug in calibre, but I would hate to bother the bug list if the problem sits in front of the keyboard. Anyone?
|
03-03-2023, 09:17 PM | #3 |
creator of calibre
Posts: 44,759
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This will almost certainly be an escaping issue. IIRC you need double backslashes. Start with something simple like
title:"~\\("
and check that this finds titles with a ( in them.
|
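To illustrate why a missing level of escaping would produce exactly the results reported above, here is a rough Python sketch. It is not calibre code: `unescape_once` is a hypothetical stand-in that assumes the quoted search text loses one backslash in front of `\`, `"`, `(` and `)` before it reaches the regexp engine. Under that assumption, `\(` arrives as a plain `(`, which opens a group instead of matching a literal bracket.

```python
import re

def unescape_once(query: str) -> str:
    # Hypothetical model of one quoting layer (an assumption, not calibre's
    # actual parser): a backslash before \ " ( or ) is consumed.
    return re.sub(r'\\([\\"()])', r'\1', query)

titles = [
    "Once Upon A Death (Days Of Death Series Book 1)",
    "Alexa O'Brien Huntress Series Book 1-4 Box Set",
    "The Trouble with Bree: The Spotlight Series Book 1.5",
]

def hits(search_text: str) -> list:
    """Return the titles matched after the assumed unescaping step."""
    pattern = re.compile(unescape_once(search_text), re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]

single = r'\([^()]*?\s+\d+\)'     # as typed in post #2
double = r'\\([^()]*?\s+\d+\\)'   # with doubled backslashes before the brackets

print(unescape_once(single))  # what the regexp engine would see
print(hits(single))           # matches every title containing " <digits>"
print(hits(double))           # matches only the title with "(... 1)"
```

With single backslashes the effective pattern is `([^()]*?\s+\d+)`, so any title containing whitespace followed by digits is found, brackets or not. With doubled backslashes the brackets stay literal and only the bracketed title is found.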
03-04-2023, 06:57 AM | #4 | |||
Grand Sorcerer
Posts: 12,171
Karma: 7908995
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
Quote:
|
|||
03-08-2023, 12:11 AM | #5 |
Bookworm
Posts: 4
Karma: 10
Join Date: Mar 2023
Location: Germany
Device: Kindle Keyboard + Paperwhite 1 + Paperwhite 3
|
Many thanks: double (or super-) escaping did it!
|