#1
Junior Member
Posts: 4
Karma: 30
Join Date: Jun 2010
Location: Amsterdam
Device: Kobo Forma and H2O 1st gen
Custom Regular Expressions for adding book information
Hi,
I would like to start a thread collecting regular expression recipes for extracting book information from file names. Here are a few to start with:

Standard Calibre:
(?P<title>.+) - (?P<author>[^_]+)
Example: "Murder on the golf links - Agatha Christie.epub"
Title: Murder on the golf links
Author: Agatha Christie
Series:
Series Index:

Leading number as series index:
\s*(?P<series_index>[0-9]*)\s*(?P<title>[^_].+) ?- (?P<author>[^_]+)
Example: "40 Murder on the golf links - Agatha Christie.epub"
Title: Murder on the golf links
Author: Agatha Christie
Series:
Series Index: 40.0

Leading number as series index, also kept in the title:
\s*((?P<title>(?P<series_index>[0-9]*)\s*[^_].+)) ?- (?P<author>([^_]+))
Example: "40 Murder on the golf links - Agatha Christie.epub"
Title: 40 Murder on the golf links
Author: Agatha Christie
Series:
Series Index: 40.0

Same as the previous one, with the author reused as the series:
\s*((?P<title>(?P<series_index>[0-9]*)\s*[^_].+)) ?- (?P<series>(?P<author>([^_]+)))
Example: "40 Murder on the golf links - Agatha Christie.epub"
Title: 40 Murder on the golf links
Author: Agatha Christie
Series: Agatha Christie
Series Index: 40.0

Maybe it would be handy to have a drop-down box on the regular expression field with a history to choose from. I am not too handy with regular expressions, and when experimenting I sometimes lose the correct syntax.

Apart from that, I love Calibre.
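For anyone who wants to try these recipes outside Calibre first, here is a minimal sketch in plain Python. It assumes the pattern is applied with `re.search` to the file name without its extension, and it trims spaces and converts the series index to a float to mimic the results listed above. The helper `parse_filename` and the dictionary keys are made up for illustration; they are not part of Calibre.

```python
import re

# Patterns copied from the post above; the dictionary keys are made-up labels.
PATTERNS = {
    "default": r"(?P<title>.+) - (?P<author>[^_]+)",
    "index_only": r"\s*(?P<series_index>[0-9]*)\s*(?P<title>[^_].+) ?- (?P<author>[^_]+)",
    "index_in_title": r"\s*((?P<title>(?P<series_index>[0-9]*)\s*[^_].+)) ?- (?P<author>([^_]+))",
    "author_as_series": r"\s*((?P<title>(?P<series_index>[0-9]*)\s*[^_].+)) ?- (?P<series>(?P<author>([^_]+)))",
}


def parse_filename(filename, pattern):
    """Hypothetical helper: apply a named-group pattern to a file name."""
    # Assumption: the extension is dropped before matching.
    stem = filename.rpartition(".")[0] or filename
    match = re.search(pattern, stem)
    if match is None:
        return {}
    # Keep only named groups that captured something, and trim stray spaces.
    fields = {k: v.strip() for k, v in match.groupdict().items() if v and v.strip()}
    if "series_index" in fields:
        fields["series_index"] = float(fields["series_index"])
    return fields


if __name__ == "__main__":
    name = "40 Murder on the golf links - Agatha Christie.epub"
    for label, pattern in PATTERNS.items():
        print(label, parse_filename(name, pattern))
```

Running the loop on one file name makes it easy to compare what each recipe puts into the title, author, series and series index fields.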
#2
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: Sony PRC-650
I think it might be more helpful for people to learn that there are tools that can help us humans create regular expressions. This is one such tool:
http://gskinner.com/RegExr/

It's pretty good. Unfortunately it doesn't show which group matches which strings, and it doesn't acknowledge certain special python/calibre rules (such as []] and [-]), but it does let you enter the filename you want to match and then see, as you type, how far your current expression matches the string.

Here's my expression (to get back on topic):
^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^\]{[()]+\w)

The expression expects the filename to start with the author and end with the title (possibly followed by garbage in parentheses). It also optionally matches a series and a series index in the middle. It requires the title to end in an alphanumeric character, and it does not allow the title to contain any kind of parenthesis or bracket (anything from the first one onward is discarded).

It matches the following example filenames:
Author Harris - Kingdom come
Author Harris - Kingdom come (v1.0)
Author Harris - Kingdom series - The Very Magical Kingdom (v1.0)
Author Harris - Kingdom series 14.5 - The Very Magical Kingdom (v1.0)
Author Harris - [Kingdom series 14.5] - The Very Magical Kingdom (v1.0)
Author Harris - [Kingdom series 14.5] [another useless string]- The Very Magical Kingdom (v1.0)

Author: Author Harris
Title: Kingdom come
Series: Kingdom series
Series Index: 14.5
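Since RegExr does not show which group captured what, a few lines of Python can fill that gap. The sketch below assumes the expression is applied with `re.search` to the file names exactly as listed (no extension), and it trims surrounding spaces from each captured group. The helper name `parse` is made up for illustration, and the series index is left as a string.

```python
import re

# Pattern copied from the post above, split over three lines for readability.
PATTERN = re.compile(
    r"^(?P<author>[^-]+)"
    r"(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?"
    r".*?-\s*(?P<title>[^\]{[()]+\w)"
)

# Example file names from the post, without extension.
EXAMPLES = [
    "Author Harris - Kingdom come",
    "Author Harris - Kingdom come (v1.0)",
    "Author Harris - Kingdom series - The Very Magical Kingdom (v1.0)",
    "Author Harris - Kingdom series 14.5 - The Very Magical Kingdom (v1.0)",
    "Author Harris - [Kingdom series 14.5] - The Very Magical Kingdom (v1.0)",
    "Author Harris - [Kingdom series 14.5] [another useless string]- The Very Magical Kingdom (v1.0)",
]


def parse(name):
    """Hypothetical helper: return the named groups that captured text."""
    match = PATTERN.search(name)
    if match is None:
        return {}
    # Groups that did not take part in the match are None; skip them.
    return {k: v.strip() for k, v in match.groupdict().items() if v and v.strip()}


if __name__ == "__main__":
    for name in EXAMPLES:
        print(name)
        for field, value in parse(name).items():
            print(f"    {field}: {value}")
```

Note that the raw captures for author and series include a trailing space (the character classes stop only at the dash or digit), which is why the helper strips them.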