05-01-2009, 09:42 AM | #1 |
Groupie
Posts: 193
Karma: 1032826
Join Date: Mar 2008
Location: Miami, FL, USA
Device: iPhone 4, iPad 2
|
Need help with metadata by filename
My books are named as follows:
AuthorFirst AuthorLast - [Series_Name Index] - Book Title.EXT
Example: Alex Archer - [Rogue Angel 03] - The Spider Stone.mobi
I have tried but have been unable to come up with an expression that will import the series name and index when the book is imported. Can someone here suggest a possible expression to do this job? Thank you in advance. Art |
05-01-2009, 12:54 PM | #2 |
creator of calibre
Posts: 44,428
Karma: 24044628
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9]+)\] - (?P<title>.+)
|
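A quick way to check the expression above outside calibre is to run it through Python's `re` module. This is only a sketch: the helper name `parse` is made up for illustration, and it assumes the file extension has already been stripped before matching.

```python
import re

# The expression from the post above, compiled once.
PATTERN = re.compile(
    r"(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9]+)\] - (?P<title>.+)"
)

def parse(stem):
    """Return the named groups as a dict, or None if the name does not match.

    `stem` is the filename without its extension (an assumption of this sketch).
    """
    m = PATTERN.match(stem)
    return m.groupdict() if m else None

result = parse("Alex Archer - [Rogue Angel 03] - The Spider Stone")
print(result)
```

The series index comes back as the string `03`; converting it to a number is left to whatever consumes the result.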
05-11-2009, 12:07 AM | #3 |
Groupie
Posts: 193
Karma: 1032826
Join Date: Mar 2008
Location: Miami, FL, USA
Device: iPhone 4, iPad 2
|
|
08-12-2009, 08:32 PM | #4 |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
I have a similar need. I have filenames that have multiple hyphens. I want everything after the first hyphen to be considered the title. When I use this string:
(?P<author>.+) - (?P<title>[^_]+)
everything after the rightmost hyphen is considered the title. I need it to be the leftmost. Thanks. Bob |
08-12-2009, 10:26 PM | #5 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
The regular expression kovid posted should work fine, too. You just need to enclose the series info and one of the hyphens in an optional group, ( )?
|
08-12-2009, 10:39 PM | #6 |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
|
08-12-2009, 11:28 PM | #7 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Either
(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)
or
(?P<author>.+?)( - \[(?P<series>.+?) (?P<series_index>[0-9]+)\])? - (?P<title>.+)
would work. If you don't have any filenames with series information, then the following might be simpler:
(?P<author>.+?) - (?P<title>[^_]+) |
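To see how the optional series group behaves, here is a small Python sketch using the first expression. The function name `parse_optional_series` is hypothetical, and the filenames are assumed to have had their extensions removed.

```python
import re

# Same expression as the first variant, split across lines for readability.
OPTIONAL_SERIES = re.compile(
    r"(?P<author>.+?) - "
    r"(\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?"  # whole series part is optional
    r"(?P<title>.+)"
)

def parse_optional_series(stem):
    """Return the named groups; series fields are None when no series is present."""
    m = OPTIONAL_SERIES.match(stem)
    return m.groupdict() if m else None

with_series = parse_optional_series("Alex Archer - [Rogue Angel 03] - The Spider Stone")
without_series = parse_optional_series("Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt")
print(with_series)
print(without_series)
```

When the bracketed part is missing, the optional group simply does not participate in the match, so everything after the first " - " ends up in the title.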
08-13-2009, 12:11 PM | #8 | |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
Quote:
Maybe I'm not making myself clear. This should be a very simple string operation. Except, apparently, in Python. I want everything to the left of the leftmost hyphen to be the author. Everything to the right of the leftmost hyphen, including other hyphens, is the title. For example, if the filename is
Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt.pdf
the author is
Aaaaaaa, Bbbbb
the title is
Qqqqqqq Rrrrr - Sssssssss Ttttttttt
Thanks again, Bob |
|
08-13-2009, 12:21 PM | #9 |
creator of calibre
Posts: 44,428
Karma: 24044628
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
(?P<author>[^-]+?) - (?P<title>.+) |
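For anyone who wants to try the leftmost-hyphen split outside calibre, the sketch below runs it in plain Python, where named groups are spelled `(?P<name>...)`. It also shows the same split done with `str.partition`, purely for comparison; calibre itself takes a regular expression, and both helper names here are made up for illustration.

```python
import re

# Author is everything up to the first " - "; [^-] keeps it from crossing a hyphen.
LEFTMOST = re.compile(r"(?P<author>[^-]+?) - (?P<title>.+)")

def split_with_regex(stem):
    """Split on the leftmost ' - ' using the regular expression."""
    m = LEFTMOST.match(stem)
    return (m.group("author"), m.group("title")) if m else None

def split_with_partition(stem):
    """The same split as a plain string operation, for comparison."""
    author, sep, title = stem.partition(" - ")
    return (author, title) if sep else None

stem = "Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt"
print(split_with_regex(stem))
print(split_with_partition(stem))
```

One caveat with the `[^-]` form: an author name that itself contains a hyphen cannot be matched from the start of the filename, whereas the `.+?` form does not have that restriction.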
08-13-2009, 12:23 PM | #10 |
frumious Bandersnatch
Posts: 7,536
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I think the problem is this (from the regular expressions reference):
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
So, just use:
(?P<author>[^_]+?) - (?P<title>.+) |
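The greedy versus non-greedy difference quoted above can be reproduced in a few lines of Python. The second half applies the same idea to a filename with two hyphens; the simplified author/title patterns are illustrative only.

```python
import re

html = "<H1>title</H1>"
greedy = re.match(r"<.*>", html).group(0)   # grabs as much as possible
lazy = re.match(r"<.*?>", html).group(0)    # grabs as little as possible

stem = "Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt"
# Greedy author: .+ runs to the end, then backs off to the LAST " - ".
greedy_author = re.match(r"(?P<author>.+) - (?P<title>.+)", stem).group("author")
# Lazy author: .+? stops at the FIRST " - ".
lazy_author = re.match(r"(?P<author>.+?) - (?P<title>.+)", stem).group("author")

print(greedy)
print(lazy)
print(greedy_author)
print(lazy_author)
```

With the greedy author group the split happens at the rightmost hyphen, which is the behaviour described earlier in the thread; the lazy form splits at the leftmost one.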
08-13-2009, 12:45 PM | #11 | |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
Quote:
(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)
did work after all. I must've missed a character when I copied and pasted. Thank you. However, the expression
(?P<author>[^_]+?) - (?P<title>.+)
does not work. It still drops the last word of the title. Thanks again. You guys rule. |
|
08-13-2009, 04:29 PM | #12 | |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
Quote:
(?P<author>.+?) - (\[\] - )?(?P<title>.+)
and now it seems to work exactly the way I wanted it. One thing I've noticed is that an expression may work when you test it in the Preferences/Advanced dialog, but work differently in real life. The expression above that I thought worked perfectly only worked in test mode. It still garbled some of my file names when I added files to the library. I would say there's a bug somewhere. |
|
08-14-2009, 12:19 AM | #13 |
Wizard
Posts: 1,763
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
|
I used BookSorter from here:
http://iterati.org/ebookTools/BookSorter/Default.aspx
to rename all my files as "author - series 00 - Title". It made calibre more accurate, as it removed all the unwanted details from the filenames. |
08-14-2009, 05:38 AM | #14 | |
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
|
Quote:
|
|
08-14-2009, 07:06 AM | #15 |
Connoisseur
Posts: 57
Karma: 122
Join Date: Jul 2008
Device: CyBook Gen3, Sony PRS-600
|
Python is hardly arcane, and these recipes are not exactly Python anyway; they are regular expressions, which are basically common across many programming languages (PHP and Perl spring to mind).
Mark |