08-28-2009, 08:11 AM | #16 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2009
Device: iPhone
|
Maybe I'm going about this the wrong way?
What I'm having trouble with are the file names that come out when I use the conversion tool. I'm trying to find a way to change the naming convention.
For example: File Brazen.pdf comes out as Brazen - Unknown.epub. Sometimes the author's name is filled in instead of Unknown. I'd like to eliminate the author name altogether, along with the hyphen and space, so it's just Brazen.epub.
Lori |
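If calibre itself can't be told to leave the author out of the output name, one workaround is to rename the converted files afterwards. This is a minimal Python sketch. It assumes the unwanted part is always a final " - Author" (or " - Unknown") just before the extension, and that the author name contains no hyphen. `strip_author` and `rename_all` are hypothetical helpers, not calibre features:

```python
import re
from pathlib import Path

# Hypothetical pattern: everything before the LAST " - <author>" chunk is the title.
# Assumes the author part contains no "-" (e.g. "Jean-Paul ..." will not match).
AUTHOR_SUFFIX = re.compile(r"^(?P<title>.+?) - [^-]+$")


def strip_author(filename: str) -> str:
    """Turn 'Brazen - Unknown.epub' into 'Brazen.epub'; leave other names alone."""
    path = Path(filename)
    match = AUTHOR_SUFFIX.match(path.stem)
    if match is None:
        return filename  # no " - Author" suffix to remove
    return match.group("title") + path.suffix


def rename_all(folder: str) -> None:
    """Rename every .epub in `folder` in place, skipping names that would collide."""
    for path in Path(folder).glob("*.epub"):
        target = path.with_name(strip_author(path.name))
        if target != path and not target.exists():
            path.rename(target)
```

You would call `rename_all()` on the conversion output folder. Names that don't end in " - Something" are left as they are.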
08-29-2009, 04:27 PM | #17 |
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
|
Just to let you know, I might have found something that might help you too.
Googling for some help, I found two programs that really helped me (YMMV):
Regex Coach: http://weitz.de/regex-coach/
Kodos: http://kodos.sourceforge.net/
I found Regex Coach the better one, with more possibilities and better info on what is happening. Regards, Joop |
11-13-2009, 11:01 AM | #18 |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2009
Device: none
|
for people who are coming to this topic, here is another regex helper app. it's web-based and free, though you'll have to translate from perl regex to python regex once you're done. it's mostly just some syntax stuff; for instance, perl has no (?P<tag>) captures, just '()'. i use this site often when i'm stuck on a particular regex problem.
http://www.gskinner.com/RegExr/ it's flash-based, but the tooltip debug info and highlighting really make it worthwhile, imo. |
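Most of that translation comes down to group syntax. The Python sketch below uses an assumed sample name. It shows that a plain `( )` group from another tool becomes a Python named group when you add `?P<name>`, and that named groups still get numbers as well:

```python
import re

sample = "Nora Roberts - True Betrayals"

# Plain numbered groups, the kind most regex testers show
plain = re.match(r"(.+?) - (.+)", sample)

# The same pattern with Python's named-group syntax (?P<name>...)
named = re.match(r"(?P<author>.+?) - (?P<title>.+)", sample)

print(plain.group(1), "|", plain.group(2))  # Nora Roberts | True Betrayals
print(named.group("author"))                # Nora Roberts
print(named.group(1))                       # named groups keep their numbers too
print(named.groupdict())                    # {'author': 'Nora Roberts', 'title': 'True Betrayals'}
```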
12-16-2009, 03:50 AM | #19 |
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2009
Device: stanza
|
so i'm having a problem getting calibre to read the titles & authors for my books. I realize that I have them saved in a different format than what it's asking for, but is there any way to get calibre to read my particular format without actually having to rename the 400 ebooks I have on my computer?
Below are some samples of how my books are saved:
Nora Roberts_True Betrayals.lit
Nora Roberts_TSI 01. Dance Upon the Air.lit
Is there a regular expression I can use that will allow calibre to read these titles? Thanks in advance! |
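Here is one possible pattern, offered only as a sketch. It assumes the author is separated from the rest of the name by the first underscore. It also assumes that a series, when present, looks like `TSI 01. ` (name, space, number, period, space). The sketch tests it with Python's `re` after stripping the extension with `os.path.splitext`. This thread doesn't establish whether calibre's own filename matcher sees the extension. `PATTERN` and `parse` are illustrative names:

```python
import os
import re

# Assumed layout: "Author_Title" or "Author_Series NN. Title"
PATTERN = re.compile(
    r"^(?P<author>[^_]+)_"                              # author up to the underscore
    r"(?:(?P<series>[^_]+?) (?P<series_index>\d+)\. )?"  # optional "TSI 01. "
    r"(?P<title>.+)$"                                   # everything else is the title
)


def parse(filename: str) -> dict:
    stem = os.path.splitext(os.path.basename(filename))[0]  # drop ".lit"
    match = PATTERN.match(stem)
    return match.groupdict() if match else {}


print(parse("Nora Roberts_True Betrayals.lit"))
print(parse("Nora Roberts_TSI 01. Dance Upon the Air.lit"))
```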
12-16-2009, 04:39 AM | #20 |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
With .LIT files Calibre will normally take this information from the metadata stored inside the .LIT file at the time you add the files to the Calibre library. Is there any particular reason that you want it taken from the filename instead?
|
12-16-2009, 05:14 PM | #21 |
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2009
Device: stanza
|
For some reason, when it takes it from the metadata, it is completely off. Some of them aren't even the right titles, and the author oftentimes becomes "unknown". So I was hoping to pull it from the file name, so that it would look the way I want it to look.
|
12-30-2009, 09:10 PM | #22 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Dec 2009
Device: PRS-505; Galaxy Tab 7
|
Quote:
It allows you to rename files using a flexible Match expression and a flexible Replacement expression, so swapping fields is easy. Since all my files are formatted as either Surname, Firstname - whatever.txt or Firstname Lastname - whatever.txt, you can build off the comma to match the files you want swapped. I use this regex:
Code:
^(\w+)[ ]*,[ ]*([^-]+?)[ ]*-[ ]*(.*)
In case that's all just gibberish, here's a visual example: http://i97.photobucket.com/albums/l2...Authorswap.jpg
From there, the expressions in this thread work perfectly to import into Calibre.
Last edited by DedTV; 12-31-2009 at 11:25 AM. |
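You can try DedTV's match expression outside the renaming tool with Python's `re.sub`. The post doesn't give the Replacement expression, so the `\2 \1 - \3` below is an assumed one. It turns `Surname, Firstname - whatever` into `Firstname Surname - whatever`. Note that `^(\w+)` only matches a one-word surname, so a name like `Le Guin, Ursula` would be left unchanged:

```python
import re

# DedTV's match expression: 1 = surname, 2 = first name(s), 3 = rest of the name
SWAP = re.compile(r"^(\w+)[ ]*,[ ]*([^-]+?)[ ]*-[ ]*(.*)")


def swap_name(filename: str) -> str:
    # Assumed replacement "\2 \1 - \3"; names without "Surname," at the
    # start don't match, so re.sub returns them unchanged.
    return SWAP.sub(r"\2 \1 - \3", filename)


print(swap_name("Tolkien, J R R - The Hobbit.txt"))  # J R R Tolkien - The Hobbit.txt
print(swap_name("John Smith - Some Book.txt"))       # unchanged
```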
|
01-01-2010, 02:40 AM | #23 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Dec 2009
Device: PRS700
|
Speaking of regex help... maybe some of the experts on here can help this beginner out.
My files are formatted in the following ways:
option A -> Author ~ Title
option B -> Author ~ Title - [Series 00]
option C -> Author ~ Title - [Collection 00 - Series 00]
I have finally got the following regex to work correctly for option B and option C:
(?P<author>.+?) ~ (?P<title>.+?)(\s-\s)\[(?P<series>.*)\s(?P<series_index>[0-9.]*)\]?
When I run the regex on the first format (Author ~ Title), I get the following, which is NOT what I want:
Title = Author ~ Title
Author = No Match
Series = No Match
I want it to separate the author and title even when there isn't a series.
If I run it with (Author ~ Title - [Series]), I get the following, which is what I want:
Title = Title
Author = Author
Series = Series [Series Index]
If I run it with (Author ~ Title - [Collection - Series]), I get the following, which is what I want:
Title = Title
Author = Author
Series = Collection - Series [Series Index]
However, it won't recognize option A. How can I get it to read the author and title correctly if there is NO series?
Last edited by mezme; 01-01-2010 at 02:46 AM. |
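Here is a sketch of one way to handle option A as well. It is an assumption, not the fix mezme eventually used. Wrap the whole ` - [Series 00]` tail in an optional non-capturing group `(?: ... )?`, and anchor the pattern with `$`. The anchor forces the lazy `title` group to run to the end of the name when there's no series. The sketch uses Python's `re` and the sample names from the post:

```python
import re

# Assumed variant of mezme's pattern: the " - [Series NN]" tail is optional,
# and "$" forces the lazy title to reach the end when there is no series.
PATTERN = re.compile(
    r"(?P<author>.+?) ~ (?P<title>.+?)"
    r"(?: - \[(?P<series>.+?) (?P<series_index>[0-9.]+)\])?$"
)

for name in (
    "Author ~ Title",
    "Author ~ Title - [Series 00]",
    "Author ~ Title - [Collection 00 - Series 00]",
):
    print(PATTERN.match(name).groupdict())
```

Without the `$`, `re.match` would accept a one-letter title ("T") for option A, because the lazy group stops as soon as the rest of the pattern is satisfied.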
01-04-2010, 07:48 PM | #24 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
Ok, I'm not where I can get to my regex software, but as near as I can tell from the expression...
It seems that you've been greedy (the + operator) without giving back at the end. So effectively you use the entire filename for the first expression test (Author), but it fails because it is not supposed to have a tilde in it. You need to make use of the "give back" operator to release portions of the filename to limit it to just the Author portion. Unfortunately I cannot remember the operator at the moment, nor the "phrasing" for doing so. |
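The "give back" behaviour described here is backtracking. The usual way to control it is to choose between the greedy quantifier `+` and the lazy (non-greedy) quantifier `+?`. Below is a quick Python comparison on an assumed sample name. For what it's worth, the `author` group in the pattern from post #23 is already lazy. Running that pattern shows that option A fails because the `(\s-\s)\[` part is required, not because of greediness:

```python
import re

name = "Author ~ Title ~ Extra"

# Greedy: .+ grabs everything, then backtracks only as far as it must
greedy = re.match(r"(?P<author>.+) ~ ", name).group("author")
# Lazy: .+? grabs as little as possible and expands only when it must
lazy = re.match(r"(?P<author>.+?) ~ ", name).group("author")

# The pattern from post #23, tried on an option A name with no series
MEZME = re.compile(
    r"(?P<author>.+?) ~ (?P<title>.+?)(\s-\s)\[(?P<series>.*)\s(?P<series_index>[0-9.]*)\]?"
)
option_a = MEZME.match("Author ~ Title")  # the " - [" part is mandatory

print(greedy)    # Author ~ Title
print(lazy)      # Author
print(option_a)  # None
```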
01-08-2010, 01:56 AM | #25 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Dec 2009
Device: PRS700
|
Thanks! After playing with it, I finally got it working:
(?P<author>[^-]+)\x20-\x20(?P<title>[^-]+)(?:-\s+\[(?P<series>[^.]+?)(?P<series_index>\d+)?\])?
It successfully parses the following formats:
author - title
author - title - [series]
author - title - [series Number]
author - title - [collection - series Number]
author - title - [collection number - series Number] |
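A small Python harness can check that pattern against each listed format with `re.match`. One thing it reveals is that `[^-]+` and `[^.]+?` keep the space before the next " - " or number, so `title` and `series` come back with a trailing space. The sketch strips that space. This thread doesn't cover whether calibre trims it on import. The sample names are placeholders:

```python
import re

# mezme's final pattern, split over two string literals only for width
PATTERN = re.compile(
    r"(?P<author>[^-]+)\x20-\x20(?P<title>[^-]+)"
    r"(?:-\s+\[(?P<series>[^.]+?)(?P<series_index>\d+)?\])?"
)


def parse(name: str) -> dict:
    match = PATTERN.match(name)
    if match is None:
        return {}
    # [^-]+ and [^.]+? keep the space before " - " or the number; trim it
    return {key: value.strip() if value else value
            for key, value in match.groupdict().items()}


for name in (
    "Author - Title",
    "Author - Title - [Series]",
    "Author - Title - [Series 01]",
    "Author - Title - [Collection - Series 02]",
    "Author - Title - [Collection 1 - Series 02]",
):
    print(parse(name))
```

Because both `author` and `title` use `[^-]+`, a hyphen inside either one (e.g. a double-barrelled surname) will break the match.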
01-10-2010, 11:04 PM | #26 |
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
|
Does anyone have a regex similar to this one:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
But... (there's always a but) I need it to remove the comma between the author's first and last names, as in this filename:
Last, First - Series 01 - Title.lit
which gets imported like this:
Author: First Last,
THANKS in advance,
Tom |
01-10-2010, 11:29 PM | #27 |
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
|
This regex does the same thing, as far as I can figure:
(?P<author>.+?) - ((?P<series>.+?) (?P<series_index>[0-9]+) - )?(?P<title>.+)
They both detect a series name and index and read them properly whether they're there or not. But it still has the comma problem in the author's name. |
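Running both patterns side by side in Python on the sample name shows that they agree. It also shows that both capture the author as `Last, First`, with the comma still inside. So the `First Last,` seen after import presumably comes from a later name-swapping step rather than from the regex. `groups` is a hypothetical helper that trims stray spaces:

```python
import re

# Tom's first pattern, split over string literals only for width
TOM_LONG = re.compile(
    r"^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?"
    r"((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?"
    r"(?P<title>[^\-_0-9]+)"
)
# Tom's shorter pattern
TOM_SHORT = re.compile(
    r"(?P<author>.+?) - ((?P<series>.+?) (?P<series_index>[0-9]+) - )?(?P<title>.+)"
)


def groups(pattern, stem):
    match = pattern.match(stem)
    if match is None:
        return None
    # trim stray spaces; turn unmatched (None) groups into ""
    return {key: (match.group(key) or "").strip()
            for key in ("author", "series", "series_index", "title")}


for stem in ("Last, First - Series 01 - Title", "Last, First - Title"):
    print(groups(TOM_LONG, stem))
    print(groups(TOM_SHORT, stem))
```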
01-18-2010, 12:31 PM | #28 |
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
|
Anyone?
|
01-18-2010, 11:56 PM | #29 | |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
My first take is that the problem comes down to the fact that the (?P<author>) group has to include the comma, because it has to find the beginning and the end of the name. If there were separate groups for First and Last, you could exclude the comma.
Even if you put the comma in its own set via parentheses, there's no obvious way to replace it with nothing or filter it out of the match. Now, I'm no expert; perhaps there's a tricky way to exclude a subset of a match from what gets returned. I just took a look at the Calibre regex help, and I thought maybe this would do it:
Quote:
Code:
^((?P<author>([^\-_0-9]+)(?:,)([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
Now I realize that what it means is that in a more normal regex, the contents of such a set are not placed into the numbered variables for reuse later (i.e. \1 \2 \3 or $1 $2 $3, depending on your regex flavor). But because that comma is contained within a larger set, the larger set is still returned to the label <author>.
BTW, your original regex found the Author as "Last, First", not "First Last,", in the test window, so I cannot comment on its effectiveness.
m a r
Last edited by rogue_ronin; 01-18-2010 at 11:59 PM. |
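This Python sketch illustrates the point about `(?:,)`. A non-capturing group only stops the comma from getting its own number; it doesn't remove the comma from the enclosing `author` match. The sketch then shows the alternative mentioned earlier in this post, separate groups for the two halves. The `last`/`first` group names and the rebuild step are hypothetical post-processing in plain Python. Nothing here shows that calibre's import regex supports them:

```python
import re

# (?:,) is non-capturing, but the comma is still inside the text
# matched by the enclosing author group.
wrapped = re.match(r"(?P<author>([^,\-]+)(?:,)([^\-]+?))\s*-", "Last, First - Title")
print(wrapped.group("author"))  # Last, First

# Hypothetical alternative: capture surname and first name separately...
SPLIT = re.compile(
    r"^(?P<last>[^,\-]+),\s*(?P<first>[^\-]+?)\s*-\s*"
    r"(?:(?P<series>[^\-]+?)\s*(?P<series_index>[0-9.]+)\s*-\s*)?"
    r"(?P<title>[^\-]+)$"
)


def parse(stem: str):
    match = SPLIT.match(stem)
    if match is None:
        return None  # no "Last, First - " prefix
    fields = match.groupdict()
    # ...then rebuild "First Last" outside the regex, with no comma
    fields["author"] = f"{fields.pop('first')} {fields.pop('last')}"
    return fields


print(parse("Last, First - Series 01 - Title"))
```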
|
01-19-2010, 01:51 PM | #30 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
I think it's an error to keep the comma in the last name. There's no way to get rid of it that I can see short of fixing the code, so I searched the code. Lines 135 to 144 of meta.py have:
Code:
if prefs['swap_author_names'] and mi.authors:
    def swap(a):
        parts = a.split()
        if len(parts) > 1:
            t = parts[-1]
            parts = parts[:-1]
            parts.insert(0, t)
        return ' '.join(parts)
    mi.authors = [swap(x) for x in mi.authors]
Dropping the trailing comma from the last element of parts[:-1], if there is one, will work for simple cases. However, this area of the code could probably be improved even more. For example, the code above will change "Tolkien, J R R" to "R Tolkien, J R".
I have the bare minimum of skill to improve this code, but I suspect someone who is more familiar with Python, ebooks and the philosophy of calibre could do better. For example, do you want to split the name at the comma instead of at the last word, or are commas in author names common? Do you want to do something special with "John T Smith, Jr." or "Smith, John T, Jr.", or add more checkbox options, or what? Anyone who wants to compile a list of the various author name formats, single and multiple, that might be encountered, with comments on what the code should do in each case, could help whoever wants to improve this code. I'd suggest that anyone who wants this improved should open a ticket and then come back here to post the ticket number and their comments on exactly how the improvement should work.
BTW, if anyone wants a simple fix to their own code, adding the two lines below will do it:
Code:
if prefs['swap_author_names'] and mi.authors:
    def swap(a):
        parts = a.split()
        if len(parts) > 1:
            t = parts[-1]
            parts = parts[:-1]
            if parts[-1].endswith(','):
                parts[-1] = parts[-1][:-1]
            parts.insert(0, t)
        return ' '.join(parts)
    mi.authors = [swap(x) for x in mi.authors]
edit: An even simpler fix is to change the split() in the original code to split(','). This splits on the comma (it assumes that the first name and last name are separated by a comma, as mine all are). This correctly swaps a name like "Tolkien, J R R."
Last edited by Starson17; 01-19-2010 at 04:47 PM. |
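The three swap variants discussed in this post can be compared directly. This is a standalone Python sketch: the function names are made up, and `prefs` and `mi` from meta.py are left out. Running it turns up one detail: `split(',')` on its own returns `' J R R Tolkien'` with a leading space, so this version strips each piece. Names like `Smith, John T, Jr.` still come out wrong, as the post anticipates:

```python
def swap_original(a):
    """Swap as in the meta.py excerpt: move the last word to the front."""
    parts = a.split()
    if len(parts) > 1:
        t = parts[-1]
        parts = parts[:-1]
        parts.insert(0, t)
    return ' '.join(parts)


def swap_strip_comma(a):
    """The two-line fix: also drop a trailing comma from the remaining last word."""
    parts = a.split()
    if len(parts) > 1:
        t = parts[-1]
        parts = parts[:-1]
        if parts[-1].endswith(','):
            parts[-1] = parts[-1][:-1]
        parts.insert(0, t)
    return ' '.join(parts)


def swap_on_comma(a):
    """The split(',') variant, with each piece stripped of surrounding spaces."""
    parts = [p.strip() for p in a.split(',')]
    if len(parts) > 1:
        t = parts[-1]
        parts = parts[:-1]
        parts.insert(0, t)
    return ' '.join(parts)


for name in ("Last, First", "Tolkien, J R R", "Smith, John T, Jr."):
    print(swap_original(name), "|", swap_strip_comma(name), "|", swap_on_comma(name))
```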
||
Tags |
regex, regular expressions |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Regular Expression Help | smartmart | Calibre | 5 | 10-17-2010 06:19 AM |
Need Help Creating a Regular Expression | Worm | Calibre | 9 | 08-18-2010 02:20 PM |
Regular Expression Help Needed | dloyer4 | Calibre | 1 | 07-25-2010 11:37 PM |
Help with the regular expression | Dysonco | Calibre | 9 | 03-22-2010 11:45 PM |
I don't know how to use wilcards and regular expression.... | superanima | Sigil | 4 | 02-21-2010 10:42 AM |