Old 08-28-2009, 08:11 AM   #16
lorijames
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2009
Device: iPhone
Maybe I'm going about this the wrong way?

What I'm having trouble with are the file names that come out when I use the conversion tool. I'm trying to find a way to change the naming convention.

For example: File Brazen.pdf comes out as Brazen - Unknown.epub
Sometimes the author's name will populate.

I'd like to eliminate the author name altogether along with the hyphen and space so it's Brazen.epub

Lori
Old 08-29-2009, 04:27 PM   #17
JvdW
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
Just to let you know that I might have found something that might help you too.
Googling for some help I found two programs that really helped me, YMMV:
Regex Coach : http://weitz.de/regex-coach/
Kodos : http://kodos.sourceforge.net/
I found Regex Coach the better of the two, with more possibilities and better info on what is happening.

Regards,

Joop
Old 11-13-2009, 11:01 AM   #18
tetujin
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2009
Device: none
for people who are coming to this topic, here is another regex helper app -- web based and free, though you'll have to translate from perl regex to python regex once you're done; mostly just some syntax stuff, for instance - perl has no (?P<tag>) captures, just '()'. i use this site often to help me when i'm stuck on a particular regex problem.

http://www.gskinner.com/RegExr/

it's flash-based, but very worthwhile. the tooltip debug info and highlighting really make it worthwhile, imo.
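The difference mentioned above, numbered '()' captures versus Python's (?P<tag>) captures, can be sketched as follows. The sample filename is made up for illustration; both patterns match the same text and differ only in how the pieces are read back.

```python
import re

SAMPLE = "Nora Roberts - True Betrayals"  # hypothetical filename, extension removed

# Plain '()' captures: results are read back by position.
numbered = re.match(r"(.+?) - (.+)", SAMPLE)

# Python-style named captures, the form used in the patterns in this thread.
named = re.match(r"(?P<author>.+?) - (?P<title>.+)", SAMPLE)

print(numbered.group(1), "|", numbered.group(2))
print(named.group("author"), "|", named.group("title"))
```

When translating a pattern built in another tool, the usual change is to give each '()' that should become a field a ?P<name> prefix.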
Old 12-16-2009, 03:50 AM   #19
ooayeloo
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2009
Device: stanza
so i'm having a problem getting calibre to read the titles & author for my books. I realized that I have it saved in a different format than what it's asking for, but is there any way to get calibre to read my particular format without me actually having to change the 400 ebooks i have on my computer?

below are some samples on how my books are saved.

Nora Roberts_True Betrayals.lit
or
Nora Roberts_TSI 01. Dance Upon the Air.lit

is there a regular expression i can use that will allow calibre to read these titles?

thanks in advance!
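One possible pattern for the two sample names, as a sketch only. It assumes that the underscore always separates author from the rest, that "TSI 01." is a series abbreviation followed by an index (a guess, not something stated in the post), and that the .lit extension has already been removed before matching.

```python
import re

# Assumptions: author ends at the first underscore; an optional
# "<series> <number>. " prefix may come before the title.
PATTERN = re.compile(
    r"^(?P<author>[^_]+)_"
    r"(?:(?P<series>.+?) (?P<series_index>\d+)\. )?"
    r"(?P<title>.+)$"
)

for name in ("Nora Roberts_True Betrayals",
             "Nora Roberts_TSI 01. Dance Upon the Air"):
    print(PATTERN.match(name).groupdict())
```

The series part is wrapped in an optional non-capturing group, so names without a series still yield an author and a title.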
Old 12-16-2009, 04:39 AM   #20
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
With .LIT files Calibre will normally take this information from the metadata stored inside the .LIT file at the time you add the files to the Calibre library. Is there any particular reason that you want it taken from the filename instead?
Old 12-16-2009, 05:14 PM   #21
ooayeloo
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2009
Device: stanza
For some reason, when it takes it from the metadata, it is completely off. Some of them aren't even the right titles, or the author oftentimes becomes "unknown". So I was hoping to pull it from the file name, so that it would look the way I want it to look.
Old 12-30-2009, 09:10 PM   #22
DedTV
Enthusiast
Posts: 28
Karma: 10
Join Date: Dec 2009
Device: PRS-505; Galaxy Tab 7
Quote:
Originally Posted by sircastor
Unless I'm missing something, I would skip trying to get your expression to handle different orders.
I use Bulk Rename Utility (Freeware) to rename the files so they're all consistent, which Calibre can then handle easily.
It allows you to rename files using a flexible Match expression and a flexible Replacement expression, so swapping fields is easy. Since all my files are formatted as either Surname, Firstname - whatever.txt or Firstname Lastname - whatever.txt, you can build off the comma to match the files you want swapped. I use the regex:
Code:
^(\w+)[ ]*,[ ]*([^-]+?)[ ]*-[ ]*(.*)
with Replace set to \2 \1 - \3, so it selects only the files where the author has a comma in it for renaming and leaves the ones already formatted correctly alone, giving a consistent naming scheme.

In case that's all just gibberish, here's a visual example:
http://i97.photobucket.com/albums/l2...Authorswap.jpg

From there the expressions in this thread work perfectly to import into Calibre.
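For anyone without Bulk Rename Utility, the same match and replace pair can be tried in Python's re module. This is a sketch: the normalize() helper and the sample filenames are made up for illustration, and the two tools' regex flavors may differ in details.

```python
import re

# The Match and Replace expressions from the post, unchanged.
MATCH = re.compile(r"^(\w+)[ ]*,[ ]*([^-]+?)[ ]*-[ ]*(.*)")
REPLACE = r"\2 \1 - \3"

def normalize(filename):
    """Turn 'Surname, Firstname - x' into 'Firstname Surname - x'.

    Names without a comma in the author part do not match and are
    returned unchanged.
    """
    return MATCH.sub(REPLACE, filename, count=1)

print(normalize("Roberts, Nora - True Betrayals.txt"))
print(normalize("Nora Roberts - True Betrayals.txt"))
```

Because the surname is matched with \w+, a surname containing a hyphen or a space will not match and the file is left as it was.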

Last edited by DedTV; 12-31-2009 at 11:25 AM.
Old 01-01-2010, 02:40 AM   #23
mezme
Connoisseur
Posts: 59
Karma: 10
Join Date: Dec 2009
Device: PRS700
Speaking of regex help... maybe some of the experts on here can help this beginner out

My files are formatted in the following ways:
option A -> Author ~ Title
option B -> Author ~ Title - [Series 00]
option C -> Author ~ Title - [Collection 00 - Series 00]

I have finally got the following regex to work correctly for option B and option C
(?P<author>.+?) ~ (?P<title>.+?)(\s-\s)\[(?P<series>.*)\s(?P<series_index>[0-9.]*)\]?

When I run the regex on option A (Author ~ Title), I get the following, which is what I DO NOT want... I want it to separate the author and title even when there isn't a series:
Title = Author ~ Title
Author = No Match
Series = No Match

If I run it with (Author ~ Title - [Series]) I get the following which is what I want:
Title = Title
Author = Author
Series = Series [Series Index]

If I run it with (Author ~ Title - [Collection - Series]) I get the following which is what I want:
Title = Title
Author = Author
Series = Collection - Series [Series Index]

However, it won't recognize Option A... how can I get it to read the author and title correctly if there is NO series?
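One way to accept option A is to make the whole series part optional. The sketch below is a variation on the pattern above, not a tested Calibre setting: it wraps " - [series index]" in an optional non-capturing group, anchors the pattern with ^ and $, requires at least one digit in the index, and assumes the file extension has already been removed.

```python
import re

PATTERN = re.compile(
    r"^(?P<author>.+?) ~ (?P<title>.+?)"
    # Optional " - [Series 00]" tail; skipped entirely for option A.
    r"(?:\s-\s\[(?P<series>.*)\s(?P<series_index>[0-9.]+)\])?$"
)

def parse(name):
    """Return the named groups as a dict, or None if nothing matched."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

for name in ("Author ~ Title",
             "Author ~ Title - [Series 01]",
             "Author ~ Title - [Collection 02 - Series 01]"):
    print(parse(name))
```

The trailing $ matters here: without it the lazy title group would stop after a single character.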

Last edited by mezme; 01-01-2010 at 02:46 AM.
Old 01-04-2010, 07:48 PM   #24
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
Ok, I'm not where I can get to my regex software, but as near as I can tell from the expression...

It seems that you've been greedy (the + operator) without giving back at the end. So effectively you use the entire filename for the first expression test (Author), but it fails because it is not supposed to have a tilde in it.

You need to make use of the "give back" operator to release portions of the filename to limit it to just the Author portion. Unfortunately I cannot remember the operator at the moment, nor the "phrasing" for doing so.
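In Python's regex syntax the "give back" behaviour is usually written by adding ? after the quantifier (.+? instead of .+), which makes it match as little as possible. A small sketch of the difference, using a made-up name with two tildes:

```python
import re

name = "Author ~ Title ~ Extra"  # hypothetical input with two separators

# Greedy: .+ grabs as much as it can, so author runs up to the LAST " ~ ".
greedy = re.match(r"(?P<author>.+) ~ (?P<title>.+)", name)

# Lazy: .+? stops at the FIRST " ~ " that lets the rest of the pattern match.
lazy = re.match(r"(?P<author>.+?) ~ (?P<title>.+)", name)

print(greedy.group("author"), "|", greedy.group("title"))
print(lazy.group("author"), "|", lazy.group("title"))
```

Note that the pattern posted above already uses .+? for the author and title groups, so the lazy modifier alone may not be the whole story for option A.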
Old 01-08-2010, 01:56 AM   #25
mezme
Connoisseur
Posts: 59
Karma: 10
Join Date: Dec 2009
Device: PRS700
Thanks! After playing with it I finally got it working:

(?P<author>[^-]+)\x20-\x20(?P<title>[^-]+)(?:-\s+\[(?P<series>[^.]+?)(?P<series_index>\d+)?\])?

Successfully parses the following formats:
author - title
author - title - [series]
author - title - [series Number]
author - title - [collection - series Number]
author - title - [collection number - series Number]
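The pattern can be checked outside Calibre with a few lines of Python. This is a sketch: the parse() helper and the sample names are illustrative, and the whitespace stripping is added here because the raw groups keep a trailing space (for example "title " before the hyphen).

```python
import re

PATTERN = re.compile(
    r"(?P<author>[^-]+)\x20-\x20(?P<title>[^-]+)"
    r"(?:-\s+\[(?P<series>[^.]+?)(?P<series_index>\d+)?\])?"
)

def parse(name):
    """Match a filename (no extension) and strip whitespace from each field."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return {k: (v.strip() if v else v) for k, v in m.groupdict().items()}

for name in ("author - title",
             "author - title - [series]",
             "author - title - [series 3]",
             "author - title - [collection 1 - series 3]"):
    print(parse(name))
```

Two limits follow from the character classes: an author or title containing a hyphen will be cut short, and a series index with a decimal point (such as 1.5) causes the series part to be skipped because [^.] excludes dots.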
Old 01-10-2010, 11:04 PM   #26
Tom2112
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
Does anyone have a reg ex similar to this one:

^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)


But... (there's always a but) I need it to remove the comma between the author's first and last names.

Such as in this filename:

Last, First - Series 01 - Title.lit

Which gets imported like this:
Author: First Last,

THANKS in advance,
Tom
Old 01-10-2010, 11:29 PM   #27
Tom2112
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
This regex does the same thing, as far as I can figure:

(?P<author>.+?) - ((?P<series>.+?) (?P<series_index>[0-9]+) - )?(?P<title>.+)

They both detect a series name and index and read them properly whether they're there or not. But both still have the comma problem in the author's name.
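A quick check of the shorter pattern in Python, as a sketch with the file extension assumed to be removed already. It shows the series fields filling in when present and coming back empty when absent, and that the comma stays inside the author field either way.

```python
import re

PATTERN = re.compile(
    r"(?P<author>.+?) - ((?P<series>.+?) (?P<series_index>[0-9]+) - )?(?P<title>.+)"
)

with_series = PATTERN.match("Last, First - Series 01 - Title").groupdict()
without_series = PATTERN.match("Last, First - Title").groupdict()

print(with_series)
print(without_series)
```

The regex only decides where the author field starts and ends; removing a character from the middle of a captured field has to happen after the match.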
Old 01-18-2010, 12:31 PM   #28
Tom2112
Tablet eReader
Posts: 45
Karma: 12620
Join Date: Dec 2009
Location: Western PA
Device: Samsung Galaxy Tab 7, iPad, Dell Streak 7, Moto RAZR MAXX
Anyone?
Old 01-18-2010, 11:56 PM   #29
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
My first take is that I think the problem comes down to the fact that the (?P<author>) function has to include the comma because you have to find the beginning and the end of the name -- if there were separate functions for First and Last you could exclude the comma.

Even placing the comma in its own set via parentheses, there's no obvious way to replace it with nothing -- or filter it from the match.

Now, I'm no expert. Perhaps there's a tricky way to exclude from the return a subset of that return.

I just took a look at the Calibre regex help -- I thought maybe this would do it:

Quote:
(?:...)
A non-grouping version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
I tried your more complicated regex in Calibre's test window, modifying it to "not retrieve" the comma:
Code:
^((?P<author>([^\-_0-9]+)(?:,)([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9]+)
but it returned the exact result your original regex did.

Now I realize that what it means is that in a more normal regex, the contents of such a set are not placed into the numbered variables for reuse later (i.e. \1 \2 \3 or $1 $2 $3, depending on your regex flavor). But because that comma is contained within a larger set, it is still returned as part of that larger set under the label <author>.
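That behaviour is easy to reproduce with a cut-down pattern in Python. The sketch below uses a much simpler regex than the one above, purely to isolate the effect: the (?:,) group gets no number of its own, yet the comma still shows up in the text of the enclosing named group.

```python
import re

# Simplified stand-in pattern: a named group wrapping two numbered groups
# with a non-capturing comma between them.
m = re.match(r"(?P<author>(\w+)(?:,)\s*(\w+))", "Last, First")

print(m.group("author"))  # whole span of the named group, comma included
print(m.groups())         # the comma has no group of its own
```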

BTW, your original regex found the Author as "Last, First" not "First Last," in the test window, so I cannot comment on its effectiveness.

m a r

Last edited by rogue_ronin; 01-18-2010 at 11:59 PM.
Old 01-19-2010, 01:51 PM   #30
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by rogue_ronin
My first take is that I think the problem comes down to the fact that the (?P<author>) function has to include the comma because you have to find the beginning and the end of the name -- if there were separate functions for First and Last you could exclude the comma.
I agree - there is no way to not find the comma as part of the author's name.

Quote:
BTW, your original regex found the Author as "Last, First" not "First Last," in the test window, so I cannot comment to its effectiveness.
The reason for the difference is that he has the "Swap author firstname and lastname" option checked.

I think it's an error to keep the comma in the lastname. There's no way to get rid of it that I can see short of fixing the code, so I searched the code:

Lines 135 to 144 of meta.py have:
Code:
            if prefs['swap_author_names'] and mi.authors:
                def swap(a):
                    parts = a.split()
                    if len(parts) > 1:
                        t = parts[-1]
                        parts = parts[:-1]
                        parts.insert(0, t)
                    return ' '.join(parts)
                mi.authors = [swap(x) for x in mi.authors]
I'm a rank beginner in Python, but I can read this: if the swap option is checked, it makes an array of character strings from the author's name using the split() function, then takes the last element of that array (variable "t") and sticks it at the beginning of the array. Presumably, split() leaves the comma attached to the end of the next-to-last element in that array (which becomes the lastname and the last element after swap() runs).

Dropping the last character of the last element of parts[:-1], if it is a comma, will work for simple cases. However, this area of the code could probably be improved even more. For example, the code above will change "Tolkien, J R R" to "R Tolkien, J R"
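The behaviour described above can be reproduced outside calibre by copying swap() into a standalone script. In this sketch the prefs check and mi.authors are left out; only the helper itself is exercised.

```python
def swap(a):
    """Standalone copy of the swap() helper quoted above."""
    parts = a.split()           # split on whitespace; commas stay attached
    if len(parts) > 1:
        t = parts[-1]           # last word
        parts = parts[:-1]
        parts.insert(0, t)      # move it to the front
    return ' '.join(parts)

print(swap("Last, First"))      # trailing comma ends up on the last name
print(swap("Tolkien, J R R"))   # only the final initial is moved
```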

I have the bare minimum of skill to improve this code, but I suspect someone who is more familiar with python, ebooks and the philosophy of calibre could do better. For example, do you want to split the name at the comma, instead of at the last character string, or are commas in author names common? Do you want to do something special with "John T Smith, Jr." or "Smith, John T, Jr." or add more checkbox options or what? Anyone who wants to compile a list of various author name formats, single and multiple that might be encountered and comments on what the code should do in each case could help whoever wants to improve this code.

I'd suggest anyone who wants this improved should add a ticket and get back here to post the ticket number and their comments on exactly how the improvement should work.

BTW, If anyone wants a simple fix to their own code, adding the two lines below will do it:

Code:
            if prefs['swap_author_names'] and mi.authors:
                def swap(a):
                    parts = a.split()
                    if len(parts) > 1:
                        t = parts[-1]
                        parts = parts[:-1]
                        if parts[-1].endswith(','):
                            parts[-1]=parts[-1][:-1]
                        parts.insert(0, t)
                    return ' '.join(parts)
                mi.authors = [swap(x) for x in mi.authors]
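The effect of the two added lines can be checked in isolation. The sketch below copies the modified helper into a standalone function; the name swap_strip_comma is made up here, and the prefs and mi objects from calibre are left out.

```python
def swap_strip_comma(a):
    """Standalone copy of the modified swap() with the comma check."""
    parts = a.split()
    if len(parts) > 1:
        t = parts[-1]
        parts = parts[:-1]
        # The two added lines: drop a trailing comma from the new last word.
        if parts[-1].endswith(','):
            parts[-1] = parts[-1][:-1]
        parts.insert(0, t)
    return ' '.join(parts)

print(swap_strip_comma("Last, First"))
print(swap_strip_comma("Tolkien, J R R"))
```

The two-word case comes out clean, while a name with several initials is still rearranged incorrectly, since only the final word is moved.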
For those who haven't ever played with source code or programming, it's not really that hard. Kovid and python have made it easy. In Windows, you just need to get one program (Bazaar) and run it once to retrieve the source code, then set an environment variable to tell calibre to use it. I do simple fixes like this for special cases.

edit:
An even simpler fix is to change the split() in the original code to split(','). This splits on the comma (assumes that the firstname and lastname are separated by a comma - as mine all are). This correctly swaps a name like "Tolkien, J R R."
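A standalone sketch of the comma-splitting variant follows. The function name swap_on_comma is made up, and the strip() call is an addition not present in the suggestion above: without it, split(',') leaves the space after the comma attached to the first name.

```python
def swap_on_comma(a):
    """swap() variant that splits on the comma instead of on whitespace."""
    # strip() is an addition here to drop the space that follows the comma.
    parts = [p.strip() for p in a.split(',')]
    if len(parts) > 1:
        t = parts[-1]
        parts = parts[:-1]
        parts.insert(0, t)
    return ' '.join(parts)

print(swap_on_comma("Tolkien, J R R."))
print(swap_on_comma("J R R Tolkien"))       # no comma: left unchanged
print(swap_on_comma("Smith, John T, Jr."))  # two commas: suffix moves to the front
```

Names without a comma are no longer swapped at all with this variant, and names with a suffix after a second comma still need separate handling.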

Last edited by Starson17; 01-19-2010 at 04:47 PM.