Old 01-20-2010, 10:33 AM   #31
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Starson17 View Post
I think it's an error to keep the comma in the lastname.
I made a simple fix and submitted it as Ticket #4620. It splits the author's name at a comma, if there is one (with swap checkbox selected) and otherwise splits at the first space instead of the last space to deal with middle names and initials.
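For readers who want to see the idea in code, here is a minimal Python sketch of the split-and-swap rule described above. It is not the patch submitted in Ticket #4620; the function name `swap_author_name` is hypothetical, and it assumes the input is in "Last, First" or "Last First Middle" order.

```python
def swap_author_name(name):
    """Hypothetical helper: reorder 'Last, First' or 'Last First Middle'
    into 'First Middle Last'."""
    name = name.strip()
    if "," in name:
        # A comma, when present, marks the end of the last name.
        last, _, rest = name.partition(",")
    else:
        # Otherwise split at the FIRST space, so middle names and
        # initials stay together with the first name.
        last, _, rest = name.partition(" ")
    last, rest = last.strip(), rest.strip()
    # A single-word name has nothing to swap and comes back unchanged.
    return f"{rest} {last}".strip()
```

Splitting at the first space rather than the last keeps "John Ronald Reuel" together when the surname leads the string.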
Old 01-20-2010, 06:17 PM   #32
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
Nice!

Hope it's accepted.

m a r
Old 01-22-2010, 09:50 AM   #33
Starson17
Quote:
Originally Posted by rogue_ronin View Post
Nice!
Hope it's accepted.
According to the bug tracker, it will be in the next release. It's a trivial change, and I suspect it takes Kovid longer to review the proposed change and make sure I haven't buggered up his code than it would to do it himself. Nonetheless, it's nice to feel like I've made a tiny contribution. Plus, it does address a problem that was bothering me.

Now I'm off to read some more on Python. There's this other tiny issue that bugs me ...
Old 05-23-2010, 07:40 PM   #34
melvin
Junior Member
Posts: 1
Karma: 10
Join Date: May 2010
Device: ipad
Calibre - ebook import Metadata - publisher, date

Hi there,

I've just started to use Calibre and so far, I think I like it. Many thanks to Kovid for the wonderful work.

My PDF ebook/document collection is named as title - author - publisher - date - series.pdf.

I failed to locate any help in the forum on how to import the publisher and date into the metadata. I was wondering if anyone can help with this. Thanks.
Old 05-23-2010, 07:58 PM   #35
Starson17
Quote:
Originally Posted by melvin View Post
I failed to locate any help in the forum on how to import the publisher and date into the metadata.
If they work, they'd be like this:
publisher: ?P<publisher>
published date: ?P<pubdate>
entered date: ?P<timestamp>

I've never tried any of those, and they aren't in the regex test, so they may not work. If they don't, put in an enhancement request.
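As a rough illustration, a complete Python named group is written `(?P<name>...)`, parentheses included. The sketch below applies that syntax to a filename laid out as title - author - publisher - date - series.pdf. It only shows how such a pattern behaves in Python's `re` module; whether calibre acts on the `publisher` and `pubdate` groups is untested, as noted above. The function name `parse_filename` and the sample filename in the comment are made up for the example.

```python
import re

# Assumed layout: title - author - publisher - date - series.pdf
FILENAME_RE = re.compile(
    r"(?P<title>.+?) - (?P<author>.+?) - (?P<publisher>.+?) - "
    r"(?P<pubdate>.+?) - (?P<series>.+?)\.pdf$"
)

def parse_filename(filename):
    """Return a dict of the named groups, or None if the name does not fit.

    Example input (invented):
    'Dune - Frank Herbert - Chilton - 1965 - Dune Chronicles.pdf'
    """
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None
```

The lazy `.+?` groups stop at the first " - " separator, so a title that itself contains " - " would be split in the wrong place.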
Old 05-24-2010, 02:04 AM   #36
vinco
Junior Member
Posts: 7
Karma: 10
Join Date: May 2010
Device: Nook
I need to come up with a regex to detect and remove page numbers from the bottom of PDF pages to convert to Epub for nook usage. The page numbers translate over as bolded, with a paragraph break after them. The HTML code I'd like to remove is (page numbers indicated below by ###)

<b>Page ###</b></p><p>

Thanks for the help.
Old 05-24-2010, 09:44 AM   #37
Starson17
Quote:
Originally Posted by vinco View Post
I need to come up with a regex to detect and remove page numbers from the bottom of PDF pages to convert to Epub for nook usage. The page numbers translate over as bolded, with a paragraph break after them. The HTML code I'd like to remove is (page numbers indicated below by ###)

<b>Page ###</b></p><p>

Thanks for the help.
Try this:
Code:
<b>Page /d+.*<p>
Old 05-24-2010, 08:11 PM   #38
vinco
Code:
<b>Page /d+.*<p>
Does not seem to work. A straight
Code:
<b>Page 1</b></p><p>
seems to work, but obviously only catches the first page. Apologies for my complete lack of experience with regexes.
Old 05-24-2010, 10:45 PM   #39
tonyx3
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2010
Device: Nexus One
It's a bit messier than Starson17's solution, but if that's not working, try this:

Code:
<b>Page [0-9]{1,3}</b></p><p>
Basically, just copy an example of the part you want to match, then replace the actual digits with [0-9] (or /d, like Starson used, but you said it didn't work), and then use the squiggly brackets to set the minimum and maximum number of digits in a row (I assumed no more than 999 pages).

But Starson17's regex should have matched. Make sure all the tags surrounding the page number are correct. Maybe copy/paste an actual example, and then replace the middle of it with the regex, to be sure you didn't miss a space or something.
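A quick way to check a pattern like this outside calibre is Python's `re` module, sketched here under the assumption that the markup looks exactly like the fragment quoted in the thread. The names `PAGE_MARK` and `strip_page_marks` are made up for the example.

```python
import re

# One to three digits, with the surrounding tags spelled out literally.
PAGE_MARK = re.compile(r"<b>Page [0-9]{1,3}</b></p><p>")

def strip_page_marks(html):
    """Delete every page-number marker that fits the pattern."""
    return PAGE_MARK.sub("", html)
```

Because the upper bound is three digits, a marker such as "Page 1234" is left untouched. Raise the bound or use `+` if the book is longer.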
Old 05-24-2010, 10:47 PM   #40
darkmonk
Connoisseur
Posts: 58
Karma: 12
Join Date: Jan 2009
Device: none
<b>Page [0-9]{1,4}</b></p><p>

Should remove it. Of course, this may not work from within calibre. I use Sigil to remove those sorts of things.

Last edited by darkmonk; 05-24-2010 at 10:48 PM. Reason: edit: I see this is pointless, as I was beaten by two minutes.
Old 05-24-2010, 11:34 PM   #41
kovidgoyal
creator of calibre
Posts: 44,115
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@vinco

You need a backslash, not a forward slash, in

<b>Page /d+.*<p>

That is, /d should be \d.
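The difference between the two slashes can be checked with Python's `re` module. This is only a sketch: the sample string is modelled on the HTML quoted in this thread, and the names `SAMPLE`, `FORWARD_SLASH`, `BACKSLASH` and `remove_footer` are invented for the example.

```python
import re

# Sample fragment modelled on the HTML quoted in this thread.
SAMPLE = "and only <b>Page 1</b></p><p>one, planetary"

# "/d+" means a literal slash followed by one or more letter d's.
FORWARD_SLASH = re.compile(r"<b>Page /d+.*<p>")
# "\d+" means one or more digits.
BACKSLASH = re.compile(r"<b>Page \d+.*<p>")

def remove_footer(pattern, html):
    """Delete every match of pattern from html."""
    return pattern.sub("", html)
```

The forward-slash version leaves the sample unchanged, while the backslash version removes the marker. Note that `.*` is greedy: on a line holding several `<p>` tags it runs to the last one, so `.*?` or a fully spelled-out tail is the safer choice.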
Old 05-25-2010, 12:54 AM   #42
vinco
Still confused.

Code:
<b>Page  \d+</b></p><p>
ended up working perfectly in the test, but when I set that as the footer to remove in calibre, the converted epub still contains the page numbers.

I'm converting from PDF to EPUB.

A sample of the XML generated from the PDF is below.

Code:
     Since nothing material was destroyed when the Eddorians were forced into the next plane of existence, their historical records also have become available. Those records-folios and tapes and playable discs of platinum alloy, resistant indefinitely even to Eddore's noxious atmosphere agree with those of the Arisians upon this point. Immediately before the Coalescence began there was one, and only <b>Page  1</b></p><p>
one, planetary solar system in the Second Galaxy; and, until the advent of Eddore, the Second Galaxy was entirely devoid of intelligent life. </p><p>

Last edited by vinco; 05-25-2010 at 12:59 AM.
Old 05-25-2010, 08:38 AM   #43
Starson17
Quote:
Originally Posted by kovidgoyal View Post
you need a backslash not a forward slash in

<b>Page /d+.*<p>
Old 05-25-2010, 10:14 AM   #44
chaley
Grand Sorcerer
Posts: 11,848
Karma: 7035877
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by vinco View Post
Code:
<b>Page  \d+</b></p><p>
ended up working perfectly in the test, but when I set that as the footer to remove in calibre, the converted epub still contains the page numbers.
With apologies in advance for asking, did you remember to check the 'Remove footer' checkbox?

In your example, there are two spaces between 'Page' and '1'. If that is an accurate copy, then you need to match more than 1 space there. Try
Code:
<b>Page +\d+</b></p><p>
If that doesn't work, then it wouldn't surprise me if there are newlines buried in the middle of the text you are trying to match. Try
Code:
<b>\s*Page +\d+\s*</b>\s*</p>\s*<p>
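To see what the whitespace-tolerant pattern buys, here is a small Python sketch run against a fragment modelled on vinco's sample. It uses plain `re` and says nothing about how calibre applies its footer option; the names `FOOTER`, `SINGLE_SPACE`, `SAMPLE` and `remove_footer` are made up for the example.

```python
import re

# Tolerates extra spaces after "Page" and whitespace (including
# newlines) around and between the tags.
FOOTER = re.compile(r"<b>\s*Page +\d+\s*</b>\s*</p>\s*<p>")

# Stricter variant with exactly one space, for comparison.
SINGLE_SPACE = re.compile(r"<b>Page \d+</b></p><p>")

# Two spaces between "Page" and the number, as in the posted sample.
SAMPLE = "there was one, and only <b>Page  1</b></p><p>\none, planetary solar system"

def remove_footer(html):
    """Delete every page marker matched by FOOTER."""
    return FOOTER.sub("", html)
```

The single-space variant finds nothing in the two-space sample, while `FOOTER` removes the marker even when newlines sit between the tags.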
Old 05-25-2010, 12:26 PM   #45
rocketgranny
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Guys,
Name Changer is a great tool for fixing filenames in a directory before you import into Calibre. Works on a Mac.