[GUI Plugin] Extract ISBN - Page 5

drMerry · 04-11-2011, 02:46 PM

Hi,

Something new. (I'm not stalking you to tear down your product, but just because I really love it!!).
I've got a lot of scientific papers. These papers often have a last chapter, "Recommended further readings", and yes, with ISBNs.

In this case there are 3 options:
1. ISBN of document is not available
2. ISBN of document is on first page(s)
3. ISBN is at the end of document (after further readings)
(I've not seen ISBN before further readings.)

Is it possible to change the behavior to:
look at the first x pages top down, then
look at the last x pages bottom up?
Or would this decrease speed a lot?

kiwidude · 04-11-2011, 02:51 PM

Are these PDFs or other types of documents?

The plugin already does look at the first 10 and last 5 pages of PDFs (and the entire document for any other format).

The only thing that is different that you are asking for is for the last x pages check to work backwards. The question is why - what would you hope to achieve? It certainly wouldn't gain much speed in PDFs.

drMerry · 04-11-2011, 06:44 PM

My problem is that in scientific documents / books (mostly PDF) the last chapter looks something like this:

Quote:

Recommended further readings
Checking ISBN in pdf - K.I. Wi Dude. ISBN:1234567891234

EPub and PDF Pro's and cons. - dr. Merry ISBN: 4321987654321

About the author
Robin Hood is a financial expert.

Calibre Publishing 2011
ISBN: 9786453210235

So as you can see, the first 2 ISBN numbers are not related to the current document. The last one is.
This is often seen in scientific documents.
The real ISBN is on one of the first pages,
after the further reading section,
or not in the document at all.

kiwidude · 04-11-2011, 07:17 PM

Ok, thanks, I understand now.

I'll put it on the list to make a change to cater for this for PDF documents, which is a fairly trivial change. But not for other format types as yet (as they currently have no concept of "pages").
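
The scan order discussed above (first pages top down, last pages bottom up) can be sketched as follows. This is only an illustration: the function name `page_scan_order` and its parameters are made up here, the defaults of 10 and 5 are the page counts mentioned earlier in the thread, and it is not the plugin's actual code.

```python
def page_scan_order(total_pages, first=10, last=5):
    """Return 0-based page indexes in the order they would be scanned.

    The first `first` pages are visited top down, then the last `last`
    pages are visited bottom up, so an ISBN printed after a "further
    readings" list is reached before the ISBNs inside that list.
    """
    head = list(range(min(first, total_pages)))
    # Start of the tail, skipping pages already covered by the head.
    tail_start = max(total_pages - last, len(head))
    # Walk the tail backwards, from the final page towards tail_start.
    tail = list(range(total_pages - 1, tail_start - 1, -1))
    return head + tail


print(page_scan_order(20))
```

For short documents the two ranges overlap, so the sketch simply drops tail pages that the head has already covered.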

olandese · 04-13-2011, 03:06 PM

After the plugin sets the ISBN I am no longer able to open the book; I have to restart Calibre and then I can open the book again. I am using Calibre 0.7.54.

kiwidude · 04-13-2011, 03:23 PM

Hi olandese, welcome to MobileRead.

That is a new behaviour and to be honest I can't think how it could be related to the plugin.

What do you mean "not able" to open the book - exactly what happens?
What action are you doing to open the book?
If you run Calibre in debug mode using ctrl+shift+r, can you post any messages that appear when you try to open a book in this scenario?
What book format(s) that you are scanning/opening does this apply to?
Does this apply to all books or just specific ones?

olandese · 04-13-2011, 06:15 PM

Hi kiwidude!

I did a little more investigation and the problem occurs only with .chm files.
After the plugin runs and sets the ISBN, I try to open the file (double click on it) from Calibre and I get the following error:

calibre, version 0.7.54
ERROR: Unhandled exception: <b>WindowsError</b>:[Error 1223] De bewerking is geannuleerd door de gebruiker: u"R:\\Calibre\\O'Reilly\\Perl 6 & Parrot Essentials 2nd (443)\\Perl 6 & Parrot Essentials 2nd - O'Reilly.chm"

Traceback (most recent call last):
File "site-packages\calibre\gui2\actions\view.py", line 156, in view_triggered
File "site-packages\calibre\gui2\actions\view.py", line 195, in _view_books
File "site-packages\calibre\gui2\actions\view.py", line 52, in view_format
File "site-packages\calibre\gui2\actions\view.py", line 87, in _view_file
File "site-packages\calibre\gui2\actions\view.py", line 78, in _launch_viewer
File "site-packages\calibre\gui2\__init__.py", line 628, in open_local_file
WindowsError: [Error 1223] De bewerking is geannuleerd door de gebruiker: u"R:\\Calibre\\O'Reilly\\Perl 6 & Parrot Essentials 2nd (443)\\Perl 6 & Parrot Essentials 2nd - O'Reilly.chm"

It seems that the file is still in use.

kiwidude · 04-13-2011, 06:29 PM

Hi olandese,

Thanks for the details. It sounds like something Kovid might have interest in on the Calibre bug tracker. My plugin just calls Calibre code to try to read the ISBN, and I would suspect that in the case of CHM files that input converter isn't releasing resources properly somehow.

I'll confirm it isn't anything to do with the plugin itself and put it on the bug tracker for you if you like.

kiwidude · 04-13-2011, 06:39 PM

By the way folks, I have found a few "false positive" situations in using this plugin which I don't foresee being able to do anything about.

Here are a couple of examples I came across:

A Wrox book which in one of the leading pages has a list of other Wrox books with their ISBNs, before the actual page containing the ISBN for this book. So it picks up the ISBN for some other Wrox book in that list rather than the one you have. The filth of publishers advertising in their own books I'm afraid.
A book that had 2222222222 somewhere in the leading text. As it turns out by some coincidence that passes the valid ISBN check.

It is rare enough that it is no reason to avoid using the plugin, but a reminder that this is just a tool trying to automate a human function of reading text and as such can sometimes get it wrong.
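
To see why 2222222222 slips through, here is a minimal sketch of the standard ISBN-10 check digit rule (digits weighted 10 down to 1, sum divisible by 11). The function name `is_valid_isbn10` is made up for illustration; this is not the plugin's or Calibre's own validation code.

```python
def is_valid_isbn10(candidate):
    """Return True if a 10-character string passes the ISBN-10 checksum."""
    if len(candidate) != 10:
        return False
    total = 0
    for position, char in enumerate(candidate):
        if char in "0123456789":
            value = int(char)
        elif char in "xX" and position == 9:
            # 'X' stands for 10 and is only allowed as the check digit.
            value = 10
        else:
            return False
        # Weights run from 10 for the first digit down to 1 for the last.
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("2222222222"))
```

The weights add up to 55, so ten twos give 110, which is a multiple of 11 and therefore passes.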

olandese · 04-14-2011, 04:08 AM

Quote:

Originally Posted by kiwidude

Hi olandese,

Thanks for the details. It sounds like something Kovid might have interest in on the Calibre bug tracker. My plugin just calls Calibre code to try to read the ISBN, and I would suspect that in the case of CHM files that input converter isn't releasing resources properly somehow.

I'll confirm it isn't anything to do with the plugin itself and put it on the bug tracker for you if you like.

Yes please!

the plugin is also much faster with pdf than with chm files.

kiwidude · 04-14-2011, 04:20 AM

Quote:

Originally Posted by olandese

Yes please!

the plugin is also much faster with pdf than with chm files.

From posts I have seen elsewhere I believe chm is a difficult format for Calibre to handle, and is best avoided if you intend to read the book anywhere but on your PC. This plugin just calls the same Calibre code to read the book pages that a conversion would use, except for PDFs, where it was easier to put an optimisation in. So I can do nothing about any performance issue unless Calibre is able to do its part faster.

kiwidude · 04-14-2011, 08:44 AM

Quote:

Originally Posted by olandese

I did a little more investigation and the problem occurs only with .chm files. ...

It seems to be that the file is still in use.

I've confirmed the issue and you are dead right, the chm input reader plugin is not releasing a file handle. I'll add a report to the bug tracker, I've isolated it down enough to prove a fix that "works" but Kovid or someone will need to put it in the right place. So it will require a Calibre release to get the plugin to allow you to open CHM files without restarts.
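
For readers unfamiliar with this class of bug, here is a generic sketch of how a reader can be forced to release its file handle. Everything in it is an assumption for illustration: `HypotheticalReader` and `read_and_release` are invented names, and this is neither the CHM input plugin nor the fix submitted to the bug tracker.

```python
from contextlib import closing


class HypotheticalReader:
    """Stand-in for a format reader that keeps a file handle open."""

    def __init__(self, path):
        self.handle = open(path, "rb")

    def read_text(self):
        return self.handle.read()

    def close(self):
        # Without this call the operating system may keep the file locked.
        self.handle.close()


def read_and_release(path):
    """Read the file and guarantee close() runs, even if reading fails."""
    with closing(HypotheticalReader(path)) as reader:
        return reader.read_text()
```

`contextlib.closing` calls `close()` when the `with` block exits, which is one common way to avoid leaving a file "in use".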

olandese · 04-14-2011, 10:03 AM

Quote:

Originally Posted by kiwidude

I've confirmed the issue and you are dead right, the chm input reader plugin is not releasing a file handle. I'll add a report to the bug tracker, I've isolated it down enough to prove a fix that "works" but Kovid or someone will need to put it in the right place. So it will require a Calibre release to get the plugin to allow you to open CHM files without restarts.

Fine, I will wait for the next Calibre release.

drMerry · 04-14-2011, 04:06 PM

Well, all numbers consisting of the same digit 10 times pass the check.

I think you can add a check saying that the same digit 10 times is not a valid ISBN. The plugin will then report that no ISBN was found. That gives 10 numbers (not assigned to any book at the moment) that would become false negatives. But there will be a lot fewer false positives.

The problem with the list of ISBN-numbers is a bit like my earlier question about a list at the end of the book.
There are 2 things you can do about this:
1. Let it be
2. Test whether there are more ISBN numbers in the part you're looking at, and then:
a. Choose the first or the last one and insert it.
b. Tell the user that more than one ISBN number was found, so none is entered.
c. Tell the user that more than one ISBN number was found, and let them choose the right one from a list.

But the whole of option 2 will slow down the process (extremely), because you have to keep testing even after you have found one, and you have to build a list of numbers (even if there is only one), which is slower than handling a single string.
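
Option 2a can be sketched like this. It is a rough illustration only: the pattern, `find_isbn_candidates` and `choose_isbn` are hypothetical names, the pattern only handles the plain "ISBN: digits" form used in the example earlier in the thread, and no timing has been measured.

```python
import re

# Matches "ISBN", an optional colon, optional spaces, then 13 or 10 characters.
ISBN_PATTERN = re.compile(r"ISBN:?\s*(\d{13}|\d{9}[\dXx])")


def find_isbn_candidates(text):
    """Return every labelled ISBN found in the text, in reading order."""
    return ISBN_PATTERN.findall(text)


def choose_isbn(candidates, prefer="last"):
    """Pick the first or last candidate, or None when nothing was found."""
    if not candidates:
        return None
    return candidates[-1] if prefer == "last" else candidates[0]


sample = (
    "Checking ISBN in pdf - K.I. Wi Dude. ISBN:1234567891234\n"
    "EPub and PDF Pro's and cons. - dr. Merry ISBN: 4321987654321\n"
    "Calibre Publishing 2011\n"
    "ISBN: 9786453210235\n"
)
print(choose_isbn(find_isbn_candidates(sample)))
```

Options b and c would use the same candidate list and differ only in what is done when it holds more than one entry.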

kiwidude · 04-14-2011, 04:18 PM

@drMerry - I did not realise all 10 times the same number were "valid" (but not really) ISBNs - as you say that sounds a sensible suggestion to check for that and discard it.
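
A minimal sketch of that discard step, assuming candidates arrive as plain strings; `is_repeated_digit` and `drop_repeated_digits` are hypothetical names, not the plugin's. The ISBN-10 weights add up to 55, so any digit repeated ten times gives a multiple of 11 and passes the checksum, which is why a separate filter is needed.

```python
def is_repeated_digit(candidate):
    """Return True when every character in the candidate is the same."""
    return len(candidate) > 1 and len(set(candidate)) == 1


def drop_repeated_digits(candidates):
    """Keep only candidates that are not one digit repeated."""
    return [c for c in candidates if not is_repeated_digit(c)]


print(drop_repeated_digits(["2222222222", "0306406152"]))
```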

As for the multiple ISBNs, I'm going to let it be. The user wouldn't have a clue which is the right one without actually opening the book and it all just gets too hard.

I have today seen how Kovid is handling the background downloading of metadata in the new code for 0.8. I'm going to steal it and use the same approach. What it will do is use the jobs mechanism to run the extract ISBN on your books, and then pop up a dialog when it is finished, to start updating the books at that point. It also looks at the last modified date of the books and asks the user what to do should they have edited a book while the job was running. So that should keep people concerned with either speed or blocking Calibre happy.
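
The last modified check described above could look roughly like this. It is a sketch under assumptions: the snapshot dictionaries (book id mapped to a timestamp) and the name `books_changed_during_job` are invented here and say nothing about how Calibre's jobs code is actually written.

```python
from datetime import datetime


def books_changed_during_job(before, after):
    """Return ids whose last modified stamp differs between two snapshots.

    `before` is taken when the job starts and `after` when it finishes.
    A book missing from `after` (for example, deleted) also counts as changed.
    """
    return sorted(
        book_id
        for book_id, stamp in before.items()
        if after.get(book_id) != stamp
    )


started = datetime(2011, 4, 14, 16, 0)
before = {1: started, 2: started}
after = {1: started, 2: datetime(2011, 4, 14, 16, 5)}
print(books_changed_during_job(before, after))
```

Only the books in the returned list would need a question to the user; the rest can be updated silently.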

That will however mean I need to rethink all those "interactive" options for choosing books. I might make it that you never get asked, and it always just uses a preferred order. Or I could make it that you can actually define your own preferred order in the configuration dialog, rather than using the preferred conversion input order. What do people think?

I will also change that scan last pages logic to look in the reverse direction for you drMerry.

My other thought, which I mentioned on another thread, was to add an option to allow scanning for an ASIN as well as an ISBN. However, is that idea flawed - do ASIN-only books actually have the ASIN printed inside them? Does anyone have some examples of books with an ASIN they can give me? The search wouldn't have the same flexibility of numbers that we changed ISBN to - I was thinking it would just search for ASIN: xxxxxxxxxx or similar. But as I say, if the ASIN is actually not included inside the PDF/EPUB then it is all a silly idea really.
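
A labelled search of that kind might be sketched as below, assuming the ASIN appears as the literal text "ASIN:" followed by 10 uppercase letters or digits. The pattern, the name `find_asin` and the sample value `B00EXAMPLE` are all made up for illustration.

```python
import re

# "ASIN", optional colon, optional spaces, then exactly 10 uppercase
# letters or digits that are not followed by another word character.
ASIN_PATTERN = re.compile(r"ASIN:?\s*([A-Z0-9]{10})\b")


def find_asin(text):
    """Return the first labelled ASIN in the text, or None if absent."""
    match = ASIN_PATTERN.search(text)
    return match.group(1) if match else None


print(find_asin("Kindle edition. ASIN: B00EXAMPLE"))
```

Unlike the ISBN search there is no checksum to validate against, so the label is the only safeguard against random 10-character strings.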



