Old 05-24-2011, 06:40 AM   #61
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Find duplicates has no preferences.
Old 05-24-2011, 06:41 AM   #62
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude View Post
Find duplicates has no preferences.
Whoops, that would explain my confusion in the previous post.

So, in that case, I would suggest removing the plugin and reinstalling it.

If you have problems with more plugins, you might want to reinstall calibre 0.8.2.
Old 05-27-2011, 03:13 AM   #63
htweedie
Junior Member
Posts: 2
Karma: 10
Join Date: May 2011
Device: Kindle
It appears to be able to. If I try and change the settings on Find Duplicates (1.1.0) by Grant Drake via the Customize Plugin button I get a requester allowing me to specify the Keyboard shortcuts.

I presume this is what should happen?

Quote:
Originally Posted by drMerry View Post
Is it possible to change settings on preferences->(advanced) Plugins->User Interface Action Plugins->Duplicate Check

===

EDIT (I did mean Duplicate, of course)
Old 05-27-2011, 03:52 AM   #64
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@htweedie - yes that is correct, I had forgotten it had keyboard shortcut preferences. Though these bear no relevance to the issue you posted. Did you get it resolved? If not, someone will need to try to replicate it. Exactly what steps did you take in your library: what search restrictions did you have in place, if any, what kind of duplicate search options did you check, etc.? It looks like a possible bug in Calibre, but as you have posted in this thread I am presuming you did some action with this plugin to cause it?
Old 05-30-2011, 06:58 PM   #65
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
I was thinking of 3 new criteria for duplicate file finding. These criteria would be a 'second pass': first the duplicate check runs in the normal way, and then, if results are found, a new check is done to match any of these criteria.

1. Has same file type: true / false / no-check
2. Has a max-difference in pages of: (a number, -1 for no check)
3. Has a max-difference in size of: (a number, -1 for no check)

If you have this list (test same author, same title):
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages

Results would be:
Spoiler:
1. true:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages


1. false:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages

1. no-check (all results)

2. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages

2. -1 (all results)

3. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages

3. -1 I think you understand


The advantage would be that you could filter out, for example, books with a large page or size difference; these books are unlikely to be duplicates.
Books with just 1 or 2 pages difference are more likely to be duplicates.
The advantage of option 1 shows when you have books in different file formats: if your view holds only books with different formats, it is easy to perform a merge action on them.

All options should be optional, because your current search should of course keep working the way it does now.
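
To make the idea concrete, here is a rough sketch of the second pass in Python. It is only an illustration of the three criteria, not plugin code: the Book class and both function names are made up for this example, and it assumes each duplicate group is simply a list of books.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    # Hypothetical record; in calibre a book can hold several formats.
    title: str
    fmt: str
    size_mb: float
    pages: int


def same_format_filter(group, same_format):
    """Criterion 1. True keeps books that share a format with another book
    in the group, False keeps books that have a differently formatted
    counterpart, None (no-check) keeps everything."""
    if same_format is None:
        return list(group)
    kept = []
    for i, book in enumerate(group):
        others = [b for j, b in enumerate(group) if j != i]
        has_same = any(o.fmt == book.fmt for o in others)
        has_other = any(o.fmt != book.fmt for o in others)
        if (same_format and has_same) or (not same_format and has_other):
            kept.append(book)
    return kept


def max_difference_filter(group, attr, max_diff):
    """Criteria 2 and 3. Keeps books that have at least one other book in
    the group within max_diff on the numeric attribute; -1 disables."""
    if max_diff < 0:
        return list(group)
    kept = []
    for i, book in enumerate(group):
        value = getattr(book, attr)
        if any(abs(getattr(o, attr) - value) <= max_diff
               for j, o in enumerate(group) if j != i):
            kept.append(book)
    return kept


# The two groups from the list above.
christie = [
    Book("And Then There Were None", "EPUB", 0.1, 180),
    Book("And Then There Were None", "EPUB", 0.1, 199),
    Book("And Then There Were None", "EPUB", 0.5, 180),
    Book("And Then There Were None", "PDF", 0.1, 180),
]
tolstoy = [
    Book("Anna Karenina", "PDF", 0.1, 180),
    Book("Anna Karenina", "PDF", 0.2, 180),
]

for group in (christie, tolstoy):
    for book in max_difference_filter(group, "size_mb", 0):
        print(book)
```

Each filter works inside one duplicate group, so it can only hide books from the result of the first pass and never adds new ones.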
Old 05-30-2011, 07:56 PM   #66
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
There are a number of issues with this.

Firstly - for duplicate books that come back with the same title and author, suddenly saying they are not duplicates of each other because of something about their book formats doesn't make sense. There is no consistency with what formats you may have associated to each book record, and whether they do or do not overlap formats.

Secondly, "pages" is not an available property of a book. It is something that can only be approximated with a computation. And for formats other than ePub or Mobi, that computation requires a conversion. So your Find Duplicates check will now take forever to run.

Thirdly, I am sceptical about using things like number of pages or file size to dictate whether two books are the "same". Certainly you can tell they are "different", but you can never say with any certainty they are the "same". Particularly given the wildly differing approximations you get from page calculations. And having just a higher res image significantly skews file sizes. So I really don't see how it can do anything other than tell you they differ. The only thing that tells you books are definitely the same is a binary comparison.

Finally, doing anything at "format" level is problematic with the Calibre UI. There is no way for the UI to show rows of book formats, you can only see books. This has already been discussed/highlighted by the binary duplicate check which is at format level.

This plugin is primarily about finding duplicate book records. If it is bringing back books which you don't think are duplicates of each other (as happens increasingly the fuzzier the algorithm), then the appropriate solution to that is to create the exclusions for those authors or titles.

So then the only other issue is having got book records which you know *are* duplicate books, how do you resolve the formats they contain. And again I keep coming back to that being a merge issue, not a find duplicate book issue. Though I would never use #pages or file size to tell me which format to keep when merging. You always have to open the formats side by side to decide that. You could have a crappy PDF conversion with completely screwed paragraphs with blank lines in between totally affect the page count. And as I said above images alone can dramatically affect file size.

You either have some scenario in mind where you think pages/file size would be useful, or you are just proposing some random thoughts. I don't mind random thoughts as sometimes they spark better ones, but in this case I don't see where you are going with this one?
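
To illustrate the binary comparison point, here is a minimal sketch. It is an illustration only and not the plugin's actual code: the function names are invented, and it treats equal SHA-256 digests as "same bytes", which is an approximation of a true byte-for-byte compare. File size is used the only way it safely can be, as a cheap test that two files differ.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large books do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def binary_duplicates(paths):
    """Return lists of paths whose contents are identical."""
    by_size = {}
    for p in paths:
        by_size.setdefault(os.path.getsize(p), []).append(p)

    groups = {}
    for size, candidates in by_size.items():
        if len(candidates) < 2:
            continue  # a unique size cannot have a binary twin
        for p in candidates:
            groups.setdefault((size, file_digest(p)), []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Two files of equal size still get hashed, because equal size says nothing about equal content.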
Old 05-30-2011, 08:19 PM   #67
capnm
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
Quote:
-- This brings up another thought -- would it be practical to add a "fuzzy" option to your binary compare, something like you open the epub and check the hash of the largest folder for a match?
(and should I bring this question up in the Duplicate Check thread?)
Quote:
You are correct that it is a question for the Find Duplicates thread, but I will give you an answer here anyways. I have no interest at this point in changing the plugin to start looking at ePub content - it will dramatically slow it down by many orders of magnitude and opens the door for a number of other issues which I am quite happy to avoid.
Yeah, the more I thought about it I realized just grabbing the pertinent CRC out of the zip file header wouldn't be that simple.

So I added 7zipFM to the Open With plugin. Now if I suspect two epubs are identical (except for metadata/timestamps) I can quickly eyeball the CRCs of the OEBPS/OPS/whatever folder.
Works for me
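
For anyone who would rather script the eyeballing, a rough sketch with Python's standard zipfile module is below. It reads the CRC-32 values already stored in the zip directory, so nothing is decompressed. The function names are my own, and skipping every .opf entry as "the metadata" is an assumption about where the differences live.

```python
import zipfile


def content_crcs(epub, ignore_suffixes=(".opf",)):
    """Map entry name -> (CRC-32, uncompressed size) from the zip directory,
    leaving out entries whose name ends with one of ignore_suffixes."""
    with zipfile.ZipFile(epub) as zf:
        return {
            info.filename: (info.CRC, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
            and not info.filename.endswith(ignore_suffixes)
        }


def same_content(epub_a, epub_b, ignore_suffixes=(".opf",)):
    """True when both archives hold the same entries with the same CRCs."""
    return (content_crcs(epub_a, ignore_suffixes)
            == content_crcs(epub_b, ignore_suffixes))
```

Timestamps in the zip directory do not affect the CRCs, so two copies that differ only in metadata and dates compare equal here.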

By the way -- Find Duplicates is a truly elegant example of good design & functionality.


Now if I could only apply it to the boxes of books in my closet, and attic, and basement ...
Old 05-31-2011, 09:18 AM   #68
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude View Post
You either have some scenario in mind where you think pages/file size would be useful, or you are just proposing some random thoughts. I don't mind random thoughts as sometimes they spark better ones, but in this case I don't see where you are going with this one?
Thank you for the information.
Well, I do have a scenario in mind.
As I said, this function is a second pass, filtering duplicate files.
So it does not say that files are unmarked as possible duplicates, it just filters the results one way.

For example, when I run some duplicate checks, I get in return a list of 1200 books. All possible duplicates.
Let's say I have these duplicates inside the list:

marked:duplicate_group_0001:
Book A Epub
Book C pdf

marked:duplicate_group_0002:
Book A Epub
Book B Epub (is a binary duplicate of A)

If I added the different-formats filter, I would only see group 1, giving me the option to merge this group easily. So I can eliminate some of the dups a lot faster.

For book size (of course you can't tell dups by size, but as this is a filter after the dup test...) it is a little different.

When I have 1200 possible duplicate books, I would be happy to see all books with a small size difference. When I see a book of 0.7 MB and one of 12.3 MB, I can imagine the content of the book is not the same (technically speaking; the presentation to the user can still be the same, e.g. bmp <-> jpg).
But if I could see only the books with, say, less than 1k difference, I would have a list of books that are far more likely to be duplicates, for example when one book has downloaded comments and the other has not. I could just open the books, take a quick look and see if they are the same before I remove them.

The page function could be used with your page-count plugin. If I see a possible duplicate book with the same number of pages (or +/- 1), the chance it is a duplicate increases. Books of 100 and 326 pages are more likely to be different.
So instead of pages, you could make it a custom-field compare that compares two int or floating-point fields.
This would then directly add the option to hide books with the same name but a different series index (a field that could be custom set by the user).

***EDIT***
One (manually filtered) example is in the screenshot below. As you can see I added an [other version] for some books to remove them from the title check.
You can also see the difference in book size / page numbers. They are all non-duplicates; filtering on pages would exclude these books from view.
Attached thumbnail: duplicate_baantjer.jpg (157.1 KB)

Last edited by drMerry; 05-31-2011 at 09:26 AM.
Old 05-31-2011, 09:30 AM   #69
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude View Post
Firstly - for duplicate books that come back with the same title and author, suddenly saying they are not duplicates of each other because of something about their book formats doesn't make sense.
On the other hand, this is something you do yourself too, of course.
If I run a check, 2 books are said to be duplicates. In another check, these books are maybe not shown, or are shown to be duplicates of other books.

This is just based on the options you select before you run your test. So if the filter options are on that screen, users know they can get other results than when they select other options...
Old 06-12-2011, 08:23 AM   #70
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
The plugin has thrown an exception (after all this time, so it is a (one of the?) very rare error)

Spoiler:
calibre, version 0.8.5
ERROR: Unhandled exception: <b>NameError</b>:global name 'error_dialog' is not defined

Traceback (most recent call last):
File "calibre_plugins.find_duplicates.action", line 169, in mark_groups_as_duplicate_exemptions
File "calibre_plugins.find_duplicates.duplicates", line 276, in check_can_mark_exemption
NameError: global name 'error_dialog' is not defined
Old 06-12-2011, 09:17 AM   #71
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Thx drMerry. The non-existent value add of Eclipse/PyDev as a Python development environment strikes again. Code and fix, code and fix... new version up shortly.
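
For anyone wondering why this only surfaced now, the sketch below shows the general mechanism. The function body is hypothetical (only the name comes from the traceback): Python resolves a global name when the line runs, so a missing import on a rarely taken branch stays hidden until someone takes that branch. In calibre the usual remedy would be an import along the lines of `from calibre.gui2 import error_dialog`, though that is my assumption about this particular fix.

```python
def check_can_mark_exemption(book_ids):
    # Hypothetical body; only the function name is taken from the traceback.
    if len(book_ids) < 2:
        # error_dialog is never imported in this module, but Python only
        # looks the name up when this line actually executes.
        return error_dialog(None, "Cannot mark exemption",
                            "Select at least two books", show=True)
    return True


print(check_can_mark_exemption([1, 2]))  # the common path works fine

try:
    check_can_mark_exemption([1])        # the rare path hits the bug
except NameError as err:
    print("NameError:", err)
```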
Old 06-12-2011, 09:36 AM   #72
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
v1.1.1 Released

Changes in this release:
  • Add van to list of ignored author words
  • Fix bug of error dialog not referenced correctly
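
As an illustration only of what an ignored author word does to matching (this is not the plugin's actual code, and the names below are invented), a minimal sketch could look like this. Only "van" is taken from the change list above; anything else in the set would be a guess.

```python
# Only "van" is confirmed by the release notes; extend at your own risk.
IGNORED_AUTHOR_WORDS = {"van"}


def author_key(name, ignored=IGNORED_AUTHOR_WORDS):
    """Reduce an author name to a comparison key: lower-case it, drop
    commas and full stops, and remove any ignored words."""
    words = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(w for w in words if w not in ignored)


print(author_key("Vincent van Gogh") == author_key("Vincent Gogh"))
```

Two author strings that differ only by an ignored word end up with the same key, so their books land in the same candidate group.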
Old 06-13-2011, 04:51 AM   #73
Philosopher
Connoisseur
Posts: 77
Karma: 12
Join Date: Jun 2010
Device: Kindle
Is it possible to integrate this plug-in with the Jobs indicator to monitor its progress and know that it is still working?
Old 06-13-2011, 06:22 AM   #74
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Not with the jobs indicator, no, as this plugin does not use jobs. For most searches it should complete within a few seconds. I contemplated a progress dialog but that would slow it down for most searches. How large is your database, what search are you running and how long does it take?
Old 06-13-2011, 06:51 AM   #75
Loeffel
Connoisseur
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
Hello Kiwidude,

just read the latest update history. With "van" do you mean the addition to some names that comes from former or current titles of nobility?
May I ask what else you have in the exception list? I assume "Mc" and "von", if I understand this right.