04-26-2011, 06:08 PM | #1 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
[GUI Plugin] Find Duplicates
This plugin will help you to identify duplicate authors, titles, formats, series, publishers, tags and identifiers in your calibre libraries.
If duplicates are found, you are presented with the results and can resolve the variations (e.g. by deleting or merging). You can also exclude them from future duplicate comparisons.
Main Features:
Special Notes:
Paypal Donations:
Last edited by kiwidude; 03-17-2024 at 12:28 AM. Reason: New version |
04-26-2011, 06:53 PM | #2 |
Reader
Posts: 46
Karma: 162
Join Date: Nov 2010
Location: Hannover
Device: Kindle KB and Kindle Fire HD 8.9
|
Thank you. Works well.
|
04-27-2011, 12:37 AM | #3 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2011
Device: kobo
|
I installed this in version 0.7.57 using the newest version of plugin updater and get this error when I try to open the drop down menu. I get a different version of this error when I just click the Find Duplicates icon on the toolbar.
Quote:
|
|
04-27-2011, 01:18 AM | #4 |
Member
Posts: 21
Karma: 10
Join Date: Mar 2011
Device: Kindle
|
So out of curiosity, why couldn't this be a content-based search instead of title/author?
calibre can read the contents and display them. I know it would take longer, but if you have a book with 95% of the same words, it's probably a dupe regardless. |
04-27-2011, 01:51 AM | #5 |
US Navy, Retired
Posts: 9,867
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
I think you may have answered your own question. I'm not a programmer but after following the discussion in the thread that came up with this plugin I think saying it would take longer might just be a bit of an understatement.
Then again, what the heck do I know. The reply should be educational. |
04-27-2011, 01:54 AM | #6 | |
Connoisseur
Posts: 94
Karma: 124056
Join Date: Nov 2010
Location: Canada
Device: Kobo Clara HD, Kindle Paperwhite 10th Gen, Kindle 7th Gen
|
Quote:
|
|
04-27-2011, 04:26 AM | #7 | |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
For a start, every format of every book has to be converted to a single format. If you have ever seen the posts on this forum about how it took one particular conversion x hours to run - well, multiply that out for users with large libraries and you can see it would have a running time of days if not weeks.
What about all those books that calibre can't convert, like image-based PDF files, CBZ files etc? Or people who have empty book entries for wish list items, or representing their paperback editions, which have no electronic versions to compare? Don't those deserve duplicate consideration too?
Then to round it all off, every time you add even just a single book format to your library, you would have to incur the whole penalty all over again, as it must compare that book's content with every other book. Well, unless you kept that whole temp directory structure of hundreds of thousands if not millions of files around, but even then you must still incur the very expensive cost of reading all the file contents and applying a fuzzy heuristic to compare the text.
By comparison, with this plugin I can test 40000 books in under a second, and once my exemptions are in place any future comparisons will take negligible time to perform and maintain.
That is not to say a content-based search would not have some advantages of course. One problem this plugin cannot help you with is books that had the wrong filename or metadata when imported. So you think you have book 5 in a series but in actual fact it is just a copy of book 3 or whatever. However a visual inspection will reveal that, which you should do before you merge identical formats anyway. That was one of the reasons I requested starson to enhance automerge so that identical formats do not have to be discarded, giving you a chance to compare them first.
So, there are some of the reasons why I didn't take that approach. It just isn't workable in my opinion, or certainly not for many users. |
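To illustrate why a metadata comparison can be so cheap, here is a minimal sketch of grouping books by a normalized title/author key in a single pass. It is not the plugin's actual code: the `normalize` and `find_duplicate_groups` names and the sample library are hypothetical, and the real plugin offers several matching modes that this ignores.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def find_duplicate_groups(books):
    """Group (book_id, title, author) tuples that share a normalized key.

    Each book is visited once and filed under a dictionary key, so no
    book file ever has to be opened or converted.
    """
    groups = defaultdict(list)
    for book_id, title, author in books:
        groups[(normalize(title), normalize(author))].append(book_id)
    # Only keys shared by more than one book are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data, for illustration only.
library = [
    (1, "The Hobbit", "J. R. R. Tolkien"),
    (2, "the hobbit", "J.R.R. Tolkien"),
    (3, "Dune", "Frank Herbert"),
]
print(find_duplicate_groups(library))  # [[1, 2]]
```

Because the work is one dictionary insert per book, the cost grows linearly with library size, whereas a content comparison would have to read and compare the text of every format.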
|
04-27-2011, 05:08 AM | #8 | |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
|
|
04-27-2011, 12:29 PM | #9 |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2011
Device: kobo
|
Deleted the Find Duplicates.json and that fixed it. Thank you
Last edited by snafa; 04-27-2011 at 12:43 PM. |
04-27-2011, 03:10 PM | #10 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
A little (big?) problem.
Calibre on old PC. Using 1.2 GB of memory on a special search!
The error was lost from the clipboard when I killed calibre.
Used the plugin on the db known by kiwidude.
I exempted the large list of duplicates (also known by kd).
Used:
Title soundex 8
Author similar
Show all groups
Sort groups by number of duplicates
Calibre mem size on start: 130 MB, so memory expanded about 10 times.
After closing the error, calibre was still open and memory did not decrease.
EDIT: while ctrl + \ was lost. I added \ as next-shortcut.
Last edited by drMerry; 04-27-2011 at 03:11 PM. Reason: added custom made option |
04-27-2011, 03:30 PM | #11 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Feature request:
It would be nice to add an option to exempt books based on author, title-part or tag, and authors based on tags or part of name. Then it would be possible to exempt:
books of calibre (news)
books with special tag (other version, second edition)
books with special part in name [other version] [.. edition] <- tricky: what would you do in case of 4th edition, 4th edition and 5th edition? Ignore all, or show the two 4th edition versions?
Authors with special label (my fav author, English Author 1950, Dutch Author 1968)
Authors with special parts (Jr.) |
04-27-2011, 03:37 PM | #12 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
drMerry - re your "feature request" - you can do this already by applying a search restriction before you do your duplicate search. So come up with a search that covers all the stuff you want to exempt, save it as a saved search for reuse purposes, set it as the search restriction and you should be good to go.
Re your other problem. Memory usage during "normal" comparisons isn't an issue. I suspect what you have done however is create an enormous exemption group. How many members did it have in it? That is something we may need to think of a more optimal storage strategy for, because you end up with some kind of quadratic storage problem if your groups start having hundreds (or more) members that you try to exempt. |
04-27-2011, 03:48 PM | #13 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
@1 This is a good solution I think, because it is already part of Calibre. On the other hand, I already have a lot of these searches, but that is a personal thing; the solution works for me.
@2 I have a large group indeed.
I exempt the books that gave a problem previously (did not yet rename them).
I exempt books I previously marked as not duplicate (put [other version] in title).
So at the moment there are 269 books exempt (no need if I use the solution for 1).
The script is fast (I have 2 PCs; even on my old PC it is a fast process, though with more exemptions it is slower), so I think a complete test would be no big problem.
To solve the problem maybe you could use the following workflow (I do not know how it is implemented at this moment):
A: Spoiler:
B: Spoiler:
|
04-27-2011, 03:58 PM | #14 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@drMerry - I think the simplest solution to your issue right now is to do Show all book exemptions, remove all the ones in that group, and instead use a search restriction before you search for duplicates.
The problem, I believe, is due to the way exemptions are stored, as every book is being stored as being exempt with every other book. This isn't a scalable approach if (as you have) your group contains a massive number of books.
Right now I will see what others think on the dev thread about how we solve it - either we prevent you marking the group as exempt in the first place by putting in a threshold, or we change the way exemptions are stored. However you have a workaround in the meantime, I believe.
In what I would term "normal" usage your exemption groups should not be very big - the 99% scenario I perceive as being 2-3 books/authors in a group. However, with very fuzzy searches allowed, and with a large number of near-duplicate titles stored (as in your case, and as people who store magazines etc. will have), this situation will arise. |
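The scaling problem described above can be sketched with a small count. This assumes a dict-of-sets layout in which each book keeps the ids of every other book it is exempt from. The thread does not show the plugin's real storage format, so `pairwise_exemptions` and `stored_entries` are hypothetical names used only for illustration.

```python
from itertools import combinations


def pairwise_exemptions(book_ids):
    """Record every book in the group as exempt from every other book.

    Assumed layout (not taken from the plugin): {book_id: set of other ids}.
    """
    book_ids = list(book_ids)
    exempt = {book_id: set() for book_id in book_ids}
    for a, b in combinations(book_ids, 2):
        exempt[a].add(b)
        exempt[b].add(a)
    return exempt


def stored_entries(exempt):
    """Count how many individual ids end up stored across all books."""
    return sum(len(others) for others in exempt.values())


# A typical 3-book group versus the 269-book group mentioned in the thread.
for size in (3, 269):
    print(size, stored_entries(pairwise_exemptions(range(size))))
# 3 6
# 269 72092
```

Under this assumption a group of n books stores n * (n - 1) ids, so the cost grows with the square of the group size. Storing one shared group record per exemption group would instead grow linearly with n, which is one possible direction for the "change the way exemptions are stored" option.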
04-27-2011, 04:03 PM | #15 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Correction
Option 1 is not the same. I can add a filter, but if I add a filter like not Title:"2nd edition", it would not show duplicates for 2nd edition, 2nd-edition and 2 nd edition.
If the option were provided as an exemption, it would be applied by the plugin at run time. So 2nd-edition would match 2nd edition and show it, because it is a new book.
It would also show new books with 2nd edition, because your exemptions are set based on books. New books would not yet have the exempt flag set (the flag is set on books at the moment I add a tag, not every time the plugin runs). |
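One way to read the concern about spelling variants is the difference between an exact text filter and a match done after normalization. The sketch below is only an illustration of that difference: `exact_filter`, `normalized_filter` and the sample titles are hypothetical, and neither function is claimed to reproduce calibre's search syntax or the plugin's matching.

```python
import re


def squash(text):
    """Lowercase and remove everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def exact_filter(title, marker):
    """Plain case-insensitive substring test on the raw title."""
    return marker.lower() in title.lower()


def normalized_filter(title, marker):
    """Substring test after squashing spaces and punctuation on both sides."""
    return squash(marker) in squash(title)


# Hypothetical titles covering the three spellings from the post.
titles = [
    "Python Basics 2nd edition",
    "Python Basics (2nd-edition)",
    "Python Basics 2 nd Edition",
    "Python Basics",
]
print([exact_filter(t, "2nd edition") for t in titles])
# [True, False, False, False]
print([normalized_filter(t, "2nd edition") for t in titles])
# [True, True, True, False]
```

The exact test only catches the first spelling, while the normalized test treats all three variants alike, which is the kind of behaviour a rule evaluated inside the plugin at run time could offer.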
Tags |
cross library duplicates, in library duplicates |
|