05-24-2011, 06:40 AM | #61 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Find duplicates has no preferences.
|
05-24-2011, 06:41 AM | #62 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
|
05-27-2011, 03:13 AM | #63 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2011
Device: Kindle
|
It appears to be able to. If I try and change the settings on Find Duplicates (1.1.0) by Grant Drake via the Customize Plugin button I get a requester allowing me to specify the Keyboard shortcuts.
I presume this is what should happen? |
05-27-2011, 03:52 AM | #64 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@htweedie - yes, that is correct; I had forgotten it had keyboard shortcut preferences, though these have no relevance to the issue you posted. Did you get it resolved? If not, someone will need to try to replicate it. Exactly what steps did you take in your library: what search restrictions did you have in place, if any, which duplicate search options did you check, etc.? It looks like a possible bug in Calibre, but as you have posted in this thread I presume you did some action with this plugin to cause it?
|
05-30-2011, 06:58 PM | #65 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
I was thinking of 3 new criteria for duplicate file finding. These criteria are a 'second pass': first the duplicate check runs in the normal way, and then, if results are found, a second check matches them against any of these criteria:
1. Has same file type: true / false / no-check
2. Has a max-difference in pages of:
3. Has a max-difference in size of:
If you have this list (test same author, same title):
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages
Results would be:
Spoiler:
The advantage would be that you could, for example, filter out books with a large page or size difference, as those books are unlikely to be duplicates, while books with just 1 or 2 pages difference are more likely to be duplicates. The advantage of option 1 shows when you have books in different file formats: if the view contains only books with different formats, it is easy to perform a merge action on them. All of these criteria should be optional, because your current search should of course keep working as it does now. |
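A minimal sketch of the page/size part of this proposed second pass, in Python. It is not the plugin's code: the `Candidate` class, the `second_pass` function and the limit names are hypothetical, and it assumes a whole duplicate group is either kept or hidden depending on the largest difference inside it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One book in a duplicate group (hypothetical structure, not the plugin's)."""
    title: str
    fmt: str
    size_kb: int
    pages: int


def within_limits(group: List[Candidate],
                  max_page_diff: Optional[int] = None,
                  max_size_diff_kb: Optional[int] = None) -> bool:
    """True if the spread of pages and sizes in the group stays inside the limits.

    A limit of None means "no check", so with no limits every group passes.
    """
    if max_page_diff is not None:
        pages = [c.pages for c in group]
        if max(pages) - min(pages) > max_page_diff:
            return False
    if max_size_diff_kb is not None:
        sizes = [c.size_kb for c in group]
        if max(sizes) - min(sizes) > max_size_diff_kb:
            return False
    return True


def second_pass(groups: List[List[Candidate]], **limits) -> List[List[Candidate]]:
    """Keep only the groups from the first pass that satisfy the optional limits."""
    return [g for g in groups if within_limits(g, **limits)]


# The example list from the post, with sizes expressed in KB.
christie = [
    Candidate("And Then There Were None", "EPUB", 100, 180),
    Candidate("And Then There Were None", "EPUB", 100, 199),
    Candidate("And Then There Were None", "EPUB", 500, 180),
    Candidate("And Then There Were None", "PDF", 100, 180),
]
tolstoy = [
    Candidate("Anna Karenina", "PDF", 100, 180),
    Candidate("Anna Karenina", "PDF", 200, 180),
]
groups = [christie, tolstoy]
```

With no limits, `second_pass(groups)` returns both groups unchanged, which matches the requirement that the existing search keeps working as it does now.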
05-30-2011, 07:56 PM | #66 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
There are a number of issues with this.
Firstly - for duplicate books that come back with the same title and author, suddenly saying they are not duplicates of each other because of something about their book formats doesn't make sense. There is no consistency in which formats you may have associated with each book record, or in whether those formats overlap.
Secondly, "pages" is not an available property of a book. It is something that can only be approximated with a computation. And for formats other than ePub or Mobi, that computation requires a conversion. So your Find Duplicates check will now take forever to run.
Thirdly, I am sceptical about using things like number of pages or file size to dictate whether two books are the "same". Certainly you can tell they are "different", but you can never say with any certainty they are the "same", particularly given the wildly differing approximations you get from page calculations. And having just a higher-res image significantly skews file sizes. So I really don't see how it can do anything other than tell you they differ. The only thing that tells you books are definitely the same is a binary comparison.
Finally, doing anything at "format" level is problematic with the Calibre UI. There is no way for the UI to show rows of book formats; you can only see books. This has already been discussed/highlighted by the binary duplicate check, which is at format level.
This plugin is primarily about finding duplicate book records. If it is bringing back books which you don't think are duplicates of each other (as happens increasingly the fuzzier the algorithm), then the appropriate solution is to create exclusions for those authors or titles. So then the only other issue is, having got book records which you know *are* duplicate books, how do you resolve the formats they contain? And again I keep coming back to that being a merge issue, not a find duplicate book issue.
Though I would never use #pages or file size to tell me which format to keep when merging. You always have to open the formats side by side to decide that. You could have a crappy PDF conversion with completely screwed paragraphs with blank lines in between totally affect the page count. And as I said above images alone can dramatically affect file size. You either have some scenario in mind where you think pages/file size would be useful, or you are just proposing some random thoughts. I don't mind random thoughts as sometimes they spark better ones, but in this case I don't see where you are going with this one? |
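To illustrate why a binary comparison is the only check that can say two files are definitely the same, here is a minimal Python sketch that groups files by a SHA-256 digest of their bytes. It is an assumption-laden illustration, not how the plugin's binary duplicate check is implemented; `group_identical` and the sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def digest(data: bytes) -> str:
    """Hex SHA-256 of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def group_identical(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Only groups with at least two members are returned. File size and page
    count play no part: a single differing byte changes the digest.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[digest(data)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical in-memory "files" standing in for book formats on disk.
files = {
    "a.epub": b"same bytes",
    "b.epub": b"same bytes",
    "c.pdf": b"same size!",  # same length as the others, different content
}
```

Note that `c.pdf` has exactly the same size as the other two files and is still not grouped with them, which is the point made above about size only being able to tell you that files differ.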
05-30-2011, 08:19 PM | #67 | ||
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
|
Quote:
Quote:
So I added 7zipFM to the OpenWith plugin. So if I suspect two epubs are identical (except for metadata/timestamps) I can quickly eyeball the CRCs of the OEBPS/OPS/whatever folder. Works for me. By the way -- Find Duplicates is a truly elegant example of good design & functionality. Now if I could only apply it to the boxes of books in my closet, and attic, and basement ... |
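The same eyeball check can be sketched with Python's standard `zipfile` module, since an EPUB is a zip archive and each entry records a CRC-32. This is only an illustration of the idea: the helper names are hypothetical, and skipping `.opf` entries to ignore metadata differences is an assumption made for this sketch.

```python
import io
import zipfile
from typing import Dict


def entry_crcs(epub_bytes: bytes) -> Dict[str, int]:
    """Map each archive entry name to its stored CRC-32.

    Assumption: entries ending in .opf are skipped so that metadata-only
    differences do not count. Zip timestamps are not part of the CRC.
    """
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return {
            info.filename: info.CRC
            for info in zf.infolist()
            if not info.is_dir() and not info.filename.endswith(".opf")
        }


def same_content(a: bytes, b: bytes) -> bool:
    """True if both archives hold the same entries with the same CRCs."""
    return entry_crcs(a) == entry_crcs(b)


def make_epub(entries: Dict[str, str]) -> bytes:
    """Build a tiny zip in memory, standing in for a real EPUB file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in entries.items():
            zf.writestr(name, text)
    return buf.getvalue()


book_a = make_epub({"OEBPS/ch1.html": "<p>Chapter 1</p>", "OEBPS/content.opf": "metadata A"})
book_b = make_epub({"OEBPS/ch1.html": "<p>Chapter 1</p>", "OEBPS/content.opf": "metadata B"})
book_c = make_epub({"OEBPS/ch1.html": "<p>Chapter one</p>", "OEBPS/content.opf": "metadata A"})
```

Here `book_a` and `book_b` differ only in their `.opf` entry and compare as the same, while `book_c` does not. Two EPUBs that store the same files under different folder names (OEBPS versus OPS, say) would not match with this sketch.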
||
05-31-2011, 09:18 AM | #68 | |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Quote:
Well, I do have a scenario in mind. As I said, this function is a second pass that filters the duplicate results. It does not unmark files as possible duplicates; it just filters the results one way.
For example, when I run some duplicate checks, I get back a list of 1200 books, all possible duplicates. Let's say I have these duplicates inside the list:
marked:duplicate_group_0001:
Book A Epub
Book C pdf
marked:duplicate_group_0002:
Book A Epub
Book B Epub (is a binary duplicate of A)
If I added the different-formats criterion, I would only see group 1, giving me the option to easily merge this group. So I can eliminate some of the dups a lot faster.
For book size (of course you can't tell dups by size, but as this is a filter after the dup test...) it is a little different. When I have 1200 possible duplicate books, I would be happy to see all books with a small size difference. When I see a book of 0.7 MB and one of 12.3 MB, I can imagine the content of the book is not the same (technical information, presentation for user can be (bmp <-> jpg)). But if I could see only the books with, say, less than 1k difference, I would have a list of books that are far more likely to be duplicates, for example if one book has downloaded comments and the other has not. I could just open the books, take a quick look and see if they are the same before I remove them.
The page function could be used with your page-count plugin. If I see a possible duplicate book with the same number of pages (or +/- 1), the chance that it is a duplicate increases. Books of 100 and 326 pages are more likely to be different. So instead of pages, you could make it a custom-field comparison that compares 2 int or floating-point fields. This would then directly add the option to hide books with the same name but a different series index (a field that could be custom set by the user).
***EDIT*** One (manually filtered) example is in the screenshot below.
As you can see, I added an [other version] marker to some books to remove them from the title check. You can also see the difference in book size / page numbers. They are all non-duplicates; filtering on pages would exclude these books from view. Last edited by drMerry; 05-31-2011 at 09:26 AM. |
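The "different formats" filter from the two-group example can be sketched as follows. This is a hypothetical illustration in Python, not plugin code: each group is assumed to be a list of `(title, formats)` pairs, and a group counts as an easy merge only when no format appears in more than one of its books.

```python
from typing import List, Set, Tuple

Book = Tuple[str, Set[str]]  # (title, set of formats); hypothetical structure


def formats_disjoint(group: List[Book]) -> bool:
    """True if no format is shared between any two books in the group."""
    seen: Set[str] = set()
    for _title, formats in group:
        if seen & formats:
            return False
        seen |= formats
    return True


def easy_merges(groups: List[List[Book]]) -> List[List[Book]]:
    """Keep only the groups that could be merged without a format clash."""
    return [g for g in groups if formats_disjoint(g)]


# The two groups from the post.
group_1 = [("Book A", {"EPUB"}), ("Book C", {"PDF"})]
group_2 = [("Book A", {"EPUB"}), ("Book B", {"EPUB"})]
```

With these two groups, `easy_merges([group_1, group_2])` keeps only group 1, as described in the post.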
|
05-31-2011, 09:30 AM | #69 | |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Quote:
If I run a check, 2 books are reported as duplicates. In another check, these books may not be shown, or may be shown as duplicates of other books. This depends only on the options you select before you run your test. So if the filter options are in that screen, users know they can get different results than when they select other options...
|
06-12-2011, 08:23 AM | #70 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
The plugin has thrown an exception (after all this time, so it is a (one of the?) very rare error)
Spoiler:
|
06-12-2011, 09:17 AM | #71 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Thx drMerry. The non-existent value add of Eclipse/PyDev as a Python development environment strikes again. Code and fix, code and fix... new version up shortly.
|
06-12-2011, 09:36 AM | #72 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.1.1 Released
Changes in this release:
|
06-13-2011, 04:51 AM | #73 |
Connoisseur
Posts: 77
Karma: 12
Join Date: Jun 2010
Device: Kindle
|
Is it possible to integrate this plug-in with the Jobs indicator to monitor its progress and know that it is still working?
|
06-13-2011, 06:22 AM | #74 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Not with the jobs indicator, no, as this plugin does not use jobs. For most searches it should complete within a few seconds. I contemplated a progress dialog but that would slow it down for most searches. How large is your database, what search are you running and how long does it take?
|
06-13-2011, 06:51 AM | #75 |
Connoisseur
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
|
Hello Kiwidude,
just read the last update history. By "van", do you mean the particle in some names that comes from former or current titles of nobility? May I ask what else you have in the exception list? I assume "Mc" and "von", if I understand this right. |
Tags |
cross library duplicates, in library duplicates |
|