10-31-2022, 09:23 AM | #1051 | |
Grand Sorcerer
Posts: 12,038
Karma: 7257323
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Code:
python:
def evaluate(book, context):
    db = context.db
    # Build a marked-ids dict with every 'bbb' marker removed
    new_marks = remove_val_from_marks(db, 'bbb')
    # Replace the GUI's marked set with the filtered dict
    db.data.set_marked_ids(new_marks)
    return 'a string'

def remove_val_from_marks(db, val):
    # Keep only the marks whose value is not the one being cleared
    return {k: v for k, v in db.data.marked_ids.items() if v != val}

Last edited by chaley; 10-31-2022 at 09:26 AM.
10-31-2022, 06:37 PM | #1052 |
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Find Duplicates v1.10.7 Released
Release Notes:
https://github.com/kiwidude68/calibr...icates-v1.10.7

Thanks to @chaley for the code suggestion! @Eddie87, you should now be able to do the workflow we discussed: apply a custom marker to the results, clear the virtual library, and then search for your custom marker.
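For anyone wiring this up by hand rather than through the new plugin option, a minimal sketch in the same "python:" template style as chaley's snippet above (the ids_to_mark set and the 'dupes' marker value are placeholders for illustration, not anything the plugin provides):

Code:
# Minimal sketch, not the plugin's own code. Assumes a calibre "python:" template
# context like chaley's example; ids_to_mark and 'dupes' are placeholders.
def evaluate(book, context):
    db = context.db
    ids_to_mark = {book.id}            # placeholder: normally the duplicate result ids
    marks = dict(db.data.marked_ids)   # existing marker values, keyed by book id
    for book_id in ids_to_mark:
        marks[book_id] = 'dupes'       # apply the custom marker to each result
    db.data.set_marked_ids(marks)      # push the updated marked set back to the GUI
    return ''

Once the virtual library is cleared, a search such as marked:dupes should bring the marked books back into view (marked:true matches any marked book).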
10-31-2022, 08:38 PM | #1053 | |
Guru
Posts: 774
Karma: 340954
Join Date: Sep 2017
Location: Argentina
Device: moon+ reader, kindle paperwhite
Quote:
11-01-2022, 06:20 PM | #1054 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Oct 2022
Device: Kindle
Thanks a lot!!
Quote:
My main library, which I update every day and is the "master" for certain books, has 51,310 books today. On that one I add books, run your plugin to do a binary compare, remove the newly added binary duplicates, then compare title/author again with your plugin, and finally I manually check the old and new versions and decide which one to keep among the duplicates.

I also have another library that includes more books (90,271 today); I add books to it every now and then and use your plugin to maintain it as well. A couple of times a month I make sure to copy over the books that are in the "master" library but not in the second one; for that I use the binary compare and copy the ones NOT in common.

Today I used the new plugin and found 50,640 duplicates and 670 not marked after I removed the virtual library; all went fine. The comparison takes a while, which is the reason I only binary compare the libraries once or twice a month.
11-07-2022, 08:02 AM | #1055 |
Custom User Title
Posts: 9,575
Karma: 64960983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
Checking duplicates by identifier requires a separate search for each id type. Would it make sense to have "any" as an option in the dropdown?
11-07-2022, 09:18 AM | #1056 | |
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
There are also two other complications to consider. The first is the same problem a binary book duplicate search has: the user would not know which identifier is the duplicate in each pair. Secondly, it is entirely feasible for users to have hundreds or even thousands of different identifier types with all sorts of URN numbers etc. (see the recent discussion in this thread about the dropdown of identifiers exploding in size), which could make such a search extremely slow.
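To make that more concrete, a rough sketch of what an "any identifier" pass would have to do (the input dict is made up for illustration, not the plugin's data model): index every identifier value in the library, then report the keys shared by more than one book. With thousands of identifier types the index grows accordingly, and a matched pair can share several identifiers at once.

Code:
# Illustrative only, not Find Duplicates code. 'identifiers_by_book' is a made-up
# structure: book id -> {id_type: value}, roughly what calibre stores per book.
from collections import defaultdict

def groups_by_any_identifier(identifiers_by_book):
    index = defaultdict(set)  # (id_type, value) -> set of book ids
    for book_id, idents in identifiers_by_book.items():
        for id_type, value in idents.items():
            index[(id_type, value)].add(book_id)
    # Every key shared by more than one book is a candidate duplicate group; the
    # same pair of books can appear under several identifier types, which is why
    # the results alone would not say "which" identifier matched.
    return {key: ids for key, ids in index.items() if len(ids) > 1}

print(groups_by_any_identifier({
    1: {'isbn': '9780000000001', 'goodreads': '111'},
    2: {'isbn': '9780000000001', 'amazon': 'B000X'},
    3: {'amazon': 'B000X'},
}))
# -> {('isbn', '9780000000001'): {1, 2}, ('amazon', 'B000X'): {2, 3}}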
11-16-2022, 07:28 AM | #1057 |
Custom User Title
Posts: 9,575
Karma: 64960983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
I found a glitch that may be partially the result of Find Duplicates. As I'm not entirely sure and didn't want to crosspost it, I posted here:
https://www.mobileread.com/forums/sh....php?p=4274079

Thanks
12-04-2022, 06:16 PM | #1058 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2022
Device: Several
Hi Kiwidude and all,
I have been using the Find Duplicates plugin a lot. It is incredibly powerful and useful. However, after it shows a long list of result rows, I would like to be able to select (or mark) at once all the rows but the first one of each group. Is there a way to do that? I just cannot find it (after a lot of trial and googling), and it would save me a lot of time. After selecting, I could check and adjust the selection (exclude false duplicates), then delete all the selected rows at once.

If it is not currently possible, then maybe you could consider including it in a future version of the plugin? In that case some more options would be nice to have, for instance excluding the smallest or the biggest book of each group when building the selection (instead of excluding the first one).

I am aware that the plugin can remove duplicate files after a binary compare, but I have hundreds of real duplicates that show up with identical or soundex matching while not being exactly the same file.

Thanks
12-11-2022, 02:12 PM | #1059 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2022
Device: Several
Hello again... Well, anybody, no clue, really? After finding duplicates, I would like to be able to select (or mark) at once all the EPUB rows but the first row of each group. Is there a way to do that?
That would give a much quicker way of deleting a huge number of duplicates (even when the files are not exact duplicates). Thanks!
12-11-2022, 03:55 PM | #1060 |
Well trained by Cats
Posts: 30,452
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
IMHO
Don't. Nothing says the first is the better version.
Add the Count Pages plugin (and configure at least the Pages column); now you have an additional detail to consider.
Next, consider the metadata shown for each: some, all, mostly cr*p. Now choose, and repeat the consideration for each book.
Use an Intake library, and use the Find Library Duplicates option before the merge (Copy to Library: <main one>: Delete).
12-11-2022, 06:15 PM | #1061 | |
Wizard
Posts: 1,139
Karma: 1954142
Join Date: Aug 2015
Device: Kindle
Quote:
12-11-2022, 07:07 PM | #1062 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2022
Device: Several
Many thanks, theducks and capink, very clever ideas! However, they only cover the case of identical titles, authors and languages, if I understood you correctly.

First I followed theducks' process: I found these duplicates (same titles, authors) and corrected their language field. I adjusted the merging options for copying to another library, and adjusted this selected list of books (using a special column) to handle the case of different formats (PDF vs EPUB). I created a temporary library, moved these selected books there, then moved them back to my main library. This allowed me to delete 70 rows within a list of 140 duplicates in a few minutes: thank you for that! I also used your idea of looking at the page count.

But now 1,000 duplicates with similar titles and similar authors still remain, and most are real duplicates. In many cases the author is spelled slightly differently between two duplicates. I know the plugin can address that, but it seems a very long process.

Then I tried capink's method. Through Options/Advanced I associated the Ctrl+K shortcut with "Add to DUP list" in the Reading List plugin. This lets me keep track of the 1,000 Find Duplicates results after quitting the Find Duplicates plugin (another way would be to write to some custom column). But then I am more or less stuck: I can delete them manually, but I have 1,000 rows.

So I still think it would be very useful to be able to select all rows but the first one in each duplicates group. Or, if that is not possible for technical reasons, maybe it is possible to export the list to Excel and reimport it? Then I would edit it within Excel. I will investigate further, but not tonight.

Thanks again
12-13-2022, 12:07 PM | #1063 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2022
Device: Several
Hello again
For the information of whoever is interested: eventually I found a way to quickly delete a list of 1,000 probable duplicates.

1. Find Duplicates with same titles but similar authors: 2,000 rows. I selected only the EPUB books and moved them to a temporary library (slow, but automatic).
2. I exported that list to Excel (through Create a catalog / CSV).
3. With Excel formulas I found the rows where two lines share the same title and have a similar number of pages (threshold of 80 pages), and exported them to a CSV file keeping the columns UUID, title, authors.
4. I used the Import List plugin to import this CSV with matching method = UUID, into a DUP list created with the Reading List plugin.
5. I deleted the 1,000 books at once from the DUP list.

That way I only handle by hand the few cases needing special attention.

Many thanks again
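The Excel step could also be scripted. A rough sketch with plenty of assumptions: the exported catalog CSV is assumed to have columns literally named uuid, title, authors and pages (the real names depend on your catalog settings), and the 80-page threshold from the post is kept.

Code:
# Rough sketch of the "same title, similar page count" filter described above.
# Assumptions (not from the plugin): the exported catalog CSV has columns named
# 'uuid', 'title', 'authors' and 'pages'; adjust the names to match your catalog.
import csv
from collections import defaultdict

def build_dup_csv(catalog_csv, output_csv, page_threshold=80):
    groups = defaultdict(list)
    with open(catalog_csv, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            # Group candidate duplicates by (case-folded) title
            groups[row['title'].strip().casefold()].append(row)

    to_delete = []
    for rows in groups.values():
        if len(rows) < 2:
            continue
        # Keep the book with the most pages; queue the others for deletion
        # when their page count is within the threshold of the kept one
        rows.sort(key=lambda r: int(r['pages'] or 0), reverse=True)
        keep = rows[0]
        for other in rows[1:]:
            if abs(int(keep['pages'] or 0) - int(other['pages'] or 0)) <= page_threshold:
                to_delete.append(other)

    with open(output_csv, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['uuid', 'title', 'authors'])
        writer.writeheader()
        for row in to_delete:
            writer.writerow({k: row[k] for k in ('uuid', 'title', 'authors')})

# Example (hypothetical file names):
# build_dup_csv('catalog.csv', 'dup_list.csv')

The output CSV could then go straight into the Import List step (matching method = UUID) exactly as described above.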
12-13-2022, 01:03 PM | #1064 |
the rook, bossing Never.
Posts: 12,378
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Beware identical titles that are unrelated books:
Bambi (one is the woodland tale, the other a romance where it is a nickname)
Dancer's Luck (one is SF&F fantasy, the other about ballet)

My problem is different editions with the same or slightly differing titles (and/or author name), and stupid Gutenberg often puts their release date rather than the print edition date in the Published metadata date. I've not figured out yet how to use the plug-in for that. Also, I don't want to delete an older duplicate, as different versions might have been sent to different ereaders. Should I add a preference-rank column for books that are the same work but whose content doesn't exactly match?
12-13-2022, 02:11 PM | #1065 |
Well trained by Cats
Posts: 30,452
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Find Duplicates removes NOTHING, so feel free to experiment and peruse the results. Since they are marked in the regular GUI, you can view them and edit metadata further (FD uses calibre metadata, not the book file).
The types of searches are configurable, the search method is configurable, AND you can exempt books from future results.