04-16-2011, 10:19 AM | #91 |
Grand Sorcerer
Posts: 11,866
Karma: 7036359
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Author matches
Ahhh, I get it now.
I think that dealing with authors as idolse suggests is in fact a different function, but one that could be integrated rather nicely by using restrictions. What we want to see is something like the tag browser, listing the authors that are similar. Assume I search for fuzzy authors and get groups of similar authors. Let's assume I get 10 variants of E Smith. If I can set a restriction to a group, then in the tag browser I will see only those ten variants, along with any co-authors. By clicking on the authors in the TB, I can look at the books and change any authors I wish. Flat-out mistakes can be fixed directly in the tag browser. All this would take is a mode/option where the restriction is set to a group instead of all duplicates.
Of course, I can do this myself by using 'Next Group' to set the search bar, then using the new "*Current search" restriction option to copy that to the restriction. Given how easy this is, I am not convinced that it needs to be made into dup-check mode. However, I do note that because I changed the line in the restriction box to the search, it is a bit hard to know which line to select to copy the search. I think I will add the '*' to the beginning of the search so that the line in the combobox is more obvious.
Perhaps it would work if 'show one group at a time' set the restriction but left the search empty? In fact, that might be better in general... |
04-16-2011, 10:41 AM | #92 |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@charles - I'm going to need to think about what you said but fundamentally it sounds like what has been niggling in the back of my mind - wanting an author-centric view of the duplicates for this mode.
I think, as per my last few posts, that it also requires different logic in determining the group contents, rationalising/managing them as you navigate, and any applying of exemptions. Using the tag browser to give the author-centric filtered view is a great idea, particularly as you say you can quickly rename authors or ctrl+click to see certain combinations of authors you want to focus on.
The "one group at a time" using restriction instead of search is something I hadn't considered. My only slight hesitation is that our reliance on restrictions will preclude the user from (easily) doing any kind of more generic searching while they are contemplating that group. Say for instance you wanted for some reason to see all the books for that author. You would have to exit duplicate search mode first and then start it again. However that "limitation" would also apply to the show all duplicate groups mode, which has a restriction. And perhaps it really just isn't necessary: you have the books you are considering duplicates in front of you, so what more do you want?
I'll apply the restriction for one group at a time - at least that way the behaviour is consistent with show all groups. |
|
04-16-2011, 02:34 PM | #93 | |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
I'm sitting here looking into this now and just not liking it much. My natural instinct (admittedly as I am testing rather than "using") is to ignore where the green highlighting is and grab a bunch of rows on screen that grab my attention. I'm thinking all the warning dialogs in the world are not going to get my brain around the fact that I selected rows 3,4,5 but rows 1 & 2 as the "current group" are what exemptions will get added for. The dialog will tell me, but after the first time the brain will just go "yeah yeah" and ignore it in future. I think the disconnect between row selections and actual affected rows is too great.
So I propose one of two things:
Either... (1) When you choose the menu option to Mark group as exempt, it moves the selection to the current group row(s) that are affected. Then the dialog appears. So you get a visual reminder (until you stop the dialog nagging of course).
Or... (2) I go back to the idea of row selection based exemptions.
My reasoning for not allowing the user to do free-form selection was that if the user did not select all the rows within a group, you can get loads of confusion about partitioning without re-running searches etc. And quite possibly the user doesn't actually understand what they are doing, and you would either end up with weird cross-group exemptions that make no sense or nothing "happening" because behind the scenes we rationalise them out.
So if...
Group 1 has books (10,11,12)
Group 2 has books (13,14)
What does it mean if the user selects books 12 & 13? Do we think they mean that they are trying to say that 12 & 13 are not duplicates of each other? Clearly it makes no sense to create a duplicate exemption for them since they were not put in the same group. Alternatively, did they mean that 12 is not a duplicate of anything in its group and neither is 13 in its group? So they want exemptions created of (10,12), (11,12) and (13,14) resulting in only (10,11) being left?
I think we have to give the user the benefit of the doubt in that they mean "not duplicates of each other". So the selection may contain a mixture of valid pairs plus invalid "single row from a group" selections.
That leaves the final issue of this second option, which is updating the UI. Unless we re-run the search, then for various combinations nothing will actually change on screen. So perhaps we treat it like removing an exemption from the "show all" screen, in that it does exactly that: re-running the search after each time you mark a selection as exempt. So any partitioning gets applied etc.
Thoughts? |
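To make the two readings of a free-form selection concrete, here is a minimal sketch in Python. It is not the plugin's code: the function names and the representation of groups as tuples of book ids are assumptions made purely for illustration. The first function implements the "not duplicates of each other" reading (only pairs that share a group produce an exemption), the second implements the "not a duplicate of anything in its group" reading.

```python
from itertools import combinations


def mutual_exemptions(groups, selected):
    """'Not duplicates of each other': pair up selected books that share a group.

    A selected book that is alone in its group contributes nothing, which is
    the "single row from a group" case being rationalised out.
    """
    selected = set(selected)
    pairs = set()
    for group in groups:
        chosen = sorted(selected & set(group))
        pairs.update(combinations(chosen, 2))
    return pairs


def isolating_exemptions(groups, selected):
    """'Not a duplicate of anything in its group': pair each selected book
    with every other member of its own group."""
    selected = set(selected)
    pairs = set()
    for group in groups:
        for book in selected & set(group):
            for other in group:
                if other != book:
                    pairs.add(tuple(sorted((book, other))))
    return pairs


# The example from the post: hypothetical book ids in two groups.
groups = [(10, 11, 12), (13, 14)]
print(sorted(mutual_exemptions(groups, [12, 13])))     # []
print(sorted(isolating_exemptions(groups, [12, 13])))  # [(10, 12), (11, 12), (13, 14)]
```

With books 12 & 13 selected, the first reading creates no exemptions at all, so nothing changes on screen, while the second leaves only (10,11) as a group. That difference is why the UI refresh question matters for option 2.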
|
04-16-2011, 04:36 PM | #94 | |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Quote:
I think option 1 is a nice one. I myself like to be able to select them all quickly. I've got more than 6000 books at the moment. Because I did not look very closely when adding, and Calibre copies items rather than moving them, I have around 1000 duplicates in this set. It takes a lot of time if I get them one at a time.
Another problem I found on version 0.3 (maybe it was mentioned earlier, but I have not had much time to read it all, and if not mentioned, I think it is important to know) is that the plugin does not use the selection part (drop-down on the left of the main screen). I have one author for whom I know I have a lot of duplicates. I made a saved search on him and I made the sub-selection. When I start the duplicate scan, it starts scanning all my files, not the 250 I had from this author. |
|
04-16-2011, 07:02 PM | #95 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
FYI, I get an error after searching for dupes if all dupes have either been removed or marked exempt.
Spoiler:
|
|
04-16-2011, 07:25 PM | #96 |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Starson17 - yeah I found that one too
|
04-17-2011, 05:49 AM | #97 | |
Grand Sorcerer
Posts: 11,866
Karma: 7036359
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Of course, changing the selections will toss the ones I have, which can cause angst if I was getting ready to edit some metadata. However, I can exhibit learning behavior and not do that. |
|
04-17-2011, 05:55 AM | #98 | |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
|
|
04-17-2011, 07:37 AM | #99 |
Grand Sorcerer
Posts: 11,866
Karma: 7036359
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
@kiwidude: Found what seems to be a problem with the find_dups plugin initialization. The highlight config flag is being reset to False during startup, regardless of what the stored preference value is.
Sequence:
- Start calibre. Highlighting is disabled.
- Enable highlighting
- Exit calibre, then restart
- The highlighting flag is set to the saved value of True
- Find_dups is initialized and clears it back to false.
The stack trace for when the flag is erroneously set to False is:
Code:
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "calibre_dev\src\calibre\debug.py", line 187, in main
File "calibre_dev\src\calibre\gui2\main.py", line 382, in main
File "calibre_dev\src\calibre\gui2\main.py", line 286, in run_gui
File "calibre_dev\src\calibre\gui2\main.py", line 253, in initialize
File "calibre_dev\src\calibre\gui2\main.py", line 234, in initialize_db
File "calibre_dev\src\calibre\gui2\main.py", line 203, in initialize_db_stage2
File "calibre_dev\src\calibre\gui2\main.py", line 159, in start_gui
File "calibre_dev\src\calibre\gui2\ui.py", line 329, in initialize
File "calibre_plugins.find_duplicates.action", line 50, in initialization_complete
File "calibre_plugins.find_duplicates.duplicates", line 261, in __init__
File "calibre_plugins.find_duplicates.duplicates", line 277, in clear_duplicates_mode
File "calibre_plugins.find_duplicates.duplicates", line 283, in restore_previous_gui_state
File "calibre_dev\src\calibre\gui2\search_box.py", line 382, in set_highlight_only_button_icon |
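The trace shows `__init__` calling `clear_duplicates_mode`, which calls `restore_previous_gui_state` before any state has been captured. One possible guard, sketched below, is to restore only when something was actually saved. This is a guess at the shape of the problem, not the plugin's real code: `FakeGui`, `DuplicateFinder`, `enter_duplicates_mode` and the `_saved_highlight` attribute are all hypothetical, and only the two method names taken from the trace are real.

```python
class FakeGui:
    """Hypothetical stand-in for the calibre GUI; holds only the highlight flag."""

    def __init__(self, highlight_only):
        self.highlight_only = highlight_only


class DuplicateFinder:
    """Illustrative only: mirrors the call order seen in the stack trace."""

    def __init__(self, gui):
        self.gui = gui
        self._saved_highlight = None   # None means "nothing captured yet"
        self.clear_duplicates_mode()   # same call made during startup in the trace

    def enter_duplicates_mode(self, highlight):
        # Remember the user's setting before overriding it for duplicate review.
        self._saved_highlight = self.gui.highlight_only
        self.gui.highlight_only = highlight

    def clear_duplicates_mode(self):
        self.restore_previous_gui_state()

    def restore_previous_gui_state(self):
        # Guard: at startup there is no saved state, so leave the flag alone.
        if self._saved_highlight is None:
            return
        self.gui.highlight_only = self._saved_highlight
        self._saved_highlight = None


gui = FakeGui(highlight_only=True)   # preference loaded as True at startup
finder = DuplicateFinder(gui)
print(gui.highlight_only)            # True: initialization no longer resets it
```

Using `None` as the "nothing saved" marker keeps a genuinely saved value of False distinguishable from the startup case.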
04-17-2011, 07:44 AM | #100 |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Thx Charles, I will re-test that scenario for the next version. All that code has had to change with hooking into the search cleared event. I don't know if I was just too tired when I wrote it, but you have no idea how something that sounds so simple could cause me so many issues (there are so many subtle ways that the event will get fired). I "think" I got there in the end but I will try to give it a good thrashing.
It was a really good suggestion, nice to have a simple toolbar button to exit out of the search mode. It's just that, given the permutations, in the end I took a bit of a brute force approach to ensuring I was not connected when I didn't want to be. It's to do with the independent nature of the plugin from other actions the user could be doing in the gui that makes it a bit more complex than I would have hoped.
@drMerry - the duplicate search is "supposed" to respect any value in that restriction combo on the left. I will retest this with the next version, as that code has also had to be massaged for this release. |
04-18-2011, 03:34 PM | #101 |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v0.4 Beta
Got a bit distracted writing the new Goodreads metadata plugin for a few days but that is done and this got some attention again.
Changes in this beta:
I've set the minimum required version of Calibre to 0.7.56 since 0.7.55 is known to have other issues.
So... other than any new grenades I have planted in the code, I think from a "Find duplicate book" perspective this is functionally complete?
The big todo item now becomes the handling of the find duplicate author algorithms. I have given it zero thought since my last posts but am expecting it to have some significant differences that I need to think through before I ramble on about it here again. Thinking through how I did things with my own tool, I believe it will require a separate exemption list, as this is between author name pairings, not book pairs. That has a fair few implications but I am sure we will figure something out.
As always, feedback appreciated.
Last edited by kiwidude; 04-19-2011 at 12:45 PM. Reason: Later version in thread |
04-18-2011, 04:54 PM | #102 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Thinking out loud here - Suppose I tell you that AuthorA and AuthorB are not the same, even though the algorithm sees them as similar. Can I then say anything about whether BookA by AuthorA and BookB by AuthorB are the same? I suppose not. Father and son write a book, but I've got format 1 under the father's name and format 2 under the son's name. |
|
04-19-2011, 04:45 AM | #103 | |
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
So say I have Steve Smith and S. Smith as authors, and I decide from a duplicate author search that these are not duplicate authors. As I am displaying all books by those two authors at once before I make that exemption, that is my opportunity to make sure that any wrong author values on individual books between the two are rectified (this is where the Search the Internet plugin with Fantastic Fiction is gold to me).
Then if it happened to be the case that both authors had written a book with a title that is similar enough to appear in a duplicate search, you might argue that it should automatically be excluded, as you have already said the author sets are distinct.
However if we did this I see the potential issue of you adding another format for this book to your library in future where once again the author has the wrong value on it. Now you will never see it appear as a duplicate, unless you removed the author exclusions. That is a bit nasty and subtle.
Note that unless you run the 'xxx title, ignore author' book algorithms you are unlikely to have an overlap for the above scenario, as it needs a more fuzzy author match which will only be offered for author based searches, not book ones. Similar author just does punctuation and comma name flipping. And there must be a relatively small % of books in the world which are written with an exact enough title match by different authors that have such subtly different author names.
So I think it is safer to not apply the author exclusion list to book searches and let the user make book based exemptions instead. At least that way if they import books in future with the wrong name on they have a chance of picking that up from a duplicate book based search. Not if it is a new title of course, but there is only so much we can do! |
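For readers wondering what "punctuation and comma name flipping" might look like, here is a rough sketch. It is an assumption about the general idea, not the plugin's actual similar-author algorithm; the function name `similar_author_key` is made up for this example.

```python
import re


def similar_author_key(name):
    """Reduce an author name to a comparison key (illustrative only).

    Flips "Last, First" into "First Last", then strips punctuation,
    case and extra whitespace.
    """
    name = name.strip()
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    name = re.sub(r"[^\w\s]", " ", name)   # punctuation becomes whitespace
    return " ".join(name.lower().split())


print(similar_author_key("Smith, Steve"))   # steve smith
print(similar_author_key("Steve Smith"))    # steve smith
print(similar_author_key("S. Smith"))       # s smith
```

Under a key like this, "Smith, Steve" and "Steve Smith" collapse together, but "S. Smith" stays distinct. That matches the point in the post: Steve Smith versus S. Smith would only be paired by a fuzzier, author-focused search.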
|
04-19-2011, 06:07 AM | #104 | |
Grand Sorcerer
Posts: 11,866
Karma: 7036359
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
First, the new version works very well.
Comments:
- I like the manage duplicates dialog.
- If I run a test that finds one group, then mark that group as exempt, I get the message "No further duplicate groups exist for 'None'". If I subsequently run the test, I get "No duplicate groups were found using 'similar title, similar author'". Perhaps the 'None' was supposed to be 'similar title, similar author'?
- Using the restriction in 'One group at a time' mode does exactly what I expect and want. The tag browser is very useful for (tada) browsing, because it shows only the values for the books in question. I can quickly scan other metadata such as series and tags simply by looking at the items in the browser, rather than scrolling the library view and sorting.
- I was unable to make anything break by pushing the clear button or by clearing the restriction. However, using the tag browser to do searches has the side effect of leaving duplicate_check mode when cycling through searches, because one of the states clears the search. I don't know if this is a problem, and if it is, I don't know how to fix it.
- The problem where the use_marks configuration flag was being reset has been fixed.
Quote:
As for mixing author exemptions with book exemptions, kiwidude's 'complexity of interaction' argument is spot-on. I imagine trying to write documentation describing how things work, and end up pulling my hair out.
Finally, and probably a red herring, there are situations where S Smith and Steve Smith are in fact the same author, but listed differently on purpose. This happens all the time in academic papers, where the author name varies slightly from paper to paper. Do I need another kind of exemption to handle these?
I do recognize that other people might want to work differently. There is nothing that forces me to use author exemptions. My argument against them is based mostly on complexity, especially as this code will be integrated into trunk, where it might be touched (maintained) by more than one person as calibre evolves. |
|
04-19-2011, 06:56 AM | #105 | |||||
Calibre Plugins Developer
Posts: 4,664
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
It is all caused by hooking into the wrong signal. What I am really interested in is the user clicking the clear button action on the toolbar, not in the search being cleared. I have added all sorts of filth to the code to try to disconnect/connect around doing actions which result in the search being cleared, but that doesn't work when, as you say, actions like tag browser clicking result in another scenario I can't differentiate between.
I would like to rip all my filth out and instead directly hook into the triggered signal of the clear search button action. Do you have any objections/thoughts on that? I should have pulled the pin on my current hacks and proposed this days ago, but I was playing whack-a-mole with the event triggering instead of taking a fresh perspective.
Quote:
So to make this plugin more complete/useful imho we *need* an ignore title based search. But the problem with trying to treat such searches as "book searches" is that our normal exemption model and grouping model do not fit. As I think we are all agreed, you will want to see all the books by those authors who have been found to be similar, to then be able to review what are genuine data entry/import errors versus author names that for whatever reason you decide are valid to be treated as not duplicates of each other.
It also sounds like we are in agreement that trying to apply such author based exemptions to book searches is a bad idea. So that takes one aspect of the complexity out.
Quote:
Quote:
I only started thinking through all the implications and how it would fit last night. For instance when you are reviewing groups of authors, you are not going to want the "show all duplicates/highlight mode" option - instead it will be one group at a time and then the tag browser to filter within that group as you like or rename authors etc. So the Find duplicates dialog either needs a different dialog/menu option, or rearranging so that the options of how to view the results are either disabled or made a suboption of book based searches. But I need to finish reviewing what is involved before I know for sure the impact.
There is already a house of cards that has started to be built by the permutations of individual versus group review and in particular adding duplicate exemptions. I have no interest in making a rod for my own back or anyone else's by making this more complex than it is currently. However I am convinced we do need ignore title searches, and if I have to rewrite the way I have done the code so far to support them then better to do that now and get it sorted while it is fresh in my mind than down the track imho.
Last edited by kiwidude; 04-19-2011 at 06:58 AM. |
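On the clear-button question earlier in this post, the difference between the two hook points can be shown with a toy model. Nothing here is calibre or Qt API: `Signal` is a tiny stand-in for a Qt signal, and `SearchBox` with its `cleared` and `clear_action_triggered` attributes is invented for the illustration.

```python
class Signal:
    """Tiny stand-in for a Qt signal: connect callables, then emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self):
        for slot in list(self._slots):
            slot()


class SearchBox:
    """Hypothetical search box with two distinct notification points."""

    def __init__(self):
        self.cleared = Signal()                 # fires whenever the search is emptied
        self.clear_action_triggered = Signal()  # fires only for the toolbar button

    def clear_search(self):
        self.cleared.emit()

    def click_clear_button(self):
        self.clear_action_triggered.emit()
        self.clear_search()

    def tag_browser_cycle(self):
        # Cycling a tag browser item through its states ends by clearing the search.
        self.clear_search()


exits_via_cleared = []
exits_via_button = []
box = SearchBox()
box.cleared.connect(lambda: exits_via_cleared.append("exit duplicate mode"))
box.clear_action_triggered.connect(lambda: exits_via_button.append("exit duplicate mode"))

box.tag_browser_cycle()    # user is only browsing within the group
box.click_clear_button()   # user explicitly asks to leave

print(len(exits_via_cleared))  # 2: browsing also kicked us out of duplicate mode
print(len(exits_via_button))   # 1: only the explicit click did
```

Listening to the button's own signal removes the need to connect and disconnect around every action that happens to clear the search as a side effect.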
|||||
|