01-24-2011, 02:16 AM | #16 |
Calibre Plugins Developer
Posts: 4,652
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Yeah, sounds like you have a plan. I think the only way to retain sanity is to add books in a very disciplined fashion such as by author or series as you suggest.
My original plan was to just "get everything in there" and then clean it up. Unfortunately I keep getting distracted by writing plugins etc., so my backlog keeps growing and I am betwixt and between... I think I need to change my approach. I've written tools that do a lot of preprocessing outside of Calibre to work around the issues somewhat, but they are really only band-aids that delay the inevitable. I think rather than the goal of getting everything into Calibre first, I will just start afresh with a new library and go author by author, starting with the ones most likely to be read first. I'll obviously still have an enormous duplicated mish-mash mess of books for everything not yet processed, but I had that anyway before I found Calibre. |
01-24-2011, 09:52 AM | #17 | ||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
In my case, I usually had only one really good master format and most of my duplicates were converted from that master. The master format would always be added to Calibre. My plan was to worry about the "best" format later. If I was unhappy with a format when I went to read it, I could look to see if I had a better one that was skipped during the import. Usually I would find one good master format in the record and could use Calibre's excellent conversion capabilities to get a copy that was even better than whatever had been skipped. Quote:
|
||||
01-24-2011, 11:18 AM | #18 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
I find that now that I am past the bulk input stage, one of the most common cases for encountering a duplicate is that I add a newer (and normally better) version of a book. I would certainly like an easier way of saying that it is the NEWEST one that is important and should overwrite the existing copy. At the moment I find I have to go through the Edit Metadata route to achieve this (or add as a duplicate and then merge) - and cannot simply do it via the standard Add Books route.
Hopefully this will be one of the issues that will be kept in mind if a more effective dialog on what to dump and what to keep is developed. |
01-24-2011, 12:23 PM | #19 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I was aware that my default of overwrite could cause loss of book formats you don't want overwritten. There was a risk of an error by the code that identifies author/title, etc. of matching books, and I pointed it out to Kovid when I uploaded the source, but for me, it just worked better, even with that risk (I always keep my source). Kovid felt that the standard philosophy of Calibre is to always default to the lower-risk option - in this case that option was to not overwrite existing formats. I understood why he preferred that. For a period of time, I ran custom code with my reverse default, which worked better in my workflow. I thought about offering an option switch or tweak for "default overwrite" to go alongside the "autosort/automerge" option switch, but there are already so many options ... |
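To make the two defaults discussed above concrete, here is a minimal sketch of a format merge with a switchable conflict policy. It is not calibre's actual code: the names `FormatConflict` and `merge_formats`, and the plain `{format: path}` dicts, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class FormatConflict(Enum):
    """What to do when an incoming format already exists on the matched book."""
    KEEP_EXISTING = "keep_existing"  # lower-risk default: never overwrite
    OVERWRITE = "overwrite"          # "reverse default": the incoming file wins


def merge_formats(existing, incoming, policy=FormatConflict.KEEP_EXISTING):
    """Merge two {format: path} dicts (hypothetical stand-ins for a book record).

    Returns (merged, skipped, replaced), where skipped and replaced list the
    formats that collided, so the caller can report what happened.
    """
    merged = {fmt.upper(): path for fmt, path in existing.items()}
    skipped, replaced = [], []
    for fmt, path in incoming.items():
        key = fmt.upper()
        if key not in merged:
            merged[key] = path      # new format: always merged
        elif policy is FormatConflict.OVERWRITE:
            merged[key] = path      # duplicate format: incoming replaces existing
            replaced.append(key)
        else:
            skipped.append(key)     # duplicate format: existing is kept
    return merged, skipped, replaced
```

Returning the skipped and replaced lists rather than silently dropping files is one way to keep the overwrite mode auditable, which matters given the data-loss concern raised above.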
|
01-24-2011, 01:38 PM | #20 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
I understand Kovid's point of view about being paranoid about unexpected data loss. I was just making the point that, if someone is designing or working on a replacement dialog or feature, the reverse use case where newest wins is not that uncommon. The trick will be to come up with a user-friendly way of offering the user control without, as you say, an overwhelming number of choices.
If I saw a simple solution I would already be suggesting it. Maybe a user preference with a simple way of toggling it might be the answer, as one tends to be working in one mode during bulk import and another in daily library maintenance. |
01-24-2011, 01:47 PM | #21 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
01-24-2011, 06:08 PM | #22 | ||
Calibre Plugins Developer
Posts: 4,652
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
Well, to be honest, in the last thread where I discussed this with you, my reading of the responses was that everyone was waiting for a new merge dialog. I have a number of suggestions/ideas for possible approaches and even some slightly better Python/Qt skills to be able to contribute. I figured there was no point in pursuing it further if there was no real interest in some "partial" solutions, even if they do address the main issues I face. I completely understand your lack of motivation to be too involved given you no longer have a need to do bulk imports yourself. |
||
01-25-2011, 09:29 AM | #23 | |||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
Quote:
That said, however, it's quite possible that work is in the far future, and/or that it won't be all that useful, so if you are motivated to provide a partial solution to problems you face - feel free to work on that area and submit it to Kovid. We are in agreement that there's room for improvement there. |
|||||
01-25-2011, 11:21 AM | #24 |
creator of calibre
Posts: 44,014
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just so you know I'm working on a refactoring of the metadata download system right now, which will likely include a way to specify what metadata should be merged, both from individual sources and overall.
|
01-25-2011, 07:03 PM | #25 | |
Calibre Plugins Developer
Posts: 4,652
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Appreciate the update from Kovid on the merge stuff etc.; I look forward to the results in future. There were a couple of ideas I had a while ago in the absence of a full-on dialog. I think the dialog will be great for when you have multiple book records in Calibre and want to merge them. However I am curious to see where its involvement may be in the process of actually adding books.
IMHO I am not sure I *always* want to be interactively prompted when doing bulk adds. Importing can already be a fairly involved and time-consuming process: cleaning up filenames, adding certain subdirectories of files, separating html folder imports of one book per folder from multiple books per folder in order to use different add menus, deleting your input folders once the books are in Calibre, etc. If you were adding a lot of books with a lot of duplicates, any interactive dialog forcing you to make choices then and there might not be practical given how time-consuming it can be to open each version up and decide a "winner". Now if you are only adding a single book or a small number of books, an interactive choice might be desirable - don't put off until tomorrow what can be done today and all that. But what if you need to stop/do something else partway through? What does Calibre do with all the "unresolved" conflicts? Any kind of "abort" in the process can leave you with a messy mish-mash of partially imported books from a subfolder tree, and an absolute nightmare to "continue on" from.
So one approach to this, which I briefly mentioned in a previous thread, would be an additional option for the automerge. Currently when you turn it on, any new formats for an existing book get merged, and any duplicate formats get thrown away. What I would like is the same behaviour for new formats, but that duplicate formats get created as a new book entry in Calibre, and that the two books then get marked as being duplicates. For instance just add a "Duplicate" tag to both entries.
Then on my rainy day when I finally get around to cleaning up my Calibre entries I can just do a search for the "Duplicate" tag, sort by author/title to see the conflicts I need to resolve, and go through a review/merge process with them in my own time. That way I have both formats safely stored in Calibre, can safely delete my source folders and can continue adding stuff in bulk. Additional duplications of the same format would create further "Duplicate" tagged books in Calibre.
Just random rambling thoughts. As I said in a previous post, I think I am going to have to start again with a fresh library and change the way I add books, as the way Calibre handles this "today" isn't quite working for me. An additional option to the automerge such as I suggested above would dramatically improve things. Then the final icing would be an addition to your "merge formats only" menu option to pop up a dialog in the case of format conflicts that allows me to launch viewers for each duplicate format (a side-by-side mode in ebook-viewer.exe would be amazing but that's a pipe dream) and select/tick which versions to keep; then I remove the "Duplicate" tag and job done...
Last edited by kiwidude; 01-25-2011 at 07:13 PM. Reason: typos |
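The proposed automerge sub-option can be sketched roughly as follows. This is only an illustration of the idea in the post, not calibre's implementation: the `Book` class, the `add_with_duplicate_tagging` function and the simple case-insensitive title/author match are all assumptions made for the example.

```python
from dataclasses import dataclass, field

DUPLICATE_TAG = "Duplicate"


@dataclass
class Book:
    """Hypothetical stand-in for a library record."""
    title: str
    author: str
    formats: dict = field(default_factory=dict)  # e.g. {"EPUB": "path/to/file.epub"}
    tags: set = field(default_factory=set)


def add_with_duplicate_tagging(library, title, author, fmt, path):
    """Add one file to `library` (a list of Book) and return the record it landed in.

    - No matching book: create a new record.
    - Match without this format: merge the format in (as automerge does today).
    - Match that already has this format: create a second record and tag both
      records "Duplicate" for later review, instead of discarding the file.
    """
    fmt = fmt.upper()
    matches = [b for b in library
               if b.title.lower() == title.lower()
               and b.author.lower() == author.lower()]
    if not matches:
        book = Book(title, author, {fmt: path})
        library.append(book)
        return book
    primary = matches[0]
    if fmt not in primary.formats:
        primary.formats[fmt] = path
        return primary
    duplicate = Book(title, author, {fmt: path}, {DUPLICATE_TAG})
    primary.tags.add(DUPLICATE_TAG)
    library.append(duplicate)
    return duplicate
```

The "rainy day" review then amounts to filtering on the tag, for example `[b for b in library if DUPLICATE_TAG in b.tags]`, sorted by author and title.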
|
01-26-2011, 10:16 AM | #26 | ||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
|
||||
01-26-2011, 06:07 PM | #27 | ||
Calibre Plugins Developer
Posts: 4,652
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
One thing I don't understand (perhaps it is legacy code not yet addressed?) is why you would have different matching logic between having automerge turned on or off. Surely a duplicate is a duplicate - you either automatically merge it using the choice in Preferences, or you prompt the user what to do interactively (giving them the three choices)? Quote:
I wonder if running a duplicates search could be done as a GUI plugin. However I am hesitant to start investigating down that plugin route unless Kovid agrees (after all we can just deprecate the plugin later) as it seems like a feature that he perhaps may want built into Calibre to give wider user exposure. Plus he could obviously write it way better than I would anyway, though he has to find the precious time to do it first. The sub-options within automerge such as "create new book for duplicate formats" would require Calibre source changes of course. |
||
01-27-2011, 08:37 PM | #28 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jun 2010
Device: none
|
I am collecting books and other material for a project similar to Project Alice (a massive upload of literature seems interesting to me for an AI, if based on vml), so my collection is large. I have also spidered web pages and converted them to PDF, and I just like having a private archive to search and sift through, as we often get connection problems and websplits up here. I sometimes play the ebooks as audio; the tools I use depend on the format. At home I use a WinXP machine with MS Reader's read-out-loud feature for .lit files, or PDF read-out-loud (Adobe Acrobat, though I have been considering trying some other tools). My Linux machine, mainly Fedora 14, uses carnival/festival. I must also confess that the Linux machine is the one running my calibre. I initially had trouble at around 6500 books or items when using WinXP (even SP3), but with Linux, WOW! The speed and stability of calibre are vastly improved. When importing more than 10000 items at a time, I do get a pile of errors, from DRM to interpreter issues, but I have not crashed it yet (with Linux).
If I am adding a huge collection, I break it into smaller chunks (10 GB seems about right), then run flint across it to detect dupes, then maybe fdupes, depending on my patience. I like the idea of converting everything to EPUB; then if something happens I can just copy or move out all the EPUBs if I have to rebuild. This also cleans out any DRM'd files that were added by mistake (some tools for downloading or syncing do funny things, from signatures to read/write permissions). I really like calibre: it is extremely ambitious, an all-in-one Swiss Army knife for documents, and over the last year or so its stability has improved nicely. I am not sure I would try a Windows server again. If I were going to suggest anything, it would be language translation (tough as heck and nearly impossible, but it would really round things out), and perhaps an option to delete the original files after a successful copy on import (just having the option would be nice). But it is a great program, actually a server in its own right. I would love to know how others test this. I am sure a single file would be quick for testing import and conversion, but even huge archives seem stable to me now :-) even if they seem pointless. And if an Artificial Intelligence were ever developed, some of these archives would be fun to upload or recode. |
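For readers without a dedicated tool, the pre-import dupe check described above can be approximated with a short script. This is a sketch in the spirit of byte-for-byte duplicate finders, not a description of how flint or fdupes work internally; the function names are made up for the example. It only catches identical files, not different editions or conversions of the same book.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Running `find_duplicate_files` on each chunk before adding it to calibre gives a list of groups to prune by hand.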
01-28-2011, 04:12 PM | #29 | ||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
That's part of why I never did any of the "option to do something with duplicates" or "option for fine-grained control of metadata during merging." One has to present the results, provide selection boxes for what to do, provide an option to view the book or its metadata, etc., and I just haven't had time to play with Qt enough to learn that stuff. Heck, I tried to just change the custom user recipes into alphabetical order, and couldn't get there, even with Kovid's sample code for the alphabetically ordered built-in recipes. I ended up cheating and making the search produce an alphabetically ordered result dataset so I didn't have to sort the results in the GUI. |
||||
01-28-2011, 09:51 PM | #30 | ||
Calibre Plugins Developer
Posts: 4,652
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
There are two options to the GUI approach that I can think of. One is to modify the Calibre source code to reuse the library view. This would probably be the best long-term option, but as it touches the very core of Calibre it would need pretty close Kovid supervision to gain any likelihood of patch acceptance if done by a Python muppet like myself.
Plan B would be to do it in a popup window as part of a GUI plugin. I reckon given enough time I could pretty much cope with writing that, though obviously you would be more "constrained" in functionality by not being on the official library view. The advantage is that you could happily add columns and right-clicks all related to just the task at hand (resolving duplicates), safely encapsulated within a plugin that Kovid doesn't have to worry about. The downside is that there are a number of things you take for granted in library view that would likely involve considerable duplication of code to offer. So it might start off pretty crude and basic. But IF the intent is just to list books that are duplicates, allow you to view formats and then merge the results, it might be feasible?
I would presume you must already be doing what to me is the "hard part" of using the Calibre model/db to identify duplicates for a given book. So presumably rather than iterating over a collection of "adding" books you instead iterate over "all" books. Could be very slow, but I imagine you could do a few things like snapshot the results of the last time you "searched" and work with that until the user "refreshes" the duplicate search again. Again just thinking out loud before prematurely optimizing.
If I did all of that and it "worked" well enough, then the next step could be to "loosen the reins" of that automerge option by adding the three sub-options I proposed and hence allowing the duplicate rows to be created when formats are duplicated. Anyone have any thoughts on this? Bad idea/waste of time/etc? |
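The "iterate over all books, then snapshot the results" idea could look something like the sketch below. It deliberately avoids calibre's real database API: `load_books` is a hypothetical callable that yields `(book_id, title, author)` tuples, and `match_key` is a simple stand-in for whatever matching rules calibre actually applies.

```python
import re
from collections import defaultdict


def _normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(title, author):
    """Assumed matching rule for the example: normalized title plus author."""
    return _normalize(title), _normalize(author)


class DuplicateFinder:
    """Finds duplicate groups and caches them until the user asks for a refresh."""

    def __init__(self, load_books):
        self._load_books = load_books  # callable yielding (book_id, title, author)
        self._snapshot = None

    def refresh(self):
        """Rescan every book once and rebuild the cached duplicate groups."""
        groups = defaultdict(list)
        for book_id, title, author in self._load_books():
            groups[match_key(title, author)].append(book_id)
        self._snapshot = [sorted(ids) for ids in groups.values() if len(ids) > 1]
        return self._snapshot

    def duplicates(self):
        """Return the last snapshot, scanning only if none exists yet."""
        if self._snapshot is None:
            self.refresh()
        return self._snapshot
```

Grouping by key needs only a single pass over the library rather than comparing every book against every other, so the scan itself may be less slow than feared; the snapshot then lets a plugin window redraw without rescanning until the user explicitly refreshes.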
||