01-22-2018, 11:00 AM | #1 |
Enthusiast
Posts: 49
Karma: 102
Join Date: Sep 2010
Location: 52.88504N 06.85904E
Device: PC
|
Vast rewrites when importing a wrongly cased author
When I have an author (e.g., "Asimov, Isaac") in my library, and I import a book that has the same author name but with differing case, e.g. "ASIMOV, Isaac", Calibre interprets this as an author name change and updates the filenames on disk and all existing opf files of books of that author.
This can make importing a folder of books pretty slow. Is there a way to avoid this? Running Calibre 2.78.0 on Fedora 25. |
01-22-2018, 11:30 AM | #2 |
Well trained by Cats
Posts: 30,378
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
2.78 has a cane and a beard (btw Kovid never updates past Major releases.)
Calibre is a relational database: the case change is written to one location (the Authors table), and the Books table uses the index value from the Authors table. Nothing changes for the existing books, and one new entry is added for the book currently being imported. It is unfortunate that calibre updates ANY field's case, but that is the way it has worked from almost day one. |
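To illustrate the point about the Authors table, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema is a simplified, hypothetical stand-in and not calibre's actual metadata.db layout; it only shows why a case change needs a single row update when books refer to the author by id.

```python
import sqlite3

# Simplified, hypothetical schema: NOT calibre's real metadata.db layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
""")
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Asimov, Isaac')")
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, 1)",
    [("Foundation",), ("I, Robot",), ("The Caves of Steel",)],
)

# The case change touches exactly one row in the authors table...
cur = conn.execute("UPDATE authors SET name = 'ASIMOV, Isaac' WHERE id = 1")
rows_changed = cur.rowcount

# ...yet every book now reports the new spelling, because books store only the id.
names = [
    row[0]
    for row in conn.execute(
        "SELECT a.name FROM books b JOIN authors a ON a.id = b.author_id"
    )
]
print(rows_changed, names)
```

The database side is cheap for this reason. The folder names on disk and the per-book OPF files are derived from the author name, so those are the parts that still have to be touched for every book by that author.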
01-22-2018, 12:07 PM | #3 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Renaming files is pretty fast, it shouldn't really bottleneck things (it works by hard linking and then deleting the originals, in a transactional manner). And as theducks said, updating the database is of no consequence, since it only needs to update a single row for a case change.
|
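The "hard link, then delete the original" approach described above can be sketched roughly as follows. This is an illustration of the general technique using only the Python standard library, not calibre's actual code; the function name `move_via_hardlink` and the file names are made up for the example.

```python
import os
import shutil
import tempfile


def move_via_hardlink(src: str, dst: str) -> str:
    """Move src to dst; return 'hardlink' or 'copy' to show which path ran."""
    try:
        os.link(src, dst)       # second name for the same data, no bytes copied
        method = "hardlink"
    except OSError:
        shutil.copy2(src, dst)  # fallback: full copy of contents and metadata
        method = "copy"
    os.unlink(src)              # remove the original only once dst exists
    return method


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "old_author_book.epub")
    dst = os.path.join(tmp, "new_author_book.epub")
    with open(src, "wb") as f:
        f.write(b"placeholder book data")
    method = move_via_hardlink(src, dst)
    moved_ok = os.path.exists(dst) and not os.path.exists(src)
    print(method, moved_ok)
```

Because the original is removed only after the new name exists, an interruption leaves at least one valid copy of the file. When the link step succeeds no file data is copied, which is why this should normally be fast.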
01-22-2018, 01:13 PM | #4 |
Enthusiast
Posts: 49
Karma: 102
Join Date: Sep 2010
Location: 52.88504N 06.85904E
Device: PC
|
It is the rewriting of all the opf files that takes the time.
|
01-22-2018, 01:20 PM | #5 |
Well trained by Cats
Posts: 30,378
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
01-22-2018, 01:43 PM | #6 |
Enthusiast
Posts: 49
Karma: 102
Join Date: Sep 2010
Location: 52.88504N 06.85904E
Device: PC
|
As far as I can judge, it feels synchronous to me.
When I change the case of a single letter in the Author column for an author with several books, I hear a lot of disk activity (all the opf files being rewritten) and Calibre is unresponsive until this completes. Do I have one of my settings wrong? |
01-22-2018, 03:46 PM | #7 |
null operator (he/him)
Posts: 20,937
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Scirius - On my Win 10 system the OPF writes occur at about 1/sec, with little effect on calibre or general performance. That's been true on 3 different computers running Windows and Mint since I first installed calibre (0.8.??) in early 2012.
The only settings I can think of that might, and it's a pretty big might, have a bearing on the issue are
In my backup scripts, I have a set of calibredb backup_metadata commands to flush any pending OPF writes before the actual backup task is started. BR |
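A backup script along those lines might look like the sketch below. It only wraps `calibredb backup_metadata`; the library paths are hypothetical, and the `--library-path` and `--all` options are used as I understand them from the calibredb documentation, so check `calibredb backup_metadata --help` on your version before relying on them.

```python
import subprocess
from typing import List


def backup_metadata_command(library_path: str, all_books: bool = False) -> List[str]:
    """Build the calibredb command line that writes pending OPF files."""
    cmd = ["calibredb", "backup_metadata", "--library-path", library_path]
    if all_books:
        cmd.append("--all")  # rewrite every OPF, not only the out-of-date ones
    return cmd


def flush_pending_opf(library_paths: List[str]) -> None:
    """Run the command for each library; raises CalledProcessError on failure."""
    for path in library_paths:
        subprocess.run(backup_metadata_command(path), check=True)


# Hypothetical usage before starting the real backup:
# flush_pending_opf(["/home/me/Calibre Main", "/home/me/Calibre Workbench"])
```

Running this first means the OPF files on disk match the database at the moment the backup starts.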
01-22-2018, 04:09 PM | #8 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
A good way to avoid that issue is to not import raw books directly into your Main Library, but rather into a Workbench Library. Scrub, standardize and add missing metadata to the books while still in your Workbench Library. Such activity has zero impact on the books already in your Main Library. Finally, move your new books from your Workbench Library to your Main Library.
If your Main Library is actually fairly trashed-out, then rename it Workbench, and create a fresh Main. Move only the clean books from Workbench to Main, and work on the remaining books now in your Workbench.
Be sure to Vacuum the metadata.db file, since it grows to an enormous size over time. Do so with Library > Library Maintenance > Check Library, which first vacuums metadata.db and then compares it to your physical books, and vice-versa.
If you do not want any .opf files being created while you make many mass-changes in your Workbench Library, you can purge the queue for just the Workbench Library using the Job Spy plugin, which has a Tool to do just that (highlighted in the attachment). That tool is designed to be used when it is a total waste of cpu and disk to create .opf files at a rate of 1 per second constantly for ever-changing metadata that is actively being worked on in a Workbench Library. When you move your clean books to your Main Library, a final .opf file will be created for that new book in that Library.
Obviously, do not purge the queue for your Main Library. Those .opf files allow you to recover from a corrupt metadata.db file. The exception to that rule might be creating a new Custom Column that is easily recomputed, and you don't want 50,000 .opf files being created just to be able to restore a Custom Column that is easily re-populated on-demand (e.g. from a simple, fast job).
DaltonST
Last edited by DaltonST; 01-22-2018 at 04:20 PM. Reason: typo. |
01-22-2018, 07:11 PM | #9 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
OPF writing is not synchronous. And it is rate limited.
|
01-23-2018, 12:19 AM | #10 |
null operator (he/him)
Posts: 20,937
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@DaltonST - you are assuming that all metadata is known at the time a book is added to a library. With fiction that's probably true, but with non-fiction less so, especially with subject tags.
Imagine a passing reference in a biography of famous person A to person B, who later also becomes (in)famous. FX: I wonder how many Hollywood bios have a passing mention of HW; someone might have a "Mentioned" column they want to update.
Or an obscure financial instrument confected in the 1970s that becomes very relevant 30 years later. Who knew the role that CoCo bonds would play on Wall Street in the 2007/8 crash, or what a CDS was before the Greek debt crisis struck?
A full text X1 search for 'Coco Bond', 'Contingent Convertible Bond' or 'Enhanced Capital Note' on one of my libraries spat out a results list of ~450 documents. Thanks to one of my favourite plugins I was able to tag them with 'CoCo Bonds' in the blink of an eye. BR |
01-23-2018, 12:35 AM | #11 | |
Grand Sorcerer
Posts: 6,344
Karma: 12117215
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
|
Quote:
|
|
01-23-2018, 12:52 AM | #12 |
null operator (he/him)
Posts: 20,937
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
01-23-2018, 03:12 AM | #13 | |
Grand Sorcerer
Posts: 6,344
Karma: 12117215
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
|
Quote:
|
|
01-23-2018, 05:33 AM | #14 |
Enthusiast
Posts: 49
Karma: 102
Join Date: Sep 2010
Location: 52.88504N 06.85904E
Device: PC
|
Using inotify monitoring I found that the opf writing is indeed asynchronous.
The "hard linking and then deleting the originals, in a transactional manner" is the process that causes the disk activity. A rough timing yields 3-4 renames per second (internal SATA, WD disk, RAID 1). In other words, if I have 100 books with author "Unknown", and I change the author of one book to "unknown", Calibre is unresponsive for roughly 25-35 seconds. It is not a big deal, I can live with this. I just wanted to make sure this is normal behaviour and I'm not doing anything wrong. |
01-23-2018, 06:20 AM | #15 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Wow, 3-4 renames per second is pretty terrible performance. Are you sure the hardlinking is being done and not a file copy? (If the hardlinking fails, calibre falls back to plain file copies.)
|
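One way to check whether hard linking works on the disk that holds the library, and roughly how fast it is, is a small standalone test like the sketch below. It is not part of calibre; the function name `measure_link_renames`, the 4 KiB file size, and the file count are arbitrary choices for illustration. Point it at a directory on the same filesystem as the library.

```python
import os
import tempfile
import time


def measure_link_renames(directory: str, count: int = 200) -> float:
    """Return hard-link renames per second for small files inside directory."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        sources = []
        for i in range(count):
            path = os.path.join(tmp, f"src_{i}.bin")
            with open(path, "wb") as f:
                f.write(b"x" * 4096)
            sources.append(path)

        start = time.perf_counter()
        for i, src in enumerate(sources):
            # os.link raises OSError if the filesystem does not support hard links.
            os.link(src, os.path.join(tmp, f"dst_{i}.bin"))
            os.unlink(src)
        elapsed = time.perf_counter() - start

    return count / elapsed if elapsed > 0 else float("inf")


# Example run against the system temp directory; substitute the library's disk.
rate = measure_link_renames(tempfile.gettempdir(), count=50)
print(f"{rate:.0f} link+unlink renames per second")
```

If `os.link` raises an error here, a program that falls back to copying would be doing full file copies on that filesystem, which could explain a rate of only a few renames per second. The test does not reproduce any transactional bookkeeping, so treat the number as an upper bound rather than a prediction.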
Tags |
author name, case, import books |
|