04-20-2019, 01:24 AM | #1 |
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Best strategy for metadata management for Kobo using Calibre?
I'm looking to automate the metadata handling for books on my Kobo H2O to make searching and organizing my library easier. I think the easiest way to do this is to insert metadata into some field in the epub, import it using Calibre, and then use Calibre to set up collections and put the metadata into another field like subtitle. Setting up a plugboard template to push data into subtitle and having Calibre create collections based on a column are both straightforward thanks to the existing work some great people have done on the Kobo Calibre plugin. I have some questions about automating the earlier steps in the process.
What columns will Calibre automatically populate for me from metadata in an epub? Is there anything beyond Tags? I know that, for instance, epubs downloaded from AO3 will show up with entries in the Tags column in Calibre. I'd like to be able to add metadata to another custom column, though, because I expect I'm going to have to do a lot of filtering to get the set of possible tags down to something small enough that the resulting collections won't overwhelm the Kobo. It would be nice if I could leave the existing AO3 metadata field intact so that I can use it later if I need to regenerate tags.
I also have epubs I'm generating myself from my own tools and from other sources that I want to tag. Is there a good way of adding metadata to cbzs?
What kind of problems am I likely to encounter when automatically converting tags (which may contain Unicode, for instance) to collections? Just for laughs, I tried earlier with the full set of AO3 tags in my library and confirmed that the Kobo will crash trying to load the DB, probably because it runs out of memory. Has anyone played around with how many collections it's possible to put into the Kobo DB before the UI starts to become too slow or crashes?
My goal is to automate as much of this pipeline as possible and to avoid making it too slow. I know there's the calibredb CLI, but writing code that calls a CLI is going to involve a lot of indirection that will make the script slower and harder to write. Is there a programmatic interface I should be looking at? Has anyone else tried to do something like this and open-sourced tools I don't know about? Are there pitfalls I should be aware of?
This is a follow-up to a previous question I asked about what metadata Kobo's software will read, where davidfor kindly established that it won't pick up useful metadata for sideloaded epubs. |
04-20-2019, 01:38 AM | #2 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
If you know python, you can always write your own plugin.
https://manual.calibre-ebook.com/creating_plugins.html
I'm not quite up for a crash course in programming and Python, so I just use a combination of the Job Spy plugin (scrub tags) and Calibre's Bulk Metadata Editor search & replace (regex mode). I have a #freeform custom column which I use to hold a copy of the original tags from the AO3 EPUB.
Note, I believe you can also use FanFicFare to download AO3 metadata and do the filtering via its personal.ini file.
Calibre is also capable of importing custom column data from the EPUB's OPF file (assuming the custom column exists in the library and uses the correct type and format). So another approach, in case you're writing your tools in a different language, is to edit the OPF files inside the EPUB to add custom columns populated with your data.
Last edited by ilovejedd; 04-20-2019 at 01:41 AM. |
04-20-2019, 04:03 PM | #3 | |||
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Quote:
Quote:
Quote:
|
|||
04-20-2019, 05:56 PM | #4 | |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Quote:
I haven't tried it with columns built from other columns but that shouldn't really matter. |
|
04-20-2019, 10:18 PM | #5 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Extending FFF to cover other sites isn't that hard. At least, it isn't if the site is well structured. If you are downloading from sites such as AO3, it is worth looking at. It is intended for story posting sites. It isn't for stores or more traditionally published books. For those, there are a myriad of metadata source plugins.
For collections on a Kobo device, any column, or columns, can be used. I've always found tags to be useless as there are too many. But you can populate another column by some method, or use a "column built from other columns" with an appropriate template. |
|
04-23-2019, 01:09 AM | #6 | |
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Quote:
Code:
<meta name="foo" content="bar"/>
Code:
<meta name="calibre:user_metadata:#test" content="{"display": {"description": "", "is_names": true}, "colnum": 1, "column": "value", "#extra#": null, "table": "custom_column_1", "is_category": true, "rec_index": 22, "is_csp": false, "#value#": ["foo, bar"], "category_sort": "value", "is_multiple": "|", "is_multiple2": {"cache_to_list": "|", "ui_to_list": "&", "list_to_ui": " & "}, "datatype": "text", "name": "Test", "is_editable": true, "kind": "field", "label": "test", "search_terms": ["#test"], "link_column": "value", "is_custom": true}"/> |
|
04-23-2019, 02:41 AM | #7 | |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Quote:
Note, I think the comma may be a hard coded special character for tag-like custom columns (even if it's set to contain names) so avoid using it in the value. Assuming "foo, bar" is meant to be a single tag in custom column #test, just try removing the comma. Assuming "foo, bar" is supposed to be two separate tags in custom column #test, that should be "foo", "bar". |
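For anyone generating these tags from a script, here is a minimal Python sketch of building the custom-column meta element with one list item per tag, as described above. It reuses a subset of the keys from the example in this thread; whether Calibre needs the remaining keys on import is an assumption I have not verified, and the function names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def custom_column_meta(label, name, values):
    """Build an OPF <meta> element for a tag-like custom column.

    Assumption: the keys below (copied from the example in this thread)
    are enough for Calibre; the full key set may be required.
    """
    payload = {
        "datatype": "text",
        "is_multiple": "|",
        "is_multiple2": {"cache_to_list": "|", "ui_to_list": "&", "list_to_ui": " & "},
        "name": name,
        "label": label,
        "is_custom": True,
        "#value#": list(values),  # one list item per tag, not "foo, bar"
    }
    meta_name = "calibre:user_metadata:#" + label
    # quoteattr chooses the quote character and escapes the JSON so the
    # attribute stays well-formed XML.
    return "<meta name=%s content=%s/>" % (
        quoteattr(meta_name), quoteattr(json.dumps(payload)))


def read_values(meta_xml):
    """Parse the element back and return the list stored under "#value#"."""
    elem = ET.fromstring(meta_xml)
    return json.loads(elem.get("content"))["#value#"]


if __name__ == "__main__":
    print(custom_column_meta("test", "Test", ["foo", "bar"]))
```

Round-tripping the output through an XML parser, as `read_values` does, is a cheap check that the quoting survived before the element goes into content.opf.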
|
04-23-2019, 10:32 PM | #8 | ||
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Quote:
Does anyone know how this works?
Quote:
|
||
04-24-2019, 12:03 AM | #9 | ||
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Quote:
Alternately, if you want to avoid EPUB surgery, you can keep the modified OPF file alongside the EPUB using the same filename as the EPUB (or I think even metadata.opf if one title per folder). I've noticed Calibre always respected external OPF files during import if they exist.
Quote:
Code:
"#value#": ["foo", "bar"] |
||
04-24-2019, 12:34 AM | #10 |
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
For my personal AO3 library, after I stripped all the rating, warning, category, and character tags, there are more than 6000 distinct tags. Canonicalizing them reduces this to more than 5000. Dropping tags that only appear once reduces this to 1277 tags, which is still probably too many collections for a Kobo to handle, I'm assuming? I'm going to look at how much I lose by starting to drop less-frequent tags, though I'm curious if anyone has other ideas for reducing the number of tags with minimal loss of information.
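The two reduction steps described here (canonicalize, then drop rare tags) can be sketched in Python roughly as follows. The canonicalization rule is an assumption for illustration (Unicode NFKC, collapsed whitespace, case folding), not necessarily the rule used for the numbers above, and `reduce_tags` and `min_books` are hypothetical names.

```python
import unicodedata
from collections import Counter


def canonicalize(tag):
    """Normalize Unicode form, whitespace and case so near-duplicates merge."""
    tag = unicodedata.normalize("NFKC", tag)
    return " ".join(tag.split()).casefold()


def reduce_tags(book_tags, min_books=2):
    """Keep only canonical tags that occur on at least `min_books` books.

    `book_tags` maps a book id to its list of raw tags.
    """
    canonical = {book: {canonicalize(t) for t in tags}
                 for book, tags in book_tags.items()}
    # Count each tag once per book, since the per-book values are sets.
    counts = Counter(tag for tags in canonical.values() for tag in tags)
    keep = {tag for tag, n in counts.items() if n >= min_books}
    return {book: sorted(tags & keep) for book, tags in canonical.items()}


if __name__ == "__main__":
    library = {1: ["Fluff", "Slow  Burn"], 2: ["fluff", "Angst"], 3: ["slow burn"]}
    print(reduce_tags(library))
```

Raising `min_books` is the simplest knob for trading collection count against lost information. Note that case folding also changes how the tag is displayed, so a real script might keep a mapping back to a preferred spelling.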
|
04-24-2019, 12:45 AM | #11 | |
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Quote:
http://www.idpf.org/epub/301/spec/ep...tainer-zipreqs
I tried rebuilding my test epubs (with the content.opf edits) with the mimetype file first and this fixed the issue. Thanks!
I see. In this case, it's me misinterpreting what the Calibre GUI is doing. Thanks for the pointer, now I think I can construct the custom Calibre meta tags correctly. |
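In case it helps anyone rebuilding epubs from a script, this is a minimal Python sketch of the repacking step: the `mimetype` entry is written first and stored uncompressed, and everything else follows. `repack_epub` is a name made up for this example, and it assumes the input already contains a `mimetype` entry.

```python
import io
import zipfile


def repack_epub(data):
    """Return EPUB bytes rebuilt so that `mimetype` is the first ZIP entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        # A fresh ZipInfo carries no extra field; store without compression.
        zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                      compress_type=zipfile.ZIP_STORED)
        # Copy every other entry, compressed, in its original order.
        for info in zin.infolist():
            if info.filename != "mimetype":
                zout.writestr(info, zin.read(info.filename),
                              compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()
```

Usage would be along the lines of reading the file with `open(path, "rb").read()`, passing the bytes through `repack_epub`, and writing the result to a new file.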
|
04-24-2019, 12:47 AM | #12 | ||
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Calibre reads the metadata when the book is first added (assuming appropriate options are set). It will read the custom columns from the OPF in the epub if matching columns exist in the library. Otherwise, they will be ignored. You can also read metadata from the file inside calibre from the Edit Metadata screen. Select the format you want to extract the metadata from in the top right corner and press the appropriate button.
Calibre doesn't automatically update the book when you change the metadata in calibre. It will update the metadata.opf file in the directory with the book. The actual book gets updated when you send it outside the library (save-to-disk and send-to-device; these do not update the copy of the book in the library), edit the book, convert it, use Polish book or use the Embed metadata function. Basically, you have to do something to have the book updated.
If you replace the copy of a book in the library, calibre doesn't change the metadata either in its database or the file. This includes dropping the new version on the details pane of the existing book, or adding the book and having duplicates merged automatically. If you want to update the metadata from the new version of the book, you need to do this manually as above. I would expect the same if you use the command-line to replace the book.
And as @ilovejedd said, if there is an external OPF file when you are adding a new book, calibre will use it for the metadata over the metadata in the book. Changing that is slightly simpler than changing the book.
Quote:
Code:
"#value#": ["foo", "bar"] |
||
04-24-2019, 01:06 AM | #13 | |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Quote:
|
|
04-24-2019, 05:40 AM | #14 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Performance is a separate issue. I haven't loaded a lot of collections for a while (I'm at four pages at the moment). But the performance tends to be related as much to the number of books in the collections as to the number of collections. By that I mean adding a collection with 500 books in it will have a bigger impact than adding ten collections with 10 books in each. This is with recent firmware. There was a major fix to collection management some time last year. There was a point where I had about 40 pages of collections and it took over a minute to open the collections list. And I added a single collection with about 1000 books in it and it went to over 2 minutes to open the list. I supplied this to Kobo and they were able to find the issue. The latter came down to about 10 seconds. |
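Since collection size matters as much as collection count, one option is to filter out oversized collections before they reach the device. The Python sketch below groups books by tag and drops any tag whose collection would exceed a cap. `collections_within_limit` and `max_books` are hypothetical names, and the right cap is something to find by experiment, not a known limit.

```python
from collections import defaultdict


def collections_within_limit(book_tags, max_books):
    """Group books by tag and drop any tag whose collection would be too big.

    `book_tags` maps a book id to its list of tags.
    """
    collections = defaultdict(set)
    for book, tags in book_tags.items():
        for tag in tags:
            collections[tag].add(book)
    return {tag: sorted(books)
            for tag, books in collections.items()
            if len(books) <= max_books}


if __name__ == "__main__":
    tags = {1: ["a", "b"], 2: ["a"], 3: ["a", "c"]}
    print(collections_within_limit(tags, max_books=2))
```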
|
04-29-2019, 03:02 AM | #15 |
Enthusiast
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
|
Yeah, 1277 is definitely too many. Actually, probably several hundred is too many, so I'm going to have to figure out some heuristics to cut down on the tags more. The note about large collections being slow is very good to know: I went back and chopped some tags that swept up too many books, so my largest collection is now ~150 books. I need to refine further, since my non-AO3 tags generate another 300+ (though some of those tags can be merged by normalizing capitalization).
At the moment, I'm using a very crude approach: I've created a custom column, and I have two scripts that literally invoke calibredb repeatedly. The AO3 script handles AO3 tags directly (bypassing the epub) and only calls calibredb to look up the calibre id and then to run calibredb set_metadata. The non-AO3 script looks up the Tags column and then runs calibredb set_metadata to add some of them to the custom column. This is slow and awkward, so I'd really like to write a plugin to do it instead. I spent some time looking through the API documentation for plugins, but I didn't see any discussion of how to write a plugin that does the kinds of things I want here. Does anyone know of a good example plugin that could get me started or some other resource that would help me? |
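Not a plugin, but as a possible starting point: calibre's database API can be used from a standalone script run with calibre-debug, which avoids spawning calibredb once per book. The Python sketch below copies an allowed subset of each book's Tags into a custom column in a single `set_field` call. `copy_allowed_tags`, `open_library` and the `#shelves` column name are made up for illustration, and I would close the calibre GUI before writing to the library this way.

```python
def copy_allowed_tags(db, column, allowed):
    """Copy each book's tags that are in `allowed` into `column`.

    `db` is expected to behave like calibre's new database API object
    (all_book_ids, field_for, set_field).
    """
    updates = {}
    for book_id in db.all_book_ids():
        tags = db.field_for("tags", book_id) or ()
        updates[book_id] = tuple(t for t in tags if t in allowed)
    # One call writes the whole mapping of book id to new value.
    db.set_field(column, updates)
    return updates


def open_library(path):
    """Open a calibre library; only works under calibre's own interpreter."""
    from calibre.library import db  # available when run via calibre-debug
    return db(path).new_api


if __name__ == "__main__":
    import sys
    # Hypothetical usage: calibre-debug this_script.py -- /path/to/library
    library = open_library(sys.argv[1])
    copy_allowed_tags(library, "#shelves", {"fluff", "slow burn"})
```

Keeping the calibre import inside `open_library` means the filtering logic can be exercised with a stand-in object outside calibre.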
Tags |
calibre, kobo, kobo calibre database, metadata |
|