01-29-2015, 05:51 PM | #1 |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Importing MobileRead library into calibre library
Hi there,
We are currently contemplating the migration of the existing MobileRead library to a calibre library. Ultimately we would like to detach the MobileRead library from the forums and present it as its own OPDS-powered website, with the content powered and managed by calibre. One aspect that I am currently struggling with is how we could preserve a link to the original attachment id. In other words, once books have been (batch) imported to calibre, we need to have a marker (a custom column for example) that contains the id of the original attachment that we can refer back to if needed. I see two possibilities right now:
What I am curious about is, are there any better ways of handling this situation? There may be even more information from the original attachment that we would like to extract and embed in the calibre database. For example, the timestamp of the original upload date. Again, this could probably be encoded in the filename if we could somehow tell calibre that this part signifies a custom field named "upload date", or it could later be inserted manually in the calibre database using the book id<->attachment id correlation. Thanks for your help. Alex |
01-29-2015, 08:39 PM | #2 |
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Alex - as things stand the only metadata items that can be extracted from file names are those you see below
However - if you add the custom columns you want to the library, and you add the books via/with an opf, then if the opf contains the data for a custom column it will be populated - eg if you created a column called origattach/Original Attachment and you had something like this in the opf files
Code:
<meta name="calibre:user_metadata:#origattach" blah-blah, "#value#":"MR attachment id 1234",blah blah>
Which begs the question - how would you go about getting such values into an opf file - I'll leave that to others, who have the requisite scripting and regular expression skills.
BR |
01-29-2015, 08:55 PM | #3 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
@BR -- I finally have my excuse, I guess...
@ Alexander, I have for a while wanted to be able to store the original filename in a custom column (for reasons explained HERE). But... I was always too lazy. One way would be to patch calibre, to add in a feature that allows keeping the file contents metadata and overriding it with the filename regex (and also enabling custom columns in the from_filename metadata). The other way would be to add it via a script, possibly via calibre-debug, possibly via bash. I will have to see sometime soon, if I can bash something together. Last edited by eschwartz; 01-29-2015 at 08:58 PM. |
01-29-2015, 09:10 PM | #4 | |
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
01-29-2015, 09:15 PM | #5 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
I am thinking of duplicating the regex functionality, actually. I personally would match it all to the #origfile field, but others might prefer differently. Hence -- fluidity.
|
01-29-2015, 09:53 PM | #6 | ||
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Quote:
BR |
||
01-29-2015, 10:11 PM | #7 |
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
On the other hand once the attachment id is in the database, producing a csv with calibre book number, attachment id is easy enough. As Alex alluded, it could be used to extract data from the attachment id file for use in calibredb set_custom commands.
BR |
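A rough sketch of that CSV idea, for what it's worth. It assumes a two-column file of calibre book id and attachment id, and a custom column whose lookup name is attachmentid (both are assumptions, as is the function name). It only prints the calibredb set_custom commands, so they can be checked before being run.

```shell
# Sketch only: turn "book_id,attachment_id" rows into calibredb set_custom commands.
# Assumption: a custom column with the lookup name passed as $1 already exists.
emit_set_custom() {
    local column="$1" csv="$2" book_id value
    # The "|| [ -n ... ]" part keeps a last row that lacks a trailing newline.
    while IFS=, read -r book_id value || [ -n "$book_id" ]; do
        # Skip the header row and blank lines: real book ids are numeric.
        case "$book_id" in
            ''|*[!0-9]*) continue ;;
        esac
        # Values are printed unquoted, which is fine for numeric attachment ids.
        printf 'calibredb set_custom %s %s %s\n' "$column" "$book_id" "$value"
    done < "$csv"
}

# Usage: emit_set_custom attachmentid books.csv      (review the output)
#        emit_set_custom attachmentid books.csv | sh (then apply it)
```

Printing the commands instead of executing them keeps the step repeatable: the same CSV can be reviewed, fixed and replayed.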
01-29-2015, 10:30 PM | #8 |
Well trained by Cats
Posts: 30,446
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
What about adding extra tags with a unique pattern (for example MRID123456) to the OPF, then using Search & Replace to move them to the extra fields?
|
01-29-2015, 10:32 PM | #9 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
As calibre can read an OPF file that is supplied with the book, maybe the way to go is to generate that. I'm pretty sure this doesn't need the manifest and spine, so it would be relatively simple. The OPF can include custom columns, so adding an original file name that way would work.
The other thing is that it sounds like the attachment id could be used as an identifier. Storing it as an identifier and creating a MobileRead metadata source plugin would allow easy navigation back to the source. And the metadata source plugin doesn't have to have search capabilities. It just needs to translate the identifier into a display string and a URL. |
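To make the OPF suggestion concrete, here is a minimal sketch that writes a manifest-free OPF carrying a title, an author and the attachment id as an identifier. The function names are made up, and it assumes calibre maps the opf:scheme attribute to the identifier type; treat that as something to verify on a test library.

```shell
# Sketch only: print a minimal OPF (no manifest, no spine) to stdout.

# Escape the characters that are unsafe inside XML element text.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Usage: make_opf TITLE AUTHOR ATTACHMENT_ID > metadata.opf
make_opf() {
    local title author id
    title=$(xml_escape "$1")
    author=$(xml_escape "$2")
    id=$(xml_escape "$3")
    cat <<EOF
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>$title</dc:title>
    <dc:creator opf:role="aut">$author</dc:creator>
    <dc:identifier opf:scheme="MOBILEREAD">$id</dc:identifier>
  </metadata>
</package>
EOF
}
```

Including the title and author in the generated file means the OPF describes the book on its own rather than relying on whatever calibre extracts from the format.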
01-29-2015, 11:00 PM | #10 |
creator of calibre
Posts: 44,546
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If I were you, I'd write a simple script to do it with calibredb, like this, in bash like code:
Code:
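# filenames.txt (a hypothetical list) holds one "attachment_id<TAB>filename" pair per line
while IFS=$'\t' read -r attachment_id filename; do
    # calibredb prints "Added book ids: N"; keep only the number
    book_id=$(calibredb add "$filename" | grep 'Added book ids:' | cut -d: -f2 | cut -d' ' -f2)
    calibredb set_metadata "$book_id" --field "identifiers:mobileread:$attachment_id"
done < filenames.txt
To set custom columns instead, use calibredb set_metadata --list-fields to get their names. You can even create the custom columns using calibredb, if you don't want to involve the GUI at all. |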
01-30-2015, 08:25 AM | #11 |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Thanks! Following your tips, I ran some tests on the following book:
https://www.mobileread.com/forums/sho...d.php?t=255021
Code:
$ calibredb set_metadata --list-fields
Title            Field name
Attachment ID    #attachmentid
Uploader         #uploader
Author Sort      author_sort
...
$ calibredb add ~/mrlibrary/AlexBell/255021/134208_What\ Diantha\ Did\ -\ Charlotte\ Perkins\ Gilman.mobi
Backing up metadata
Added book ids: 6
Notifying calibre of the change
$ calibredb set_metadata 6 --field \#uploader:"AlexBell"
Title : What Diantha Did
Title sort : What Diantha Did
Author(s) : Charlotte Perkins Gilman [Gilman, Charlotte Perkins]
Publisher : Bellware
Tags : humanism, servant question, romance
Languages : eng
Timestamp : 2015-01-30T09:51:46+00:00
Published : 2015-01-24T13:00:00+00:00
Identifiers : mobi-asin:cec96dd0-e53e-4f26-b392-166f8c160ce4
Comments : <p class="description">'What Diantha Did' was serialised in 'The Forerunner' from November 1909 to October 1910, and published separately in 1910. The main themes are 'The servant question' and the grief caused by having to do work in which one is not interested; set against a background of future female in-laws who would be ashamed to earn their own living, and a fiance who believes that 'No man - that is a man - would marry a woman and let her run a business.'</p>
Uploader : AlexBell
Backing up metadata
Notifying calibre of the change
$ calibredb set_metadata 6 --field \#attachmentid:"134208"
Title : What Diantha Did
Title sort : What Diantha Did
Author(s) : Charlotte Perkins Gilman [Gilman, Charlotte Perkins]
Publisher : Bellware
Tags : humanism, servant question, romance
Languages : eng
Timestamp : 2015-01-30T09:51:46+00:00
Published : 2015-01-24T13:00:00+00:00
Identifiers : mobi-asin:cec96dd0-e53e-4f26-b392-166f8c160ce4
Comments : <p class="description">'What Diantha Did' was serialised in 'The Forerunner' from November 1909 to October 1910, and published separately in 1910. The main themes are 'The servant question' and the grief caused by having to do work in which one is not interested; set against a background of future female in-laws who would be ashamed to earn their own living, and a fiance who believes that 'No man - that is a man - would marry a woman and let her run a business.'</p>
Uploader : AlexBell
Attachment ID : 134208
Backing up metadata
Notifying calibre of the change
Also changing the identifier would work as suggested:
Code:
$ calibredb set_metadata 6 --field identifiers:mobileread:134208
Title : What Diantha Did
Title sort : What Diantha Did
Author(s) : Charlotte Perkins Gilman [Gilman, Charlotte Perkins]
Publisher : Bellware
Tags : humanism, servant question, romance
Languages : eng
Timestamp : 2015-01-30T11:42:14+00:00
Published : 2015-01-24T13:00:00+00:00
Identifiers : mobileread:134208
Comments : <p class="description">'What Diantha Did' was serialised in 'The Forerunner' from November 1909 to October 1910, and published separately in 1910. The main themes are 'The servant question' and the grief caused by having to do work in which one is not interested; set against a background of future female in-laws who would be ashamed to earn their own living, and a fiance who believes that 'No man - that is a man - would marry a woman and let her run a business.'</p>
Uploader : AlexBell
Attachment ID : 134208
Backing up metadata
Notifying calibre of the change
As Kovid suggested, using a small script to parse the filenames and embed the info via calibredb set_metadata during the import process seems like an easy solution.
@BetterRed, @davidfor, I tested importing the extra metadata via an opf file (which would be another easy solution, as I could include all the extra data in the opf while extracting the attachments from the MR database), but it appears that calibre then erases part of the existing metadata.
Code:
$ calibredb add ~/mrlibrary/AlexBell/255021/134208_What\ Diantha\ Did\ -\ Charlotte\ Perkins\ Gilman.mobi
Backing up metadata
Added book ids: 8
Notifying calibre of the change
$ calibredb show_metadata 8
Title : What Diantha Did
Title sort : What Diantha Did
Author(s) : Charlotte Perkins Gilman [Gilman, Charlotte Perkins]
Publisher : Bellware
Tags : romance, humanism, servant question
Languages : eng
Timestamp : 2015-01-30T12:18:08+00:00
Published : 2015-01-24T13:00:00+00:00
Identifiers : mobi-asin:cec96dd0-e53e-4f26-b392-166f8c160ce4
Comments : <p class="description">'What Diantha Did' was serialised in 'The Forerunner' from November 1909 to October 1910, and published separately in 1910. The main themes are 'The servant question' and the grief caused by having to do work in which one is not interested; set against a background of future female in-laws who would be ashamed to earn their own living, and a fiance who believes that 'No man - that is a man - would marry a woman and let her run a business.'</p>
$ cat ~/mrlibrary/AlexBell/255021/metadata.opf
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <meta name="calibre:user_metadata:#uploader" content="{&quot;kind&quot;: &quot;field&quot;, &quot;#value#&quot;: &quot;alex&quot;, &quot;column&quot;: &quot;value&quot;, &quot;colnum&quot;: 1, &quot;is_multiple&quot;: null, &quot;is_multiple2&quot;: {}, &quot;search_terms&quot;: [&quot;#uploader&quot;], &quot;is_csp&quot;: false, &quot;is_category&quot;: true, &quot;table&quot;: &quot;custom_column_1&quot;, &quot;is_custom&quot;: true, &quot;is_editable&quot;: true, &quot;rec_index&quot;: 22, &quot;link_column&quot;: &quot;value&quot;, &quot;label&quot;: &quot;uploader&quot;, &quot;#extra#&quot;: null, &quot;datatype&quot;: &quot;text&quot;, &quot;name&quot;: &quot;Uploader&quot;, &quot;category_sort&quot;: &quot;value&quot;, &quot;display&quot;: {&quot;use_decorations&quot;: 0}}"/>
  </metadata>
</package>
$ calibredb set_metadata 8 ~/mrlibrary/AlexBell/255021/metadata.opf
Title : Unknown
Title sort : Unknown
Author(s) : Unknown
Publisher : Bellware
Tags : romance, humanism, servant question
Languages : eng
Timestamp : 2015-01-30T12:18:08+00:00
Published : 2015-01-24T13:00:00+00:00
Identifiers : mobi-asin:cec96dd0-e53e-4f26-b392-166f8c160ce4
Comments : <p class="description">'What Diantha Did' was serialised in 'The Forerunner' from November 1909 to October 1910, and published separately in 1910. The main themes are 'The servant question' and the grief caused by having to do work in which one is not interested; set against a background of future female in-laws who would be ashamed to earn their own living, and a fiance who believes that 'No man - that is a man - would marry a woman and let her run a business.'</p>
Uploader : alex
Backing up metadata
Notifying calibre of the change
@davidfor, I love the idea of a metadata source plugin that could translate the attachment id to a relevant URL! Would that be very difficult to write? |
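The "small script to parse the filenames" could look roughly like the sketch below. It assumes every file sits at <root>/<uploader>/<thread id>/<attachment id>_<name>, which is inferred from the single test path above, and it reuses the #uploader and #attachmentid columns from that test. The function names are made up.

```shell
# Sketch only: derive uploader, thread id and attachment id from a path such as
#   ~/mrlibrary/AlexBell/255021/134208_What Diantha Did - Charlotte Perkins Gilman.mobi
# The directory layout is an assumption based on that one example.
parse_attachment_path() {
    local path="$1" file dir updir
    file="${path##*/}"      # 134208_What Diantha Did - ....mobi
    dir="${path%/*}"        # .../AlexBell/255021
    updir="${dir%/*}"       # .../AlexBell
    printf '%s|%s|%s\n' "${updir##*/}" "${dir##*/}" "${file%%_*}"
}

# Add one book, then fill the custom columns used in the test above.
# Needs calibre installed; thread_id is parsed but not stored here.
import_book() {
    local path="$1" uploader thread_id attachment_id book_id
    IFS='|' read -r uploader thread_id attachment_id <<EOF
$(parse_attachment_path "$path")
EOF
    # calibredb reports "Added book ids: N"; pick out the number.
    book_id=$(calibredb add "$path" | sed -n 's/^Added book ids: *\([0-9][0-9]*\).*/\1/p')
    if [ -z "$book_id" ]; then
        echo "could not import: $path" >&2
        return 1
    fi
    calibredb set_metadata "$book_id" \
        --field "#uploader:$uploader" \
        --field "#attachmentid:$attachment_id"
}
```

Keeping the parsing in its own function makes it possible to dry-run it over the whole directory tree before any book is added.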
01-30-2015, 09:01 AM | #12 |
creator of calibre
Posts: 44,546
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, there is no performance advantage to identifiers vs custom columns. The main advantage is that in the calibre GUI identifiers can become clickable links. Note that you don't have to bother with writing a metadata plugin for MR. Simply add the full URL, like this
identifiers:url:http://whatever
that will automatically become a link in the UI |
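A small sketch of building that field value from an attachment id. The base URL below is a made-up placeholder (the real MobileRead attachment URL pattern is not given in this thread), and the function name is hypothetical.

```shell
# Sketch only. The default below is a placeholder, not a real endpoint.
MR_ATTACHMENT_URL="${MR_ATTACHMENT_URL:-https://example.invalid/attachment?id=}"

# Print the --field argument that stores the link as a "url" identifier.
url_identifier_field() {
    printf 'identifiers:url:%s%s\n' "$MR_ATTACHMENT_URL" "$1"
}

# Usage (needs calibre installed):
#   calibredb set_metadata 6 --field "$(url_identifier_field 134208)"
```

Note that in the test output earlier in the thread, setting identifiers replaced the existing mobi-asin entry, so any identifier worth keeping probably has to be included in the same field value.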
01-30-2015, 02:51 PM | #13 |
frumious Bandersnatch
Posts: 7,536
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
But we would need a single book record to have several identifiers (of the same kind), one for each format. With a custom column we can store the different ids as a list, can we do this with standard identifiers?
|
01-30-2015, 03:14 PM | #14 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
See the OverDrive Link plugin, which does the same thing for OverDrive library links.
{identifiers:select(odid)} ==
Code:
UUID****-****-****-****-************@library1.lib.overdrive.com&\
UUID****-****-****-****-************@library2.lib.overdrive.com&\
UUID****-****-****-****-************@library3.lib.overdrive.com |
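Applied to attachment ids, the same trick might look like the sketch below: several ids packed into one identifier value with & as the separator, and unpacked again when needed. The function names and the second id in the usage line are hypothetical, and whether calibre is happy with & inside a mobileread identifier is assumed from the OverDrive example rather than tested.

```shell
# Sketch only: pack several attachment ids into one identifier value and back.

# join_ids 134208 134209  prints  134208&134209
join_ids() {
    local IFS='&'
    # "$*" joins the arguments with the first character of IFS.
    printf '%s\n' "$*"
}

# split_ids '134208&134209'  prints one id per line
split_ids() {
    printf '%s\n' "$1" | tr '&' '\n'
}

# Usage (needs calibre installed; 134209 is a made-up second id):
#   calibredb set_metadata 6 --field "identifiers:mobileread:$(join_ids 134208 134209)"
```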
01-30-2015, 04:15 PM | #15 | |
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
That approach required me to remove the .opf file masquerading as a format file. But no matter, calibre invariably has more than one way to get the desired result - sometimes I wonder if too many ways BR |
|
|