01-06-2023, 12:56 PM | #1 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
[GUI Plugin] Extract People & Other Metadata
Summary: EPOM uses your personally constructed 'Python Regular Expressions' to search the text of an ebook and extract metadata from it.
Documentation: The EPOM 'User Guide' consists of all of its ToolTips plus any images and related files attached below.
Minimum Calibre Version Required: 6.11.0
Other Useful Calibre Plugins to Consider:
Version History: Spoiler:
Last edited by DaltonST; 01-07-2023 at 12:18 PM. |
01-06-2023, 12:57 PM | #2 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
For Future Use
|
05-23-2023, 09:02 AM | #3 |
light mode user
Posts: 66
Karma: 16268
Join Date: May 2023
Location: New England
Device: I use the Calibre ebook-viewer on macos and Apple Books on ios.
|
Great plugin!
I installed Extract People & Other Metadata through calibre user plugins and used it to extract Ao3 links from the epubs I downloaded, though I'm not sure if I did it the best way. (I have been intentionally avoiding fanficfare) Thought it would be good to share if anyone else needed my janky solution.
I couldn't figure out how to remove a set string at the beginning and end of the text, so here's the solution I came up with. I made a custom column named link and set up three #link custom column extractors.
To identify work links, I used the keyword "Posted originally on the.+$". I wanted to make sure I got the right link, since sometimes there are other work links in the file. I filtered the result down to just s/ and the work id number with "s\/\d+". Then in the tweaks I added:
Code:
REMOVE_CHARACTERS=.<>[]()
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Own at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Ownhttp://archiveofourownorg/ at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/','http://archiveofourown.org/works/')
To identify series links, it was easier to just assume there was only one and use the keyword "http:\/\/archiveofourown.org\/series\/.+$". There was no trailing period to remove, but I filtered for the properly formatted URL just in case. I only turned this extractor on for works I combined with EpubMerge.
For fanfiction.net works that I downloaded with FicLab, I used the keyword "based on content retrieved from.+$". I filtered for s/ and the story id with "s\/\d+", then, after removing extraneous characters, replaced s/ with the website URL.
Code:
REMOVE_CHARACTERS=.
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'based on content retrieved from ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/', 'https://www.fanfiction.net/s/') |
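For anyone who wants to test the regexes from the post above before putting them into EPOM, here is a minimal sketch of the same three steps (keyword match, filter, replace) in plain Python. It is not EPOM's own code and makes no claim about how the plugin applies its tweaks internally; the function name `extract_link` and both sample strings are made up for illustration.

```python
import re

def extract_link(book_text, keyword, url_prefix):
    """Hypothetical helper: mimic the keyword -> filter -> replace steps."""
    # Step 1: keyword regex from the post. MULTILINE lets "$" stop at the
    # end of the matching line instead of the end of the whole text.
    line = re.search(keyword, book_text, re.MULTILINE)
    if line is None:
        return None
    # Step 2: filter down to "s/" plus the numeric id.
    ident = re.search(r"s\/\d+", line.group(0))
    if ident is None:
        return None
    # Step 3: swap the leading "s/" for the site prefix.
    return re.sub(r"s/", url_prefix, ident.group(0), count=1)

# Assumed sample lines; real downloads may be worded differently.
ao3_text = ("Title page\n"
            "Posted originally on the Archive of Our Own at "
            "http://archiveofourown.org/works/12345678.\n"
            "Chapter 1\n")
ffn_text = "based on content retrieved from https://www.fanfiction.net/s/7654321/1/\n"

ao3_url = extract_link(ao3_text, r"Posted originally on the.+$",
                       "http://archiveofourown.org/works/")
ffn_url = extract_link(ffn_text, r"based on content retrieved from.+$",
                       "https://www.fanfiction.net/s/")
print(ao3_url)
print(ffn_url)
```

Trying the patterns this way makes it easier to see which step fails when a book's wording differs from what the keyword expects.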
05-23-2023, 05:57 PM | #4 |
light mode user
Posts: 66
Karma: 16268
Join Date: May 2023
Location: New England
Device: I use the Calibre ebook-viewer on macos and Apple Books on ios.
|
I now realize this can be done through FanFicFare or Mass Search and Replace: they can find URLs in books, grab metadata, and correctly make the URL an identifier... oops.
https://www.mobileread.com/forums/sh...ntifiers+links https://www.mobileread.com/forums/sh...nk#post4320727 https://www.mobileread.com/forums/sh...ntifiers+links |
09-14-2023, 11:39 PM | #5 |
want to learn what I want
Posts: 1,280
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
|
I wish I had tried this amazing plugin before using the similar tool in Job Spy!
|
01-06-2024, 01:09 AM | #6 |
Member
Posts: 11
Karma: 10
Join Date: Jul 2021
Device: Windows 11
|
This is an awesome tool. Is there a way to bulk extract info from multiple ebooks, instead of doing one at a time?
If it's necessary to do one at a time, is there a way to avoid the result screen popping up (where it shows the updated book, and you need to click back to the home screen, then do another search to get to where you originally were)? Thank you! |
03-11-2024, 03:26 PM | #7 |
want to learn what I want
Posts: 1,280
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
|
edit: it's ok now
Last edited by Comfy.n; 03-11-2024 at 04:31 PM. |
06-24-2024, 06:53 AM | #8 |
Member
Posts: 11
Karma: 10
Join Date: Jul 2021
Device: Windows 11
|
@Comfy.n: Awesome, tysm!! This has been super useful. A small issue is that when I process multiple books without using the FTS index, the results pop up one by one. Consequently, only the last extracted book gets marked, which makes it a bit inconvenient when going back and forth between different search pages. That aside, the function definitely works.
Question: Is there any way I can use this to extract the first 3-4 paragraphs from an epub? Background info: I'm trying to generate a cover from the first page of an epub. Calibre's default "set cover from book" doesn't seem to work too well, so my plan is to:
1. Use EPOM to extract the first 3-4 paragraphs from the epub (into custom column #first_pars)
2. Use Generate Cover to create a cover with text from #first_pars |
06-24-2024, 08:44 PM | #9 |
want to learn what I want
Posts: 1,280
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
|
Well, I've used EPOM just for extracting translators and original titles, using the FTS index. That was not too challenging. In your case it would be better if Dalton could help, but unfortunately he's been away from MR for almost a year. Or maybe some regex power user could.
I don't see an easy way to detect the exact beginning of the text, given how much ebook structure varies. However, you could try something like this:
- set the tweak MAXIMUM_LENGTH_TO_ACCEPT= to a large value
- then populate the #first_pars column using a regex that matches, say, the first 1000 chars in the book |
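As a rough illustration of the "first 1000 chars" idea, here is a small sketch in plain Python. Whether EPOM compiles its regexes with any particular flags is an assumption I can't confirm, so the pattern is shown with an explicit DOTALL flag; the inline form `(?s)\A.{1,1000}` should behave the same wherever inline flags are accepted. The name `first_chunk` is made up for the example.

```python
import re

# "\A" anchors at the very start of the text. DOTALL lets "." cross line
# breaks, so the match can span several paragraphs.
FIRST_CHARS = re.compile(r"\A.{1,1000}", re.DOTALL)

def first_chunk(book_text):
    """Hypothetical helper: return up to the first 1000 characters."""
    match = FIRST_CHARS.search(book_text)
    return match.group(0) if match else ""

sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 2000
chunk = first_chunk(sample)
print(len(chunk))
```

Note that this grabs whatever comes first in the extracted text, which may include title-page or copyright material rather than the opening paragraphs of the story.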