08-08-2023, 10:04 AM | #61 | |
Grand Sorcerer
Posts: 28,042
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
I'm attaching a simple test epub that demonstrates that both the "bm4" and "bm4-s01" ids get properly removed by the plugin when they are both truly unused.
Last edited by DiapDealer; 08-08-2023 at 10:07 AM. |
|
08-08-2023, 10:06 AM | #62 |
Grand Sorcerer
Posts: 28,042
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Whoops! May have spoken too soon. I can reproduce your results.
Last edited by DiapDealer; 08-08-2023 at 10:33 AM. |
08-08-2023, 10:33 AM | #63 |
Grand Sorcerer
Posts: 28,042
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Simple fix. There's no reason to join the Python list of potential ids into a string before checking whether an id is in that list. In fact, doing so causes the problem: "bm4" will ALWAYS be found in a string that contains "bm4-s01". The in operator works on a Python list directly, without concatenating the list's elements into a string first, and it treats each element of the list as an individual item.
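To illustrate the difference, here is a minimal sketch. The variable names and sample ids are hypothetical and this is not the actual code from cutils.py; it only shows why a substring test and a list membership test give different answers.

```python
# Hypothetical list of ids that are referenced somewhere in the epub.
used_ids = ["bm4-s01", "chapter2"]

# Joining the list first turns "in" into a substring search.
joined = " ".join(used_ids)
print("bm4" in joined)        # True: "bm4" is a substring of "bm4-s01"

# Testing against the list itself compares whole elements.
print("bm4" in used_ids)      # False: no element equals "bm4"
print("bm4-s01" in used_ids)  # True
```

With the joined string, the unused "bm4" looks used and is kept; with the list, it is correctly reported as unused.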
I'm attaching a test plugin with an updated cutils.py file (lines 146-149). The plugin dev can do with it what they will.
Last edited by DiapDealer; 08-08-2023 at 10:32 PM. |
08-08-2023, 01:13 PM | #64 |
Groupie
Posts: 152
Karma: 474196
Join Date: Jan 2011
Location: Ottawa
Device: Kobo Aura H2O
|
Hi, your guerilla update works well - I applied it to my previously-cleaned epub and it removed an additional three unused IDs, without removing used ones (i.e. "bm4-s01" remained).
Thanks for your help, DiapDealer, and thanks again for the great plugin, Slowsmile! |
08-08-2023, 09:09 PM | #65 |
Evangelist
Posts: 440
Karma: 77256
Join Date: Sep 2011
Device: none
|
I mentioned an issue a while back that, as far as I know, is not resolved. If an id, for example "bm4", is defined in multiple files (or in every file) yet is referenced in only one or a few of them, the unused instances are not removed. In my case, some publishers use sequential numbering for paragraph ids, so after using the plugin an epub can still have thousands of unused ids.
|
08-09-2023, 02:08 AM | #66 | |
Bibliophagist
Posts: 40,579
Karma: 157444380
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
08-09-2023, 08:39 PM | #67 |
Evangelist
Posts: 440
Karma: 77256
Join Date: Sep 2011
Device: none
|
Elsevier epubs do that. Penguin epubs also often add ids to all paragraphs, though I haven't checked whether the numbering starts from the same beginning in those.
Yes, I can fix them with sed by adding the file name as a prefix to all ids and then removing the unused ones, but something easier someday would be nice. |
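For anyone who would rather not use sed, here is a rough Python sketch of one reading of that prefixing idea. The function names (`file_prefix`, `prefix_ids`) are hypothetical, and the regexes assume double-quoted attributes. It only rewrites ids and href fragments inside the xhtml text it is given; references from the OPF, NCX, nav, or URL-encoded file names would need the same treatment and are not handled here.

```python
import posixpath
import re

# Match id="..." and href="path#fragment", but not data-id, xml:id, xlink:href.
ID_ATTR = re.compile(r'(?<![\w:-])id="([^"]+)"')
HREF_ATTR = re.compile(r'(?<![\w:-])href="([^"#]*)#([^"]+)"')

def file_prefix(path):
    """Return the file name without directory or extension, e.g. 'ch01'."""
    return posixpath.splitext(posixpath.basename(path))[0]

def prefix_ids(filename, xhtml):
    """Prefix every id, and every internal link fragment, with its file's name."""
    own = file_prefix(filename)

    def fix_href(match):
        target, fragment = match.group(1), match.group(2)
        if "://" in target:
            return match.group(0)  # external link: leave untouched
        # A bare "#frag" points into the current file.
        prefix = file_prefix(target) if target else own
        return f'href="{target}#{prefix}-{fragment}"'

    xhtml = ID_ATTR.sub(lambda m: f'id="{own}-{m.group(1)}"', xhtml)
    return HREF_ATTR.sub(fix_href, xhtml)

sample = '<p id="p1">x <a href="#p1">a</a> <a href="ch02.xhtml#p1">b</a></p>'
print(prefix_ids("Text/ch01.xhtml", sample))
```

After this pass every id is unique across files, so a tool that compares ids by name alone can tell the used "ch02-p1" from the unused "ch01-p1".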
08-10-2023, 11:21 AM | #68 |
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
|
FWIW ...
In epub2, the OPF guide section can include ids (fragments in the url) that point into xhtml files, and those ids should not be removed. Also under epub2, the Adobe pagemap.xml can, and typically does, point to ids in xhtml files, and those links will break if the ids are removed.
Technically, the same id can be re-used as long as each use is in a different xhtml file, so determining whether an id is used or not should really keep track of filenames.
Under epub3, EPUB Canonical Fragment Identifiers (cfis) can use ids to point to specific spots in xhtml files for internal or external cfi links, bookmarks, and annotation points that may exist outside the epub itself (from cloud-based web cfi links). And under epub3 with javascript support, those ids could be referenced for dynamic searching or popup footnotes, or by the js code itself.
So you really need to take care of all of the potential use points, or you can never know if an id is used or not. Therefore, without a really good reason, removing ids is probably not the best idea ... unless you truly know or control the epub's full production.
Even numbered paragraph ids are useful for reflowable epub locations used in printed academic citations, and are more correct than page numbers in many cases. The overhead of parsing even a thousand ids in a single xhtml file is minuscule compared to the time it takes the parser to parse and create the initial DOM tree itself. So removing them is rarely, if ever, necessary from a performance perspective.
My 2 cents ...
Last edited by KevinH; 08-10-2023 at 12:50 PM. |
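The point above about keeping track of filenames can be sketched roughly as follows. This is an illustration only, not how the plugin works: `unused_ids` is a hypothetical helper, the regexes assume double-quoted attributes, and it looks only at href links inside the xhtml files. The OPF guide, pagemap.xml, cfis, and javascript references mentioned above are not covered, so its result is at best a list of candidates.

```python
import posixpath
import re

ID_ATTR = re.compile(r'(?<![\w:-])id="([^"]+)"')
HREF_ATTR = re.compile(r'(?<![\w:-])href="([^"#]*)#([^"]+)"')

def unused_ids(files):
    """files maps an epub-internal path to its xhtml text.

    Returns the set of (path, id) pairs that no href in `files` targets.
    """
    defined = set()
    used = set()
    for path, text in files.items():
        for id_value in ID_ATTR.findall(text):
            defined.add((path, id_value))
        base = posixpath.dirname(path)
        for target, fragment in HREF_ATTR.findall(text):
            if "://" in target:
                continue  # external link, not a reference into the epub
            # A bare "#frag" refers to the file that contains the link.
            resolved = posixpath.normpath(posixpath.join(base, target)) if target else path
            used.add((resolved, fragment))
    return defined - used

files = {
    "Text/ch01.xhtml": '<p id="bm4">a</p><a href="ch02.xhtml#bm4">x</a>',
    "Text/ch02.xhtml": '<p id="bm4">b</p><p id="bm4-s01">c</p><a href="#bm4-s01">y</a>',
}
print(unused_ids(files))  # {('Text/ch01.xhtml', 'bm4')}
```

Because ids are compared as (file, id) pairs, the "bm4" in ch01 is flagged while the identically named, linked "bm4" in ch02 is kept.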
08-10-2023, 04:43 PM | #69 |
Evangelist
Posts: 440
Karma: 77256
Join Date: Sep 2011
Device: none
|
Such Elsevier epubs can have perhaps 15000+ ids. Yes, I stopped using the plugin. All headers up to h6 can have ids, and I generally exclude anything past a 3rd-level header for each chapter from a remade toc. As I may want to re-add those in the future, I no longer use the plugin.
An option to not remove ids from headers would be nice, but maybe that is not something others would often use. Other times I may try to add bibliographic links to academic titles, since I may want to check references, by using a regex to add the author's last name as an id to each paragraph and then adding links to those. It's not exactly accurate but good enough, yet in such cases there end up being duplicate ids. That's a different issue; I'll need to figure out an easy way to remove them. Maybe AppleScript with BBEdit. |
08-10-2023, 06:54 PM | #70 |
Resident Curmudgeon
Posts: 76,474
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Can someone please create a version of this plugin for calibre? Thanks.