10-21-2020, 03:40 AM | #1 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
Regex-function to merge endnotes files in editor
Do you have old epubs with one xhtml page per endnote? Since version 3 of calibre, Kovid has offered a checkbox (in the docx configuration) that prevents endnotes from being separated this way during a docx -> epub conversion by calibre. An epub -> epub conversion can't undo the separation.
The interface of calibre makes it easy to manually group the notes into a single page, the longest being to determine which files are affected... I am trying to do this automatically with a regex-function. It runs without an error message, but I have two issues:
1) The editor interface is not updated at the end of the regex-function. If I save a copy of the epub and examine that file, it shows that the merge was successful. Without really knowing whether this was the right place to look, I tried using apply_container_update_to_gui, but I was unsuccessful. How do I update the interface?
2) The function is executed in "spine order". But how can I make it start with the 1st file of the book regardless of the current file, so that the notes are grouped in the 1st note file?
A test file is attached.
Last edited by EbookMakers; 11-23-2020 at 09:30 PM. |
10-21-2020, 05:27 AM | #2 |
creator of calibre
Posts: 44,393
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just ctrl-click the files in the file browser in the editor, then right-click and choose merge.
|
10-21-2020, 07:24 AM | #3 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
Thank you for your answer. I know; that is why I started by writing: "The interface of calibre makes it easy to manually group the notes into a single page, the longest being to determine which files are affected...". And the regex without the function can help me find which files are affected.
Will it make the function do what I hope? Last edited by EbookMakers; 10-21-2020 at 07:29 AM. |
10-21-2020, 08:08 AM | #4 |
creator of calibre
Posts: 44,393
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
To refresh the UI, use the boss object:
from calibre.gui2.tweak_book.boss import get_boss
get_boss().apply_container_update_to_gui() |
10-21-2020, 09:11 AM | #5 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
Thanks a lot, Kovid. I'll try.
Last edited by EbookMakers; 10-21-2020 at 10:15 AM. |
10-21-2020, 10:15 AM | #6 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
After modifying the function as you indicated, the files which are not the "merge master" are deleted in the editor, but the "merge master" file still contains only one note.
However, the function is executed correctly: on a "commit as" we find all the desired modifications in the saved file. Is there also a way to force the search to start with the 1st file rather than the current file? Last edited by EbookMakers; 10-21-2020 at 10:19 AM. |
11-23-2020, 09:05 PM | #7 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
A test epub is attached to the lead post of this topic. We can think of two solutions: one for well-behaved people like you and even me, and one for rascals. They use the same regex:
Code:
<body[^\n]*\n\K\s*(<h[^>]*>[^<]*</h\d>)?\s*<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*(?=</body>)
On an epub that respects the html syntax produced by a docx -> epub conversion, the regex selects:
- the note, in files containing one note and only one in the syntax of the conversion, ensuring that the note is surrounded by the pair of body tags;
- in optional group 1, the title preceding the 1st note only (after the conversion).
The regex successively selects the solitary notes which respect the syntax of the conversion. It therefore also tells you the names of the xhtml files which contain them. Asking the regex for a count tells you whether the epub is affected at all. Merging should only be requested if there are at least two notes. If group 1 exists, the file contains the 1st note.
We cannot predict on which (active) file the regex will start. We can ask it to browse the files in "spine" order with the parameter:
replace.file_order = 'spine'
We only know that the occurrence for which group 1 exists is the 1st note. Both solutions rely on this characteristic to obtain a file with the notes starting with the 1st note and then in the correct order. Otherwise, as stated in a previous message, the order of the notes in the result file would depend on the active file when the regex is launched.
One argument of the replace function is "data", a dict that persists for the whole execution of the function. Our two functions store their information in this dict.
It is possible to request that the function be executed one last time after the last occurrence:
replace.call_after_last_match = True
It is during this last call that the merge is requested. Merge updates the note calls in the text and the opf file (since it deletes files). The display must then be updated in the editor, as written above by Kovid:
get_boss().apply_container_update_to_gui()
A major problem is that the result of the regex-function comes from the "return" of the "replace" function, even though the merge is executed after the last occurrence has been processed! One would have expected the result of the regex-function to come from the "merge". The main difference between the two solutions is how they work around this problem. Both functions are commented. |
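For anyone who wants to try the detection step outside the editor, here is a minimal sketch of the counting idea described above. It is an approximation, not the function from this thread: Python's built-in `re` has no `\K`, so the note is captured in a named group instead, and `re.DOTALL` is assumed so that a note may span several lines. The helper names and the sample files are hypothetical.

```python
import re

# Stdlib approximation of the thread's regex. The part before \K stays in the
# match, and the note itself is exposed as the named group "note".
# re.DOTALL is an assumption: it lets "." cross line breaks inside a note.
NOTE_RE = re.compile(
    r'<body[^\n]*\n'
    r'(?P<note>\s*(?P<title><h[^>]*>[^<]*</h\d>)?\s*'
    r'<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*)'
    r'(?=</body>)',
    re.DOTALL,
)


def find_single_note_files(files):
    """Return (names of single-note files, name of the file holding the title).

    `files` maps a file name to its xhtml source (hypothetical sample data).
    """
    note_files, first_note_file = [], None
    for name, source in files.items():
        match = NOTE_RE.search(source)
        if match is None:
            continue
        note_files.append(name)
        if match.group('title'):
            # The title only precedes the 1st note, so this is the merge target.
            first_note_file = name
    return note_files, first_note_file


def sample_note(n, title=''):
    """Build a hypothetical one-note page in the shape the regex expects."""
    return (
        '<html><body class="calibre">\n'
        f'{title}<dl class="footnote">\n'
        f'<dt>[<a href="text.xhtml#ref{n}">{n}</a>]</dt>\n'
        f'<dd>Note {n}.</dd>\n</dl>\n</body></html>'
    )


SAMPLE = {
    'chapter.xhtml': '<html><body>\n<p>Text<a href="n1.xhtml">[1]</a></p>\n</body></html>',
    'note2.xhtml': sample_note(2),
    'note1.xhtml': sample_note(1, '<h2 class="t">Notes</h2>\n'),
}

note_files, first_note_file = find_single_note_files(SAMPLE)
worth_merging = len(note_files) >= 2  # merging only makes sense with 2+ notes
```

With the sample data, the chapter file is ignored, both note pages are reported, and the page carrying the title is identified as the one holding the 1st note.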
11-23-2020, 09:06 PM | #8 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
The function for rascals
The function builds two lists of file names. Which list a name goes into depends on whether the file containing the title has already been encountered, which in turn depends on the active file when the regex is launched. The file containing the title is the one containing the 1st note, and it is the file into which the others are merged. The two lists are then concatenated to get the complete list of files to be merged.
We raise an exception with the 'raise' instruction to stop the function after the 'merge' and before the 'return', which would cancel the result. This is the dirty side of the job. At runtime, a warning message appears, which says: Merging: Files merged out.
Code:
from calibre.gui2.tweak_book import current_container
from calibre.gui2.tweak_book.boss import get_boss
from calibre.ebooks.oeb.polish.split import merge

class Merging(LookupError):
    pass
    # Warning : very dirty workaround
    # Custom exception class, to provoke the end of the job without return
    # If we use return, we lose the result of the merging

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # this is the last passage (all matches found)
        ctnr = current_container()
        # Merge files whose name is in the list 'note_files_list'
        # (stored in persistent dict 'data', a parameter of replace())
        # into the file whose name is in 'merge_master' (also stored in data)
        if data:
            # Only build the list when data was filled by at least one match
            data['note_files_list'] = data['first_notes'] + data['last_notes']
            if len(data['note_files_list']) > 1:
                merge(ctnr, 'text', data['note_files_list'], data['merge_master'])
                get_boss().apply_container_update_to_gui()
                # very dirty trick : get out without applying 'return' :
                raise Merging("Files merged out")
    else:
        if 'merge_master' not in data:
            # data is empty, therefore it's the 1st iteration
            # the list of files and the master of merge are initialized
            data['note_files_list'] = []
            data['merge_master'] = []
            if match.group(1):
                # If group 1 exists, the note contains the title and therefore the note is the 1st note
                # The master of merge becomes the current note file
                data['first'] = True
                data['merge_master'] = file_name
                data['first_notes'] = [file_name]
                data['last_notes'] = []
            else:
                data['first'] = False
                data['first_notes'] = []
                data['last_notes'] = [file_name]
            # Ask for a passage after the last find (match will be None)
            # Ask for processing the files in the order they appear in the book
            replace.call_after_last_match = True
            replace.file_order = 'spine'
        else:
            if match.group(1):
                data['first'] = True
                # The master of merge becomes the current note file
                data['merge_master'] = file_name
            # Extends the list of files by adding the name of the current file
            if data['first']:
                data['first_notes'].append(file_name)
            else:
                data['last_notes'].append(file_name)
        return match.group()
Last edited by EbookMakers; 11-24-2020 at 09:37 AM. |
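The two-list ordering trick can be checked without calibre. The sketch below is a plain-Python illustration, not the editor function itself. It assumes the search starts at the active file and wraps around the spine, and it takes a hypothetical list of (file name, has title) pairs in the order the matches are visited.

```python
def order_note_files(visits):
    """Return (merge master, note files in reading order).

    `visits` is a list of (file_name, has_title) pairs in the order the search
    reaches them. Files seen before the title belong after the files seen from
    the title onwards, because the search is assumed to wrap around the spine.
    """
    first_notes, last_notes = [], []
    seen_title = False
    merge_master = None
    for file_name, has_title in visits:
        if has_title:
            # The title only precedes the 1st note: this file is the merge target.
            seen_title = True
            merge_master = file_name
        if seen_title:
            first_notes.append(file_name)
        else:
            last_notes.append(file_name)
    return merge_master, first_notes + last_notes


# Spine order is n1, n2, n3, n4; the search is launched while n3 is active.
visits = [('n3.xhtml', False), ('n4.xhtml', False),
          ('n1.xhtml', True), ('n2.xhtml', False)]
master, ordered = order_note_files(visits)
```

Whatever the starting file, the concatenation puts the file with the title first and the files visited before it last.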
11-23-2020, 09:07 PM | #9 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
The function for well-behaved people
This function is not interrupted; the 'return' is executed. But this 'return' replaces the selected note with the set of notes already encountered, ordered by a mechanism similar to that of the rascal function. Therefore, after 'merging' and 'returning', we get a single file containing all of the notes in order.
Code:
# <body[^\n]*\n\K\s*(<h[^>]*>[^<]*</h\d>)?\s*<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*(?=</body>)
from calibre.gui2.tweak_book import current_container
from calibre.gui2.tweak_book.boss import get_boss
from calibre.ebooks.oeb.polish.split import merge

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # this is the last passage (all matches found)
        ctnr = current_container()
        # Merge at least 2 files whose name is in the list 'note_files_list'
        # (stored in persistent dict 'data', a parameter of replace())
        # into the file whose name is in 'merge_master' (also stored in data)
        if data and len(data['note_files_list']) > 1:
            merge(ctnr, 'text', data['note_files_list'], data['merge_master'])
            get_boss().apply_container_update_to_gui()
    else:
        if 'merge_master' not in data:
            # data is empty, therefore it's the 1st iteration
            # the list of files is initialized with the current note file
            # The master of merge is the current note file
            data['note_files_list'] = [file_name]
            data['merge_master'] = file_name
            if match.group(1):
                # If group 1 exists, the note contains the title and therefore the note is the 1st note
                data['first'] = True
                data['first_notes'] = match.group()
                data['last_notes'] = ''
            else:
                data['first'] = False
                data['first_notes'] = ''
                data['last_notes'] = match.group()
            # Ask for a passage after the last find (match will be None)
            # Ask for processing the files in the order they appear in the book
            replace.call_after_last_match = True
            replace.file_order = 'spine'
        else:
            # Extends the list of files by adding the name of the current file
            # The master of merge becomes the current note file
            data['note_files_list'].append(file_name)
            data['merge_master'] = file_name
            if match.group(1):
                data['first'] = True
            if data['first']:
                # If first is true, the function has already processed the 1st note,
                # we concatenate in first_notes
                data['first_notes'] = data['first_notes'] + match.group()
            else:
                # Otherwise in last_notes
                data['last_notes'] = data['last_notes'] + match.group()
    data['all_notes'] = data['first_notes'] + data['last_notes']
    # print(data['note_files_list'], data['merge_master'])
    return data['all_notes']
Last edited by EbookMakers; 11-23-2020 at 09:24 PM. |
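To see why the last matched file ends up holding every note, here is a small stand-alone sketch of the accumulation performed by the 'return'. It only mimics the string handling. The note texts and the helper name are hypothetical, and nothing here touches calibre.

```python
def accumulated_returns(notes):
    """Return the text that each successive call would hand back.

    `notes` is a list of (note_text, has_title) pairs in visiting order.
    Each returned value is the ordered set of notes seen so far, so the value
    returned for the last match contains all of them.
    """
    first_notes, last_notes = '', ''
    seen_title = False
    returned = []
    for note_text, has_title in notes:
        if has_title:
            seen_title = True
        if seen_title:
            first_notes += note_text
        else:
            last_notes += note_text
        returned.append(first_notes + last_notes)
    return returned


# Search launched on the file holding note 3; note 1 carries the title.
steps = accumulated_returns([('<n3/>', False), ('<n1/>', True), ('<n2/>', False)])
```

Each intermediate file receives a partial set of notes, which does no harm here because those files are merged away; only the value returned for the last match matters.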
11-24-2020, 09:39 AM | #10 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
I modified the function in message #8 to add the 'pass' statement to the class 'Merging', instead of relying on the comment to do nothing.
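A quick way to see why the 'pass' matters, as a sketch in plain Python: a class body made only of comments is not a valid block, so the source does not compile. The helper name is hypothetical.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


COMMENT_ONLY = "class Merging(LookupError):\n    # custom exception, nothing to do\n"
WITH_PASS = "class Merging(LookupError):\n    pass\n    # custom exception, nothing to do\n"

comment_only_ok = compiles(COMMENT_ONLY)
with_pass_ok = compiles(WITH_PASS)
```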
Last edited by EbookMakers; 11-24-2020 at 01:27 PM. |
11-26-2020, 11:09 PM | #11 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
Maybe I misunderstood something, but I can't see why it would be necessary to use a regex-function for this. 1. The calibre editor, as Kovid wrote, can merge all the notes placed in their own pages. 2. On the CSS side, I fail to see the usefulness of ordered list code for the footnotes. It's a separate issue of course. Here is the end result after these two changes. |
11-27-2020, 09:35 AM | #12 |
Enthusiast
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
|
The regex alone is useful, since its match count tells you immediately whether the epub is affected or not. Unless we created them ourselves, we don't necessarily know a lot about our epubs.
It is true that a regex-function is not necessary to merge note files; this was already written in message #1. This is just a small, unpretentious exercise that first shows a use of 'merge' and 'apply_container_update_to_gui'. It uses 'replace.call_after_last_match = True' and shows that the content of the 'return' wins over the changes made to the text by this last call, when one would expect the opposite. It gives 2 ways to work around this constraint. It also shows some data manipulation in the persistent dict 'data'. |
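For readers who have not used these two features, here is a toy driver that imitates the calling convention: one call per match with a shared 'data' dict, then one extra call with match set to None. It is a simplified stand-in written for illustration, not calibre's implementation. The driver name, the sample files and the example replace function are all hypothetical.

```python
import re


def run_replace_all(pattern, files, replace):
    """Toy stand-in for Replace All (NOT calibre's code).

    Calls `replace` once per match with the same `data` dict, then once more
    with match=None if the function asks for it via call_after_last_match.
    """
    data = {}
    number = 0
    result = {}
    for file_name, text in files.items():
        def on_match(match):
            nonlocal number
            number += 1
            return replace(match, number, file_name, None, None, data, None)
        result[file_name] = pattern.sub(on_match, text)
    if getattr(replace, 'call_after_last_match', False):
        replace(None, number, '', None, None, data, None)
    return result, data


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: summarise what was accumulated.
        data['summary'] = sorted(data.get('files', ()))
        return
    data.setdefault('files', set()).add(file_name)
    return match.group().upper()


replace.call_after_last_match = True

files = {'a.xhtml': 'note one', 'b.xhtml': 'no hit', 'c.xhtml': 'note two note'}
result, data = run_replace_all(re.compile(r'note'), files, replace)
```

The point of the sketch is only the shape of the protocol: information gathered across matches lives in 'data', and the final call is the place to act on it.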