03-24-2024, 03:26 PM   #61
lomkiri
Zealot
Posts: 136
Karma: 1000102
Join Date: Jul 2021
Device: N/A
Replace a list of words with the words of another list (works with large lists)

In the thread "Search and Replace from a List", moldy asked how to replace a list of names with the names of another list, e.g. replace
["John", "Paul", "George", "Ringo"] with
["Mick", "Keith", "Ronnie", "Charlie"].

One way to do this with a generic regex and a function is to store the mapping as a dict in an external json file. We then only have to adapt the dict to our needs, and it works with any number of words in the list (even hundreds, which was moldy's case).

For our example, put the file change_words.json in calibre's config folder, with this content:
Code:
{
  "John": "Mick",
  "Paul": "Keith",
  "George": "Ronnie",
  "Ringo": "Charlie"
}
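Since the original question started from two parallel lists, here is a small sketch for turning them into that json file instead of typing hundreds of pairs by hand. `build_mapping` and `to_json_text` are hypothetical helpers (plain Python, not part of calibre); the sketch assumes both lists are in the same order.

```python
import json

# Hypothetical helper (not part of calibre): pair two parallel lists into the
# {"old word": "new word"} dict that change_words.json expects.
def build_mapping(old_words, new_words):
    if len(old_words) != len(new_words):
        raise ValueError(
            f"lists differ in length: {len(old_words)} vs {len(new_words)}")
    if len(set(old_words)) != len(old_words):
        # dict(zip(...)) would silently keep only the last duplicate
        raise ValueError("the list of words to replace contains duplicates")
    return dict(zip(old_words, new_words))

def to_json_text(mapping):
    # ensure_ascii=False keeps accented names readable in the file
    return json.dumps(mapping, indent=2, ensure_ascii=False)

old = ["John", "Paul", "George", "Ringo"]
new = ["Mick", "Keith", "Ronnie", "Charlie"]
print(to_json_text(build_mapping(old, new)))
```

The printed text can be saved (as UTF-8) as change_words.json in calibre's config folder.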
Code:
find : <body[^>]*>\K(.+)</body>
The "Dot all" option must be checked.
This regex selects the whole HTML page, i.e. all the text inside <body>.
The regex inside the function skips everything between < and >, so the HTML tags are not scanned.
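To see why the tags are skipped, here is the per-word pattern the function uses, tried on its own with Python's standard `re` module (calibre uses the `regex` module, which behaves the same for this pattern). `replace_outside_tags` is a hypothetical name for this sketch. The negative lookahead `(?![^<>]*>)` rejects a match when a `>` comes before any `<`, i.e. when the word sits inside a tag.

```python
import re

# \bWORD\b     -> the word as a whole word ("Johnny" is not touched)
# (?![^<>]*>)  -> refuse the match if a '>' comes before any '<',
#                 which means we are inside a tag such as <img alt="John"/>
def replace_outside_tags(text, old, new):
    pattern = rf"\b{re.escape(old)}\b(?![^<>]*>)"
    return re.sub(pattern, lambda _m: new, text)

html = '<p class="John">John met Johnny.</p><img alt="John"/>'
print(replace_outside_tags(html, "John", "Mick"))
# -> <p class="John">Mick met Johnny.</p><img alt="John"/>
```

One limitation of this heuristic: an unescaped ">" in the text itself (e.g. "John > Paul") also looks like the end of a tag, so that occurrence would be skipped.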
The function, in its simplest form (without counters), is:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Replace words using a dict in a json file (without counters)
    from calibre.utils.config import JSONConfig
    import regex

    # Put the file 'change_words.json' in the config-folder of calibre
    # If you choose another name for the json, change it here:
    fname = 'change_words.json'

    # Load the json only on the first call;
    # data keeps its values across all calls during "Replace all"
    if number == 1:
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')

    # normal passage
    m = match.group()
    for key, val in data['equiv'].items():
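        # Find key as a whole word, excluding everything between <...>;
        # regex.escape() makes names containing '.', '(' etc. match literally
        m = regex.sub(rf'\b{regex.escape(key)}\b(?![^<>]*>)', val, m)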
    return m
Note: each match is a whole page, so the number of changes reported will be the number of pages scanned, even if the first page has a hundred changes and the other pages none.
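A side effect of the loop above is that the dict entries are applied one after another, so a value that is also a key gets replaced again: with {"John": "Paul", "Paul": "John"}, "John and Paul" ends up as "John and John". If your list can contain such chains or swaps, one possible alternative (a swapped-in technique, not the one used in this post) is to build a single alternation of all keys and look each match up in the dict, so every position is replaced at most once. A standalone sketch, with `replace_all_at_once` as a hypothetical name:

```python
import re

def replace_all_at_once(text, mapping):
    """Replace every key of `mapping` with its value in a single pass.

    Returns (new_text, number_of_changes), like re.subn.
    """
    if not mapping:
        return text, 0
    # Longest keys first, so "John Smith" wins over "John" at the same position
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b(?![^<>]*>)")
    # Each match is exactly one of the keys, so the dict lookup is safe
    return pattern.subn(lambda m: mapping[m.group()], text)

print(replace_all_at_once("<p>John and Paul</p>", {"John": "Paul", "Paul": "John"}))
# -> ('<p>Paul and John</p>', 2)
```

Inside a calibre function, the compiled pattern could presumably be built once on the first call and kept in `data`, like the dict itself; that adaptation is left as an assumption here.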

If we want to count the actual number of changes, use the code below. At the end of the replace action, calibre will open a "debug window" showing the counters. The function can also write a json file with the number of changes per word (enabled by default). Set the variables "do_count" and "count_by_name" as needed.

Counters are more accurate with "Replace all": if replacements are made one by one, the counters are reset at each file.
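The counting idea itself can be tried outside calibre. The sketch below applies the same per-word pattern to several "pages" with `re.subn` and accumulates a total and per-word counts, roughly what `data['total']` and `data['counters']` hold after "Replace all". It is plain Python with a hypothetical `count_replacements` helper; it does not use calibre's API.

```python
import re
from collections import Counter

def count_replacements(pages, mapping):
    """Apply `mapping` to each page; return (new_pages, total, per_word)."""
    counters = Counter()
    new_pages = []
    for page in pages:
        for key, val in mapping.items():
            # v=val binds the current value, so the replacement is taken
            # literally (no backslash/group processing)
            page, n = re.subn(rf"\b{re.escape(key)}\b(?![^<>]*>)",
                              lambda _m, v=val: v, page)
            if n:
                counters[key] += n
        new_pages.append(page)
    return new_pages, sum(counters.values()), dict(counters)

pages = ["<p>John, John, Paul</p>", "<p>nobody</p>"]
print(count_replacements(pages, {"John": "Mick", "Paul": "Keith"}))
# -> (['<p>Mick, Mick, Keith</p>', '<p>nobody</p>'], 3, {'John': 2, 'Paul': 1})
```

Here two pages were scanned (what the plain function's count would reflect), but three words were actually changed.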

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Replace words using a dict in a json file, with possibility of counters
    from calibre.utils.config import JSONConfig
    import regex

    ### Parameters
    # Put the file 'change_words.json' in the config-folder of calibre
    # If you choose another name for the json, change it here:
    fname = 'change_words.json'

    # If do_count is True, the total number of changes is printed at the end.
    # If count_by_name is also True, the function writes the counters by word in the
    # file "change_words_counters.json" (in the config-folder of calibre)
    do_count = True    # put False if you don't want any counter
    count_by_name = True
    counters_fname = 'change_words_counters.json'
    ### End Parameters


    # === Last call (match is None): only made if counters were requested above
    if match is None:
        if data['total'] == 0:
            print('No occurrence found.\n'
                  f"We had to change a list of {len(data['equiv'])} words (in {fname})")
            return

        if count_by_name:
            json = JSONConfig(counters_fname)
            json.clear()
            json.update(data['counters'])
            json.commit()

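        # Build the message first: a trailing "if ... else ''" would apply to the
        # whole concatenated f-string and print nothing when count_by_name is False
        msg = (f"We had to change a list of {len(data['equiv'])} words (in {fname})\n"
               f"In this list, {len(data['counters'])} words had at least one occurrence\n"
               f"=== The total of all changes is: {data['total']} ===")
        if count_by_name:
            msg += f"\n\nThe file {counters_fname} has been written with the counters by word"
        print(msg)
        return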

    # === First call
    # Load the json file only on the first call;
    # data keeps its values across all calls during "Replace all"
    if number == 1:
        data['equiv'] = JSONConfig(fname)
        if not data['equiv']:
            print(f'Problem loading {fname}, no treatment will be done')
        if do_count:
            replace.call_after_last_match = True    # Ask for last passage
            data['total'] = 0
            data['counters'] = {}


    # === normal passage
    m = match.group()
    for key, val in data['equiv'].items():
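        # Find key as a whole word, excluding everything between <...>;
        # regex.escape() makes names containing '.', '(' etc. match literally
        m, n = regex.subn(rf'\b{regex.escape(key)}\b(?![^<>]*>)', val, m)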
        if do_count and n:
            data['total'] += n
            data['counters'][key] = data['counters'].get(key, 0) + n
    return m

Last edited by lomkiri; 03-24-2024 at 04:44 PM.