Template to search notes? - Page 2

chaley · 09-13-2024, 12:27 PM

Quote:

Originally Posted by Comfy.n

Today on v7.18, it took about the same time as yesterday while running from source, on a timed attempt. ~ 12 minutes.

Well, more thought required ...

How many author notes do you have?

Comfy.n · 09-13-2024, 12:37 PM

11877

chaley · 09-13-2024, 03:14 PM

@Comfy_n: I am trying to see why this takes so long and failing. Having a large library to work with will help. Any chance you would be willing to share your metadata.db and the contents of the .cal_notes folder? The notes.db isn't enough -- I need the resources as well.

Comfy.n · 09-13-2024, 03:16 PM

sure I can

chaley · 09-13-2024, 04:33 PM

Quote:

Originally Posted by Comfy.n

sure I can

Thanks.

I think I have found the bottleneck. Getting your metadata.db and the notes folder will let me be sure.

NB: I don't need the books themselves.

chaley · 09-14-2024, 12:41 PM

Try this template. On my machine and using your library the search completes in 4 or 5 seconds.

Note that it is suitable to be used as a stored template with 2 arguments. I tested it as "books_with_notes_containing_text" with this as the actual template search.

[Attached image: Clipboard01.jpg]

Code:

python:
def evaluate(book, context):
    if context.arguments is None or len(context.arguments) != 2:
        # Set these to what you want
        field_name = 'authors'
        value_in_note = 'a'
    else:
        field_name = context.arguments[0]
        value_in_note = context.arguments[1]

    db = context.db.new_api

    # check if we have already cached the notes
    note_items = context.globals.get('items_with_notes', None)
    if note_items is not None:
        # We've already fetched which items have notes.
        # Get the cached search result values
        note_search_results = context.globals['note_search_results']
        item_name_map = context.globals['item_name_map']
    else:
        # First time. Get the items with notes and initialize
        # the search value cache.
        note_items = db.get_all_items_that_have_notes(field_name)
        context.globals['items_with_notes'] = note_items
        note_search_results = {}
        context.globals['note_search_results'] = note_search_results
        # db.get_item_id() uses a linear search. Avoid this by getting
        # and caching the map
        item_name_map = db.get_item_name_map(field_name)
        context.globals['item_name_map'] = item_name_map

    # Check if this book is a match -- that the field has a note containing
    # the desired text.
    
    # We must first get the item_id for the value in the field to be checked.
    field_value = book.get(field_name)
    if not field_value:
        return ''
    # if the field is multi-valued, use the first value
    if isinstance(field_value, list):
        if len(field_value) == 0:
            return ''
        field_value = field_value[0]
    # Now get the cached internal ID of the value in field_name
    item_id = item_name_map[field_value]

    # Does the item have a note? If not, give up now.
    if item_id not in note_items:
        return ''

    # The item has a note. Have we already checked it?
    if item_id not in note_search_results:
        # Item has a note but we haven't seen it before. Do the compare
        # on the plain text version of the note.
        result = ''
        # Get the note.
        note = db.notes_data_for(field_name, item_id)
        if note:
            # Get the plain text of the note.
            note = note['searchable_text'].partition('\n')[2]
        if note:
            # use a case insensitive compare to check if the search value is in the note
            from calibre.utils.icu import primary_contains
            result = 'Yes' if primary_contains(value_in_note, note) else ''
        # Cache the result of the comparison
        note_search_results[item_id] = result
        context.globals['note_search_results'] = note_search_results
    # Return the cached value
    return note_search_results.get(item_id, '')
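
To show why the caching gives the speedup, here is a small stand-alone sketch of the same pattern that runs outside calibre. FakeDb, FakeContext and matches() are made-up stand-ins for illustration, not calibre APIs, and str.casefold() is only a rough substitute for primary_contains. The point is just that each note is fetched and compared once, however many books share the item.

```python
class FakeDb:
    """Hypothetical stand-in for the calibre db; counts note fetches."""
    def __init__(self, notes):
        self.notes = notes      # item_id -> searchable text
        self.fetches = 0

    def notes_data_for(self, field_name, item_id):
        self.fetches += 1
        return {'searchable_text': self.notes[item_id]}


class FakeContext:
    """Hypothetical stand-in: only the globals dict shared across books."""
    def __init__(self):
        self.globals = {}


def matches(db, context, item_id, wanted):
    # The cache lives in context.globals so it survives from book to book
    cache = context.globals.setdefault('note_search_results', {})
    if item_id not in cache:
        text = db.notes_data_for('authors', item_id)['searchable_text']
        # Drop the first line, as the template does with partition('\n')[2]
        body = text.partition('\n')[2]
        # casefold() is an assumption standing in for primary_contains
        cache[item_id] = 'Yes' if wanted.casefold() in body.casefold() else ''
    return cache[item_id]


db = FakeDb({1: 'first line\nWrites a long Fanfic series',
             2: 'first line\nNothing relevant here'})
ctx = FakeContext()
# Five "books": four share item 1, one uses item 2
results = [matches(db, ctx, item_id, 'fanfic') for item_id in (1, 1, 2, 1, 1)]
print(results, db.fetches)
```

Five lookups cause only two note fetches here. With thousands of books sharing a few authors, that is where the time goes from minutes to seconds.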

Comfy.n · 09-14-2024, 12:49 PM

yes that works

Montana Harper · 09-21-2024, 07:16 PM

Quote:

Originally Posted by chaley

Try this template. On my machine and using your library the search completes in 4 or 5 seconds.

This is the solution I chose, as my library is currently 17000+ books and is probably going to expand relatively quickly. (It's my fanfic library, the size of which is limited only by my interest and AO3's server stability.) It works wonderfully!

One question, though: is there any way to modify that template to iterate over all the authors for a given title? That would be the absolute ideal solution for me.

chaley · 09-22-2024, 08:59 AM

Quote:

Originally Posted by Montana Harper

One question, though: is there any way to modify that template to iterate over all the authors for a given title? That would be the absolute ideal solution for me.

This template checks every value for a field in a book. If the field is "authors" then it checks every author.

Code:

python:
def evaluate(book, context):
	if context.arguments is None or len(context.arguments) != 2:
		# Set these to what you want
		field_name = 'authors'
		value_in_note = 'note'
	else:
		field_name = context.arguments[0]
		value_in_note = context.arguments[1]

	db = context.db.new_api

	# check if we have already cached the notes
	note_items = context.globals.get('items_with_notes', None)
	if note_items is not None:
		# We've already fetched which items have notes.
		# Get the cached search result values
		note_search_results = context.globals['note_search_results']
		item_name_map = context.globals['item_name_map']
	else:
		# First time. Get the items with notes and initialize
		# the search value cache.
		note_items = db.get_all_items_that_have_notes(field_name)
		context.globals['items_with_notes'] = note_items
		note_search_results = {}
		context.globals['note_search_results'] = note_search_results
		# db.get_item_id() uses a linear search. Avoid this by getting
		# and caching the map
		item_name_map = db.get_item_name_map(field_name)
		context.globals['item_name_map'] = item_name_map

	# Check if this book is a match -- that the field has a note containing
	# the desired text.	
	# We must first get the item_id for the value in the field to be checked.
	field_values = book.get(field_name)
	if not field_values:
		return ''
	
	# We want to check every value in the item, so use a list.
	# If the given field is not multi-valued, turn it into a list
	if not isinstance(field_values, (tuple, list, set)):
		field_values = (field_values,)

	# Loop over the field values, checking each one. Stop on first success
	result = ''
	for field_value in field_values:
		# Get the cached internal ID of the value in field_name
		item_id = item_name_map[field_value]

		# Does the item have a note? If not, give up now.
		if item_id not in note_items:
			continue

		# The item has a note. Have we already checked it?
		result = note_search_results.get(item_id)
		if result is not None:
			# We've already checked this item.
			if result:
				# It matched. Break out of the loop
				break
			# It didn't match. Check the next item
			continue
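		# Item has a note but we haven't seen it before. Do the compare
		# on the plain text version of the note.
		# Start from "no match" so a missing or empty note is cached as ''
		# and None is never returned.
		result = ''
		# Get the note.
		note = db.notes_data_for(field_name, item_id)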
		if note:
			# Get the plain text of the note.
			note = note['searchable_text'].partition('\n')[2]
		if note:
			# use a case insensitive compare to check if the search value is in the note
			from calibre.utils.icu import primary_contains
			result = 'Yes' if primary_contains(value_in_note, note) else ''
		# Cache the result of the comparison
		note_search_results[item_id] = result
		if result:
			break

	# Cache the updated results
	context.globals['note_search_results'] = note_search_results
	return result
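
Stripped of the calibre calls, the "check every value, stop on first success" part looks roughly like this. It is only a sketch: any_value_matches() and the sample data are invented for illustration, casefold() stands in for primary_contains, and the .get() lookup is my own choice so that a value missing from the map is skipped rather than raising.

```python
def any_value_matches(field_values, item_name_map, note_items, note_text, wanted):
    """Return 'Yes' if the note of any value contains wanted, else ''."""
    # Wrap a single-valued field so the loop works for both kinds of field
    if not isinstance(field_values, (tuple, list, set)):
        field_values = (field_values,)
    for value in field_values:
        item_id = item_name_map.get(value)
        if item_id is None or item_id not in note_items:
            continue  # no note for this value, try the next one
        # casefold() is an assumption standing in for primary_contains
        if wanted.casefold() in note_text[item_id].casefold():
            return 'Yes'  # stop on the first success
    return ''


# Hypothetical sample data: three authors, two of them with notes
item_name_map = {'Ann': 1, 'Bob': 2, 'Cy': 3}
note_items = {2, 3}
note_text = {2: 'writes mysteries', 3: 'Co-author of the Fanfic series'}

multi = any_value_matches(['Ann', 'Bob', 'Cy'], item_name_map,
                          note_items, note_text, 'fanfic')
single = any_value_matches('Ann', item_name_map,
                           note_items, note_text, 'fanfic')
print(repr(multi), repr(single))
```

The first call matches on the third author. The second returns the empty string because 'Ann' has no note at all.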

Montana Harper · 09-28-2024, 10:24 PM

Quote:

Originally Posted by chaley

This template checks every value for a field in a book. If the field is "authors" then it checks every author.

Perfect!
