MobileRead Forums > E-Book Software > Sigil

05-18-2024, 12:07 PM   #16
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
Okay, I spent some time investigating this:

1. exiftool requires a C library and/or a command-line subprocess interface, so that is out.

2. Pillow will allow you to use getxmp(), but you need the "defusedxml" module. Luckily, defusedxml is small and pure Python, and can easily be added to a plugin. So Pillow can be used.

BUT:

Pillow made the insane decision NOT to return the raw XML for post-processing. Instead it creates a horrible nested dict that contains other dicts and lists, so accessing a single element, or even walking the structure, requires recursion and is a real pain, especially with all of the namespaces used in the official example. Talk about an XML namespace nightmare!

And as far as I can tell, attribute values are lost, case is mutated, etc. It should have just returned the raw XML, since there are many XML parsers and tools like bs4 that could be used to get what is needed.

That matters especially when you do not know the exact namespaces or structure employed, and especially if you may need to access multiple language versions of the same alt text.

3. So that just leaves the following, which I threw together based on fragments I could find on the web (Stack Exchange), glued together with a few pieces of my own:


Code:
from bs4 import BeautifulSoup

filename = "test.jpg"
with open(filename, "rb") as f:
    d = f.read()

start_tag = b"<x:xmpmeta"
end_tag = b"</x:xmpmeta>"
xmp_str = b""
pos = 0

# collect every <x:xmpmeta>...</x:xmpmeta> packet; stop as soon as
# no further start or end tag is found (avoids looping on find() == -1)
while True:
    xmp_start = d.find(start_tag, pos)
    if xmp_start == -1:
        break
    xmp_end = d.find(end_tag, xmp_start)
    if xmp_end == -1:
        break
    xmp_end += len(end_tag)
    xmp_str += d[xmp_start:xmp_end]
    pos = xmp_end

alt_text_dict = {}

# the 'xml' parser requires lxml to be installed
xmpAsXML = BeautifulSoup(xmp_str, 'xml')
if xmpAsXML:
    node = xmpAsXML.find('AltTextAccessibility')
    if node:
        for element in node.find_all('li'):
            # print(element.prefix, element.namespace, element.name, element['xml:lang'], element.text)
            lang = element.get('xml:lang', 'x-default')
            alt_text_dict[lang] = element.text

for k, v in alt_text_dict.items():
    print(k, v)

All of this could be rewritten into a nice routine, but... it literally walks the entire binary file looking for particular start strings (which depend on the x: namespace prefix being used) and end strings. If a different prefix is used, this search will fail.

This is a mess and very, very time-consuming for large images.
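One way around the hard-coded x: prefix is a regular expression with a backreference, so the closing tag has to repeat whatever prefix the opening tag used. This is only a minimal sketch (the names XMPMETA_RE and extract_xmpmeta are illustrative, not from Access-Aide); it still scans the whole file, so it does nothing for the speed problem, and it assumes the xmpmeta element carries some prefix, so an unprefixed `<xmpmeta xmlns=...>` would not match.

```python
import re

# <prefix:xmpmeta ...> ... </prefix:xmpmeta>, where \1 forces the closing tag
# to use the same prefix that the opening tag used.
XMPMETA_RE = re.compile(
    rb"<([A-Za-z_][\w.-]*):xmpmeta\b.*?</\1:xmpmeta\s*>", re.DOTALL
)


def extract_xmpmeta(data: bytes) -> list:
    """Return every xmpmeta packet found in the raw file bytes, in order."""
    return [m.group(0) for m in XMPMETA_RE.finditer(data)]


if __name__ == "__main__":
    sample = b"\xff\xd8junk<y:xmpmeta xmlns:y='adobe:ns:meta/'><r/></y:xmpmeta>tail"
    print(extract_xmpmeta(sample))
```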

So I will probably have to dig into the Pillow getxmp() implementation to find a quicker way to extract just the XML rather than some horrible nested dictionary.
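For JPEG in particular, the XMP packet sits in an APP1 segment whose payload starts with the string `http://ns.adobe.com/xap/1.0/` plus a NUL byte, and the APP segments all come before the compressed scan data. A sketch using only the standard library (the function name jpeg_xmp is hypothetical) that walks the marker segments and stops at Start of Scan, instead of searching the entire file. It assumes a well-formed JPEG and ignores Extended XMP, where a large packet is split across additional APP1 segments with a different signature.

```python
import struct

XMP_SIG = b"http://ns.adobe.com/xap/1.0/\x00"


def jpeg_xmp(data: bytes):
    """Return the raw XMP packet from a JPEG's APP1 segments, or None.

    Only the marker segments are walked; the loop stops at Start of Scan
    (0xFFDA), so the compressed image data is never searched.
    """
    if data[:2] != b"\xff\xd8":                 # SOI marker missing: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                         # lost sync with the marker stream
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):              # SOS or EOI: no more metadata
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                            # standalone markers have no length
            continue
        # segment length is big-endian and includes its own two bytes
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_SIG):
            return payload[len(XMP_SIG):]
        pos += 2 + length
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1] if len(sys.argv) > 1 else "test.jpg", "rb") as f:
        print(jpeg_xmp(f.read()))
```

Because the walk ends at the first SOS marker, its cost depends on the size of the metadata, not the size of the image.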

Before doing all of that, I wonder: how many epub images actually have any XMP metadata at all?

Otherwise this seems to be an exercise in futility: since the metadata takes up room, all the image optimizers I know of (which are regularly run on images before adding them to an epub) remove it completely. Removing all metadata also prevents some image orientation issues.
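To put a rough number on that question, a short script can open an EPUB (which is a ZIP container) and count how many of its images contain an XMP marker at all. This is a sketch with illustrative names; the extension list is an assumption, and the byte search only suggests that an XMP packet is probably present, not that it holds AltTextAccessibility.

```python
import zipfile

# Assumed set of image extensions; adjust as needed.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff")
# Byte strings that commonly appear somewhere in an XMP packet.
XMP_MARKERS = (b"<x:xmpmeta", b"adobe:ns:meta/", b"http://ns.adobe.com/xap/1.0/")


def scan_members(members):
    """Given (name, data) pairs, return (image names with an XMP marker, image count)."""
    hits, total = [], 0
    for name, data in members:
        if not name.lower().endswith(IMAGE_EXTS):
            continue
        total += 1
        if any(marker in data for marker in XMP_MARKERS):
            hits.append(name)
    return hits, total


def images_with_xmp(epub_path):
    """Scan every image stored inside an EPUB archive for XMP markers."""
    with zipfile.ZipFile(epub_path) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(IMAGE_EXTS)]
        # scan_members consumes the generator before the archive is closed
        return scan_members((n, zf.read(n)) for n in names)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        hits, total = images_with_xmp(path)
        print(f"{path}: {len(hits)} of {total} images contain an XMP marker")
```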

So not sure if this is worth the work.

What are people's thoughts on this?

Last edited by KevinH; 05-18-2024 at 12:53 PM.

05-18-2024, 12:46 PM   #17
oston
Connoisseur
Posts: 78
Karma: 2138296
Join Date: Nov 2016
Device: ipad, Kindle Scribe, Kobo Libra 2
Many thanks for looking into this, Kevin.

Based on what I have seen with the images I regularly encounter, seeing alt-text entries in image metadata is very unusual.

I work with a small non-profit publisher, creating epub versions of their print books and the books often have images. But this is the first time I have ever seen images that contain alt-text entries in their metadata.

It was nice not having to write alt-text, but it was also not at all difficult to copy and paste the alt-text using the very user-friendly alt-text feature in Access-Aide.

So I do not think that we need to proceed with this feature.

Another reason not to do anything relating to alt-text in image metadata is that some of the alt-text entries that I saw more properly belonged in an extended description.
See: https://kb.daisy.org/publishing/docs....html#extended
So much of what is actually needed for accessibility depends on the context in which the image is used, so the alt-text in an image's metadata might not suit every context in which that image appears. It's probably better to work directly with each image to provide the best alt-text for the circumstances.

Thanks again for looking into this. It was very helpful because it prompted a deeper dive into this entire topic.

Jim

Last edited by oston; 05-18-2024 at 12:48 PM.

05-18-2024, 01:45 PM   #18
Doitsu
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH
What are people's thoughts on this.
Since IPTC metadata seems to be less commonly used than EXIF metadata, a compromise might be grabbing the ImageDescription EXIF metadata entry with Pillow.

This requires only a few lines of code:
Code:
# -*- coding: utf-8 -*-
import sys
from io import BytesIO
from PIL import Image

def run(bk):
    imgdata = bk.readfile('Image1219.jpg')
    # no .convert() needed: EXIF can be read without decoding the pixel data
    img = Image.open(BytesIO(imgdata))
    image_description = None
    exif = img.getexif()
    # 270 = ImageDescription
    if exif and 270 in exif:
        image_description = exif[270]
        print(image_description)

    return 0

def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())


The code prints the string: A Prince looks out between the bars of a prison window.
(It refers to this image provided by the OP.)
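To try the ImageDescription fallback without hunting for a suitable image, one can build a tiny JPEG in memory that carries the tag and then read it back. A sketch assuming a reasonably recent Pillow; make_jpeg_with_description and read_description are hypothetical helper names, and 270 (0x010E) is the same EXIF ImageDescription tag used in the code above.

```python
from io import BytesIO

from PIL import Image

IMAGE_DESCRIPTION = 270  # EXIF tag 0x010E, ImageDescription


def make_jpeg_with_description(text):
    """Build a small in-memory JPEG whose EXIF block carries an ImageDescription."""
    exif = Image.Exif()
    exif[IMAGE_DESCRIPTION] = text
    buf = BytesIO()
    Image.new("RGB", (8, 8), "white").save(buf, format="JPEG", exif=exif.tobytes())
    return buf.getvalue()


def read_description(data):
    """Return the EXIF ImageDescription from image bytes, or None if absent."""
    with Image.open(BytesIO(data)) as img:
        return img.getexif().get(IMAGE_DESCRIPTION)


if __name__ == "__main__":
    jpeg = make_jpeg_with_description("A Prince looks out between the bars of a prison window.")
    print(read_description(jpeg))
```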

IMHO, automatically extracting some human-generated description with Access-Aide is better than extracting no description at all.

@oston would extracting the ImageDescription information be helpful to you?

05-18-2024, 02:24 PM   #19
oston
Connoisseur
Posts: 78
Karma: 2138296
Join Date: Nov 2016
Device: ipad, Kindle Scribe, Kobo Libra 2
Quote:
Originally Posted by Doitsu
Since IPTC metadata seems to be less commonly used than EXIF metadata, a compromise might be grabbing the
IMHO, automatically extracting some human-generated description with Access-Aide is better than extracting no description at all.

@oston would extracting the ImageDescription information be helpful to you?
Thanks for this information, Doitsu.
I am by no means experienced enough to give a valuable answer. I am just trying to learn as much as I can about making accessible epubs.

Until I saw this latest set of images, I had not seen any image descriptions or alt-text in image metadata.

But hopefully someone who is very experienced with Image Descriptions and accessibility issues will see this and give a more informed answer.

Sorry that I'm not able to be more helpful.

05-18-2024, 03:58 PM   #20
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
Using exif ImageDescription would be easy to add to AccessAide if that helps.

FWIW, I am just so disappointed that Pillow did not return the XML from its getxmp() method instead of a nested mess of dicts and lists. It really makes specific XMP metadata hard to access.
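Walking that nested structure does require a recursive search. A minimal sketch of a generic depth-first key lookup over dicts and lists; find_key is an illustrative name, and the sample data is made up and only loosely resembles getxmp() output, whose exact keys may differ between Pillow versions.

```python
def find_key(node, name):
    """Depth-first search through nested dicts and lists.

    Returns the value stored under the first key equal to `name`,
    or None if no such key exists anywhere in the structure.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                return value
            found = find_key(value, name)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_key(item, name)
            if found is not None:
                return found
    return None


# Hypothetical data shaped loosely like a getxmp() result (illustration only).
sample = {"xmpmeta": {"RDF": {"Description": [
    {"about": ""},
    {"AltTextAccessibility": {"Alt": {"li": [
        {"lang": "x-default", "text": "A Prince looks out"},
        {"lang": "fr", "text": "Un prince regarde dehors"},
    ]}}},
]}}}

alt = find_key(sample, "AltTextAccessibility")
for entry in alt["Alt"]["li"]:
    print(entry["lang"], entry["text"])
```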

05-20-2024, 08:54 AM   #21
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
The Pillow devs kindly gave me a snippet of code that returns the actual XML for all four image types that currently support it (JPEG, PNG, TIFF, and WebP). That makes Pillow the obvious best candidate. So I should be able to query for alt text and, if it is not present, fall back to the EXIF ImageDescription.

I think that might be worth adding to a future version of AccessAide.

05-20-2024, 09:42 AM   #22
oston
Connoisseur
Posts: 78
Karma: 2138296
Join Date: Nov 2016
Device: ipad, Kindle Scribe, Kobo Libra 2
Thanks, very much, Kevin. That will be helpful.

05-20-2024, 02:49 PM   #23
DNSB
Bibliophagist
Posts: 40,536
Karma: 157444380
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
It would be handy and possibly, if the gods are kind, save me from manually adding all alt texts.

05-23-2024, 02:57 PM   #24
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
Access-Aide version 0.9.5 has now been released. It is available via our Sigil Plugin Index as an attachment or from my GitHub repo:

https://github.com/kevinhendricks/Access-Aide

It can now fill EMPTY alt attributes by looking up the image's own metadata for XMP AltTextAccessibility or, failing that, the EXIF ImageDescription.

It will NOT overwrite any existing image alt value.
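The "fill only EMPTY alt attributes" rule can be sketched with the standard library, since EPUB content documents are XHTML and therefore well-formed XML. This is not Access-Aide's code: fill_empty_alts and the lookup callable are hypothetical, an img with no alt attribute at all is left alone, and ElementTree drops the XML declaration and DOCTYPE on output, so a real tool would need to preserve those.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML elements without an ns0: prefix


def fill_empty_alts(xhtml, lookup):
    """Fill alt="" from lookup(src), never touching a non-empty alt.

    `lookup` is a hypothetical callable mapping an img src to alt text,
    returning "" when the image has no usable metadata.
    Returns (new_xhtml, number_of_alts_filled).
    """
    root = ET.fromstring(xhtml)
    filled = 0
    for img in root.iter(f"{{{XHTML_NS}}}img"):
        if img.get("alt") == "":              # only EMPTY alt attributes
            text = lookup(img.get("src", ""))
            if text:
                img.set("alt", text)
                filled += 1
    return ET.tostring(root, encoding="unicode"), filled


if __name__ == "__main__":
    doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
           '<img src="a.jpg" alt=""/><img src="b.jpg" alt="Keep me"/></body></html>')
    print(fill_empty_alts(doc, {"a.jpg": "A Prince looks out"}.get))
```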

Hope this helps,

KevinH

05-23-2024, 03:08 PM   #25
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
In case anyone else wants to add this feature to their own code, here is the sample code:

Code:
from bs4 import BeautifulSoup
from PIL import Image

# extract base language from language code, e.g. "en-US" -> "en"
def baselang(lang):
    if len(lang) > 3:
        if lang[2:3] in "-_":
            return lang[0:2]
    return None

def parse_xmpxml_for_alttext(xmpxml):
    # the 'xml' parser requires lxml to be installed
    xmpmeta = BeautifulSoup(xmpxml, 'xml')
    alt_dict = {}
    if xmpmeta:
        node = xmpmeta.find('AltTextAccessibility')
        if node:
            for element in node.find_all('li'):
                lang = element.get('xml:lang', 'x-default')
                alt_dict[lang] = element.text
                lg = baselang(lang)
                if lg:
                    alt_dict[lg] = element.text
    return alt_dict


def get_image_metadata_alttext(imgpath, tgtlang):
    xmpxml = None
    description = ""
    with Image.open(imgpath) as im:
        if im.format == 'WebP':
            if "xmp" in im.info:
                xmpxml = im.info["xmp"]
        elif im.format == 'PNG':
            if "XML:com.adobe.xmp" in im.info:
                xmpxml = im.info["XML:com.adobe.xmp"]
        elif im.format == 'TIFF':
            # tag 700 = XMP
            if 700 in im.tag_v2:
                xmpxml = im.tag_v2[700]
        elif im.format == 'JPEG':
            for segment, content in im.applist:
                if segment == "APP1":
                    # partition() cannot fail, even if the segment has no NUL byte
                    marker, _, xmp_tags = content.partition(b"\x00")
                    if marker == b"http://ns.adobe.com/xap/1.0/":
                        xmpxml = xmp_tags
                        break
        exif = im.getexif()
        # 270 = ImageDescription
        if exif and 270 in exif:
            description = exif[270]
    if not xmpxml:
        return description
    alt_dict = parse_xmpxml_for_alttext(xmpxml)
    # first try full language code match
    if tgtlang in alt_dict:
        return alt_dict[tgtlang]
    # next try base language code match
    lg = baselang(tgtlang)
    if lg and lg in alt_dict:
        return alt_dict[lg]
    # use default
    if 'x-default' in alt_dict:
        return alt_dict['x-default']
    # otherwise fall back to exif image description
    return description


imgpath = "test.jpg"
lang = 'en-US'
print(get_image_metadata_alttext(imgpath, lang))

Last edited by KevinH; 05-24-2024 at 12:57 PM.

05-23-2024, 03:37 PM   #26
BeckyEbook
Guru
Posts: 783
Karma: 2298438
Join Date: Jan 2017
Location: Poland
Device: Various
@KevinH: It is essential to add a try/except around line 482, as it throws an error if there is no metadata in the image.

Code:
Traceback (most recent call last):
  File "C:\Program Files\Sigil\plugin_launchers\python\launcher.py", line 142, in launch
    self.exitcode = target_script.run(container)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Becky\AppData\Local\sigil-ebook\sigil\plugins\Access-Aide\plugin.py", line 482, in run
    alttext = get_image_metadata_alttext(imgpath, plang)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Becky\AppData\Local\sigil-ebook\sigil\plugins\Access-Aide\plugin.py", line 238, in get_image_metadata_alttext
    xmpxml = im.info["XML:com.adobe.xmp"]
             ~~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'XML:com.adobe.xmp'
Error: 'XML:com.adobe.xmp'
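The try/except approach suggested here can be sketched as a small wrapper that treats any metadata failure as "no alt text". Checking for the key before indexing avoids this particular KeyError; a wrapper like this additionally guards against other failures. safe_alttext is a hypothetical name, lookup stands in for a function such as get_image_metadata_alttext, and which exceptions to catch is a judgment call. OSError is included on the assumption that unreadable images raise it (Pillow's UnidentifiedImageError is a subclass of OSError).

```python
def safe_alttext(lookup, imgpath, lang, log=print):
    """Call an alt-text lookup, returning "" instead of raising on bad metadata.

    `lookup(imgpath, lang)` is any function returning alt text (or None/"").
    Failures are reported through `log` so they are not silently lost.
    """
    try:
        return lookup(imgpath, lang) or ""
    except (KeyError, ValueError, OSError) as err:
        log(f"metadata lookup failed for {imgpath}: {err!r}")
        return ""


if __name__ == "__main__":
    def broken_lookup(path, lang):
        return {}["XML:com.adobe.xmp"]  # simulates the KeyError in the traceback

    print(repr(safe_alttext(broken_lookup, "Images/cover.png", "en-US")))
```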

05-23-2024, 03:44 PM   #27
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
I will check for that key first to prevent the KeyError.

Thanks!

05-23-2024, 04:03 PM   #28
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
Should now be fixed in v0.9.6 just posted.

Thank you @BeckyEbook!

05-23-2024, 11:06 PM   #29
DNSB
Bibliophagist
Posts: 40,536
Karma: 157444380
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Tested 0.9.6 on 4 ePubs with images. One worked well since it had decent metadata, one worked on 4 out of 10 images, the last two had no useful metadata. Still going to save me time and effort so thanks very much!

05-24-2024, 01:03 PM   #30
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
FYI: there is an indentation (whitespace) issue, so a new version of Access-Aide (this time 0.9.7) will be coming later this evening to fix it. It only affects JPEG images with multiple APP1 segments, none of which are XMP metadata.
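The thread does not show the faulty lines, so this is only a sketch of a segment loop that handles the case described: it keeps scanning past non-XMP APP1 segments (such as an Exif block) and returns None when none of them matches. It works on Pillow-style (segment, content) pairs such as `im.applist`; xmp_from_applist is a hypothetical name. Using `bytes.partition` rather than `split(...)[:2]` also avoids a ValueError on an APP1 payload with no NUL byte.

```python
XMP_MARKER = b"http://ns.adobe.com/xap/1.0/"


def xmp_from_applist(applist):
    """Return the XMP packet from (segment, content) pairs, or None.

    The early return sits inside the marker check, so a non-XMP APP1
    segment never ends the search early.
    """
    for segment, content in applist:
        if segment != "APP1":
            continue
        marker, _, packet = content.partition(b"\x00")
        if marker == XMP_MARKER:
            return packet
    return None


if __name__ == "__main__":
    segments = [("APP1", b"Exif\x00\x00II*"), ("APP1", XMP_MARKER + b"\x00<x:xmpmeta/>")]
    print(xmp_from_applist(segments))
```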

So the alt_text in your 4 epubs should be correct as is.

Update:

Version 0.9.7 just posted has this new fix. Hopefully the last one.

Last edited by KevinH; 05-24-2024 at 03:22 PM.

All times are GMT -4.