Old 03-29-2020, 03:49 AM   #1
Doitsu
Grand Sorcerer
Posts: 5,625
Karma: 23190435
Join Date: Dec 2010
Device: Kindle PW2
Searching within tags

Quote:
Originally Posted by carmenchu View Post
Well: so far, in 'non greedy' mode,
(?<=\>)\b([^<]+)(?=\</) selects between tags, not nested
(?<=\>)\b([^<]+)(?=\<) skips tags.
Useful when the mouse gets temperamental, and one wishes to manually extract/move some text.
Thanks for the Sigil User Guide and the links to regex references.
If you have basic programming skills, you could also write an ad-hoc Sigil plugin using the BeautifulSoup library, which is bundled with Sigil, to manipulate tags. (The Sigil API documentation is here.)
This will save you the hassle of coming up with complex regular expressions.

For example the following minimal plugin code:

Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup

def run(bk):

    # get all html files
    for (html_id, href) in bk.text_iter():
        file_name = os.path.basename(href)
        html = bk.readfile(html_id)
        
        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)
        
        # get all span tags
        spans = soup.find_all('span')
        for span in spans:
            if 'class' in span.attrs:
                if 'Calibre13' in span['class']:
                    # remove class attribute
                    del span['class']
                    # change <span> to <b>
                    span.name = 'b'
                else:
                    # delete <span> tags with other classes
                    span.unwrap()
            else:
                # delete <span> tags w/o classes
                span.unwrap()

        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            print(file_name, 'updated')

    print('Done')
    return 0


will look for span tags with a Calibre13 class and replace them with <b> tags. (All other <span> tags will be deleted.)

Before:

Code:
<p>This should be <span class="Calibre6 Calibre13 Calibre2">bolded</span>. <span class="Calibre2">This span is redundant</span> <span>and this span should also be deleted.</span></p>
After:

Code:
<p>This should be <b>bolded</b>. This span is redundant and this span should also be deleted.</p>
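
If you'd like to try the span logic before installing anything in Sigil, it can be run on its own. This is only a sketch: it assumes the stock bs4 package behaves like Sigil's bundled sigil_bs4 for this case, and clean_spans is a made-up helper name, not part of the Sigil API.

```python
from bs4 import BeautifulSoup  # assumption: stock bs4 stands in for sigil_bs4


def clean_spans(html, keep_class='Calibre13'):
    """Turn <span> tags carrying keep_class into <b>; unwrap all other <span> tags."""
    soup = BeautifulSoup(html, 'html.parser')
    for span in soup.find_all('span'):
        # get() returns the list of class names, or [] if there is no class attribute
        if keep_class in span.get('class', []):
            del span['class']   # remove class attribute
            span.name = 'b'     # change <span> to <b>
        else:
            span.unwrap()       # drop the tag, keep its text
    return str(soup)


before = ('<p>This should be <span class="Calibre6 Calibre13 Calibre2">bolded</span>. '
          '<span class="Calibre2">This span is redundant</span> '
          '<span>and this span should also be deleted.</span></p>')
after = clean_spans(before)
print(after)
```

Using span.get('class', []) folds the two unwrap() branches of the plugin into one, since a missing class attribute and a non-matching class are handled the same way.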
If you want to test the plugin code:
  • Create a MyPlugin folder in the Sigil plugins folder
  • Save the plugin code as plugin.py in that folder.
  • Create a plugin.xml file with the following contents:
    Spoiler:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <plugin>
        <name>MyPlugin</name>
        <type>edit</type>
        <autostart>true</autostart>
        <author>carmenchu</author>
        <description>bs4 test</description>
        <engine>python3.4</engine>
        <version>0.0.1</version>
        <oslist>unx,win,osx</oslist>
    </plugin>

    and also save it in the MyPlugin folder.
(To run the plugin, select Plugins > Edit > MyPlugin.)

Last edited by Doitsu; 03-29-2020 at 03:55 AM.
Old 04-06-2020, 10:16 AM   #2
carmenchu
Groupie
Posts: 183
Karma: 266070
Join Date: Dec 2010
Device: Win7,Win10,Lubuntu,smartphone
Quote:
Originally Posted by Doitsu View Post
If you have basic programming skills, you could also write an ad-hoc Sigil plugin using the BeautifulSoup library, which is bundled with Sigil, to manipulate tags. (The Sigil API documentation is here.)...
Thanks: very useful for what I am trying to do as a plugin.
Only, I do need a little help with syntax to make this modified code work:
Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup

def run(bk):

    # get all html files
    for (html_id, href) in bk.text_iter():
        file_name = os.path.basename(href)
        html = bk.readfile(html_id)

        # convert html to soup
        soup = BeautifulSoup(html, 'html.parser')
        orig_html = str(soup)

        # get all i tags
        italics = soup.find_all('i') # how for 'i', 'b', 'small', 'br', 'h1/2/3...'
        for i in italics:
            if 'class' in i.attrs:
                print(file_name, 'found') # finds
                if 'calibre' in i['class']:
                    # remove class attribute
                    print(file_name, 'found attrib') # doesn't find "calibre3"
                    del i['class']
                    # # change <span> to <b>
                    # span.name = 'b'
                # else:
                    # # delete <span> tags with other classes
                    # span.unwrap()
            # else:
                # # delete <span> tags w/o classes
                # span.unwrap()

        # update file with changes
        if str(soup) != orig_html:
            bk.writefile(html_id, str(soup))
            print(file_name, 'updated')

    print('Done')
    return 0

1. how to pass to soup.find_all() a list of tags as argument
2. how to rework
Code:
if 'calibre' in tag['class']
so that it would match a substring, i.e., 'calibre15'.
3. Would the code work as well for selecting <meta... /> tag by 'name' and deleting it? How?
Maybe it's trivial, but I am green: Python 2.x for GIMP is the farthest I have gone, and I couldn't make anything of your link.
Thanks!

* Sorry for the delay: too many irons...
** Does this get 'out of topic'? (better in plug-ins)
Old 04-06-2020, 10:45 AM   #3
Doitsu
Grand Sorcerer
Quote:
Originally Posted by carmenchu View Post
1. how to pass to soup.find_all() a list of tags as argument
You can use a list as the search parameter. For example:
Code:
tags = soup.find_all(['i', 'b', 'small', 'br'])
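
For the 'h1/2/3...' part of the question, the heading names can simply be added to the same list. A small sketch, again assuming stock bs4 in place of sigil_bs4; the sample markup is made up for illustration:

```python
from bs4 import BeautifulSoup  # assumption: stock bs4 stands in for sigil_bs4

html = ('<h1>Title</h1><p><i>one</i> <b>two</b> <small>three</small>'
        '<br/>four</p><h2>Sub</h2>')
soup = BeautifulSoup(html, 'html.parser')

# build the list once: inline tags plus h1..h6
wanted = ['i', 'b', 'small', 'br'] + ['h%d' % n for n in range(1, 7)]

# find_all() returns the matches in document order
tags = soup.find_all(wanted)
names = [tag.name for tag in tags]
print(names)
```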
Quote:
Originally Posted by carmenchu View Post
2. how to rework
Code:
if 'calibre' in tag['class']
so that it would match a substring, i.e., 'calibre15'.
You can use a regular expression with find_all().
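
The reason the original test never fires is that tag['class'] is a list of class names, so 'calibre' in tag['class'] only matches a class that is exactly 'calibre'. A rough sketch of both ways around that, assuming stock bs4 in place of sigil_bs4; the pattern and the sample markup are only illustrations:

```python
import re
from bs4 import BeautifulSoup  # assumption: stock bs4 stands in for sigil_bs4

html = '<p><i class="calibre3">a</i> <i class="note">b</i> <i>c</i></p>'
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('i')
# list membership: True only if some class is exactly 'calibre'
exact_match = 'calibre' in first['class']
# substring test against each class name in turn
substring_match = any('calibre' in c for c in first['class'])
print(exact_match, substring_match)

# let find_all() do the filtering: the pattern is tried against each class name
for tag in soup.find_all('i', class_=re.compile(r'^calibre\d*$')):
    del tag['class']  # removes the whole class attribute, as in the plugin

result = str(soup)
print(result)
```

Note that del tag['class'] drops every class on the tag, not only the calibre one, which is what the posted plugin does as well.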

Quote:
Originally Posted by carmenchu View Post
3. Would the code work as well for selecting <meta... /> tag by 'name'
Yes, they're treated like all other tags.

Quote:
Originally Posted by carmenchu View Post
[...]and deleting it? How?
You can delete tags with decompose(). However, since this modifies the "soup," it should be done last.
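
A minimal sketch of selecting a <meta> tag by its name attribute and deleting it, assuming stock bs4 in place of sigil_bs4. The 'generator' value is just a made-up example. Since name is also the first parameter of find_all(), the attribute has to go through attrs:

```python
from bs4 import BeautifulSoup  # assumption: stock bs4 stands in for sigil_bs4

html = ('<html><head><title>t</title>'
        '<meta name="generator" content="calibre"/>'
        '<meta name="author" content="x"/>'
        '</head><body><p>text</p></body></html>')
soup = BeautifulSoup(html, 'html.parser')

# find_all(name='generator') would look for <generator> tags,
# so the name attribute is passed via the attrs dictionary instead
for meta in soup.find_all('meta', attrs={'name': 'generator'}):
    meta.decompose()  # removes the tag and destroys it

remaining = [m.get('name') for m in soup.find_all('meta')]
print(remaining)
```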

Quote:
Originally Posted by carmenchu View Post
** Does this get 'out of topic'? (better in plug-ins)
I also think it should be moved to plugins.