Searching within tags

Doitsu · 03-29-2020, 03:49 AM

Quote:

Originally Posted by carmenchu

Well: so far, in 'non greedy' mode,
(?<=\>)\b([^<]+)(?=\</) selects between tags, not nested
(?<=\>)\b([^<]+)(?=\<) skips tags.
Useful when the mouse gets temperamental, and one wishes to manually extract/move some text.

Thanks for the Sigil User Guide and the links to regex references.
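To see what the two quoted expressions actually select, here is a small check using Python's re module. This is only a sketch: Sigil itself uses PCRE, the sample string is made up, and the two pattern names are hypothetical labels, but these particular patterns behave the same way in both engines.

```python
import re

# Patterns copied from the quoted post (the backslashes before < and > are harmless).
BETWEEN_TAGS = re.compile(r'(?<=\>)\b([^<]+)(?=\</)')   # text followed by a closing tag
BEFORE_ANY_TAG = re.compile(r'(?<=\>)\b([^<]+)(?=\<)')  # text followed by any tag

# Made-up sample markup for illustration.
sample = '<p>Hello <i>world</i> again</p>'

innermost = BETWEEN_TAGS.findall(sample)
any_text = BEFORE_ANY_TAG.findall(sample)

print(innermost)  # ['world']
print(any_text)   # ['Hello ', 'world']
```

Note that " again" is not matched by either pattern: the \b right after the lookbehind requires the text to start with a word character, so text that begins with a space directly after a tag is skipped.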

If you have basic programming skills, you could also write an ad-hoc Sigil plugin using the BeautifulSoup library, which is bundled with Sigil, to manipulate tags. (The Sigil API documentation is here.)
This will save you the hassle of coming up with complex regular expressions.

For example, a minimal plugin can look for <span> tags with a Calibre13 class and replace them with <b> tags. (All other <span> tags will be removed, but their text is kept.)

Before:

Code:

<p>This should be <span class="Calibre6 Calibre13 Calibre2">bolded</span>. <span class="Calibre2">This span is redundant</span> <span>and this span should also be deleted.</span></p>

After:

Code:

<p>This should be <b>bolded</b>. This span is redundant and this span should also be deleted.</p>
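A minimal sketch of a plugin.py that produces this Before/After result could look like the code below. It is not necessarily the code from the original post. Assumptions: the bundled library is importable as sigil_bs4, the plugin entry point is run(bk), and bk offers text_iter(), readfile() and writefile(); convert_spans() is a hypothetical helper name. The fallback import only exists so the function can be tried outside Sigil.

```python
try:
    # Assumption: inside Sigil the bundled BeautifulSoup lives in sigil_bs4.
    from sigil_bs4 import BeautifulSoup
except ImportError:
    # Outside Sigil, use the regular bs4 package for testing.
    from bs4 import BeautifulSoup


def convert_spans(html):
    """Turn Calibre13 spans into <b> tags and unwrap all other spans."""
    soup = BeautifulSoup(html, 'html.parser')
    for span in soup.find_all('span'):
        # 'class' is a list of class names; spans without a class yield [].
        if 'Calibre13' in span.get('class', []):
            span.name = 'b'   # rename the tag
            span.attrs = {}   # drop the class attribute
        else:
            span.unwrap()     # remove the tag but keep its contents
    return str(soup)


def run(bk):
    # Assumed Sigil entry point: process every XHTML file in the book.
    for file_id, href in bk.text_iter():
        bk.writefile(file_id, convert_spans(bk.readfile(file_id)))
    return 0
```

Keeping the tag handling in its own function means it can be checked on a plain string before it is allowed to touch a book.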

If you want to test the plugin code:

1. Create a MyPlugin folder in the Sigil plugins folder.
2. Save the plugin code as plugin.py in that folder.
3. Create a plugin.xml file describing the plugin and also save it in the MyPlugin folder.

(To run the plugin, select Plugins > Edit > MyPlugin.)

carmenchu · 04-06-2020, 10:16 AM

Quote:

Originally Posted by Doitsu

If you have basic programming skills, you could also write an ad-hoc Sigil plugin using the BeautifulSoup library, which is bundled with Sigil, to manipulate tags. (The Sigil API documentation is here.)...

Thanks: very useful for what I am trying to do as a plugin.
Only, I do need a little help with syntax to make this modified code work:

1. how to pass to soup.find_all() a list of tags as argument
2. how to rework

Code:

if 'calibre' in tag['class']

so that it would match a substring, e.g., 'calibre15'.
3. Would the code work as well for selecting <meta... /> tag by 'name' and deleting it? How?
Maybe it's trivial, but I am green: Python 2.x for GIMP is the farthest I have gone, and I couldn't make anything of your link.

Thanks!

* Sorry for the delay: too many irons...
** Does this get 'off topic'? (better in plug-ins)

Doitsu · 04-06-2020, 10:45 AM

Quote:

Originally Posted by carmenchu

1. how to pass to soup.find_all() a list of tags as argument

You can use a list as the search parameter. For example:

Code:

tags = soup.find_all(['i', 'b', 'small', 'br'])
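As a quick self-contained check of the list form (the sample markup is made up, and the regular bs4 package is used here instead of Sigil's bundled copy):

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html = ('<p>Some <i>italic</i>, <b>bold</b> and <small>small</small> '
        'text.<br/><span>Untouched</span></p>')
soup = BeautifulSoup(html, 'html.parser')

# A list matches any tag whose name is in the list, in document order.
tags = soup.find_all(['i', 'b', 'small', 'br'])
names = [tag.name for tag in tags]

print(names)
```

The <span> and <p> tags are not returned because their names are not in the list.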

Quote:

Originally Posted by carmenchu

2. how to rework

Code:

if 'calibre' in tag['class']

so that it would match a substring, e.g., 'calibre15'.

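You can use a regular expression with find_all(). Note that tag['class'] is a list of class names, so 'calibre' in tag['class'] only matches a class that is exactly 'calibre'. A compiled pattern passed as the class_ argument is checked against each class name instead.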
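A small sketch of the substring match, assuming the regular bs4 package and made-up sample markup. It shows both the regular expression route and a hand-written test that stays closer to the original if statement:

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html = ('<p><span class="calibre15">one</span> '
        '<span class="bold calibre2">two</span> '
        '<span class="other">three</span> '
        '<span>four</span></p>')
soup = BeautifulSoup(html, 'html.parser')

# A compiled pattern passed as class_ is searched in each class name separately.
by_regex = soup.find_all('span', class_=re.compile('calibre'))

# The same test written by hand; spans without a class yield an empty list.
by_loop = [tag for tag in soup.find_all('span')
           if any('calibre' in name for name in tag.get('class', []))]

matched_text = [tag.get_text() for tag in by_regex]
print(matched_text)
```

Both approaches pick the first two spans. The pattern is case-sensitive, so 'calibre' would not match a class written as 'Calibre13' unless re.IGNORECASE is added.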

Quote:

Originally Posted by carmenchu

3. Would the code work as well for selecting <meta... /> tag by 'name'

Yes, they're treated like all other tags.

Quote:

Originally Posted by carmenchu

[...]and deleting it? How?

You can delete tags with decompose(). However, since this modifies the "soup," it should be done last.
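For the <meta> case, a sketch could look like this (regular bs4 package, made-up markup, and delete_meta_by_name() is a hypothetical helper name). One catch: the first parameter of find_all() is itself called name, so the name attribute has to be passed through attrs rather than as a keyword argument.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html = ('<html><head><title>Test</title>'
        '<meta name="generator" content="calibre"/>'
        '<meta name="author" content="Someone"/>'
        '</head><body><p>Text</p></body></html>')


def delete_meta_by_name(html, meta_name):
    """Remove every <meta> tag whose name attribute equals meta_name."""
    soup = BeautifulSoup(html, 'html.parser')
    # attrs={'name': ...} filters on the attribute; name=... would clash
    # with the tag-name parameter of find_all().
    for meta in soup.find_all('meta', attrs={'name': meta_name}):
        meta.decompose()  # removes the tag from the soup and destroys it
    return soup


soup = delete_meta_by_name(html, 'generator')
remaining = [meta['name'] for meta in soup.find_all('meta')]
print(remaining)
```

find_all() returns a list, so decomposing tags inside the loop does not disturb the iteration.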

Quote:

Originally Posted by carmenchu

** Does this get 'off topic'? (better in plug-ins)

I also think it should be moved to plugins.
