03-27-2021, 10:11 PM | #1 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2020
Device: epub
|
Problems with Beautifulsoup with custom tags
Hi! I'm having trouble adding a custom tag from my plugin using BeautifulSoup.
The code:
Code:
from bs4 import BeautifulSoup

html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>'
## BeautifulSoup parser
soup = BeautifulSoup(html, "html.parser")
orig_soup = str(soup)
original_tag = soup.p
dict_atributes = {"xml:lang" : "la"}
new_tag = soup.new_tag("i", attrs=dict_atributes)
new_tag.string = "Ibid"
original_tag.insert(1, " ")
original_tag.insert(2, new_tag)
original_tag.insert(3, ".")
print("OUT:\n" + str(original_tag))
Run from the command line:
Code:
$ python test.py
OUT:
<p id="nt3"><sup>[3]</sup> <i xml:lang="la">Ibid</i>. Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>
Run as a Sigil plugin:
Code:
OUT:
<p id="nt3"><sup>[3]</sup> <i attrs="{'xml:lang': 'la'}">Ibid</i>. Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>
Thanks!
PS: Using Python 3.8 and Sigil 1.4.3 |
03-27-2021, 11:49 PM | #2 |
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
|
What are the double XML-escaped "<" characters in the text for?
How are you getting the OUT? If you print it from the plugin, it goes through an XML encode and decode pass when it is returned from the plugin process over stdout as XML. So instead of printing to see this value, simply write it to a log file from the plugin so you can see exactly what BeautifulSoup is generating. My guess is that it is identical to what you see outside, and it is just getting decoded when it is passed back in the stdout XML from the plugin. |
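A minimal sketch of the log-file approach described above. The helper name `log_exact` and the default file name `plugin_debug.log` are assumptions for illustration, not part of Sigil's plugin API; the sketch only uses the Python standard library.

```python
import os
import tempfile


def log_exact(text, path=None):
    """Append text to a log file unchanged and return the path used.

    Hypothetical helper: writing straight to a file avoids any
    encoding or decoding applied to what the plugin prints on stdout.
    """
    if path is None:
        # Assumed default location; any writable path would do.
        path = os.path.join(tempfile.gettempdir(), "plugin_debug.log")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return path
```

Inside the plugin's run(bk) this could be called as log_exact(str(original_tag)), and the file then inspected with a text editor.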
03-28-2021, 12:37 AM | #3 | |||
Member
|
They let the reader return to the place in the text where the note is referenced, like a back button.
Quote:
I get it from the output of the print() function shown in the Plugin Runner. That output is consistent with what bk.writefile() writes.
Quote:
Here is the exact code:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os, re
import xml.etree.ElementTree as ET

try:
    from sigil_bs4 import BeautifulSoup
except:
    from bs4 import BeautifulSoup


def run(bk):
    html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>'
    ## BeautifulSoup parser
    soup = BeautifulSoup(html, "html.parser")
    orig_soup = str(soup)
    original_tag = soup.p
    dict_atributes = {"xml:lang" : "la"}
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")
    output = "OUT:\n" + str(original_tag)
    f = open("log.txt", "w")
    f.write(output)
    f.close()
    print(output)
    return 0


def main():
    html = '<p id="nt3"><sup>[3]</sup> Note 1. <a href="../Text/Section0001.xhtml#nt3"><<</a></p>'
    ## BeautifulSoup parser
    soup = BeautifulSoup(html, "html.parser")
    orig_soup = str(soup)
    original_tag = soup.p
    dict_atributes = {"xml:lang" : "la"}
    new_tag = soup.new_tag("i", attrs=dict_atributes)
    new_tag.string = "Ibid"
    original_tag.insert(1, " ")
    original_tag.insert(2, new_tag)
    original_tag.insert(3, ".")
    output = "OUT:\n" + str(original_tag)
    f = open("log.txt", "w")
    f.write(output)
    f.close()
    print(output)


if __name__ == "__main__":
    sys.exit(main())
Last edited by ebray187; 03-28-2021 at 12:40 AM. |
|||
03-28-2021, 12:49 PM | #4 |
Sigil Developer
|
If you compare that to your first post you will see they are not the same. The printed output shows the "<<" decoded, which it should not be if it is to be used safely.
The issue is that you are trying to pass the attributes as a dict. It is converted to what is needed when run outside of the plugin environment but not inside. My guess is that the default dict type is different: one may be an ordered dict collection while the other is not. Have you tried assigning that attribute in a different way? Sigil's internal bs4 version has many modifications so that it works on older Python 3 versions back to 3.4, so it may be using different types than a recent BS4 version that only runs on a limited set of Python 3 versions. |
03-28-2021, 12:53 PM | #5 | |
Sigil Developer
|
I did notice this:
Quote:
Last edited by KevinH; 03-28-2021 at 12:57 PM. |
|
03-28-2021, 01:03 PM | #6 | |
Sigil Developer
|
Here are alternative ways to add an attribute ...
Quote:
|
|
03-28-2021, 01:18 PM | #7 |
Sigil Developer
|
I took a peek at the latest BS4 source on Launchpad and they have changed how they handle passing the attrs argument.
So doing it in two steps will be more compatible with other versions of both bs4 and Python 3. |
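A rough illustration of how a change in how attrs is handled could produce the two outputs seen in the first post. The two functions below are hypothetical stand-ins, not the real bs4 or sigil_bs4 source, and the assumption that the difference comes down to the function signature is only consistent with the output shown in the thread, not confirmed by it.

```python
def new_tag_kwargs_only(name, **attrs):
    # Hypothetical signature: every keyword argument becomes an attribute,
    # so attrs={...} turns into one attribute literally named "attrs".
    return name, dict(attrs)


def new_tag_with_attrs_param(name, attrs=None, **kwattrs):
    # Hypothetical signature: an explicit attrs dict is unpacked into
    # individual attributes, then any extra keywords are merged in.
    merged = dict(attrs or {})
    merged.update(kwattrs)
    return name, merged


def render(name, attributes):
    # Tiny serializer for the demonstration only (no escaping).
    parts = "".join(' {}="{}"'.format(k, v) for k, v in attributes.items())
    return "<{0}{1}></{0}>".format(name, parts)


wanted = {"xml:lang": "la"}
print(render(*new_tag_kwargs_only("i", attrs=wanted)))
# <i attrs="{'xml:lang': 'la'}"></i>
print(render(*new_tag_with_attrs_param("i", attrs=wanted)))
# <i xml:lang="la"></i>
```

Under that assumption, creating the tag first and setting the attribute afterwards sidesteps the difference entirely, because neither signature is asked to interpret the dict.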
03-28-2021, 01:21 PM | #8 | |
Member
|
Quote:
Last edited by ebray187; 03-28-2021 at 01:25 PM. |
|
03-28-2021, 01:28 PM | #9 |
Sigil Developer
|
There is a fully HTML5-compliant gumbo parser already there, as well as a very simple serial parser called quickparser, and there is also an html5lib parser that is guaranteed to be there for use by Sigil plugins.
Surely one of those will do what you need. As for using bs4, as long as you split the new_tag creation from the attribute addition in that piece, it works on all versions of BS4 and back to Python 3.4. |
03-28-2021, 01:31 PM | #10 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It (the colon) should just be part of a string when used in an attribute name.
Use tag["xml:lang"] = "la" to be more compatible with all versions of BeautifulSoup. |
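A minimal sketch of the two-step approach, assuming only that the soup object offers new_tag(name) and that tags accept item assignment, as in the line above. The helper name new_tag_with_attrs and the demo function are hypothetical and not part of bs4 or Sigil.

```python
def new_tag_with_attrs(soup, name, attributes):
    """Create a tag first, then set its attributes one by one.

    Only soup.new_tag(name) and tag[key] = value are used, so no
    attrs dict is ever handed to new_tag itself.
    """
    tag = soup.new_tag(name)
    for key, value in attributes.items():
        tag[key] = value
    return tag


def demo():
    # Imported lazily so the helper above has no hard dependency.
    try:
        from sigil_bs4 import BeautifulSoup
    except ImportError:
        from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p id="nt3"><sup>[3]</sup> Note 1.</p>', "html.parser")
    tag = new_tag_with_attrs(soup, "i", {"xml:lang": "la"})
    tag.string = "Ibid"
    soup.p.insert(1, tag)
    return str(soup.p)
```

In the plugin from the earlier posts, the new_tag line would become new_tag = new_tag_with_attrs(soup, "i", dict_atributes), with the rest of the code unchanged.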
03-28-2021, 01:41 PM | #11 | |
Member
|
Quote:
Thanks KevinH for your help. Last edited by ebray187; 03-28-2021 at 01:43 PM. |
|