08-06-2024, 08:08 AM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2024
Device: none
|
XHTML closing tag issue
Simply by parsing and committing an xhtml file, using the container, I see the following changes to the content.
Before parsing:
<span x="y"/> <div id="w"/>
After committing:
<span x="y"></span> <div id="w"></div>
Is there a 'tweak' setting or other method I can use to preserve the compact form for tags with only attributes and no content? |
08-06-2024, 08:43 AM | #2 |
creator of calibre
Posts: 44,668
Karma: 24966646
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No there isn't, IIRC.
|
08-07-2024, 12:40 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2024
Device: none
|
I have not looked at the Calibre code. Hopefully there is an option on the serializer to use compact closing tags (i.e. <sometag/>) instead of <sometag></sometag>. If this is the case, how difficult do you think it would be to add a tweak to set this option? Is this something you or I can look at? I would need a few hints where to start looking.
|
08-07-2024, 12:44 PM | #4 |
creator of calibre
Posts: 44,668
Karma: 24966646
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This is done deliberately, see line 441 in oeb/base.py
|
08-07-2024, 02:20 PM | #5 | |
Resident Curmudgeon
Posts: 77,028
Karma: 138588794
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
08-08-2024, 07:25 AM | #6 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2024
Device: none
|
I see the comment about some browser based renderers. I wonder if this is still true. I've no idea when this code was written. But my boss needs self closing tags. So if, at line 160 we had:
Code:
def close_self_closing_tags(raw):
    if self_closing_tags_permitted():
        return raw
    for_unicode = isinstance(raw, str)
    repl = as_string_type(r'<\g<tag>\g<arg>></\g<tag>>', for_unicode)
    pat = self_closing_pat(for_unicode)
    return pat.sub(repl, raw)
|
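For anyone following along without the calibre source open, here is a minimal, self-contained sketch of the kind of rewrite being discussed: a regular expression that expands `<tag .../>` into `<tag ...></tag>`, with a switch to skip the rewrite. It is not calibre's actual code. The tag set `NON_VOID_TAGS`, the flag `PERMIT_SELF_CLOSING`, and the function name `expand_self_closing` are hypothetical and only reuse the `tag`/`arg` group names from the replacement string in the snippet above.

```python
import re

# Hypothetical, deliberately small set of container (non-void) tags.
NON_VOID_TAGS = ('span', 'div', 'p', 'a', 'script', 'title')

# Hypothetical switch standing in for the proposed tweak.
PERMIT_SELF_CLOSING = False

# Matches <tag ...attributes.../> where tag is one of the container tags.
# The lookahead ensures 'a' does not match the start of a longer tag name.
SELF_CLOSING_PAT = re.compile(
    r'<(?P<tag>%s)(?=[\s/])(?P<arg>[^>]*)/>' % '|'.join(NON_VOID_TAGS)
)


def expand_self_closing(raw, permit=PERMIT_SELF_CLOSING):
    """Rewrite <tag .../> as <tag ...></tag> unless self-closing is permitted."""
    if permit:
        return raw
    return SELF_CLOSING_PAT.sub(r'<\g<tag>\g<arg>></\g<tag>>', raw)


print(expand_self_closing('<span x="y"/> <div id="w"/>'))
print(expand_self_closing('<span x="y"/>', permit=True))
```

Tags outside the set, such as `<br/>`, are left untouched, which is the point of restricting the pattern to container elements.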
08-08-2024, 08:54 AM | #7 |
creator of calibre
Posts: 44,668
Karma: 24966646
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'm not going to change that as it is a completely cosmetic change that has the potential to break rendering in the real world. You are welcome to run calibre from source and make the change for yourself.
|
08-08-2024, 10:47 AM | #8 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2024
Device: none
|
Quote:
Implementing such a change myself (apart from not having a Linux system to do the build) would mean our Client could no longer install any future Calibre updates. Fingers crossed, best regards Paul |
|
08-08-2024, 11:14 AM | #9 |
creator of calibre
Posts: 44,668
Karma: 24966646
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
<tag></tag> is perfectly well formed XML. I suggest you tell your client to read the XML specifications.
You don't need Linux to do a build; indeed, you don't need to do a build at all. calibre can run from source without being built, see https://manual.calibre-ebook.com/develop.html
I am not going to accept this change in calibre as it has the potential to break things for people for no actual benefit to any calibre user. |
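The point that both spellings are the same XML can be checked with a short sketch. This uses Python's standard-library `xml.etree.ElementTree`, which is an assumption made for illustration and not the serializer calibre itself uses; the helper `same_element` is hypothetical. It parses both forms, compares them, and shows that the choice of spelling is made only at serialization time.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring('<span x="y"/>')
expanded = ET.fromstring('<span x="y"></span>')


def same_element(a, b):
    """Compare tag, attributes and text of two childless elements."""
    return (a.tag, a.attrib, a.text or '') == (b.tag, b.attrib, b.text or '')


# Both spellings parse to the same element.
print(same_element(compact, expanded))

# The serializer, not the parsed tree, decides which spelling is written.
short_form = ET.tostring(compact, encoding='unicode')
long_form = ET.tostring(compact, encoding='unicode', short_empty_elements=False)
print(short_form)
print(long_form)
```

Once the file has been parsed, the original spelling is no longer recorded in the tree, so preserving it would need an option on the serializer rather than on the parser.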
08-08-2024, 05:08 PM | #10 |
Bibliophagist
Posts: 41,442
Karma: 158775170
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Is that first line in your sample real? The x in x="y" is not a valid attribute.
Code:
Before parsing:
<span x="y"/> <div id="w"/>
After committing:
<span x="y"></span> <div id="w"></div>
|
08-08-2024, 05:54 PM | #11 | ||
Reading till the spring
Posts: 12,571
Karma: 94058919
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Quote:
First line before opening <html>
<?xml version='1.0' encoding='utf-8'?>
Only inside <head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta otherstuff />
<link rel="stylesheet" type="text/css" href="stylesheet.css"/>
</head>
Inside the <body>
Valid tags for ebooks that are self closing:
<hr />
<br />
<img src="whatever" alt="sometext"> and height & width should be via a class.
maybe <col /> to set properties of a table column, but tables in ebooks are tricky
</body>
An ebook isn't a web page, nor an XML application for a browser. Such a setting in general is crazy. These are not converted self closing tags, but tags with no content.
Code:
<span x="y"></span> <div id="w"></div>
Code:
<span x="y"/> <div id="w"/>
Quote:
mobi is HTML3, azw3 & epub2 is HTML5, epub3 is extended. Here is a list of self closing tags for web pages. The ones used in ebooks are listed above and in bold. <col /> should be avoided in ebooks.
Code:
Name | Description
<!DOCTYPE html> | Special case, never use a forward slash. Used to tell the browser what type of document you are using.
<br /> | Used to create a breaking space on a page. Content following this tag will be moved to the next line in the browser.
<area /> | Used for image mapping, making specified areas of an image clickable by associating a hyperlink.
<base /> | Creates a base url for all other hyperlinks on a given webpage so that relative urls can be used. The base element is placed inside the head.
<col /> | This element represents a column or several columns in a column group.
<command /> | A multipurpose element for representing commands.
<hr /> | Horizontal rule – creates a horizontal line across the page.
<embed /> | Used to embed multimedia objects such as a video into a page.
<input /> | This element allows a visitor to enter information such as username, password, email address etc…
<link /> | Used to link external pages to a webpage such as external style sheets.
<wbr /> | Word Break Opportunity. Specifies where in a line of text it would be OK to break into the next line on different screen sizes.
<track /> | Used in conjunction with audio or video to add subtitle or caption tracks.
<meta /> | A multipurpose element used to represent metadata. Metadata is information about data.
<param /> | Defines parameters for object elements such as audio or video.
<source /> | Allows multiple media sources to be specified for audio and video elements.
<keygen /> | Control used to generate a key pair (public – private) and allows the public key to be submitted to the user.
<img /> | Image – Used to add images to a web page.
However, while <span /> <div /> etc. can be legal in all browsers in web pages, I've never seen such in ebooks in 12 years and never used them in 30 years of web site editing. |
||
08-08-2024, 06:08 PM | #12 | |
Reading till the spring
Posts: 12,571
Karma: 94058919
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Quote:
It might confuse people and break ebook conversions. |
|
08-08-2024, 06:23 PM | #13 |
Reading till the spring
Posts: 12,571
Karma: 94058919
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Self-closing tags are also known as void tags, empty tags, singleton tags, etc., i.e. these tags have no content and cannot have any children.
Container tags can be empty <tag></tag>
The Void / Empty / Singleton / Inherently self closing tags may end in />, but some do not. They can't be written as an opening and closing tag.
Illegal (malformed) XML/HTML
<hr></hr>
<br></br>
<col></col>
So container tags can be <tag></tag>, i.e. have no content. But singleton tags are never a pair of opening and closing tags; they are self closing and complete, though some end in /> and some contexts will accept <br> and <hr>.
The comment is a special case of void or singleton
<!-- This is a comment -->
But this also is a valid comment tag. All the enclosed HTML is disabled.
Code:
<!--
<ul class="top_menu_right">
  <li>
    <form action="search.php" method="post">
      <input type="hidden" name="do" value="process" />
      <input type="hidden" name="showposts" value="0" />
      <input type="hidden" name="childforums" value="1" />
      <input type="hidden" name="securitytoken" value="1723152374-234a53b9ef86581494348ed4767ee7d9f993eecf" />
      <input type="hidden" name="s" value="" />
      <input type="text" class="button" name="query" size="20" style="width:120px" />
      <input name="search" value="Search" type="submit" class="button" style="width: 5em;" />
    </form>
  </li>
</ul>
-->
Last edited by Quoth; 08-08-2024 at 06:32 PM. |
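The void versus container distinction described in this post can be sketched as a small serialization rule: write an empty element as `<tag />` only when it is a void element, and as `<tag></tag>` otherwise. The helper name `serialize_empty` is hypothetical, and `VOID_TAGS` is an assumption based on the list posted earlier in the thread, leaving out `command` and `keygen`.

```python
from html import escape

# Void elements, following the list earlier in the thread (assumed subset).
VOID_TAGS = frozenset({
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'param', 'source', 'track', 'wbr',
})


def serialize_empty(tag, attrs=None):
    """Serialize an element that has no content (hypothetical helper)."""
    attr_text = ''.join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in (attrs or {}).items()
    )
    if tag.lower() in VOID_TAGS:
        # Void elements never get a separate closing tag.
        return f'<{tag}{attr_text} />'
    # Container elements keep an explicit closing tag even when empty.
    return f'<{tag}{attr_text}></{tag}>'


print(serialize_empty('br'))
print(serialize_empty('span', {'x': 'y'}))
print(serialize_empty('script', {'type': 'text/javascript', 'src': 'external.js'}))
```

Under this rule the OP's `<span x="y"/>` comes out as `<span x="y"></span>`, which matches what calibre writes.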
08-08-2024, 06:46 PM | #14 | |
Reading till the spring
Posts: 12,571
Karma: 94058919
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
A further search
Quote:
Container elements with no content may work as <tag />, but it's as likely to break in a web browser, hence ALWAYS use <tag></tag> even if no content, such as
<script type="text/javascript" src="external.js"></script>
because it will break without a separate closing tag. NEVER
<script type="text/javascript" src="external.js" />
Most ereaders don't support script at all. As noted earlier, ebooks only support a subset of void elements. |
|
08-08-2024, 06:59 PM | #15 | |
Reading till the spring
Posts: 12,571
Karma: 94058919
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
And finally!
Quote:
Using <title /> instead of <title></title> actually causes a blank page with all the content <body>stuff</body> missing and only visible via view code on some browsers. But remember, ebook CSS and HTML (and XML) is a special subset, even if rendered by a web page renderer, which might render stuff that breaks totally on real ereaders. |
|