Old 08-06-2024, 08:08 AM   #1
ptm@xact.co.uk
Junior Member
ptm@xact.co.uk began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Aug 2024
Device: none
Question XHTML closing tag issue

Simply by parsing and committing an xhtml file, using the container, I see the following changes to the content.

Before parsing:
<span x="y"/>
<div id="w"/>

After committing:
<span x="y"></span>
<div id="w"></div>

Is there a 'tweak' setting or other method I can use to preserve the compact form for tags that have only attributes and no content?
Old 08-06-2024, 08:43 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 44,708
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, there isn't, IIRC.
Old 08-07-2024, 12:40 PM   #3
ptm@xact.co.uk
I have not looked at the Calibre code. Hopefully there is an option on the serializer to use compact closing tags (i.e. <sometag/>) instead of <sometag></sometag>. If this is the case, how difficult do you think it would be to add a tweak to set this option? Is this something you or I can look at? I would need a few hints where to start looking.
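For illustration only: some serializers do expose exactly this kind of switch. The sketch below uses Python's standard-library xml.etree.ElementTree and its short_empty_elements flag, not calibre's own serializer, so it says nothing about whether calibre offers an equivalent option.

```python
import xml.etree.ElementTree as ET

# Parse a fragment that uses the compact (minimized) form.
root = ET.fromstring('<body><span x="y"/><div id="w"/></body>')

# short_empty_elements=True (the default) keeps empty elements minimized.
compact = ET.tostring(root, encoding='unicode', short_empty_elements=True)

# short_empty_elements=False writes an explicit closing tag instead.
expanded = ET.tostring(root, encoding='unicode', short_empty_elements=False)

print(compact)
print(expanded)
```

Both strings describe the same document; only the spelling of the empty elements differs.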
Old 08-07-2024, 12:44 PM   #4
kovidgoyal
This is done deliberately, see line 441 in oeb/base.py
Old 08-07-2024, 02:20 PM   #5
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 77,187
Karma: 138591138
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by ptm@xact.co.uk View Post
I have not looked at the Calibre code. Hopefully there is an option on the serializer to use compact closing tags (i.e. <sometag/>) instead of <sometag></sometag>. If this is the case, how difficult do you think it would be to add a tweak to set this option? Is this something you or I can look at? I would need a few hints where to start looking.
Most tags are <sometag>Text from the book</sometag>. That's why it should not be done. <sometag/>This is text from the book will not work.
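To illustrate the point in XML terms (a sketch using Python's standard xml.etree.ElementTree, not calibre code): with the minimized form the element is already closed, so any text that follows becomes the element's tail rather than its content.

```python
import xml.etree.ElementTree as ET

# Text wrapped by the element: it is the element's content.
wrapped = ET.fromstring('<body><sometag>Text from the book</sometag></body>')[0]

# Minimized element followed by text: the text sits outside the element.
minimized = ET.fromstring('<body><sometag/>Text from the book</body>')[0]

print(repr(wrapped.text), repr(wrapped.tail))      # content inside the element
print(repr(minimized.text), repr(minimized.tail))  # content after the element
```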
Old 08-08-2024, 07:25 AM   #6
ptm@xact.co.uk
Quote:
Originally Posted by kovidgoyal View Post
This is done deliberately, see line 441 in oeb/base.py
I see the comment about some browser based renderers. I wonder if this is still true. I've no idea when this code was written. But my boss needs self closing tags. So if, at line 160 we had:

Code:
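def close_self_closing_tags(raw):
    # Proposed addition: leave the markup untouched when the (hypothetical)
    # tweak says self-closing tags are allowed.
    if self_closing_tags_permitted():
        return raw
    # Existing behaviour: rewrite <tag attrs/> as <tag attrs></tag>.
    for_unicode = isinstance(raw, str)
    repl = as_string_type(r'<\g<tag>\g<arg>></\g<tag>>', for_unicode)
    pat = self_closing_pat(for_unicode)
    return pat.sub(repl, raw)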
And in 'tweaks', wherever they are coded, a self_closing_tags_permitted() function is added, returning False by default but True if so configured by the user. Then we get the best of both worlds. Am I on the right track?
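A self-contained sketch of this idea, for anyone who wants to try it outside calibre. Everything in it is an assumption made for illustration: the TWEAKS dictionary and its permit_self_closing_tags key are hypothetical (not an existing calibre tweak), and the regular expression is a simplified stand-in, not the pattern calibre uses in oeb/base.py. It leaves void elements such as <br /> in minimized form.

```python
import re

# Hypothetical stand-in for a user-configurable tweak; not a real calibre setting.
TWEAKS = {'permit_self_closing_tags': False}

# Elements that are normally written in minimized form.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}

# Simplified pattern: tag name, optional attributes, then "/>".
# It does not cope with ">" or "<" inside attribute values.
SELF_CLOSING = re.compile(
    r'<(?P<tag>[A-Za-z][\w:.-]*)(?P<arg>(?:\s[^<>]*?)?)\s*/>')


def expand_self_closing(raw):
    """Rewrite <tag attrs/> as <tag attrs></tag> unless the tweak permits it."""
    if TWEAKS['permit_self_closing_tags']:
        return raw

    def repl(match):
        tag = match.group('tag')
        if tag.lower() in VOID_TAGS:
            return match.group(0)  # keep <br />, <hr /> etc. as they are
        return '<{0}{1}></{0}>'.format(tag, match.group('arg'))

    return SELF_CLOSING.sub(repl, raw)


default_result = expand_self_closing('<span x="y"/><br /><div id="w"/>')

TWEAKS['permit_self_closing_tags'] = True
permitted_result = expand_self_closing('<span x="y"/><br /><div id="w"/>')
```

With the flag off, the span and div are expanded and the br is kept; with it on, the input comes back unchanged.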
Old 08-08-2024, 08:54 AM   #7
kovidgoyal
I'm not going to change that as it is a completely cosmetic change that has the potential to break rendering in the real world. You are welcome to run calibre from source and make the change for yourself.
Old 08-08-2024, 10:47 AM   #8
ptm@xact.co.uk
Quote:
Originally Posted by kovidgoyal View Post
I'm not going to change that as it is a completely cosmetic change that has the potential to break rendering in the real world. You are welcome to run calibre from source and make the change for yourself.
May I ask you to reconsider? The Client using our plugin says it is a requirement. They suggest <tag></tag> (with no text between) is badly formed XML. The suggestion in my previous post would maintain backwards compatibility. So I do not believe it would affect anyone, unless they use the 'tweak' setting.

Implementing such a change myself (apart from not having a Linux system to do the build) would mean our Client could no longer install any future Calibre updates.

Fingers crossed, best regards
Paul
Old 08-08-2024, 11:14 AM   #9
kovidgoyal
<tag></tag> is perfectly well formed XML. I suggest you tell your client to read the XML specifications.
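A quick way to check this for yourself (a sketch using Python's standard xml.etree.ElementTree; the shape() helper is made up for this example): an XML parser accepts both spellings and builds the same tree from them.

```python
import xml.etree.ElementTree as ET

expanded = ET.fromstring('<root><span x="y"></span><div id="w"></div></root>')
minimized = ET.fromstring('<root><span x="y"/><div id="w"/></root>')


def shape(elem):
    # Reduce an element to (tag, attributes, text, children) for comparison.
    return (elem.tag, dict(elem.attrib), elem.text or '',
            [shape(child) for child in elem])


# Both documents parse without error and have identical structure.
same = shape(expanded) == shape(minimized)
print(same)
```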

You don't need Linux to do a build; indeed, you don't need to do a build at all. calibre can run from source without being built, see https://manual.calibre-ebook.com/develop.html

I am not going to accept this change in calibre as it has the potential to break things for people for no actual benefit to any calibre user.
Old 08-08-2024, 05:08 PM   #10
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
Posts: 41,615
Karma: 161499388
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Is that first line in your sample real? The x in x="y" is not a valid attribute.

Code:
Before parsing:
<span x="y"/>
<div id="w"/>

After committing:
<span x="y"></span>
<div id="w"></div>
Old 08-08-2024, 05:54 PM   #11
Quoth
Reading till the spring
Quoth ought to be getting tired of karma fortunes by now.
Posts: 12,642
Karma: 96386975
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Quote:
Originally Posted by ptm@xact.co.uk View Post
Simply by parsing and committing an xhtml file, using the container, I see the following changes to the content.

Before parsing:
<span x="y"/>
<div id="w"/>

After committing:
<span x="y"></span>
<div id="w"></div>

Is there a 'tweak' setting or other method I can use to preserve the compact form for tags that have only attributes and no content?
It's only a historical accident that regular content tags can be self-closing.

First line, before the opening <html>:
<?xml version='1.0' encoding='utf-8'?>
Only inside <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta otherstuff />
<link rel="stylesheet" type="text/css" href="stylesheet.css"/>
</head>
Inside the <body>:
Valid tags for ebooks that are self closing:

<hr />
<br />

<img src="whatever" alt="sometext" /> (height & width should be set via a class)
maybe <col /> to set properties of a table column, but tables in ebooks are tricky
</body>

An ebook isn't a web page, nor an XML application for a browser.

Such a setting in general is crazy.
These are not converted self closing tags, but tags with no content.
Code:
<span x="y"></span>
	<div id="w"></div>
This is obsolete for empty tags:
Code:
<span x="y"/>
	<div id="w"/>
As ebooks are "zipped", it saves nothing. Compact (as with clever C that should take 3 lines fitted as one statement) is not always a good idea.
Quote:
The HTML WG has discussed this issue: the intention was to allow old (HTML-only) browsers to accept XHTML 1.0 documents by following the guidelines,
So, yes, every browser will accept <div />, <span /> etc if the doc type allows it, but it's actually now wrong and totally inappropriate for ebooks.
mobi is roughly HTML 3.2
azw3 is based on HTML5
epub2 is XHTML 1.1
epub3 is XHTML5 (HTML5 in its XML syntax), with extensions.


Here is a list of self-closing tags for web pages. The ones used in ebooks are listed above. <col /> should be avoided in ebooks.
Code:
Name	Description
<!DOCTYPE html>	Special case, never use a forward slash. Used to tell the browser what type of document you are using.
<br />	Used to create a breaking space on a page. Content following this tag will be moved to the next line in the browser.
<area />	Used for image mapping, making specified areas of an image clickable by associating a hyperlink.
<base />	Creates a base url for all other hyperlinks on a given webpage so that relative urls can be used. The base element is placed inside the head.
<col />	This element represents a column or several columns in a column group.
<command />	A multipurpose element for representing commands.
<hr />	Horizontal rule – creates a horizontal line across the page.
<embed />	Used to embed multimedia objects such as a video into a page.
<input />	This element allows a visitor to enter information such as username, password, email address etc…
<link />	Used to link external pages to a webpage such as external style sheets.
<wbr />	Word Break Opportunity. Specifies where in a line of text it would be OK to break into the next line on different screen sizes.
<track />	Used in conjunction with audio or video to add subtitle or caption tracks.
<meta />	A multipurpose element used to represent metadata. Metadata is information about data.
<param />	Defines parameters for object elements such as audio or video.
<source />	Allows multiple media sources to be specified for audio and video elements.
<keygen />	Control used to generate a key pair (public – private) and allows the public key to be submitted to the user.
<img />	Image – used to add images to a web page.
This isn't definitive.

However while <span /> <div /> etc can be legal in all browsers in web pages, I've never seen such in ebooks in 12 years and never used them in 30 years of web site editing.
Old 08-08-2024, 06:08 PM   #12
Quoth
Quote:
Originally Posted by ptm@xact.co.uk View Post
They suggest <tag></tag> (with no text between) is badly formed XML. The suggestion in my previous post would maintain backwards compatibility. So I do not believe it would affect anyone, unless they use the 'tweak' setting.
No, it's not badly formed XML. Depending on the properties of <tag> it might be pointless and do nothing, but that's not the same thing as malformed.

It might confuse people and break ebook conversions.
Old 08-08-2024, 06:23 PM   #13
Quoth
Self-closing tags are also known as void tags, empty tags, singleton tags, etc., i.e. tags that have no content and cannot have any children.
Container tags can be empty: <tag></tag>

The void / empty / singleton / inherently self-closing tags may end in />, but some do not.
They shouldn't be written as an opening and closing tag. The following are invalid HTML (although they are still well-formed XML):
<hr></hr>
<br></br>
<col></col>


So container tags can be <tag></tag>, i.e. have no content.
But singleton tags are never a pair of opening and closing tags. They are self-closing and complete in themselves, though some end in /> and some contexts will accept <br> and <hr>.

The comment is a special case of void or singleton
<!-- This is a comment -->

But this also is a valid comment tag. All the enclosed HTML is disabled.

Code:
<!-- 
<ul class="top_menu_right">
			<li>
				<form action="search.php" method="post">
				<input type="hidden" name="do" value="process" />
				<input type="hidden" name="showposts" value="0" />
				<input type="hidden" name="childforums" value="1" />
				<input type="hidden" name="securitytoken" value="1723152374-234a53b9ef86581494348ed4767ee7d9f993eecf" />
				<input type="hidden" name="s" value="" />
				<input type="text" class="button" name="query" size="20" style="width:120px" />
				<input name="search" value="Search" type="submit" class="button" style="width: 5em;" />
				</form>
			</li>
			</ul>
-->
An ebook can have comments, but unlike web pages, shouldn't need them.

Last edited by Quoth; 08-08-2024 at 06:32 PM.
Old 08-08-2024, 06:46 PM   #14
Quoth
A further search
Quote:
They're called "void" elements in HTML 5. They're listed in the official W3 spec.

A void element is an element whose content model never allows it to have contents under any circumstances.

As of April 2013, they are:

area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr

As of December 2018 (HTML 5.2), they are:

area, base, br, col, embed, hr, img, input, link, meta, param, source, track, wbr
Void elements are never written as <tag></tag>.
Container elements with no content may work as <tag />, but that is just as likely to break in a web browser. Hence ALWAYS use <tag></tag> even if there is no content, such as <script type="text/javascript" src="external.js"></script>, because it will break without a separate closing tag.
NEVER <script type="text/javascript" src="external.js" />
Most ereaders don't support script at all.

As noted earlier, ebooks only support a subset of void elements.
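Putting the rule above into code, as a rough sketch only: a tiny serializer that writes void elements in minimized form and everything else as an explicit open/close pair. It uses Python's standard xml.etree.ElementTree, ignores namespaces, comments and the doctype, and the serialize() function and VOID set are names invented for this example (the set is the December 2018 list quoted above).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Void elements per the December 2018 (HTML 5.2) list quoted above.
VOID = frozenset(
    'area base br col embed hr img input link meta param source track wbr'.split())


def serialize(elem):
    """Serialize an element: void tags minimized, all others as a tag pair."""
    attrs = ''.join(' {}={}'.format(name, quoteattr(value))
                    for name, value in elem.attrib.items())
    if elem.tag in VOID and not elem.text and len(elem) == 0:
        out = '<{}{} />'.format(elem.tag, attrs)
    else:
        inner = escape(elem.text or '') + ''.join(serialize(c) for c in elem)
        out = '<{0}{1}>{2}</{0}>'.format(elem.tag, attrs, inner)
    # Text that follows the element belongs to its parent's content.
    return out + escape(elem.tail or '')


doc = ET.fromstring('<body><p>One<br></br>two</p><span x="y"/><hr/></body>')
result = serialize(doc)
print(result)
```

The input deliberately mixes both spellings; the output normalizes the br and hr to the minimized form and the empty span to a tag pair.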
Old 08-08-2024, 06:59 PM   #15
Quoth
And finally!
Quote:
C.3. Element Minimization and Empty Element Content
Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) do not use the minimized form (e.g. use <p></p> and not <p />).
from https://www.w3.org/TR/xhtml1/#dtds

Using <title /> instead of <title></title> actually causes a blank page on some browsers, with all the content <body>stuff</body> missing and only visible via view source.

But remember, ebook CSS and HTML (and XML) are a special subset, even if rendered by a web page renderer, which might render stuff that breaks totally on real ereaders.