MobileRead Forums > E-Book Formats > ePub

Old 02-09-2009, 07:22 PM   #46
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by Komenor
Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional pre-requirement to the application...
You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.

Let me elaborate...

Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.

OK, so the application accepts an HTML file. Is the file valid HTML? You can't make that a precondition. I'm sorry, you just can't. Most HTML out there is nowhere near being valid, and the user could need to import HTML he didn't write himself.

So the app needs to accept invalid HTML, that is, HTML that displays OK in a modern browser but does not follow the required standards. And with that, you just blew any possibility of guaranteeing that the epub you export will always be standards compliant.

Why?

Well, you can't design a useful algorithm that accepts invalid HTML and outputs valid HTML. A useful algorithm would have these requirements, for any input:

1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.

The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.

Now, you could design an algorithm that fulfills both requirements for some input, but not for all. And no, not even Tidy can give you that, because it is theoretically impossible.

So you're stuck now. You can't guarantee to your users that you will always output a valid epub file no matter what they import. You can do your best (and you should), but in the end the second requirement is much more important than the first one. So you fix what you can and possibly tell the user about what you can't.

If they really care about producing a valid epub file, they will have to fix the errors your app can't fix themselves. And so you make it easy for them and give them access to the source. And if they introduce any errors whilst editing the source, it's their fault. They will probably have to fix it by editing the source, too.

Now if you wanted an editor that could only create epub files from scratch, then you could guarantee standard compliance if you disallow direct source code editing. But you don't want to make that kind of editor.

Your output can only be as good as the input (maybe slightly better, for trivial errors in the original file). The editor can't turn shit into gold, and cannot give guarantees about compliance. Any editor that does is flat-out lying.

Last edited by Valloric; 02-09-2009 at 07:26 PM.
Old 02-10-2009, 12:01 AM   #47
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by Valloric
1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.

The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.
Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from the source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.
Old 02-10-2009, 05:25 AM   #48
Jellby
frumious Bandersnatch
Posts: 7,536
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.
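A minimal sketch of that "accept it, but warn" idea, assuming Python and nothing beyond its standard library. The function name check_well_formed is made up for illustration. Note that xml.etree.ElementTree only checks XML well-formedness, not conformance to the XHTML schema, so it is a stand-in for a real validator rather than a replacement for one.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml):
    """Return a list of warnings; an empty list means the text parsed as XML."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return [f"line {line}, column {column}: {err}"]
    return []


good = "<html><body><p>Hello</p></body></html>"
bad = "<html><body><p>Hello</body></html>"  # the <p> is never closed

good_warnings = check_well_formed(good)
bad_warnings = check_well_formed(bad)

# The program keeps the content either way; it only reports what it found.
for warning in bad_warnings:
    print("warning:", warning)
```

The point of returning a list instead of raising is that the import still goes through: the warning is information for the user, not a refusal.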
Old 02-10-2009, 09:42 AM   #49
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by llasram
Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from the source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.
Again, this would work for some input, but not for all. I also put "the intent of the author" in that requirement. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards-compliant XHTML which faithfully represents the input file.

There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.

But that doesn't mean the application can't fix some errors and output valid XHTML. I'm just saying you can't guarantee compliance and not have to mangle the input in some situations. And even then it wouldn't work for some cases.

Quote:
Originally Posted by Jellby
The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.
My working idea too. Fix what you can, inform about what you can't, but don't mangle the input in any way or form. It is more important to guarantee to the user that you won't make some tiny change half-way through the novel he's importing than it is to guarantee standards compliance.

You can't piss off your users by trying to twist and turn their HTML into something it can't automatically become.
Old 02-10-2009, 11:15 AM   #50
llasram
Reticulator of Tharn
Quote:
Originally Posted by Valloric
Again, this would work for some input, but not for all. I also put "the intent of the author" in that requirement. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards-compliant XHTML which faithfully represents the input file.

There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.
I really don't understand what you're getting at, I'm afraid. I could write "fubby ducky loopy sunbird" and mean "Good morning, how are you?" and there would be no chance of conversion because the intent is all in my mind. With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering. Converting arbitrarily bad HTML into XHTML which displays the same is simply a matter of applying the same rules the browser does in order to produce the box model instance it renders.

XHTML validity is a property of two components: XML well-formedness and adherence to the XHTML schema, yah? Conversion of HTML w/o closing tags to well-formed XML with complete elements can be tricky, but the browser necessarily does essentially the same thing in deciding what content ends up within what boxes. The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce. Producing schema-validating XHTML is where my proposal to strip all semantic tags comes in. CSS-based rendering doesn't care if you have a <div/> within a <p/> or a <sup/> within an <a/>. One just needs to extract the CSS applied to each element, then convert the element tags into ones which validate against the schema.
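A rough sketch of the first half of that proposal, assuming Python's standard-library html.parser rather than lxml.html. The Neutralizer class, the BLOCK and VOID sets, and the choice to carry the source tag name in a class attribute are all illustrative assumptions. It only guarantees well-formed output: it does not extract CSS, does not follow a browser's tree-building rules, and can still produce nestings (a <div/> inside a <span/>) that a schema would reject.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

BLOCK = {"p", "h1", "h2", "h3", "blockquote", "center", "div"}  # assumed block-level set
VOID = {"br", "hr"}                                             # elements with no content


class Neutralizer(HTMLParser):
    """Rewrite tag soup as nested <div>/<span>/<a>/<img> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # (source tag, emitted tag) for every element still open

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            self.out.append(f'<img src="{escape(src)}" alt=""/>')
            return
        if tag in VOID:
            self.out.append(f'<span class="{escape(tag)}"/>')
            return
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self.out.append(f'<a href="{escape(href)}">')
            emitted = "a"
        else:
            emitted = "div" if tag in BLOCK else "span"
            self.out.append(f'<{emitted} class="{escape(tag)}">')
        self.stack.append((tag, emitted))

    def handle_endtag(self, tag):
        if not any(src == tag for src, _ in self.stack):
            return  # stray close tag with nothing to close: drop it
        while self.stack:  # close everything opened since the matching start tag
            src, emitted = self.stack.pop()
            self.out.append(f"</{emitted}>")
            if src == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        self.close()
        while self.stack:  # close whatever the author left open
            self.out.append(f"</{self.stack.pop()[1]}>")
        return '<div class="body">' + "".join(self.out) + "</div>"


parser = Neutralizer()
parser.feed("<p>One<b>bold<i>both</b> tail<p>Two & three<br><img src=x.png>")
xhtml = parser.result()
root = ET.fromstring(xhtml)  # raises ParseError if the output were not well-formed
```

Keeping the source tag name as a class means a stylesheet could, in principle, restore the original look (for example a rule for span.b), which is the part of the proposal this sketch leaves undone.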
Old 02-10-2009, 12:11 PM   #51
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by llasram
With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering.
Quote:
Originally Posted by llasram
The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce.
There is no argument here.

I agree that you could very well design an algorithm that converts non-valid HTML into valid XHTML for most HTML people will write. It's what your "lxml.html" library does (although I've never used it) and it's what Tidy does as well.

But you can't do it for all possible arbitrarily bad HTML. You're assuming the user checked how his source displayed in a browser. If he did, then it's not a matter of parsing arbitrarily bad HTML. It's not a non-deterministic rule system anymore: the source follows the deterministic rendering rules of the browser he used to check his work. Converting from a deterministic language to another deterministic language is certainly possible. And while you could say that the vast majority of HTML authors would do just that (check the display in a browser) before importing, you can't categorically state it.

So let's sum this up... you can create an algorithm that can convert most practical non-conforming HTML into valid XHTML, but not all HTML one could write. If one were to say he could, one would be showing a grave ignorance of computer science theory.
Old 02-11-2009, 10:25 AM   #52
Komenor
Enthusiast
Posts: 29
Karma: 100
Join Date: Dec 2008
Location: France
Device: Sony PRS-505
Quote:
Originally Posted by Valloric
You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.

Let me elaborate...

Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.
I never said that my hypothetical editor would be able to import (X)HTML!

If it is only for changing fonts, justification and other text formatting, the editor only needs to import plain text files, and nothing else.
It would then provide tools for text formatting (plus possibly support for "tables" and "pictures").

It is a choice: a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).
Old 02-11-2009, 01:46 PM   #53
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by Komenor
It is a choice: a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).
A "good" editor would embed some sort of validation of the final epub file. So if you don't get a warning when exporting, you're in the clear. And most of the time, the editor will be able to convert the user's non-conforming HTML into conforming XHTML.

Here are several use cases:

1. The user imports valid HTML. It is easily converted into XHTML. He then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

2. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and succeeds. The user then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

3. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

4. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds errors. The user is informed, but the file remains--maybe the user doesn't care (if it's a file for personal use... who knows). If he does care, he makes more changes, and tries to export the file. The change/export process repeats until no errors are thrown.

So you see, the user can get an epub file that is certifiably valid.
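The use cases above can be sketched as a small import/edit/export loop. This is only an illustration in Python: Book, find_problems, import_html, export and toy_repair are hypothetical names, the "validator" is a stand-in that checks XML well-formedness only, and the repair step deliberately knows a single trick so that it fails on the sample input.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Book:
    source: str                                   # the XHTML the user can edit directly
    problems: list = field(default_factory=list)  # what the last check complained about


def find_problems(source):
    """Stand-in validator (assumption): XML well-formedness only."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return [str(err)]
    return []


def import_html(raw, repair):
    """Fix what the repair step can fix and record the rest; never refuse the input."""
    book = Book(source=repair(raw))
    book.problems = find_problems(book.source)
    return book


def export(book):
    """Always write the file; the validator's findings are a report, not a gate."""
    book.problems = find_problems(book.source)
    return book.source, book.problems


def toy_repair(raw):
    """Hypothetical repair step that knows exactly one trick."""
    return raw.replace("<br>", "<br/>")


# Use cases 3 and 4: the repair step falls short, so the user is told at import.
book = import_html("<div><p>Chapter 1<br>It was a dark night.</div>", toy_repair)
import_problems = list(book.problems)

# Use case 4: exporting anyway still produces a file, along with the errors.
_, first_export_problems = export(book)

# The user fixes the source by hand and exports again, this time cleanly.
book.source = "<div><p>Chapter 1<br/>It was a dark night.</p></div>"
output, final_problems = export(book)
```

The design choice worth noting is that export never blocks: an empty problem list is what "certifiably valid" means here, and a non-empty one is left for the user to act on or ignore.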

Last edited by Valloric; 02-11-2009 at 02:13 PM. Reason: typo
Old 02-12-2009, 05:59 PM   #54
mtravellerh
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
Now THAT makes sense. Can't wait for that piece of software, honestly!
Old 02-13-2009, 01:25 PM   #55
Timoleon
Time Enough at Last
Posts: 387
Karma: 1151316
Join Date: Feb 2008
Location: New England
Device: iPad 3, iPhone 5, Kindle 3, Fire, Sony PRS-350
Valloric's comments in #53 should be used as a touchstone for any decent ePub editor. Great analysis and synopsis!
Old 02-13-2009, 01:39 PM   #56
GeoffC
Chocolate Grasshopper ...
Posts: 27,599
Karma: 20821184
Join Date: Mar 2008
Location: Scotland
Device: Muse HD , Cybook Gen3 , Pocketbook 302 (Black) , Nexus 10: wife has PW
Seemingly complex, though?
Old 02-13-2009, 05:36 PM   #57
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by GeoffC
Seemingly complex, though?
It's as complex as it needs to be. If you remove something, you negatively impact the quality and usefulness of the editor.

From the programmer's perspective though, it is fairly complex. But the user doesn't care about that, does he? Of course he doesn't, nor should he.
Old 02-14-2009, 05:48 AM   #58
GeoffC
Chocolate Grasshopper ...
It was, of course, the complexity of the programmer's task that I was referring to.
Old 02-23-2009, 07:01 AM   #59
richardigp
Enthusiast
Posts: 27
Karma: 18
Join Date: Dec 2008
Location: Currently living in Pune India
Device: Sony
We just threw an OpenOffice to ePub converter into the fray. It goes by the name of eScape. It does most of the advanced styling and formatting that is on the wish list above: auto generation of OPF, NCX, etc., and free-form modification of stylesheets to create a book the way you want it to look. You can read about it and try it here. It's completely free for non-commercial use, but not Open Source.

It's a different approach. Rather than try to interpret endless inline and paragraph styles, we define custom Structure-Styles and you have to apply those yourself. There is a growing online tutorial here, so you can see if you can live with this different approach.

There are about 30 styles including drop & raised caps, small-caps, and lots of other blocks like epigraph, extract, notebox, code, boxed text, poem, notes, references, etc. All major book sections are predefined. If you want to comment or make suggestions, please do so at our Publishing With XML blog.
Old 02-23-2009, 03:31 PM   #60
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by richardigp
It's a different approach. Rather than try to interpret endless inline and paragraph styles, we define custom Structure-Styles and you have to apply those yourself.
Questions:

1. How do you convert existing epub books to your format? Is it even possible to load existing epub books and edit them?

2. How do you guarantee display fidelity? Last time I checked, OpenOffice.org did not have an advanced XHTML renderer.

3. SVG? OO.org doesn't support it. Do you?

4. How do you handle the "longdesc" attribute? Do you support it?

5. Object tags?

6. DTBook?

7. XML islands?

8. Font embedding?

These are just off the top of my head. I haven't yet had the time to try out eScape, but I'm going to.

Tags
epub application, epub creation, epub editor, wishlist


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.