XSLT vs InDesign - best conversion process?

KmC · 04-11-2013, 10:35 PM

Hello,

Let me begin by apologizing if this sounds like a naive question, but I wanted to ask whether there are any inherent advantages of using a transform to export XML to EPUB, rather than using InDesign. It's not a simple DocBook-EPUB because the source XML is in a specific kind of XML (based on the NLM [National Library of Medicine] tag library), and thus we would need someone to develop the XSLT specifically for us.
I suggested pouring the source XML into InDesign (since it's already tagged and ready to go) and exporting it that way. It seems that it would be cheaper, easier, faster, and wouldn't require anyone to take evening classes in xProc.
Am I missing something? (I VERY well could be)

Thanks so much for any help provided.

Toxaris · 04-12-2013, 03:00 AM

Just keep in mind that an ePUB exported from InDesign typically needs some touching up. The export is not always correct.

Tex2002ans · 04-12-2013, 06:28 AM

Quote:

Originally Posted by KmC

It's not a simple DocBook-EPUB because the source XML is in a specific kind of XML (based on the NLM [National Library of Medicine] tag library), and thus we would need someone to develop the XSLT specifically for us.

Since it already is in XML, there is no need to introduce some bloated middleman program which could introduce its own errors (InDesign will make some assumptions with the code, and may potentially cause much more work. Plus as Toxaris said, InDesign does not export EPUB correctly in many cases).

I would say the best bet is working directly from the XML -> (Transformation) -> XHTML. From there, you could insert the XHTML into an EPUB using Sigil, and do any other needed tweaks from there (hopefully the Transformation can be created to minimize/eliminate the work to be done).

As long as these XML files all use the same classes, then you can create one master CSS file which can then be used in all books.

As long as all these documents are tagged in a very consistent way, this should be the best bet.

Quote:

Originally Posted by KmC

It seems that it would be cheaper, easier, faster, and wouldn't require anyone to take evening classes in xProc.
Am I missing something? (I VERY well could be)

Would it be possible to post samples of the XML files you want to convert?

KmC · 04-12-2013, 08:38 PM

Thank you very much, you are wonderful people. I can't provide actual examples of the XML, given the proprietary nature of the content; however, I don't think any are necessary. You have both answered my question. I suppose I'm just less than comfortable working with XSLT at the moment, and didn't want to have to outsource the actual creation of the transformation, but the advice you've provided is in keeping with the existing workflow, and I have no reason to suspect I know better.

In the meantime, I will invest more time in getting comfortable with XSLT, and try and avoid interfering until I know what I'm talking about.

Thanks again.

dgatwood · 04-13-2013, 06:22 PM

If you are familiar with pretty much any programming language, you might find it easier to use a tree-based (DOM) XML parser, then walk the DOM tree and manipulate it in one or more passes, and write the result out to disk. It's pretty much just like manipulating HTML elements with JavaScript code, if you've ever done that.

If you don't know any programming languages, then XSLT is probably the better of those two choices. Then again, PHP, Perl, Python, and Ruby are all relatively easy to learn (with PHP being probably the most straightforward in terms of having a fairly lightweight and consistent syntax), so maybe it's time to pick up a new hobby.
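To give a feel for what "walk the DOM tree" means, here is a minimal sketch in Python using the standard library's xml.dom.minidom. The album/song input and the count_tags helper are made up for illustration and are not taken from any NLM-based file. It only tallies which element names occur, which is a reasonable first step before deciding how each one should be mapped.

```python
from collections import Counter
from xml.dom import minidom

# Hypothetical stand-in for the real source XML.
SOURCE = (
    "<album>"
    "<song><title>TitleofSong</title><artist>ArtistofSong</artist></song>"
    "<song><title>AnotherSong</title><artist>AnotherArtist</artist></song>"
    "</album>"
)

def count_tags(node, counts):
    """Depth-first walk that tallies every element name it meets."""
    if node.nodeType == node.ELEMENT_NODE:
        counts[node.tagName] += 1
    for child in node.childNodes:
        count_tags(child, counts)
    return counts

doc = minidom.parseString(SOURCE)
tag_counts = count_tags(doc.documentElement, Counter())
for name, n in sorted(tag_counts.items()):
    print(name, n)
```

The same recursive shape works for changing the tree instead of just reading it; the visit step would rename or rewrap the node rather than count it.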

KmC · 04-16-2013, 12:51 PM

Quote:

Originally Posted by dgatwood

If you are familiar with pretty much any programming language, you might find it easier to use a tree-based (DOM) XML parser, then walk the DOM tree and manipulate it in one or more passes, and write the result out to disk. It's pretty much just like manipulating HTML elements with JavaScript code, if you've ever done that.

If you don't know any programming languages, then XSLT is probably the better of those two choices. Then again, PHP, Perl, Python, and Ruby are all relatively easy to learn (with PHP being probably the most straightforward in terms of having a fairly lightweight and consistent syntax), so maybe it's time to pick up a new hobby.

Sorry, I had walked away from the post, so I didn't see your reply right away. I don't want to be too much of a bother, but I may prefer that option (or at least try and float it) since we could write it in house rather than creating an XSLT. Would you mind elaborating a bit on the first option? I'm good with JavaScript and using the DOM, but I promise not to do this myself. I will leave the programming to the programmers, but I would at least like to understand it better so as to be as helpful as possible. (and maybe try it myself over lunch)

Thanks so much for your help.

Tex2002ans · 04-16-2013, 05:45 PM

Quote:

Originally Posted by KmC

Would you mind elaborating a bit on the first option? I'm good with JavaScript and using the DOM, but I promise not to do this myself. I will leave the programming to the programmers, but I would at least like to understand it better so as to be as helpful as possible. (and maybe try it myself over lunch)

Well, you would have to pick whatever programming language you are most comfortable programming in, and then look for an XML parsing library in that language. (Hopefully I understood what dgatwood was saying).

You would then use this library to go through the XML (perhaps convert it into an array of strings), and from there, step through the array and make any changes as needed.

For example, in one pass you might change all <title> into <span class="title">:

Original:

Code:

<title>TitleofSong</title>
<artist>ArtistofSong</artist>

After "Title" pass:

Code:

<span class="title">TitleofSong</span>
<artist>ArtistofSong</artist>

Then you may have an "Artist" method which will add the paragraph tags, and add a colon between Artist and Title.

After "Artist" pass:

Code:

<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>

Then a "Body" method might add all the stuff to make it an XHTML file:

Code:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>

<body>
<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>
</body>
</html>

It is pretty much doing what the XSLT would do, except manually, and in multiple "passes". (If XSLT makes zero sense to you, but you know a programming language well, this is a way to go).

Without having a sample of what XML you are working with, I can only give general (not very helpful in my view) overviews. (Perhaps someone else might be more insightful?)
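As a rough sketch of the "multiple passes" idea above, here are the "Title" and "Artist" passes written in Python with the standard library's xml.dom.minidom. The album and song wrapper elements and both function names are assumptions added for the example; the real tag names would come from the actual source XML. A "Body" pass would follow the same pattern.

```python
from xml.dom import minidom

# Hypothetical input; <album> and <song> are invented wrappers.
SOURCE = (
    "<album><song>"
    "<title>TitleofSong</title><artist>ArtistofSong</artist>"
    "</song></album>"
)

def rename_to_span(doc, tag):
    """Pass: replace every <tag> with <span class="tag">, keeping its children."""
    for old in list(doc.getElementsByTagName(tag)):
        span = doc.createElement("span")
        span.setAttribute("class", tag)
        while old.firstChild:
            # appendChild moves the node out of the old element
            span.appendChild(old.firstChild)
        old.parentNode.replaceChild(span, old)

def song_to_paragraph(doc):
    """Pass: turn each <song> into <p>artist: title</p>."""
    for song in list(doc.getElementsByTagName("song")):
        spans = {s.getAttribute("class"): s
                 for s in song.getElementsByTagName("span")}
        p = doc.createElement("p")
        p.appendChild(spans["artist"])
        p.appendChild(doc.createTextNode(": "))
        p.appendChild(spans["title"])
        song.parentNode.replaceChild(p, song)

doc = minidom.parseString(SOURCE)
rename_to_span(doc, "title")    # "Title" pass
rename_to_span(doc, "artist")   # first half of the "Artist" pass
song_to_paragraph(doc)          # second half: paragraph tags and colon
result = doc.documentElement.toxml()
print(result)
```

Each pass only touches the tree, so the passes can be tested one at a time and reordered without worrying about half-edited markup.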

AlPe · 04-18-2013, 03:58 PM

Quote:

Originally Posted by Tex2002ans

As long as these XML files all use the same classes, then you can create one master CSS file which can then be used in all books.

As long as all these documents are tagged in a very consistent way, this should be the best bet.

I come here to say precisely this. The initial investment of time might be higher than using crappy InDesign, but it pays off as soon as you have more than a couple of eBooks to convert.

Plus, there are DocBook -> EPUB (via XSLT) tools around, which you might find useful to start from. For example:

http://sourceforge.net/projects/docbook/files/epub3/
http://www.ibm.com/developerworks/xm.../section5.html
http://en.wikibooks.org/wiki/XQuery/DocBook_to_ePub

curiousgeorge · 04-18-2013, 04:03 PM

Code:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>

<body>
<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>
</body>
</html>

Not trying to be mean or pick on you in particular, but why do you include the DOCTYPE when you have already declared the doc an XML file? I always exclude the DOCTYPE. Also note that the IDPF limits which DOCTYPE you can use, in case anyone is wondering. Also,

Code:

standalone="no"

is not needed since it is the default but do not include

Code:

standalone="yes"

Tex2002ans · 04-18-2013, 04:50 PM

Quote:

Originally Posted by AlPe

I come here to say precisely this. The initial investment of time might be higher than using crappy InDesign, but it pays off as soon as you have more than a couple of eBooks to convert.

Plus the "master CSS" file makes it easy as pie to fix any errors you find instead of having to do it on a per-book basis.

Quote:

Originally Posted by AlPe

Plus, there are DocBook -> EPUB (via XSLT) tools around, which you might find useful to start from. For example:

http://sourceforge.net/projects/docbook/files/epub3/
http://www.ibm.com/developerworks/xm.../section5.html
http://en.wikibooks.org/wiki/XQuery/DocBook_to_ePub

Fantastic, thanks for the info. I will have to look into this as well.

I am very interested in the long-term storage of ebooks, and storing them in a way which will make it (relatively) easy/automated to convert into other formats in the future.

Quote:

Originally Posted by curiousgeorge

Not trying to be mean or pick on you in particular, but why do you include the DOCTYPE when you have already declared the doc an XML file? I always exclude the DOCTYPE. [...]

I just quickly pulled it out of an EPUB I was working on as an example to clarify the "multiple passes"... I wasn't trying to make the code as minimal as possible.

curiousgeorge · 04-19-2013, 10:06 AM

Quote:

Originally Posted by Tex2002ans

I just quickly pulled it out of an EPUB I was working on as an example to clarify the "multiple passes"... I wasn't trying to make the code as minimal as possible.

No, that's cool. I just see it as a common practice, and I think people don't understand what they are doing. I happened to see you did it and wanted to know the logic behind it. All good!!

dgatwood · 04-19-2013, 10:03 PM

Quote:

Originally Posted by Tex2002ans

You would then use this library to go through the XML (perhaps convert it into an array of strings), and from there, step through the array and make any changes as needed.

I would avoid converting to/from strings because of all the fun that entities cause when doing so. Leave the DOM objects as a tree and just manipulate the objects to change the tag names, add or remove attributes, etc. But otherwise, yes.

For example, in PHP, you can do something like this:

Code:

<?php

// Uses PHP's built-in DOM extension only; no extra library is required.

$doc = new DOMDocument();
if (!$doc->loadXML(file_get_contents("/path/to/file.xml"))) {
    // handle error
    exit(1);
}

/* Useful function: swaps the element for a renamed copy and returns the copy */
function changeElementTagName($elt, $newTagName)
{
    $newelt = $elt->ownerDocument->createElement($newTagName);

    // Clone the element's attributes
    foreach ($elt->attributes as $attribute) {
        $newelt->setAttribute($attribute->name, $attribute->value);
    }

    // Clone the element's content
    foreach ($elt->childNodes as $child) {
        $newelt->appendChild($child->cloneNode(true));
    }

    // Replace the node in the tree
    $elt->parentNode->replaceChild($newelt, $elt);
    return $newelt;
}

// Copy the live node list into an array first, because it changes
// as elements are replaced.
foreach (iterator_to_array($doc->getElementsByTagName("title")) as $titleelt) {
    // Set the class on the new element; the old one is no longer in the tree.
    $newelt = changeElementTagName($titleelt, "span");
    $newelt->setAttribute("class", "title");
}

$outputstring = $doc->saveXML();
print $outputstring;

?>

Or whatever. (Note: I have not actually tested this code, and it may not even parse correctly.)
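One way to see the kind of entity trouble that string handling invites is the small Python sketch below, again with the standard library's xml.dom.minidom. The song title is a made-up example. The parser returns the real "&" character, so pasting that text back into a string without re-escaping gives XML that no longer parses, while building the element through the DOM escapes it on output.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical input containing an escaped ampersand.
SOURCE = "<song><title>Rock &amp; Roll</title></song>"

doc = minidom.parseString(SOURCE)
title = doc.getElementsByTagName("title")[0]

# The parser hands back the real character, not the escape sequence.
text = title.firstChild.data
print(text)  # Rock & Roll

# Naive string building forgets to re-escape and yields malformed XML.
naive = "<p>" + text + "</p>"
try:
    minidom.parseString(naive)
    naive_ok = True
except ExpatError:
    naive_ok = False
print(naive_ok)

# Building the same element through the DOM escapes on serialization.
p = doc.createElement("p")
p.appendChild(doc.createTextNode(text))
safe = p.toxml()
print(safe)
```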
