Old 04-11-2013, 11:35 PM   #1
KmC
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2013
Device: none
XSLT vs InDesign - best conversion process?

Hello,

Let me begin by apologizing if this sounds like a naive question, but I wanted to ask whether there are any inherent advantages of using a transform to export XML to EPUB, rather than using InDesign. It's not a simple DocBook-EPUB because the source XML is in a specific kind of XML (based on the NLM [National Library of Medicine] tag library), and thus we would need someone to develop the XSLT specifically for us.
I suggested pouring the source XML into InDesign (since it's already tagged and ready to go) and exporting it that way. It seems that it would be cheaper, easier, faster, and wouldn't require anyone to take evening classes in XProc.
Am I missing something? (I VERY well could be)

Thanks so much for any help provided.
Old 04-12-2013, 04:00 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Just keep in mind that an ePUB exported from InDesign typically needs some touching up. The export is not always correct.
Old 04-12-2013, 07:28 AM   #3
Tex2002ans
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KmC View Post
It's not a simple DocBook-EPUB because the source XML is in a specific kind of XML (based on the NLM [National Library of Medicine] tag library), and thus we would need someone to develop the XSLT specifically for us.
Since it is already in XML, there is no need to introduce a bloated middleman program that could add errors of its own. InDesign will make some assumptions about the code, which could potentially cause much more work. Plus, as Toxaris said, InDesign does not export EPUB correctly in many cases.

I would say the best bet is to work directly from the XML: XML -> (Transformation) -> XHTML. From there, you could insert the XHTML into an EPUB using Sigil and make any other needed tweaks (hopefully the transformation can be written to minimize or eliminate that work).

As long as these XML files all use the same classes, then you can create one master CSS file which can then be used in all books.

As long as all these documents are tagged in a very consistent way, this should be the best bet.

Quote:
Originally Posted by KmC View Post
It seems that it would be cheaper, easier, faster, and wouldn't require anyone to take evening classes in XProc.
Am I missing something? (I VERY well could be)
Would it be possible to share samples of the XML files you want to convert?
Old 04-12-2013, 09:38 PM   #4
KmC
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2013
Device: none
Thank you very much, you are wonderful people. I can't provide actual examples of the XML, given the proprietary nature of the content; however, I don't think any are necessary. You have both answered my question. I suppose I'm just less than comfortable working with XSLT at the moment, and didn't want to have to outsource the actual creation of the transformation, but the advice you've provided is in keeping with the existing workflow, and I have no reason to suspect I know better.

In the meantime, I will invest more time in getting comfortable with XSLT, and try and avoid interfering until I know what I'm talking about.

Thanks again.
Old 04-13-2013, 07:22 PM   #5
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
If you are familiar with pretty much any programming language, you might find it easier to use a tree-based (DOM) XML parser, then walk the DOM tree and manipulate it in one or more passes, and write the result out to disk. It's pretty much just like manipulating HTML elements with JavaScript code, if you've ever done that.

If you don't know any programming languages, then XSLT is probably the better of those two choices. Then again, PHP, Perl, Python, and Ruby are all relatively easy to learn (with PHP being probably the most straightforward in terms of having a fairly lightweight and consistent syntax), so maybe it's time to pick up a new hobby.
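
A minimal sketch of the tree-walking idea, written in Python with its standard-library parser. The element names (sec, title, italic) and the mapping to XHTML tags are assumptions made up for illustration, since the real NLM-based markup is not shown in this thread:

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup; the element names are illustrative only.
SOURCE = "<sec><title>Methods</title><p>Text with <italic>emphasis</italic>.</p></sec>"

# Hypothetical mapping: source tag -> (XHTML tag, optional class attribute).
TAG_MAP = {
    "sec": ("div", "section"),
    "title": ("h2", None),
    "italic": ("em", None),
}

def rename_pass(root):
    """Walk every element in the tree and rename it in place."""
    for elt in root.iter():
        if elt.tag in TAG_MAP:
            new_tag, css_class = TAG_MAP[elt.tag]
            elt.tag = new_tag
            if css_class is not None:
                elt.set("class", css_class)
    return root

root = rename_pass(ET.fromstring(SOURCE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Elements not listed in the mapping (here, p) pass through unchanged, so the mapping can grow one tag at a time.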
Old 04-16-2013, 01:51 PM   #6
KmC
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2013
Device: none
Quote:
Originally Posted by dgatwood View Post
If you are familiar with pretty much any programming language, you might find it easier to use a tree-based (DOM) XML parser, then walk the DOM tree and manipulate it in one or more passes, and write the result out to disk. It's pretty much just like manipulating HTML elements with JavaScript code, if you've ever done that.

If you don't know any programming languages, then XSLT is probably the better of those two choices. Then again, PHP, Perl, Python, and Ruby are all relatively easy to learn (with PHP being probably the most straightforward in terms of having a fairly lightweight and consistent syntax), so maybe it's time to pick up a new hobby.
Sorry, I had walked away from the post, so I didn't see your reply right away. I don't want to be too much of a bother, but I may prefer that option (or at least try and float it) since we could write it in house rather than creating an XSLT. Would you mind elaborating a bit on the first option? I'm good with JavaScript and using the DOM, but I promise not to do this myself. I will leave the programming to the programmers, but I would at least like to understand it better so as to be as helpful as possible. (and maybe try it myself over lunch)

Thanks so much for your help.
Old 04-16-2013, 06:45 PM   #7
Tex2002ans
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KmC View Post
Would you mind elaborating a bit on the first option? I'm good with JavaScript and using the DOM, but I promise not to do this myself. I will leave the programming to the programmers, but I would at least like to understand it better so as to be as helpful as possible. (and maybe try it myself over lunch)
Well, you would have to pick whichever programming language you are most comfortable programming in, and then look for an XML parsing library in that language. (Hopefully I understood what dgatwood was saying.)

You would then use this library to go through the XML (perhaps convert it into an array of strings), and from there, step through the array and make any changes as needed.

For example, in one pass you might change every <title> into <span class="title">:

Original:
Code:
<title>TitleofSong</title>
<artist>ArtistofSong</artist>
After "Title" pass:
Code:
<span class="title">TitleofSong</span>
<artist>ArtistofSong</artist>
Then you may have an "Artist" method which will add the paragraph tags, and add a colon between Artist and Title.

After "Artist" pass:
Code:
<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>
Then a "Body" method might add all the stuff to make it an XHTML file:

Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>

<body>
<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>
</body>
</html>
It is pretty much doing what the XSLT would do, except manually, and in multiple "passes". (If XSLT makes zero sense to you, but you know a programming language well, this is a way to go).
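
The three passes above could be sketched roughly like this in Python, using the standard-library parser. The <song> wrapper element and the function names are assumptions added for illustration (the sample above shows only the two child elements), and the DOCTYPE and stylesheet link are left out to keep it short:

```python
import xml.etree.ElementTree as ET

# Hypothetical <song> wrapper; the sample above shows only its two children.
SOURCE = "<song><title>TitleofSong</title><artist>ArtistofSong</artist></song>"

def title_pass(song):
    """Turn each <title> into <span class="title">."""
    for elt in song.iter("title"):
        elt.tag = "span"
        elt.set("class", "title")

def artist_pass(song):
    """Turn <artist> into a span, put it first, and wrap both spans in <p>."""
    artist = song.find("artist")
    title = song.find("span[@class='title']")
    artist.tag = "span"
    artist.set("class", "artist")
    artist.tail = ": "  # the text that follows the artist span
    song.remove(artist)
    song.remove(title)
    para = ET.SubElement(song, "p")
    para.append(artist)
    para.append(title)

def body_pass(song):
    """Wrap the resulting paragraph(s) in a minimal XHTML skeleton."""
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title")
    body = ET.SubElement(html, "body")
    body.extend(list(song))
    return html

song = ET.fromstring(SOURCE)
title_pass(song)
artist_pass(song)
xhtml = ET.tostring(body_pass(song), encoding="unicode")
print(xhtml)
```

Keeping each pass in its own function means a new pass can be added, or an existing one fixed, without touching the others.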

Without having a sample of what XML you are working with, I can only give general (not very helpful in my view) overviews. (Perhaps someone else might be more insightful?)
Old 04-18-2013, 04:58 PM   #8
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Quote:
Originally Posted by Tex2002ans View Post
As long as these XML files all use the same classes, then you can create one master CSS file which can then be used in all books.

As long as all these documents are tagged in a very consistent way, this should be the best bet.
I came here to say precisely this. The initial investment of time might be higher than with crappy InDesign, but it pays off as soon as you have more than a couple of eBooks to convert.

Plus, there are DocBook -> EPUB (via XSLT) tools around, which you might find useful to start from. For example:

http://sourceforge.net/projects/docbook/files/epub3/
http://www.ibm.com/developerworks/xm.../section5.html
http://en.wikibooks.org/wiki/XQuery/DocBook_to_ePub
Old 04-18-2013, 05:03 PM   #9
curiousgeorge
Connoisseur
Posts: 53
Karma: 10
Join Date: Aug 2012
Location: Nashville, Tn
Device: ipad, Kindle Fire
Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
</head>

<body>
<p><span class="artist">ArtistofSong</span>: <span class="title">TitleofSong</span></p>
</body>
</html>
Not trying to be mean or pick on you in particular, but why do you include the DOCTYPE when you have already declared the document to be an XML file? I always exclude the DOCTYPE. Also note that the IDPF limits which DOCTYPEs you may use, in case anyone is wondering. Also,
Code:
standalone="no"
is not needed since it is the default but do not include
Code:
standalone="yes"
Old 04-18-2013, 05:50 PM   #10
Tex2002ans
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by AlPe View Post
I come here to say precisely this. The initial investment of time might be higher than using crappy InDesign, but it pays off as soon as you have more than a couple of eBooks to convert.
Plus the "master CSS" file makes it easy as pie to fix any errors you find instead of having to do it on a per-book basis.

Quote:
Originally Posted by AlPe View Post
Plus, there are DocBook -> EPUB (via XSLT) tools around, which you might find useful to start from. For example:

http://sourceforge.net/projects/docbook/files/epub3/
http://www.ibm.com/developerworks/xm.../section5.html
http://en.wikibooks.org/wiki/XQuery/DocBook_to_ePub
Fantastic, thanks for the info. I will have to look into this as well.

I am very interested in the long-term storage of ebooks, and storing them in a way which will make it (relatively) easy/automated to convert into other formats in the future.

Quote:
Originally Posted by curiousgeorge View Post
Not trying to be mean or pick on you in particular, but why do you include the Doctype when you have already declared the doc an XML file? I always exclude the doctype. [...]
I just quickly pulled it out of an EPUB I was working on as an example to clarify the "multiple passes"... I wasn't trying to make the code as minimal as possible.
Old 04-19-2013, 11:06 AM   #11
curiousgeorge
Connoisseur
Posts: 53
Karma: 10
Join Date: Aug 2012
Location: Nashville, Tn
Device: ipad, Kindle Fire
Quote:
Originally Posted by Tex2002ans View Post
I just quickly pulled it out of an EPUB I was working on as an example to clarify the "multiple passes"... I wasn't trying to make the code as minimal as possible.
No, that's cool. I see it as a common practice, and I think people don't understand what they are doing. I just happened to see that you did it, and I wanted to know the logic behind it. All good!!
Old 04-19-2013, 11:03 PM   #12
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Quote:
Originally Posted by Tex2002ans View Post
You would then use this library to go through the XML (perhaps convert it into an array of strings), and from there, step through the array and make any changes as needed.
I would avoid converting to/from strings because of all the fun that entities cause when doing so. Leave the DOM objects as a tree and just manipulate the objects to change the tag names, add or remove attributes, etc. But otherwise, yes.

For example, in PHP, you can do something like this:

Code:
<?php

$doc = new DOMDocument();
if (!$doc->loadXML(file_get_contents("/path/to/file.xml"))) {
    // handle error
    exit(1);
}

/* Useful function: replaces $elt with a copy that has a new tag name
   and returns the new element. */
function changeElementTagName($elt, $newTagName)
{
    $newelt = $elt->ownerDocument->createElement($newTagName);

    // Clone the element's attributes
    foreach ($elt->attributes as $attribute) {
        $newelt->setAttribute($attribute->name, $attribute->value);
    }

    // Clone the element's content
    foreach ($elt->childNodes as $child) {
        $newelt->appendChild($child->cloneNode(true));
    }

    // Replace the node in the tree
    $elt->parentNode->replaceChild($newelt, $elt);

    return $newelt;
}

// The node list is live, so copy it to an array before changing the tree.
// (Built-in DOM calls are used here in place of phpQuery.)
$titles = iterator_to_array($doc->getElementsByTagName("title"));
foreach ($titles as $titleelt) {
    // Set the class on the new element, not on the detached old one
    $span = changeElementTagName($titleelt, "span");
    $span->setAttribute("class", "title");
}

$outputstring = $doc->saveXML();
print $outputstring;

?>
Or whatever. (Note: I have not actually tested this code, and it may not even parse correctly.)
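
To illustrate the point about entities, here is a small Python sketch (standard library only; the sample title text is made up). The parser decodes entities when it reads the text, so gluing that text back into a string by hand produces markup that no longer parses, while leaving it in the tree lets the serializer re-escape it:

```python
import xml.etree.ElementTree as ET

# Made-up sample containing escaped special characters.
SOURCE = "<title>Rock &amp; Roll &lt;Live&gt;</title>"

elt = ET.fromstring(SOURCE)
text = elt.text  # the parser has decoded the entities: "Rock & Roll <Live>"

# String approach: pasting the decoded text into a template by hand.
naive = '<span class="title">' + text + "</span>"
try:
    ET.fromstring(naive)
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False  # the bare "&" and "<" break the markup

# Tree approach: rename the element and let the serializer escape the text.
elt.tag = "span"
elt.set("class", "title")
result = ET.tostring(elt, encoding="unicode")

print(naive_is_well_formed)
print(result)
```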

Tags
conversion, indesign, journal articles, xslt

