Old 01-05-2014, 02:35 PM   #1
skreutzer
Software Developer
Posts: 190
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
Sigil as front end for automated XML based processing workflows?

Hello,

I'm developing automated XML processing workflows, both for my own projects and for general use, as free software built with free software. I'm interested in improving Sigil so that it becomes a valuable front end for such workflows.

Direct formatting (for instance, with inline CSS declarations) is a pretty bad idea, since it discards logical information and ties the output files to one target format. With semantic markup, the information stays universally usable and can be processed into various output formats.
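To make the distinction concrete, here is a minimal Python sketch that flags direct formatting in an XHTML file by listing every element that carries an inline style attribute. The function name and the sample document are made up for illustration; this is not part of Sigil or of any existing tool.

```python
import xml.etree.ElementTree as ET


def find_direct_formatting(xhtml_text):
    """Return (tag, style) pairs for every element with an inline style attribute."""
    root = ET.fromstring(xhtml_text)
    hits = []
    for element in root.iter():
        style = element.get("style")
        if style is not None:
            # ElementTree stores tags as "{namespace}name"; keep only the name.
            tag = element.tag.split("}")[-1]
            hits.append((tag, style))
    return hits


# Hypothetical sample: the first heading uses a semantic class,
# the second one is formatted directly.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p class="chapter-title">Chapter 1</p>
    <p style="font-size: 2em; font-weight: bold">Chapter 2</p>
  </body>
</html>"""

print(find_direct_formatting(SAMPLE))
```

Only the second paragraph is reported. A processing system can still tell that the first one is a chapter title, while the second one is just "big bold text".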

I have heard that Sigil already relies on formatting by style templates (no direct formatting), so I assume Sigil's output is quite usable for automated processing workflows. However, I would also like a feature for exporting and importing style templates, so that a set of CSS class definitions can be loaded and used by the Sigil user to prepare a text for an automated processing system. The definitions should be editable, both so the user can adjust the WYSIWYG rendering and so styles can be prepared for export. Selecting a style template should take one or two clicks (for example, with the most used styles at the top, or by applying the currently selected style several times in a row).
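One possible shape for such an exchangeable style template, sketched in Python. The JSON layout (class name mapped to property/value pairs) and all function names are assumptions made for this example; Sigil does not define such a format as far as this post knows.

```python
import json


def export_template(styles):
    """Serialize a {class_name: {property: value}} mapping to JSON text."""
    return json.dumps(styles, indent=2, sort_keys=True)


def import_template(text):
    """Load a template and check that it has the expected shape."""
    styles = json.loads(text)
    if not isinstance(styles, dict):
        raise ValueError("template must be a JSON object")
    for class_name, declarations in styles.items():
        if not isinstance(declarations, dict):
            raise ValueError(f"declarations for {class_name!r} must be an object")
    return styles


def to_css(styles):
    """Render the template as CSS class rules for the WYSIWYG view."""
    rules = []
    for class_name in sorted(styles):
        body = " ".join(f"{prop}: {value};" for prop, value in styles[class_name].items())
        rules.append(f".{class_name} {{ {body} }}")
    return "\n".join(rules)


# Hypothetical template as a publishing house might hand it out.
HOUSE_STYLE = {
    "chapter-title": {"font-size": "2em", "font-weight": "bold"},
    "footnote": {"font-size": "0.8em"},
}

exchanged = import_template(export_template(HOUSE_STYLE))
print(to_css(exchanged))
```

The point of keeping the template separate from the rendered CSS is that the class names are the contract with the processing system, while the declarations only matter for the on-screen preview and can be edited freely.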

I've already implemented one automated processing workflow for one of my projects (the project description is in German, but you can just look at the images; an English translation will follow, along with English videos showing the workflow and the tools). I'm now going to build such workflows for more common input, namely XHTML or EPUB, to process any kind of text automatically into various output formats. As a first step, I started developing a small tool, html2epub (it is at a very early stage: CSS in the header gets lost, etc.). I plan to work on a full processing backend that supports output as XHTML, EPUB2, EPUB3, PDF via FO, PDF via LaTeX, and plain text. Supported input formats could be custom XML, XHTML, and EPUB. Each step of the workflow could be adjusted by XML manipulation, for instance to automatically add linked footnotes (with references back and forth) for EPUB2 output. It also remains possible to adjust the files involved manually to get a perfect result if the automated result isn't good enough or if some parts of the processing can't be automated yet. Little helper tools could make configuration easier or provide a GUI for the command line tools.
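The footnote step mentioned above could look roughly like the following Python sketch. It assumes, purely for illustration, that footnotes are marked up inline as span elements with class "footnote" and contain plain text only; the id scheme (ref1/note1) is likewise made up. It is not taken from html2epub.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a namespace prefix.
ET.register_namespace("", XHTML_NS)


def q(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{XHTML_NS}}}{tag}"


def link_footnotes(xhtml_text):
    """Replace inline footnote spans with numbered links and append the notes to the body."""
    root = ET.fromstring(xhtml_text)
    body = root.find(q("body"))
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in body.iter() for child in parent}
    spans = [e for e in body.iter(q("span")) if e.get("class") == "footnote"]
    notes = []
    for number, span in enumerate(spans, start=1):
        parent = parent_of[span]
        # Forward reference: replaces the span at the same position in the text.
        ref = ET.Element(q("a"), {"id": f"ref{number}", "href": f"#note{number}"})
        ref.text = f"[{number}]"
        ref.tail = span.tail
        index = list(parent).index(span)
        parent.remove(span)
        parent.insert(index, ref)
        # Assumption: the footnote holds plain text without child markup.
        notes.append((number, span.text or ""))
    for number, text in notes:
        # Note paragraph with a back reference to the place it was cited.
        note = ET.SubElement(body, q("p"), {"id": f"note{number}"})
        back = ET.SubElement(note, q("a"), {"href": f"#ref{number}"})
        back.text = f"[{number}]"
        back.tail = " " + text
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>Sigil edits EPUB files<span class="footnote">EPUB is an open e-book format.</span>'
    ' directly.</p></body></html>'
)

print(link_footnotes(SAMPLE))
```

Because the author only marks what a footnote is, the same source could be turned into linked notes for EPUB2, real footnotes in a LaTeX run, or bracketed text in plain-text output.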

I assume that it would be beneficial for a lot of people to have software that encourages semantic markup, so that authors could be required to use it if they want to feed into the automated processing system and get their files out in various formats with good quality. Even if authors refuse to use such software, they could hand their plain text over to someone else who would do the semantic markup for them as a service. The question is whether Sigil could become this software, since it is already specialized for this kind of task. Unfortunately, Calibre is going in the opposite direction, which makes its output less usable.

For an author, importing the style definitions of, let's say, a publishing house, an online self-publishing platform, or the typesetter would ensure that the resulting files fit the automated processing system, or can be converted for it easily, since the definitions of the style templates used are known. In any case, an author could also specify his own style templates or use a default set, so that other software can interpret them after configuration (just match the style templates to the formatting options a processing system supports).
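The "matching" step can be sketched as a simple lookup table. In this Python example, the class names and the LaTeX snippets are hypothetical, and LaTeX special characters are not escaped; it only shows how a known set of style templates lets a backend convert what it recognizes and report what it does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from style template (CSS class) names to
# (prefix, suffix) pairs for a LaTeX backend.
LATEX_MAPPING = {
    "chapter-title": ("\\chapter{", "}"),
    "body-text": ("", ""),
}


def convert(xhtml_text, mapping):
    """Return (output_lines, unmapped_classes) for one XHTML document."""
    root = ET.fromstring(xhtml_text)
    lines = []
    unmapped = set()
    for element in root.iter():
        css_class = element.get("class")
        if css_class is None:
            continue
        if css_class not in mapping:
            # Unknown style template: report it instead of guessing.
            unmapped.add(css_class)
            continue
        prefix, suffix = mapping[css_class]
        text = "".join(element.itertext()).strip()
        lines.append(prefix + text + suffix)
    return lines, sorted(unmapped)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p class="chapter-title">Chapter 1</p>
<p class="body-text">It was a dark night.</p>
<p class="sidebar">Unrelated note.</p>
</body></html>"""

lines, unmapped = convert(SAMPLE, LATEX_MAPPING)
print(lines)
print("not configured:", unmapped)
```

Supporting another output format would then mean supplying another mapping, not touching the author's file.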

Since I'm a C++ developer (no Qt or Boost experience yet, though) and Sigil is licensed under the GNU GPL 3 (which I really appreciate), I may start to play around with it a little and write code for it. I tried to build it on the entirely free operating system gNewSense 3.0, but unfortunately it fails to link with this error:
Code:
Qt5.2.0/5.2.0/gcc/lib/libQt5WebKit.so.5.2.0: undefined reference to `gst_x_overlay_set_window_handle'
I don't know how to fix this. I have already installed the gstreamer dev packages, but I don't know how to make cmake link against a more recent version from the gstreamer website. The problem might also be that the needed package is not entirely free, in which case Qt 5.2.0 and Sigil wouldn't be usable in the free software world.

Please note that I'm not interested in working on the support of non-free, proprietary operating systems or software, nor on secret formats or anything like it. I'm trying to free things up, not the opposite.

So I would like to get ideas from people who run or implement such processing workflows, or who would like to have one available. There might be an opportunity to collaborate on a solution as and with free software. You are also welcome to use the results I've produced so far, but as always, time is limited, so progress is made in small steps ;-)


Sincerely,
Stephan Kreutzer



Please note: the original post was much longer and discussed the advantages of semantic markup and the disadvantages of direct formatting (inline CSS), but since I've learned that Sigil already uses a style template approach, I cut all that out.

Last edited by skreutzer; 02-20-2014 at 04:28 PM.