Old 06-08-2012, 03:45 AM   #1
SauliusP.
Plugin developer
Posts: 108
Karma: 24394
Join Date: Feb 2012
Location: Lithuania
Device: Kindle
[Input Plugin] DOCX Input

Hello,

THIS PLUGIN IS OBSOLETE as of Calibre version 0.9.34, because a native plugin supersedes it. This plugin will no longer be supported.

As an article writer I have lots of DOCX files and tried to find a good free tool for DOCX to EPUB, AZW3 or MOBI conversion. However, good EPUB tools are not free, and Amazon's conversion service did not satisfy me: it makes the formatting crappy and "not book-like". So here they are, my own conversion tools. Please feel free to use them for your own purposes. Development will continue, and I will keep adding new features. I was quite surprised that there is no other DOCX plugin for Calibre, as the DOCX format is comparatively simple.

The DOCX Input plugin converts a DOCX file to OEB (if I'm not mistaken, a bunch of HTML files with an OPF file and CSS stylesheets). Calibre then converts it to any format it supports. My main targets are AZW3 (KF8) and MOBI, but no hacks are included for better support of them.

Did you know that with this plugin you can view DOCX files in Calibre without opening them in Word?! Just go to Settings > Behavior and tick the DOCX format in the "Use internal viewer for" list!

TTF and OTF (type "OTTO") font embedding is supported.
Note: it is your legal responsibility to check a font's copyright before embedding it.
OTF is supported only for the TTF type or type "OTTO", i.e. a single font/family per file. TrueType Collections are not yet supported.
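
As an illustration of the distinction above, here is a minimal sketch (not the plugin's actual code) that classifies a font file by its first four bytes. The tag values are the standard sfnt signatures; the function names `sniff_font_kind` and `is_embeddable` are hypothetical.

```python
def sniff_font_kind(header: bytes) -> str:
    """Classify a font file by its first four bytes (the sfnt version tag)."""
    tag = header[:4]
    if tag == b"OTTO":
        return "otf-cff"      # OpenType with CFF outlines, the "OTTO" type
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "truetype"     # TrueType outlines (.ttf, or .otf of TTF type)
    if tag == b"ttcf":
        return "collection"   # TrueType Collection: several fonts in one file
    return "unknown"


def is_embeddable(header: bytes) -> bool:
    """Assumption for this sketch: only single-font files are accepted."""
    return sniff_font_kind(header) in ("otf-cff", "truetype")
```

In practice one would read the header with something like `open(path, "rb").read(4)` and skip any file for which `is_embeddable` returns False.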

The next post contains features, a user guide, other information and also a demo DOCX file that shows off the supported features of the plugin.


SUPPORTED FEATURES

1. Conversion of Word styles to CSS, with filtering (only styles in use are converted).
2. Paragraph properties: left and right indents, first-line indent, last rendered page break (which might be a manual page break, a style-based page break, a section break, etc.).
3. Image support. Pictures with text wrapping are floated to the left or right side. Word itself stores no alignment for them, so I calculate it like this: if the image is 7 centimeters (or more) from the left page boundary, I assume it is "right-aligned".
4. Tables (nested tables inside a cell are also supported).
5. Everything up to the first rendered page break is considered to be "a cover", i.e. most of the documents I convert start with some type of cover followed by a manual page break.
6. Font embedding of DejaVu Serif (included in the plugin itself).
7. Footnotes are saved into individual HTML files and superscript links are added.
8. Paragraphs that have TOC-level styles applied (like Heading 1, 2, etc., or custom ones) are converted to HTML tags of the appropriate level: h1, h2, etc.
9. Font sizes are converted to pt (the same value you see in Word itself).
10. Indents are converted to em (it just looks better).
11. Line breaks.
12. Options dialogue (via "Customize plugin"): force use of the first image in the document as the cover, even if the metadata contains another one (on/off); drop content until the first page break (assuming that the first page is just a cover image followed by a page break); embed fonts (on/off), which is particularly useful when testing output with Calibre's EPUB viewer, which drops formatting because of a Qt bug.
13. Strike-through (double strike-through is converted to single), subscript and superscript.
14. Underline support.
15. TTF font embedding.
16. Font face support (with embedding).
17. List support: numbered, bulleted, nested, continued.
18. OTF type "OTTO" embedding.
19. Paragraph "before"/"after" spacing.
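
The arithmetic behind items 3, 9 and 10 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the plugin's real code: it assumes the picture offset arrives in EMU (360000 per centimeter), the font size in half-points and the indent in twentieths of a point, which is how DOCX stores these values. The function names are hypothetical.

```python
EMU_PER_CM = 360000  # DrawingML measures offsets in English Metric Units


def float_side(offset_emu: int, threshold_cm: float = 7.0) -> str:
    """Return the CSS float side for a wrapped picture (the 7 cm rule)."""
    offset_cm = offset_emu / EMU_PER_CM
    return "right" if offset_cm >= threshold_cm else "left"


def font_size_pt(half_points: int) -> float:
    """Word stores font sizes in half-points; pt is what the Word UI shows."""
    return half_points / 2


def indent_em(twips: int, font_size_in_pt: float) -> float:
    """Indents are stored in twentieths of a point; express them in em,
    i.e. relative to the paragraph's font size."""
    return round(twips / 20 / font_size_in_pt, 2)
```

For example, a half-inch indent (720 twips) in 12 pt text comes out as 3 em, so it scales with whatever font size the reader chooses on the device.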


NOT SUPPORTED

1. Table styling. For now, only collapsed 1px black borders are hard-coded.
2. Footnote back-links.
3. Endnotes are not supported and support is not planned. If required, I convert all endnotes to footnotes beforehand.
4. Other fancy things, like vector graphics, OLE objects, effects, etc. Not planned either.


PLANNED

1. Options to switch font-size units: em, pt, px, %.
2. Table styling (if not too difficult).
3. Embedding of fonts that are themselves embedded in the DOCX.


User Guide

When you open the conversion dialogue, you'll see a "DOCX Input" icon on the left. There you can choose some options to adjust the conversion from DOCX.



1. Use first found image as cover. Default: ON. The very first picture in the document will be used as the book's cover during conversion.
2. Skip contents until first page break. Default: ON. This is closely tied to the option above. I usually have a book like this: the first page contains the cover image and is followed immediately by a page/section break. So if I'm using this image as the cover, there is no need to repeat it in the output book.
3. Replace paragraph spacing with empty lines. Default: OFF. If a paragraph has a "before" or "after" setting greater than 0 and the option is ON, an empty paragraph will be inserted before or after it, as appropriate. Otherwise "before" or "after" will be set as a margin.
4. Embed fonts: All (all TTF fonts found in the document), DejaVu Serif (if you are not sure whether it is legal to embed other fonts), None (when converting to a font-unaware format, like MOBI).
5. Set "Normal" font family to "Serif". This is particularly useful when you convert a book to AZW3 (KF8) format and want most of the text to be displayed in the native Kindle font (Caecilia LT or another one configured by the user), i.e. font family styling is left only for headers, captions and other types of highlighted text.
6. Scan fonts. Scanning is a separate step to save some CPU and I/O cycles: fonts are not installed very often, so it is best to scan for them only occasionally. Before first use, click this button so that the plugin gathers all TTF fonts installed in your OS (tested on Windows and Linux; Mac font directories are also included).
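
To make option 3 more concrete, here is a minimal sketch of the two behaviours. It is an illustration only, not the plugin's actual output: the function name `render_paragraph`, the `&nbsp;` filler paragraph and the inline `style` attribute are all assumptions.

```python
from html import escape


def render_paragraph(text: str, before_pt: float = 0, after_pt: float = 0,
                     use_empty_lines: bool = False) -> str:
    """Render one paragraph as HTML, honouring Word's "before"/"after" spacing."""
    body = escape(text)
    if use_empty_lines:
        # Option ON: spacing becomes empty paragraphs around the text.
        parts = []
        if before_pt > 0:
            parts.append("<p>&nbsp;</p>")
        parts.append(f"<p>{body}</p>")
        if after_pt > 0:
            parts.append("<p>&nbsp;</p>")
        return "\n".join(parts)
    # Option OFF: spacing is expressed as CSS margins.
    style = f"margin-top: {before_pt:g}pt; margin-bottom: {after_pt:g}pt"
    return f'<p style="{style}">{body}</p>'
```

The empty-paragraph variant loses the exact spacing value, but it survives on readers that ignore margins, which is the compatibility trade-off the option is about.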

To get the best results, Calibre should also be tuned a bit.
1. To generate a TOC, go to Common Options, Table of Contents and add expressions for HTML headings (use the wizard or enter //h:h1 for Level 1 TOC, //h:h2 for Level 2 and //h:h3 for Level 3).
2. For EPUB conversion, go to EPUB output options and tick "No default cover" and "No SVG cover".
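
To see what the TOC expressions above select, here is a small stand-alone sketch using Python's standard library rather than Calibre itself. The `h` prefix stands for the XHTML namespace; the sample document and the helper name `headings` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Part One</h1>
    <h2>Chapter 1</h2>
    <p>Body text.</p>
    <h2>Chapter 2</h2>
  </body>
</html>"""


def headings(xhtml: str, level: int) -> List[str]:
    """Return the text of every heading of the given level, in document order."""
    root = ET.fromstring(xhtml)
    # ElementTree wants a relative path, so "//h:h1" is written ".//h:h1".
    return [el.text or "" for el in root.findall(f".//h:h{level}", XHTML_NS)]
```

With the sample above, level 1 yields the single "Part One" entry and level 2 yields the two chapters, which is the nesting you would expect to see in the generated TOC.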


All critiques, crash reports and suggestions are most welcome, but I will not be quick with responses or new feature development. At the moment I'm quite satisfied with the plugins.

Version history:
Version 0.0.22 2013-01-11
Fixed another image processing bug that occurred when the file was exported from other programs.

Version 0.0.21 2012-12-18
Fixed an image processing bug: if there was only one image in the document and it was used as a cover, but it also occurred in other places, it disappeared from there.

Version 0.0.20 2012-12-17
Addressed some formatting problems with hanging indents, especially in lists. However, there will still be some inaccuracies with lists. Kindle with KF8 supports them perfectly; older MOBI does not. The internal Calibre viewer shows everything nicely, but the CoolReader application fails with negative first-line indents (hanging indents). The demo DOCX is also updated. Found out why the version history was not available in Calibre and fixed it.

Version 0.0.19 2012-12-12
Fixed a bug, reported by Czech "book brothers", that caused the plugin to crash. The fix includes numbering styles, which were previously not taken into account.

Version 0.0.18 2012-11-21
Long-awaited (by some users) change: paragraph background colour ("shading" in Word terms) and character background colour (a.k.a. "highlight").

Version 0.0.17 2012-11-16
New features:
  • Default font subfamily embedding when the required one is not present. E.g. if one has only the "Regular" font, but sets it to "Bold" in Word, the "Regular" subfamily will be included anyway. Supported default subfamilies are: "Regular", "Book", "Normal", "Medium". Rescan your fonts after this update!
  • Vertical paragraph spacing with "Before" and "After" settings. Also added a checkbox to replace "Before" and "After" with empty paragraphs for better e-reader compatibility.
  • Back-links in the footer text.

Version 0.0.16 2012-11-12
Bug Fixes:
  • Style overriding in Word 2010 documents left some formatting behind.
  • Style naming in the final CSS: a superfluous prefix was added where not required.

Version 0.0.15 2012-10-08
New features:
  • Word 2010 styling included (stylesWithEffects.xml).
  • Special CSS selector naming for Word styles that start with a number (invalid in CSS).
  • A few generic speed optimizations (non-functional).
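
A rough sketch of the selector-naming idea from this release follows. It is not the plugin's code: the function name `css_class_name` and the `s-` prefix are hypothetical. The underlying rule is that a CSS identifier cannot begin with a digit, so a Word style such as "1stHeading" needs a prefix, while other names should be left alone.

```python
import re


def css_class_name(word_style: str, prefix: str = "s-") -> str:
    """Turn a Word style name into a usable CSS class name."""
    # Replace anything outside letters, digits, hyphen and underscore.
    name = re.sub(r"[^A-Za-z0-9_-]", "-", word_style)
    # Add the prefix only when required: the name is empty, or it starts
    # with a digit (optionally preceded by a hyphen).
    if not name or re.match(r"-?\d", name):
        name = prefix + name
    return name
```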

Version 0.0.14 2012-10-04
Bug Fixes:
  • Fonts were not embedded when font-family was set directly, not via styles.
  • The "Book" subfamily was not included in place of "Regular" when the latter was not present.

Version 0.0.13 2012-10-02
New features:
  • OTF type "OTTO" font embedding.
  • "Normal" style font family substitution with generic "Serif" (mainly for KF8).

Version 0.0.12 2012-09-28
Bug Fixes:
  • Cover not generated.
  • Failure on table styles, if present in document.
New features:
  • TTF font embedding.
  • List conversion, including nested lists, either numbered or bulleted.

Version 0.0.11 2012-09-17
Fixed a bug with non-inline images that caused the plugin to crash.

Version 0.0.10 2012-08-13
Fixed bug with skipping content until first page break when there is no page break in the document.

Version 0.0.9 2012-08-09
Plugin configuration (which was very inconvenient to access) has finally been changed to normal input options. Thanks to Kovid for enhancing Calibre's code to accept such a feature.

Not a version really 2012-07-18
A real motivation for a new version release, as I received my first donation today. Thanks, Keith!

Version 0.0.8 2012-06-17
Bug Fixes:
  • Intermittent underline of text due to non-standard false-underline handling in text formatting tags.
New features:
  • Underline support.
  • Table width set to 100%.
  • Right-side alignment of pictures.

Version 0.0.7 2012-06-17
Bug Fixes:
  • Some text missing in a paragraph, because the "characters" method in SAX sometimes delivers text in several chunks.
New features:
  • Strike-through, superscript and subscript support
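
The missing-text bug fixed in this release is a common SAX pitfall, and the usual remedy can be sketched as follows. This is a minimal illustration rather than the plugin's own handler: the class and function names are hypothetical, while `w:p` and `w:t` are the real WordprocessingML paragraph and text elements.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


class ParagraphText(ContentHandler):
    """Collect the text of each w:p, joining however many chunks SAX delivers."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._chunks = []
        self._in_text = False

    def startElementNS(self, name, qname, attrs):
        if name == (W_NS, "p"):
            self._chunks = []
        elif name == (W_NS, "t"):
            self._in_text = True

    def characters(self, content):
        # May be called several times for one text node: append, never overwrite.
        if self._in_text:
            self._chunks.append(content)

    def endElementNS(self, name, qname):
        if name == (W_NS, "t"):
            self._in_text = False
        elif name == (W_NS, "p"):
            self.paragraphs.append("".join(self._chunks))


def paragraph_texts(pieces):
    """Parse document XML supplied as an iterable of byte strings."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = ParagraphText()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return handler.paragraphs
```

Feeding the same document in one piece or split in the middle of a word gives the same paragraphs, because the handler only joins the chunks when the paragraph ends.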

Version 0.0.6 2012-06-12
Bug Fixes:
  • Cover page href pointed to the HTML file instead of the image file
New features:
  • Customization dialogue added
  • Line-break support
  • A bit more distinguishable footnote link (Atlantis-like)

Version 0.0.5 2012-06-08
Initial release with a few little bugs and initial features 1–10.

Attached Files
File Type: zip Calibre-DOCX-Input.v0.0.22.zip (699.6 KB, 36871 views)

Last edited by SauliusP.; 06-07-2013 at 05:26 AM.
Old 06-08-2012, 04:25 AM   #2
SauliusP.
Plugin developer
Note: Kovid refactored the DOCX Metadata Reader plugin and included it in Calibre itself as of version 0.8.56.
Attached Thumbnails
docx-input-options.png (64.6 KB, 11978 views)
Attached Files
File Type: zip sp-macros.zip (360 Bytes, 1139 views)
File Type: zip docx-input-demo.docx.zip (30.3 KB, 994 views)

Last edited by SauliusP.; 12-18-2012 at 03:17 AM.
Old 06-10-2012, 08:20 PM   #3
longnh
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2012
Device: Kindle
Thank you SauliusP, your plugin is really useful.
Old 06-12-2012, 03:10 AM   #4
SandFish
Member
Posts: 24
Karma: 10
Join Date: Jan 2012
Location: Northern Germany
Device: tolino vision 2
DOCX Input

Hi SauliusP., for a long time I waited for such a tool!
Thanks a million to everybody for the great CALIBRE software and its plugins!!!
SandFish
Old 07-03-2012, 08:21 AM   #5
odyn1982
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2012
Device: Kindle3
Hi,

I have a problem with a DOCX file that Calibre converts to MOBI incorrectly. When I open the DOCX file, the text styles are correct, but in the MOBI they are not. Where is the problem?
http://www.mediafire.com/view/?pgjpfwvnn3w48gc
Old 07-04-2012, 01:20 AM   #6
SauliusP.
Plugin developer
Quote:
Originally Posted by odyn1982
Where is the problem?
Hi, odyn1982,

The problem is in the complex formatting and poor use of styles. I will fine-tune the plugin to get better results. Thank you for your feedback.
Old 07-06-2012, 10:10 AM   #7
SandFish
Member
Hi everybody,
moreover, if the text contains no picture, the Calibre conversion from DOCX to EPUB does not work; the error message is "empty spine".
The first edition of the plugin worked fine.
Still, Calibre and its plugins are great software!!
Old 07-10-2012, 07:34 AM   #8
SauliusP.
Plugin developer
Quote:
Originally Posted by SandFish
Hi everybody,
moreover, if the text contains no picture, the Calibre conversion from DOCX to EPUB does not work; the error message is "empty spine".
The first edition of the plugin worked fine.
Still, Calibre and its plugins are great software!!
SandFish, you might want to uncheck the "Use first image as a cover" checkbox in the plugin preferences. If that does not help, then there's some other bug in the plugin. Sorry for putting those options there; at the beginning I did not realize there are Input Options dialogues in Calibre. I will include that in the next release.
Old 08-10-2012, 05:20 PM   #9
Yurik
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2012
Device: Kindle 4
Hi,
I created a simple DOCX file and added several words without any formatting.
When I try to convert it to MOBI there are several errors. But if I uncheck "Skip content until first page break", it is OK. Can you fix this problem?
I saved the simple file as DOC because it is not possible to upload DOCX.
Attached Files
File Type: doc Microsoft Office Word.doc (25.5 KB, 875 views)
Old 08-10-2012, 05:48 PM   #10
Yurik
Junior Member
Python function terminated unexpectedly
Spine is empty (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 192, in main
Old 08-11-2012, 12:26 AM   #11
kmarkla
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2012
Device: kindle touch
Bulk Conversions

Your plugin works great with individual conversions as long as I go into the "DOCX Input" options and disable the "Skip content until first page break" option. However, when I try to "bulk convert" as opposed to "convert individually", I am not given the option to do this. Since I have a lot of DOCX documents I'd like to convert to MOBI, doing them individually is a hassle. Is there a way to do this with bulk conversions?
Old 08-13-2012, 02:53 AM   #12
SauliusP.
Plugin developer
Hi all,

I will fix the problem with skipping content; regarding bulk conversion, however, I cannot do much. Calibre does not yet offer a way to add global options for an external plugin. I can only do my best to make it more stable, but the best results will be achieved only with individual conversion (which is what I do myself, too).

BR.
Old 08-13-2012, 04:44 AM   #13
SauliusP.
Plugin developer
Fixed in the new version 0.0.10: conversion should no longer fail on a document without a page break, so there is no need to uncheck "Skip content until first page break".
Old 08-13-2012, 06:11 AM   #14
Firedancer885
Occassional Beta Tester
Posts: 283
Karma: 3516
Join Date: Nov 2010
Location: Hungary
Device: Samsung Galaxy Tab 4 (wifi only)
Hi Saul,

I can't update the plugin. I started Calibre and got the notification for the update from 0.0.9 to 0.0.10. I updated, restarted and got another notification for 0.0.10 to 0.1.0. I updated and restarted again, and the notification for the update to 0.1.0 is still there. What's the problem?
Old 08-13-2012, 06:19 AM   #15
kiwidude
Calibre Plugins Developer
Posts: 4,688
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
It is just a mismatch between the plugin index thread and what is inside the plugin itself. It should be fixed now.