
MobileRead Forums > E-Book Software > Calibre > Conversion

Old 11-05-2015, 01:21 PM   #1
oberon567
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2015
Device: none
Docx to ePub conversion keeps failing

I am trying to convert a MS Word file to ePub, and it keeps failing. The log is below.

The text is in Tibetan, UTF-8 encoded. Just this evening I converted a different Tibetan text without any errors, and I have done so numerous times in the past. I used the same settings for this one, but it fails almost immediately (17 seconds). I tried saving the Word file in both Word 2015 and Word 2013 format.

I am using the newest version of calibre (64-bit) on Windows 10 Home 64-bit.

I uninstalled calibre, deleted the calibre folder in %USERPROFILE%\AppData\Roaming, and then reinstalled, with no change. Running it as Administrator didn't help either.

The Word file is larger than the others I have converted (3.9 MB), but I don't think it is large enough to cause a problem. (?)

I spent numerous hours this evening readying this file for ePub conversion, stripping away a lot of the formatting and so forth. And now it isn't working!

I cannot attach the file here because it is a .docx file, not a .doc, and the uploader rejects it as an invalid file type. I can convert it and upload a .doc version if requested. It is written in two Unicode Tibetan fonts, but if you don't have them it should be legible in Microsoft Himalaya or whatever Apple's default Tibetan font is.

(In my ePub I have chosen to embed all the fonts used. These are the exact same fonts I used in my previous conversions which did work).

Any thoughts or ideas? Thanks!

The error log:
Code:
calibre, version 2.42.0 (win32, isfrozen: True)
Conversion Error: Failed: Convert book 1 of 1 (འདུལ་བ་མདོ་རྩ་བའི་མཆན་འགྲེལ།)

Convert book 1 of 1 (འདུལ་བ་མདོ་རྩ་བའི་མཆན་འགྲེལ།)
Resolved conversion options
calibre version: 2.42.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'both',
 'comments': None,
 'cover': u'C:\\Users\\Gyalten\\AppData\\Local\\Temp\\calibre_c4ydw1\\wiykiq.jpeg',
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'docx_no_cover': False,
 'docx_no_pagebreaks_between_notes': True,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': True,
 'embed_font_family': u'Monlam Uni OuChan2',
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': u'utf-8',
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x000000000256ADD8>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': True,
 'margin_bottom': 4.0,
 'margin_left': 4.0,
 'margin_right': 4.0,
 'margin_top': 4.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.GenericEink object at 0x0000000002579198>,
 'page_breaks_before': u'/',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': True,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\Users\\Gyalten\\AppData\\Local\\Temp\\calibre_c4ydw1\\skigqi.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': True,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': True,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': True,
 'verbose': 2}
InputFormatPlugin: DOCX Input running
on C:\Users\Gyalten\AppData\Local\Temp\calibre_c4ydw1\83jqv2.docx
Python function terminated unexpectedly
  Error in xpath expression (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 193, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_override
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1051, in run
  File "site-packages\calibre\customize\conversion.py", line 241, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\docx_input.py", line 31, in convert
  File "site-packages\calibre\ebooks\docx\to_html.py", line 97, in __call__
  File "site-packages\calibre\ebooks\docx\fields.py", line 104, in __call__
  File "xpath.pxi", line 456, in lxml.etree.XPath.__call__ (src\lxml\lxml.etree.c:147594)
  File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src\lxml\lxml.etree.c:144977)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src\lxml\lxml.etree.c:144832)
lxml.etree.XPathEvalError: Error in xpath expression
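
The traceback ends in calibre's ebooks\docx\fields.py, which (judging by its name) processes Word field codes. As a first diagnostic step, here is a minimal stdlib-only Python sketch that lists the field instructions stored in a .docx's word/document.xml, so unusual or malformed ones can be spotted by eye. That a field code is what trips the XPath error is an assumption, not something the log proves, and the helper names `list_field_instructions` and `fields_in_docx` are hypothetical.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefix used by WordprocessingML elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_field_instructions(document_xml: bytes) -> list[str]:
    """Return the instruction text of every field in a document.xml payload.

    Simple fields keep their instruction in w:fldSimple/@w:instr; complex
    fields spread it over w:instrText runs between a 'begin' w:fldChar and
    the following 'separate' or 'end' w:fldChar. Nested complex fields are
    not handled: an inner 'begin' discards the outer field's text so far.
    """
    root = ET.fromstring(document_xml)
    instructions = []
    current = None  # pieces of the complex field currently being read
    for el in root.iter():  # walks elements in document order
        if el.tag == W + "fldSimple":
            instructions.append(el.get(W + "instr", "").strip())
        elif el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                instructions.append("".join(current).strip())
                current = None
        elif el.tag == W + "instrText" and current is not None:
            current.append(el.text or "")
    return instructions


def fields_in_docx(path: str) -> list[str]:
    """Open a .docx (itself a ZIP archive) and list its field instructions."""
    with zipfile.ZipFile(path) as z:
        return list_field_instructions(z.read("word/document.xml"))


if __name__ == "__main__" and len(sys.argv) > 1:
    for instr in fields_in_docx(sys.argv[1]):
        print(repr(instr))  # repr() makes stray control characters visible
```

Only the main document part is scanned; fields in headers, footers or footnotes live in other parts of the archive and would need the same treatment.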

Last edited by BetterRed; 11-05-2015 at 03:33 PM.
Old 11-05-2015, 01:44 PM   #2
kovidgoyal
creator of calibre
Posts: 44,509
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Simply zip up the docx file and attach that. If the file is copyrighted, use the calibre bug tracker instead of attaching it here.
Old 11-06-2015, 08:45 AM   #3
oberon567
Junior Member
File...

Thanks, here is the file zipped...

I am going to assume that part of the problem has to do with the fact that the file has a whole mess of styles layered one upon another....
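
Layered styles like these are stored in the .docx's word/styles.xml as chains of w:basedOn references. A rough way to see how deep the layering goes is a short Python sketch like the one here; it only measures the inheritance chains and says nothing about whether they are what breaks the conversion, which remains a hunch. The function name `style_inheritance_depths` is hypothetical.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def style_inheritance_depths(styles_xml: bytes) -> dict[str, int]:
    """Map each w:styleId to the length of its w:basedOn chain."""
    root = ET.fromstring(styles_xml)
    parent = {}
    for style in root.iter(W + "style"):
        based = style.find(W + "basedOn")
        parent[style.get(W + "styleId")] = (
            based.get(W + "val") if based is not None else None
        )

    def depth(sid, seen):
        p = parent.get(sid)
        # Stop at root styles, references to undefined styles, and cycles.
        if p is None or p not in parent or p in seen:
            return 0
        return 1 + depth(p, seen | {p})

    return {sid: depth(sid, {sid}) for sid in parent}


if __name__ == "__main__" and len(sys.argv) > 1:
    with zipfile.ZipFile(sys.argv[1]) as z:
        depths = style_inheritance_depths(z.read("word/styles.xml"))
    print(len(depths), "styles defined")
    for sid, d in sorted(depths.items(), key=lambda kv: -kv[1])[:10]:
        print(d, sid)  # the ten deepest inheritance chains
```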

I saved the file as an HTML file and tried to convert that; it got to 67% (instead of 17%) but then also failed.

So I spent the day rebuilding the file from scratch, scrapping all of the overlapping styles and cleaning it up as best I could. The resulting file is about 1 MB smaller. I am trying to convert it now and will post an update with the result. But if you can look at this first file and see whether something else is amiss, other than the formatting and style nightmare, please let me know...

Thanks again!

[Attachment not approved until confirmation given that it only contains public domain text and images (or the OP owns the copyright).]
File Type: zip tibet file.zip

Last edited by pdurrant; 11-06-2015 at 10:43 AM.
Old 11-06-2015, 09:33 AM   #4
oberon567
Junior Member
Update

I tried again, as mentioned in my previous post, with a new version of the file that was much cleaner.

Again it failed, but the error message was much more useful this time:

Quote:
calibre, version 2.42.0
ERROR: Conversion Failed: <p><b>Failed to convert: ' dul ba mdo rtsa ba'i mchan 'grel mthong ba don 'grub<p>
Many older ebook reader devices are incapable of displaying
EPUB files that have internal components over a certain size.
Therefore, when converting to EPUB, calibre automatically tries
to split up the EPUB into smaller sized pieces. For some
files that are large undifferentiated blocks of text, this
splitting fails.
<p>You can <b>work around the problem</b> by either increasing the
maximum split size under EPUB Output in the conversion dialog,
or by turning on Heuristic Processing, also in the conversion
dialog. Note that if you make the maximum split size too large,
your ebook reader may have trouble with the EPUB.

I will try the suggestions mentioned there. Otherwise, I can try editing the file some more.
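
The two workarounds in that message correspond to options already visible in the resolved-options dump in the first post: 'flow_size': 260 and 'enable_heuristics': False. On the command line they are calibre's `ebook-convert` switches `--flow-size` (in KB) and `--enable-heuristics`. Here is a minimal Python sketch that assembles such a call, assuming `ebook-convert` is on the PATH; `build_convert_command` is a hypothetical helper, and 1024 KB is an arbitrary example value rather than a recommendation.

```python
import shutil
import subprocess
import sys


def build_convert_command(src: str, dest: str, flow_size_kb: int = 260,
                          heuristics: bool = False) -> list[str]:
    """Assemble an ebook-convert call with an explicit EPUB split size.

    260 KB matches the flow_size value in the posted conversion log.
    """
    cmd = ["ebook-convert", src, dest, "--flow-size", str(flow_size_kb)]
    if heuristics:
        cmd.append("--enable-heuristics")
    return cmd


if __name__ == "__main__" and len(sys.argv) > 2:
    if shutil.which("ebook-convert") is None:
        sys.exit("ebook-convert was not found on the PATH")
    # Arbitrary larger split size; as the error message warns, going too
    # large may cause trouble on some reading devices.
    subprocess.run(build_convert_command(sys.argv[1], sys.argv[2], 1024),
                   check=True)
```

Keeping the argument list separate from the `subprocess.run` call makes it easy to print or adjust the command before running it.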

The problem is that Tibetan is a complex script that does not put spaces between words; syllables are separated by a dot (the tsheg). In MS Word you can set the justification to "Thai Justification" to handle this, or you can insert zero-width spaces at every dot. I had run a macro to do this, but maybe calibre did not pick up the zero-width spaces? In any case, I can probably just add paragraph breaks at random points throughout the text, and that might be all that is needed...
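
To check whether the macro's zero-width breaks actually made it into the text, a small Python sketch may help. The syllable dot is U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG and the zero-width space is U+200B. `missing_breaks` counts tshegs with no zero-width space right after them, and `add_breaks` inserts one where it is missing. Both names are hypothetical, and whether calibre's EPUB splitter makes any use of zero-width spaces is not established here.

```python
import re

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG, the syllable "dot"
ZWSP = "\u200b"   # ZERO WIDTH SPACE, an invisible line-break opportunity

# A tsheg that is not already followed by a zero-width space.
_BARE_TSHEG = re.compile(TSHEG + "(?!" + ZWSP + ")")


def missing_breaks(text: str) -> int:
    """Count tshegs with no zero-width space immediately after them."""
    return len(_BARE_TSHEG.findall(text))


def add_breaks(text: str) -> str:
    """Insert a zero-width space after every tsheg that lacks one."""
    return _BARE_TSHEG.sub(TSHEG + ZWSP, text)


if __name__ == "__main__":
    sample = "བཀྲ་ཤིས་བདེ་ལེགས།"
    print(missing_breaks(sample), missing_breaks(add_breaks(sample)))
```

The pattern matches only U+0F0B, so the non-breaking tsheg (U+0F0C) is left alone, and running `add_breaks` twice gives the same result as running it once.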

Tags
tibetan font



Similar Threads
Thread Thread Starter Forum Replies Last Post
Epub to Docx conversion? Hattie Conversion 14 09-13-2014 05:12 AM
conversion from docx to epub seems to break my paragraphs xanguera Conversion 2 07-24-2014 01:28 AM
Conversion of Endnotes .docx to .epub profjones Conversion 1 11-01-2013 09:05 AM
Docx to Epub conversion error with 1.5 dapjukebox Calibre 6 10-03-2013 09:18 AM
Horizontal lines in DOCX to EPUB conversion. StevieP Conversion 13 07-05-2013 04:14 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.