DOCX Indentation - Ebook-Convert

tafr · 07-31-2018, 04:43 AM

According to the documentation for ebook-convert, the option
--remove-paragraph-spacing-indent-size=10
should add an indentation of 10 em.

However, this does not work. Neither does the --remove-paragraph-spacing option. The same applies to --insert-blank-line and --insert-blank-line-size.

My source file contains p and div tags, and each paragraph should be indented or separated by a blank line.

I have also tried to insert two &nbsp; (&#160;) at the start of each paragraph via XSLT, but these spaces are not shown (no indentation appears) in the DOCX.

However, when I replace the &nbsp; with **, these characters are displayed in the DOCX file.

Any suggestions on how to solve these problems?

kovidgoyal · 07-31-2018, 05:16 AM

https://www.mobileread.com/forums/sh...d.php?t=186697

tafr · 07-31-2018, 10:01 AM

Hi, thanks for the quick response. Here are the files I am working with, including a calibre log file. We usually run the conversion from a Python script, but here I have used the graphical interface (app). The result looks identical, i.e. no indentation.

Regards, Tage Fredheim

kovidgoyal · 07-31-2018, 10:04 AM

I need the source file, which as per your first post is supposed to be an HTML file?

tafr · 08-01-2018, 04:16 AM

Yes, an XHTML file. I could not upload the file with that extension, so I copied it into a DOCX file which I called sourcefile.docx. As you can see, I have indented some paragraphs with two blank spaces (in fact &#160;), i.e. non-breaking spaces, which should not be collapsed.

<section epub:type="chapter" id="level2_1">

<p>**Sosiologi og sosialantropologi er to av en rekke ulike samfunnsfag. Andre samfunnsfag er for eksempel statsvitenskap, samfunnsøkonomi, psykologi, samfunnsgeografi, pedagogikk og historie. Alle disse fagene handler om den menneskeskapte verden og konsentrerer seg på ulike måter om menneskelig aktivitet. Men likevel er de så forskjellige at de har ulike navn.</p>
<p>**Noen enkle skiller mellom fagene finnes: Mens psykologene er opptatt av menneskenes atferd og tankeprosesser, studerer sosiologene og sosialantropologene først og fremst de sosiale og kulturelle sammenhengene vi inngår i. Pedagogene er opptatt av spørsmål knyttet til læring, skole og utdanning, mens statsvitere er opptatt av staten, offentlig aktivitet og hvordan det internasjonale systemet fungerer politisk og økonomisk.</p>
<p>**Sosiologi og sosialantropologi er to fag som har mye til felles, både når det gjelder temaer som studeres, og teorier og begreper som brukes. I begge fagene stiller vi for eksempel slike spørsmål:</p>
<div epub:type="pagebreak" class="page-normal" id="page-11" title="11">--- 11 til 298</div><div>

I also need to indent various lists and other text, so blank spaces and tabs must be kept.

The DOCX files are used by blind pupils reading on a braille display, so special formatting and indentation are mandatory.

kovidgoyal · 08-01-2018, 04:29 AM

Don't use blank spaces to indent; use the text-indent CSS property and you will be fine. Use the following as extra CSS in the conversion settings:

Code:

p { text-indent: 2em !important }

kovidgoyal · 08-01-2018, 04:30 AM

And if you need to upload files that mobileread does not support, you can zip them up and attach the zip file.

tafr · 08-01-2018, 05:43 AM

OK, as mentioned, we run the conversion from a Python program. Where is "extra css" located?

try:
    self.utils.report.info("Konverterer fra XHTML til DOCX...")
    process = self.utils.filesystem.run(["/usr/bin/ebook-convert",
                                         html_file,
                                         os.path.join(temp_docxdir, epub.identifier() + ".docx"),
                                         "--no-chapters-in-toc",
                                         "--toc-threshold=0",
                                         "--docx-page-size=a4",
                                         "--linearize-tables",
                                         "--embed-font-family=Verdana",  # microsoft fonts must be installed (sudo apt-get install ttf-mscorefonts-installer)
                                         "--docx-page-margin-top=42",
                                         "--docx-page-margin-bottom=42",
                                         "--docx-page-margin-left=70",
                                         "--docx-page-margin-right=56",
                                         "--base-font-size=13"])

kovidgoyal · 08-01-2018, 06:33 AM

--extra-css
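For a script like the one above, the CSS from the earlier reply can be passed as one more command-line argument. The following is only a minimal sketch: the helper names `build_convert_command` and `convert` are hypothetical, and the standard library's `subprocess.run` stands in for the poster's own `self.utils.filesystem.run` wrapper, whose behaviour is not shown in the thread. Only a few of the original options are repeated to keep it short.

```python
import subprocess

# CSS rule suggested earlier in the thread.
EXTRA_CSS = "p { text-indent: 2em !important }"


def build_convert_command(html_file, docx_file, extra_css=EXTRA_CSS,
                          executable="/usr/bin/ebook-convert"):
    """Return the ebook-convert argument list with --extra-css appended.

    Hypothetical helper for illustration; the option list is shortened.
    """
    return [
        executable,
        html_file,
        docx_file,
        "--docx-page-size=a4",
        "--base-font-size=13",
        # Raw CSS is given as a single argument; no shell quoting is needed
        # because the command is passed as a list, not as a shell string.
        "--extra-css=" + extra_css,
    ]


def convert(html_file, docx_file):
    """Run the conversion (assumes ebook-convert is installed at that path)."""
    subprocess.run(build_convert_command(html_file, docx_file), check=True)
```

Because the arguments are passed as a list, the spaces and braces in the CSS do not need escaping. According to the calibre documentation, `--extra-css` also accepts a path to a stylesheet file, which may be easier to maintain once more rules are needed.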
