Old 01-31-2023, 04:28 PM   #1
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
No Table of Contents and "This HTML file is larger than 260 KB" Error

Hello,
I'm trying to convert a 5.4 MB plain text file to AZW3. The file contains 26 chapters, each headed by a line of the form:

## Chapter 12 Hero's Return by Jack Straw

The "Detect chapters at" XPath expression in Convert > Structure detection is: //h:h2

I am using this same expression for the Level 1, Level 2 and Level 3 TOC filters in Convert > TOC.

The "Force use of auto-generated Table of Contents" option is ticked.

Unfortunately, no TOC is generated upon conversion.

When the book is opened in the Edit Book utility, one sees a single HTML file, part0000.html, of size 6 MB. The error check complains that:
"This HTML file is larger than 260 KB".

The html text for the chapter is :
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>

With the above information, here are my questions:

1) Is there a maximum size limit for a text file being input for conversion?
2) Is there a way to force the conversion to segment the HTML output into files of 260 KB or less?
3) What changes must I make to successfully generate the TOC?

Many thanks in advance!
Old 01-31-2023, 08:23 PM   #2
theducks
Well trained by Cats
Posts: 30,447
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
EPUB output is split into 260 KB chunks by default.
Start with a conversion to EPUB, then convert that EPUB to AZW3.

I quit trying to get conversions to do the heavy lifting and just use the Editor along with the TOC tool. It takes me less time (and frustration) to write some regex to set the H tags, then use the TOC tool's "major headings" option.
YMMV
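For anyone curious what "260K chunks" means in practice, here is a minimal sketch of the general idea: cut one big HTML body at heading boundaries and pack the pieces into chunks under a size limit. This is not calibre's actual splitting code; the function name and the choice of h2 as the cut point are assumptions made for illustration.

```python
import re

LIMIT = 260 * 1024  # bytes; the figure quoted in the Check Book warning


def split_at_headings(html_body: str, limit: int = LIMIT) -> list[str]:
    """Greedily pack heading-delimited sections into chunks of at most `limit` bytes.

    Hypothetical helper, not part of calibre.
    """
    # Split just before every <h2 ...> so each section starts with its heading.
    sections = [s for s in re.split(r"(?=<h2\b)", html_body) if s]
    chunks, current = [], ""
    for section in sections:
        # Start a new chunk when adding this section would exceed the limit.
        if current and len((current + section).encode("utf-8")) > limit:
            chunks.append(current)
            current = section
        else:
            current += section
    if current:
        chunks.append(current)
    return chunks
```

A single section that is already bigger than the limit is left as one oversized chunk, so a file with no usable headings would not be split at all by this sketch.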
Old 01-31-2023, 09:22 PM   #3
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you Ducks. But I am finding it hard to translate your reply to plain English.
Am I right with this interpretation:

1. Separate the Chapter headings and put them on top of the plaintext file in the correct order

2. Add h tags to them and put this block right on top of the textfile thus (string_n is the chapter descriptor for chapter n):

<h1> Chapter 1 string1 </h1>
<h1> Chapter 2 string2 </h1>
<h1> Chapter 3 string3 </h1>


Chapter 1 string1
The quick brown fox jumped over the lazy dog. And so on.......

----------------the rest of the text file---------------------------

3. Untick the force ToC generation box in the Conversion > TOC section

Will the converter automatically link the Chapter entries at the top of the file to their corresponding locations in the endproduct azw3 file?

Last edited by rosewood; 01-31-2023 at 09:26 PM. Reason: Forgt to ask question
Old 01-31-2023, 11:13 PM   #4
kovidgoyal
creator of calibre
Posts: 44,548
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
1. No.
2. Output to EPUB instead of AZW3, and convert the EPUB to AZW3 later. However, this restriction does not really apply to AZW3.
3. Look at the HTML: it isn't using <h2>, so your XPath will not work. Use an XPath that matches the actual HTML.
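Point 3 can be checked outside calibre. The sketch below uses Python's standard-library ElementTree (calibre itself is not involved) to run the equivalent of //h:h2 against the chapter line from post #1. The surrounding html/body/ul wrapper is an assumption, since only the li line was posted.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}  # same "h" prefix that calibre's XPath expressions use

# Only the <li> line comes from the thread; the wrapper elements are assumed.
sample = (
    f'<html xmlns="{XHTML}"><body><ul class="calibre2">'
    '<li class="calibre3"><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li>'
    '</ul></body></html>'
)
root = ET.fromstring(sample)

h2_hits = root.findall(".//h:h2", NS)  # what //h:h2 is looking for
li_hits = root.findall(".//h:li", NS)  # what the file actually contains

print(len(h2_hits), len(li_hits))
```

With no h2 elements in the file, a chapter or TOC expression of //h:h2 has nothing to select, which is consistent with the empty TOC.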
Old 02-01-2023, 07:11 AM   #5
Martinoptic
Bibliolater
Posts: 5,849
Karma: 4147318
Join Date: Dec 2021
Location: England
Device: none
Or why not use LibreOffice or Microsoft Word to import the text file, sort out your headings, and save as DOCX before converting the new DOCX file to EPUB or AZW3 with calibre? This should also get you a TOC.

Last edited by Martinoptic; 02-01-2023 at 07:13 AM. Reason: Clarify
Old 02-01-2023, 07:19 AM   #6
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you Kovid. Before I continue, I want to thank you for your brilliant product Calibre. I have recently started using it and find its look and feel intuitive, its range of actions extensive and its capabilities powerful. It has only minor bugs, due, no doubt, to its rapid evolution. Thank you for sharing this product with the world at large.

Now, regarding point 3 in your post:

Please read post #9 in my previous thread:
https://www.mobileread.com/forums/sh...d.php?t=351746

I repeated this method for the current book, i.e. putting a ## prefix before each chapter title, like so:

## Chapter 4 Pottery in the Middle Ages

but instead of the expected:

<h2 id="chapter-4-pottery-in-the-middle-ages" class="calibre1">CHAPTER 4 Pottery in the Middle Ages</h2>

I got:

<li class="calibre3"><span>CHAPTER</span> 4 Pottery in the Middle Ages</li>

So for the current book, for unknown reasons, the program failed to translate the ## prefix into h2 tags.

This post is also relevant:
https://www.mobileread.com/forums/sh...d.php?t=351898

But I will follow your advice to convert text to epub and then epub to azw3 and see if it fixes the problem.
Old 02-01-2023, 09:07 AM   #7
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you all.
@Kovid: I tried *.txt -> *.epub -> *.azw3 as you suggested but the problem remained.

@MartinOptic: I tried *.txt -> *.docx (in MS Word) and then *.docx -> *.azw3 (in Calibre) as you suggested but the problem remained.

But I solved the problem as follows:

I highlighted the problem book in the book listing on the Calibre main screen

I clicked on the Edit Book icon in Calibre to open up the Edit utility

I exported the 6 MB part0000.html to another directory, then deleted this file in the Edit utility

Have a quick look at post #1 in this thread, where I wrote:

<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>


I opened the exported part0000.html in a text editor (Wordpad) and replaced

<li class with <h2 class

and I then replaced

</li> with </h2>

then saved the file and shut down Wordpad.

I returned to the Calibre Edit utility, imported the altered part0000.html and clicked on the Save icon.

Process complete. This generated the desired ToC.
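The same search-and-replace can be scripted. A blanket replace of "<li class" and "</li>" also rewrites any ordinary list items in the book, so the sketch below only promotes items that begin with the <span>CHAPTER</span> marker shown above. The function name is made up for illustration and the pattern assumes the markup looks exactly like the quoted line.

```python
import re

# Matches an <li> whose content starts with the <span>CHAPTER</span> marker.
CHAPTER_LI = re.compile(
    r"<li(\s[^>]*)?>(\s*<span>CHAPTER</span>.*?)</li>",
    re.DOTALL,
)


def promote_chapter_items(html: str) -> str:
    """Rewrite chapter-style <li> items as <h2>, keeping their attributes.

    Hypothetical helper; other <li> elements are left untouched.
    """
    return CHAPTER_LI.sub(
        lambda m: f"<h2{m.group(1) or ''}>{m.group(2)}</h2>", html
    )
```

Note that this leaves the new h2 elements inside whatever list wrapper the converter produced, just as the manual Wordpad edit does.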

From this I learnt that sometimes it's best to avoid the at times dodgy conversion process in Calibre by using external software.
Old 02-01-2023, 09:15 AM   #8
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
I spoke too soon in my previous post.
While the Calibre Book Viewer displayed the ToC, my Fire HD10 did not.
So it's back to the drawing board, unfortunately.
Old 02-01-2023, 10:24 AM   #9
Quoth
the rook, bossing Never.
Posts: 12,359
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
You need to properly use paragraph styles in MS Word (or LO Writer) with the heading/outline level set properly, and List style off.

Calibre conversion from docx is practically perfect if the document is styled properly.
Old 02-01-2023, 10:59 AM   #10
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
After noticing that the Calibre conversion utility translated my:

## Chapter 12 Hero's Return by Jack Straw

to

<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>

I changed the Chapter detection XPath expression in Conversion > Structure detection and in Conversion > ToC to:

//*[(name()='h2' or name()='li')]

This produced the ToC, visible in both the Calibre Viewer and in the Fire HD10.
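One caveat with //*[(name()='h2' or name()='li')]: it selects every h2 and every li in the book, not just chapter lines. The sketch below illustrates this with Python's standard-library ElementTree, which does not implement the XPath name() function, so the test is emulated with a local-name comparison. The sample markup apart from the chapter line is invented.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def matches(root, names=("h2", "li")):
    """Emulate //*[name()='h2' or name()='li'] in document order."""
    return [el for el in root.iter() if local_name(el.tag) in names]


# Assumed sample: one chapter line, one ordinary list item, one real heading.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<ul><li><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li></ul>'
    '<ul><li>an ordinary list item</li></ul>'
    '<h2>A real heading</h2>'
    '</body></html>'
)
root = ET.fromstring(sample)
for el in matches(root):
    print(local_name(el.tag), "".join(el.itertext()))
```

If a book has bulleted lists in the body text, each bullet would become a chapter break and a ToC entry with this expression, so a narrower test on the li content may be needed.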

Thank you Quoth. For my applications, plain text input is easiest. Hopefully the above XPATH expression will see me through from now on. But if the conversion plays up again then I'll give properly styled *.docx a whirl.
Old 02-01-2023, 06:06 PM   #11
BetterRed
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Quoth View Post
You need to properly use paragraph styles in MS Word (or LO Writer) with the heading/outline level set properly, and List style off.

Calibre conversion from docx is practically perfect if the document is styled properly.
Quote:
Originally Posted by rosewood View Post
. . .

Thank you Quoth. For my applications, plain text input is easiest. Hopefully the above XPATH expression will see me through from now on. But if the conversion plays up again then I'll give properly styled *.docx a whirl.
FWIW - I loaded a plain text file of ~5,300 lines, ~44,000 words into MS Word last week. It was a 1989 Act of Parliament (since repealed) that obviously came from an OCR scan of the printed original - full of broken paragraphs, shambolic indentations, etc, etc.

It took me about 12 hours over several sessions to get a DOCX and a PDF that conform to the current standards for such documents, which are very specific. I wouldn't have bothered without the Word template I obtained from the parliamentary library.

BR

Last edited by BetterRed; 02-01-2023 at 06:35 PM.
Old 02-02-2023, 12:48 AM   #12
Sarmat89
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by rosewood View Post
I'm trying to convert a 5.4 MB plain text file to AZW3.
Are you sure that you selected 'markdown' as the text formatting option?
Old 02-03-2023, 07:01 PM   #13
rosewood
Member
rosewood began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Hi Sarmat,
I chose text as the input format & azw3 as the output. When you convert text into markdown, the output is virtually indistinguishable from the text input, or at least it is so for the trial conversions which I made using online txt to markdown converters.

In particular, the text string: ## Chapter textstring remains unchanged during the text to markdown conversion, which is what is important for chapter detection during the final conversion into azw3 format.
Old 02-07-2023, 12:58 PM   #14
Sarmat89
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
In the "TXT Input" tab, there is a "Formatting" option. Make sure it is set to "markdown" instead of "plain" or "auto".
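For command-line users, the same settings can be passed to calibre's ebook-convert tool. The sketch below only builds and prints the command; it does not run it. The file names are placeholders, and the option names are, as far as I know, the command-line counterparts of the GUI settings discussed in this thread, so check them against `ebook-convert --help` for your calibre version.

```python
import shlex


def build_convert_command(src: str, dest: str, heading_xpath: str = "//h:h2") -> list[str]:
    """Assemble an ebook-convert call that treats the TXT input as Markdown.

    Hypothetical helper; verify the option names before relying on them.
    """
    return [
        "ebook-convert", src, dest,
        "--formatting-type", "markdown",  # TXT Input > Formatting
        "--chapter", heading_xpath,       # Structure detection > Detect chapters at
        "--level1-toc", heading_xpath,    # Table of Contents > Level 1 TOC
    ]


# Placeholder file names for illustration only.
cmd = build_convert_command("book.txt", "book.azw3")
print(shlex.join(cmd))
```

With the formatting set to Markdown, a line such as "## Chapter 12 ..." should come out as an h2, which is what the //h:h2 expression expects.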
Tags
conversion errors, max input file size, table of contents