No Table of Contents and "This HTML file is larger than 260 KB" Error

rosewood · 01-31-2023, 03:28 PM

Hello,
I'm trying to convert a 5.4 MB plain text file to AZW3. There are 26 chapters in the file, and the chapter headings take this form:

## Chapter 12 Hero's Return by Jack Straw

The "Detect chapters at" XPath expression under Convert > Structure detection is: //h:h2

I am using the same expression for the Level 1, Level 2 and Level 3 TOC filters under Convert > TOC.

The "Force use of auto-generated Table of Contents" option is ticked.

Unfortunately, no TOC is generated upon conversion.

When the book is opened in the Edit Book utility, it contains a single HTML file, part0000.html, of size 6 MB. The error check complains:
"This HTML file is larger than 260 KB".

The html text for the chapter is :
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>

With the above information, here are my questions:

1) Is there a maximum size limit for a text file being input for conversion?
2) Is there a way to force the conversion to split the HTML output into files of 260 KB or less?
3) What changes must I make to successfully generate the TOC?

Many thanks in advance!

theducks · 01-31-2023, 07:23 PM

EPUB output defaults to splitting into 260 KB chunks.
Start with a conversion to EPUB, then convert that EPUB to AZW3.

I quit trying to get conversions to do the heavy lifting and just use the Editor along with the TOC tool. It seems to take me less time (and frustration) to just write some regex and set the H tags, then use the TOC tool: major headings.
YMMV
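
For readers wondering what "write some regex and set the H tags" might look like, here is a minimal Python sketch. It is an illustration only, not theducks' actual workflow: the function name promote_chapter_items and the pattern are assumptions, and the sample markup is the <li> line quoted in post #1. It rewrites only list items whose text starts with "chapter" into <h2> elements and leaves other list items alone.

```python
import re

# Hypothetical pattern: an <li> whose content begins with the word "chapter",
# optionally wrapped in a <span> as in the markup quoted in post #1.
CHAPTER_LI = re.compile(
    r'<li(?P<attrs>[^>]*)>(?P<body>\s*(?:<span>)?\s*chapter\b.*?)</li>',
    re.IGNORECASE,
)


def promote_chapter_items(html: str) -> str:
    """Turn chapter-like <li> elements into <h2>, keeping attributes and text."""
    return CHAPTER_LI.sub(r'<h2\g<attrs>>\g<body></h2>', html)


sample = (
    '<li class="calibre3"><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li>'
    '<li class="calibre3">an ordinary list item</li>'
)
result = promote_chapter_items(sample)
print(result)
```

The same pattern could be adapted for the search-and-replace panel in Edit Book. Note that the new <h2> would still sit inside whatever list wrapper surrounded the <li>, so the enclosing list tags may need tidying afterwards.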

rosewood · 01-31-2023, 08:22 PM

Thank you Ducks. But I am finding it hard to translate your reply to plain English.
Am I right with this interpretation:

1. Separate the Chapter headings and put them on top of the plaintext file in the correct order

2. Add h tags to them and put this block right on top of the textfile thus (string_n is the chapter descriptor for chapter n):

<h1> Chapter 1 string1 </h1>
<h1> Chapter 2 string2 </h1>
<h1> Chapter 3 string3 </h1>

Chapter 1 string1
The quick brown fox jumped over the lazy dog. And so on.......

----------------the rest of the text file---------------------------

3. Untick the force ToC generation box in the Conversion > TOC section

Will the converter automatically link the Chapter entries at the top of the file to their corresponding locations in the endproduct azw3 file?

kovidgoyal · 01-31-2023, 10:13 PM

1. no
2. Output to EPUB instead of AZW3, and convert the EPUB to AZW3 later. However, this restriction is not really applicable to AZW3.
3. Look at the HTML: it isn't using <h2>, so your XPath will not work. Use an XPath that matches the actual HTML.
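
Point 3 can be checked outside calibre. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath, so it is a stand-in for calibre's own matching rather than a copy of it. The sample document is an assumed miniature of part0000.html built around the <li> line from post #1; the h prefix is bound to the XHTML namespace, as in the //h:h2 expression.

```python
import xml.etree.ElementTree as ET

# The "h" prefix used in expressions such as //h:h2 stands for the XHTML namespace.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Assumed miniature of part0000.html: the chapter line is an <li>, not an <h2>.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
    '<li class="calibre3"><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li>'
    '</ul></body></html>'
)

root = ET.fromstring(SAMPLE)

# The expression from post #1 finds nothing, because there is no <h2> to find.
h2_hits = root.findall(".//h:h2", NS)

# An expression that matches the markup actually present does find the chapter.
li_hits = root.findall(".//h:li[@class='calibre3']", NS)

print(len(h2_hits), len(li_hits))
print("".join(li_hits[0].itertext()))
```

With no match for the expression, the converter has nothing to build TOC entries from, which is consistent with the empty TOC reported above.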

Martinoptic · 02-01-2023, 06:11 AM

Or why not use LibreOffice or Microsoft Word to import the text file, then sort out your headings and save as DOCX before converting this new DOCX file to EPUB or AZW3 with calibre? This should also get you a TOC.

rosewood · 02-01-2023, 06:19 AM

Thank you Kovid. Before I continue, I want to thank you for your brilliant product Calibre. I have recently started using it and find its Look & Feel intuitive, its range of actions extensive and its capabilities powerful. It has only minor bugs, due no doubt, to its rapid evolution. Thank you for sharing this product with the world at large.

Now, regarding point 3 in your post:

Please read post #9 in my previous thread:
https://www.mobileread.com/forums/sh...d.php?t=351746

I repeated this method for the current book, i.e. putting a ## prefix before each chapter heading, like so:

## Chapter 4 Pottery in the Middle Ages

but instead of the expected:

<h2 id="chapter-4-pottery-in-the-middle-ages" class="calibre1">CHAPTER 4 Pottery in the Middle Ages</h2>

I got:

<li class="calibre3"><span>CHAPTER</span> 4 Pottery in the Middle Ages</li>

So for the current book, for unknown reasons, the program failed to translate the ## prefix into <h2> tags.

This post is also relevant:
https://www.mobileread.com/forums/sh...d.php?t=351898

But I will follow your advice to convert text to epub and then epub to azw3 and see if it fixes the problem.

rosewood · 02-01-2023, 08:07 AM

Thank you all.
@Kovid: I tried *.txt -> *.epub -> *.azw3 as you suggested but the problem remained.

@MartinOptic: I tried *.txt -> *.docx (in MS Word) and then *.docx -> *.azw3 (in Calibre) as you suggested but the problem remained.

But I solved the problem as follows:

I highlighted the problem book in the book listing on the Calibre main screen

I clicked on the Edit Book icon in Calibre to open up the Edit utility

I exported the 6Mb part0000.html to another directory, then deleted this file in the Edit utility

Have a quick look at post #1 in this thread, where I wrote:

<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>

I opened the exported part0000.html in a text editor (Wordpad) and replaced

<li class with <h2 class

and I then replaced

</li> with </h2>

then saved the file and shut down Wordpad.

I returned to the Calibre Edit utility, imported the altered part0000.html and clicked on the Save icon.

Process complete. This generated the desired ToC.

From this I learnt that sometimes it's best to avoid the at times dodgy conversion process in Calibre by using external software.

rosewood · 02-01-2023, 08:15 AM

I spoke too soon in my previous post.
While the Calibre Book Viewer displayed the ToC, my Fire HD10 did not.
So it's back to the drawing board, unfortunately.

Quoth · 02-01-2023, 09:24 AM

You need to properly use paragraph styles in MS Word (or LO Writer) with the heading/outline level set properly, and List style off.

Calibre conversion from docx is practically perfect if the document is styled properly.

rosewood · 02-01-2023, 09:59 AM

After noticing that the Calibre conversion utility translated my:

## Chapter 12 Hero's Return by Jack Straw

to

<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>

I changed the Chapter detection XPath expression in Conversion > Structure detection and in Conversion > ToC to:

//*[(name()='h2' or name()='li')]

This produced the ToC, visible in both the Calibre Viewer and in the Fire HD10.

Thank you Quoth. For my applications, plain text input is easiest. Hopefully the above XPATH expression will see me through from now on. But if the conversion plays up again then I'll give properly styled *.docx a whirl.
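
One caveat about //*[(name()='h2' or name()='li')]: it selects every list item in the book, not only chapter lines, so any ordinary list would also end up in the ToC. The Python sketch below emulates that selection with the standard library (ElementTree has no name() function, so the local tag name is compared by hand) and then narrows it with a text test. The sample document, the shopping-list items and the "chapter + number" pattern are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: one chapter line plus an ordinary list elsewhere in the book.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<ul><li class="calibre3"><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li></ul>'
    '<ul><li class="calibre3">milk</li><li class="calibre3">eggs</li></ul>'
    '</body></html>'
)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def text_of(element: ET.Element) -> str:
    """Concatenate all text inside an element, including nested spans."""
    return "".join(element.itertext()).strip()


root = ET.fromstring(SAMPLE)

# Emulates name()='h2' or name()='li': every list item is selected.
broad = [text_of(el) for el in root.iter() if local_name(el.tag) in ("h2", "li")]

# Narrower rule: additionally require the text to start with "chapter <number>".
CHAPTER_RE = re.compile(r"^chapter\s+\d+", re.IGNORECASE)
narrow = [text for text in broad if CHAPTER_RE.match(text)]

print(broad)
print(narrow)
```

Inside calibre itself, the XPath tutorial in the manual describes a re:test() function for this kind of text filter; something along the lines of //h:li[re:test(., '^\s*chapter\s+\d+', 'i')] may work, but that expression has not been tested against this book.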

BetterRed · 02-01-2023, 05:06 PM

Quote:

Originally Posted by Quoth

You need to properly use paragraph styles in MS Word (or LO Writer) with the heading/outline level set properly, and List style off.

Calibre conversion from docx is practically perfect if the document is styled properly.

Quote:

Originally Posted by rosewood

. . .

Thank you Quoth. For my applications, plain text input is easiest. Hopefully the above XPATH expression will see me through from now on. But if the conversion plays up again then I'll give properly styled *.docx a whirl.

FWIW - I loaded a plain text file of ~5,300 lines, ~44,000 words into MS Word last week. It was a 1989 Act of Parliament (since repealed) that obviously came from an OCR scan of the printed original - full of broken paragraphs, shambolic indentations, etc, etc.

It took me about 12 hours over several sessions to get a DOCX and a PDF that conform to the current standards for such documents, which are very specific. I wouldn't have bothered without the Word template I obtained from the parliamentary library.

BR

Sarmat89 · 02-01-2023, 11:48 PM

Quote:

Originally Posted by rosewood

I'm trying to convert a 5.4 Mb plain text file to AZW3.

Are you sure that you selected 'markdown' as the text formatting option?

rosewood · 02-03-2023, 06:01 PM

Hi Sarmat,
I chose text as the input format and AZW3 as the output. When you convert text into markdown, the output is virtually indistinguishable from the text input, or at least it was for the trial conversions which I made using online txt-to-markdown converters.

In particular, the text string: ## Chapter textstring remains unchanged during the text to markdown conversion, which is what is important for chapter detection during the final conversion into azw3 format.

Sarmat89 · 02-07-2023, 11:58 AM

In the "TXT Input" tab, there is a "Formatting" option. Make sure it is set to "markdown" instead of "plain" or "auto".
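
The same setting can also be passed on the command line. The sketch below only assembles an ebook-convert command and does not run it. The option names (--formatting-type, --chapter, --level1-toc, --use-auto-toc) are given as best recalled from calibre's ebook-convert documentation and should be confirmed with ebook-convert's -h output before relying on them; the file names and the helper build_convert_command are hypothetical.

```python
import shlex


def build_convert_command(src: str, dst: str, heading_xpath: str = "//h:h2") -> list[str]:
    """Assemble (but do not run) an ebook-convert call that treats the TXT as markdown."""
    return [
        "ebook-convert", src, dst,
        "--formatting-type", "markdown",  # TXT input: parse "## Heading" lines as markdown
        "--chapter", heading_xpath,       # structure detection: where chapters start
        "--level1-toc", heading_xpath,    # ToC: level 1 entries
        "--use-auto-toc",                 # force the auto-generated ToC
    ]


cmd = build_convert_command("book.txt", "book.azw3")
print(shlex.join(cmd))
# To actually convert, the list could be handed to subprocess.run(cmd, check=True).
```

If the ## lines are then parsed as markdown headings, they should come out as <h2> elements like the one quoted in post #6, and the original //h:h2 expression would apply again.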



