01-31-2023, 04:28 PM | #1 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
No Table of Contents and "This HTML file is larger than 260 KB" Error
Hello,
I'm trying to convert a 5.4 MB plain-text file to AZW3. The file has 26 chapters, whose headings are of the form:
## Chapter 12 Hero's Return by Jack Straw
The Detect chapters at XPath expression in Convert > Structure Detection is: //h:h2
I use this same expression for the Level 1, Level 2 and Level 3 TOC filters in Convert > TOC, and the "Force use of auto-generated Table of Contents" option is ticked. Unfortunately, no TOC is generated on conversion.
When the book is opened in the Edit Book utility, it contains a single HTML file, part0000.html, about 6 MB in size. The error check complains: "This HTML file is larger than 260 KB".
The HTML for the chapter heading is:
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>
With that information, here are my questions:
1) Is there a maximum size limit for a text file being input for conversion?
2) Is there a way to force the conversion to split the HTML output into files of 260 KB or less?
3) What changes must I make to generate the TOC successfully?
Many thanks in advance!
01-31-2023, 08:23 PM | #2 |
Well trained by Cats
Posts: 30,447
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
EPUB defaults to 260K chunks.
Start with that conversion, then convert that to AZW3.
I quit trying to get conversions to do the heavy lifting and just use the Editor along with the TOC tool. It seems to take me less time (and frustration) to just write some regex and set the H tags, then use the TOC tool: major headings. YMMV
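To illustrate what "260K chunks" means for question 2 in post #1, here is a hypothetical Python sketch that packs chapters into files without exceeding a size cap, splitting only between chapters. It is not calibre's splitting code (calibre's EPUB output has its own setting for the split size); the 260 KB cap, the greedy strategy and the function name are assumptions for illustration.

```python
# Hypothetical sketch: pack chapter HTML into files no larger than a size cap,
# splitting only between chapters. This is NOT calibre's splitting code; the
# 260 KB cap and the greedy strategy are assumptions for illustration.
from typing import List

LIMIT_BYTES = 260 * 1024  # assumed cap, matching the 260 KB in the error message


def split_at_chapters(chapters: List[str], limit: int = LIMIT_BYTES) -> List[List[str]]:
    """Greedily group consecutive chapters so each group's UTF-8 size stays
    within `limit`. A chapter that is larger than `limit` on its own still
    gets a group to itself; a real splitter would have to cut inside it."""
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for chapter in chapters:
        n = len(chapter.encode("utf-8"))
        if current and size + n > limit:
            groups.append(current)  # close the group before it overflows
            current, size = [], 0
        current.append(chapter)
        size += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # 26 dummy chapters of about 210 KB each, roughly a 5.4 MB book
    book = [f"<h2>Chapter {i}</h2>" + "x" * 210_000 for i in range(1, 27)]
    print(len(split_at_chapters(book)))  # 26: no two chapters fit together
```

With chapters averaging about 210 KB, no two fit under a 260 KB cap, so every chapter ends up in its own file. Splitting at chapter boundaries only works if the converter can find those boundaries, which is why the XPath question matters.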
01-31-2023, 09:22 PM | #3 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you Ducks. But I am finding it hard to translate your reply to plain English.
Am I right with this interpretation:
1. Separate the chapter headings and put them at the top of the plain-text file in the correct order.
2. Add h tags to them and put this block right at the top of the text file, like this (string_n is the chapter descriptor for chapter n):
<h1> Chapter 1 string1 </h1>
<h1> Chapter 2 string2 </h1>
<h1> Chapter 3 string3 </h1>
Chapter 1 string1
The quick brown fox jumped over the lazy dog. And so on.......
----------------the rest of the text file---------------------------
3. Untick the "Force use of auto-generated Table of Contents" box in the Conversion > TOC section.
Will the converter automatically link the chapter entries at the top of the file to their corresponding locations in the final AZW3 file?
Last edited by rosewood; 01-31-2023 at 09:26 PM. Reason: Forgot to ask question
01-31-2023, 11:13 PM | #4 |
creator of calibre
Posts: 44,548
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
1. No.
2. Output to EPUB instead of AZW3, and convert the EPUB to AZW3 later. However, this restriction does not really apply to AZW3.
3. Look at the HTML: it isn't using <h2>, so your XPath will not work. Use an XPath that matches the actual HTML.
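A quick way to see point 3 for yourself is to run the XPath against the generated markup. The Python sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs a leading "." on "//" paths, so it is an approximation of calibre's own XPath engine, not a copy of it. The sample markup is the chapter line from post #1; any list wrapper around it in the real file is assumed and omitted.

```python
# Sketch: count what an XPath-style query matches in the generated XHTML.
# Uses Python's xml.etree.ElementTree (which supports only a subset of XPath
# and needs a leading "." on "//" paths), not calibre's own XPath engine.
# The sample markup is the chapter line quoted in post #1; any surrounding
# list markup in the real file is omitted.
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}  # the "h:" prefix used in //h:h2

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>
</body></html>"""


def count_matches(xhtml: str, path: str) -> int:
    root = ET.fromstring(xhtml)
    return len(root.findall(path, NS))


if __name__ == "__main__":
    print(count_matches(SAMPLE, ".//h:h2"))  # 0: there is no <h2> to find
    print(count_matches(SAMPLE, ".//h:li"))  # 1: the heading is an <li>
```

Zero matches for the h2 query is exactly why no TOC was built: the expression is fine, but there is nothing in the HTML for it to select.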
02-01-2023, 07:11 AM | #5 |
Bibliolater
Posts: 5,849
Karma: 4147318
Join Date: Dec 2021
Location: England
Device: none
Or why not use LibreOffice or Microsoft Word to import the text file, then sort out your headings and save as DOCX before converting this new DOCX file to EPUB or AZW3 with calibre? This should also get you a TOC.
Last edited by Martinoptic; 02-01-2023 at 07:13 AM. Reason: Clarify
02-01-2023, 07:19 AM | #6 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you, Kovid. Before I continue, I want to thank you for your brilliant product, Calibre. I have recently started using it and find its look and feel intuitive, its range of actions extensive and its capabilities powerful. It has only minor bugs, due, no doubt, to its rapid evolution. Thank you for sharing this product with the world at large.
Now, regarding point 3 in your post: please read post #9 in my previous thread: https://www.mobileread.com/forums/sh...d.php?t=351746
I repeated this method for the current book, i.e. putting ## prefixes before the chapter headings, like so:
## Chapter 4 Pottery in the Middle Ages
but instead of the expected:
<h2 id="chapter-4-pottery-in-the-middle-ages" class="calibre1">CHAPTER 4 Pottery in the Middle Ages</h2>
I got:
<li class="calibre3"><span>CHAPTER</span> 4 Pottery in the Middle Ages</li>
So for the current book, for unknown reasons, the program failed to translate the ## prefix into h2 tags. This post is also relevant: https://www.mobileread.com/forums/sh...d.php?t=351898
But I will follow your advice to convert the text to EPUB and then the EPUB to AZW3, and see if that fixes the problem.
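For comparison, this is roughly how a Markdown processor turns a "## " line into the <h2> with a slug id that was expected above. It is a simplified Python approximation with an assumed slug rule (drop punctuation, lowercase, spaces to hyphens), not calibre's or Python-Markdown's actual code. Getting an <li> instead suggests the file may not have been processed as Markdown at all; post #14 later points at the TXT Input "Formatting" setting.

```python
# Simplified approximation of how a Markdown ATX heading ("## ...") becomes
# an <h2> with a slug id. Not calibre's or Python-Markdown's actual code; the
# slug rule (drop punctuation, lowercase, spaces to hyphens) is an assumption.
import html
import re
from typing import Optional

ATX_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def slugify(text: str) -> str:
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[\s-]+", "-", text)


def heading_to_html(line: str) -> Optional[str]:
    """Return an <hN id="..."> element for a heading line, else None."""
    m = ATX_HEADING.match(line)
    if not m:
        return None  # ordinary paragraph text, not a heading
    level = len(m.group(1))
    text = m.group(2)
    return f'<h{level} id="{slugify(text)}">{html.escape(text, quote=False)}</h{level}>'


if __name__ == "__main__":
    print(heading_to_html("## Chapter 4 Pottery in the Middle Ages"))
```

The number of # characters sets the heading level, which is why "##" should map to <h2> and be picked up by //h:h2.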
02-01-2023, 09:07 AM | #7 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Thank you all.
@Kovid: I tried *.txt -> *.epub -> *.azw3 as you suggested, but the problem remained.
@MartinOptic: I tried *.txt -> *.docx (in MS Word) and then *.docx -> *.azw3 (in Calibre) as you suggested, but the problem remained.
But I solved the problem as follows:
I highlighted the problem book in the book list on the Calibre main screen.
I clicked the Edit Book icon in Calibre to open the Edit utility.
I exported the 6 MB part0000.html to another directory, then deleted this file in the Edit utility.
Have a quick look at post #1 in this thread, where I wrote:
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>
I opened the exported part0000.html in a text editor (WordPad), replaced <li class with <h2 class and </li> with </h2>, then saved the file and closed WordPad.
I returned to the Calibre Edit utility, imported the altered part0000.html and clicked the Save icon.
Process complete. This generated the desired ToC.
From this I learnt that sometimes it's best to avoid the at-times dodgy conversion process in Calibre by using external software.
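The WordPad edit above can be sketched as a targeted regular expression in Python. A blanket replace of <li class and </li> converts every list item in the file; this hypothetical promote_chapter_items helper only rewrites <li> elements that start with <span>CHAPTER</span>, assuming all chapter headings look exactly like the sample from post #1. It does not remove any <ol> or <ul> wrapper around those items, which a complete cleanup might also need.

```python
# Sketch of the WordPad find-and-replace as a targeted regex. Hypothetical
# helper; the pattern assumes chapter headings look exactly like the sample
# from post #1 (<li ...><span>CHAPTER</span> ...</li>). Other <li> elements
# are left alone. Any <ol>/<ul> wrapping the items is NOT removed here.
import re

CHAPTER_LI = re.compile(
    r"<li(\s[^>]*)?>(\s*<span>CHAPTER</span>.*?)</li>",
    re.DOTALL,  # let a heading span several lines
)


def promote_chapter_items(html_text: str) -> str:
    """Turn chapter <li> elements into <h2>, keeping their attributes."""
    return CHAPTER_LI.sub(
        lambda m: f"<h2{m.group(1) or ''}>{m.group(2)}</h2>", html_text
    )


if __name__ == "__main__":
    sample = (
        '<li class="calibre3"><span>CHAPTER</span> 12 Hero\'s Return by Jack Straw</li>\n'
        "<li>an ordinary list item</li>"
    )
    print(promote_chapter_items(sample))
```

Keeping the class attribute means the existing calibre3 styling still applies to the new <h2> elements.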
02-01-2023, 09:15 AM | #8 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
I spoke too soon in my previous post.
While the Calibre Book Viewer displayed the ToC, my Fire HD10 did not. So it's back to the drawing board, unfortunately.
02-01-2023, 10:24 AM | #9 |
the rook, bossing Never.
Posts: 12,359
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
You need to properly use paragraph styles in MS Word (or LO Writer) with the heading/outline level set properly, and List style off.
Calibre conversion from docx is practically perfect if the document is styled properly.
02-01-2023, 10:59 AM | #10 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
After noticing that the Calibre conversion utility translated my:
## Chapter 12 Hero's Return by Jack Straw
to
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>
I changed the chapter detection XPath expression in Conversion > Structure detection and in Conversion > ToC to:
//*[(name()='h2' or name()='li')]
This produced the ToC, visible both in the Calibre Viewer and on the Fire HD10. Thank you, Quoth.
For my applications, plain-text input is easiest. Hopefully the above XPath expression will see me through from now on. But if the conversion plays up again, I'll give properly styled *.docx a whirl.
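The expression above works because it also selects <li> elements, but it selects every <li>, so if a book has ordinary bulleted or numbered lists, those items may land in the ToC too. The Python sketch below mimics that selection with the standard library (not calibre's XPath engine) and adds a hypothetical chapters_only switch that keeps only <li> elements beginning with <span>CHAPTER</span>. In calibre itself, a narrower XPath along the lines of //h:li[h:span='CHAPTER'] might do the same, though that is untested here.

```python
# Sketch mimicking //*[(name()='h2' or name()='li')] with the standard
# library (not calibre's XPath engine). `chapters_only` is a hypothetical
# narrowing that keeps only <li> elements whose first child is
# <span>CHAPTER</span>, so ordinary list items stay out of the ToC.
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part


def toc_candidates(xhtml: str, chapters_only: bool = False) -> List[str]:
    """Return the text of matching elements in document order."""
    root = ET.fromstring(xhtml)
    found = []
    for el in root.iter():
        name = local_name(el.tag)
        if name not in ("h2", "li"):
            continue
        if chapters_only and name == "li":
            first = next(iter(el), None)  # first child element, if any
            if first is None or local_name(first.tag) != "span" or first.text != "CHAPTER":
                continue
        found.append("".join(el.itertext()).strip())
    return found


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<li class="calibre3"><span>CHAPTER</span> 12 Hero's Return by Jack Straw</li>
<ul><li>an ordinary list item</li></ul>
</body></html>"""

if __name__ == "__main__":
    print(toc_candidates(SAMPLE))                      # both <li> texts
    print(toc_candidates(SAMPLE, chapters_only=True))  # chapter only
```

Whether the wider match matters depends on the book: with no real lists in the text, the simpler expression is enough.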
02-01-2023, 06:06 PM | #11 | ||
null operator (he/him)
Posts: 21,000
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
It took me about 12 hours over several sessions to get a DOCX and a PDF that conform to the current standards for such documents, which are very specific. I wouldn't have bothered without the Word template I obtained from the parliamentary library.
BR
Last edited by BetterRed; 02-01-2023 at 06:35 PM.
02-02-2023, 12:48 AM | #12 |
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
02-03-2023, 07:01 PM | #13 |
Member
Posts: 14
Karma: 10
Join Date: Jan 2023
Device: fire hd 10
Hi Sarmat,
I chose TXT as the input format and AZW3 as the output. When you convert text into Markdown, the output is virtually indistinguishable from the text input, or at least it was for the trial conversions I made with online TXT-to-Markdown converters. In particular, the text string:
## Chapter textstring
remains unchanged during the text-to-Markdown conversion, which is what matters for chapter detection during the final conversion to AZW3.
02-07-2023, 12:58 PM | #14 |
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
In the "TXT Input" tab, there is a "Formatting" option. Make sure it is set to "markdown" instead of "plain" or "auto".
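The same settings can be applied from the command line with calibre's ebook-convert. The Python sketch below only builds the argument list and runs it when given two file names; it assumes ebook-convert is on PATH, the option names (--formatting-type, --chapter, --level1-toc, --use-auto-toc) are taken from calibre's ebook-convert documentation and should be confirmed with ebook-convert book.txt book.azw3 --help, and the file names are placeholders. If the ## lines still come out as <li>, the combined expression from post #10 could be substituted for //h:h2.

```python
# Sketch: the same conversion from the command line with calibre's
# ebook-convert (assumed to be on PATH). Option names are from calibre's
# ebook-convert documentation; confirm them on your version with
# `ebook-convert book.txt book.azw3 --help`. File names are placeholders.
import subprocess
import sys
from typing import List

CHAPTER_XPATH = "//h:h2"  # enough once "## " lines really become <h2>


def build_command(src: str, dst: str) -> List[str]:
    return [
        "ebook-convert", src, dst,
        "--formatting-type", "markdown",  # TXT Input > Formatting
        "--chapter", CHAPTER_XPATH,       # Structure detection > Detect chapters at
        "--level1-toc", CHAPTER_XPATH,    # Table of Contents > Level 1 TOC
        "--use-auto-toc",                 # force the auto-generated ToC
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python convert.py book.txt book.azw3
    subprocess.run(build_command(sys.argv[1], sys.argv[2]), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the brackets and quotes in XPath expressions.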