Load of html debugging

pedz · 03-20-2010, 02:17 PM

I was able to load and convert a small html file with Calibre but I could not even load this html file: HTML 5

So far, no reader really likes that file. Is there any way to get some kind of debug output from Calibre so I can find the section it does not like and remove it?

Thank you,
Perry

kovidgoyal · 03-20-2010, 10:45 PM

Conversion logs are available by clicking the rotating spinner in the bottom right corner.

pedz · 03-23-2010, 12:17 PM

Either you misunderstood me or I am misunderstanding you.

The error is not when I am doing the conversion. It is when I am "adding" the book. That is, I go to Add Book and click the top menu item, then pick the html source and hit Open. I get a dialog box that says "Adding..." and it is blank except for "Add...". After a few minutes, it stops with an error message that reads: ERROR: Adding failed: The add books process seems to have hung. Try restarting calibre and adding the books in smaller increments, until you find the problem book.

The html I am trying to load is linked in my original post. I'm happy to try and help debug this.

Starson17 · 03-23-2010, 02:04 PM

Quote:

Originally Posted by pedz

The error is not when I am doing the conversion. It is when I am "adding" the book.

I looked at this, and like you, I had trouble adding it. My first thought was that Calibre was hanging when reading metadata from the file, so I turned on the option to get metadata from the filename. That didn't help. I don't know if Calibre still opens the file even if that option is set.

My second thought is that Calibre will zip up the file and all linked images, etc. it needs. I suspect that's where it's hanging - trying to find additional files it needs. Why don't you try to zip it up manually, then add the zip file to Calibre. If that works, try to view it.

If all that works, you could try splitting the html up into pieces to find the piece causing the problem. If there's a log that shows errors found during the add process, I don't know where it's located.

pedz · 03-23-2010, 02:23 PM

Progress...

I wrote a script to remove all the comments, which I defined as:

<!-- xxxxx -->

I can now Add the file. When I try and generate the eBook, I get this error message
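The script itself wasn't posted. As a rough sketch only, a comment stripper along these lines can be written in plain sh and awk, so it needs no extra tools; the function name `strip_comments` is made up for illustration. It drops everything from `<!--` to the next `-->`, including comments that span several lines, and lines that were entirely inside a comment come out blank. It is deliberately naive: it knows nothing about `<script>` blocks or CDATA sections, so check the result before relying on it.

```sh
# Hypothetical helper: remove <!-- ... --> comments from an HTML file
# and write the result to stdout. Comments may span multiple lines.
strip_comments() {
  awk '
  {
    line = $0
    out = ""
    while (length(line) > 0) {
      if (in_comment) {
        # Inside a comment: skip up to and including the next "-->".
        i = index(line, "-->")
        if (i == 0) break
        line = substr(line, i + 3)
        in_comment = 0
      } else {
        # Outside a comment: keep the text up to the next "<!--".
        i = index(line, "<!--")
        if (i == 0) { out = out line; break }
        out = out substr(line, 1, i - 1)
        line = substr(line, i + 4)
        in_comment = 1
      }
    }
    print out
  }' "$1"
}
```

Usage: `strip_comments original.html > out.html` (file names are placeholders). The `in_comment` flag is an awk global, so it carries over from one input line to the next, which is what lets a comment started on one line end on a later one.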

Starson17 · 03-23-2010, 02:33 PM

Quote:

Originally Posted by pedz

Progress...

I wrote a script to remove all the comments, which I defined as:

<!-- xxxxx -->

I can now Add the file. When I try and generate the eBook, I get this error message

Take a look at the EPUB conversion output options. There's a split size parameter there that IIRC was in the 280 range. Your log indicates a split failure that's just a bit larger than that in the 300 range. I have no idea if this is even the same parameter, but try increasing it to 400 and see if that helps. It will only take a moment to test.

pedz · 03-23-2010, 02:42 PM

Ok thanks.

Can you help me understand the debug output? For example: Split point: {http://www.w3.org/1999/xhtml}div /*/*[2]/*[4]

I'm thinking maybe I can modify the source as an alternative, but I don't understand where that is pointing me.

Thanks again.
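For reference, a `Split point:` line like the one quoted gives the element's tag in `{namespace}name` form (here an XHTML `div`) followed by a path from the root of the document being split. Read as XPath, `/*/*[2]/*[4]` means: the root element (`html`), its second child element (normally `body`), then that element's fourth child element. So the split was attempted at the fourth top-level element inside the body of whichever file was being split at that point. To pull those lines out of a long verbose log, a sketch like the one below can help; it assumes the output was saved with `ebook-convert ... -v -v > convert.log 2>&1`, and the function name is hypothetical.

```sh
# Hypothetical helper: print every line of a saved conversion log that
# mentions splitting, prefixed with its line number, so the surrounding
# lines (which say which file is being split) are easy to locate.
split_log_lines() {
  grep -n -i 'split' "$1"
}
```

Usage: `split_log_lines convert.log`, then open `convert.log` at the reported line numbers and read the lines just above each hit.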

pedz · 03-23-2010, 02:58 PM

Quote:

Originally Posted by Starson17

Take a look at the EPUB conversion output options. There's a split size parameter there that IIRC was in the 280 range. Your log indicates a split failure that's just a bit larger than that in the 300 range. I have no idea if this is even the same parameter, but try increasing it to 400 and see if that helps. It will only take a moment to test.

Ok. The input file is 4 MB, so I moved this up to 5 MB; it got through the processing and I'm able to view it in the viewer on my computer (Mac). I forgot my nook at home so I cannot check whether the nook can open it, but from the description of that parameter, I fear that it will not be able to.

Perhaps Kovid can comment on how best to set this option for huge html files and what exactly the "splitting" does.

DoctorOhh · 03-23-2010, 11:58 PM

Quote:

Originally Posted by pedz

Ok. The input file is 4Meg so I moved this up to 5Meg and it got through the processing and I'm able to view it in the viewer on my computer (Mac). I forgot my "nook" at home so I can not see if the nook can open it but from the comments of that parameter, I fear that it will not be able to.

This setting is reader specific. Wallcraft talks about Sony's limit for the PRS-505.

Quote:

Originally Posted by wallcraft

For most authors, Adobe DE's requirement (for the Sony and, eventually, other handheld devices) of < 300 KB per ePub XML file is easily met by using one file per chapter.

Inside each epub (which is just a zip file at heart) are the XML files that make up the book. If any one segment is larger than 300 KB, the PRS-505 will choke on the book. Handling the book in chunks makes it easier for low-powered devices to process.

Let us know if this works OK on your Nook.

pedz · 03-24-2010, 12:18 AM

I'm still having trouble. You are correct: if I set the value too big, the nook cannot open the book.

I've switched to the command line interface so I can repeat my tests. My script currently looks like this:

Code:

#!/bin/sh

# Convert out.html to an EPUB for the nook, intended to start a new
# file at each h2/h3. The XPath arguments are double-quoted so that the
# single quotes inside them reach ebook-convert intact.
/usr/bin/ebook-convert \
  out.html \
  test.epub \
  -v -v \
  --output-profile nook \
  --max-levels 0 \
  --flow-size 300 \
  --chapter "//*[name()='h2' or name()='h3']" \
  --chapter-mark pagebreak \
  --page-breaks-before "//*[name()='h2' or name()='h3']" \
  --level1-toc '//h:h2' \
  --level2-toc '//h:h3' \
  --level3-toc '//h:h4' \
  --language en \
  --authors "Ian Hickson, David Hyatt, et al." \
  --pubdate "$( date '+%b %d, %Y' )" \
  --publisher WhatWg.org

I run the above script and then I unzip the test.epub file into its own directory. The files range up to 343153 bytes. 300 * 1024 is only 307200.
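That size check can be scripted. An EPUB is a zip archive, so after unpacking it (for example with `unzip -o test.epub -d test_unzipped`), the sketch below lists every HTML/XHTML part bigger than a given number of KB, using 1 KB = 1024 bytes as in the arithmetic above. The function name is hypothetical, and it measures raw file size on disk, which may not be exactly the size calibre or a given reader uses when applying its limit.

```sh
# Hypothetical helper: list content files in an unpacked EPUB directory
# that exceed a size limit given in KB (1 KB = 1024 bytes).
# Output: "<size-in-bytes> <path>", one line per oversized file.
oversized_parts() {
  dir=$1
  limit=$(( $2 * 1024 ))
  find "$dir" -type f \( -name '*.html' -o -name '*.htm' -o -name '*.xhtml' \) |
  while IFS= read -r f; do
    # wc -c pads its output with spaces on some systems; strip them.
    size=$(wc -c < "$f" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
      printf '%s %s\n' "$size" "$f"
    fi
  done
}
```

Usage: `oversized_parts test_unzipped 300` prints nothing when every part is within 300 KB.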

The other problem is I am only getting a total of 40 files but if I grep the source for h2 and h3 tags, I hit about 126 of them. So, I'm not understanding how to force things into smaller pieces.
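If that count came from `grep -c`, note that it counts matching lines, not tags, so several headings on one line count only once. A sketch that counts the opening tags themselves is below; the helper name is hypothetical, and it relies on `grep -o`, which GNU and BSD grep both support but POSIX does not require.

```sh
# Hypothetical helper: count opening <h2> and <h3> tags in an HTML file,
# case-insensitively, even when several sit on the same line.
count_headings() {
  grep -o -i -E '<h[23]([[:space:]>]|$)' "$1" | wc -l | tr -d ' '
}
```

Usage: compare `count_headings out.html` with the number of HTML parts in the unpacked EPUB, e.g. `find test_unzipped -name '*.html' | wc -l`.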

If I remove the flow-size option, the converter dies with an error saying the "tree" is too big.

kovidgoyal · 03-24-2010, 12:30 AM

You need a flow size limit of 260. A "tree too big" error means one of your html files has a lot of unstructured text that calibre cannot find a decent point to split at.

DoctorOhh · 03-24-2010, 12:30 AM

I am at a loss. The only time I ran into this, I was fortunate to be able to use a different source file.

You might want to review this thread over in the Sigil forum. Sigil is an epub editor and it sounds like they are talking about the same subject.

Good Luck!

pedz · 03-24-2010, 02:20 PM

Quote:

Originally Posted by kovidgoyal

You need a flow size limit of 260. A "tree too big" error means one of your html files has a lot of unstructured text that calibre cannot find a decent point to split at.

Does the debug give me a clue as to where this point in the source is?

I'm trying to make each h2 or h3 start a new file but I can't seem to do that.
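One thing that may be getting in the way, in the script as first posted, is shell quoting. A single-quoted shell word cannot contain a single quote, so `'//*[name()='h2' or name()='h3']'` is really several quoted and unquoted pieces glued together, and ebook-convert receives `//*[name()=h2 or name()=h3]`. In XPath that compares each element's name with child elements called `h2`/`h3` instead of with the literal strings, so it is unlikely to match any heading. Wrapping the expression in double quotes keeps the inner single quotes. The function below is a hypothetical demo helper that simply echoes its argument, showing what a program actually receives:

```sh
# Hypothetical demo helper: print the single argument exactly as received.
show_arg() {
  printf '%s\n' "$1"
}

# Single quotes nested inside single quotes: the inner quotes are lost.
show_arg '//*[name()='h2' or name()='h3']'
# prints //*[name()=h2 or name()=h3]

# Double quotes outside, single quotes inside: passed through intact.
show_arg "//*[name()='h2' or name()='h3']"
# prints //*[name()='h2' or name()='h3']
```

The same double-quoting applies to both `--chapter` and `--page-breaks-before`.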

kovidgoyal · 03-24-2010, 11:28 PM

Look at the conversion log; it contains plenty of information about which file is currently being split.

pedz · 03-27-2010, 01:30 AM

So, I finally have all my parts less than 260K but the book still will not come up in the nook. When I hit the open button, it flashes a few times and then goes back to the list of books.

Any suggestions of how to debug this at this point?

Thank you
