New features in the editor - Page 4

theducks · 01-23-2015, 09:58 AM

Nice

phossler · 01-23-2015, 09:59 AM

Quote:

Barbecued Bear Paws

And it's SO hard to get fresh bear at the local supermarket these days

Tex2002ans · 01-23-2015, 10:56 AM

First of all, congrats on finally adding in the Reports functionality! I will have to mess around with it in the next few weeks. It is quite helpful on some of the extremely large projects I have been working on lately (Sigil chugs on these absolutely massive files).

Quote:

Originally Posted by kovidgoyal

I have no plans to add a links report. The Check Book tool already checks for broken links and allows you to jump to them, and the editor autocompletes href attributes.

I have come up with 4 Use Cases off of the top of my head on why the Sigil Links Report is extremely helpful (and why it should probably be done in Calibre's Reports as well).

Use Case #1:

The Links Report is extremely helpful when you are cleaning up HTML files. I use it all the time when I pull a series of HTML articles off of a website to convert into an EPUB.

Say I want to strip every link in the book, or remove all of the amazon.com links but keep the ones pointing to cde.com and xyz.com: I can sort the report, spot those links easily, and remove them.
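A minimal sketch of the "sort links by site" idea, using only the Python standard library. This is not how Sigil or calibre build their reports; `LinkCollector` and `links_by_host` are hypothetical names made up for the illustration, and links without a host are simply filed under "(internal)".

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def links_by_host(markup):
    """Group every link in one HTML string by the host it points to."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    grouped = defaultdict(list)
    for href, text in parser.links:
        host = urlparse(href).netloc or "(internal)"
        grouped[host].append((href, text))
    return dict(grouped)
```

With the links grouped like this, keeping cde.com and xyz.com while dropping amazon.com comes down to choosing which keys of the returned dictionary to act on.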

Use Case #2:

Also, if you are working on newer books that were exported from OCR (Finereader), the OCR does its best to digitize the links from the original PDF, but it sometimes gets them wrong when a URL was broken across lines. On the visual surface the link looks perfectly fine, but the link itself is broken.

For example, a link might look like this:

Code:

<a href="http://www.sample.com/">http://www.sample.com/</a><a href="sample/sample.html">sample/sample.html</a>

You would be able to easily spot this error in the Links Report. (Ok ok, I know, I know, horrible sample I came up with!)

Here are four real-life "OCR errors" I caught with the Links Report:

Code:

<p>———. 1936. Liquidity. Minnesota Bankers Assoc. Available at: <a href="http://www">http://www</a>.</p>

  <p>24hgold.com/viewcompanyarticle.aspx?langue = en&amp;articleId = 217737</p>

<p>Nobelprize.org. 2008. John Nash interview, September, 2004. Retrieved January 15, 2008 from <a href="http://nobelprize.org/mediaplayer/index.php?id">http://nobelprize.org/mediaplayer/index.php?id</a> = 429</p>

<p>Montaigne, Michel de. “The Profit of One Man is the Damage of Another.” <span style="font-style:italic;">Essays.</span> Chapter XXI. <a href="http://www.uoregon.edu/%7Erbear/montaigne/">http://www.uoregon.edu/%7Erbear/montaigne/</a>&nbsp;1xxi.htm</p>

<p>Development.” Free-Market News Network, February 14 and 15, at <a href="http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid">http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid</a>=241&amp;&nbsp;nid=6939 and http:// <a href="http://www.freemarketnews.com/">www.freemarketnews.com/</a> Analysis/241/6949/notes.asp?&nbsp;wid=241&amp;nid= 6949.</p>
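The patterns in those four samples can be turned into a rough automatic check. The sketch below is only a heuristic under stated assumptions: it uses a regular expression instead of a real HTML parser, it expects double-quoted `href` attributes, and the function name `suspicious_links` and the four rules are invented for this illustration. It will miss some broken links and flag some good ones.

```python
import html
import re
from urllib.parse import urlparse

# Crude pattern: an <a href="..."> element plus the text that follows it,
# up to the next tag. Good enough for a sketch, not a full HTML parser.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>.*?</a>([^<]*)', re.S)
# Text right after a link that looks like the rest of a path, e.g. "1xxi.htm".
FILE_TAIL_RE = re.compile(r"\s*\S+\.(?:html?|php|aspx?)\b", re.I)


def suspicious_links(markup):
    """Return (href, reasons) for links that look cut short."""
    found = []
    for match in LINK_RE.finditer(markup):
        href = html.unescape(match.group(1))
        tail = html.unescape(match.group(2))
        parts = urlparse(href)
        reasons = []
        if parts.scheme in ("http", "https") and "." not in parts.netloc:
            reasons.append("host has no dot")
        if parts.query and "=" not in parts.query:
            reasons.append("query key without a value")
        if re.match(r"\s*[=&?]", tail):
            reasons.append("URL punctuation right after the link")
        if FILE_TAIL_RE.match(tail):
            reasons.append("file name right after the link")
        if reasons:
            found.append((href, reasons))
    return found
```

Run over the samples above, the "http://www" link trips the first rule, the nobelprize.org link trips the second and third, and the montaigne link trips the last one because "1xxi.htm" follows the closing tag.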

Use Case #3:

It is also extremely helpful for catching inconsistencies in what text is actually wrapped in the <a> tags. For example, I digitized an entire journal. At the bottom, it might say something like:

Code:

<p>Please contact the <a href="http://samplesite.com">Sample Site</a>.</p>

and in another section of the book, it might say:

Code:

<p>Please contact the <a href="http://samplesite.com">Sample Sit</a>e.</p>

and:

Code:

<p>Please contact the <a href="http://samplesite.com">sample site</a>.</p>

If you sort the Links Report, you can also easily spot that something odd happened, because you would see "Sample Site" and "Sample Sit" and "sample site".

These are typically very hard to catch with just your naked eye, or even a quick perusal over the code, unless you knew EXACTLY what you were looking for (and even then, easy to miss).
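The same check can be sketched in a few lines: group the anchor texts by destination and list every destination that appears with more than one spelling. This is a hedged illustration, not the report's actual code; `inconsistent_link_text` is a hypothetical name, and the regular expression assumes double-quoted `href` attributes and no nested links.

```python
import html
import re
from collections import defaultdict

LINK_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.S)
TAG_RE = re.compile(r"<[^>]+>")


def inconsistent_link_text(markup):
    """Map each href to its anchor texts when more than one variant exists."""
    texts = defaultdict(set)
    for href, inner in LINK_RE.findall(markup):
        # Drop inline tags such as <i> and normalise whitespace.
        text = html.unescape(TAG_RE.sub("", inner))
        text = " ".join(text.split())
        texts[html.unescape(href)].add(text)
    return {href: sorted(seen) for href, seen in texts.items() if len(seen) > 1}
```

Fed the three paragraphs above, it reports http://samplesite.com with the variants "Sample Sit", "Sample Site" and "sample site", which is the same oddity the sorted Links Report makes visible.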

Use Case #4:

It is VERY helpful in catching absolutely useless links. For example, Finereader exports a lot of phantom "bookmark##" links:

[Image: LinksReport.png]

When you are cleaning out all the cruft, the Links Report makes it very easy (as you can see, Finereader also exports "footnote##" links). This is helpful when you want to get rid of as much useless code as possible, and to spot if you actually did remove it all.

Finereader 12 even introduced this cursed "caption#" class... which in all cases I have seen, is 100% worthless. Most of the time I forget to even look for it, and I just accidentally stumble on it when I am looking at the Links Report.
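For the phantom anchors, one possible check is to collect every anchor that is defined and every fragment that some link points to, then list the leftovers. The sketch below assumes the phantom anchors show up as `id` or `name` attributes with values like "bookmark12" or "footnote3"; the exact markup Finereader writes is not shown in this thread, so treat that pattern, and the names `AnchorScan` and `unused_anchors`, as assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorScan(HTMLParser):
    """Record anchors that are defined and fragments that links point to."""

    def __init__(self):
        super().__init__()
        self.defined = set()
        self.referenced = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.defined.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.defined.add(attrs["name"])
        if tag == "a" and attrs.get("href"):
            fragment = urlparse(attrs["href"]).fragment
            if fragment:
                self.referenced.add(fragment)


def unused_anchors(html_docs, pattern=r"(?:bookmark|footnote)\d+"):
    """Return anchors matching `pattern` that no link in any document targets."""
    scan = AnchorScan()
    for doc in html_docs:
        scan.feed(doc)
    scan.close()
    wanted = re.compile(pattern)
    leftovers = scan.defined - scan.referenced
    return sorted(name for name in leftovers if wanted.fullmatch(name))
```

Fragments are matched by name only, without checking which file they belong to, so an anchor counts as used if any file in the book links to a fragment with that name.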

kovidgoyal · 01-23-2015, 11:09 PM

Sigh, it never ends...

Here you go:

https://github.com/kovidgoyal/calibr...27759f74e8d188

It even has a live preview of the link destination, and you can double click to jump to either the link definition or its destination in the editor.

roger64 · 01-24-2015, 12:39 AM

Hi

I was unable to take part over the last few weeks because of severe bandwidth (...) constraints in China, which lasted a little more than a month. Your new report feature is absolutely brilliant and I must congratulate you for implementing it so neatly.

I hurried to check one EPUB. It worked beautifully! What astounded me most was the report on the used characters (112 as a grand total).

As it happens, this EPUB had four subsetted otf fonts with a total of 525k, including one regular (164k) and one italic (153k). Two of them (bold and bold-italic) were hardly used at all (only for some titles) but took up 108k and 100k respectively.

I systematically subset fonts with the Editor, but now I can't help thinking that the next logical step would be the ability to downsize each font to only the characters actually used...

I am sorry to make you suffer...

kovidgoyal · 01-24-2015, 05:15 AM

Subsetting a font does reduce it to only used characters

roger64 · 01-24-2015, 06:05 AM

Thanks for your reply, which surprised me. So otf fonts seem to be heavier than ttf ones. I use otf files from here:
http://sourceforge.net/projects/linu...bertine/5.3.0/

I thought that the downsizing was possibly not complete, because 525k for 110 characters seemed a little bloated.

Two years ago I prepared a regular ttf web-font of Linux Libertine using the font-squirrel online service. This web-font was only 60k and usually had enough characters for a French novel. It is attached here. There was a downside though: any time I needed a Spanish or other foreign accent, I had to make a new web-font, which was a tedious process.

So, as soon as you began subsetting fonts with the Editor, I stopped using it, also because otf fonts are nicer (ligatures). I will follow on...

kovidgoyal · 01-24-2015, 08:29 AM

Font files can have arbitrary amounts/types of data associated with every character. Font subsetting removes only the most common/standardised types of data. Some fonts can have extra, font foundry specific data tables, which font subsetting leaves alone since it knows nothing about them.

Divingduck · 01-24-2015, 01:32 PM

Quote:

Originally Posted by kovidgoyal

@Divingduck:As for reporting what fonts are used for what characters, it is possible, but it would be fairly slow, and would only work for embedded fonts, since otherwise the font used is system dependent. Basically, it would use the code from the font subsetting tool.

Maybe it is possible to use a switch for this, so that the main functionality works without font detection and the user can enable font detection when needed. I guess this feature isn't relevant for everyone, and it makes no sense to slow down the general report for all.
My main use case is to figure out which characters are covered by the embedded font(s), which characters are missing and will be supplied by system fonts, and, if possible, what general font information is available for a character if there is no embedded font inside an eBook (because a font was deleted, was never embedded, or whatever).

The idea behind this is to have a tool that makes it possible to run a valid check for possible font-related problems with devices. I know this way isn't perfect, but it gives a bit more control.
My ultimate wish is to have a tool where I can make a selection of fonts (e.g. the fonts installed in a reader) and compare this with the fonts and used characters in an eBook. That is something I haven't seen in any other program so far.

About my checks I can say, I am very happy.

The Report shows all kinds of files in an eBook, and the character analysis shows all characters involved, including all control characters. In the picture report I miss information on the picture type (bw, grayscale, color).

One wish comes up: the possibility to mark / copy selected elements of information columns or line entries to the clipboard (via context menu).

About the Links Report of Sigil: I do not use it often, but if I have link problems and can't find them quickly, then I take a first look with Sigil too. For simple book structures I can do those things by hand, but in cases with complex structures this report helps me get a better overview. This is mostly the case when I work with complex web documents with a lot of crosslinks between files.

kovidgoyal · 01-24-2015, 06:40 PM

Quote:

Originally Posted by Divingduck

The Report shows all kinds of files in an eBook, and the character analysis shows all characters involved, including all control characters. In the picture report I miss information on the picture type (bw, grayscale, color).

Surely if the picture is grayscale the thumbnail will be gray, so it should be easy to spot gray scale images.

Quote:

One wish comes up: the possibility to mark / copy selected elements of information columns or line entries to the clipboard (via context menu).

Use the Save button, which exports all data to a csv file, from where you can extract whatever you want using any spreadsheet program.

eschwartz · 01-24-2015, 09:35 PM

Quote:

Originally Posted by kovidgoyal

Sigh, it never ends...

Here you go:

https://github.com/kovidgoyal/calibr...27759f74e8d188

It even has a live preview of the link destination, and you can double click to jump to either the link definition or its destination in the editor.

Awesome, thanks!

Screenshot if anyone wants to know what it looks like.

icallaci · 01-26-2015, 12:07 PM

Do you (would you, could you) have plans to add a report that shows orphan classes used in html files that are NOT in the CSS?

eschwartz · 01-26-2015, 12:16 PM

Orphaned CSS can already be removed with a longstanding tool.

icallaci · 01-26-2015, 12:35 PM

Quote:

Originally Posted by eschwartz

Orphaned CSS can already be removed with a longstanding tool.

Which tool? I don't want them removed. I want to know which classes are orphaned in the HTML so that I can add them to the CSS.

theducks · 01-26-2015, 12:45 PM

Quote:

Originally Posted by icallaci

Which tool? I don't want it removed. I want to know which classes are orphaned in the HTML so that I can add them to the CSS.

Sigil's report "Style classes in HTML files": there is no entry in the 'Found in' column for that case.
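For anyone who wants the same answer outside Sigil, here is a rough sketch that lists classes used in the HTML but never mentioned in a stylesheet selector. It is an illustration only: `ClassCollector`, `css_classes` and `orphan_classes` are hypothetical names, the CSS handling is a simple regular expression rather than a real CSS parser, and something like a file name in an @import line could be mistaken for a class.

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name used in class="..." attributes."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def css_classes(css):
    """Return class names that appear in selectors (regex-based, crude)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # strip comments
    selectors = re.sub(r"\{[^{}]*\}", " ", css)       # strip declaration blocks
    return set(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", selectors))


def orphan_classes(html_docs, stylesheets):
    """Classes used in the HTML documents but defined in no stylesheet."""
    used = set()
    for doc in html_docs:
        collector = ClassCollector()
        collector.feed(doc)
        collector.close()
        used |= collector.classes
    defined = set()
    for css in stylesheets:
        defined |= css_classes(css)
    return sorted(used - defined)
```

Swapping the final subtraction to `defined - used` gives the opposite list, the unused CSS rules that the existing removal tool deals with.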


