01-23-2015, 09:58 AM | #46 |
Well trained by Cats
Posts: 30,378
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Nice
|
01-23-2015, 09:59 AM | #47 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Quote:
|
|
|
01-23-2015, 10:56 AM | #48 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
First of all, congrats on finally adding in the Reports functionality! I will have to mess around with it in the next few weeks. It is quite helpful on some of the extremely large projects I have been working on lately (Sigil chugs on these absolutely massive files).
Quote:
Use Case #1: The Links Report is extremely helpful when you are cleaning up HTML files. I use it all the time when I pull a series of HTML articles off of a website to convert into an EPUB. Let us say I wanted to strip all of the links in the book, or remove all of the amazon.com links but keep the ones pointing to cde.com + xyz.com: I can easily sort + spot those and remove them.
Use Case #2: Also, if you are working on newer books that were exported from OCR (Finereader), it tries to do its best to digitize the links from the original PDF (and sometimes gets it wrong if a link was broken across lines). So on the visual surface, the link looks perfectly fine, but the link itself is broken. For example, a link might look like this:
Code:
<a href="http://www.sample.com/">http://www.sample.com/</a><a href="sample/sample.html">sample/sample.html</a>
Here are four real-life "OCR errors" I caught with the Links Report:
Code:
<p>———. 1936. Liquidity. Minnesota Bankers Assoc. Available at: <a href="http://www">http://www</a>.</p>
<p>24hgold.com/viewcompanyarticle.aspx?langue = en&articleId = 217737</p>
<p>Nobelprize.org. 2008. John Nash interview, September, 2004. Retrieved January 15, 2008 from <a href="http://nobelprize.org/mediaplayer/index.php?id">http://nobelprize.org/mediaplayer/index.php?id</a> = 429</p>
<p>Montaigne, Michel de. “The Profit of One Man is the Damage of Another.” <span style="font-style:italic;">Essays.</span> Chapter XXI. <a href="http://www.uoregon.edu/%7Erbear/montaigne/">http://www.uoregon.edu/%7Erbear/montaigne/</a> 1xxi.htm</p>
<p>Development.” Free-Market News Network, February 14 and 15, at <a href="http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid">http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid</a>=241& nid=6939 and http:// <a href="http://www.freemarketnews.com/">www.freemarketnews.com/</a> Analysis/241/6949/notes.asp? wid=241&nid= 6949.</p>
It is also extremely helpful for catching inconsistencies in what text is actually wrapped up in the <a> tags. For example, I digitized an entire journal, and at the bottom it might say something like:
Code:
<p>Please contact the <a href="http://samplesite.com">Sample Site</a>.</p>
Code:
<p>Please contact the <a href="http://samplesite.com">Sample Sit</a>e.</p>
Code:
<p>Please contact the <a href="http://samplesite.com">sample site</a>.</p>
These are typically very hard to catch with just your naked eye, or even with a quick perusal of the code, unless you know EXACTLY what you are looking for (and even then, they are easy to miss).
Use Case #4: It is VERY helpful in catching absolutely useless links. For example, Finereader exports a lot of phantom "bookmark##" links. When you are cleaning out all the cruft, the Links Report makes it very easy (Finereader also exports "footnote##" links). This is helpful when you want to get rid of as much useless code as possible, and to spot whether you actually did remove it all. Finereader 12 even introduced this cursed "caption#" class... which in all cases I have seen, is 100% worthless. Most of the time I forget to even look for it, and I just accidentally stumble on it when I am looking at the Links Report.
Last edited by Tex2002ans; 01-23-2015 at 11:40 AM. |
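The checks described in this post can be approximated outside any editor. What follows is only a minimal sketch using the Python standard library, not how Sigil or calibre build their reports. The names `LinkCollector`, `links_report`, `text_variants` and `looks_truncated` are hypothetical, and the "truncated" test is a rough heuristic that would also flag legitimate hosts without a dot (such as `localhost`).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_report(html):
    """Return the (href, text) pairs found in an HTML string, sorted by href."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.links)


def text_variants(links):
    """Map each href to its link texts, keeping only hrefs with several texts."""
    by_href = {}
    for href, text in links:
        by_href.setdefault(href, set()).add(text)
    return {href: texts for href, texts in by_href.items() if len(texts) > 1}


def looks_truncated(href):
    """Heuristic: the host has no dot, or the query has a key but no '=value'."""
    parts = urlsplit(href)
    if parts.scheme in ("http", "https") and "." not in parts.netloc:
        return True
    if parts.query and "=" not in parts.query:
        return True
    return False
```

Sorting the pairs by href puts links to the same site next to each other, which is what makes the amazon.com style clean-up easy to do by eye. `text_variants` surfaces the "Sample Site" / "Sample Sit" kind of mismatch, and `looks_truncated` catches hrefs such as `http://www` or `...index.php?id`.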
|
01-23-2015, 11:09 PM | #49 |
creator of calibre
Posts: 44,356
Karma: 23708270
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Sigh, it never ends...
Here you go: https://github.com/kovidgoyal/calibr...27759f74e8d188
It even has a live preview of the link destination, and you can double click to jump to either the link definition or its destination in the editor. |
01-24-2015, 12:39 AM | #50 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
I was unable to take part over the last few weeks due to severe bandwidth (...) constraints in China, which lasted a little more than a month. Your new report feature is absolutely brilliant and I must congratulate you for implementing it so neatly. I hurried to check one EPUB. It worked beautifully! What astounded me most was the report on the characters used (112 in total). As it happens, this EPUB had four subsetted otf fonts totalling 525k, including one regular (164k) and one italic (153k). Two of them (bold and bold-italic) were hardly used at all (only for some titles) but took up 108k and 100k respectively. I systematically subset fonts with the Editor, but now I can't help thinking that the next logical step would be to downsize each font to the characters actually used... I am sorry to make you suffer... Last edited by roger64; 01-24-2015 at 12:43 AM. Reason: otf |
|
01-24-2015, 05:15 AM | #51 |
creator of calibre
Posts: 44,356
Karma: 23708270
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Subsetting a font does reduce it to only used characters
|
01-24-2015, 06:05 AM | #52 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your reply, which surprised me. So otf fonts seem to be heavier than ttf ones. I use otf files from here:
http://sourceforge.net/projects/linu...bertine/5.3.0/ I thought that the downsizing was possibly not complete because 525k for 110 characters seemed a little bloated. Two years ago I prepared a regular ttf web-font of Linux Libertine using the Font Squirrel online service; this web-font was only 60k and usually had enough characters for a French novel. It is attached here. There was a downside though: any time I needed a Spanish or other foreign accent, I had to make a new web-font, and this was a tedious process. So, as soon as you began subsetting fonts with the Editor, I stopped using it, also because otf fonts are nicer (ligatures). I will follow on... |
01-24-2015, 08:29 AM | #53 |
creator of calibre
Posts: 44,356
Karma: 23708270
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Font files can have arbitrary amounts/types of data associated with every character. Font subsetting removes only the most common/standardised types of data. Some fonts can have extra, font foundry specific data tables, which font subsetting leaves alone since it knows nothing about them.
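To see where the bytes of a subsetted font actually go, one can list the tables in the file by size. This is a minimal sketch that reads only the table directory of a plain sfnt file (.ttf or .otf), using the layout from the OpenType specification. It does not handle font collections or WOFF, and the name `list_font_tables` is hypothetical rather than part of calibre.

```python
import struct


def list_font_tables(data):
    """Return (tag, length) pairs from an sfnt table directory, largest first.

    `data` is the raw content of a .ttf or .otf file as bytes.
    """
    # Offset table: sfntVersion (4 bytes) followed by numTables,
    # searchRange, entrySelector and rangeShift (unsigned 16-bit each).
    _version, num_tables = struct.unpack(">4sH", data[:6])
    tables = []
    for i in range(num_tables):
        start = 12 + 16 * i
        # Table record: tag, checksum, offset, length (16 bytes in total).
        tag, _checksum, _offset, length = struct.unpack(
            ">4sIII", data[start:start + 16]
        )
        tables.append((tag.decode("latin-1"), length))
    return sorted(tables, key=lambda item: item[1], reverse=True)
```

Calling it on the bytes of one of the embedded fonts (for example `list_font_tables(open(path, "rb").read())`, where `path` is whatever font file you extracted from the book) shows whether the weight sits in glyph data or in other tables.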
|
01-24-2015, 01:32 PM | #54 | |
Wizard
Posts: 1,165
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Quote:
My main use case is to figure out which characters are covered by the embedded font(s), which are missing and will be supplied by system fonts, and, if possible, what general font information is available for a character when there is no embedded font in an eBook (because a font was deleted, not embedded by mistake, or whatever). The idea behind this is to have a tool that makes it possible to check properly for font-related problems on devices. I know this approach isn't perfect, but it gives a bit more control. My ultimate wish is a tool where I can select a set of fonts (e.g. the fonts installed in a reader) and compare it with the fonts and characters used in an eBook. That is something I have not seen in any other program so far.
About my checks, I can say I am very happy. The Report shows all kinds of files in an eBook, and the character analysis shows all the characters involved, including all control characters. In the picture report I miss information on the picture type (bw, grayscale, color). One wish comes up: the ability to mark and copy selected elements of the information columns or line entries to the clipboard (via context menu).
About the Links Report in Sigil: I do not use it often, but if I have link problems and can't find them quickly, I take a first look with Sigil too. For simple book structures I can do those things by hand, but for complex structures this report helps me get a better overview. This is mostly the case when I work with complex web documents with a lot of crosslinks between files. |
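The comparison wished for here (characters used in a book versus what a chosen set of fonts covers) comes down to a set difference once the coverage is known. In this rough sketch the coverage is simply passed in as a set of code points, on the assumption that it has already been obtained from the font by some other means. `missing_characters` and `describe` are hypothetical names, and skipping control and separator characters is a choice made for this example, not a rule.

```python
import unicodedata


def missing_characters(text, covered):
    """Return the characters of `text` whose code points are not in `covered`.

    Control/format characters (category C*) and separators (Z*) are skipped,
    since they normally need no glyph.
    """
    missing = {
        ch for ch in text
        if ord(ch) not in covered
        and not unicodedata.category(ch).startswith(("C", "Z"))
    }
    return sorted(missing)


def describe(ch):
    """Format a character as 'U+XXXX NAME' for a report line."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
```

Running `missing_characters` once per candidate font set (embedded fonts, then the fonts installed on a given reader) would give the per-device list of characters that fall back to something else.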
|
01-24-2015, 06:40 PM | #55 | ||
creator of calibre
Posts: 44,356
Karma: 23708270
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
Quote:
|
||
01-24-2015, 09:35 PM | #56 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Screenshot if anyone wants to know what it looks like. |
|
01-26-2015, 12:07 PM | #57 |
Guru
Posts: 790
Karma: 6528026
Join Date: Sep 2012
Device: Kobo Elipsa
|
Do you (would you, could you) have plans to add a report that shows orphan classes used in html files that are NOT in the CSS?
|
01-26-2015, 12:16 PM | #58 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Orphaned CSS can already be removed with a longstanding tool.
|
01-26-2015, 12:35 PM | #59 |
Guru
Posts: 790
Karma: 6528026
Join Date: Sep 2012
Device: Kobo Elipsa
|
Which tool? I don't want them removed. I want to know which classes are orphaned in the HTML so that I can add them to the CSS.
Last edited by icallaci; 01-27-2015 at 10:01 AM. |
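Until such a report exists, the check asked for here (classes used in the HTML that have no rule in the CSS) can be sketched with the Python standard library. This is only an approximation: the CSS side uses a regular expression rather than a real CSS parser, so it can be fooled by things like `@import url(foo.css)`. All function and class names below are hypothetical.

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name used in class="..." attributes."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def css_classes(css):
    """Return class names that appear in selectors of a stylesheet string."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop comments
    css = re.sub(r"\{[^{}]*\}", "", css)              # drop declaration blocks
    return set(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", css))


def classes_missing_from_css(html_docs, css_docs):
    """Return the sorted class names used in the HTML but absent from the CSS."""
    used = set()
    for html in html_docs:
        collector = ClassCollector()
        collector.feed(html)
        collector.close()
        used |= collector.classes
    defined = set()
    for css in css_docs:
        defined |= css_classes(css)
    return sorted(used - defined)
```

Feeding it the text of every HTML file and every stylesheet in the book gives the list of classes that still need a rule, without removing anything.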
01-26-2015, 12:45 PM | #60 |
Well trained by Cats
Posts: 30,378
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
|