12-11-2021, 11:54 PM | #1 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Indexing in SIGIL - How to add "See also..."
Hi folks. First post. I'm using SIGIL to create an ebook for a non-fiction book that I've already released in paperback and hardcover. It has a fairly extensive index with a lot of sub-topics and cross referencing.
I've managed to get everything to work, except that I'm not sure how to approach adding "See also (topic here)" after an index topic. The index editor is too broad-sweeping, so I've gone with marking the keywords in the code. That way I know each marker is going to stay in place. Q. What is the best way to add "See also" lines to the index? Also, each time I update the index, are those new lines going to get overwritten again? (I don't want to create a new file; I want everything written into one file and to stay there.) Any advice? Thanks. P.S. I'm also a web designer, so the ebook xhtml stuff is a breeze. But the logic behind using SIGIL for this one index task remains a mystery. |
12-12-2021, 03:22 PM | #2 | ||||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Hey. Welcome to MobileRead!
Quote:
2. Is this a plain Index? Or is it a fully linked Index?
(If "parrots" moved from page 100->120, will you have to manually update everything? Or will the source file update itself?) Quote:
- - - 1. If you don't mind a plain index: Then insert it into the ebook as is. (This is what I do.) 2. If you insist on fully linking your index: Go back to the source files and generate Real Page Numbers (RPNs). (While you're at it, you can generate the EPUB PageList!) 3. Don't insert the index file into the ebook at all. (Many publishers decide to do this. I don't agree. A dumb/plain index is better than no index at all... even if "an ebook has search"... Indexes serve completely different purposes.) Quote:
Work from what you already have:
If you still want to go through with this...
- - -
Your best bet is probably:
Step 1. Recreate Real Page Numbers (RPNs). (Marking the HTML with an <a id="page123"></a> where the pagebreaks occur.)
PDF: Code:
This is an example that was split
-----------------
between two pages.
HTML: Code:
<p>This is an example that was split<a id="page123"></a> between two pages.</p>
PDF: Spoiler:
HTML (Plain Index): Spoiler:
HTML (Linked Index): Spoiler:
Step 3. Convert your HTML <a href="page123"> links into:
You can then use Doitsu's "PageList" plugin for Sigil to generate the required RPN files for your EPUB. - - - - - - For more detailed information, see some of the recent topics on this:
If you need even more information, you may also want to look this up in your favorite search engines: Code:
RPNs Tex2002ans site:mobileread.com
RPNs Hitch site:mobileread.com
Indexes EPUB Tex2002ans site:mobileread.com
Indexes EPUB Hitch site:mobileread.com
We even covered how useful linked Indexes in an ebook even are... if the relevant text could be multiple "screens" away. For example, the famous: Quote:
It was designed to (dumbly) link to all references of a word/term. It wasn't designed to generate complicated Indexes + Index formatting like:
- - - Side Note: I'd argue that this "concordance" (a list of all usages of a word) is even worse than no Index at all! Side Note #2: I'd even argue this form: Code:
example [1], [2], [3]
word [1], [2], [3]
Last edited by Tex2002ans; 12-12-2021 at 04:34 PM. |
||||
12-12-2021, 03:46 PM | #3 |
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Yes, the Index Editor was just meant to do the grunt work. Once all words are marked and the index.xhtml has been created once, you will need to edit the resulting index.xhtml and NOT regenerate the index, or you will lose all changes. You will then need to use regular expression search and replace to add the "See also" text and, if desired, links to it *inside* the index.
If the number of these to handle is too large, consider building the list of search-and-replace regular expressions in Excel, exporting it as CSV, importing that CSV into Sigil's Saved Searches, and then running that Saved Search list (once). To see the required columns for Saved Searches, select a current set of saved searches, export them to CSV, and inspect the file first. I have taken a text list of paired words, used perl/python to create the search entries I wanted, imported the result into Excel for cleanup, loaded it into Sigil's Saved Searches, and then ran the searches on the current file (index.xhtml). Don't forget to make a Checkpoint of your epub just before running your imported saved search, in case you make mistakes and need to revert to the Checkpoint and retry. Last edited by KevinH; 12-12-2021 at 04:00 PM. |
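A minimal sketch of the "paired word list -> saved-search CSV" idea described above, in Python. Several things here are assumptions and not taken from Sigil: the `SEE_ALSO` pairs are invented, the column names in `COLUMNS` are placeholders (export an existing Saved Search to CSV, as suggested above, and copy its real header), and the regex assumes each index entry is a single-line `<p>...</p>` that starts with the term.

```python
import csv
import io
import re

# Hypothetical cross-references: index term -> term to send readers to.
SEE_ALSO = {"Mammals": "Animals", "dogs": "pets"}

# ASSUMPTION: placeholder column names, NOT Sigil's real header.
# Export an existing Saved Search to CSV and copy the header from there.
COLUMNS = ["Name", "Find", "Replace"]


def build_rows(pairs):
    """Build one find/replace regex pair per cross-reference."""
    rows = []
    for term, target in pairs.items():
        # ASSUMPTION: each entry is a single-line <p>...</p> starting with the term.
        find = r"(<p[^>]*>" + re.escape(term) + r"\b.*?)</p>"
        # \1 re-inserts the captured entry, then the cross-reference is appended.
        replace = r"\1. <i>See also</i> " + target + ".</p>"
        rows.append(["See also: " + term, find, replace])
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text, header first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


rows = build_rows(SEE_ALSO)
csv_text = to_csv(rows)

# Dry run on one sample entry before touching the real index.xhtml.
sample = '<p class="index">Mammals, <a href="../Text/Chapter01.xhtml#page5">5</a></p>'
_, find, replace = rows[0]
preview = re.sub(find, replace, sample)
print(preview)
```

Trying each pair on a sample line first is cheap insurance: if the preview looks wrong, fix the pattern before importing anything, and still make the Checkpoint before running the real thing.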
12-12-2021, 08:00 PM | #4 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Thanks, Tex and Kevin.
Crap: I just wrote a decent, lengthy post with a lot of info, clicked "go advanced" and lost it all after I had to log in again. DOH!
Re: index. I assumed I would have to add the "See also" after finishing the index, because it made sense that it would just get erased with each overwrite. I can live with that. I'm taking the word-for-word link coding approach. Very tedious, but if I get through this process once, I won't have to do it again.
I used Affinity Publisher for the print book layout. Excellent and cheap, BUT it does not export to anything but PDF for now. I have Scrivener, but I tried SIGIL because I could see that Scrivener was going to be a struggle, since it doesn't offer me a WYSIWYG experience on the fly. I also can't code my way past "Hello World", so I don't really see myself doing anything beyond my current skill set: a few GREP searches whose successful outcome is often more the result of dumb luck than skill.
I'm going to reread your posts. You've given me much more to think about, including the usefulness of indexes in a search-friendly environment. But then again, mine is a social/human psychology book, so it merits a little bit more work to flesh out. It has a glossary, too, so I can't really scrimp on the index too much. Also, I like SIGIL because it allows me to bounce back and forth between the programs I'm used to using for web coding. I do know my html and css, so at least I'm lucky there. Thanks again. |
12-12-2021, 08:06 PM | #5 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
P.S. I haven't really used "checkpoints" but rather I just keep making backup copies as my work progresses. Takes up more real estate, but it's a workflow I'm used to.
|
12-13-2021, 01:42 AM | #6 | ||||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Won't lose anything that way. Quote:
Tip: Save files by proper date:
that will make sure your files sort in order alphabetically+chronologically. Checkpoints also are good if you're doing something major—like generating an Index. Makes it easy to jump back to the previous version just in case you messed up. Quote:
What if you copy/paste the text out of Affinity + into another program (like LibreOffice): Does it carry over the rich text formatting? (Italics, bold, hopefully indentation, etc.) If it does, then that'll be a little easier. Then you won't have to manually add the italics back in. Use that intermediate program to export your text to HTML. Quote:
Just work directly from the Print version you have + use regex (regular expressions) to linkify the page numbers. I explain exact methods in those linked threads, but I'll reword it slightly differently here: * * * * * * How to Make Your Index Links 0. Make sure you insert all your page number code at every page break! Code:
<p>This is an example<a id="page123"></a> of a split paragraph.</p> I rename this super merged book file to:
(Make sure you keep the index.xhtml file separate.) 2. Open your Index file. 3. Ctrl+F to open Sigil's Search/Replace. In the dropdowns, make sure these are selected:
+ uncheck the box for "Wrap".
4. Use regex to convert those dumb page numbers into links:
Search: (\d+)
Replace: <a href="../Text/merged.xhtml#page\1">\1</a>
- - - - - -
Side Note: If you don't know regular expressions... this is what all the parts are doing:
Search:
Replace:
In Plain English, this regular expression is saying: "Hey, look for any number, grab it, then point to that page number in the merged file."
- - - - - -
5. Click somewhere right before the very first entry, then press "Replace All". This will go from my "Plain Index" -> "Linked Index" above:
Before: Code:
cats, 5
dogs, 123
parrots, 200
After: Code:
cats, <a href="../Text/merged.xhtml#page5">5</a>
dogs, <a href="../Text/merged.xhtml#page123">123</a>
parrots, <a href="../Text/merged.xhtml#page200">200</a>
Now, you can split your chapter files again:
6.1. You can manually click before each of your chapters, then Insert > Split Marker (Ctrl+Shift+Return). That button inserts this code: Code:
<hr class="sigil_split_marker" />
If you have proper Headings (or some common text that stands out):
Search: <h2>Chapter
Replace: <hr class="sigil_split_marker" /><h2>Chapter
then "Replace All".
Before: Code:
<h2>Chapter 1</h2>
[...]
<h2>Chapter 2</h2>
After: Code:
<hr class="sigil_split_marker" /><h2>Chapter 1</h2>
[...]
<hr class="sigil_split_marker" /><h2>Chapter 2</h2>
Now all your chapter files will be resplit into separate HTML files again.
8. Rename all your HTML files to human-readable names:
- Chapter01.xhtml
- Chapter02.xhtml
- [...]
- Chapter99.xhtml
* * * * * *
Now, when you go back to your Index, you should see all your page number code updated: Code:
cats, <a href="../Text/Chapter01.xhtml#page5">5</a>
dogs, <a href="../Text/Chapter15.xhtml#page123">123</a>
parrots, <a href="../Text/Chapter28.xhtml#page200">200</a>
If your "Target exists?" column has any "no", then you know you have a broken link.
Side Note: For more explanation on what the columns mean, see my post from a few months ago:
Last edited by Tex2002ans; 12-13-2021 at 01:47 AM. |
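For anyone who wants to rehearse the Search/Replace from step 4 outside Sigil first, here is a minimal Python sketch of the same `(\d+)` substitution, plus a rough check for index links that have no matching page anchor. The sample index lines come from the post above; the `body` string and the helper names `linkify` and `missing_targets` are made up for illustration, and the check is only similar in spirit to a "Target exists?" column, not a reimplementation of anything in Sigil.

```python
import re

# Plain index lines, as in the "Before" example above.
index_lines = ["cats, 5", "dogs, 123", "parrots, 200"]


def linkify(line, target="../Text/merged.xhtml"):
    """Wrap every run of digits in a link to the matching page anchor."""
    def to_link(match):
        page = match.group(1)
        return '<a href="{0}#page{1}">{1}</a>'.format(target, page)
    return re.sub(r"(\d+)", to_link, line)


def missing_targets(linked_lines, body_html):
    """Return page ids the index links to but the body text never defines."""
    defined = set(re.findall(r'<a id="(page\d+)"', body_html))
    wanted = re.findall(r'#(page\d+)"', "\n".join(linked_lines))
    return [page for page in wanted if page not in defined]


linked = [linkify(line) for line in index_lines]

# Hypothetical merged body text with only two of the three anchors present.
body = (
    '<p>This is an example<a id="page123"></a> of a split paragraph.</p>'
    '<p>More<a id="page5"></a> text.</p>'
)
broken = missing_targets(linked, body)

for line in linked:
    print(line)
print("Missing anchors:", broken)
```

One caveat with a bare `(\d+)`: it grabs every number, so an entry whose heading itself contains digits (a year, say) would get linked too. Those entries are worth eyeballing after the "Replace All".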
||||
12-13-2021, 11:17 PM | #7 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Thanks for the help. It's much appreciated.
Today I started using dictation with the index and that made it easier to find phrases in SIGIL (easier on my hands and faster, too). I have one concern and you guys would know the scoop on this: why are some entries in the index (subtopics) not in alphabetic order? Did I do something wrong? Are those reference numbers I see linked to their ordering in the index? P.S. This is my first flowable text ebook, so I'm still getting my wings in terms of understanding the whole process. I did put out a fixed format kindle ebook, but that was easy. Not as work intensive as this. Thanks again. And I will probably have some questions once I get to the "see also" stage. |
12-13-2021, 11:20 PM | #8 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Also, Affinity Publisher does not export to anything in the text based world, other than pdf - if I don't flatten the file.
When I created my ebook, I copied and pasted from the original APub doc layout. I didn't copy and paste anywhere else but into SIGIL, because I thought I might get app-specific gremlins stuck to my text. Again, it's a learning curve, and I'm sure my second time around will be smoother, or at least a lot more informed than I am now. |
12-14-2021, 11:12 AM | #9 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Tex, thanks for the extended and insightful post on the page numbering. I just now read every bit of it and it all makes sense. (I read the linked content earlier.) Kevin, your input is also helpful, although when perl and python are used in any sentence, my brain glazes over. (Any success in those areas has only ever been dumb luck for me; I learn from patterns, but I never learned to actually speak the language of that code like others have. I don't spend enough time with it to make studying it worthwhile.)
Here in this thread, the situation is like two responsible adults calling out to a kid who's walking into the riptide and you're yelling: "Don't walk into the riptide" and I'm like: "It's okay, I'm a strong swimmer." At this point, it's a matter of seeing how far I get sucked out into the ocean before I need to get rescued. That said, I made it all the way to "F" in the Index last night. Cool Runnings! |
12-14-2021, 01:03 PM | #10 | |||||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
Gremlins getting introduced (especially when copying/pasting from outside sources). But, in this specific case, it would be helpful to get the text + italics + (any other basic formatting) out. So you could do something like this:
Then you could do the usual cleaning of dirty HTML -> clean HTML: Code:
<p class="calibre123">dogs, 123</p>
<p class="calibre456">Mammals. <i class="calibre1234">See also</i> Animals.</p>
Code:
<p class="index">dogs, 123</p>
<p class="index">Mammals. <i>See also</i> Animals.</p>
(Or, if it's not so bad, just simple S&R. There shouldn't be too much HTML mess introduced.* [Famous last words.])
Side Note: You may even be able to paste your index directly from Affinity Publisher into Sigil's PageEdit. Perhaps the HTML code may be slightly cleaner.
Quote:
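The dirty HTML -> clean HTML step shown above can be rehearsed in a few lines of Python. This is only a sketch under assumptions: it presumes the junk classes all look like `calibreNNN` (as in the example) and that every paragraph in the index should end up with `class="index"`. The function name `clean_index_html` is made up. The same two patterns should also work as regex Search/Replace pairs inside Sigil.

```python
import re


def clean_index_html(html):
    """Strip generated calibre classes from an index fragment."""
    # Inline italics need no class at all.
    html = re.sub(r'<i\s+class="calibre\d+">', "<i>", html)
    # Every paragraph gets the single, meaningful class.
    html = re.sub(r'<p\s+class="calibre\d+">', '<p class="index">', html)
    return html


dirty = (
    '<p class="calibre123">dogs, 123</p>\n'
    '<p class="calibre456">Mammals. <i class="calibre1234">See also</i> Animals.</p>'
)
clean = clean_index_html(dirty)
print(clean)
```

Keeping the patterns this narrow is deliberate: anything that does not match is left untouched, so unexpected markup shows up in a quick read-through instead of being silently rewritten.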
Quote:
I think I get what you're saying, but not 100% sure. You're talking about the output of Sigil's Index Editor + Create Index? So while main entries like: Code:
Animals
Mammals
Zoology
Code:
Animals
    giraffes
    zebras
    cats
    dogs
Mammals
Zoology
Code:
Animals
    cats
    dogs
    giraffes
    zebras
Mammals
Zoology
would help. Quote:
99.99% of the time, you don't want them. They are awful for actual, human readers, because they throw away all the advantages of actual ebooks:
With FXL, you have to pinch-zoom, pinch-zoom, scroll, pinch-zoom. They don't sell. They don't work across devices. And they are the absolute worst of all worlds (even worse than just reading a PDF). Last edited by Tex2002ans; 12-14-2021 at 01:09 PM. |
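To make the sorted-versus-unsorted subentry point from this post concrete, here is a small Python sketch that alphabetises both levels of an index. It is not how Sigil orders entries; the dictionary layout and the name `sort_index` are assumptions for illustration, and it uses a plain case-insensitive sort, whereas professional indexes may follow word-by-word or letter-by-letter conventions.

```python
# Main entry -> list of subentries, in the unsorted order from the example above.
index = {
    "Zoology": [],
    "Animals": ["giraffes", "zebras", "cats", "dogs"],
    "Mammals": [],
}


def sort_index(entries):
    """Return a new dict with main entries and subentries alphabetised."""
    ordered_mains = sorted(entries.items(), key=lambda item: item[0].casefold())
    return {
        main: sorted(subs, key=str.casefold)
        for main, subs in ordered_mains
    }


sorted_index = sort_index(index)

for main, subs in sorted_index.items():
    print(main)
    for sub in subs:
        print("    " + sub)
```

Sorting on `str.casefold` keeps "Cats" and "cats" together instead of pushing capitalised entries to the front, which is the usual surprise with a default sort.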
|||||
12-15-2021, 04:39 PM | #11 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
"Dictation" = voice to text. I speak the phrase into the "Find" field rather than type it.
I have dozens of references for one word, so I always have to type out the words around it to home in on the exact page for that keyword. Speaking the text quickens the pace. Also, when I read some posts about page numbering, I saw that it wasn't an easy fix. Some people were asking "How come the page numbers don't show up?" on device a, b, or c. And then someone said "Well, you could use this plugin and..." When I saw that, I said to myself: I've got enough problems doing it the simple way. I figure the best road to go down is the one I'm familiar with; that way I know how to find my way back. I don't use that approach in life, but I use it for technical stuff like this. Thanks again. Oh, and the index alphabetizing thing happens in subtopics. I have a psychological breakdown of fear indexed, and it goes from "fear of" (f) to "horror" (h) and then to "fleeing" (back to f again). I'll have to investigate more later. |
12-15-2021, 06:24 PM | #12 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
Just to be clear, I don't put "dozens" of references for one word into the index. I just mean that there can be dozens of instances of a word throughout the book and so I have to isolate it by identifying the words that surround it. Luckily, those words show up in the print file indexing window.
|
12-16-2021, 06:09 AM | #13 | ||||||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
So let's say you have "dog, 123" in your index. You're jumping to page 123's text, then trying to locate the exact word "dog" on that page? (And you're currently using dictation to speak the words instead?) ... and what exactly would the point of that be? If you're not marking indexed words within the original source document (Word, LO, etc.), seems like you're wasting lots of time. Just link to the page #s, and move on. (I explain more in-depth reasoning in the topic below.) Quote:
... But within the past few years, I have slightly loosened on how useless they are in ebooks (for Accessibility reasons). Quote:
(But you shouldn't even have that problem, because you deleted all that and worked from the Print book's text. Right? ) But it would be good to get those Sigil bugs sorted. Not many people go poking around in the index tools. Quote:
Post #129 is where I entered the picture, describing nearly every facet of Indexing/citations in ebooks. (You could also start at #6, where Hitch began posting. But there's a massive amount of ranting/raving from other users... you'd probably gather all the real-life indexes-in-ebooks production by reading each of Hitch's posts + mine.) Now, I don't know if you hired an actual Indexer to create your index. But, in many cases, those exact words/terms just won't show up within the text. Indexes allow you to have more broad strokes or general terms. Here's one of the examples I gave in that post: Quote:
Quote:
|
||||||
12-17-2021, 01:21 PM | #14 |
Connoisseur
Posts: 80
Karma: 2137678
Join Date: Dec 2021
Location: Canada
Device: none
|
The dictation is used as follows:
In the finder field, I may type in the word "competition". However, it can show up dozens of times inside the book/html text, so I would have to press "find" continuously until I ran into the right one along the way. So what I do instead is look at the original index marker from the book and enter the words that surround that word, which I speak into the finder field via dictation. In this way, I can land on the exact page where that reference is located. Saves me time. Some, anyway. P.S. This particular book is 28 years in the making. A few days spent on an index is not going to make or break my resolve to get the job done. |
Tags |
index, see also, sigil |
|