07-27-2020, 01:30 PM | #16 |
Custom User Title
Posts: 9,575
Karma: 64960983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Since a good part of my library is PDFs, using pdftotext sped things up considerably.
I did notice everything else lagging when I used 12 processes. I switched it to six and the lag disappeared. |
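A minimal sketch of that trade-off, assuming text extraction is handed to a worker pool: capping the pool at half the logical CPUs (12 down to 6 here) leaves headroom for everything else. The function names and the halving rule are illustrative assumptions, not the plugin's code. `pdftotext` is the Poppler command-line tool and has to be on the PATH for `pdf_to_text` to work.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(logical_cpus, fraction=0.5):
    """Hypothetical rule: use a fraction of the logical CPUs, but at least 1."""
    return max(1, int(logical_cpus * fraction))


def pdf_to_text(pdf_path):
    """Run Poppler's pdftotext; '-' as the output file sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def extract_all(pdf_paths, workers=None):
    """Extract text from many PDFs with a capped number of parallel workers."""
    workers = workers or pick_worker_count(os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pdf_to_text, pdf_paths))
```

Threads are enough here because the heavy lifting happens in the external `pdftotext` processes, so the cap on threads is effectively a cap on how many of those run at once.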
07-27-2020, 01:35 PM | #17 |
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
|
07-27-2020, 01:48 PM | #18 | |
Bibliophagist
Posts: 40,603
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
My main reason for checking this out was that I use ElasticSearch with Graylog, which states that part of its reason for existence is to work around the shortcomings of ElasticSearch. Last edited by DNSB; 07-27-2020 at 01:53 PM. |
|
07-27-2020, 02:09 PM | #19 | |
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
Quote:
That said, I hope it's more than enough for local library management. |
|
07-27-2020, 04:06 PM | #20 |
Custom User Title
Posts: 9,575
Karma: 64960983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
My library is about 4000 books and 20 GB, though the bulk of that is image-heavy PDF files (lots of video game strategy guides).
Last edited by ownedbycats; 07-27-2020 at 04:09 PM. |
07-27-2020, 04:17 PM | #21 |
Wizard
Posts: 1,089
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
Thanks for the plugin! Awesome idea.
I did not install pdftotext, but I only have 18 PDFs in my library. I'm testing it on a library with 1130 books (many with multiple formats: EPUB, AZW3/KFX) and about 3GB.
Info about the initial indexing: It took only 16 minutes to go from 0 to 99%. But now it has been stuck at 99% for about 3h45min. My system is an i7 7700HQ (16GB of RAM). My processor has 4 cores (8 threads). The plugin has chosen a maximum of 8 parallel processes. Now, the strange part: according to Task Manager (Windows), my CPU is only using 20% of its total capacity. While I was writing this post, it finished, after 4h05min. Now it searches instantly! Nice!
------ My first impressions and questions ------
1) Question: When you have multiple formats for one book, does it look up all the formats or just one?
2) Question: In caps.json, it only shows EPUB, MOBI, PDF and TXT files. According to this, and other tests I have done, it does not index AZW3/KFX files. Is this correct?
3) Question: How does the index work for new additions? Are the new files automatically indexed when I run ElasticSearch?
4) Suggestion: It would be really important to have more options for search. Right now, it searches word by word, so I can't look for phrases or compound words (e.g. searching for "coffee table" returns books with "coffee" OR "table"). Also, accented characters are distinguished from non-accented ones.
5) Info: According to the ElasticSearch Reference, to have more options for search, you would need to change your query from "match" to "query_string". This would allow operators, wildcards and regular expressions. P.S.: "match" queries can use operators too, but you would have to code that.
6) Info: The ZIP file attached to the first post has another ZIP inside (with the full plugin). |
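To make points 4 and 5 concrete, here is a rough sketch of the request bodies involved, written as plain Python dicts. The field name `content` and the analyzer name `folding` are assumptions for illustration; the plugin's actual mapping may look different. The query types (`match`, `query_string`) and the `asciifolding` token filter are standard ElasticSearch features.

```python
import json

FIELD = "content"  # hypothetical field name; the plugin's real mapping may differ


def match_any(text, field=FIELD):
    """Default match: the analyzed terms are OR-ed ("coffee" OR "table")."""
    return {"query": {"match": {field: {"query": text}}}}


def match_all_terms(text, field=FIELD):
    """match with operator=and: every term must appear, in any order."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def query_string(text, field=FIELD):
    """query_string: the text itself is parsed for AND/OR, wildcards and quoted phrases."""
    return {"query": {"query_string": {"query": text, "default_field": field}}}


def folding_index_settings():
    """Index settings with an analyzer that strips accents via asciifolding."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folding": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(query_string('"coffee table" AND guide*'), indent=2))
```

One design note: `query_string` raises an error on malformed input, so passing raw user text to it needs some error handling, whereas `match` with `operator` never fails on syntax.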
07-27-2020, 05:26 PM | #22 | |||
Bibliophagist
Posts: 40,603
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
Quote:
Quote:
I used the 3rd zip file from message #9 in this thread. Note that my setup is on Windows x64. I also restarted the full search after deleting the old setup and moving my computer-related ebooks out of my calibre library. I also realized that I had not pointed to pdftotext properly and corrected that. It was a heck of a lot faster with those mostly oversized PDF files removed: 2 hours down to 5 minutes. |
|||
07-27-2020, 05:51 PM | #23 | ||||
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
Thanks, thiago.eec, for the feedback!
You have probably encountered one of the "bad" PDF files that take ages to process without pdftotext.
Quote:
Quote:
Quote:
Quote:
Haha indeed! Thanks for noticing! |
||||
07-27-2020, 06:19 PM | #24 | |
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
Quote:
New files won't be indexed automatically the moment you add new books, but they will be indexed once you run Power Search again. |
|
07-27-2020, 07:20 PM | #25 |
Bibliophagist
Posts: 40,603
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
|
07-27-2020, 08:44 PM | #26 |
Custom User Title
Posts: 9,575
Karma: 64960983
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
While testing, I also had FanFicFare update some of my fanfics and then did a search for random words that appeared only in the newest chapters. It did re-index the ePubs that had changed.
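How the plugin decides what to re-index isn't stated in this thread. One common approach, shown here purely as an assumption, is to remember each file's modification time at indexing and re-index anything new or changed on the next run. All names below are hypothetical.

```python
import os


def snapshot_mtimes(paths):
    """Record the current modification time of each file."""
    return {path: os.path.getmtime(path) for path in paths}


def files_needing_index(current_mtimes, indexed_mtimes):
    """Return paths that are new, or whose mtime differs from the one stored at index time."""
    return sorted(
        path
        for path, mtime in current_mtimes.items()
        if indexed_mtimes.get(path) != mtime
    )
```

An EPUB rewritten by an update tool gets a new modification time, so it would show up in `files_needing_index` on the next run while untouched books are skipped.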
|
07-28-2020, 12:37 PM | #27 | ||
Wizard
Posts: 1,089
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
Quote:
Quote:
I did a simple test here, and changing the query from "match" to "match_phrase" did the job, allowing phrases and compound words. Using "query_string" isn't that easy, though. |
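For reference, a sketch of what that change looks like in the request body. The field name `content` is an assumption; `match_phrase` and its `slop` parameter are standard ElasticSearch query options.

```python
def phrase_query(text, field="content", slop=0):
    """Build a match_phrase body.

    slop=0 requires the terms to be adjacent and in order; a larger slop
    tolerates that many positions of movement between them.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}
```

With this, `phrase_query("coffee table")` only matches documents where "coffee" is immediately followed by "table", instead of either word anywhere in the book.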
||
07-28-2020, 06:32 PM | #28 | ||
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
Quote:
I just decided to be conservative in the first releases and support only those book formats that I could test well enough. I will extend this list in the next release once I make sure it works well. Quote:
This, however, means that I would need to find a way of sorting results according to relevance. I don't know how easy that is to do in Calibre, but in general it seems to me the right way to go. |
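On the ElasticSearch side, each hit in a search response carries a `_score`, and hits come back ordered by it by default. A minimal sketch of turning a response into a relevance-ordered list of IDs follows; it assumes each document's `_id` can be mapped back to a book, which is a guess about how the index is laid out, and it says nothing about how Calibre would display that order.

```python
def book_ids_by_relevance(response):
    """Return hit IDs ordered from most to least relevant.

    ElasticSearch already sorts by _score unless a custom sort is requested,
    so the explicit sort here is only defensive. Hits without a score
    (None) are treated as 0.0 and end up last.
    """
    hits = response.get("hits", {}).get("hits", [])
    ranked = sorted(hits, key=lambda hit: hit.get("_score") or 0.0, reverse=True)
    return [hit["_id"] for hit in ranked]
```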
||
07-31-2020, 01:05 PM | #29 |
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
Version 1.2.0 released.
Contains the following usability improvements:
Adds support for DOC, AZW3, KFX file formats. Last edited by mapozyan; 08-08-2020 at 02:00 PM. Reason: Removed attached version 1.2.0 |
07-31-2020, 01:18 PM | #30 |
Connoisseur
Posts: 77
Karma: 90088
Join Date: Jul 2020
Device: android
|
I tried to search for "up-to-date" as a phrase. It still doesn't work correctly, so "up-to-date" will bring the same results as "up to date". Still, it's much more useful than searching for individual words.
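A likely explanation, assuming the field is indexed with ElasticSearch's default standard analyzer: that analyzer splits text at hyphens, so both spellings turn into the same three tokens before the phrase is matched. The function below is only a rough Python stand-in for that behaviour, not the real analyzer.

```python
import re


def approx_standard_tokens(text):
    """Rough approximation of the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


hyphenated = approx_standard_tokens("up-to-date")
spaced = approx_standard_tokens("up to date")
print(hyphenated == spaced)
```

Telling the two apart would need a different analyzer for the field (for example one based on the whitespace tokenizer), at the cost of hyphenated words no longer matching their unhyphenated parts.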
|
|