10-22-2023, 08:48 PM | #1 |
Custom User Title
Posts: 9,588
Karma: 65099765
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Full-text indexing - .pdf.error?
I was monitoring the calibre temp folder during a reindex and noticed that sometimes, when a PDF is copied there for indexing, it's accompanied by a 0-byte '.pdf.error' file with the same randomized name. This doesn't happen with other file types. Does it mean indexing is failing for those books?
|
10-22-2023, 10:54 PM | #2 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No. As far as I know, calibre doesn't create such a file; it might be coming from the pdftotext process calibre uses to extract the text from the PDF. See the pdftotext() function in fts/text.py in the calibre source code. calibre assumes text extraction was successful if that process exits with a 0 exit code.
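For readers following along, here is a minimal sketch of the exit-code check described above. It is not calibre's actual code: the function names `pdftotext_cmd` and `extraction_succeeded` are hypothetical, and it assumes a `pdftotext` binary (from Poppler) is on the PATH. The point is only that success is judged from the exit code, so a stray 0-byte file next to the PDF would not by itself count as a failure.

```python
import subprocess


def pdftotext_cmd(pdf_path, txt_path):
    # Hypothetical helper: build a pdftotext command line.
    # "-enc UTF-8" asks pdftotext for UTF-8 output.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, txt_path]


def extraction_succeeded(cmd):
    # Run the command and judge success only by its exit code:
    # 0 means success, anything else means failure.
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0
```

Usage would look something like `extraction_succeeded(pdftotext_cmd("book.pdf", "book.txt"))`, where both paths are placeholders. Running the same command by hand on a PDF that produced a '.pdf.error' file and checking the exit code is one way to see whether extraction really failed.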
|
10-23-2023, 12:56 AM | #3 |
Custom User Title
Posts: 9,588
Karma: 65099765
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Thanks.
Additionally, since it's related to this: have you ever seen the full-text index database become corrupted? It happened sometime in the last few days, but I only noticed when I tried to run a search.
|
10-23-2023, 01:02 AM | #4 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, can't say that I have.
|
10-23-2023, 01:36 AM | #5 |
Custom User Title
Posts: 9,588
Karma: 65099765
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Ah, I thought a specific sequence of deleting files, adding a few files, then restoring the format from the trash might've screwed something up, but I can't replicate it a second time. I notice the newly reindexed database is at least 250 MB smaller, so maybe it was just in dire need of a vacuum.
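As an illustration of the "vacuum" idea, here is a rough sketch that runs SQLite's VACUUM on a database file and reports the size before and after. It assumes the full-text index is an ordinary SQLite file; the function name `vacuum_and_report` is made up for this example, and it should only be tried on a backup copy with calibre closed.

```python
import os
import sqlite3


def vacuum_and_report(db_path):
    # Record the file size, rebuild the database with VACUUM,
    # then record the size again. Returns (before, after) in bytes.
    before = os.path.getsize(db_path)
    # isolation_level=None puts the connection in autocommit mode,
    # since VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    after = os.path.getsize(db_path)
    return before, after
```

VACUUM rewrites the file without the free pages left behind by deleted rows, which is why a database that has seen many deletions can shrink noticeably.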
|