05-15-2024, 05:56 PM | #1 |
Member
Posts: 21
Karma: 50
Join Date: Jan 2019
Device: none
|
PDF to ePub - impossible to unwrap line breaks on this file?
This is a very important historical Bulgarian translation of the Bible which I haven't been able to find anywhere else as a single file. Thankfully it's not OCR and even has a table of contents, but the line breaks seem impossible to unwrap upon conversion, no matter the settings.
Can anyone help? Ideally I would've just said "screw it" and cleaned up the line breaks with regex into a raw TXT file, but then I'd lose the table of contents. Any help would be very much appreciated, no matter how hacky the workaround! Looking forward to finally uploading this one to archive.org, so no one else has to scour the web to find it, nor torture themselves converting it.
Last edited by barbiedolphin; 05-15-2024 at 06:04 PM. |
05-15-2024, 07:22 PM | #2 | |
null operator (he/him)
Posts: 20,997
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
|
05-16-2024, 08:15 AM | #3 | |
Member
Posts: 21
Karma: 50
Join Date: Jan 2019
Device: none
|
Quote:
- the numbered verses seemed to be treated and converted as a numbered list (which some readers render as bullet points)
- some of the verses merged with one another (too much unwrapping)
I also tried replacing the dot in each verse number (e.g. "13.") with a similar-looking Unicode dot so it wouldn't be treated as a numbered list, but that didn't seem to work.
Just now I also found this particular text in various exotic formats: http://eubible.com/download/download.htm
They seem to be viewable with software called "Simple Bible Reader", with which I was able to convert one into a "LOGOS Import File" - basically a DOCX with weird verse numbering, which I fixed using regex. I then converted that DOCX into ePub, which went flawlessly!
Last edited by barbiedolphin; 05-16-2024 at 10:31 AM. |
|
05-16-2024, 09:56 PM | #4 |
null operator (he/him)
Posts: 20,997
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
05-19-2024, 07:35 PM | #5 |
Member
Posts: 21
Karma: 50
Join Date: Jan 2019
Device: none
|
Ughhh, it turned out the weird file formats I found/converted actually contained various text issues. So ultimately, I did need to copy the text from the original ugly PDF and wrangle it with regex and calibre until I finally got what I needed (incl. a table of contents).
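For anyone attempting the same kind of cleanup, here's a minimal sketch of regex-based unwrapping in Python. It is not the exact procedure used on this file. It assumes (and this is only an assumption about the copied text) that every verse starts on its own line with a number and a dot, like "13. ", and that any other line is a continuation of the previous verse. The function name `unwrap_verses` and the sample text are made up for illustration.

```python
import re

# Assumption: a verse begins with digits, a dot, and whitespace, e.g. "13. ".
VERSE_START = re.compile(r"^\d+\.\s")


def unwrap_verses(text: str) -> str:
    """Join hard-wrapped lines, starting a new paragraph at each verse number."""
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes whatever paragraph is in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if VERSE_START.match(line) and current:
            # A new verse number closes the previous verse.
            paragraphs.append(" ".join(current))
            current = []
        current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    # One blank line between verses keeps them from merging on conversion.
    return "\n\n".join(paragraphs)


sample = (
    "1. In the beginning God created\n"
    "the heaven and the earth.\n"
    "2. And the earth was\n"
    "without form, and void."
)
print(unwrap_verses(sample))
```

One caveat: a wrapped continuation line that happens to begin with a number and a dot would be mistaken for a new verse, so the output still needs a skim. The table of contents has to be rebuilt separately, since this only handles the line breaks.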
Here's where it ultimately ended up, alongside other editions: https://archive.org/details/bibliya-...o-izdanie-1885 |