08-20-2024, 09:29 AM | #1 |
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Couple of newbie questions (text cut off, wrong fonts)
Hello,
To make them more readable, I'm trying to replace the fonts in some PDFs, from serif to sans serif. After running pdf2html and editing the file in the latest (2.3.0) release of Sigil, the EPUB I get has two issues:
1. The text is cut off on the right.
2. It still uses serif instead of sans serif.
Code:
.ft0{font: 14px 'Verdana';line-height: 21px;}
Thank you.
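One possible reason for issue 2: the `.ft0` rule names only 'Verdana', so if the reading system has no Verdana (and the EPUB does not embed it), it substitutes its own default font, which is often a serif. Ending the family list with the generic `sans-serif` avoids that. Below is a minimal Python sketch, assuming the stylesheet is available as a string; the function name `add_sans_fallback` is hypothetical, and the regex is a rough heuristic rather than a real CSS parser (it leaves `@font-face` blocks alone, since a fallback list is not valid there).

```python
import re

# Either a whole @font-face block (left untouched), or a `font:` / `font-family:`
# declaration whose value runs to the next `;` or `}`. The lookbehind keeps the
# second branch from matching inside longer names such as `icon-font:`.
FONT_DECL = re.compile(
    r"(?P<face>@font-face\s*\{[^}]*\})"
    r"|(?<![\w-])(?P<prop>font(?:-family)?\s*:\s*)(?P<value>[^;}]*)",
    re.IGNORECASE,
)
# Generic family keywords (checked only outside quoted family names).
GENERIC = re.compile(r"\b(?:sans-serif|serif|monospace|cursive|fantasy|system-ui)\b", re.IGNORECASE)
CSS_WIDE = {"inherit", "initial", "unset", "revert"}


def add_sans_fallback(css: str) -> str:
    """Append a generic `sans-serif` fallback to font declarations that lack one."""

    def fix(m: re.Match) -> str:
        if m.group("face"):
            return m.group(0)  # @font-face takes a single family name; don't touch it
        prop, value = m.group("prop"), m.group("value").rstrip()
        # Drop quoted names so a family called e.g. 'Serif Display' is not taken as generic.
        unquoted = re.sub(r"'[^']*'|\"[^\"]*\"", "", value)
        if value.lower() in CSS_WIDE or GENERIC.search(unquoted):
            return m.group(0)  # keyword-only value, or a generic family is already there
        return f"{prop}{value}, sans-serif"

    return FONT_DECL.sub(fix, css)


print(add_sans_fallback(".ft0{font: 14px 'Verdana';line-height: 21px;}"))
# .ft0{font: 14px 'Verdana', sans-serif;line-height: 21px;}
```

This only changes what the book asks for; many reading apps let the user override publisher fonts, so the viewer's own font settings are worth checking too.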
08-20-2024, 09:57 AM | #2 |
Sigil Developer
Posts: 8,155
Karma: 5450818
Join Date: Nov 2009
Device: many
My guess is that the PDF-to-HTML tool created a fixed-layout file, where the CSS gives the exact position of the start of each word, or even of each character. Such a file is not reflowable. There is no good reason to create fixed-layout EPUBs/HTML, as PDF handles fixed layout better.
Does your PDF-to-HTML tool have an option to create a reflowable HTML file?
Last edited by KevinH; 08-20-2024 at 10:18 AM.
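A rough way to check whether a converted HTML file is fixed layout in this sense is to count how many text elements carry an inline `position:absolute` style. The Python sketch below assumes the tool writes positions as inline `style` attributes (a tool that positions text through CSS classes instead will slip past it); the name `looks_fixed_layout` and the 50% threshold are hypothetical choices, not anything Sigil or the converter provides.

```python
import re

# A <p>, <div> or <span> whose inline style pins it with position:absolute,
# e.g. <div style="position:absolute;top:120px;left:72px">.
POSITIONED = re.compile(
    r"<(?:p|div|span)\b[^>]*\bstyle\s*=\s*[\"'][^\"']*position\s*:\s*absolute",
    re.IGNORECASE,
)
# Every opening <p>, <div> or <span> tag (closing tags start with "</" and don't match).
BLOCKS = re.compile(r"<(?:p|div|span)\b", re.IGNORECASE)


def looks_fixed_layout(html: str, threshold: float = 0.5) -> bool:
    """Heuristic: call the page fixed layout if at least `threshold` of its
    p/div/span elements are absolutely positioned via inline styles."""
    total = len(BLOCKS.findall(html))
    if total == 0:
        return False
    positioned = len(POSITIONED.findall(html))
    return positioned / total >= threshold
```

Usage would be something like `looks_fixed_layout(Path("page.html").read_text(encoding="utf-8"))` with `pathlib.Path`. A `True` result suggests re-converting with a reflowable option rather than trying to fix the cut-off text in Sigil.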
08-21-2024, 12:32 AM | #3 |
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
It seems you are trying to get the text out of a PDF just to replace the font, then make a new PDF, right? So getting the text out properly is the problem; try the things below.
Once you have the text out, by far your best bet for a new PDF is to put the text into Writer or Word, format it the way you want (including the font), and export the new PDF from the word processor. Trying to go PDF --> HTML --> EPUB --> PDF is just too hard and fraught with problems. No PDF on the planet will have the styling information to let you get the text out and put it back with the same appearance. They just aren't made that way.
Calibre uses pdftohtml, a Poppler tool, in its attempts to convert PDF files. It is not always successful. There is a pdftotext tool that will sometimes work when the HTML tool fails. You called it "pdf2html", so you may have some completely different tool; there is no shortage of similar tools. Try here: https://github.com/elswork/poppler-utils
The bottom line is that there can be anything, any kind of garbage, in a PDF file, and some of them are just about impossible to get good text out of.
Another thing you might try is using an OCR tool to give the PDF your own text layer in one go, and then converting. Again, there are many tools. The last gasp is to do your own OCR page by page from screenshots, or pull out PNG images with pdftopng (an Xpdf tool; Poppler's equivalent is pdftoppm -png), then OCR them. Lots of work.
PDFs are mostly evil things in the ebook world, especially old ones made with who-knows-what tools.
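The fallback order in this post can be sketched as a list of commands to try one at a time. This minimal Python sketch only prints the commands (it does not run them) so each result can be inspected before moving on. It assumes Poppler's `pdftohtml`/`pdftotext` and OCRmyPDF are installed; OCRmyPDF stands in for "an OCR tool" and is just one option among many. The helper names, output file names and the letter-ratio threshold in `looks_like_garbage` are hypothetical. Leaving out `pdftohtml -c` is deliberate: as far as we know, `-c` is the "complex" mode that positions text absolutely.

```python
import shlex
import sys
from pathlib import Path


def extraction_commands(pdf: str, out_dir: str = ".") -> list[list[str]]:
    """Hypothetical helper: extraction attempts, roughly in the post's order."""
    stem, out = Path(pdf).stem, Path(out_dir)
    ocr_pdf = str(out / f"{stem}-ocr.pdf")
    return [
        # 1. Poppler pdftohtml: one HTML document (-s, -noframes), images skipped (-i),
        #    and no -c, to avoid (we assume) the absolutely positioned "complex" output.
        ["pdftohtml", "-s", "-i", "-noframes", pdf, str(out / stem)],
        # 2. Poppler pdftotext: plain text, which sometimes works when HTML fails.
        ["pdftotext", pdf, str(out / f"{stem}.txt")],
        # 3. OCR fallback: rasterize and give the PDF a fresh text layer...
        ["ocrmypdf", "--force-ocr", pdf, ocr_pdf],
        # ...then extract text from that new layer.
        ["pdftotext", ocr_pdf, str(out / f"{stem}-ocr.txt")],
    ]


def looks_like_garbage(text: str, min_letters: float = 0.6) -> bool:
    """Heuristic: empty or mostly non-letter output (e.g. from broken font
    encodings) suggests moving on to the next, heavier step."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    letters = sum(c.isalpha() for c in chars)
    return letters / len(chars) < min_letters


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print rather than run, so each step's output can be checked by hand.
    for cmd in extraction_commands(sys.argv[1]):
        print(shlex.join(cmd))
```

The idea is to stop at the first step whose text passes a check like `looks_like_garbage`. The OCR steps are slow, which is why they come last.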
08-21-2024, 12:55 AM | #4 |
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
I downloaded your picture so I could enlarge it and see it better. The text shown at the bottom, in the Sigil preview, actually seems pretty complete and flowing. Try opening the HTML you got directly in Writer; that might work well after all (or Word, which probably does the same, but I don't use it). Then you could fiddle with the font and export a new PDF.
What are you using to view the EPUB? With the preview looking as good as it does, the viewer could be the problem.
08-21-2024, 01:01 AM | #5 | |
null operator (he/him)
Posts: 20,981
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
If the OP wants EPUB, they can save as DOCX and convert that to EPUB… via the Sigil plugin, the ePub-Tools add-in for Word, or calibre. The last is probably the easiest. BR
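If you'd rather script the HTML --> DOCX --> EPUB route than click through it, it might look like the sketch below. It assumes LibreOffice (`soffice`) and calibre's `ebook-convert` are on the PATH. Naming the "MS Word 2007 XML" filter explicitly is, as we understand it, needed because LibreOffice opens `.html` files in Writer/Web. The `--extra-css` font rule is an assumption about what the OP wants, and `conversion_commands` is a hypothetical helper. The sketch only prints the commands, and it only helps if the HTML is reflowable to begin with.

```python
import shlex
import sys
from pathlib import Path


def conversion_commands(html: str, out_dir: str = ".",
                        font: str = "Verdana, sans-serif") -> list[list[str]]:
    """Hypothetical helper: HTML -> DOCX (LibreOffice) -> EPUB (calibre)."""
    stem, out = Path(html).stem, Path(out_dir)
    docx = str(out / f"{stem}.docx")
    return [
        # Headless LibreOffice writes <stem>.docx into --outdir; the filter name
        # follows the colon in the --convert-to argument.
        ["soffice", "--headless", "--convert-to", "docx:MS Word 2007 XML",
         "--outdir", str(out), html],
        # calibre picks the input/output formats from the file extensions;
        # --extra-css accepts raw CSS (or a path to a stylesheet).
        ["ebook-convert", docx, str(out / f"{stem}.epub"),
         "--extra-css", f"body {{ font-family: {font}; }}"],
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    for cmd in conversion_commands(sys.argv[1]):
        print(shlex.join(cmd))
```

A `body` rule can be overridden by more specific font rules carried over from the DOCX, so treat the font part as a starting point; changing the font in Writer before saving the DOCX is the more direct route.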
08-21-2024, 07:51 AM | #7 |
Sigil Developer
Posts: 8,155
Karma: 5450818
Join Date: Nov 2009
Device: many
Yes, according to its website, the Apryse PDF-to-HTML tool creates a fixed-layout EPUB, not a reflowable one. That fixes the location of every word on the page, so the text will not reflow to adjust to the screen width.
08-21-2024, 12:10 PM | #8 |
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
That figures