Old Yesterday, 09:29 AM   #1
Shohreh
Groupie
Posts: 157
Karma: 192898
Join Date: Jan 2016
Device: none
Question Couple of newbie questions (text cut off, wrong fonts)

Hello,

To make them more readable, I'm trying to replace fonts in PDFs from serif to sans serif.

After running pdf2html and editing the file in the latest (2.3.0) release of Sigil, the EPUB I get has two issues:
1. The text is cut off on the right

2. It still uses serif instead of sans serif

Code:
.ft0{font: 14px 'Verdana';line-height: 21px;}
Any idea why?

Thank you.
Attached thumbnail: 8445DE39-B890-4E07-B6DD-D50057D8DEA5.png (181.5 KB)
Old Yesterday, 09:57 AM   #2
KevinH
Sigil Developer
Posts: 7,969
Karma: 5449552
Join Date: Nov 2009
Device: many
My guess is the pdf to html tool created a fixed layout file, where css provides the exact location of each word start or even each character. It is non-reflowable. There is no good reason to create fixed layout epubs/html as pdf handles fixed layout better.

Does your pdf to html tool have an option to create a reflowable html file?
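
A rough way to check the guess above is to look at the generated CSS and see whether it pins elements to exact page coordinates. The sketch below is only a heuristic: the function names, the threshold, and the sample stylesheets are made up for illustration (apart from the `.ft0` rule quoted in post #1), and real converter output may position text differently.

```python
import re

# Heuristic only: a stylesheet that positions elements absolutely is
# probably fixed layout. Real converter output may use other tricks.
ABSOLUTE_RE = re.compile(r"position\s*:\s*absolute", re.IGNORECASE)


def count_absolute_rules(css_text: str) -> int:
    """Count declarations that pin an element to exact coordinates."""
    return len(ABSOLUTE_RE.findall(css_text))


def looks_fixed_layout(css_text: str, threshold: int = 1) -> bool:
    """Return True when at least `threshold` absolute positions are found."""
    return count_absolute_rules(css_text) >= threshold


# Hypothetical samples: the first imitates fixed-layout output,
# the second is an ordinary reflowable rule.
fixed_css = """
.t1{position:absolute;left:72px;top:110px;}
.t2{position: absolute;left:72px;top:131px;}
.ft0{font: 14px 'Verdana';line-height: 21px;}
"""
reflowable_css = "p{margin:0 0 1em 0;font-family:Verdana,sans-serif;}"

if __name__ == "__main__":
    print(looks_fixed_layout(fixed_css))
    print(looks_fixed_layout(reflowable_css))
```

If the check comes back positive for the stylesheet inside the EPUB, editing the font rule alone is unlikely to fix the cut-off text, because the positions stay fixed.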

Last edited by KevinH; Yesterday at 10:18 AM.
Old Today, 12:32 AM   #3
retiredbiker
Evangelist
Posts: 413
Karma: 2289864
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
It seems you are trying to get the text out of a pdf just to replace the font, then make a new pdf..right? So getting the text out properly is the problem; try the things below.

Once you have the text out, by far your best bet to get a new pdf is to put the text into Writer or Word, format it the way you want, including the font, and export the new pdf from the word processor app. Trying to go pdf-->html-->epub-->pdf is just too hard and fraught with problems. No pdf on the planet will have styling information to allow you to get the text out and put it back with the same appearance. They just aren't made that way.

Calibre uses pdftohtml, a Poppler tool, in its attempts to convert pdf files. It is not always successful. There is a pdftotext tool that will sometimes work when the html tool fails. You called it "pdf2html" so you may have some completely different tool. There is no shortage of similar tools. Try here: https://github.com/elswork/poppler-utils

The bottom line is there can be anything, any kind of garbage, in a pdf file and some of them are just about impossible to get good text out of. Another thing you might try is using an OCR tool to give the pdf your own text layer in one go, and then converting. Again there are many tools.

The last gasp is to do your own OCR page by page from screen shots. Or pull out png images with pdftopng, then OCR. Lots of work.

PDFs are mostly evil things in the ebook world, especially old ones made with who-knows what tools.
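
For anyone who wants to script the pdftotext step, here is a minimal sketch. It assumes Poppler's `pdftotext` is installed and on the PATH; the helper names and the file names `input.pdf` and `output.txt` are placeholders, not anything from this thread.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_pdftotext_cmd(pdf_path: str, txt_path: str,
                        keep_layout: bool = False) -> List[str]:
    """Build the pdftotext command line (Poppler's pdftotext is assumed)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to preserve the physical layout of the page;
        # leave it off when the text will be reflowed in a word processor.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path: str, txt_path: str,
                 keep_layout: bool = False) -> bool:
    """Run pdftotext; return False if the tool is missing or fails."""
    if shutil.which("pdftotext") is None:
        return False
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, txt_path, keep_layout),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and Path(txt_path).exists()


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    ok = extract_text("input.pdf", "output.txt")
    print("extracted" if ok else "pdftotext missing or extraction failed")
```

The resulting text file can then be opened in Writer or Word and restyled there. Whether the text comes out clean still depends entirely on the PDF.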
Old Today, 12:55 AM   #4
retiredbiker
Evangelist
I downloaded your picture so I could enlarge it and see it better. The text as shown at the bottom, in the Sigil preview, actually seems pretty complete and flowing. Try opening the html text you got directly with Writer...that might work well after all (Or Word, it probably does the same but I don't use it.) Then you could fiddle the font and export a new pdf.

What are you using to view the epub? With the preview looking as good as it does, that could be a problem.
Old Today, 01:01 AM   #5
BetterRed
null operator (he/him)
Posts: 20,813
Karma: 27405122
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Reading the text in the OP's screenshot, I see it's about lithium-ion batteries, so it's a recently created PDF. I would try opening the PDF in current Word or maybe LO Writer, changing the font, and saving as PDF.

If the OP wants EPUB then they can save as DOCX and convert that to EPUB… via the Sigil plugin, the ePub-Tools add-in for Word, or via calibre. The last is probably the easiest.

BR
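
The calibre route can also be run from a script through calibre's `ebook-convert` command-line tool. This is a minimal sketch that assumes calibre is installed with `ebook-convert` on the PATH; the helper names and the file names `book.docx` and `book.epub` are placeholders.

```python
import shutil
import subprocess
from typing import List


def build_convert_cmd(docx_path: str, epub_path: str) -> List[str]:
    """Build the command line; ebook-convert picks formats from extensions."""
    return ["ebook-convert", docx_path, epub_path]


def docx_to_epub(docx_path: str, epub_path: str) -> bool:
    """Convert DOCX to EPUB; return False if calibre is missing or fails."""
    if shutil.which("ebook-convert") is None:
        return False
    result = subprocess.run(build_convert_cmd(docx_path, epub_path))
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    ok = docx_to_epub("book.docx", "book.epub")
    print("converted" if ok else "ebook-convert missing or conversion failed")
```

The EPUB produced this way is reflowable, so it can still be opened in Sigil afterwards for clean-up.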
Old Today, 05:01 AM   #6
Shohreh
Groupie
Thanks for the info.

The pdftohtml I mentioned is the one from Apryse.

I view EPUBs through SumatraPDF (Windows) or a 6" e-reader.

I'll try the poppler → LibreOffice solution.
Old Today, 07:51 AM   #7
KevinH
Sigil Developer
Yes, the Apryse pdf to html tool creates a fixed layout epub, not a reflowable one, according to its website. That fixes the location of every word on the page, so it will not reflow to adjust to the screen width.
Old Today, 12:10 PM   #8
Shohreh
Groupie
That figures
All times are GMT -4. The time now is 01:38 PM.

