09-05-2006, 04:27 AM | #1 |
Evangelist
Posts: 423
Karma: 1517132
Join Date: Jun 2006
Location: Madrid, Spain
Device: quaderno, remarkable2, yotaphone2, prs950, iliad, onhandpc, newton
|
howto: importing PDFs to a word processor
I've been looking for an easy way to convert PDFs. Until now I was using a PDF-to-HTML converter and post-processing its output, with mixed results. For the curious, this is what I used to convert some PDFs so they are comfortable to read on the Iliad (11cmx15cm, etc):
pdftohtml ( http://pdftohtml.sourceforge.net ), some ad-hoc scripts, tidy ( http://tidy.sourceforge.net/ ), gnuhtml2latex ( http://packages.debian.org/unstable/text/gnuhtml2latex ) and lyx ( http://www.lyx.org ). The results are acceptable but it's a lengthy process (about an hour for each book, mostly spent adapting the ad-hoc scripts so they join lines correctly and detect chapter headings).
I've found an alternative: a plug-in for Abiword (a lean and portable word processor) that imports PDFs using some heuristics (and the heuristics seem to be well chosen, so as to be generally applicable). It supports styles, multiple columns, etc. It's incredible. As an example, the author posts some images of before (PDF) and after (Abiword) importing; see the attached images.
For a description of what it does: http://www.abisource.com/twiki/bin/v...luginWithStyle
To download the sources of the PDF import plug-in and try it: http://jauco.nl/blog/
Caution: I've just found it, so I have not tested it yet. As I have some spare time I'll try it ;-). Tell me what you think about it ;-).
Last edited by Antartica; 09-05-2006 at 04:29 AM. |
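To give an idea of what "join lines correctly and detect chapter headings" can involve, here is a minimal sketch. It is not the ad-hoc scripts mentioned in the post (those were never published in this thread); the function name `join_lines`, the "Chapter <number>" pattern and the "== ... ==" heading marker are all assumptions made for illustration, and the input is assumed to be plain text with one source line per line.

```shell
#!/bin/sh
# Hypothetical heuristic, NOT the original ad-hoc scripts:
#  - a line starting with "Chapter <number>" is treated as a heading
#  - a blank line ends the paragraph being built
#  - any other line is appended to the current paragraph, and a word
#    hyphenated across a line break is glued back together
join_lines() {
  awk '
    function flush() { if (para != "") { print para; para = "" } }
    /^Chapter [0-9]+/ { flush(); print "== " $0 " =="; next }
    /^[[:space:]]*$/  { flush(); next }
    {
      if (para == "")       para = $0
      else if (para ~ /-$/) para = substr(para, 1, length(para) - 1) $0
      else                  para = para " " $0
    }
    END { flush() }
  '
}

# Example input: a heading, a paragraph broken over three lines, a second paragraph.
printf 'Chapter 1\nThis is a para-\ngraph that was\nbroken.\n\nNext one.\n' | join_lines
```

A rule this simple will also glue together words that legitimately end a line with a hyphen, which is one reason such scripts need adapting for each book.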
09-05-2006, 03:04 PM | #2 |
Uebermensch
Posts: 2,583
Karma: 1094606
Join Date: Jul 2003
Location: Italy
Device: Kindle
|
If the images depict the general conversion quality of this plugin, then I am really impressed. It's better than most commercial solutions I've seen.
I am curious to hear how it works for you. |
09-05-2006, 11:25 PM | #3 | |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
Quote:
It seems my programming illiteracy is quite advanced: how the hell am I supposed to install the patch?
http://www.jauco.nl/SoC/abiword-pdf-style-0.3.patch
http://www.jauco.nl/SoC/poppler-pdf-style-0.3.patch
Those two are supposed to be the plugins, but when I click on them a text file opens. There's no .dll, no .exe, no nothin'. I'd really appreciate some help from someone more knowledgeable. |
|
09-06-2006, 01:10 PM | #4 | |
Evangelist
Posts: 423
Karma: 1517132
Join Date: Jun 2006
Location: Madrid, Spain
Device: quaderno, remarkable2, yotaphone2, prs950, iliad, onhandpc, newton
|
Quote:
Patches are usually geared to programmers or advanced users who are not afraid of downloading source code and compiling it themselves. It's really not very difficult if you have the right tools. So this is a patch in the old UNIX sense: a text file describing changes to apply to the source code. In Windows it is more common to say "patch" when referring to a package of replacement files needed to upgrade a program.
And more to the point: see below for detailed instructions on how to apply the patch and compile the program (in Linux, which is what I have installed; in Windows+Cygwin it should be slightly different)... but the instructions are incomplete right now, as I've found that the patched poppler library fails to compile using gcc 3.3.5 :-( . Anyway, in the next message I explain how to get to that error.
Last edited by Antartica; 09-06-2006 at 01:20 PM. |
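For anyone who has never seen a UNIX-style patch at work, this small sketch creates one and applies it in a throwaway directory. Everything in it (the `orig`/`new` directories, `greeting.txt`, `fix.patch`) is made up for illustration; only `diff -ru` and `patch -p1` are the real tools, and `-p1` strips the first path component, just as it does for the poppler patch.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Two copies of a tiny "source tree": the original and the modified one.
mkdir -p orig/src new/src
printf 'hello\nworld\n'        > orig/src/greeting.txt
printf 'hello\nthere\nworld\n' > new/src/greeting.txt

# The patch is just a text file listing the differences.
# (diff exits with status 1 when the trees differ; that is expected here.)
diff -ru orig new > fix.patch

# Apply it from inside the original tree; -p1 drops the leading
# "orig/" or "new/" component from the paths recorded in the patch.
(cd orig && patch -p1 < ../fix.patch)

cat orig/src/greeting.txt
```

Opening `fix.patch` in a text editor shows lines prefixed with `+` and `-`, which is why clicking on the .patch links above opens a text file and not an installer.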
|
09-06-2006, 01:16 PM | #5 |
Evangelist
Posts: 423
Karma: 1517132
Join Date: Jun 2006
Location: Madrid, Spain
Device: quaderno, remarkable2, yotaphone2, prs950, iliad, onhandpc, newton
|
(Partial but) detailed Debian GNU/Linux 3.1 "Sarge" instructions (what I've done):
For patching, compiling and installing the required poppler library:
$ su
# apt-get install cdbs gnome-pkg-tools libgtk2.0-dev libqt3-mt-dev automake1.9 dh-make build-essential dpkg-dev libjpeg62-dev libz-dev fakeroot libxml2-dev
# exit
$ mkdir src.poppler
$ cd src.poppler
$ wget http://poppler.freedesktop.org/poppler-0.5.3.tar.gz
$ wget http://www.jauco.nl/SoC/poppler-pdf-style-0.3.patch
$ tar -xvzf poppler-0.5.3.tar.gz
$ cd poppler-0.5.3
$ patch -p1 < ../poppler-pdf-style-0.3.patch
$ ln -s /usr/include/libxml2/libxml poppler/
$ echo "s" | dh_make
$ sed -i "s/configure /configure --enable-zlib --enable-xpdf-headers /g" debian/rules
$ chmod a+x debian/rules
$ fakeroot debian/rules binary
This should have generated a .deb file that you can install, but it failed to compile, with the following error:
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I. -I.. -I../goo -I/usr/include/freetype2 -Wall -Wno-unused -g -O2 -MT ABWOutputDev.lo -MD -MP -MF .deps/ABWOutputDev.Tpo -c ABWOutputDev.cc -fPIC -DPIC -o .libs/ABWOutputDev.o
ABWOutputDev.cc: In member function `void ABWOutputDev::ATP_recursive(xmlNode*)':
ABWOutputDev.cc:804: error: declaration of `void ABWOutputDev::cleanUpNode(xmlNode*, bool)' outside of class is not definition
Once the poppler library compiles, the same has to be done with the Abiword sources... so there is quite a bit of work left to do.
BTW: Maybe this post should be in hacks/devel :-?
Last edited by Antartica; 09-06-2006 at 01:25 PM. |
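The sed step in the instructions above rewrites the configure call inside debian/rules so that the extra options are passed to it. The sketch below runs the same kind of substitution on a single sample line; the sample line and the function name `add_configure_flags` are assumptions for illustration, since the actual line generated by dh_make may look different.

```shell
#!/bin/sh
# Insert the two extra options right after the word "configure".
# The replacement ends with a space so that the options do not get
# glued to whatever argument followed "configure " in the original line.
add_configure_flags() {
  sed 's/configure /configure --enable-zlib --enable-xpdf-headers /g'
}

# Hypothetical stand-in for the configure line in a generated debian/rules.
printf './configure --prefix=/usr\n' | add_configure_flags
```

Running the substitution on a copy of debian/rules first, and comparing the result with diff, is a cheap way to check that it matched the intended line and nothing else.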
09-06-2006, 03:20 PM | #6 | |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
Quote:
Thanks, Antartica, for taking the time to explain. Unfortunately you lost me at the 2nd post... all that code is Chinese to me. So there seems to be some kind of error in the patch, as it will not compile. Hopefully it is not a big issue, because the idea of the plugin is wonderful and I'd really like to see it in action. |
|
09-06-2006, 03:23 PM | #7 |
Addict
Posts: 261
Karma: 156
Join Date: Jul 2006
Device: iliad
|
That sounds great.
Well, if someone manages to compile it with the patch for Windows, *please* upload the executable. I didn't manage. |
09-06-2006, 08:17 PM | #8 | |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
Quote:
I second that! |
|
11-06-2006, 04:42 AM | #9 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2006
Device: HTC Wizard
|
How to install the pdf to abiword processor
Hey, I didn't see this thread earlier, but I like the positive tone.
I'm the guy trying to write the PDF plugin. ATM, if you can't install the patch, you probably don't want to, because the program is buggy as some infernal place. The past 2 months were incredibly busy for me, so I didn't do much work on it, but once I get most of the bugs out of the code, I will try to get it released with the Windows version of Abiword. Greets, Jauco |
11-06-2006, 05:22 AM | #10 | ||
Evangelist
Posts: 423
Karma: 1517132
Join Date: Jun 2006
Location: Madrid, Spain
Device: quaderno, remarkable2, yotaphone2, prs950, iliad, onhandpc, newton
|
Hi Jauco!
Thanks for taking the time to register here and reply :-)
Quote:
I only need a bit of information:
1. The Linux distribution and version that you're using to compile
2. The compiler version
3. The libpoppler and Abiword versions
I hope that with that information I will be able to replicate your compilation success ;-)
Quote:
Antartica |
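Most of the details requested above can be collected with a few standard commands. This is only a sketch: the helper names are made up, reading /etc/os-release assumes a distribution that ships that file, and "poppler" as the pkg-config module name is an assumption about how the library is installed locally. The Abiword version is left out and is best read from the package manager or the source tarball name.

```shell
#!/bin/sh
# Print "label: first line of the command output", or "not found"
# when the command is not installed. $1 = label, the rest = command.
first_line_or_missing() {
  label=$1; shift
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: %s\n' "$label" "$("$@" 2>/dev/null | head -n 1)"
  else
    printf '%s: not found\n' "$label"
  fi
}

report_build_env() {
  # Distribution name, if the system provides /etc/os-release.
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release; printf 'distribution: %s\n' "${PRETTY_NAME:-unknown}" )
  else
    printf 'distribution: unknown\n'
  fi
  first_line_or_missing "kernel" uname -sr
  first_line_or_missing "compiler" g++ --version
  # Empty after the colon means pkg-config exists but knows no "poppler" module.
  first_line_or_missing "poppler" pkg-config --modversion poppler
}

report_build_env
```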
||
11-06-2006, 12:03 PM | #11 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2006
Device: HTC Wizard
|
I'm using a vanilla Ubuntu Linux "Dapper Drake".
Compiler: whichever came with Dapper Drake (4.0.3, I think)
Poppler source: CVS from back then. I'd suggest using the latest release.
Abiword source: doesn't matter. The latest release will be fine. |
|