08-27-2009, 12:37 AM | #1 |
Junior Member
Posts: 6
Karma: 32
Join Date: Sep 2008
Device: Sony PRS505
|
Extract html from epub
I got a little tired of manually extracting the html from epub files when I just wanted to read the book in a browser. Messing around with bash, I came up with a simple script to do the job. It's pretty crude, and I know I should have read the metadata.opf (and probably would have if I had done this in Java or Python), but I thought I would share it nonetheless. It works on Linux and might work on a Mac with a few tweaks. Just pass in the epub file as the first parameter. Code:
#!/bin/bash
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

bookname=$1
unzip "$1" -d /tmp/epub2html > /dev/null

# find the first content file (<prefix>_1.html) and derive the shared prefix
str0=`find /tmp/epub2html/content/* -regex '.*_1.html'`
let len=${#str0}-6
# skip the 23 characters of "/tmp/epub2html/content/", then drop the trailing "1.html"
substr=${str0:23:$len}
substr=${substr%1.html}

# append the numbered parts in order to <bookname>.html
files=`ls -l /tmp/epub2html/content/$substr*.html | wc -l`
for x in $(seq 0 $files); do
    filepart="/tmp/epub2html/content/$substr$x.html"
    if [ -e "$filepart" ]; then
        cat "$filepart" >> "${bookname//.epub/.html}"
    fi
done

#copy over the images if you want them
if [ ! -e resources ]; then
    mkdir resources
fi
# cp -t is a GNU coreutils option; errors for missing image types are ignored
cp /tmp/epub2html/content/resources/*.jpg /tmp/epub2html/content/resources/*.png -t ./resources 2> /dev/null
rm -R /tmp/epub2html
|
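For anyone who would rather read the metadata.opf than guess at file names, here is a rough sketch of that approach in bash. It is not the script above and not jellby's script. The function names `find_opf` and `spine_files` are made up for illustration, and the grep/sed parsing assumes unprefixed `<item>` and `<itemref>` tags with double-quoted attributes, which a real XML parser would not need to assume. It works on an epub that has already been unzipped into a directory.

```shell
#!/bin/bash
# Sketch only: list an unzipped epub's content files in reading order.
# find_opf and spine_files are hypothetical helper names.

# Print the OPF path recorded in META-INF/container.xml,
# relative to the directory the epub was unzipped into.
find_opf() {
    local root=$1
    grep -o 'full-path="[^"]*"' "$root/META-INF/container.xml" |
        head -n 1 | sed 's/^full-path="//; s/"$//'
}

# Print the href of each spine entry, one per line, in spine order.
# The hrefs are relative to the directory that holds the OPF file.
spine_files() {
    local opf=$1 tags idref href
    # Flatten whitespace, then break after every '>' so each line holds one tag.
    tags=$(tr '\r\n\t' '   ' < "$opf" | tr '>' '\n')
    printf '%s\n' "$tags" | grep '<itemref ' |
        grep -o ' idref="[^"]*"' | sed 's/^ idref="//; s/"$//' |
    while IFS= read -r idref; do
        # Look up the manifest <item> whose id matches this spine idref.
        href=$(printf '%s\n' "$tags" | grep '<item ' |
            grep -F " id=\"$idref\"" | grep -o ' href="[^"]*"' |
            head -n 1 | sed 's/^ href="//; s/"$//')
        if [ -n "$href" ]; then
            printf '%s\n' "$href"
        fi
    done
}

# Possible usage after unzipping to /tmp/epub2html:
#   opf=$(find_opf /tmp/epub2html)
#   spine_files "/tmp/epub2html/$opf"
```

The spine is what defines reading order in an epub, so concatenating the files in the order `spine_files` prints them should avoid depending on a `_1.html`, `_2.html` naming scheme.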
08-27-2009, 12:44 AM | #2 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Thanks!
You might also want to check out jellby's script here: https://www.mobileread.com/forums/showthread.php?t=51267 |
08-27-2009, 12:55 AM | #3 |
Junior Member
Posts: 6
Karma: 32
Join Date: Sep 2008
Device: Sony PRS505
|
You're right, I remember reading the post but saw the reference to JavaScript, so I didn't think much of it... On the plus side, I did learn some new things about bash scripting, so it wasn't a waste. Thanks |
08-27-2009, 05:21 AM | #4 |
frumious Bandersnatch
Posts: 7,534
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
The JavaScript is only used to provide a kind of "reader" in the browser and to override the epub's CSS. If you only want to extract the XHTML files, you don't need it.
|
|