08-07-2013, 04:05 AM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2013
Device: nook
|
Convert website with txt files to epub
I couldn't find anywhere how to convert a website that contains some .txt files to an epub, so here is my method. For a plain html website, all you need to do is save the site to disk with, say, WinHTTrack, then drag the index file into calibre. But this only works if all the links point to html files. It fails if some of the files are .txt, and I guess also if some are pdf files.
So here is the method:
1. Grab the site using WinHTTrack, setting the options to store the site's files in /web (flattening the structure).
2. Create a temp directory in the saved site directory.
3. Use this batch file (run from the saved site directory) to convert all the text files to html:
---------------------------------------------------------------
for /R web %%i IN (*.txt) DO (
"C:\Program Files\Calibre2\ebook-convert.exe" "web\%%~nxi" temp
ren temp\index*.html "%%~ni.html"
)
---------------------------------------------------------------
4. Edit the site index html file to change all the .txt links to .html and save it to the <saved-site>/temp directory.
5. Copy any other html files from <saved-site>/web to <saved-site>/temp.
6. Drag the index file from <saved-site>/temp into calibre.
How this works: when ebook-convert.exe is given a directory (i.e. <saved-site>/temp) as its output, it dumps the intermediate html output from the .txt conversion there and stops. So the first line in the loop converts each *.txt into html. The output is normally index.html but sometimes index1.html. The next line in the batch file renames that html output to the same name as the text file, but with an html extension. When the batch file finishes, calibre has done a default conversion of all the txt files to html files of the same name and stored them in the temp directory. It's then just a case of copying the other files and editing the site index file to point to .html rather than .txt, then dragging the index file into calibre.
|
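Editing the index by hand (step 4) gets tedious on a large site, so here is a rough Python sketch of how that step might be automated. It is not part of the original method: the function names and the web/index.html and temp/index.html paths are assumptions for illustration. It also rewrites every href ending in .txt, including links to external sites, so check the result before dragging it into calibre.

```python
import re
from pathlib import Path

# Matches an href value that ends in .txt (optionally followed by #fragment
# or ?query), with single or double quotes. Plain text mentioning ".txt"
# outside an href attribute is left alone.
TXT_LINK = re.compile(r'(href\s*=\s*["\'][^"\'#?]*)\.txt(?=["\'#?])', re.IGNORECASE)


def rewrite_txt_links(html: str) -> str:
    """Return the html with every href ending in .txt pointed at .html instead."""
    return TXT_LINK.sub(r"\1.html", html)


def rewrite_index(src: Path, dst: Path) -> None:
    """Read the index from src, rewrite its .txt links, and save it to dst."""
    # latin-1 maps every byte to one character, so the round trip leaves
    # any non-ASCII bytes in the page unchanged whatever its real encoding is.
    text = src.read_text(encoding="latin-1")
    dst.write_text(rewrite_txt_links(text), encoding="latin-1")


if __name__ == "__main__":
    # Hypothetical layout: run from the saved site directory.
    source = Path("web/index.html")
    if source.exists():
        Path("temp").mkdir(exist_ok=True)
        rewrite_index(source, Path("temp/index.html"))
```

For example, rewrite_txt_links('<a href="chapter1.txt">One</a>') gives '<a href="chapter1.html">One</a>', while links that already point to .html are left untouched.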
08-07-2013, 07:46 AM | #2 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
If you work with Sigil, you can start with a blank document, add in all the html documents, arrange them in the desired order, then copy and paste in the text documents from any text editor.
For the pdfs, you will need to convert them to html, or copy and paste them from a pdf viewer, using either its display text function or the normal display mode. This will likely leave extra spaces, breaks or carriage returns that will need to be cleaned up. If you are unlucky and the pdfs are image pdfs containing no text, you will have to process them with an Optical Character Recognition program, whose output will also need to be cleaned up. An error rate of only 2% means an error on every page. This procedure avoids calibre adding in its own hard-to-understand tags and css. |
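To put the 2% figure in perspective, here is a back-of-the-envelope calculation in Python. It assumes the error rate is per character and that a page holds about 1,800 characters; both numbers are assumptions for illustration, not from the post.

```python
def expected_errors_per_page(chars_per_page: int, error_rate: float) -> float:
    """Expected number of misread characters on one page of OCR output."""
    return chars_per_page * error_rate


if __name__ == "__main__":
    # Assumed values: ~1,800 characters per page, 2% per-character error rate.
    errors = expected_errors_per_page(1800, 0.02)
    print(f"about {errors:.0f} errors per page")
```

Under those assumptions, that is roughly 36 corrections on every page, which is why the OCR cleanup takes so long.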
08-08-2013, 06:49 AM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2013
Device: nook
|
Yes, but suppose you want to convert a website with a hundred text documents? Or more. Copy and paste gets pretty boring after the first 10.
|
08-08-2013, 09:20 AM | #4 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
Try Easy Text To HTML Converter (http://www.easyhtools.com/download.html). It's freeware and in my brief test, it worked ok. It will convert text files in bulk.
|