03-29-2011, 04:20 PM | #1 |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Having trouble converting html to markdown txt
I have a file that isn't in the best of shape and I want to clean it up a little bit. I'm starting with an html file and I want to get it into epub for editing but I want to start by cleaning up a lot of the sloppy formatting. I've learned that one of the best ways to do this is to convert it to markdown text and then to epub.
Unfortunately, this isn't working this time around and I'm not certain why. The only thing making it into the markdown file is anything marked with an <h#> tag. All other text formatting (bold, italics, etc.) is being lost and I can't quite figure out why. Also, when I add the html file, it shows up in Calibre as a zip file. I can't say I've ever noticed that before, so it may be normal behavior. It's probably something extremely simple that I'm overlooking in a fit of stupidity but, when I started to get frustrated, I decided to come here and ask for advice. Does anyone here have any ideas what may be going on or what I may be doing wrong? I'm using version 0.7.49. Thanks. - Byron |
03-29-2011, 04:29 PM | #2 |
creator of calibre
Posts: 44,530
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Conversion to txt does not support CSS. Convert your epub to MOBI and then to txt.
|
03-29-2011, 04:45 PM | #3 | |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Quote:
I have the latest version of Calibre installed at home. I'll try again when I get to my home desktop and see what happens. Also, I'm not really certain what you mean by "conversion to txt does not support CSS"? I've converted to markdown text from an epub before and I'm pretty sure it used css. If the markdown txt conversion engine isn't compatible with css, what type of text formatting is it compatible with? Thanks. - Byron |
|
03-29-2011, 04:48 PM | #4 |
creator of calibre
Posts: 44,530
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The point is that when converting something to txt, calibre does not handle formatting specified as css, only formatting intrinsic to html tags, so for example <i>italics</i> will work but <span style="font-style:italic">italics</span> will not.
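To see which kind of italics a file uses before converting, something like the following rough sketch may help. It is not calibre's code; the names ItalicAudit and audit_italics are made up for illustration, and counting <em> alongside <i> as a tag that carries its own formatting is an assumption. It only looks at inline style attributes, so italics defined through a class in a stylesheet will not be counted.

```python
from html.parser import HTMLParser


class ItalicAudit(HTMLParser):
    """Count italics expressed as tags versus inline CSS (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tag_italics = 0  # <i> or <em> elements
        self.css_italics = 0  # elements with style="font-style:italic"

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.tag_italics += 1
            return
        # Normalise the inline style so "font-style: italic" also matches.
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if "font-style:italic" in style:
            self.css_italics += 1


def audit_italics(html):
    """Return how many italic runs use tags and how many use inline CSS."""
    parser = ItalicAudit()
    parser.feed(html)
    parser.close()
    return {"tag": parser.tag_italics, "css": parser.css_italics}


if __name__ == "__main__":
    print(audit_italics("<i>italics</i>"))
    print(audit_italics('<span style="font-style:italic">italics</span>'))
```

A file whose "css" count is high and "tag" count is low is the kind that would likely lose its italics in a txt conversion.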
|
03-29-2011, 04:54 PM | #5 | |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Quote:
Unfortunately this is how this particular file is set up, and I know of no way to correct it short of going through the file manually, in which case I wouldn't need to do the markdown anyway because I'd correct the other sloppy formatting while I was in there. I'll attach a sample below. I'll keep chugging away. Thanks much for the info. - Byron
Code:
<p align=center style='text-align:center'><span class=italic1>IN TRIBUTE</span></p>
<p align=center style='text-align:center'><span class=italic1>Billy Wellington</span></p>
<p align=center style='text-align:center'><span class=italic1>(March 12, 1943–October 29, 2007)</span></p>
Last edited by bfollowell; 03-29-2011 at 04:59 PM. |
|
03-29-2011, 06:36 PM | #6 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
When you add an HTML file to Calibre, it walks through the file and gathers everything the file links to that is present as a local copy into a ZIP file.
|
03-29-2011, 07:10 PM | #7 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
If your document uses images or links, be sure to use the keep images and links options. Also, in the case of images I would highly recommend using Textile instead of Markdown. Currently, Markdown output has issues with writing image references in the output.
|
03-30-2011, 12:17 PM | #8 |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Well, this one was much easier than I thought it would be. I'll have to make sure to add this one to my really big book of cool techniques.
I did a basic conversion from html to epub. Luckily for me, every single occurrence of italics was set up using <span class="italic">. This was easy to change with search/replace: I replaced all of these occurrences with <i>, and all the corresponding </span> tags were automatically changed to </i>. From there I was able to do the txt markdown conversion and everything went without a hitch. Thanks guys. - Byron |
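For anyone who would rather script this than use an editor's search/replace, here is a minimal sketch of the same idea using only Python's standard library. It is not what was used in this thread; SpanToItalic and spans_to_italic are hypothetical names, and the default class names ("italic", "italic1") are assumptions taken from the posts above. It keeps track of open spans so that only the </span> belonging to a rewritten span becomes </i>, which a plain text replace cannot guarantee when spans are nested.

```python
from html.parser import HTMLParser


class SpanToItalic(HTMLParser):
    """Rewrite italic-class spans as <i>, passing other markup through."""

    def __init__(self, italic_classes):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.italic_classes = set(italic_classes)
        self.out = []
        self.span_stack = []  # True where the open span became <i>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            is_italic = any(c in self.italic_classes for c in classes)
            self.span_stack.append(is_italic)
            if is_italic:
                self.out.append("<i>")
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Close with </i> only if the matching open span was rewritten.
        if tag == "span" and self.span_stack and self.span_stack.pop():
            self.out.append("</i>")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def spans_to_italic(html, italic_classes=("italic", "italic1")):
    """Return html with italic-class spans replaced by <i>...</i>."""
    parser = SpanToItalic(italic_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = ("<p align=center style='text-align:center'>"
              "<span class=italic1>IN TRIBUTE</span></p>")
    print(spans_to_italic(sample))
```

Run it on a copy of the file first; it has only been thought through for simple markup like the sample posted earlier in the thread.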
|