Old 03-29-2011, 03:20 PM   #1
bfollowell
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
Having trouble converting html to markdown txt

I have a file that isn't in the best of shape and I want to clean it up a little bit. I'm starting with an html file that I want to get into epub for editing, but first I want to clean up a lot of the sloppy formatting. I've learned that one of the best ways to do this is to convert it to markdown text and then to epub.

Unfortunately, this isn't working this time around and I'm not certain why. The only thing making it into the markdown file is anything marked with an <h#> tag. All other text formatting (bold, italics, etc.) is being lost and I can't quite seem to figure out why.

Also, when I add the html file, it shows up in Calibre as a zip file. I can't say I've ever noticed that before so it may be normal behavior.

It's probably something extremely simple that I'm overlooking in a fit of stupidity but, when I started to get frustrated, I decided to come here and ask for advice. Does anyone here have any ideas what may be going on or what I may be doing wrong? I'm using version 0.7.49.

Thanks.

- Byron
Old 03-29-2011, 03:29 PM   #2
kovidgoyal
creator of calibre
Posts: 44,145
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
conversion to txt does not support CSS. Convert your epub to MOBI and then to txt.
Old 03-29-2011, 03:45 PM   #3
bfollowell
Quote:
Originally Posted by kovidgoyal View Post
conversion to txt does not support CSS. Convert your epub to MOBI and then to txt.
I tried but it made no difference. The mobi shows all the proper text formatting but it all gets lost upon converting to markdown text.

I have the latest version of Calibre installed at home. I'll try again when I get to my home desktop and see what happens.

Also, I'm not really certain what you mean by "conversion to txt does not support CSS"?

I've converted to markdown text from an epub before and I'm pretty sure it used css. If the markdown txt conversion engine isn't compatible with css, what type of text formatting is it compatible with?

Thanks.

- Byron
Old 03-29-2011, 03:48 PM   #4
kovidgoyal
The point is that when converting something to txt, calibre does not handle formatting specified as css, only formatting intrinsic to html tags, so for example <i>italics</i> will work but <span style="font-style:italic">italics</span> will not.
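To illustrate the distinction, here is a minimal Python sketch of a converter that only looks at tag names and never at CSS. It is not calibre's actual code; the `TAG_MARKERS` table, the `TagOnlyMarkdown` class, and the `to_markdown` helper are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical mapping from formatting tags to Markdown markers.
TAG_MARKERS = {"i": "*", "em": "*", "b": "**", "strong": "**"}


class TagOnlyMarkdown(HTMLParser):
    """Toy converter: honours formatting tags, ignores CSS entirely."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs (including style="..." and class="...") are never inspected,
        # so CSS-based formatting has no effect on the output.
        self.parts.append(TAG_MARKERS.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(TAG_MARKERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    parser = TagOnlyMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(to_markdown("<i>italics</i>"))                                  # *italics*
print(to_markdown('<span style="font-style:italic">italics</span>'))  # italics
```

The `<i>` version keeps its emphasis, while the `<span style=...>` version comes out as plain text because nothing in the converter ever reads the style.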
Old 03-29-2011, 03:54 PM   #5
bfollowell
Quote:
Originally Posted by kovidgoyal View Post
The point is that when converting something to txt, calibre does not handle formatting specified as css, only formatting intrinsic to html tags, so for example <i>italics</i> will work but <span style="font-style:italic">italics</span> will not.
Gotcha. I had a feeling that's what you meant but wanted to make certain.

Unfortunately this is how this particular file is set up, and I know of no way to correct this short of going through the file manually, in which case I wouldn't need to do the markdown anyway because I'd correct the other sloppy formatting while I was in there. I'll attach a sample below.

I'll keep chugging away. Thanks much for the info.

- Byron

Code:
<p align=center style='text-align:center'><span class=italic1>IN TRIBUTE</span></p>

<p align=center style='text-align:center'><span class=italic1>Billy Wellington</span></p>

<p align=center style='text-align:center'><span class=italic1>(March 12,
1943–October 29, 2007)</span></p>

Last edited by bfollowell; 03-29-2011 at 03:59 PM.
Old 03-29-2011, 05:36 PM   #6
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by bfollowell View Post
Also, when I add the html file, it shows up in Calibre as a zip file. I can't say I've ever noticed that before so it may be normal behavior.
When you add an HTML file to Calibre, it walks through the file and gathers everything the file links to that is present as a local copy into a ZIP file.
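As a rough picture of what that gathering step involves, here is a small Python sketch that bundles an HTML file with the local files it references. It is only an approximation written for this thread, not calibre's implementation: it looks one level deep at `href` and `src` attributes, and `LinkCollector` and `bundle_html` are hypothetical names.

```python
import zipfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def bundle_html(html_path, zip_path):
    """Zip an HTML file together with the local files it links to.

    Returns the sorted list of archive member names.
    """
    html_path = Path(html_path)
    collector = LinkCollector()
    collector.feed(html_path.read_text(encoding="utf-8", errors="replace"))

    members = {html_path.name: html_path}
    for link in collector.links:
        parsed = urlparse(link)
        # Skip remote URLs, bare #fragments, and absolute paths.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        if parsed.path.startswith("/"):
            continue
        relative = unquote(parsed.path)
        target = html_path.parent / relative
        if target.is_file():  # only links with a local copy are bundled
            members[Path(relative).as_posix()] = target

    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, source in members.items():
            archive.write(source, name)
    return sorted(members)
```

Calling `bundle_html("book.html", "book.zip")` would produce a ZIP holding `book.html` plus whichever linked images or stylesheets exist next to it, which is why the added book shows up with a ZIP format.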
Old 03-29-2011, 06:10 PM   #7
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
If your document uses images or links, be sure to use the keep images and links options. Also, in the case of images I would highly recommend using Textile instead of Markdown. Currently, Markdown output has issues with writing image references.
Old 03-30-2011, 11:17 AM   #8
bfollowell
Well, this one was much easier than I thought it would be. I'll have to make sure to add this one to my really big book of cool techniques.

I did a basic conversion from html to epub. Luckily for me, every single occurrence of italics was set up using <span class="italic">. It was easy to use search/replace to change all of these occurrences to <i>. All the corresponding </span> tags were automatically changed to </i>. From there I was able to do the txt markdown conversion and everything went off without a hitch.
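For anyone who would rather script the same replacement, here is a rough Python sketch using a regular expression instead of an editor's search/replace. The function name `spans_to_italics` is made up for this example, and it assumes the italic spans never contain nested <span> elements; if they do, the first inner </span> would be matched instead and the result would be wrong.

```python
import re


def spans_to_italics(html, class_name="italic"):
    """Replace <span class=NAME>...</span> with <i>...</i>.

    Handles quoted and unquoted class attributes. Assumes the matched
    spans carry no other attributes and contain no nested <span>.
    """
    pattern = re.compile(
        r'<span\s+class=(["\']?)' + re.escape(class_name) + r'\1\s*>(.*?)</span>',
        re.DOTALL | re.IGNORECASE,
    )
    return pattern.sub(r"<i>\2</i>", html)


sample = "<p align=center style='text-align:center'><span class=italic1>IN TRIBUTE</span></p>"
print(spans_to_italics(sample, "italic1"))
# <p align=center style='text-align:center'><i>IN TRIBUTE</i></p>
```

The class name is a parameter because the sample posted earlier uses class=italic1 without quotes; spans with any other class are left alone.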

Thanks guys.

- Byron