09-25-2010, 08:20 PM | #1 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
recipe to pull web page similar to 'print/save as pdf'
Let me apologize right up front for my lack of savvy in HTML, calibre, or programming of any sort. I've been creating eBooks from instructional web sites for my Kindle DX. The web sites are typically set up like book chapters: from the TOC you select a 'chapter', and you click 'next/back' to navigate within each chapter. I go page by page and do File/Print/Save as PDF. Then I open the PDFs in Acrobat Pro to customize the metadata. Then I send them to Amazon for conversion, but as they often have scientific notation, figures, etc., they don't convert well, so I end up syncing over USB and dragging the PDF file from my Mac to the Kindle. Then I repeat that for every page...
I just discovered Calibre, and thought I'd found salvation. While all the articles are focused on news feeds, I thought it should be simple enough to create a custom 'news source' and use the URL of each web page I want as the 'feed'. Wrong. All I end up with are strings of HTML code. Am I trying to do something that can't be done, or is it not as simple as just entering a web page's URL into the 'feed' field? Any help would be appreciated. |
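A calibre recipe does not have to read RSS feeds: a `BasicNewsRecipe` subclass can override `parse_index()` to return a fixed list of pages instead. The sketch below only builds the data structure that method is documented to return (a list of `(section title, list of article dicts)` tuples), so it runs without calibre installed. The book title, chapter titles, and URLs are hypothetical placeholders; in a real recipe the list would be returned from `parse_index(self)`.

```python
# Sketch: the index structure a calibre recipe's parse_index() returns
# when it lists pages directly instead of reading RSS feeds.
# Assumption: all titles and URLs below are hypothetical placeholders.


def build_index(book_title, chapters):
    """Turn (title, url) pairs into parse_index()'s format:
    a list of (section_title, [article_dict, ...]) tuples."""
    articles = [
        {"title": title, "url": url, "description": "", "date": ""}
        for title, url in chapters
    ]
    return [(book_title, articles)]


chapters = [
    ("Chapter 1", "http://www.example.com/book/ch1.html"),
    ("Chapter 2", "http://www.example.com/book/ch2.html"),
]
index = build_index("My instructional site", chapters)
print(index[0][0], len(index[0][1]))
```

Each page then becomes one "article" in the generated book, in the order listed.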
09-25-2010, 09:10 PM | #2 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Can it be done? Probably, but you'd need to custom-write it to do what you need. You could look at some of the multipage recipes. You might also consider web scrapers like wget, web2disk, WinHTTrack, etc. |
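The 'next/back' navigation described in the first post is what a multipage recipe has to walk. The sketch below is plain standard-library Python, not calibre's recipe API: it assumes the navigation link's visible text is literally "next", and it takes a hypothetical `fetch(url)` function so it can be tried offline against a fake two-page site.

```python
# Sketch: follow "next" links through one chapter, as a multipage recipe
# might. Assumptions (hypothetical): the link text is "next", and fetch()
# is any function that maps a URL to that page's HTML.
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose text is 'next'."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open
        self._text = []     # text collected inside that <a>
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip().lower()
            if self.next_href is None and text == "next":
                self.next_href = self._href
            self._href = None


def chapter_pages(start_url, fetch, max_pages=100):
    """Return the URLs of one chapter, stopping at the last page or a loop."""
    pages, url = [], start_url
    while url and url not in pages and len(pages) < max_pages:
        pages.append(url)
        finder = NextLinkFinder()
        finder.feed(fetch(url))
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return pages


# Offline demo: a dict stands in for real HTTP requests.
site = {
    "http://www.example.com/ch1/p1.html": '<p>One</p><a href="p2.html">next</a>',
    "http://www.example.com/ch1/p2.html": '<p>Two</p><a href="p1.html">back</a>',
}
print(chapter_pages("http://www.example.com/ch1/p1.html", site.__getitem__))
```

The `url not in pages` check guards against sites whose last page links back to the first.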
|
09-25-2010, 11:32 PM | #3 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
Thanks for the tips, Starson17, and for confirming that there's more to this than I suspected. I suppose I'll keep plugging along with PDFs as before while trying to 'go to school' on this subject, although even calibre's basic getting-started tutorials are over my head right now.
|
09-26-2010, 03:00 AM | #4 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
no luck bookit or instapaper; html source worked, but no images
I tried using Bookit to convert web pages to mobi, but ran into the same brick-wall error message others have noted. Then I tried Instapaper, as Calibre has a recipe for 'read later' web pages, but it didn't preserve any of the web page formatting or images. So then I tried viewing the page source of the web page I wanted to convert, saved it as a file, added it as a book to my Calibre library, and did a mobi conversion. It worked almost perfectly and preserved all the formatting, but the fatal flaw was that it just had boxes with '?' icons where the images should be. It was not pulling in the embedded images, e.g. '<img src="redliq.gif" width="251" height="110" /></p><pre>': if you click the gif link in the source it brings up the image, but the image isn't making it into the eBook.
If anyone has any suggestions it would be most welcome. |
09-26-2010, 09:42 AM | #5 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
09-26-2010, 01:16 PM | #6 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
almost with iMacros, but unwanted image on top of converted eBook page
Duh, I was saving the HTML file to my computer and converting that file to mobi, but of course the image paths were relative, so they now pointed to the folder where the file sat on my computer, while the actual images were on the web site's server.
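The relative `src` quoted in post #4 (`redliq.gif`) shows why: a relative path is resolved against whatever location the HTML was loaded from. A minimal illustration with Python's `urllib.parse.urljoin`; both base locations are hypothetical examples, since the real page URL is truncated in the thread.

```python
# Sketch: why a saved page loses its images. A relative src is resolved
# against the location the HTML is loaded from.
# Assumption: both base locations below are hypothetical.
from urllib.parse import urljoin

src = "redliq.gif"  # the relative src quoted in post #4

on_server = urljoin("http://www.example.com/analysis/page.html", src)
saved_copy = urljoin("file:///Users/me/Desktop/page.html", src)

print(on_server)   # where the browser finds the image on the web site
print(saved_copy)  # where the saved copy looks; empty unless you saved it there
```

This is why saving the images into the same folder as the HTML, as described next, makes them show up again.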
So I spent the night crawling through the website, going to 'Page Info/Media' for every page, selecting every img, and saving the 2,000 collected .gifs to the same folder the HTML files were in. Now Calibre gave me a complete mobi with all the images, with one flaw: it plops one of the images at the top of every page. But I was more concerned with not having to repeat what I'd just done for every site, so after much searching I found a wonderful FF web-scraper plug-in, iMacros ( https://addons.mozilla.org/en-US/firefox/addon/3863/ ), that will save all the web files, HTML and images. This is an enormous time saver, but I still get the unwanted image at the top of every converted mobi. Any ideas, short of learning to use Sigil and editing them as ePubs (which ain't gonna happen)? Here's an example of one of the web pages I'm trying to convert: http://www.chemguide.co.uk/analysis/...ation.html#top In any zip file I convert to mobi in Calibre, there will be an image at position 1.0 of the eBook. |
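The manual step above (collecting every `img` a page uses and saving it next to the HTML) can also be scripted. This is a stdlib Python sketch of the same idea, not how iMacros works internally; the page URL and HTML are hypothetical, and only plain relative paths such as `redliq.gif` or `pics/a.gif` are downloaded.

```python
# Sketch: list every <img> a saved page needs and optionally download the
# images next to the HTML, the step done by hand above.
# Assumptions: the page URL and HTML are hypothetical; only plain relative
# srcs are downloaded.
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


class ImageCollector(HTMLParser):
    """Gather the src attribute of each <img>, without duplicates."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src not in self.srcs:
                self.srcs.append(src)


def image_urls(page_url, html):
    """Return (src as written, absolute URL on the server) pairs."""
    collector = ImageCollector()
    collector.feed(html)
    return [(src, urljoin(page_url, src)) for src in collector.srcs]


def save_images(page_url, html, folder):
    """Download each relative image into folder (needs network access)."""
    for src, url in image_urls(page_url, html):
        parts = src.split("/")
        if urlparse(src).scheme or src.startswith("/") or ".." in parts:
            continue  # skip absolute URLs and paths outside the folder
        target = os.path.join(folder, *parts)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        urlretrieve(url, target)


page = "http://www.example.com/analysis/page.html"
html = '<p><img src="redliq.gif" width="251" height="110" /></p><img src="pics/b.gif">'
for src, url in image_urls(page, html):
    print(src, "->", url)
```

Keeping each image's relative path inside the folder means the saved HTML needs no editing for the images to resolve.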
09-26-2010, 02:23 PM | #7 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
09-26-2010, 04:05 PM | #8 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
Thanks for the assistance. I tried it following your protocol, but still get one of the page images at position 1.0, above the 'Electromagnetic Radiation' page heading where the book should actually start. The image still shows correctly where it's supposed to as well. It's almost as if the image is being inserted at the beginning as a book cover, although to the right it shows the generic book image. I don't know what I'm doing differently than you. I'm using a PPC Mac w/ OS 10.5.8, FF 3.6.10, and Calibre 0.7.20.
|
09-26-2010, 04:11 PM | #9 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
I don't know if this is related, but when I quit calibre I get this error message:
ERROR: ERROR: Unhandled exception: IOError:[Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle'
Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 101, in main
IOError: [Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle' |
09-26-2010, 04:14 PM | #10 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Are you looking at the saved html, an epub or some other converted format? Perhaps you want to ask up in the main forum, as this isn't really a recipe issue and you may find more focused help there.
|
09-26-2010, 04:36 PM | #11 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
I save the FF page and drag the HTML file to calibre; at this point it's a zip, and I haven't opened anything yet. I then convert to mobi, and it's then that I view the converted file and there's an image at position 1.0 where the content should actually begin. I'm happy to take this to another forum, but before that I'd like to try and understand why your conversion of the same web page renders correctly, without this stray image, and mine does not.
|
09-27-2010, 09:15 AM | #12 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
09-27-2010, 02:30 PM | #13 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
MOBI doesn't support floating images, so calibre puts them where they appear in the source document markup.
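Following that explanation, a floated image that sits before the page heading in the markup would land above the heading in MOBI output. A small stdlib sketch to spot such images before converting; which heading tags count, and the sample HTML (`logo.gif` in particular), are hypothetical assumptions for illustration.

```python
# Sketch: find images that come before the first heading in the markup.
# Since MOBI places images in markup order, these are candidates for the
# stray image at the top of the book.
# Assumptions: the heading tags checked and the sample HTML are illustrative.
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3"}


class EarlyImageFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.seen_heading = False
        self.early_images = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.seen_heading = True
        elif tag == "img" and not self.seen_heading:
            self.early_images.append(dict(attrs).get("src"))


def images_before_first_heading(html):
    finder = EarlyImageFinder()
    finder.feed(html)
    return finder.early_images


sample = ('<img src="logo.gif" style="float: right">'
          '<h1>Electromagnetic Radiation</h1>'
          '<p><img src="redliq.gif"></p>')
print(images_before_first_heading(sample))
```

An empty result suggests the stray image has some other cause than markup order.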
|
09-27-2010, 02:43 PM | #14 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Perhaps you should just edit the epub before conversion? |
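Sigil is the usual EPUB editor, but because an EPUB is a ZIP archive of (X)HTML files, a repetitive edit can also be scripted. The sketch below copies an EPUB while passing each content file through an `edit()` function of your choosing; it keeps the `mimetype` entry first and uncompressed, as the EPUB container format requires. It assumes the content files are UTF-8, and the demo uses a tiny in-memory stand-in rather than a complete EPUB.

```python
# Sketch: apply a scripted text edit to every (X)HTML file inside an EPUB.
# An EPUB is a ZIP archive whose "mimetype" entry must come first and be
# stored uncompressed. Assumptions: content files are UTF-8, and edit() is
# whatever (hypothetical) change you need.
import io
import zipfile

HTML_SUFFIXES = (".html", ".htm", ".xhtml")


def edit_epub(src, dst, edit):
    """Copy EPUB src to dst, passing each (X)HTML file's text through edit().
    src and dst may be file paths or binary file objects."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            data = zin.read(name)
            if name.lower().endswith(HTML_SUFFIXES):
                data = edit(data.decode("utf-8")).encode("utf-8")
            zout.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Demo on a tiny in-memory stand-in for an EPUB: drop one stray image tag.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    z.writestr("OEBPS/page.xhtml", '<img src="logo.gif"/><h1>Title</h1>')

fixed = io.BytesIO()
edit_epub(original, fixed, lambda text: text.replace('<img src="logo.gif"/>', ""))
with zipfile.ZipFile(fixed) as z:
    print(z.namelist(), z.read("OEBPS/page.xhtml"))
```

A plain string replace is the simplest possible edit; anything that changes structure more than that is where a real editor like Sigil earns its keep.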
|
09-28-2010, 10:21 PM | #15 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
|
edit epub before conversion
I think editing the epub before conversion sounds like the best approach for this. Does that require learning Sigil, or is there a simpler, more basic editor for such minor edits that would be approachable for a newbie? And do I convert from zip to epub first, edit, then convert to mobi? If you can advise on the tools and basic approach I need, I can take further questions to another forum. I appreciate all your help. Thanks.
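The zip, then EPUB, then edit, then MOBI order asked about here can be sketched with calibre's `ebook-convert` command-line tool, which picks the output format from the output file's extension. This sketch uses only the basic `ebook-convert input output` form with no extra options; the file names and the `edit_epub_step` callback are hypothetical, and it assumes `ebook-convert` accepts the same zip the GUI converted and that calibre's command-line tools are on your PATH.

```python
# Sketch of zip -> EPUB -> (edit) -> MOBI with calibre's ebook-convert.
# Assumptions: file names and edit_epub_step are hypothetical; calibre's
# command-line tools must be installed and on PATH for zip_to_mobi to run.
import subprocess


def ebook_convert_cmd(src, dst):
    """Command line for calibre's converter; the output format follows dst's extension."""
    return ["ebook-convert", src, dst]


def zip_to_mobi(source_zip, stem, edit_epub_step):
    """Convert zip -> EPUB, let edit_epub_step(epub_path) fix it, then EPUB -> MOBI."""
    epub, mobi = stem + ".epub", stem + ".mobi"
    subprocess.run(ebook_convert_cmd(source_zip, epub), check=True)
    edit_epub_step(epub)  # e.g. open the EPUB in Sigil, or run a script on it
    subprocess.run(ebook_convert_cmd(epub, mobi), check=True)
    return mobi
```

Usage would look like `zip_to_mobi("page.zip", "page", lambda path: input("Edit " + path + " in Sigil, then press Enter"))`, which pauses between the two conversions so the EPUB can be fixed by hand.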
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
would like a recipe to pull down a free online book | N13L5 | Recipes | 17 | 10-09-2010 11:38 AM |
Financial Times / FT - help creating a UK print edition recipe | ndeb123 | Recipes | 1 | 09-29-2010 11:55 AM |
Recipe - save some date for later retrieval | mh445 | Calibre | 3 | 07-19-2010 05:06 PM |
Anyway to save a web page as an RTF? | Fugubot | Sony Reader | 16 | 02-06-2007 01:23 PM |
Print magazines are better when they emulate the web | Bob Russell | News | 0 | 05-18-2006 06:53 PM |