11-03-2011, 06:34 PM | #16 |
Member
Posts: 24
Karma: 12
Join Date: Oct 2011
Device: Xperia Active, Iconia A500, Galaxy I5500
|
Download e-books in any format from a website
This builds on "Recipe to download an EPUB from feed" by Starson17.
You can use it to download all the e-books a news website offers, in every format you like (EPUB, PDF, MOBI, ...). To see how it works, first take a look at Starson17's post: his trick is needed to fool the recipe process into having an EPUB to work on. In addition, this recipe looks for links to other e-book formats, downloads them to a common temporary directory, and then makes the system call "calibredb add -1 dir", so that all the formats are added to the calibre database as a single logical book. If there are several logical books to download, you'll need to create a directory and make a system call for each one (or drop the -1 option if there is only one format per book). Note: I have tested this on Linux and it works fine; on other operating systems the system call may need some tweaking.
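A minimal sketch of the two extra steps described above (collecting the links to the other formats, then handing one directory per book to calibredb), written as plain Python 3 functions rather than as recipe methods. The names `WANTED_FORMATS`, `ebook_links`, `calibredb_add_command` and `download_and_add` are hypothetical, and the list of hrefs is assumed to have been scraped from the page already (in a recipe, for example from the page's `<a>` tags).

```python
import os
import subprocess
import tempfile
import urllib.request
from urllib.parse import urljoin, urlparse

# Hypothetical list of formats to collect; str.endswith() accepts a tuple.
WANTED_FORMATS = ('.epub', '.pdf', '.mobi')


def ebook_links(page_url, hrefs, formats=WANTED_FORMATS):
    """Return absolute URLs for the hrefs that point at e-book files."""
    found = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        # Compare only the path, so a query string like "?dl=1" doesn't hide the extension
        if urlparse(absolute).path.lower().endswith(formats):
            found.append(absolute)
    return found


def calibredb_add_command(book_dir):
    """Build the system call; -1 treats every file in book_dir as one logical book."""
    return ['calibredb', 'add', '-1', book_dir]


def download_and_add(urls):
    """Download every format of ONE book into a fresh temp dir, then add it to calibre."""
    book_dir = tempfile.mkdtemp(prefix='recipe_book_')
    for url in urls:
        filename = os.path.basename(urlparse(url).path)
        with urllib.request.urlopen(url) as resp, \
                open(os.path.join(book_dir, filename), 'wb') as out:
            out.write(resp.read())
    # Returns calibredb's exit status (0 on success)
    return subprocess.call(calibredb_add_command(book_dir))
```

Usage would be along the lines of `download_and_add(ebook_links(page_url, hrefs))`, once per logical book. Using a fresh directory per book is what keeps `-1` safe, since every file in that directory is treated as a format of the same book. Depending on your calibre version, calibredb may refuse to modify a library that the calibre GUI has open, so it is worth testing this with the GUI closed first.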
|
11-09-2011, 11:01 AM | #17 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Multiple Page Sites
This is not my code, but there have been many requests for code to handle sites where each article is split into multiple pages, with a "next page" button at the bottom of each page. Here is typical code for this situation, from Darko Miletic's built-in recipe for Adventure Gamers:
You may want to look at the source of an article at Adventure Gamers with Firebug or an equivalent tool. The append_page code identifies the "next page" button, follows the link it points to ("nexturl"), finds the article text on that page, inserts that text into the first page beneath the first page's article text, and recursively repeats the process until it reaches the last page (the one without a "next page" button). append_page is then called from preprocess_html.
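The recursion at the heart of append_page can be sketched without calibre or BeautifulSoup. In this hypothetical version, `fetch(url)` stands in for `index_to_soup()` plus the site-specific parsing: it returns the page's article text together with the "next page" URL, or None on the last page. The `seen` set is not in the original; it guards against sites whose last page links back to an earlier one.

```python
def collect_pages(fetch, url, seen=None):
    """Return the article text of url and of every following page, in order.

    fetch(url) -> (article_text, next_url_or_None) is a hypothetical stand-in
    for index_to_soup() plus finding the article body and the "next page" link.
    """
    seen = set() if seen is None else seen
    seen.add(url)
    text, next_url = fetch(url)
    if next_url is None or next_url in seen:
        # Last page (no "next page" button) or a link loop: stop recursing
        return [text]
    return [text] + collect_pages(fetch, next_url, seen)


pages = {'p1': ('one', 'p2'), 'p2': ('two', 'p3'), 'p3': ('three', None)}
print(collect_pages(pages.__getitem__, 'p1'))  # ['one', 'two', 'three']
```

In a real recipe, each collected chunk would then be inserted beneath the first page's article in preprocess_html, which is what the Adventure Gamers recipe does directly on the soup.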
|
11-21-2011, 09:56 PM | #18 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
Masthead logos and cover pages images for Kindle Fire
Kindle Fire treats masthead logos differently than its e-ink cousins do, and they end up looking worse than on e-ink readers. The Fire automatically scales the logos and color-inverts them (so black becomes white, red becomes turquoise, etc.). The logo is displayed on an almost-black background (it's actually a slight gradient).
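Assuming the Fire performs a simple per-channel inversion (each channel value c becomes 255 - c), which is consistent with black turning white and red turning turquoise, the arithmetic looks like this. `invert` is a hypothetical helper, not part of calibre. Because inversion is its own inverse, drawing a logo in `invert(wanted_color)` makes it display as `wanted_color`, and a 211/211/211 background comes out as 44/44/44, a near-black.

```python
def invert(rgb):
    """Per-channel colour inversion of an (R, G, B) triple with 0-255 channels."""
    return tuple(255 - c for c in rgb)


RED = (255, 0, 0)
# Colour to draw in so that the Fire shows red after it inverts the logo
draw_as = invert(RED)
print(draw_as)                  # (0, 255, 255): cyan, close to "turquoise"
print(invert(draw_as))          # (255, 0, 0): what the reader sees
print(invert((211, 211, 211)))  # (44, 44, 44): the near-black background
```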
The Fire also displays the publication's front page on the Newsstand bookshelf, which encouraged me to look for a source of these front-page images instead of using the default calibre cover image. The following code fragments can be inserted into your custom recipe to use a custom masthead logo and a front-page image (if one is available).
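A minimal sketch of the "if one is available" part, assuming you have one or more candidate URLs for the current front-page image. In calibre recipes the logo is set with the `masthead_url` class attribute and the cover with the `get_cover_url()` method; returning None from `get_cover_url()` should leave calibre to generate its default cover. `pick_cover` and `url_exists` are hypothetical helpers, and the HEAD request assumes the image server answers HEAD requests.

```python
import urllib.request


def url_exists(url, timeout=10):
    """True if a HEAD request for url answers 200; any error counts as missing."""
    request = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        return False


def pick_cover(candidates, exists=url_exists):
    """Return the first candidate URL that exists, or None for calibre's default cover."""
    for url in candidates:
        if exists(url):
            return url
    return None


# Inside the recipe class this might look like (URLs are placeholders):
#     masthead_url = 'http://example.com/fire_logo.png'
#     def get_cover_url(self):
#         return pick_cover(['http://example.com/frontpage.jpg'])
```

Passing the existence check in as a parameter keeps the selection logic testable without network access.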
Note that when you design a masthead logo, you should plan for it to be color-inverted (so if you want the original colors, provide the color-inverted version as the logo). The background should be R/G/B 211/211/211; after inversion it will blend with the Fire background and appear transparent. If you are really picky, you can make the background nearly perfect with a linear gradient (top to bottom) from 211/211/211 to 214/214/214. The logo's size isn't all that important since the Fire will scale it, but logos at least 250 pixels wide will look better than smaller ones, since upscaling doesn't work as well as reduction. I have attached 4 Fire-friendly logos in a ZIP file (NY Times, Wall Street Journal, Globe and Mail, National Post). Last edited by Starson17; 11-22-2011 at 09:42 AM. |
01-10-2012, 02:21 AM | #19 | |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Quote:
PHP Code:
|
|
01-19-2012, 03:35 PM | #20 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Some sites require login information to be submitted twice. Below is an example that worked with MWJournal. It submits the credentials first, saves the resulting page to the system temp location, then opens that page again and submits it. In this case the second page had no form fields to fill in, so it is simply submitted. Other sites may need more fields filled in; for those, follow the normal procedure to fill in and submit the form.
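A stdlib-only sketch of the second step, assuming the intermediate page carries a form whose fields are all pre-filled (typically hidden inputs): parse that form and submit it unchanged. `FormParser` and `resubmit_request` are hypothetical names; inside a recipe's `get_browser()`, the mechanize browser does the same job with `br.select_form(nr=0)` followed by `br.submit()`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect the action URL and the named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and self.action is None:
            # A missing action means "submit back to the same page"
            self.action = attrs.get('action') or ''
            self._in_form = True
        elif tag == 'input' and self._in_form and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self._in_form = False


def resubmit_request(page_url, html):
    """Return (url, body) for submitting the intermediate page's form as-is."""
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError('no <form> found on the intermediate page')
    body = urlencode(parser.fields).encode('ascii')
    return urljoin(page_url, parser.action), body
```

Posting `body` to the returned URL with the same cookie-carrying opener used for the first login (for example one built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`) completes the second submit.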
|
02-10-2012, 02:02 PM | #21 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Embed images into an ebook
Some sites don't include figures/images in their articles; instead, the reader has to click a link (an href) to see the image. That isn't possible on many e-book readers. To embed the images in the output e-book, the tag needs to be changed from <a> to <img>, and its "href" attribute needs to be changed to "src". The following code does the job by looking for all links to jpg files and changing them to <img> tags. The code should be included in preprocess_html.
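The post does this in preprocess_html on calibre's BeautifulSoup tree. The same transformation can be sketched with the standard library's ElementTree on well-formed XHTML: find each `<a>` whose href ends in .jpg or .jpeg, rename it to `img`, and move href to src. `links_to_images` is a hypothetical name, and keeping the link text as `alt` is an extra touch not in the original. On a soup, the equivalent operations are setting `tag.name = 'img'` and assigning `tag['src']`.

```python
import re
import xml.etree.ElementTree as ET

# Links whose path ends in .jpg or .jpeg (case-insensitive); extend as needed
IMAGE_LINK = re.compile(r'\.jpe?g$', re.IGNORECASE)


def links_to_images(root):
    """Turn each <a href="...jpg"> under root into <img src="...jpg"> in place.

    Returns the number of links converted. The link text becomes the alt text.
    """
    converted = 0
    # Snapshot the matches first, because elements are renamed inside the loop
    for link in list(root.iter('a')):
        href = link.get('href', '')
        if not IMAGE_LINK.search(href):
            continue
        alt = ''.join(link.itertext()).strip()
        link.tag = 'img'
        del link.attrib['href']
        link.set('src', href)
        if alt:
            link.set('alt', alt)
        # <img> is an empty element: drop the old link text and children
        link.text = None
        for child in list(link):
            link.remove(child)
        converted += 1
    return converted


root = ET.fromstring('<p>See <a href="fig1.jpg">Figure 1</a> and <a href="more.html">more</a>.</p>')
print(links_to_images(root))  # 1
```

The text after each link (its `tail`) is untouched, so the surrounding sentence survives the conversion.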
|
06-13-2012, 07:55 PM | #22 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
How to search for a specific part of a tag attribute (for example, one class name among several):
Code:
import re
dict(attrs={'someattribute': re.compile(r'(^|\s)somestring($|\s)')})
For example, to remove every tag whose class list contains Sample:
Code:
remove_tags = [
    dict(attrs={'class': re.compile(r'(^|\s)Sample($|\s)')}),
]
|
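A quick check of the anchored pattern, outside calibre. With `(^|\s)` and `($|\s)` the pattern only matches Sample as a whole word in the class list, whereas a version with empty alternatives, such as `(^|| )`, matches anywhere, including inside "Samples". `class_matches` is a hypothetical helper that mimics how BeautifulSoup applies a compiled pattern with `.search()`.

```python
import re

# (^|\s) and ($|\s) make "Sample" match only as a whole class name.
SAMPLE_CLASS = re.compile(r'(^|\s)Sample($|\s)')


def class_matches(pattern, class_attr):
    """True if the (possibly missing) class attribute matches pattern via .search()."""
    return class_attr is not None and pattern.search(class_attr) is not None


print(class_matches(SAMPLE_CLASS, 'box Sample wide'))  # True
print(class_matches(SAMPLE_CLASS, 'Samples'))          # False
# With empty alternatives, the "anchors" can match nothing, so this hits too:
print(re.compile('(^|| )Sample($|| )').search('Samples') is not None)  # True
```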
06-14-2012, 12:38 AM | #23 |
creator of calibre
Posts: 44,679
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@kiklop74: An easier way would be:
Code:
remove_tags = [ dict(attrs={'class':lambda x: x and 'Sample' in x.split()}), ] |
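Kovid's lambda generalizes to a small factory, so one helper serves any class name. `has_class` is a hypothetical name. The `x and ...` guard in the original matters because BeautifulSoup may call the matcher with None for tags that have no class attribute, and `str.split()` with no arguments also copes with tabs and repeated spaces.

```python
def has_class(name):
    """Build a BeautifulSoup-style attribute matcher for one class name."""
    # bool() so tags with no class attribute (None) give a clean False
    return lambda x: bool(x) and name in x.split()


remove_tags = [
    dict(attrs={'class': has_class('Sample')}),
    dict(attrs={'class': has_class('advert')}),  # hypothetical second class
]

matcher = has_class('Sample')
print(matcher('box  Sample\twide'))  # True
print(matcher('Samples'))            # False
print(matcher(None))                 # False
```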
12-09-2012, 02:23 PM | #24 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Sometimes a site is badly implemented or overloaded, so the first fetch of an article fails but a second or third attempt succeeds. To add retries to a calibre recipe you can use this approach:
Code:
# In the imports section, add:
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe

# Later, in the recipe class, add:
class MyRecipeClass(BasicNewsRecipe):
    # ...
    temp_files = []
    articles_are_obfuscated = True

    # ...and then, somewhere in the class, add this method
    def get_obfuscated_article(self, url):
        attempts = 4
        html = None
        for attempt in range(attempts):
            try:
                response = self.browser.open(url)
                html = response.read()
                break  # success: stop retrying
            except Exception:
                self.log.warn('Retrying download of %s (attempt %d failed)' % (url, attempt + 1))
        if html is None:
            # Every attempt failed; let calibre mark this article as failed
            raise Exception('Failed to download %s after %d attempts' % (url, attempts))
        tfile = PersistentTemporaryFile('_fa.html')
        tfile.write(html)
        tfile.close()
        self.temp_files.append(tfile)
        return tfile.name
|
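The same retry logic, pulled out into a stand-alone function so it can be tried without calibre. `fetch_with_retries` is a hypothetical helper, and `fetch` is any callable that raises on failure (in a recipe, something like `lambda u: self.browser.open(u).read()`). The growing pause between attempts is not in the original: it gives an overloaded server a moment to recover instead of hitting it again immediately.

```python
import time


def fetch_with_retries(fetch, url, attempts=4, delay=2.0):
    """Return fetch(url), trying up to `attempts` times in total.

    After failed attempt n (counting from 1), waits delay * n seconds
    before trying again. Raises RuntimeError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise RuntimeError('giving up on %s after %d attempts' % (url, attempts)) from last_error
```

Raising at the end, rather than returning None, means a caller cannot accidentally write an empty result to the temporary file.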
02-12-2013, 08:16 AM | #25 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
If you would like to add series support to some of your recipes, this is what needs to be done:
Code:
def get_cover_url(self):
    soup = self.index_to_soup('someurl')
    # Determine the series number of the publication somehow
    # and store it in the seriesnr variable
    self.conversion_options.update({'series': 'My series name'})
    self.conversion_options.update({'series_index': seriesnr})
    # Code for the cover URL, if any
    return None

All of this applies mostly to EPUB; as far as I know, the other formats offer no way to store this metadata.
|
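The "determine somehow the series number" step depends on the site. As one hypothetical example, if the issue number appears in the page text as "Issue 42", "No. 42" or "#42", a regular expression can extract it. The pattern and `parse_series_index` are assumptions, not part of calibre; since calibre stores series_index as a float, the helper returns one.

```python
import re

# Hypothetical pattern: "Issue 42", "No. 42", "Nr 42" or "#42", case-insensitive
ISSUE_RE = re.compile(r'(?:\b(?:issue|no\.?|nr\.?)|#)\s*(\d+)', re.IGNORECASE)


def parse_series_index(text):
    """Return the issue number found in text as a float, or None if there is none."""
    match = ISSUE_RE.search(text or '')
    return float(match.group(1)) if match else None


print(parse_series_index('Issue 42 - February 2013'))  # 42.0
print(parse_series_index('Spring edition'))            # None
```

In get_cover_url, `seriesnr` could then come from something like `parse_series_index(self.tag_to_string(some_tag))`, where `some_tag` is whatever element on the page holds the issue number.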
06-25-2013, 07:14 AM | #26 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2013
Device: K3 (Keyboard)
|
Can I collect these snippets, translate them into Polish, put them in an ebook, and publish it on the Polish forum? The goal is to make them available to users who don't speak English, so I would like permission to publish it.
TIA |
06-25-2013, 10:52 AM | #27 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I doubt it would be a problem. Kovid is the owner of this forum, so in the end it is his call.
|
06-25-2013, 03:23 PM | #28 |
creator of calibre
Posts: 44,679
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Feel free to do so, I have no objections.
|
06-27-2013, 03:23 AM | #29 |
Junior Member
Posts: 7
Karma: 10
Join Date: May 2013
Device: K3 (Keyboard)
|
Thanks a lot
|
12-16-2013, 05:04 PM | #30 | |
Connoisseur
Posts: 97
Karma: 10
Join Date: Sep 2013
Device: Kindle Paperwhite (2012)
|
Quote:
Code:
Last edited by sup; 01-14-2014 at 01:50 PM. |