Help With Recipe from Smarter Planet | Tumblr Site

jackmason · 12-23-2010, 01:32 PM

I'm trying to clean up my recipe for the Smarter Planet | Tumblr site (I'm using the mobile RSS as the source):

http://smarterplanet.tumblr.com/mobile/rss

But I can't seem to find the right way to strip out the "about" content, disqus comment system, footer etc. that is cropping up after each post...I really just want to bundle up the posts as cleanly by themselves as possible. Am I missing something obvious or simple?

Here's the section I'm trying to eliminate with remove tag code... my full recipe is at the end. Many thanks Calibre team for such an amazing tool.

One other question...how do we best share the recipe or make it easy for people to discover and put it to use?

We're moving from the "World Wide Web" to a "Web Wide World"
one that is instrumented, interconnected and intelligent.

New Intelligence Internet of Things Cities * Water
Healthcare * Energy Traffic * Food Buildings * Telecom Safety * Retail
IT Infrastructure Banking * Education Work * Government Products * Rail Cloud Computing Oil

Home
Archive
Random
Follow Us!
About
Smarter Planet Widgets
Got a Smarter Planet Question? Ask Here
Scaffold theme by Mike Harding.
RSS feed
Random
Mobile
Powered by Tumblr

Code:

class AdvancedUserRecipe1293122276(BasicNewsRecipe):
    title          = u'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30 
    no_stylesheets = True
    use_embedded_content = False
    masthead_url          = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'
    remove_tags_before = dict(name='h1')  
    remove_tags    = [dict(name='div', attrs={'id':'sidebar'})]
    remove_tags    = [dict(name='div', attrs={'id':'about'})]
    remove_tags    = [dict(name='div', attrs={'id':'footer'})]
    remove_tags    = [dict(name='div', attrs={'id':'disqus'})]
    remove_tags    = [dict(name='div', attrs={'id':'disqus_thread'})]
    remove_tags    = [dict(name='div', attrs={'id':'nav'})]
    remove_tags    = [dict(name='div', attrs={'id':'notes'})]
    remove_tags    = [dict(name='div', attrs={'id':'description'})]
    remove_tags    = [dict(name='div', attrs={'id':'likes_container'})]
  
    feeds          = [(u'Smarter Planet Tumblr', u'http://smarterplanet.tumblr.com/mobile/rss')]
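
A side note on the recipe above, offered as a sketch rather than a full diagnosis: in a Python class body, assigning remove_tags several times rebinds the same name, so only the last list (here the 'likes_container' one) is left when calibre reads the class. That may be part of why the other divs survive. The snippet below shows the effect and one way to build a single combined list. The names Overwritten, Combined and UNWANTED_IDS are made up for illustration; the ids are the ones listed in the recipe, and whether they match the page's real markup is not verified here.

```python
# Ids taken from the recipe above; the tuple name is illustrative only.
UNWANTED_IDS = ('sidebar', 'about', 'footer', 'disqus', 'disqus_thread',
                'nav', 'notes', 'description', 'likes_container')


class Overwritten(object):
    # Each assignment rebinds the name and discards the previous list.
    remove_tags = [dict(name='div', attrs={'id': 'sidebar'})]
    remove_tags = [dict(name='div', attrs={'id': 'about'})]
    remove_tags = [dict(name='div', attrs={'id': 'likes_container'})]


class Combined(object):
    # One list holding one dict per unwanted id.
    remove_tags = [dict(name='div', attrs={'id': tag_id})
                   for tag_id in UNWANTED_IDS]


print(len(Overwritten.remove_tags))  # 1
print(len(Combined.remove_tags))     # 9
```

In a real recipe the combined list would sit in the recipe class in place of the nine separate assignments.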

Starson17 · 12-23-2010, 01:49 PM

Quote:

Originally Posted by jackmason

But I can't seem to find the right way to strip out the "about" content, disqus comment system, footer etc. that is cropping up after each post...I really just want to bundle up the posts as cleanly by themselves as possible. Am I missing something obvious or simple?

You should use a keep_only_tags statement to keep the primary stuff you want, then remove from inside that. A common approach is to keep_only the title and the main article body, then use remove_tags on anything unwanted inside the kept tags. Using remove_tags_before and remove_tags_after with a specified tag also works well. Last, check to see if there's a print version that already removes the junk.
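
The keep-then-remove approach described above could look roughly like the sketch below. The selectors are placeholders, not values checked against the Tumblr page: 'h1' and the class 'post' are assumptions standing in for the real title and body containers, and the class name KeepThenRemoveSketch is hypothetical. The try/except only exists so the snippet can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class KeepThenRemoveSketch(BasicNewsRecipe):
    title = 'Keep-then-remove sketch'

    # Step 1: keep only the title and the article body.
    # Both selectors are assumed placeholders.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', attrs={'class': 'post'}),
    ]

    # Step 2: drop leftovers that are nested inside the kept tags.
    remove_tags = [
        dict(name='div', attrs={'id': ['notes', 'disqus_thread']}),
        dict(name=['script', 'noscript', 'style']),
    ]
```

The real tag names and attributes have to be read from the page source of an actual article, since a selector that matches nothing leaves the article empty.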

Quote:

One other question...how do we best share the recipe or make it easy for people to discover and put it to use?

Post here, or in the bug tracker.

jackmason · 01-04-2011, 03:23 PM

Starson, thanks for the advice. I couldn't get the keep tags approach to work (when I used the id 'item' I lost all the content but the table of contents). But when I tried remove before and after with 'item' and the following group of remove tag code, I seemed to get a decent result. I would like to know how it works for others on their Kindles, what else my recipe should include, etc.

Here's the recipe

class AdvancedUserRecipe1293122276(BasicNewsRecipe):
    title = u'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30
    no_stylesheets = True
    use_embedded_content = False
    masthead_url = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'
    # Trim everything outside the element with id 'item'
    remove_tags_before = dict(id='item')
    remove_tags_after = dict(id='item')
    # Then drop page furniture by class, by id, and by tag name
    remove_tags = [
        dict(attrs={'class': ['sidebar', 'about', 'footer', 'description', 'disqus', 'nav', 'notes', 'disqus_thread']}),
        dict(id=['sidebar', 'footer', 'disqus', 'nav', 'notes', 'likes_container', 'description', 'disqus_thread', 'about']),
        dict(name=['script', 'noscript', 'style']),
    ]

    feeds = [(u'Smarter Planet Tumblr', u'http://smarterplanet.tumblr.com/mobile/rss')]

phiznlil · 05-13-2011, 09:32 AM

This works for me, much simpler than trying to remove everything

keep_only_tags = [dict(name='div', attrs={'class':'panel_content base_format'})]

Starson17 · 05-13-2011, 10:03 AM

Quote:

Originally Posted by phiznlil

This works for me, much simpler than trying to remove everything

keep_only_tags = [dict(name='div', attrs={'class':'panel_content base_format'})]

If you have a working and complete recipe, it's helpful to post the entire recipe (in code and spoiler tags) along with your comment as to how it differs, or why it's better. That way someone reading this thread doesn't have to piece together parts of different posts/recipes if they want to use your recipe.
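
For anyone reading this thread later, here is a sketch of what a complete recipe might look like, assuming phiznlil's keep_only_tags line simply takes the place of the remove_tags lines in jackmason's recipe. phiznlil did not post a full recipe, so this combination is a guess and has not been confirmed against the site. The try/except around the import is only there so the snippet loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class AdvancedUserRecipe1293122276(BasicNewsRecipe):
    title = 'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30
    no_stylesheets = True
    use_embedded_content = False
    masthead_url = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'

    # phiznlil's suggestion: keep just the post container
    # instead of listing every unwanted block.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'panel_content base_format'}),
    ]

    feeds = [('Smarter Planet Tumblr',
              'http://smarterplanet.tumblr.com/mobile/rss')]
```

If stray scripts or comment widgets still show up inside the kept div, a short remove_tags list can be added on top of this.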

Post #1 last edited by jackmason; 12-23-2010 at 01:40 PM. Reason: fixed typos