Old 12-23-2010, 01:32 PM   #1
jackmason
Junior Member
jackmason began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle
Help With Recipe from Smarter Planet | Tumblr Site

I'm trying to clean up my recipe for the Smarter Planet | Tumblr site (I'm using the mobile RSS as the source: http://smarterplanet.tumblr.com/mobile/rss).

But I can't seem to find the right way to strip out the "about" content, disqus comment system, footer etc. that is cropping up after each post...I really just want to bundle up the posts as cleanly by themselves as possible. Am I missing something obvious or simple?

Here's the section I'm trying to eliminate with remove tag code... my full recipe is at the end. Many thanks Calibre team for such an amazing tool.

One other question...how do we best share the recipe or make it easy for people to discover and put it to use?



We're moving from the "World Wide Web" to a "Web Wide World"
one that is instrumented, interconnected and intelligent.


New Intelligence Internet of Things Cities * Water
Healthcare * Energy Traffic * Food Buildings * Telecom Safety * Retail
IT Infrastructure Banking * Education Work * Government Products * Rail Cloud Computing Oil


Home
Archive
Random
Follow Us!
About
Smarter Planet Widgets
Got a Smarter Planet Question? Ask Here
Scaffold theme by Mike Harding.
RSS feed
Random
Mobile
Powered by Tumblr




Code:
class AdvancedUserRecipe1293122276(BasicNewsRecipe):
    title          = u'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30 
    no_stylesheets = True
    use_embedded_content = False
    masthead_url          = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'
    remove_tags_before = dict(name='h1')  
    remove_tags    = [dict(name='div', attrs={'id':'sidebar'})]
    remove_tags    = [dict(name='div', attrs={'id':'about'})]
    remove_tags    = [dict(name='div', attrs={'id':'footer'})]
    remove_tags    = [dict(name='div', attrs={'id':'disqus'})]
    remove_tags    = [dict(name='div', attrs={'id':'disqus_thread'})]
    remove_tags    = [dict(name='div', attrs={'id':'nav'})]
    remove_tags    = [dict(name='div', attrs={'id':'notes'})]
    remove_tags    = [dict(name='div', attrs={'id':'description'})]
    remove_tags    = [dict(name='div', attrs={'id':'likes_container'})]
  
    feeds          = [(u'Smarter Planet Tumblr', u'http://smarterplanet.tumblr.com/mobile/rss')]

Last edited by jackmason; 12-23-2010 at 01:40 PM. Reason: fixed typos
Old 12-23-2010, 01:49 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jackmason View Post
But I can't seem to find the right way to strip out the "about" content, disqus comment system, footer etc. that is cropping up after each post...I really just want to bundle up the posts as cleanly by themselves as possible. Am I missing something obvious or simple?
You should use a keep_only_tags statement to keep the main content you want, then use remove_tags to strip anything unwanted inside it. A common approach is to keep only the title and the main article body. Using remove_tags_before and remove_tags_after around a specific tag also works well. Lastly, check whether the site has a print version that already omits the junk.
Quote:
One other question...how do we best share the recipe or make it easy for people to discover and put it to use?
Post here, or in the bug tracker.
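
A side note on the recipe in post #1, offered as a sketch, not a tested fix: it assigns remove_tags nine times in a row. In Python each assignment to the same class attribute replaces the previous one, so only the last rule (the 'likes_container' div) would still be in effect. Putting all the ids into a single list avoids that. The class names Overwritten and Combined below are made up for illustration, and whether these ids actually exist on the Tumblr page has not been checked here.

```python
# Plain-Python illustration; no calibre install is needed to run it.

class Overwritten(object):
    # Each assignment rebinds the same attribute, so only the last survives.
    remove_tags = [dict(name='div', attrs={'id': 'sidebar'})]
    remove_tags = [dict(name='div', attrs={'id': 'about'})]
    remove_tags = [dict(name='div', attrs={'id': 'likes_container'})]


class Combined(object):
    # One assignment, one list: every id below stays in the rule set.
    remove_tags = [
        dict(name='div', attrs={'id': [
            'sidebar', 'about', 'footer', 'disqus', 'disqus_thread',
            'nav', 'notes', 'description', 'likes_container',
        ]}),
    ]


print(Overwritten.remove_tags)   # a single rule, for 'likes_container' only
print(Combined.remove_tags)      # a single rule that lists all of the ids
```

In a real recipe the Combined form would go directly in the BasicNewsRecipe subclass, in place of the repeated remove_tags lines.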
Old 01-04-2011, 03:23 PM   #3
jackmason
Junior Member
jackmason began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle
Smarter Planet Recipe

Starson, thanks for the advice. I couldn't get the keep tags approach to work (when I used the id 'item' I lost all the content but the table of contents). But when I tried remove before and after with 'item' and the following group of remove tag code, I seemed to get a decent result. I'd like to know how it works for others on their Kindles, what else my recipe should include, etc.

Here's the recipe

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1293122276(BasicNewsRecipe):
    title = u'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30
    no_stylesheets = True
    use_embedded_content = False
    masthead_url = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'
    # Drop everything before and after the element with id 'item'.
    remove_tags_before = dict(id='item')
    remove_tags_after = dict(id='item')
    # Then strip leftover page furniture by class, by id, and by tag name.
    remove_tags = [
        dict(attrs={'class': ['sidebar', 'about', 'footer', 'description', 'disqus', 'nav', 'notes', 'disqus_thread']}),
        dict(id=['sidebar', 'footer', 'disqus', 'nav', 'notes', 'likes_container', 'description', 'disqus_thread', 'about']),
        dict(name=['script', 'noscript', 'style']),
    ]

    feeds = [(u'Smarter Planet Tumblr', u'http://smarterplanet.tumblr.com/mobile/rss')]
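
One thing worth watching in long class/id lists like these: if a comma ends up inside the quotes instead of between two strings, Python silently joins the neighbouring string literals into one, and neither name matches any more. A small illustration (the variable names broken and fixed are only for this example):

```python
# Adjacent string literals are concatenated by Python at compile time.
broken = ['footer', 'description,' 'disqus', 'nav']   # comma inside the quotes
fixed = ['footer', 'description', 'disqus', 'nav']    # comma between the strings

print(broken)   # three entries; the middle one is the merged string
print(fixed)    # four separate entries
```

No error is raised in the broken case, so the only symptom is that the 'description' and 'disqus' elements stay in the output.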
Old 05-13-2011, 09:32 AM   #4
phiznlil
Member
phiznlil began at the beginning.
 
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
This works for me, much simpler than trying to remove everything

keep_only_tags = [dict(name='div', attrs={'class':'panel_content base_format'})]
Old 05-13-2011, 10:03 AM   #5
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by phiznlil View Post
This works for me, much simpler than trying to remove everything

keep_only_tags = [dict(name='div', attrs={'class':'panel_content base_format'})]
If you have a working and complete recipe, it's helpful to post the entire recipe (in code and spoiler tags) along with your comment as to how it differs, or why it's better. That way someone reading this thread doesn't have to piece together parts of different posts/recipes if they want to use your recipe.
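
For anyone who wants it in one piece, here is a rough sketch of what a complete recipe using the keep_only_tags line might look like. It simply combines phiznlil's line with the settings from jackmason's posts; it has not been run against the site, the class name SmarterPlanetKeepOnly is made up, and the try/except stand-in is only there so the file can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the file can be imported and inspected outside calibre.
    class BasicNewsRecipe(object):
        pass


class SmarterPlanetKeepOnly(BasicNewsRecipe):  # hypothetical class name
    title = u'Smarter Planet | Tumblr for eReaders'
    oldest_article = 7
    max_articles_per_feed = 30
    no_stylesheets = True
    use_embedded_content = False
    masthead_url = 'http://30.media.tumblr.com/tumblr_l70dow9UmU1qzs4rbo1_r3_250.jpg'

    # Keep just the post container (class string taken from phiznlil's post).
    keep_only_tags = [dict(name='div', attrs={'class': 'panel_content base_format'})]

    # Assumption: scripts and styles inside the kept div are still unwanted,
    # as in the remove_tags list from post #3.
    remove_tags = [dict(name=['script', 'noscript', 'style'])]

    feeds = [(u'Smarter Planet Tumblr', u'http://smarterplanet.tumblr.com/mobile/rss')]
```

The difference from the recipe in post #3 is that nothing outside the kept div has to be listed for removal, so changes to the sidebar or footer should not matter.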
Tags
ibm, smarterplanet, tumblr

