12-19-2010, 10:37 AM | #1 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
'Heading color' and 'p class span'
I have tried both
remove_attributes = ['style', 'font', 'font color']
and
remove_attributes = ['style', 'font', 'color']
to get rid of the color in headings, with no luck:
<h2><font color="#33cccc">WHEN SHOULD I SEE A DOCTOR? </font><br></h2>
I also cannot remove the span named KonaFilter through the usual channels:
<span name="KonaFilter">
dict(name='span', attrs={'name':['KonaFilter']}),
I have had no luck with 'p class span' either. Any ideas?
|
12-20-2010, 09:54 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
12-20-2010, 08:28 PM | #3 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
Hope I can find somewhere to learn that
|
12-21-2010, 03:11 PM | #4 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Here's an example from a recipe:
Code:
# Needs 'import re' at the top of the recipe.
preprocess_regexps = [
    # Replace everything from <body up to the article container with a bare <body><div>.
    (re.compile(r'<body.*?<div class="pad_10L10R">', re.DOTALL|re.IGNORECASE), lambda match: '<body><div>'),
    # Drop everything from the first </div> through </body>.
    (re.compile(r'</div>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</div></body>'),
    # Strip carriage returns.
    (re.compile('\r'), lambda match: ''),
    # Strip HTML comments, then <link>, <script>, <noscript> and self-closed <meta> tags.
    (re.compile(r'<!-- .+? -->', re.DOTALL|re.IGNORECASE), lambda match: ''),
    (re.compile(r'<link .+?>', re.DOTALL|re.IGNORECASE), lambda match: ''),
    (re.compile(r'<script.*?</script>', re.DOTALL|re.IGNORECASE), lambda match: ''),
    (re.compile(r'<noscript.*?</noscript>', re.DOTALL|re.IGNORECASE), lambda match: ''),
    (re.compile(r'<meta .*?/>', re.DOTALL|re.IGNORECASE), lambda match: ''),
]
|
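Applied to the markup from the first post, the same regex approach might look like the sketch below. It is plain Python that runs outside calibre so it can be tried on its own. The patterns, the name `font_and_span_regexps` and the `clean` helper are assumptions for illustration, not taken from a working recipe, and would need adjusting to the real page source.

```python
import re

# Same (pattern, replacement) shape as the preprocess_regexps entries above.
# The patterns are guesses based on the two snippets in the first post.
font_and_span_regexps = [
    # Drop opening <font ...> and closing </font> tags, keeping the text inside.
    (re.compile(r'</?font\b[^>]*>', re.IGNORECASE), lambda match: ''),
    # Unwrap <span name="KonaFilter">...</span>, keeping the inner content.
    # Assumes no other <span> is nested inside it.
    (re.compile(r'<span\s+name="KonaFilter"[^>]*>(.*?)</span>',
                re.DOTALL | re.IGNORECASE), lambda match: match.group(1)),
]


def clean(html):
    """Hypothetical stand-in: apply each pair in order with pattern.sub()."""
    for pattern, replacement in font_and_span_regexps:
        html = pattern.sub(replacement, html)
    return html


sample = '<h2><font color="#33cccc">WHEN SHOULD I SEE A DOCTOR? </font><br></h2>'
print(clean(sample))  # <h2>WHEN SHOULD I SEE A DOCTOR? <br></h2>
```

To throw away the span together with its content instead of unwrapping it, the second replacement could return an empty string.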
12-21-2010, 07:01 PM | #5 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
Thanks, I will look into that as soon as I can.
|
12-21-2010, 11:48 PM | #6 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
Missing Something
When I do this, articles with Video in the title are removed, but Gallery is ignored even though it is clearly in the URL. Also, when I tried removing .upper in line 5, it stopped working, even though Video in my titles has only its first letter in upper case. To delete pages with Gallery in the title, I added a second copy of the parse_feeds def with VIDEO and GALLERY swapped; that removed the Gallery pages but not the Video pages, so it seems the second definition overrode the first. Is there any way to combine VIDEO and GALLERY in one?
I hope I made myself clear. The first part works fine for me, but not the second. Spoiler:
|
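Two plain Python behaviours would be consistent with what is described above, though this is only a guess because the recipe itself is not shown. A second `def` with the same name inside a class replaces the first one, and the `in` test on strings is case-sensitive, so `'VIDEO'` is only found in a mixed-case title after `.upper()`. The class name, return strings and title below are made up for the illustration.

```python
class Demo:
    def parse_feeds(self):
        return 'first definition (VIDEO filter)'

    # Same name again: this rebinds Demo.parse_feeds, so the first version is lost.
    def parse_feeds(self):
        return 'second definition (GALLERY filter)'


print(Demo().parse_feeds())  # second definition (GALLERY filter)

title = 'Video: winter driving tips'  # made-up title
print('VIDEO' in title)          # False: 'in' is case-sensitive
print('VIDEO' in title.upper())  # True
print('Video' in title)          # True: matches the mixed-case spelling directly
```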
12-22-2010, 08:51 AM | #7 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I didn't closely follow what worked for you, but if they each worked separately, you can just run them as separate checks joined by an elif:
Code:
        if 'VIDEO' in article.title.upper():
            feed.articles.remove(article)
        elif 'GALLERY' in article.url.upper():
            feed.articles.remove(article)
return feeds
|
|
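For anyone who wants to try the combined check outside calibre, here is a self-contained sketch. `Article`, `Feed`, `drop_unwanted` and the sample data are stand-ins invented for the illustration, not calibre's own classes. The loop runs over a copy of the list (`feed.articles[:]`), because removing items from a list while iterating over that same list can skip entries.

```python
from dataclasses import dataclass, field


@dataclass
class Article:  # stand-in with only the two attributes the check needs
    title: str
    url: str


@dataclass
class Feed:  # stand-in for a feed holding a list of articles
    articles: list = field(default_factory=list)


def drop_unwanted(feeds):
    """Remove articles with VIDEO in the title or GALLERY in the URL."""
    for feed in feeds:
        # Iterate over a copy so removals do not disturb the loop.
        for article in feed.articles[:]:
            if 'VIDEO' in article.title.upper() or 'GALLERY' in article.url.upper():
                feed.articles.remove(article)
    return feeds


feeds = [Feed([
    Article('Video: storm footage', 'http://example.com/news/1'),
    Article('Photos of the week', 'http://example.com/gallery/2'),
    Article('Council approves budget', 'http://example.com/news/3'),
])]
drop_unwanted(feeds)
print([a.title for a in feeds[0].articles])  # ['Council approves budget']
```

In a recipe, the same two loops would sit inside parse_feeds, just before `return feeds`.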
12-22-2010, 09:02 PM | #8 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
Thanks.
I had managed to get the original parse_feeds to work by putting 'gallery' before 'video'; I could not get it to work the other way around. However, I like your idea better. In fact, it will be included in my template for starting new recipes.
|
|