Old 09-18-2010, 10:18 PM   #2761
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by marbs View Post
I am having trouble with my recipe:
Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = 'TheMarker'
    cover_url      = 'http://static.ispot.co.il/wp-content/upload/2009/09/themarker.jpg'
    title          = u'The Marker1'
    language       = 'he'
    simultaneous_downloads = 1
    delay                  = 6   
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 1
    remove_tags = [dict(name='tr', attrs={'bgcolor':['#738A94']})          ]
    max_articles_per_feed = 1000
    extra_css='body{direction: rtl;} .article_description{direction: rtl; } a.article{direction: rtl; } .calibre_feed_description{direction: rtl; }'
    feeds          = [(u'Head Lines', u'http://www.themarker.com/tmc/content/xml/rss/hpfeed.xml'), (u'TA Market', u'http://www.themarker.com/tmc/content/xml/rss/sections/marketfeed.xml'), (u'Real Estate', u'http://www.themarker.com/tmc/content/xml/rss/sections/realEstaterfeed.xml'), (u'Wall Street & Global', u'http://www.themarker.com/tmc/content/xml/rss/sections/wallsfeed.xml'), (u'Law', u'http://www.themarker.com/tmc/content/xml/rss/sections/lawfeed.xml'), (u'Media', u'http://www.themarker.com/tmc/content/xml/rss/sections/mediafeed.xml'), (u'Consumer', u'http://www.themarker.com/tmc/content/xml/rss/sections/consumerfeed.xml'), (u'Career', u'http://www.themarker.com/tmc/content/xml/rss/sections/careerfeed.xml'), (u'Car', u'http://www.themarker.com/tmc/content/xml/rss/sections/carfeed.xml'), (u'High Tech', u'http://www.themarker.com/tmc/content/xml/rss/sections/hightechfeed.xml'), (u'Investor Guide', u'http://www.themarker.com/tmc/content/xml/rss/sections/investorGuidefeed.xml')]
    def print_version(self, url):
        baseURL = url.replace('tmc/article.jhtml?ElementId=', 'ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F')
        return baseURL + '.xml'





If not, please tell me where to look.
And thank you Starson for the help so far. I think this message was posted in an orderly fashion.
I don't understand the 'he' (Hebrew) language, but I think this will work for you.
What I did was split the returned URL and then appended to it, sort of like you were doing. I put some print statements in there so you can see what is actually being used as the final print_url when you run
ebook-convert yourecipenamehere.recipe output_dir --test -vv >myrecipe.txt
When you run that, you can see the print statements in myrecipe.txt.

Use this for your print_version code:
Spoiler:

Code:
def print_version(self, url):
        split1 = url.split("=")
        print 'THE SPLIT IS: ', split1 
       
        print_url = 'http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + split1[1]+'.xml'
        print 'THIS URL WILL PRINT: ', print_url # this is a test string to see what the url is it will return
        return print_url
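As a concrete example of what that split produces, here is a rough standalone check using the sample ElementId that appears later in this thread; it is not part of the recipe itself:
Code:
url = 'http://www.themarker.com/tmc/article.jhtml?ElementId=zz20100918_6121'
split1 = url.split("=")
# split1 -> ['http://www.themarker.com/tmc/article.jhtml?ElementId', 'zz20100918_6121']
print_url = ('http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId='
             '%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + split1[1] + '.xml')
print print_url
# -> http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2Fzz20100918_6121.xml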
TonytheBookworm is offline  
Old 09-18-2010, 10:31 PM   #2762
bhandarisaurabh
Enthusiast
bhandarisaurabh began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by TonytheBookworm View Post
This should work for the CURRENT ARTICLE MONTH/YEAR.
It has a form where you select a different year, but I'm not sure what the actual URLs are that it uses for that, so I just stuck with the current month/year, since I figured that is what you would want anyway. If you look, even though September 2010 is selected on the page, the article content still says August 18 or whatever; that is the same date that is on the original page.

Anyway, the only thing I didn't understand how to do was get the description to drop the text that is inside the <a>. Once that was done, I posted this update.


Updated code to build the description (descr) correctly
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re
class IW(BasicNewsRecipe):
    title      = 'Industry Week'
    __author__ = 'Tonythebookworm'
    description = ''
    language = 'en'
    no_stylesheets      = True
    publisher           = 'Tonythebookworm'
    category            = 'Manufacturing'
    use_embedded_content= False
    oldest_article      = 40
    remove_javascript   = True
    remove_empty_feeds  = True
    
    max_articles_per_feed = 200 # only gets the first 200 articles
    INDEX = 'http://www.industryweek.com'
    
    
    
    remove_tags = [dict(name='div', attrs={'class':['crumbNav']}),
                   dict(name='i')]
    
    def parse_index(self):
        feeds = []
        for title, url in [
                            (u"Current Month", u"http://www.industryweek.com/Archive.aspx"),
                             ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds
        
    def make_links(self, url):
        current_articles = []
        soup = self.index_to_soup(url)

        for item in soup.findAll('a', attrs={'class':'article'}):
            link = item['href']
            if link:
                art_url = self.INDEX + link
                title   = self.tag_to_string(item)
                descr   = item.parent
                item.extract()   # remove the <a> from its parent so only the summary text is left
                descr   = self.tag_to_string(descr)
                current_articles.append({'title': title, 'url': art_url, 'description': descr, 'date': ''})
        return current_articles
      

   
    def print_version(self, url):
        split1 = url.split("=")
        print_url = 'http://www.industryweek.com/PrintArticle.aspx?ArticleID=' + split1[1]
        
        return print_url
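The trick for dropping the text inside the <a> from the description is the item.extract() call in make_links above: pulling the <a> out of its parent before converting the parent to text leaves only the summary. A standalone illustration with BeautifulSoup (the markup below is made up, just shaped like what the recipe expects):
Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup

html = '<div><a class="article" href="/x">Article title</a> A short summary of the article.</div>'
soup = BeautifulSoup(html)
item = soup.find('a', attrs={'class': 'article'})
parent = item.parent
item.extract()                  # remove the <a> (and its title text) from the tree
print parent.renderContents()   # ->  A short summary of the article.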
Hey, thanks for the recipe. Maybe they release the magazine content online one month before it is available in print. Anyway, thanks. You are a genius!
bhandarisaurabh is offline  
Old 09-18-2010, 10:33 PM   #2763
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by bhandarisaurabh View Post
Hey, thanks for the recipe. Maybe they release the magazine content online one month before it is available in print. Anyway, thanks. You are a genius!
Far from a genius, but thanks for the compliment.
TonytheBookworm is offline  
Old 09-18-2010, 11:14 PM   #2764
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Working Copy of Popular Science

Finally got Popular Science to work like I want. It goes 7 days back, and I also have it remove any 'Gallery:' entries for image slide shows and so forth.
Attached Files
File Type: rar popscience.rar (1.2 KB, 263 views)
TonytheBookworm is offline  
Old 09-19-2010, 06:26 AM   #2765
AgiZ
Junior Member
AgiZ began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Aug 2010
Device: Nook
Can I request a recipe, or even have the site added into a release, please?
The site is http://slo-tech.com/ and it is the best Slovenian tech news site.
Pleeeeease
AgiZ is offline  
Old 09-19-2010, 08:08 AM   #2766
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Quote:
Originally Posted by TonytheBookworm View Post

Use this for your print_version code:
Spoiler:

Code:
def print_version(self, url):
        split1 = url.split("=")
        print 'THE SPLIT IS: ', split1 
       
        print_url = 'http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + split1[1]+'.xml'
        print 'THIS URL WILL PRINT: ', print_url # this is a test string to see what the url is it will return
        return print_url
I ran it a few times and all the articles seem to be downloading with ebook-convert, but when I try it in calibre, I get some empty articles. What should I do?

Also, I just saw that a small number of articles have a different format. The web address ("it.themarker.com" rather than "themarker.com"), the way to get the print version, and the page format are all different. Is there any way to do an "if" or something like that, to deal with the two kinds of articles in different ways?

Thanks again, Tony. You are a life saver.

Last edited by marbs; 09-19-2010 at 09:44 AM.
marbs is offline  
Old 09-19-2010, 03:58 PM   #2767
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Bad post... the code was totally wrong.

Last edited by TonytheBookworm; 09-19-2010 at 05:09 PM. Reason: sorry about that.
TonytheBookworm is offline  
Old 09-19-2010, 04:18 PM   #2768
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
I need to go over your code slowly; I am not sure I understand it at all. Can I use it as is? I would love an explanation when you have the time.

BTW, the IT address is "http://it.themarker.com/tmit/article/XXXXX"
and the print version is "http://it.themarker.com/tmit/PrintArticle/XXXXX".

How would you do the clean-up for the different pages (or should I just leave it)?

Thanks again for all your help. I really do appreciate it.

Last edited by marbs; 09-19-2010 at 04:20 PM.
marbs is offline  
Old 09-19-2010, 04:32 PM   #2769
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by marbs View Post
I need to go over your code slowly; I am not sure I understand it at all. Can I use it as is? I would love an explanation when you have the time.

BTW, the IT address is "http://it.themarker.com/tmit/article/XXXXX"
and the print version is "http://it.themarker.com/tmit/PrintArticle/XXXXX".

How would you do the clean-up for the different pages (or should I just leave it)?

Thanks again for all your help. I really do appreciate it.
That's what I get for posting code without testing it... Anyway, this might do the trick. (I can't seem to get it to find an it.themarker.com link, so you're going to have to be my eyes in the field on this one.) What happens is this: you have, for instance, cars.themarker.com, and when it goes to that link it converts it to themarker.com in the cases I have seen. If you know a specific URL that I can test, please let me know, because as far as I can see, things like law.themarker, cars.themarker and careers.themarker all revert to www.themarker.com/xxxxxxxxx and so on.

Here is what I have come up with thus far. Sorry about the previous code.
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = 'TheMarker'
    cover_url      = 'http://static.ispot.co.il/wp-content/upload/2009/09/themarker.jpg'
    title          = u'The Marker1'
    language       = 'he'
    simultaneous_downloads = 5
    #delay                  = 6   
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 2
    #remove_tags = [dict(name='tr', attrs={'bgcolor':['#738A94']})          ]
    max_articles_per_feed = 10
    #extra_css='body{direction: rtl;} .article_description{direction: rtl; } a.article{direction: rtl; } .calibre_feed_description{direction: rtl; }'
    feeds          = [(u'Head Lines', u'http://www.themarker.com/tmc/content/xml/rss/hpfeed.xml'), 
                      (u'TA Market', u'http://www.themarker.com/tmc/content/xml/rss/sections/marketfeed.xml'),
                      (u'Real Estate', u'http://www.themarker.com/tmc/content/xml/rss/sections/realEstaterfeed.xml'),
                      (u'Wall Street & Global', u'http://www.themarker.com/tmc/content/xml/rss/sections/wallsfeed.xml'), 
                      (u'Law', u'http://www.themarker.com/tmc/content/xml/rss/sections/lawfeed.xml'), 
                      (u'Media', u'http://www.themarker.com/tmc/content/xml/rss/sections/mediafeed.xml'), 
                      (u'Consumer', u'http://www.themarker.com/tmc/content/xml/rss/sections/consumerfeed.xml'), 
                      (u'Career', u'http://www.themarker.com/tmc/content/xml/rss/sections/careerfeed.xml'), 
                      (u'Car', u'http://www.themarker.com/tmc/content/xml/rss/sections/carfeed.xml'), 
                      (u'High Tech', u'http://www.themarker.com/tmc/content/xml/rss/sections/hightechfeed.xml'), 
                      (u'Investor Guide', u'http://www.themarker.com/tmc/content/xml/rss/sections/investorGuidefeed.xml')]
    # old approach, kept for reference:
    # def print_version(self, url):
    #     baseURL = url.replace('tmc/article.jhtml?ElementId=', 'ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F')
    #     return baseURL + '.xml'
    #
    # example article URL: http://www.themarker.com/tmc/article.jhtml?ElementId=zz20100918_6121
    # example print URL:   http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2Fzz20100918_6121.xml
    def print_version(self, url):
        print 'ORG URL IS: ', url
        # it.themarker.com articles use a different print-version scheme,
        # so branch on the host found in the article URL
        if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
            split2 = url.split("article/")
            print 'FOUND IT: ', url
            print_url = 'http://it.themarker.com/tmit/PrintArticle/' + split2[1]
        else:
            split1 = url.split("=")
            print_url = 'http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + split1[1] + '.xml'
        print 'THIS URL WILL PRINT: ', print_url # test output so you can see the final url
        return print_url
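If it helps to see what the two branches should produce, here is a standalone check you can run with plain Python; it repeats the same branching as print_version above, using the example ElementId from the comments and the placeholder XXXXX id from marbs' post:
Code:
import re

def expected_print_url(url):
    if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
        return 'http://it.themarker.com/tmit/PrintArticle/' + url.split('article/')[1]
    return ('http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId='
            '%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + url.split('=')[1] + '.xml')

print expected_print_url('http://www.themarker.com/tmc/article.jhtml?ElementId=zz20100918_6121')
print expected_print_url('http://it.themarker.com/tmit/article/XXXXX')  # XXXXX is just a placeholder id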

Last edited by TonytheBookworm; 09-19-2010 at 07:07 PM. Reason: modified code to find it.themarker.com error was in regex
TonytheBookworm is offline  
Old 09-19-2010, 05:17 PM   #2770
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Gmail news

I was reading stuff online. Look what I found:
http://lifehacker.com/157701/get-rss...r-gmail-labels
It is way, way, WAY out of my capabilities.
Could anyone create a news feed for Gmail? One that requires a username and password?
The feed address is https://mail.google.com/mail/feed/atom/label/
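A minimal sketch of how a recipe can log in for a password-protected feed like this one, using calibre's needs_subscription option and the browser's HTTP auth support. The recipe title and the 'news' label are placeholders, and this is untested against Gmail itself, so treat it as a starting point only:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class GmailLabelFeed(BasicNewsRecipe):
    title                 = u'Gmail label feed'   # placeholder title
    oldest_article        = 7
    max_articles_per_feed = 100
    use_embedded_content  = True    # the atom feed carries the message snippets
    needs_subscription    = True    # calibre will prompt for username and password

    # 'news' is a hypothetical label name; replace it with your own Gmail label
    feeds = [(u'Gmail: news', u'https://mail.google.com/mail/feed/atom/label/news')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            # answer the HTTP basic auth challenge from mail.google.com
            br.add_password('https://mail.google.com/mail/feed/atom', self.username, self.password)
        return br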
marbs is offline  
Old 09-19-2010, 06:34 PM   #2771
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by marbs View Post
i need to go over your code slowly. i am not sure i understand it at all. can i use it as is? i would love an explanation when you have the time.

BTW, the IT address is "http://it.themarker.com/tmit/article/XXXXX"
and the print version is "http://it.themarker.com/tmit/PrintArticle/XXXXX"

how would you do the clean up for the different pages (or should i just leave it?)

thanks again for all your help. i really do appreciate it.
Look at the updated code I posted. Test it on your end and see if it works for you. I changed the regular expression and it finds the link correctly on my end: it finds it.themarker.com and changes it, and anything else it leaves in the themarker.com/********* form.
Here is the code:
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = 'TheMarker'
    cover_url      = 'http://static.ispot.co.il/wp-content/upload/2009/09/themarker.jpg'
    title          = u'The Marker1'
    language       = 'he'
    simultaneous_downloads = 5
    #delay                  = 6   
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 2
    #remove_tags = [dict(name='tr', attrs={'bgcolor':['#738A94']})          ]
    max_articles_per_feed = 10
    #extra_css='body{direction: rtl;} .article_description{direction: rtl; } a.article{direction: rtl; } .calibre_feed_description{direction: rtl; }'
    feeds          = [(u'Head Lines', u'http://www.themarker.com/tmc/content/xml/rss/hpfeed.xml'), 
                      (u'TA Market', u'http://www.themarker.com/tmc/content/xml/rss/sections/marketfeed.xml'),
                      (u'Real Estate', u'http://www.themarker.com/tmc/content/xml/rss/sections/realEstaterfeed.xml'),
                      (u'Wall Street & Global', u'http://www.themarker.com/tmc/content/xml/rss/sections/wallsfeed.xml'), 
                      (u'Law', u'http://www.themarker.com/tmc/content/xml/rss/sections/lawfeed.xml'), 
                      (u'Media', u'http://www.themarker.com/tmc/content/xml/rss/sections/mediafeed.xml'), 
                      (u'Consumer', u'http://www.themarker.com/tmc/content/xml/rss/sections/consumerfeed.xml'), 
                      (u'Career', u'http://www.themarker.com/tmc/content/xml/rss/sections/careerfeed.xml'), 
                      (u'Car', u'http://www.themarker.com/tmc/content/xml/rss/sections/carfeed.xml'), 
                      (u'High Tech', u'http://www.themarker.com/tmc/content/xml/rss/sections/hightechfeed.xml'), 
                      (u'Investor Guide', u'http://www.themarker.com/tmc/content/xml/rss/sections/investorGuidefeed.xml')]
    # old approach, kept for reference:
    # def print_version(self, url):
    #     baseURL = url.replace('tmc/article.jhtml?ElementId=', 'ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F')
    #     return baseURL + '.xml'
    #
    # example article URL: http://www.themarker.com/tmc/article.jhtml?ElementId=zz20100918_6121
    # example print URL:   http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2Fzz20100918_6121.xml
    def print_version(self, url):
        print 'ORG URL IS: ', url
        # it.themarker.com articles use a different print-version scheme,
        # so branch on the host found in the article URL
        if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
            split2 = url.split("article/")
            print 'FOUND IT: ', url
            print_url = 'http://it.themarker.com/tmit/PrintArticle/' + split2[1]
        else:
            split1 = url.split("=")
            print_url = 'http://www.themarker.com/ibo/misc/printFriendly.jhtml?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F' + split1[1] + '.xml'
        print 'THIS URL WILL PRINT: ', print_url # test output so you can see the final url
        return print_url

Last edited by TonytheBookworm; 09-19-2010 at 06:50 PM.
TonytheBookworm is offline  
Old 09-19-2010, 06:36 PM   #2772
noxxx
Junior Member
noxxx began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Sep 2010
Device: Kindle 3
Tagesanzeiger

I updated the feed for the Tagesanzeiger (Swiss newspaper).

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1284927619(BasicNewsRecipe):
    title = u'Tagesanzeiger'
    publisher = u'Tamedia AG'
    oldest_article = 2
    max_articles_per_feed = 100
    description = 'tagesanzeiger.ch: Nichts verpassen'
    category = 'News, Politik, Nachrichten, Schweiz, Zürich'
    language = 'de'
    conversion_options = {
                             'comments'  : description
                            ,'tags'      : category
                            ,'language'  : language
                            ,'publisher' : publisher
                         }
    
    remove_tags = [
         dict(name='img')
        ,dict(name='div', attrs={'class':['swissquote ad','boxNews','centerAD','contentTabs2','sbsLabel']})
        ,dict(name='div', attrs={'id':['colRightAd','singleRight','singleSmallRight','MailInfo','metaLine','sidebarSky','contentFooter','commentInfo','commentInfo2','commentInfo3','footerBottom','clear','boxExclusiv','singleLogo','navSearch','headerLogin','headerBottomRight','horizontalNavigation','subnavigation','googleAdSense','footerAd','contentbox','articleGalleryNav']})
        ,dict(name='form', attrs={'id':['articleMailForm','commentform']})
        ,dict(name='div', attrs={'style':['position:absolute']})
        ,dict(name='script', attrs={'type':['text/javascript']})
        ,dict(name='p', attrs={'class':['schreiben','smallPrint','charCounter','caption']})
    ]

    feeds = [
         (u'Front', u'http://www.tagesanzeiger.ch/rss.html')
        ,(u'Zürich', u'http://www.tagesanzeiger.ch/zuerich/rss.html')
        ,(u'Schweiz', u'http://www.tagesanzeiger.ch/schweiz/rss.html')
        ,(u'Ausland', u'http://www.tagesanzeiger.ch/ausland/rss.html')
        ,(u'Digital', u'http://www.tagesanzeiger.ch/digital/rss.html')
        ,(u'Wissen', u'http://www.tagesanzeiger.ch/wissen/rss.html')
        ,(u'Panorama', u'http://www.tagesanzeiger.ch/panorama/rss.html')
        ,(u'Wirtschaft', u'http://www.tagesanzeiger.ch/wirtschaft/rss.html')
        ,(u'Sport', u'http://www.tagesanzeiger.ch/sport/rss.html')
        ,(u'Kultur', u'http://www.tagesanzeiger.ch/kultur/rss.html')
        ,(u'Leben', u'http://www.tagesanzeiger.ch/leben/rss.html')
        ,(u'Auto', u'http://www.tagesanzeiger.ch/auto/rss.html')
    ]

    def print_version(self, url):
        return url + '/print.html'
Any suggestions are welcome...

Cheers noxxx
noxxx is offline  
Old 09-19-2010, 10:51 PM   #2773
bhandarisaurabh
Enthusiast
bhandarisaurabh began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none

Quote:
Originally Posted by TonytheBookworm View Post
Far from a genius, but thanks for the compliment.
Can you help me with the recipe for Business Standard? The website has been updated.
I want help with the print section of the recipe.
If the article URL is http://www.business-standard.com/ind...ky-way/406220/
and the print URL is
http://www.business-standard.com/ind...ono=406220&tp=
then how would we define the print section?
bhandarisaurabh is offline  
Old 09-19-2010, 11:34 PM   #2774
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by bhandarisaurabh View Post
Can you help me with the recipe for Business Standard? The website has been updated.
I want help with the print section of the recipe.
If the article URL is http://www.business-standard.com/ind...ky-way/406220/
and the print URL is
http://www.business-standard.com/ind...ono=406220&tp=
then how would we define the print section?
Something like this would be my first guess; if it doesn't work I'll have to test it later.
Spoiler:

Code:
def print_version(self, url):
        print 'ORG URL IS: ', url
        split1 = url.split("/")
        print 'THE SPLIT IS: ', split1
        # the article id is the last non-empty path segment; the URL ends with a
        # trailing slash, so the final element of the split is an empty string
        autono = split1[-2]

        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + autono + '&tp='
        return print_url
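A quick check of what this should return, using a made-up full article URL (the path in the post above is truncated, so the slug here is hypothetical; only the trailing id 406220 is taken from the question):
Code:
url = 'http://www.business-standard.com/india/news/some-article-slug/406220/'  # hypothetical slug
split1 = url.split("/")
print split1[-2]
# -> '406220'
# so print_version(url) would return:
# http://www.business-standard.com/india/printpage.php?autono=406220&tp=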

Last edited by TonytheBookworm; 09-20-2010 at 02:24 AM. Reason: typo
TonytheBookworm is offline  
Old 09-20-2010, 08:27 AM   #2775
greenapple
Evangelist
greenapple will become famous soon enough
 
Posts: 404
Karma: 664
Join Date: Dec 2009
Device: Kindle Paperwhite, Kindle DX, Kobo Aura HD
Hi, I'd like to request a reddit.com feed recipe.
It doesn't seem to download when I use a basic configuration.
Thanks!
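For anyone who wants to try it, a minimal sketch of a basic recipe for this request; it assumes reddit's front-page RSS feed is at http://www.reddit.com/.rss, which should be verified, and it does no cleanup of the linked pages:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class RedditFrontPage(BasicNewsRecipe):
    title                 = u'reddit.com'
    description           = 'reddit front page'
    language              = 'en'
    oldest_article        = 2
    max_articles_per_feed = 50
    no_stylesheets        = True
    use_embedded_content  = False

    # assumed feed URL; subreddits expose similar feeds like /r/<name>/.rss
    feeds = [(u'Front Page', u'http://www.reddit.com/.rss')]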
greenapple is offline  