Old 08-10-2011, 04:47 PM   #16
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by yoss15 View Post
I did, but I have no clue what that section means, like I said I don't have any experience with this stuff.
If you want to learn, I'll answer your questions. You should use Firebug and Firefox to inspect the source page.
Old 08-10-2011, 05:11 PM   #17
yoss15
Enthusiast
yoss15 began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
Quote:
Originally Posted by Starson17 View Post
If you want to learn, I'll answer your questions. You should use Firebug and Firefox to inspect the source page.

I would really appreciate it. For starters what does the for section in soup.findAll line do?
Old 08-10-2011, 05:22 PM   #18
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by yoss15 View Post
I would really appreciate it. For starters what does the for section in soup.findAll line do?
The job of parse_index is to look at a page and find the links on that page to articles. The for section in soup.findAll line starts that process: it "finds all" the tags that should contain a link to an article. Do you know what a <div> tag is? As written, that line finds every part of the page that is tagged <div class="content">.

I'll be nice and look at your page - hold on ....

There aren't any div tags like that.

You should probably be doing something like this:
Code:
for section in soup.findAll('li'):
Then something like:
Code:
for post in section.findAll('a', href=True):
The first loop finds every <li> tag, and the second finds the <a> tags inside each one that have an href.
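A stand-alone sketch of the same nested lookup, for experimenting outside calibre. It is only a stand-in: it uses Python's standard-library HTMLParser rather than the findAll calls a recipe would use, and the class name LinkInListItemParser and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkInListItemParser(HTMLParser):
    """Collect (href, link text) pairs for <a href=...> tags found inside <li> tags."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0          # how many <li> tags we are currently inside
        self.current_href = None   # href of the <a> being read, if any
        self.current_text = []     # text pieces seen inside that <a>
        self.links = []            # results: (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            href = dict(attrs).get('href')
            if href is not None:   # same idea as href=True in findAll
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            text = ''.join(self.current_text).strip()
            self.links.append((self.current_href, text))
            self.current_href = None
        elif tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1


# Hypothetical page fragment, not taken from the real site.
SAMPLE_HTML = """
<ul>
  <li><b>News</b> <a href="/articles/one.html">First article</a></li>
  <li><a name="anchor-only">No href here</a></li>
</ul>
<a href="/about.html">Outside any list item</a>
"""

parser = LinkInListItemParser()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

Only the link that is both inside an <li> and has an href is collected; the anchor without an href and the link outside the list are skipped.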
Old 08-10-2011, 05:34 PM   #19
yoss15
Enthusiast
yoss15 began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
I have an idea of what the <div> tag is; I just never understand the recipe code that refers to it.

Thanks, I appreciate you looking at it. That makes a lot of sense, also that Firebug extension is a great help.

Here is my next problem. I really don't understand the whole indent thing in Python. It always seems to give me errors. For example, when I add

Code:
for post in section.findAll('a', href=True):
should it be indented under the other line? How do I properly indent it? Hitting tab seems to send it way too far, but even with 1-5 spaces it still gives errors. I just don't understand it.
Old 08-10-2011, 05:59 PM   #20
PeterT
Grand Sorcerer
PeterT ought to be getting tired of karma fortunes by now.
 
PeterT's Avatar
 
Posts: 12,752
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Python uses indentation to "nest" code. Be consistent and use spaces rather than tabs.
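A tiny illustration of what "nesting" by indentation means, written with spaces only. The section names and links are invented for the example, and four spaces per level is just the common convention, not a requirement of Python.

```python
# Hypothetical data: section title -> links found in that section.
sections = {'News': ['/a.html', '/b.html'], 'Arts': ['/c.html']}

lines = []
for section_title, links in sections.items():
    lines.append(section_title)       # 4 spaces: runs once per section
    for link in links:
        lines.append('  ' + link)     # 8 spaces: runs once per link
    lines.append('--')                # back to 4 spaces: once per section again

print('\n'.join(lines))
```

Every line that belongs to a loop is indented by the same amount under it; stepping back out to a shallower indent is what ends the inner loop.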

I'm not an expert, but I think you want something close to this:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        # Each <li> on the index page is treated as one section.
        for section in soup.findAll('li'):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            # Every <a> with an href inside this <li> is a candidate article.
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    # Turn a site-relative link into an absolute URL.
                    url = 'http://www.wsws.org' + url
                    title = self.tag_to_string(post)
                    # Only links carrying a non-empty class attribute are kept.
                    if str(post).find('class=') > 0:
                        klass = post['class']
                        if klass != "":
                            self.log()
                            self.log('--> post:  ', post)
                            self.log('--> url:   ', url)
                            self.log('--> title: ', title)
                            self.log('--> class: ', klass)
                            articles.append({'title': title, 'url': url})
            # Skip sections that produced no articles.
            if articles:
                feeds.append((section_title, articles))
        return feeds
So the idea is that you loop through all the sections, which are identified by "li" entries, and then for each section found use the loop

Code:
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org' + url
                    title = self.tag_to_string(post)
                    if str(post).find('class=') > 0:
                        klass = post['class']
                        if klass != "":
                            self.log()
                            self.log('--> post:  ', post)
                            self.log('--> url:   ', url)
                            self.log('--> title: ', title)
                            self.log('--> class: ', klass)
                            articles.append({'title': title, 'url': url})
to append each article to that section's list of articles.
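To make the shape of the result concrete, here is a rough sketch of the list that parse_index hands back, built from plain Python data so it runs without calibre. The helper name build_feeds and the sample links are hypothetical, it uses urljoin where the recipe simply concatenates strings, and it leaves out the class check from the recipe.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.wsws.org'   # the same base the recipe prepends


def build_feeds(sections):
    """sections: list of (section_title, [(href, link_text), ...]) pairs."""
    feeds = []
    for section_title, links in sections:
        articles = []   # start a fresh list for each section
        for href, text in links:
            if not href.startswith('/'):
                continue   # like the recipe, keep only site-relative links
            articles.append({'title': text, 'url': urljoin(BASE_URL, href)})
        if articles:       # sections with no usable links are dropped
            feeds.append((section_title, articles))
    return feeds


# Hypothetical input, standing in for what the two findAll loops would yield.
sample_sections = [
    ('News', [('/articles/one.html', 'First article'),
              ('http://example.com/x', 'External link')]),
    ('Empty', []),
]
feeds = build_feeds(sample_sections)
print(feeds)
```

The result is a list of (section title, list of article dictionaries) pairs, which is the structure the recipe builds with feeds.append((section_title, articles)).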
Old 08-10-2011, 06:20 PM   #21
yoss15
Enthusiast
yoss15 began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
Thanks for the suggestion, Peter. After trying that out, it gets stuck at the klass = post['class'] line. I'm not sure that I need those lines, because they are for getting rid of extra links and my links seem pretty straightforward. I also think that klass has something to do with the other specific page, but I'm not sure.

Ugh I'm still having a hard time with indentation errors and I'm not sure what to do.

Calibre will tell me that I have an error on line 29, so I look at it in Komodo Edit to match up the line numbers. I have gone as far as deleting line 29, but I still get the error. I don't know what the problem could be.
Old 08-11-2011, 09:44 AM   #22
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by yoss15 View Post
How do I properly indent it? Hitting tab seems to send it way too far, but even with 1-5 spaces it still gives errors. I just don't understand it.
Don't use any tabs; use only spaces. Within any one block of code, indent every line by the same amount. PeterT's indents look correct to me. I prefer running my recipe like this:
Code:
ebook-convert _Test_1.recipe _Test_1 --test   -vv > _Test.txt
That runs my _Test_1.recipe file and puts the html produced by my recipe into the _Test_1 folder and all of my print statements and verbose comments into the _Test.txt file. I use underscores so those files are all at the top of my directory list, and I keep the recipe and text output (_Test.txt) files open in my editor at all times. I also have a batch file that runs the line above. I run the file, read the output file to see any errors or print statement output, then revise and run again.
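If indentation errors keep coming back, a quick check like the following can point at the offending lines before running the recipe. This is only a rough heuristic: the function name find_indent_problems is made up, and the "multiple of 4 spaces" rule is an assumed convention (Python itself accepts other widths, and wrapped continuation lines may be flagged falsely).

```python
def find_indent_problems(source_text):
    """Return (line_number, reason) pairs for lines with suspicious leading whitespace."""
    problems = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        stripped = line.lstrip(' \t')
        if not stripped:
            continue   # ignore blank lines
        indent = line[:len(line) - len(stripped)]
        if '\t' in indent:
            problems.append((number, 'tab in indentation'))
        elif len(indent) % 4 != 0:
            problems.append((number, 'indent is not a multiple of 4 spaces'))
    return problems


# Hypothetical snippet with one tab-indented line and one 3-space line.
sample = "for a in range(2):\n\tprint(a)\n   print('x')\n    print('ok')\n"
report = find_indent_problems(sample)
for number, reason in report:
    print(number, reason)
```

To check a real recipe, read the file's text first (for example with open('_Test_1.recipe').read()) and pass it to the function.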
All times are GMT -4.