Old 11-27-2015, 05:51 AM   #16
didsbury
Member
didsbury began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Jan 2011
Device: sony prs-650
https://www.dropbox.com/s/cktwcc20jc...ibre1.rtf?dl=0

Can you see these three screenshots?
Old 11-27-2015, 06:52 AM   #17
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The size difference is simply because the new recipe does not reduce image quality.
Old 12-14-2015, 04:38 AM   #18
paddyrm
Connoisseur
paddyrm began at the beginning.
 
Posts: 67
Karma: 10
Join Date: Oct 2012
Device: Kindle 3
Supplemental feeds

The Guardian changed its web format drastically in November 2015. Prior to that, articles from the extra sections were stored in named folders, eg "Cook", "G2" etc, and the old script would scrape all these in. A member of the Guardian's User Help team sent me a link to a missing article from the Cook section, pointing me to the URL www.theguardian.com/lifeandstyle/2015/nov/14/, and further investigation showed that nearly all articles from supplements are now stored in date folders.

Following Kovid's recommendation on adding feeds, I added these lines to the bottom of the Guardian recipe:

    # Goes inside the recipe class: front page first, then today's dated folder for each section
    def parse_index(self):
        feeds = self.parse_section(self.base_url)
        feeds += self.parse_section('http://www.theguardian.com/politics/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/lifeandstyle/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/uk/commentisfree/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/travel/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/lifeandstyle/food-and-drink/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/tv-and-radio/' + strftime('%Y/%b/%d'))
        feeds += self.parse_section('http://www.theguardian.com/theguardian/theguide/' + strftime('%Y/%b/%d'))
        return feeds

and this works well for the Saturday Guardian which is my main interest. Other sections can be added for other days as needed.
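
For anyone who wants to see how the dated URLs are put together, here is a minimal standalone sketch in plain Python (it does not need calibre). The names BASE, MONTHS and dated_section_url are made up for illustration and are not part of the recipe. It spells the month in lower case to match the 2015/nov/14 style of the link above; strftime('%Y/%b/%d') typically gives a capitalised month such as Nov, which seems to work too.

```python
from datetime import date

# Hypothetical names for illustration only; none of these exist in the recipe.
BASE = 'http://www.theguardian.com/'
MONTHS = ('jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec')


def dated_section_url(section, day):
    """Return the dated folder URL for one section on one day."""
    return '%s%s/%d/%s/%02d' % (
        BASE,
        section.strip('/'),        # tolerate 'travel' or 'travel/'
        day.year,
        MONTHS[day.month - 1],     # lower-case month, as in 2015/nov/14
        day.day,                   # zero-padded, like strftime's %d
    )


print(dated_section_url('lifeandstyle', date(2015, 11, 14)))
```

Each URL built this way could then be handed to self.parse_section() exactly as in the lines above.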

For it to work two lines need to be added near the top of the script:

from calibre import strftime
(I have it at line 11.) This brings in the PC's clock time via calibre, for use in the feed URLs above. I have used the trick of resetting my PC clock to a previous Saturday to scrape an earlier issue!

ignore_duplicate_articles = {'title', 'url'}
(My line 38.) This is needed because there may be several links to the same article in different parts of the newspaper.
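
To show what that setting is for, here is a rough sketch of the idea in plain Python. This is not calibre's actual code, only an illustration: drop_duplicates is a made-up name, and I am assuming the feeds are a list of (section title, list of article dicts) pairs, which is the shape parse_index returns. An article is dropped if its title or its URL has already been seen in an earlier section.

```python
def drop_duplicates(feeds):
    """Keep only the first occurrence of each article across all sections.

    Illustration only (hypothetical helper, not calibre's implementation).
    """
    seen_titles, seen_urls = set(), set()
    result = []
    for feed_title, articles in feeds:
        kept = []
        for article in articles:
            # Same title OR same URL counts as a duplicate
            if article['title'] in seen_titles or article['url'] in seen_urls:
                continue
            seen_titles.add(article['title'])
            seen_urls.add(article['url'])
            kept.append(article)
        result.append((feed_title, kept))
    return result


feeds = [
    ('Front page', [{'title': 'Mince pies', 'url': 'http://example.com/a'}]),
    ('Food', [{'title': 'Mince pies', 'url': 'http://example.com/a'},
              {'title': 'Mulled wine', 'url': 'http://example.com/b'}]),
]
for section, articles in drop_duplicates(feeds):
    print(section, [a['title'] for a in articles])
```

The example.com addresses are placeholders, not real Guardian links.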

Hope this may be of some use to other Guardian readers dismayed by the loss of wanted supplements! And thanks to Kovid for very helpful suggestions.

Paddy
Old 12-14-2015, 09:56 AM   #19
didsbury
Thanks Paddy, but I'm afraid all that is beyond me. I don't understand!

Kieran
Old 12-14-2015, 01:39 PM   #20
paddyrm

Kieran, if you go to add a custom news source and choose to customise the built-in recipe for The Guardian and Observer, you will find Kovid has done most of the work for you (see message #12 in this thread). Scroll down to the bottom of the recipe and you will see he has added the Sports section. It looks like this:

    # Front page plus the Sport section; the second argument is the prefix for its section titles
    def parse_index(self):
        feeds = self.parse_section(self.base_url)
        feeds += self.parse_section('http://www.theguardian.com/uk/sport', 'Sport - ')
        return feeds

I don't want the Sports section, so I took that out and replaced it with the sections I do want, eg Travel. But I had to add the dates, otherwise the file is enormous (it swells from 7MB to 72MB!), I assume because it scrapes everything it finds. The date relates to a specific issue, eg 2015/dec/12 for last Saturday.
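
If it helps to see how the date for "last Saturday" could be worked out rather than typed in, here is a small sketch in plain Python. The function name last_saturday is my own invention and is not in the recipe; the recipe itself just uses today's date from strftime.

```python
from datetime import date, timedelta


def last_saturday(today):
    """Return the most recent Saturday on or before the given date."""
    # weekday(): Monday is 0 ... Saturday is 5, Sunday is 6
    days_back = (today.weekday() - 5) % 7
    return today - timedelta(days=days_back)


# Monday 14 December 2015 gives Saturday 12 December 2015
print(last_saturday(date(2015, 12, 14)))  # prints 2015-12-12
```

The result could then be formatted into the year/month/day part of a section URL in place of today's date.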

Does that help, or make it worse?

Paddy
Old 12-14-2015, 01:45 PM   #21
didsbury
Thanks, I'll let you know after I've had a play tomorrow. It doesn't look much like COBOL which I was familiar with in the early 80s!
Old 12-15-2015, 04:55 AM   #22
paddyrm
My expertise was Z80 machine code I'm afraid, dates me a bit!! So Python takes a bit of getting used to, but nice to dabble with a purpose...

Paddy
Old 12-16-2015, 04:09 AM   #23
paddyrm
Quote:
Originally Posted by didsbury View Post
Thanks, I'll let you know after I've had a play tomorrow
Tip: try one section at a time, eg Travel or Lifeandstyle, get that working then build up the rest of the sections you want. -- Paddy
Old 12-16-2015, 04:13 AM   #24
didsbury
Thanks for the tips. Unfortunately my back has decided to trap a nerve, so I can neither sit down nor stand up comfortably.

I'm taking a break from this till it clears up.
Old 12-16-2015, 01:46 PM   #25
paddyrm
Quote:
Originally Posted by didsbury View Post
Thanks for the tips. Unfortunately my back has decided to trap a nerve, so I can neither sit down nor stand up comfortably.

I'm taking a break from this till it clears up.
Sorry to hear that, hope it clears before Christmas, though it would be a good excuse to drink lots! I could always email you the script to copy into a new custom recipe, which worked well with my No 2 son, another G reader.

Paddy
Old 12-16-2015, 01:51 PM   #26
didsbury
The ibuprofen is starting to work and I've found that I can use the computer chair if it's tilted just so.

So I've been fiddling with some success.

Probably I'll have some further questions, but thanks again to you both.

Kieran
Old 12-16-2015, 01:54 PM   #27
didsbury
Quote:
Originally Posted by paddyrm View Post
Sorry to hear that, hope it clears before Christmas, though it would be a good excuse to drink lots! I could always email you the script to copy into a new custom recipe, which worked well with my No 2 son, another G reader.

Paddy
Our posts cross!

Yes I would like to peruse your code.
Old 12-17-2015, 05:36 AM   #28
paddyrm
Guardian recipe with dates

Quote:
Originally Posted by didsbury View Post
Our posts cross!

Yes I would like to peruse your code.
Kieran, it's attached to this message. It's a text file (Notepad); just copy and paste it into a new custom news source and it will name itself automatically when you save it.

I'm also on Ibuprofen, strained my shoulder planing a rain-swollen hardwood door yesterday. Couple of old crocks!

Paddy
Attached Files
File Type: txt Dated Guardian.txt (3.9 KB, 245 views)
Old 12-18-2015, 06:09 PM   #29
Worzel
Junior Member
Worzel began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Feb 2015
Device: Sony prs-t3
Hi, I've been having problems with the Guardian too for the last few weeks. Even though I've next to zero knowledge of programming, I tried the previous copy and paste fix [thanks Paddy].

It worked but it still didn't have the section I was looking for, that's been missing from the download for 3 weeks now...
culture>books

If anybody could work up the lines of code that I could paste into the previous custom recipe I'd be very grateful.
Old 12-19-2015, 04:57 AM   #30
didsbury
I haven't tried to download this section but if you go onto the web page for books, it is:

http://www.theguardian.com/books

rather than:

http://www.theguardian.com/culture/books/

Similarly I wanted the Tech section but I see that is in fact now called Technology.

Edit: I have taken Paddy's file from the post above and added 2 book sections. It is very rough and ready code but produced lots of book section stuff.
Attached Files
File Type: txt Guardian incl books.txt (4.2 KB, 111 views)