09-28-2010, 03:57 PM | #1 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
maya recipe
this one is hard, and it is Hebrew only.
I want to do this step by step so that I understand what I am doing. I want to create a recipe for this page and similar pages. If you open the page, you will see a list of articles on the right-hand side. The actual link to each article is the 2nd link in each pair. I have noticed that all the relevant links (and only those) have an id="SubjectHref*" (the * stands for some digits). The URLs I want to get in stage one are 'http://maya.tase.co.il/' + [the href from the <a> tag with id SubjectHref*]. I then need to do the same on the next page (see the bottom of the page). This is the code I have so far, and I am a little lost now. It is built on the NZ Herald recipe. Can someone tell me if this is the right way? Spoiler:
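A minimal sketch of stage one as described above (an illustration, not the code from the spoiler). It uses only the Python standard library so it runs outside calibre; inside a recipe, the more usual route would probably be `self.index_to_soup(url)` followed by a `findAll` on `<a>` tags with a regex on the `id` attribute. The sample markup and its paths are made up, and the id pattern assumes the `*` is always digits.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://maya.tase.co.il/'
# Assumption: the "*" in SubjectHref* is one or more digits.
SUBJECT_ID = re.compile(r'^SubjectHref\d+$')


class SubjectLinkParser(HTMLParser):
    """Collect the href of every <a> whose id looks like SubjectHref<digits>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        if SUBJECT_ID.match(attrs.get('id') or '') and attrs.get('href'):
            # Turn the relative href into an absolute URL.
            self.links.append(urljoin(BASE_URL, attrs['href']))


def subject_links(html):
    """Return absolute URLs for all SubjectHref* links in the given markup."""
    parser = SubjectLinkParser()
    parser.feed(html)
    return parser.links


# Made-up sample markup, only to show the filtering.
SAMPLE = '''
<a id="CompanyHref1" href="example/company?id=1">company</a>
<a id="SubjectHref1" href="example/report?report_cd=111">report</a>
'''
print(subject_links(SAMPLE))
```

Only the second link survives, because the first link's id does not match the SubjectHref pattern.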
|
09-28-2010, 04:23 PM | #2 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
On first look at that thing, why not do something in the area of this:
You said you wanted the second link, for example: http://maya.tase.co.il/bursa/report....port_cd=570152 It always has report_cd in it, so why not just follow it with a regex match? Spoiler:
or maybe use something like this: Spoiler:
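A minimal sketch of the regex idea (an illustration, not the code from the spoilers above): keep only the hrefs that carry a `report_cd` parameter. The function name and the sample hrefs are hypothetical.

```python
import re

# Matches any href that contains report_cd=<digits>.
REPORT_LINK = re.compile(r'report_cd=(\d+)')


def report_links(hrefs):
    """Keep only the hrefs that carry a report_cd parameter."""
    return [h for h in hrefs if REPORT_LINK.search(h)]


# Made-up hrefs, only to show the filtering.
hrefs = ['example/company?id=7', 'example/report?report_cd=570152']
print(report_links(hrefs))
```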
Last edited by TonytheBookworm; 09-28-2010 at 05:25 PM. Reason: edited code |
|
09-30-2010, 03:56 PM | #3 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
Hey Tony!
The reason I didn't use a regex to follow the link is that I haven't wrapped my head around regexes yet. I don't fully understand the concept. I tried running your code.
When I used the 1st one, I got raw HTML from the feed page. When I used the 2nd one, I got "NameError: global name 're' is not defined". I'll have to read it a bit more (after a good night's sleep). I am going to work on it some more.... |
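That NameError usually means the `re` module was never imported in the file that uses it. A minimal sketch, with a hypothetical helper name: put the import at the top of the recipe file, before any code that calls `re`.

```python
import re  # without this line, any call such as re.search raises NameError


def is_report_link(href):
    """Return True when the href carries a report_cd parameter."""
    return re.search(r'report_cd=\d+', href) is not None


print(is_report_link('example/report?report_cd=570152'))
```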
09-30-2010, 04:09 PM | #4 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
the second set of code works. I tested it. It is up to you to clean it up and get what you want and get rid of what you don't. but as far as getting the link you wanted here is what i did. Spoiler:
Last edited by TonytheBookworm; 09-30-2010 at 04:14 PM. Reason: added code |
|
10-01-2010, 03:12 AM | #5 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
i went over it again
and you are right, it works.
So I wanted to take it to the next step. The pages at the URLs that you found contain the clean version of the reports I am trying to get. It is the "src" attribute of the iframe tag (in some cases; I want to do this step by step). So I added a sub-function and gave it all the information it needs to do what you did. Spoiler:
When I run it, I get "NameError: global name 'make_links1' is not defined". It looks right to me; I have no idea what I did wrong. |
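The spoiler code is not shown, so the cause cannot be confirmed, but one common way to get this kind of NameError in Python is to define a helper as a method of the recipe class and then call it without `self.`. A minimal sketch, where `DemoRecipe` is a hypothetical stand-in and not calibre's `BasicNewsRecipe`:

```python
class DemoRecipe:
    """Stand-in for a recipe class, used only to show the name lookup."""

    def make_links1(self, url):
        # Hypothetical helper; the real one would fetch and parse the page.
        return url + '#clean'

    def broken(self, url):
        # Looks for a module-level function named make_links1, which does
        # not exist, so this raises NameError when called.
        return make_links1(url)

    def working(self, url):
        # Methods defined on the class are reached through self.
        return self.make_links1(url)


recipe = DemoRecipe()
print(recipe.working('http://example.invalid/report'))
```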
|
10-01-2010, 03:14 PM | #6 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
make_links is a built in function so of course make_links1 is not valid
In other words put the new stuff in with the old stuff because there is no point in reinventing the wheel. I don't have the time to debug your code but basically do a for loop to find some stuff and then do whatever you need to with it. then do another for loop to find other stuff. and append it to the article list like i showed you. You can even take and rename title to temp1 after the first for loop if you like. Last edited by TonytheBookworm; 10-01-2010 at 03:24 PM. |
10-02-2010, 01:27 PM | #7 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
i see.
I changed it to fit. My 2nd soup call is not opening the URL (temp2) and souping the HTML file that the URL leads to. It is just souping the URL itself. What am I doing wrong?
Spoiler:
|
10-07-2010, 04:19 PM | #8 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
is it possible to soup twice?
Can I soup the URL that I found in the 1st soup?
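In general, parsing twice is fine as long as the second parse is fed the document behind the URL rather than the URL string. A minimal standard-library sketch of that two-step pattern follows; `fetch` and `FAKE_PAGES` are hypothetical stand-ins for a real download, and the sample markup is made up. In a calibre recipe, `self.index_to_soup` is documented as accepting either a URL or raw markup, so a string that does not look like a full URL may end up being parsed as markup.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


# Hypothetical stand-in for the network: maps a URL to the page it returns.
FAKE_PAGES = {
    'http://example.invalid/report?report_cd=1':
        '<html><body><iframe src="/clean/1.htm"></iframe></body></html>',
}


def fetch(url):
    """Pretend download; a real recipe would let calibre fetch the page."""
    return FAKE_PAGES[url]


def iframe_sources(url):
    html = fetch(url)          # step 1: get the document the URL points to
    parser = IframeSrcParser()
    parser.feed(html)          # step 2: parse that document, not the URL
    return parser.sources


print(iframe_sources('http://example.invalid/report?report_cd=1'))
```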
|
10-07-2010, 09:42 PM | #9 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
10-08-2010, 02:08 AM | #10 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
i found my bug
but i need some help fixing it.
I marked it HERE in the code. I want to format the URL as u"www....com", but I am giving it a simple string. I tried ' and " and []. I still can't get the syntax right. Can I get some help with that? Spoiler:
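A minimal sketch of building a full URL from a scraped href (an illustration, not the code from the spoiler). It is written for Python 3, where every `str` is already Unicode; the `u"..."` prefix is literal syntax only and cannot be wrapped around a variable. The helper names and the sample href are hypothetical.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = 'http://maya.tase.co.il/'


def absolute_url(href):
    """Join a scraped href onto the site root, dropping stray whitespace."""
    return urljoin(BASE_URL, href.strip())


def looks_fetchable(url):
    """True when the string has an http(s) scheme, so it reads as a URL."""
    return urlparse(url).scheme in ('http', 'https')


url = absolute_url(' example/page?x=1 ')
print(url, looks_fetchable(url))
```

A string such as 'www.example.com/a' has no scheme, so `looks_fetchable` returns False for it; that kind of string is a likely candidate for being treated as raw text rather than fetched.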
|
10-08-2010, 09:13 AM | #11 | ||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
And: I don't see any iframes in soup1? Quote:
|
||||
10-09-2010, 05:12 PM | #12 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
Sometimes all you need is someone to hit you over the head with the answer.
So thanks, Starson17. It now downloads my articles (there is still a lot of work, but I get news at the end, and not an error).
I didn't think of this when I started, but can calibre deal with PDF files? Some of the reports come in PDF form. I get gibberish where the PDF used to be. Can I do anything about it? Does it matter if my output format is PDF? This is the code: Spoiler:
The 2nd article is a PDF file. (I am working with a feed that is very rarely updated, so I know the page format very well.) Can I import a library that deals with PDF? Thanks for the help. P.S. I also wanted to know if you can add an output file type to the recipe itself that will override the default for calibre (if the default is PDF, but I want one self-built recipe to come out as EPUB?) |
10-09-2010, 09:34 PM | #13 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
10-09-2010, 10:01 PM | #14 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Spoiler:
I'm just not sure what the actual variable name is. Maybe it is 'output_format': epub or something like that. Kovid can you chime in on this one please ? |
|
10-11-2010, 03:19 PM | #15 |
creator of calibre
Posts: 44,428
Karma: 24044628
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
you cannot override the output format from within a recipe.
Trying to extract text from PDFs is not going to be easy. Just try converting your PDF in calibre to see what will happen. |
|