09-23-2010, 11:57 AM | #31 |
Grand Sorcerer
Posts: 11,950
Karma: 7225107
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
And I, as the marker, would have to give you credit. Wouldn't be the first time I was caught out for writing bad questions. The good part is that it is almost always the good students who figure out the ambiguity.
|
09-23-2010, 12:14 PM | #32 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
So you are that proverbial literal-minded professor
http://www.snopes.com/college/exam/choice.asp |
|
09-23-2010, 01:11 PM | #33 | |
Grand Sorcerer
Posts: 11,950
Karma: 7225107
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Well, it depended on the situation, the question, and the student, but yes, I did sometimes give credit in situations like this. My upper-division classes were reasonably small (10 to 20 people), so I could know the students well enough to tell if someone was jerking my chain or really had no clue. I had a situation once where I demonstrated that 3 students plagiarized the final project of a fourth. I wrote individual final exams, where for the three students the first question (10% of the exam) was 'explain in detail why your code is identical to XXX's'. Two answered 'because I copied it', and I gave them the exam points. They failed the project, though. The third failed both. Several students told me that I had a reputation for very hard, but fair, grading. |
|
09-23-2010, 01:36 PM | #34 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
To get this thread back on track: you will find flags, such as DOTALL, used in many of the recipes. Customizing recipes occupies the same advanced-user middle ground as the advanced conversion options do.
|
09-23-2010, 03:28 PM | #35 |
Junior Member
Posts: 2
Karma: 10
Join Date: Sep 2010
Device: nook
|
REGEX help filename parsing
I am fairly new to REGEX, but I think I have a handle on it so far, in Calibre anyway! Forgive me for skipping ahead; I skimmed the forum and didn't see my question, so if I missed it elsewhere, just point me to it. Thanks.
I have some books named like so:
Lauthor, Fauthor - series ##- title.ext
Grant, Maxwell - The Shadow 331 - Mark Of The Shadow(b).txt
My best so far is
(?P<author>.+) - (?P<series>.+) - (?P<title>[^_]+)
but now the series index is part of the series. What is the variable name for the series index? And how do I change the title section to drop the crap at the end, like (b)? I can rename files as needed in most cases. Can anyone help me get author, series, index, and title out of this?
Grant, Maxwell - [The Shadow 331] - Mark Of The Shadow(b).txt
Thanks
CalibreUser |
|
09-23-2010, 03:31 PM | #36 |
Enthusiast
Posts: 38
Karma: 134
Join Date: Feb 2010
Location: ENGLAND
Device: kindle dx
|
Just go to Edit metadata and you can change all of these manually. I would suggest signing up on the ISBN website; it's free and easy.
|
09-23-2010, 03:43 PM | #37 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
^((?P<author>([^\-_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+)(\s*-\s*)?(?P<series_index>[0-9.]+)\s*-\s*)?(?P<title>[^\-_0-9\(]+) |
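For readers who want to try this outside Calibre, here is a minimal Python sketch of a simpler pattern for the one filename layout asked about above. The pattern and the helper `parse_filename` are hypothetical illustrations, not the pattern posted above and not Calibre's own code; only the group names `author`, `series`, `series_index` and `title` are taken from the thread.

```python
import re

# Hypothetical, simplified pattern for "Author - Series NN - Title(extra).ext".
# Group names follow the ones used in the pattern posted above.
FILENAME_RE = re.compile(
    r"^(?P<author>[^-]+?)\s+-\s+"                                # up to the first " - "
    r"(?P<series>.+?)\s+(?P<series_index>\d+(?:\.\d+)?)\s*-\s+"  # series name, then its number
    r"(?P<title>[^(_]+)"                                         # title stops at "(" or "_"
)


def parse_filename(name):
    """Return a dict of the named groups, or None if the name does not fit."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


result = parse_filename("Grant, Maxwell - The Shadow 331 - Mark Of The Shadow(b).txt")
print(result)
```

This sketch only handles names that really contain a series and an index, and it relies on the "(" to end the title; a name without a parenthesised suffix would keep its extension in the title if the extension is still attached when the pattern runs.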
|
09-23-2010, 05:01 PM | #38 |
Wizard
Posts: 3,454
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Great post.
I have a few suggestions.
At the very beginning of the first post you might put something like: "This is the fourth version of the guide, and it was amended using various suggestions in subsequent posts." I do not suggest this to get credit for a suggestion or two, but without this explanation a first-time reader might find some of the following posts ... superfluous.
I also suggest that the text of this post be included in the official Calibre documentation, or at the very least, that the Calibre documentation point to this post. Another good place for preserving this thread would be our Wiki.
Regular expressions are very widespread, and yet GOOD documentation explaining them from a beginner's point of view is relatively hard to find. The documentation for a programming language or a text editor is usually written as a reference manual, describing all the options in a rather terse, concentrated manner. As you can see for yourself, even a relatively simple description of a few selected features gets quite lengthy.
My favourite tool for using regular expressions is the Vim text editor. It also has some of the very best documentation I have seen. Unfortunately, its syntax is a little different from Python REs, but the principle remains the same.
----------------
Now, let's see how we can improve the introduction.
First of all, now that you have introduced the pipe '|' for providing different branches, you have to explain the rules of precedence a little bit ;-)
A pipe '|' has the lowest precedence. So if you write the RE 'abcd|efgh', it will match the whole string 'abcd' OR the whole string 'efgh', and not 'abc' followed by either 'd' or 'e' and then followed by 'fgh'. If we wanted to do that, we would have to write 'abc(d|e)fgh'. I know, it should be obvious from your example, but there are a few interesting twists here.
Now, I can hear you asking: so now, instead of '[1234]', I can write '(1|2|3|4)'? Well, yes, you can. BUT!
'[1234]+' will match strings like '1212' or '444' or '34': literally any member of the set [1234] followed by any other members of the set. '(1|2|3|4)+' matches exactly the same strings, because the plus quantifier tries every alternative again on each repetition; it does not insist on repeating the '3' it selected the first time. The differences are that the parentheses also create a group that remembers only the last repetition, and that the set is shorter to write. If you really want "the same digit again", so that '111', '22' or '44444' match but '12' or '34' do not, you need a back-reference to the group: '(1|2|3|4)\1*'.
Let's get back to the precedence rules. Quantifiers apply only to the preceding atom. An atom (and that should have been explained at the very beginning, but we did not want to scare the reader away) is:
- a single character, such as 'a', 'q', '2' or ';', that simply matches itself
- a dot '.', which stands for any character
- a special escape sequence, such as '\t' (a tabulator) or '\D' (a non-digit character)
- a character set, such as [a-zA-Z] or [^>]
- several atoms that you want to make into one atom, enclosed in a pair of parentheses, such as (<[^>]+>)
So, if I write the RE 'ab+', it will match 'ab' or 'abbbbbb', but not 'abab', because the plus quantifier applies only to the preceding atom. If we wanted to match 'abab' or 'ababab', we would need to write the regular expression like this: '(ab)+'
I will continue later. At this moment I am going to sleep, but there are a few things that still need to be explained, such as:
- referencing parentheses using the \1, \3 notation
- anchors
- interesting extensions (? ... )
- more quantifiers {m,n} (not that I consider them particularly useful in the regular expressions typically used in Calibre)
We should also develop a few very typical examples, useful for an ordinary user, such as processing a filename that *might* contain series information (here we will use the pipe '|' to process several branches, with and without series info). So, please, if you want to solve your typical problem, post it here, so we can develop some examples using real-life situations.
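The precedence and quantifier rules discussed in this post can be checked directly in Python's `re` module. This is a small sketch: the helper `full` is a hypothetical convenience wrapper around `re.fullmatch`, which succeeds only when the whole string matches. Note that in Python's `re` a quantified group tries every alternative again on each repetition, so '(1|2|3|4)+' does match '34'; a back-reference is what forces a repeat of the same text.

```python
import re


def full(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# The pipe has the lowest precedence: 'abcd|efgh' means 'abcd' OR 'efgh'.
print(full(r"abcd|efgh", "abcd"))        # True
print(full(r"abcd|efgh", "abcefgh"))     # False
print(full(r"abc(d|e)fgh", "abcefgh"))   # True

# A quantifier binds only to the preceding atom.
print(full(r"ab+", "abbbbbb"))           # True
print(full(r"ab+", "abab"))              # False
print(full(r"(ab)+", "ababab"))          # True

# Each repetition of a group tries every alternative again.
print(full(r"[1234]+", "34"))            # True
print(full(r"(1|2|3|4)+", "34"))         # True
# A back-reference repeats the text the group actually matched.
print(full(r"(1|2|3|4)\1*", "444"))      # True
print(full(r"(1|2|3|4)\1*", "34"))       # False
```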
Disclaimer: Please feel free to use any portion of my text for improvement of the "introduction" |
09-23-2010, 05:58 PM | #39 |
creator of calibre
Posts: 44,384
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I do intend to add this to the User Manual (with Manichean's permission) when he is done updating it.
|
09-23-2010, 06:48 PM | #40 | ||
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Quote:
I don't know about the superfluous following posts, though. I see the first post to be kind of stand-alone and the thread to be a discussion of what can and should be improved. I hoped to have made that point through the introductory and final comments, do you think I should clarify? Quote:
With pleasure. Though I can't tell you when that will be, because everyone seems to constantly come up with new and valid points. |
||
09-23-2010, 06:56 PM | #41 | ||
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Quote:
Quote:
|
||
09-23-2010, 07:02 PM | #42 |
Well trained by Cats
Posts: 30,397
Karma: 58055234
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
My 2 cents
Put a CURRENT Revision Date at the beginning (or in the title). Follow that with a note that this is a Living Document and that the First Post will be revised based upon input received later on in the thread (no need to drill down to see if there are changes). |
09-23-2010, 07:10 PM | #43 | |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Quote:
|
|
09-23-2010, 07:48 PM | #44 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
I think moving the flags discussion to the front brings up a rather advanced topic a little early. Ignorecase is handy, but it can be worked around easily enough, and re.DOTALL is only useful in specific cases. I'd put them in the end as an addendum.
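For reference, here is a small Python sketch of what the two flags do and how they can be worked around by hand. The sample string and variable names are made up for illustration.

```python
import re

text = "hello,\nWORLD!"

# Without flags: case must match exactly and '.' does not match a newline.
plain = re.search(r"Hello,.World!", text)

# Inline flags at the very start of the pattern: i = IGNORECASE, s = DOTALL.
inline = re.search(r"(?is)Hello,.World!", text)

# The same flags passed as an argument instead.
explicit = re.search(r"Hello,.World!", text, re.IGNORECASE | re.DOTALL)

# Working around both flags by hand: spell out the cases and use [\s\S] for "any character".
by_hand = re.search(r"[Hh]ello,[\s\S][Ww][Oo][Rr][Ll][Dd]!", text)

print(plain is None, inline is not None, explicit is not None, by_hand is not None)
```

One caveat: since Python 3.11 the `re` module accepts inline global flags such as `(?is)` only at the start of the expression, so a pattern with the flags at the end is rejected there.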
Also, you repeat this example twice:
Code:
Hello, World!(?is)
Code:
(?is)Hello, World!
This was the first example:
Code:
"Maybe, but the cops feel like you do, Anita. What's one more dead vampire? New laws don't change that." </p><p class="calibre4"> <b class="calibre2">Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html" class="calibre3">erter, http://www.processtext.com/abclit.html</a></b></p><p class="calibre4"> It had only been two years since Addison v. Clark. The court case gave us a revised version of what life was
Code:
<p*?>\s*.*?Generated\s+by\s+ABC?\s+Amber.*?</p>
Code:
<p class="calibre4">I looked directly at him for a moment. His eyes were still brown. He caught me looking, and I looked down at my desk.</p> <p class="calibre4">Willie laughed, a wheezing <b class="calibre2">Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html" class="calibre3">erter, http://www.processtext.com/abclit.html</a></b>snicker of a sound. The laugh hadn't changed. "Geez, I love it. You're afraid of me."</p> <p class="calibre4">"Not afraid, just cautious."</p>
Code:
<b*?>.*?Generated\s+by\s+ABC?\s+Amber.*?</b> |
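To experiment with the second sample outside Calibre, here is a hedged Python sketch. `WATERMARK_RE` is a hypothetical variant, not the pattern quoted above: in Python's `re`, '<b*?>' means '<' followed by zero or more letters 'b' and then '>', so it cannot match an opening tag that carries attributes, whereas '<b\b[^>]*>' can.

```python
import re

# Hypothetical variant: '<b\b[^>]*>' matches a <b> tag with any attributes
# (the \b keeps it from matching <body> or <br>); '.*?' then runs lazily up
# to the first closing </b>. DOTALL lets '.' cross line breaks as well.
WATERMARK_RE = re.compile(
    r"<b\b[^>]*>\s*Generated\s+by\s+ABC\s+Amber.*?</b>", re.DOTALL
)

sample = (
    '<p class="calibre4">Willie laughed, a wheezing '
    '<b class="calibre2">Generated by ABC Amber LIT Conv'
    '<a href="http://www.processtext.com/abclit.html" class="calibre3">'
    'erter, http://www.processtext.com/abclit.html</a></b>'
    'snicker of a sound.</p>'
)

cleaned = WATERMARK_RE.sub("", sample)
print(cleaned)

# The literal reading of '<b*?>' finds nothing in a tag that has attributes.
print(re.search(r"<b*?>", '<b class="calibre2">'))
```

The sketch removes only the watermark element and leaves the surrounding paragraph in place, which is what the second sample needs because the watermark sits in the middle of a sentence.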
09-23-2010, 09:24 PM | #45 |
US Navy, Retired
Posts: 9,865
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
I don't know if it makes a difference in the search but as you change the language from html to C++ to java to python the editor changes its behavior.
|