#16 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
http://www.regular-expressions.info/lookaround.html They provide a thorough explanation, and break down the examples.
For my example:
Code:
<span class="none2">((?:(?!<span).)*?)</span>
Code:
<span class="none2">inner text</span>
Code:
((?:(?!<span).)*?)
Code:
(?:(?!<span).)*?
Code:
(?:(?!<span).)
(plus a confusing "?" after the star; it is not needed to make the group optional, since the star already does that, but it does make the repetition lazy, and I seem to have copied it from the original source)
This group contains the negative lookahead (a zero-length assertion)
Code:
(?!<span)
So, putting it all back together, the dot-match-all must be preceded by the negative lookahead, and this "any character that does not begin an opening span tag" is then grouped and repeated zero or more times, then captured as "\1" to produce the "inner text" which should be saved from in between the span. |
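To see the breakdown in action, here is a minimal sketch using Python's `re` module. The sample strings are made up for illustration, and `re.DOTALL` is an assumption standing in for the "dot-match-all" mode mentioned in the post.

```python
import re

# The pattern from the post. re.DOTALL lets "." match newlines as well
# (assumed here to correspond to the editor's dot-match-all option).
pattern = re.compile(
    r'<span class="none2">((?:(?!<span).)*?)</span>',
    re.DOTALL,
)

# Hypothetical sample markup with two simple spans.
sample = '<p><span class="none2">inner text</span> and <span class="none2">more</span></p>'

# findall returns capture group 1 (the "\1" of the post) for each match.
print(pattern.findall(sample))

# Replacing each match with \1 keeps the inner text and drops the span.
unwrapped = pattern.sub(r'\1', sample)
print(unwrapped)

# If another <span opens inside, the lookahead stops the repetition there,
# so the outer span is not matched at all.
nested = '<span class="none2">outer <span class="bold">inner</span> tail</span>'
print(pattern.findall(nested))
```

The last case is the interesting one: the pattern refuses to match rather than pairing the outer opening tag with the inner closing tag.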
|
#17 | |
Resident Curmudgeon
Posts: 74,777
Karma: 131375596
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Is there a way to find a proper span when the code looks like
Quote:
|
|
#18 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
|
#19 |
Resident Curmudgeon
Posts: 74,777
Karma: 131375596
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But that's the problem. You don't know ahead of time. You don't know how many or which ones. The idea is for regex to get it right, which it doesn't. This is why we need an update to regex that gives us more control, so we can do things like this. Regex should be able to say "I want the next instance of </span> after the <span>", but we cannot do it. So regex should be updated to handle such things.
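For reference, a small sketch in Python of what "the next `</span>` after the `<span>`" looks like with a lazy quantifier, and where it stops being enough. The sample strings are invented for illustration.

```python
import re

# Hypothetical samples: two sibling spans, and one span nested in another.
flat = '<span>one</span> text <span>two</span>'
nested = '<span>outer <span>inner</span> tail</span>'

greedy = re.compile(r'<span>(.*)</span>')   # runs to the LAST </span>
lazy = re.compile(r'<span>(.*?)</span>')    # stops at the NEXT </span>

print(greedy.findall(flat))
print(lazy.findall(flat))

# With nesting, the next </span> belongs to the inner span, so the
# lazy match pairs the outer opening tag with the inner closing tag.
print(lazy.findall(nested))
```

So the "next closing tag" part is expressible; what a plain pattern cannot do is count how many spans were opened in between.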
|
#20 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
I'd do it for you, but I'm feeling a little lazy myself right now. |
|
#21 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
You would be using the wrong tool for the job; what you want is a parser! Here is some discussion on why regex isn't recommended for parsing HTML: https://stackoverflow.com/questions/...e-html-why-not There is a reason why they are separate beasts. |
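As a rough illustration of the parser route, here is a sketch built on `html.parser` from the Python standard library. The class `SpanUnwrapper`, the helper `unwrap`, and the `none2` class name (borrowed from the regex example earlier in the thread) are all hypothetical; this is not any particular tool's method. The sketch also drops comments and doctype declarations, which a real cleanup would need to handle.

```python
from html.parser import HTMLParser


class SpanUnwrapper(HTMLParser):
    """Drop <span class="none2"> wrappers but keep their contents."""

    def __init__(self, target_class="none2"):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.out = []
        self.stack = []  # one bool per open <span>: True if being dropped

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            drop = self.target_class in classes
            self.stack.append(drop)
            if drop:
                return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The stack pairs each </span> with the <span> that opened it.
        if tag == "span" and self.stack and self.stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def unwrap(html):
    parser = SpanUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


nested = '<p><span class="none2">outer <span class="bold">inner</span> tail</span></p>'
print(unwrap(nested))
```

The stack is the piece a plain pattern lacks: it remembers which opening tag each closing tag belongs to, however deeply the spans are nested.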
|
#22 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Also, a fun link: http://stackoverflow.com/questions/6...lanation-in-la The second answer does a good job explaining why regex is a bad tool for HTML parsing (and when it is a good tool!). |
|
#23 |
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
|
#24 |
Grand Sorcerer
Posts: 12,326
Karma: 74007256
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
You need to look for the version posted by Reverend Bob.
|
#25 |
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Sorry, but I still don't see anything close to that title or by Rev Bob.
Can you point me in the right direction please? |
#26 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
|
|
#27 | |
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Quote:
I only had a chance to browse the last 5 or 6 of the 53 pages, but it looks like a very useful tool. Is there a brief description of the options and features that will be in the final version? |
|
#28 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Buried, "somewhere".
It should add methods for
|
#29 |
null operator (he/him)
Posts: 20,725
Karma: 27405072
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
#30 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
|
|