Old 07-28-2014, 09:52 AM   #16
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by phossler View Post
@eschwartz--



1. Can you explain how the negative look ahead works, including breaking down the pieces of the RE?

2. Many times when I'm cleaning an epub, removing unneeded 'class=" ... " ' in <span class="..."> I'll eventually end up with a lot of <span>.....</span> constructs. It appears that your RE is better than the more simplistic RE I was using to just remove them

Thanks
This website has a very good regex explanation, which is how I learned about it.
http://www.regular-expressions.info/lookaround.html

They provide a thorough explanation, and break down the examples.

For my example:

Code:
<span class="none2">((?:(?!<span).)*?)</span>
Search for
Code:
<span class="none2">inner text</span>
"inner text" itself is a little more complicated, though:
Code:
((?:(?!<span).)*?)
We capture everything as "\1" -- inside is
Code:
(?:(?!<span).)*?
The main search (finally) is:
Code:
(?:(?!<span).)
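a non-capturing group, which is repeated zero or more times -- yes, we can repeat whole groups.
(plus a confusing "?" which I seem to have copied from the original source without much thought. It is not there to make the group optional, since the star already does that; after a quantifier it makes the repetition lazy, so the match stops at the first "</span>" it can reach.)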

This group contains the negative lookahead (a zero-length assertion)
Code:
(?!<span)
which asserts that "<span" does not begin at the current position, followed by a dot-matches-all that consumes one character.

So, putting it all back together, the dot-match-all must be preceded by the negative lookahead, and this "any character other than part of a span tag" is then grouped and repeated zero or more times, then captured as "\1" to produce the "inner text" which should be saved from in between the span.
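
To see the pieces working together, here is a minimal Python sketch of the same search and replace. It assumes Python's `re` module treats this pattern the same way the editor's regex engine does, uses `re.DOTALL` as a stand-in for the dot-matches-all option, and the sample markup is made up for illustration.

```python
import re

# The pattern from the post. re.DOTALL is an assumption standing in for the
# editor's "dot matches all" option, so "." also matches newlines.
PATTERN = re.compile(r'<span class="none2">((?:(?!<span).)*?)</span>', re.DOTALL)

# Hypothetical sample: one simple none2 span, and one that wraps another span.
sample = (
    '<p><span class="none2">plain <i>text</i></span> and '
    '<span class="none2"><span class="b">bold</span></span></p>'
)

# Replace each match with its captured inner text (group 1, i.e. "\1").
cleaned = PATTERN.sub(r'\1', sample)
print(cleaned)
```

The first span is unwrapped, while the second is left alone: its content begins with "<span", so the negative lookahead fails and the pattern cannot match there.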
Old 07-28-2014, 10:26 AM   #17
JSWolf
Resident Curmudgeon
Posts: 74,777
Karma: 131375596
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Is there a way to find a proper span when the code looks like
Quote:
<p><span class="italics"><span class="smallcaps">This is italics smallcaps</span></span> <span class="bold">This is bold</span></p>
Can all three of the span pairs be found correctly using regex?
Old 07-28-2014, 12:18 PM   #18
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by JSWolf View Post
Is there a way to find a proper span when the code looks like

Can all three of the span pairs be found correctly using regex?
Yes -- by specifying all of them. You'd have to know ahead of time how many there are.
Old 07-28-2014, 03:24 PM   #19
JSWolf
Resident Curmudgeon
Posts: 74,777
Karma: 131375596
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by eschwartz View Post
Yes -- by specifying all of them. You'd have to know ahead of time how many there are.
But that's the problem. You don't know ahead of time. You don't know how many or which ones. The idea is for regex to get it right, which it doesn't. This is why we need an update to regex to allow us to have more control so we can do things like this. Regex should be able to say "I want the next instance of </span> after the <span>." But we cannot do it. So regex should be updated to handle such things.
Old 07-28-2014, 03:33 PM   #20
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by JSWolf View Post
... Regex should be able to say I want the next instance of </span> after the <span>. But we cannot do it. So regex should be updated to handle such things.
Surely that is almost trivially easy to write a regex for -- just write a lazy find expression.

I'd do it for you but I'm feeling a little lazy myself right now. Wasn't one presented a few posts back?
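
A lazy expression of the kind suggested here might look like the sketch below, written in Python for illustration. The pattern is an assumption, not something posted in the thread. It does find the next "</span>" after each "<span ...>", but on the nested sample from earlier in the thread that is not the same thing as finding the matching pair.

```python
import re

# Lazy match: any opening span tag, then as little as possible up to the
# next "</span>". re.DOTALL lets "." cross line breaks.
LAZY = re.compile(r'<span[^>]*>(.*?)</span>', re.DOTALL)

# The nested sample from earlier in the thread (closing quote on "bold" restored).
sample = (
    '<p><span class="italics"><span class="smallcaps">This is italics smallcaps'
    '</span></span> <span class="bold">This is bold</span></p>'
)

matches = [m.group(0) for m in LAZY.finditer(sample)]
for text in matches:
    print(text)
```

Only two matches come back, and the first one pairs the outer "italics" opening tag with the inner "smallcaps" closing tag, leaving a stray "</span>" behind. "Next closing tag" is easy; "matching closing tag" is the hard part.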
Old 07-28-2014, 04:11 PM   #21
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by JSWolf View Post
This is why we need an update to regex to allow us to have more control so we can do things like this. Regex should be able to say I want the next instance of </span> after the <span>. But we cannot do it. So regex should be updated to handle such things.
Just because Regex can't do something it wasn't designed to do, doesn't mean Regex must be updated!

You would be using the wrong tool for the job, and what you want is a parser!

Here is some discussion on why Regex isn't recommended for parsing HTML:

https://stackoverflow.com/questions/...e-html-why-not

There is a reason why they are separate beasts.
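
As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser`. The class name `SpanPairFinder` and its attributes are hypothetical, and this is only one way to do it: a stack remembers which spans are still open, so each "</span>" closes the most recently opened one, however deep the nesting goes.

```python
from html.parser import HTMLParser


class SpanPairFinder(HTMLParser):
    """Pair each <span> with its own </span> using a stack (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_spans = []  # stack of (class value, text chunks) for unclosed spans
        self.pairs = []       # (class value, text inside), in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.open_spans.append((dict(attrs).get("class"), []))

    def handle_data(self, data):
        # Text belongs to every span that is currently open.
        for _, chunks in self.open_spans:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            css_class, chunks = self.open_spans.pop()
            self.pairs.append((css_class, "".join(chunks)))


# The nested sample from earlier in the thread (closing quote on "bold" restored).
sample = (
    '<p><span class="italics"><span class="smallcaps">This is italics smallcaps'
    '</span></span> <span class="bold">This is bold</span></p>'
)

finder = SpanPairFinder()
finder.feed(sample)
finder.close()
print(finder.pairs)
```

All three span pairs are reported, without knowing ahead of time how many there are or how they nest.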
Old 07-28-2014, 05:05 PM   #22
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by Tex2002ans View Post
Just because Regex can't do something it wasn't designed to do, doesn't mean Regex must be updated!

You would be using the wrong tool for the job, and what you want is a parser!

Here is some discussion on why Regex isn't recommended for parsing HTML:

https://stackoverflow.com/questions/...e-html-why-not

There is a reason why they are separate beasts.
Yep, basically. Although there are some things you can do with lookaround, like my example. Given a specific known quantity (what regex is for) it will clobber all those nested spans on repeating runs.

Also fun link: http://stackoverflow.com/questions/6...lanation-in-la

The second answer does a good job explaining why regex is a bad tool for html parsing (and when it is a good tool!).
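
The "repeating runs" idea can be sketched in Python as follows. This is an illustration under assumptions: the pattern is the lookahead expression adapted to attribute-free "<span>" tags, the function name `strip_bare_spans` is hypothetical, and the sample markup is made up. Each pass can only unwrap spans that contain no other span, so nested ones come off from the inside out.

```python
import re

# The lookahead pattern, adapted here to attribute-free <span> tags.
BARE_SPAN = re.compile(r'<span>((?:(?!<span).)*?)</span>', re.DOTALL)


def strip_bare_spans(html):
    """Apply the replacement repeatedly until a pass changes nothing."""
    passes = 0
    while True:
        html, count = BARE_SPAN.subn(r'\1', html)
        if count == 0:
            return html, passes
        passes += 1


# Hypothetical sample: three nested bare spans and one span that carries a class.
sample = (
    '<p><span><span><span>deep</span> text</span></span> '
    '<span class="bold">kept</span></p>'
)

result, passes = strip_bare_spans(sample)
print(result)
print(passes)
```

Here three passes remove the three nested bare spans, and the classed span is untouched because the pattern only matches a literal "<span>".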
Old 07-28-2014, 07:40 PM   #23
phossler
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by PeterT View Post
You might also check out the forked version of ePub Clean plugin for calibre that has some support for removing SPANs
I can find a Modify ePub PI, but no ePubClean PI

Modify ePub does not seem to have any options to clean unneeded tags etc.
Old 07-28-2014, 07:50 PM   #24
PeterT
Grand Sorcerer
Posts: 12,326
Karma: 74007256
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
You need to look for the version posted by Reverend Bob.
Old 07-28-2014, 09:56 PM   #25
phossler
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by PeterT View Post
You need to look for the version posted by Reverend Bob.
Sorry, but I still don't see anything close to that title or by Rev Bob.

Can you point me in the right direction please?
Old 07-28-2014, 10:00 PM   #26
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by phossler View Post
Sorry, but I still don't see anything close to that title or by Rev Bob.

Can you point me in the right direction please?
It's not indexed -- it is in the Modify EPUB thread, and will eventually get merged back into the main PI, once Rev. Bob finishes testing it. See here for the latest beta: https://www.mobileread.com/forums/sho...78#post2880178
Old 07-28-2014, 10:09 PM   #27
phossler
Wizard
Posts: 1,081
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by eschwartz View Post
It's not indexed -- it is in the Modify EPUB thread, and will eventually get merged back into the main PI, once Rev. Bob finishes testing it. See here for the latest beta: https://www.mobileread.com/forums/sho...78#post2880178
Ahh - thanks

I only had a chance to browse the last 5 or 6 of the 53 pages, but it looks like a very useful tool

Is there a brief description of the options and features that will be in the final version?
Old 07-28-2014, 10:14 PM   #28
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Buried, "somewhere".

It should add methods for
  • removing certain spans created by Kobo for KEPUB that do nothing besides provide bookmarking abilities,
  • removing <span>content</span> pairs, which obviously do nothing,
and a few other things I forget.
Old 07-28-2014, 10:23 PM   #29
BetterRed
null operator (he/him)
Posts: 20,725
Karma: 27405072
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by eschwartz View Post
Buried, "somewhere".
Hopefully it will be exhumed and incorporated into .../calibre/plugins/Modify ePub Help.htm

BR
Old 07-28-2014, 10:27 PM   #30
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by BetterRed View Post
Hopefully it will be exhumed and incorporated into .../calibre/plugins/Modify ePub Help.htm

BR
I have no idea, I don't have either one installed.