Regex examples - Page 2

Jellby · 02-21-2012, 07:26 AM

So you want something like this?

Code:

<span class="italics">[^<]*\s.*</span>

It might not be the right regex dialect, but [^<] is intended to mean "any character that is not <". That won't match instances where something is nested in the <span> before the first space, but those should be rare and can be looked for afterwards.
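One way to see what this pattern catches and what it skips is to run it over a few sample spans. The sketch below uses Python's re module rather than Sigil's regex engine, so it is only an approximation, and the sample strings are made up for illustration.

```python
import re

# Jellby's pattern, unchanged: optional non-"<" characters, one whitespace
# character, then anything up to a closing </span>.
pattern = re.compile(r'<span class="italics">[^<]*\s.*</span>')

samples = [
    '<span class="italics">word</span>',                # single word
    '<span class="italics">two words</span>',           # multi-word
    '<span class="italics"><b>nested</b> text</span>',  # tag before the first space
]

# True where the pattern finds a match in the sample.
results = [bool(pattern.search(s)) for s in samples]
print(results)
```

The third sample is the nested case mentioned above: [^<]* cannot step over the inner tag, so the space is never reached and the span is skipped.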

DiapDealer · 02-21-2012, 08:07 AM

Quote:

Originally Posted by Jellby

So you want something like this?

Code:

<span class="italics">[^<]*\s.*</span>

It might not be the right regex dialect, but [^<] is intended to mean "any character that is not <". That won't match instances where something is nested in the <span> before the first space, but those should be rare and can be looked for afterwards.

That seems to be the ticket. Thanks!

I had a mishap where an ill-thought-out global replace (because of nested spans and greedy expressions) left me with a boatload of long, incorrectly italicized passages. And it got saved before I caught it. I could've backed up a few revisions and started over, but I didn't want to (maybe not always straight, but ever forward).

Anyway, since I know the one word occurrences aren't mistakes, I can safely skip those. And in this particular document... that little regex expression knocks the number of occurrences I have to manually proof against the original text from 700+ down to around 150.

Thanks again!

Timur · 02-21-2012, 08:08 AM

@DiapDealer: Does this narrow down your set enough? This one should match anything with at least one non-word (Unicode) character in italics, including contractions but excluding empty spans (which should be easy enough to remove beforehand or afterwards).

Code:

(*UCP)(?U)<span class="italic">[^<]*\W[^<]*</span>

If you do not want to miss anything at all (like nested spans), use .* instead of [^<]*. But you will probably get some unwanted matches that run across several spans.
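A rough Python check of this pattern follows. Python's re module accepts neither (*UCP) nor (?U), so this sketch assumes that Python's default Unicode-aware \W and lazy quantifiers (*?) are a fair stand-in for them. The sample strings are invented.

```python
import re

# Stand-in for (*UCP)(?U): Python str patterns are Unicode-aware by default,
# and *? makes each quantifier lazy (ungreedy).
pattern = re.compile(r'<span class="italic">[^<]*?\W[^<]*?</span>')

samples = {
    "contraction": '<span class="italic">don\'t</span>',
    "two_words": '<span class="italic">très bien</span>',
    "one_word": '<span class="italic">été</span>',
    "empty": '<span class="italic"></span>',
}

# Map each sample name to whether the pattern matched it.
matched = {name: bool(pattern.search(s)) for name, s in samples.items()}
print(matched)
```

The accented single word is not matched because é counts as a word character under Unicode rules, which is the point of asking for Unicode properties.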

DiapDealer · 02-21-2012, 08:25 AM

@Timur: a non-word character in the target isn't required, but any multi-word instances that happen to contain non-word characters need to be included by the search, too. (Also... that expression crashes my Sigil 0.5.1.)

I've yet to find an instance where Jellby's expression skips something that I wanted included. And I've given it a hell of a workout so far.

This is the non-beta version of what is now working outstandingly well for me.

Code:

(?U)<span class="italics">[^<]*\s.*</span>
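The effect of the (?U) prefix can be illustrated in Python, with one caveat: Python's re module has no ungreedy flag, so the sketch below writes the lazy quantifiers out by hand (*? instead of *) as an assumed equivalent. The sample line is made up.

```python
import re

line = ('<span class="italics">one</span> and '
        '<span class="italics">two words</span> then '
        '<span class="italics">three</span>')

# Without (?U): the trailing .* is greedy and runs to the last </span> on the line.
greedy = re.compile(r'<span class="italics">[^<]*\s.*</span>')
# With (?U), approximated by lazy quantifiers: the match stops at the first </span>.
lazy = re.compile(r'<span class="italics">[^<]*?\s.*?</span>')

greedy_hits = greedy.findall(line)
lazy_hits = lazy.findall(line)
print(greedy_hits)
print(lazy_hits)
```

In this sample the greedy form swallows the following one-word span as well, which is the kind of over-long match that the ungreedy flag avoids.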

Timur · 02-21-2012, 08:53 AM

Strange that my pattern causes a crash, I use 0.5.1 here too and it works. Anyway, I am glad that you have found a regexp working for you.

DiapDealer · 02-21-2012, 09:25 AM

Quote:

Originally Posted by Timur

Strange that my pattern causes a crash, I use 0.5.1 here too and it works. Anyway, I am glad that you have found a regexp working for you.

Me too. Thanks for your input (and everybody's)

Serpentine · 02-22-2012, 11:57 PM

A few people have asked me for a nice way to change CSS class formatting for simple print types into their relevant basic tags. I often use something like this:

Code:

Find: (?si)<span[^<>]+(?:class="(?:(i)talics?|(b)old|(u)nderlined?|(s)trikeout)")[^<>]*>(.+?)</span>
Replace: <\1\2\3\4>\5</\1\2\3\4>

It will likely hate nesting, and it's messy since you can't use duplicate names (which would mean no spammed \1\2\3\4).
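The same find/replace pair can be tried outside Sigil. This is a sketch in Python, which is an assumption about tooling and not what Serpentine used; it relies on Python 3.5+ replacing unmatched groups with an empty string, so only the one captured letter survives in \1\2\3\4. The sample HTML is invented.

```python
import re

# Serpentine's Find pattern, split across two raw strings for readability.
find = re.compile(
    r'(?si)<span[^<>]+(?:class="(?:(i)talics?|(b)old|(u)nderlined?|(s)trikeout)")'
    r'[^<>]*>(.+?)</span>'
)
# Only one of groups 1-4 takes part in a match; the others expand to nothing.
replace = r'<\1\2\3\4>\5</\1\2\3\4>'

html = ('<p><span class="italics">Moby-Dick</span> is '
        '<span class="bold">long</span>.</p>')
converted = find.sub(replace, html)
print(converted)
```

As noted above, nested spans are not handled: the lazy (.+?) stops at the first </span> it meets.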

roger64 · 03-24-2012, 09:35 PM

Hi

I am no expert at all so forgive me. I still have in my notebook a Regex that Zelda of world fame gave me a long time ago:
<span.sgc-5>([\w\s]*\w)</span>

I need to add a span tag to a span tag (that is to make a double span) without touching or removing the capital letter enclosed in the span. It's to make a box for drop-caps.

To make myself understood, I need to search and replace using Sigil, in code view, for all html files:
search
<span class="lettrine2">X</span>
replace
<span class="lettrine1"><span class="lettrine2">X</span></span>

X of course can be any letter from A to Z (capital letters only)
I had no success with the above Regex. Has anybody a solution for it?

I guess only \w would be needed since there is no space.

theducks · 03-25-2012, 12:55 AM

Search:

Code:

(<span class="lettrine2">[A-Z]</span>)

Note the location of the parentheses: they capture the whole span.

Replace:

Code:

<span class="lettrine1">\1</span>
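For anyone who wants to test the pair before running it on a whole book, here is a small Python sketch of the same search and replace. Python is an assumption here (the thread uses Sigil's Find and Replace), and the sample paragraph is made up.

```python
import re

# theducks's search: the parentheses wrap the entire lettrine2 span,
# so \1 in the replacement re-inserts it untouched.
search = r'(<span class="lettrine2">[A-Z]</span>)'
replace = r'<span class="lettrine1">\1</span>'

html = '<p><span class="lettrine2">O</span>nce upon a time</p>'
wrapped = re.sub(search, replace, html)
print(wrapped)
```

Because [A-Z] covers only unaccented capitals, a drop cap such as É would be left alone by this pattern.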

roger64 · 03-25-2012, 03:57 AM

@theducks

Works beautifully. Thank you very much.

roger64 · 03-28-2012, 08:04 PM

Hi

I have one wish that could be of interest for French users. We currently use &nbsp; before the following punctuation signs : ; ! ? and inside the French quotes and before an endnote call. Seven possible occurrences (at least for punctuation purposes, because there are other uses of this entity).

However, our typographic rules recommend using, preferably, a narrow no-break space that we call "espace fine insécable".

When? Before the three following double punctuation signs ; ! ? and inside the French quotes and before an endnote call. Six possible occurrences.

However, this entity is mishandled by ADE and other older-generation ebook readers, even if we use appropriate fonts. So, for the time being, it's a little useless.

That's why, in the meantime, some people try to emulate this narrow no-break space with the no-break space. I found two CSS solutions for this. Admittedly they are not optimal because they will leave a lot of spans. At least, it's a choice.

Emulation 1
texte<span class=«fine»>&nbsp;</span>texte
CSS code:
.fine {font-size: 30%;}
Emulation 2
<span class=«fine»>&nbsp;</span>
CSS code:
.fine {display:inline-block;width:0.125em}

I would like advice on which of the two seems better.
I would also like to use one Regex to search and replace &nbsp; using one of the two span classes above.

If I make a plain search and replace, it would have to be run six times. I hope there is a Regex to wrap everything and run it only once.

Phew, it was a long post but I tried to be clear. Thank you for your help.
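One possible single-pass approach, offered only as a sketch: match &nbsp; when it is followed by ; ! ? or » (lookahead) or preceded by « (lookbehind), and wrap it in the span. The example is in Python and is not a tested Sigil expression. The endnote-call case is left out because its markup is not shown in the post, and the sample sentence is invented.

```python
import re

# &nbsp; before ; ! ? » or right after «. The colon is deliberately not
# listed, so "&nbsp;:" keeps its plain no-break space.
pattern = re.compile(r'&nbsp;(?=[;!?»])|(?<=«)&nbsp;')
replacement = '<span class="fine">&nbsp;</span>'

text = 'Quoi&nbsp;? «&nbsp;Oui&nbsp;»&nbsp;; prix&nbsp;10'
result, count = pattern.subn(replacement, text)
print(count)
print(result)
```

The &nbsp; in "prix&nbsp;10" is left alone, since other uses of the entity should not be touched.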

Jellby · 03-29-2012, 09:06 AM

Be careful, you have class=«fine» instead of class="fine". Guillemets are not for everything

roger64 · 03-29-2012, 09:55 AM

Quote:

Originally Posted by Jellby

Be careful, you have class=«fine» instead of class="fine". Guillemets are not for everything

Oh, thanks. Indeed, I did not pay attention.

I will probably go this way, as the characters ; and ? can be misunderstood in Code view:
- first, in the source odt file, search for occurrences of no-break space with ([; ! ? »]) and replace them with a neutral character. Same for «
- then, in Sigil in Code view (Book view F&R is said to be unsafe?), search for this neutral character and replace it with the span above, with English quotes.

After that, I'll see if the reader is not too sluggish with all these new spans.

Toxaris · 04-04-2012, 03:04 AM

I have a problem where hopefully a RegEx can help me. I want to find all words ending with f, with the exception of a few words. So, I want a hit on for example 'dwarf', but not on 'of' or 'behalf'.

The first part I can cover with \b\w+f\b, but can I filter the results with a kind of exception list? I can't seem to find that.

*update* I seem to have found it, but I need to test further. This seems to do what I want:
\b(?!of|behalf)\w+f\b
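A small Python check of this negative lookahead shows one thing to watch for while testing: (?!of|behalf) rejects any word that merely starts with "of" or "behalf", so "off" is excluded too. The second pattern below is a suggested variant, not something from the thread, that adds \b inside the lookahead so only the whole words are excluded. The sample sentence is made up.

```python
import re

text = "the dwarf spoke of a wolf on behalf of the staff, then ran off"

# As posted: excludes every word that begins with "of" or "behalf".
posted = re.findall(r'\b(?!of|behalf)\w+f\b', text)

# Variant (an assumption about the intent): exclude only the exact words.
whole_word = re.findall(r'\b(?!(?:of|behalf)\b)\w+f\b', text)

print(posted)
print(whole_word)
```

More exceptions can be added to the alternation in the same way, for example (?:of|behalf|if).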

roger64 · 04-04-2012, 06:16 PM

Sorry Toxaris

I wish to set this span around some selected &nbsp; entities (according to French punctuation rules).
<span class="fine">&nbsp;</span>

A simple Find and Replace regex allows me to do it in Code view.

However, I can create the desired tags but only one by one. When I try to perform a global F and R, the result is reported but in fact not executed. Once created one by one, I can have a look at them in Code view. So, I am not dreaming.

Worse, the created spans are automatically erased by Sigil the next time I come back to Code view from Book view.

I am sure it is possible to ask Sigil not to erase anything but I cannot find how: for example, the same instance of Sigil respects these exact same span tags around the &nbsp; entities in one document (alas! not created by me) but it removes them in mine.

HTML Tidy is not involved. I do not touch its button.

I really would like to be able to perform a global Find and Replace and to keep my changes.

