Regex examples - Page 5

DiapDealer · 04-17-2012, 10:11 AM

Quote:

Originally Posted by mncowboy

Is there a single regex that can do this in Sigil?
Thanks in advance.

Find:

Code:

<a href="#_edn(\d+)" name="_ednref(\d+)" title=""><span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b><span style="font-size:8.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:black">\[(\d+)\]</span></b></span></span></a>

Replace:

Code:

<a href="#_edn\1" id="_ednref\2" title=""><sup>[\3]</sup></a>

Or if you know, absolutely, that the numbers will always be the same across each instance:
Find:

Code:

<a href="#_edn(\d+)" name="_ednref\d+" title=""><span class="MsoEndnoteReference"><span class="MsoEndnoteReference"><b><span style="font-size:8.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:black">\[\d+\]</span></b></span></span></a>

Replace:

Code:

<a href="#_edn\1" id="_ednref\1" title=""><sup>[\1]</sup></a>

NOTE: I replaced the "name" attribute with "id" because "name" is old and tired.

EDIT: The above stuff is all based on the assumption that the <b>, <span>, and font-family/size stuff is identical in all of the original endnote code instances. You'd need to make judicious use of (.*?) if not.
(and I had a mistake in the first edition of this post that I corrected)
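
For anyone who wants to try the idea outside Sigil first, here is a minimal Python sketch of the three-group version with the (.*?) loosening mentioned in the edit. The sample HTML string is made up for illustration, and Python's re module stands in for Sigil's engine, so treat it as an approximation rather than exactly what Sigil does.

```python
import re

# Made-up sample of a Word-exported endnote reference (illustration only).
html = (
    '<a href="#_edn1" name="_ednref1" title="">'
    '<span class="MsoEndnoteReference"><span class="MsoEndnoteReference">'
    '<b><span style="font-size:8.0pt;font-family:&quot;Times New Roman&quot;,'
    '&quot;serif&quot;;color:black">[1]</span></b></span></span></a>'
)

# Three capture groups, one per number, as in the first Find above.
# The lazy .*? skips whatever <span>/<b> wrappers sit between the opening
# <a ...> and the bracketed number, so the wrappers need not be identical.
pattern = re.compile(
    r'<a href="#_edn(\d+)" name="_ednref(\d+)" title="">'
    r'.*?\[(\d+)\].*?</a>',
    re.DOTALL,
)

# Same replacement as above: name= becomes id=, the number goes in <sup>.
replacement = r'<a href="#_edn\1" id="_ednref\2" title=""><sup>[\3]</sup></a>'

result = pattern.sub(replacement, html)
print(result)
```

The lazy wildcard is more forgiving than the literal pattern, but it will also run past an anchor that has no bracketed number in it, so it is worth checking a few matches by hand before replacing everything.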

mncowboy · 04-17-2012, 10:43 AM

Thank you sir!!
Changing name= to id= is one of the first S&R I do on a document.

You would think that Word would have changed over by now.

GRiker · 04-20-2012, 08:09 AM

With the old regex engine, I could use '\x20' to specify a space in the replacement pattern, but that no longer works in the current version.

Other than using a literal space, how do I specify a space character in the replace field? (I don't want to use a literal space, because I often save my s/r patterns in a development notes file, and they're hard to see in plain text.)

Perkin · 04-20-2012, 12:37 PM

You could use

Code:

& #32;

(remove the space)

Edit: I think you might only be able to use that if the replace is part of text - not inside a tag.

GRiker · 04-20-2012, 12:53 PM

Quote:

Originally Posted by Perkin

You could use

Code:

& #32;

(remove the space)

Edit: I think you might only be able to use that if the replace is part of text - not inside a tag.

That would work, but I was looking for something that would insert an actual ASCII space in the text, rather than an entity (for readability).

After lots of experimenting, I discovered that I could use

Code:

\U \E

as the replacement term but that seems indirect and inelegant. But it works.

G

roger64 · 05-11-2012, 11:40 AM

Hi

It's just a small question. To select letters intended to become dropcaps, I use this part of a Regex:
([A-Z])

However, I realize this does not select accented capitals that do exist in French (like É, À, Ô and so on). Of course, I can just suppress their accents. But if I wish to make a drop-cap out of an accented capital, what would be the code?

([.]) is a catch-all. Have you better?

theducks · 05-11-2012, 01:02 PM

Quote:

Originally Posted by roger64

Hi

It's just a small question. To select letters intended to become dropcaps, I use this part of a Regex:
([A-Z])

However, I realize this does not select accented capitals that do exist in French (like É, À, Ô and so on). Of course, I can just suppress their accents. But if I wish to make a drop-cap out of an accented capital, what would be the code?

([.]) is a catch-all. Have you better?

([A-ZÉÀÔ])

The dash just means a range. Characters listed without a dash match any one of them. You can combine both in the same brackets, as I have.
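
A quick way to see what the class does and does not catch, sketched in Python rather than Sigil (the sample words are my own):

```python
import re

# Range A-Z plus three individually listed accented capitals.
dropcap = re.compile(r'([A-ZÉÀÔ])')

for word in ["Hier", "École", "Ôter", "Île"]:
    m = dropcap.match(word)
    # Only letters in the range or in the explicit list are captured.
    print(word, "->", m.group(1) if m else "no match")
```

"Île" is not matched because Î is neither in the A-Z range nor in the list, so every accented capital you expect has to be added by hand.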

DiapDealer · 05-11-2012, 02:51 PM

Quote:

Originally Posted by roger64

However, I realize this does not select accented capitals that do exist in French (like É, À, Ô and so on). Of course, I can just suppress their accents. But if I wish to make a drop-cap out of an accented capital, what would be the code?

([.]) is a catch-all. Have you better?

Code:

\p{Lu}

Will catch all upper-case letters (including unicode characters), if that's what you're looking for. Add parentheses to make it a capture group if desired, of course.
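
If you want to check which characters fall under Lu without opening Sigil, here is a small Python sketch. Python's built-in re module does not understand \p{Lu}, so this uses the standard unicodedata module to test the same Unicode category; the helper names are made up for the example.

```python
import unicodedata

def is_uppercase_letter(ch: str) -> bool:
    # "Lu" is the Unicode general category "Letter, uppercase",
    # the same category that \p{Lu} refers to.
    return unicodedata.category(ch) == "Lu"

def leading_capital(text: str):
    # Return the first character if it is an uppercase letter, else None.
    return text[0] if text and is_uppercase_letter(text[0]) else None

for word in ["École", "Île", "Ångström", "été", "1984"]:
    print(word, "->", leading_capital(word))
```

Unlike a hand-written list of accented letters, this picks up Î and Å without any extra work.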

roger64 · 05-12-2012, 11:10 AM

@DiapDealer, theducks

Thanks very much for your replies. As this regex is intended to be used for French texts, I will use theducks' proposal. I just did not know one could add letters this way as I did not see any example of it.

DiapDealer · 05-12-2012, 11:47 AM

Quote:

Originally Posted by roger64

@DiapDealer, theducks

Thanks very much for your replies. As this regex is intended to be used for French texts, I will use theducks' proposal. I just did not know one could add letters this way as I did not see any example of it.

Just so you know, it doesn't matter what language it is. If it's a valid uppercase letter (including unicode characters with an acute, grave, breve, umlaut, or really any valid diacritic), (\p{Lu}) will capture it. But whatever you're comfortable with is the way to go.

roger64 · 05-25-2012, 10:41 AM

@DiapDealer

Did not see your reply in time. It really needed your explanation. Yes of course, this is also a very convenient solution. I have noted it. Thanks again.

paulfiera · 06-05-2012, 01:44 PM

How can I change in Sigil all the occurrences of "Chapter" like the following example:

Quote:

Chapter One

Where "One" can be "Two", "Three", and so on...

...or even "1", "2", "3",...

with

Quote:

<h1>Chapter [the text that goes after the word Chapter]</h1>

Many thanks!

Edit: Never mind, I think I found the solution in JeremyR's post. Many thanks, JeremyR.

meme · 06-05-2012, 01:59 PM

You don't say what the original Chapter One looks like in code view. Just the text isn't sufficient to make sure the find/replace is correct.

Assuming you have

Code:

<p>Chapter SOMETHING</p>

and want

Code:

<h1>Chapter SOMETHING</h1>

then

Code:

Find:    (?sU)<p>Chapter (.*)</p>
Replace: <h1>Chapter \1</h1>

might get you what you want.
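
The same idea as a Python sketch, in case it helps to test on a scrap of text first. The sample HTML is invented. Python's re module has no ungreedy flag like the U in (?sU), so the lazy quantifier .*? is used instead, and re.DOTALL stands in for the s flag.

```python
import re

# Invented sample with two chapter paragraphs and one ordinary paragraph.
html = '<p>Chapter One</p>\n<p>Some text.</p>\n<p>Chapter 2</p>'

# .*? stops at the first </p>, so each heading is matched on its own
# instead of one match swallowing everything up to the last </p>.
pattern = re.compile(r'<p>Chapter (.*?)</p>', re.DOTALL)

result = pattern.sub(r'<h1>Chapter \1</h1>', html)
print(result)
```

This only handles a bare <p> tag. If the paragraph carries attributes (a class, for instance), the opening tag in the pattern has to be adjusted to match.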

paulfiera · 06-05-2012, 02:09 PM

Quote:

Originally Posted by meme

You don't say what the original Chapter One looks like in code view. Just the text isn't sufficient to make sure the find/replace is correct.

Assuming you have

Code:

<p>Chapter SOMETHING</p>

and want

Code:

<h1>Chapter SOMETHING</h1>

then

Code:

Find:    (?sU)<p>Chapter (.*)</p>
Replace: <h1>Chapter \1</h1>

might get you what you want.

Many thanks, meme. You are correct.

In Code View it is

Quote:

<p class="calibre4">CHAPTER 1</p>

Using JeremyR's code seems to do the trick. The Chapters are now

Quote:

<h3>CHAPTER 1</h3>

I omitted the horizontal line and used calibre to split the html at every h3.

Don't know if I'm doing this right though

I have more books with the same issue. I'll try with your code next time.

Many thanks.

roger64 · 06-19-2012, 08:00 AM

Successive Find and Replace

I wish to clean an html text which suffers from recurrent mistakes from an OCR engine (Cuneiform).

When I meet one of the mistakes, I make a replacement and note it. After some pages, I have met most of the mistakes, and now I intend to build a regex combining as many as 15 successive simple search-and-replace operations like the following two.
A@ → à
B@ → ç
I do not know how to perform these 15 F&R within a single regex. Suppose I would like to build it for the two above, what should I write?

Note: I already use UTF-8 for the whole text.
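
To make the goal concrete, here is a rough Python sketch of what "15 replacements in one pass" could look like outside Sigil. It is only an illustration of the idea, not a Sigil find/replace: the table holds the two pairs above (the other thirteen would be added the same way), and the names fixes and clean are made up for the example.

```python
import re

# Lookup table of OCR mistakes; only these two pairs come from the post.
fixes = {"A@": "à", "B@": "ç"}

# One alternation of all wrong strings, escaped so they are taken literally,
# longest first so a longer mistake wins over a shorter one it contains.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(fixes, key=len, reverse=True))
)

def clean(text: str) -> str:
    # Each match is looked up in the table to find its own replacement.
    return pattern.sub(lambda m: fixes[m.group(0)], text)

print(clean("voilA@ un garB@on"))
```

The point of the lookup is that a single pattern can find all the mistakes at once, but each one still needs its own replacement text.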



