07-25-2013, 10:34 PM | #1 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Removing Soft hyphens
https://www.mobileread.com/forums/showthread.php?t=77992
I can see the C2 AD (194 and 173) bytes with my hex editor, but as others have pointed out, they're invisible in Sigil. I tried copying and pasting the characters, but nothing worked. The post above is old and 7.2 is out, so maybe things have changed? Is there any way to strip them out in Sigil? RegEx maybe? Paul |
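For anyone wondering why the hex editor shows two bytes: C2 AD is the UTF-8 encoding of the single code point U+00AD (soft hyphen). A minimal Python sketch to confirm that and to count occurrences in raw file data; the helper name `count_soft_hyphens` is made up for illustration and is not part of Sigil or Calibre.

```python
# U+00AD (SOFT HYPHEN) is a single code point that UTF-8 stores as two bytes.
SOFT_HYPHEN = "\u00ad"
SOFT_HYPHEN_UTF8 = SOFT_HYPHEN.encode("utf-8")


def count_soft_hyphens(data: bytes) -> int:
    """Count soft hyphens in raw UTF-8 bytes (hypothetical helper)."""
    return data.count(SOFT_HYPHEN_UTF8)


if __name__ == "__main__":
    print(SOFT_HYPHEN_UTF8.hex())   # hex form of the two bytes
    print(list(SOFT_HYPHEN_UTF8))   # the same bytes as decimal values
    sample = "walk\u00aded and walk\u00ading".encode("utf-8")
    print(count_soft_hyphens(sample))
```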
07-25-2013, 11:11 PM | #2 | |
Well trained by Cats
Posts: 30,397
Karma: 58055234
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
|
07-25-2013, 11:36 PM | #3 | |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Quote:
That's one for my saved searches. Still, it would be nice (IMHO) if Sigil had a 'Reveal Hidden Codes' View option that S&R would work in. Poking around in hex, it looks like there's some more stuff to investigate. Paul Last edited by phossler; 07-25-2013 at 11:39 PM. |
|
07-26-2013, 02:04 AM | #4 |
Grand Sorcerer
Posts: 5,636
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
|
BTW, the Calibre Hyphenate This! plug-in can automatically remove all soft hyphens.
|
07-26-2013, 07:07 AM | #5 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
Agreed. But it may not be available in webkit. However, perhaps they could build a search function for these types of characters which would allow us to do something with them. Hidden spaces which throw things out of line have been my issue, though mostly with imported html.
|
|
07-26-2013, 01:05 PM | #6 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Doitsu -- thanks, but a lot of the time Calibre will add a lot of CSS that I don't want. I will follow up. Maybe run the html into Calibre, convert to epub, and then use the plug-in?
@mrmike -- The biggest problem is that F&R seems to want the character as text. This means that I have to locate and identify the troublesome character, use CharMap to copy it, paste it into a Sigil Find (remembering to escape it -- thanks 'theducks'), etc. I knew it was 173, so I did try the \ and then alt+numpad 0173 in the Find, but that didn't work. But CharMap works if I know what I'm looking for. Now that I have it as a saved search it will be easier. I hope that as I find more things like this, I can just keep adding them to my 'Delete Bad Char' saved search. The soft hyphens also explain why spell check had 100+ occurrences of just 'ed' and 'ing' flagged in things like 'walked' and 'walking'. Paul |
07-26-2013, 01:47 PM | #7 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
If you know a location where it is in Sigil, you can actually select and copy it. Click on the character next to it and press shift+arrow in the direction you want to select. If you have the right character, the cursor will not move although you pressed the arrow. Copy and paste in the S&R window.
|
07-26-2013, 02:01 PM | #8 |
Grand Sorcerer
Posts: 27,942
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Just use the \x{FFFF} method provided by PCRE Regex to search for unicode code points. Replace FFFF with the hexadecimal representation of the unicode code point you want. In this case 00ad (or just ad).
Using regex, search for \x{00AD} (or \x{ad}) and replace with nothing to remove soft hyphens. Last edited by DiapDealer; 07-26-2013 at 02:04 PM. |
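The same search-and-replace can be sketched outside Sigil for anyone who wants to batch-clean files. This is a minimal Python sketch, not Sigil's own code. One assumption to flag: Python's `re` module does not accept the PCRE `\x{00AD}` spelling, so the code point is written as `\u00ad` in an ordinary string instead. The function name `strip_soft_hyphens` is made up for illustration.

```python
import re

# PCRE (as used in Sigil's Find) writes the code point as \x{00AD};
# in Python the same character is written "\u00ad" in an ordinary string.
SOFT_HYPHEN_RE = re.compile("\u00ad")


def strip_soft_hyphens(text: str) -> str:
    """Replace every soft hyphen with nothing (hypothetical helper)."""
    return SOFT_HYPHEN_RE.sub("", text)


if __name__ == "__main__":
    sample = "She walk\u00aded home, walk\u00ading slowly."
    print(strip_soft_hyphens(sample))
```

Ordinary hyphens (U+002D) are left alone, since the pattern matches only U+00AD.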
07-26-2013, 03:52 PM | #9 |
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Toxaris and DiapDealer -- thanks!!
Both very useful tips. The \x{00AD} is MUCH easier to see. So if I wanted to follow this theme, could I include even more characters in my saved search? [\x{00AD}\x{2000}-\x{200D}] where 2000 is En Quad and 200D is Zero Width Joiner (whatever that is). That would include the Thin, Hair, and Zero Width spaces that I think mrmike mentioned. Paul |
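One way to try out the proposed character class before committing it to a saved search is a small Python sketch like the one below. It is an illustration only: the helper names `report_bad_chars` and `delete_bad_chars` are made up, and the PCRE `\x{....}` escapes are rewritten as Python `\u....` escapes because Python's `re` module does not use the PCRE spelling.

```python
import re
import unicodedata
from collections import Counter

# Python spelling of the class [\x{00AD}\x{2000}-\x{200D}] from the post above.
BAD_CHARS_RE = re.compile("[\u00ad\u2000-\u200d]")


def report_bad_chars(text: str) -> Counter:
    """Count each matched character by its Unicode name (hypothetical helper)."""
    return Counter(
        unicodedata.name(ch, "UNKNOWN") for ch in BAD_CHARS_RE.findall(text)
    )


def delete_bad_chars(text: str) -> str:
    """Delete every character matched by the class (hypothetical helper)."""
    return BAD_CHARS_RE.sub("", text)


if __name__ == "__main__":
    sample = "walk\u00aded\u200b and\u2009thin\u200d"
    for name, count in report_bad_chars(sample).items():
        print(count, name)
    print(delete_bad_chars(sample))
```

Running the report first is worth considering because the range U+2000 to U+200D also covers spaces that have real width, such as the en space (U+2002) and thin space (U+2009). Deleting those outright joins the words on either side, so replacing them with a normal space may be the safer choice. The zero width joiner and non-joiner (U+200D, U+200C) also affect how some scripts are shaped, so removing them from non-English text can change how it renders.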
|