11-20-2007, 11:13 AM | #1 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Perl processing
Since I don't own MS Office, I'm out of luck using BookDesigner.
I was wondering whether any Perl lovers read this forum and would share their ideas on basic formatting of txt files using Perl one-liners, e.g. s/[[:^print:]]//g to eliminate non-printing chars. It's all about not re-inventing the wheel each time! For example: how to deal with accented characters? Which punctuation to accept? And so on...
Alessandro
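One possible starting point for the non-printing and accented character question, sketched in plain shell rather than Perl. Note that a pattern like [[:^print:]] also matches newlines and tabs, so applied to whole lines it would remove line breaks as well. The sketch below deletes only ASCII control bytes and leaves every byte above 0x7F alone, so UTF-8 or Latin-1 accented letters pass through unchanged. The function name strip_control is made up for illustration, and which bytes to keep (tab, newline and carriage return here) is an assumption to adjust.

```shell
#!/bin/sh
# strip_control: hypothetical helper name, not something from this thread.
# Deletes ASCII control bytes from stdin, keeping tab (\011), newline (\012)
# and carriage return (\015). Bytes >= 0x80 are left alone, so accented
# characters survive. LC_ALL=C makes tr treat the input as plain bytes.
strip_control() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}

# Example usage (file names are placeholders):
#   strip_control <file.txt >clean.txt
```

This only answers the "non-printing" part. Which punctuation to accept is still a per-book decision.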
11-23-2007, 05:57 PM | #2 |
Enthusiast
Posts: 36
Karma: 14
Join Date: Oct 2007
Device: Sony PRS-505
|
Since no one else replied, I thought I'd offer my beginner's input after messing around for a few hours.
Not sure what Office has to do with it? You mean as an automatic paragraph formatter by importing into it? Would OpenOffice help? That opens all the Office formats. I don't have Office either.
All the existing processors seem to do a reasonably good job for most generic cases, and there are a few modules on CPAN that can reformat messed-up text, which might help Book Designer import things better.
I've found it easier to just run the text file through a few sed filters rather than setting up a big Perl script. It's the same search/replace as Perl, but you can get faster results if you don't need anything fancy. If you're already familiar with this, then maybe it will help some other Unix-type people who haven't realised how easy it is to do some command-line text processing.
E.g. I fixed hard returns in one HTML file that wasn't importing into Book Designer properly, because it had "<space><br>" at the end of every line, by running
Code:
# Preview: replace a trailing " <br>" with a single space
cat file.txt | sed -e 's/ <br>$/ /g' | more
Code:
# Preview again, this time also handling lines that end in "-<br>"
cat file.txt | \
sed -e 's/ <br>$/ /g' | \
sed -e 's/-<br>$/- /g' | \
more
Code:
# Same filters, with the result written to a new file
cat file.txt | \
sed -e 's/ <br>$/ /g' | \
sed -e 's/-<br>$/- /g' \
>newfile.txt
I can't see how to automate these "special cases" of broken text files; the problems are too specific. But once the text is clean enough, Book Designer will import it and do its magic amazingly well. Once it is in Book Designer, there is an amazing search/replace with regex which can help with the rough edges.
Last edited by maxk; 11-23-2007 at 06:02 PM.
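For anyone reusing the filters above, here is a minimal sketch that folds them into one sed call with two -e expressions, which behaves the same as chaining two sed processes. The function name fix_hard_breaks is hypothetical. Like the original commands, it only swaps the trailing tag for a space and does not join lines.

```shell
#!/bin/sh
# fix_hard_breaks: hypothetical helper name, not something from this thread.
# Reads text on stdin and rewrites <br> tags at the end of a line:
#   " <br>" at end of line becomes " "
#   "-<br>" at end of line becomes "- "
# A <br> that follows any other character is left untouched.
fix_hard_breaks() {
    sed -e 's/ <br>$/ /' \
        -e 's/-<br>$/- /'
}

# Example usage (file names are placeholders):
#   fix_hard_breaks <file.txt | more          # preview
#   fix_hard_breaks <file.txt >newfile.txt    # write the cleaned copy
```

The g flag is dropped here because a pattern anchored with $ can match at most once per line.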
11-26-2007, 07:05 AM | #3 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
I'm not an expert at all in sed, but I'll give it a try, thanks! Alessandro |
11-26-2007, 07:13 AM | #4 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Specifically, I get the error:
<Cannot convert to rtf format, Check your MS Word installation>
even when I follow the advice at https://www.mobileread.com/forums/sho...&postcount=199
Alessandro