03-04-2008, 09:30 PM | #1 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Obelisk -- legal distribution of format-shifted copyrighted works
I love curly quotation marks. They're so round and inviting. I also love free e-books, and so have been delighted by Tor's current free–e-book–each–week program. Perhaps by Tor my loves may be joined? But alas not – the HTML versions Tor provides have ASCII quotation marks, and when I asked if this could be rectified, I was told “I'm afraid the quotation-mark conversion has to stay.”
So for Robert Charles Wilson’s Spin I rolled up my crazy-sleeves, pulled out my regexps, and fixed them myself. Every last one. And modified the CSS and some of the markup to much more closely resemble the formatting in the PDF version. Then wrapped it up as a valid .epub book. Then converted/tweaked to produce a great-looking Sony Reader BBeB book.
And they’re all for only me! Nope, can’t give them to you. The power of copyright compels me! I can add those curly quotes myself because I have the source HTML to start with. If I start handing people my curly-quoted version I have no means to stop it from falling into new hands which didn’t already have the straight-from-Tor edition.
Or do I? I could provide you with a grid of just the byte offsets of the various curly quotes. Some extreme variant of diff/patch in which nothing of the original copyrighted text persists. It would contain just my curly quotes, owned by me under copyright law and free to give you as I wish. You provide the straight-from-Tor e-book, mix in my curly quotes and poof! – you have a be-curled edition of Spin.
But this doesn’t work for format-shifting over compression, encoding changes, etc., where “put a curly quote here” ceases to make sense. Unless we distill the idea down to the lowest level – what is XOR but the difference between two bits?
Let’s try an experiment, which I’m calling Obelisk[1]. Download the following files:
obelisk.py
Mohm5pei#WilsonSpin_HTML.zip#Spin.epub.obelisk
AhZe5shu#WilsonSpin_HTML.zip#Spin.lrf.obelisk
Then get your copy of WilsonSpin_HTML.zip handy, pop open your favorite shell, and run:
Code:
python obelisk.py Mohm5pei WilsonSpin_HTML.zip Mohm5pei#WilsonSpin_HTML.zip#Spin.epub.obelisk Spin.epub
python obelisk.py AhZe5shu WilsonSpin_HTML.zip AhZe5shu#WilsonSpin_HTML.zip#Spin.lrf.obelisk Spin.lrf
Let me know what you think.
[1] Obelisk is similar to and inspired by a “project” called Monolith, although with rather different goals. |
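To make the XOR idea in the post above concrete, here is a minimal sketch of how such a mask could work. It is not the actual obelisk.py, whose internals are not shown in this thread. The keystream construction (SHA-256 over the salt, the key file, and a counter) and the names `keystream` and `xor_mask` are assumptions made purely for illustration, and the byte strings stand in for real files.

```python
import hashlib
from itertools import count


def keystream(salt: bytes, key: bytes, length: int) -> bytes:
    """Stretch salt + key into `length` pseudo-random bytes (assumed scheme)."""
    seed = hashlib.sha256(salt + key).digest()
    out = bytearray()
    for block in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(seed + block.to_bytes(8, "big")).digest()
    return bytes(out[:length])


def xor_mask(salt: bytes, key: bytes, data: bytes) -> bytes:
    """XOR `data` against a keystream derived from `salt` and `key`."""
    stream = keystream(salt, key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical stand-ins for the original download and the reformatted book.
original = b'<p>"Spin" as shipped</p>'
reformatted = b"<p>\xe2\x80\x9cSpin\xe2\x80\x9d with curly quotes</p>"

shared = xor_mask(b"Mohm5pei", original, reformatted)  # the file that gets posted
restored = xor_mask(b"Mohm5pei", original, shared)     # rebuilt by someone holding `original`
assert restored == reformatted
```

Because XOR is its own inverse, the same function both creates and applies the mask, which would be consistent with obelisk.py using one command form in both directions.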
03-04-2008, 09:46 PM | #2 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Assuming the source file has an even number of quotes, shouldn't replacing them with curly quotes be as simple as
Code:
data = list(data)  # assumes data is a str; str is immutable, so work on a list
intag = False
inquote = False
for i, ch in enumerate(data):
    if ch == '<':
        intag = True
    elif ch == '>':
        intag = False
    elif not intag and ch == '"':
        if inquote:
            data[i] = '\u201d'  # right curly quote
            inquote = False
        else:
            data[i] = '\u201c'  # left curly quote
            inquote = True
data = ''.join(data)
|
03-04-2008, 10:07 PM | #3 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
“This quotation-marked bit goes on for more than one paragraph. It doesn’t end with a double quote.
“And here I have some ‘examples’ of single quotes. I’ve got several of ’em. The examples’ quotation marks point in all kinds of directions.
“And here ends the quote.”
So pretty much the rules are:
Code:
<ws>" == “
"<ws> == ”
\w'\w == ’
'<ws> == ’
<ws>' == ‘
But then have to manually check all the instances of “<ws>‘” and probably start by looking for any quotation marks with white space on both sides (usually found when doing "something like 'this' ").
So anyway. Mostly mechanizable, but still some manual labor to get it perfect. And can’t automate improving the CSS. :-) |
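The rules above could be sketched with Python's `re` module roughly as follows. This is an illustration, not the regexps actually used for the book. The name `curl` is hypothetical, the start and end of the text are treated as whitespace (an assumption), and the sketch works on plain text without skipping HTML tags.

```python
import re

# One entry per rule, in the order listed above. (?<!\S) and (?!\S) play the
# role of <ws> and also match at the start or end of the text (an assumption).
RULES = [
    (re.compile(r'(?<!\S)"'), "\u201c"),        # <ws>"  becomes a left double quote
    (re.compile(r'"(?!\S)'), "\u201d"),         # "<ws>  becomes a right double quote
    (re.compile(r"(?<=\w)'(?=\w)"), "\u2019"),  # \w'\w  becomes an apostrophe
    (re.compile(r"'(?!\S)"), "\u2019"),         # '<ws>  becomes a right single quote
    (re.compile(r"(?<!\S)'"), "\u2018"),        # <ws>'  becomes a left single quote
]


def curl(text: str) -> str:
    """Apply each rule in turn; quotes no rule matches are left straight."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


sample = "\"I've got 'examples' here,\" she said."
print(curl(sample))  # “I’ve got ‘examples’ here,” she said.
```

As the post says, this only gets most of the way: a leading apostrophe as in 'em comes out as a left single quote, and a quote followed directly by punctuation stays straight, so both still need a manual pass.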
|
03-04-2008, 10:16 PM | #4 |
creator of calibre
Posts: 44,565
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah, I see. Well, let's see if Tor starts beating on your door in the middle of the night.
|
03-04-2008, 11:38 PM | #5 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Um... it won't work because it never came zipped. And how do we know the filename to use in the ZIP file or even if we have the exact same contents?
|
03-05-2008, 12:21 AM | #6 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
The e-mails actually contain links to two separate HTML versions. One is the HTML content served directly, the other is a ZIP archive which contains the images used in the book, a (broken) OPF file, etc.
Last edited by llasram; 03-05-2008 at 12:21 AM. Reason: Fix typo. |
03-05-2008, 12:34 AM | #7 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Yes, you are correct. My apologies. I'll give your script another go and see how it works out.
|
03-05-2008, 12:39 AM | #8 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
How do I use your script to generate a diff file for other content? I'd love to do one for Mistborn based on the PDF to make the LRF from it.
|
03-05-2008, 12:52 AM | #9 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I've taken the EPUB edition and built an LRF to my specification. Looks nice. Now all I need to do is build a proper ToC and I'll be all set.
|
03-05-2008, 09:10 AM | #10 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Code:
python obelisk.py SALT KEYFILE INFILE OUTFILE
Code:
python obelisk.py sai3sahS 9780765350381.zip Mistborn.lrf sai3sahS#9780765350381.zip#Mistborn.lrf.obelisk
|
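The file names used in this thread appear to follow a SALT#KEYFILE#TARGET.obelisk pattern, so the name alone carries everything needed to rebuild the command line. A small sketch under that assumption follows. The helpers `obelisk_name` and `restore_command` are hypothetical and not part of obelisk.py, and the convention is inferred only from the examples posted here.

```python
SUFFIX = ".obelisk"


def obelisk_name(salt: str, keyfile: str, target: str) -> str:
    """Name a diff file the way the examples in this thread are named."""
    return f"{salt}#{keyfile}#{target}{SUFFIX}"


def restore_command(name: str) -> list:
    """Return the argv that would turn a shared .obelisk file back into TARGET."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not an .obelisk file: {name!r}")
    parts = name[: -len(SUFFIX)].split("#")
    if len(parts) != 3:
        raise ValueError(f"expected SALT#KEYFILE#TARGET, got {name!r}")
    salt, keyfile, target = parts
    # Same argument order as above: SALT KEYFILE INFILE OUTFILE
    return ["python", "obelisk.py", salt, keyfile, name, target]


shared = obelisk_name("sai3sahS", "9780765350381.zip", "Mistborn.lrf")
print(" ".join(restore_command(shared)))
```

The sketch assumes none of the three parts contains a `#` itself; a key file with that character in its name would need a different separator.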
|
03-05-2008, 11:11 AM | #11 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I got it to work. Thank you. This will make it a lot easier now to post conversions without having to post the converted file.
|
03-05-2008, 11:38 AM | #12 |
Gizmologist
Posts: 11,615
Karma: 929550
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Pocketbook Touch HD3
|
Heh, we may have to have another category in the Book Uploads area.
|
03-05-2008, 11:39 AM | #13 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
03-05-2008, 01:59 PM | #14 | |||
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Quote:
Quote:
|
|||
03-05-2008, 02:07 PM | #15 |
Resident Curmudgeon
Posts: 76,491
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Nice! I don't see any issue, because as you said, without the source the diff file is useless. So if you've downloaded the PDF of Mistborn, you can get my LRF conversion over at https://www.mobileread.com/forums/sho...356#post156356
|