10-11-2024, 05:05 PM | #1 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2024
Device: Kindle Scribe
|
Digital preservation: ePUB for the archiving of text and other media
Greetings all. Forgive me if this has been hashed out to death already in other threads (pointers to them would be most welcome, especially if they are not super old).
TL;DR: Have you used ePUB to preserve or archive anything? How did it go? What did you learn?
Longer: I'm interested in seeing other perspectives and stories that relate to where I am on my path as I continue to explore the merits of ePUB, this time focusing on efforts to create a digital archive for collections of personal and family texts and photos.
For anyone reading this who might not know, the US Library of Congress's Recommended Formats Statement lists EPUB3-compliant XML as the format of choice for text preservation: Quote:
As for me, it's still early days and I'm still messing around, so I don't believe I have much to share yet that would be of significant value to anyone else, but I'll go ahead and outline it anyway.
The archive I'm looking at right now is a combination of my own stuff and stuff that I somehow ended up with that's been handed down over a few generations. The majority of it appears to be text, with photos being the next biggest media type, then audio and video, and finally an uncertain amount of digital material. Needless to say, for the text I'm looking at EPUB3 as the final storage format, but I'm also considering it as an optional and maybe convenient way to view the other media types. (E.g. physical photos and AV would be stored using appropriate methods, and stored with them would be the unedited digital versions and then maybe some ePUBs that could make casual browsing a little easier for whoever digs up the vault in the future.)
I decided to start by experimenting on myself, using my own personal paper-based journals. (I also have journals in other kinds of media, but I won't go into that here just yet.) First up is a small selection of 100 pages or so. When I have these pages in a form that I'm happy with, my plan is to use whatever I learn from the process to do things properly as I begin to more seriously tackle other, more fragile/important parts of the collection.
Currently the EPUB3 document I'm working with contains about 50 XHTML pages that are each devoted to a single 1350x2000px color PNG (16.5MB average file size). This image and file size already feels too big to me, judging from a few little clues, and I'm wondering if I should cut it down by half. (Even that might be too large, for all I know.)
Next up will be experimenting with transcribed text, which I will mix in with the image-based pages. (Image, text, image, text, etc.) I may also mix in a small amount of additional media as required. For example, just this morning, on a page following a journal entry, I added a captioned 1995 photograph that shows something I was writing about in 2004. The photo is for illustration purposes only and it's currently a 2400x3300 (12MB) PNG grid of three images.
I'll leave it there. Fingers crossed that others will feel the urge to share, too.
Last edited by Fitz Frobozz; 10-11-2024 at 05:32 PM. |
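One rough way to sanity-check the file sizes mentioned above is to compare them with the uncompressed pixel data for the same dimensions. The sketch below is only arithmetic: it assumes three color channels and tries both 8 and 16 bits per channel, since the post doesn't state the bit depth of the scans. The function names are made up for illustration.

```python
def raw_image_bytes(width, height, channels=3, bits_per_channel=8):
    """Uncompressed pixel-data size in bytes (no headers, no compression)."""
    return width * height * channels * bits_per_channel // 8


def to_mb(n_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


# Journal page scans from the post: 1350x2000 px, color.
page_8bit = raw_image_bytes(1350, 2000)                        # assumes 8-bit RGB
page_16bit = raw_image_bytes(1350, 2000, bits_per_channel=16)  # assumes 16-bit RGB
# Three-image photo grid from the post: 2400x3300 px.
grid_8bit = raw_image_bytes(2400, 3300)                        # assumes 8-bit RGB

print(f"page, 8-bit RGB:  {to_mb(page_8bit):.2f} MB raw")   # 8.10 MB raw
print(f"page, 16-bit RGB: {to_mb(page_16bit):.2f} MB raw")  # 16.20 MB raw
print(f"grid, 8-bit RGB:  {to_mb(grid_8bit):.2f} MB raw")   # 23.76 MB raw
```

If the 16.5MB average is accurate, the page PNGs are roughly double the raw size of 8-bit RGB at that resolution, which might point to 16-bit channels or extra embedded data. That is a guess; checking the scanner's bit-depth setting would settle it.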
|
10-11-2024, 05:20 PM | #2 |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2024
Device: Kindle Scribe
|
Reserving just in case I decide to put project stuff here.
|
10-13-2024, 03:32 PM | #3 | ||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Sounds great. Welcome to the forum.
Quote:
That should cover pretty much any/all best practices + digitization questions. Quote:
Sounds to me like you may accidentally just be plopping in images of "scanned pages" into your EPUBs. If your images are just scans of pages out of books, you'd need to OCR and change those into actual text. If the images are photographs—like of people, trees, etc.—you can probably use JPGs instead of PNGs. That will save lots of space too. |
||
10-15-2024, 01:09 AM | #4 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2024
Device: Kindle Scribe
|
Quote:
Oh, sorry about that, it looks like I could have taken more care to clarify the above: I'm actually intentionally scanning pages in "photo mode" and adding the resultant PNGs (or JPGs, or whatever I end up going with) to the ePub as they are and manually transcribing the same pages into text. The goal being to provide both versions for every document in the collection. RE OCR, I'd be (very pleasantly) surprised if that were a viable option given that the majority of the original documents are handwritten. Last edited by Fitz Frobozz; 10-15-2024 at 04:21 AM. |
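To make the "both versions for every document" idea concrete, here is a minimal sketch of generating one XHTML content page that pairs a page scan with its manual transcription. It only builds the page markup; the package manifest, navigation document, and zipping that a real EPUB3 needs are left out. The function name, file paths, and sample text are all hypothetical.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Skeleton for one content page: the scan first, the transcription after it.
PAGE_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>{title}</title>
</head>
<body>
  <figure>
    <img src={src} alt={alt}/>
    <figcaption>{caption}</figcaption>
  </figure>
  <section>
{paragraphs}
  </section>
</body>
</html>
"""


def build_page(title, image_href, alt_text, caption, transcription):
    """Return an XHTML page holding a scan image plus its transcribed text.

    Blank lines in `transcription` separate paragraphs. All text is
    XML-escaped so characters like & and < cannot break the markup.
    """
    paragraphs = "\n".join(
        f"    <p>{escape(p.strip())}</p>"
        for p in transcription.split("\n\n")
        if p.strip()
    )
    return PAGE_TEMPLATE.format(
        title=escape(title),
        src=quoteattr(image_href),   # quoteattr adds the surrounding quotes
        alt=quoteattr(alt_text),
        caption=escape(caption),
        paragraphs=paragraphs,
    )


page = build_page(
    title="Journal, page 1",
    image_href="images/page-001.png",          # hypothetical path
    alt_text="Scan of handwritten journal page 1",
    caption="Original page, scanned in photo mode",
    transcription="First paragraph of the entry.\n\nSecond paragraph.",
)

# Parsing the result confirms it is well-formed XML, which EPUB3 requires.
ET.fromstring(page.encode("utf-8"))
print(page)
```

Keeping the scan and the transcription in the same page, rather than alternating separate pages, is just one possible layout; alternating pages would use the same escaping approach with two smaller templates.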
|
10-15-2024, 04:24 AM | #5 |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2024
Device: Kindle Scribe
|
Just a thought. Does Amazon use a standard and/or some library for their Scribe OCR, or something proprietary? I'm assuming it's homegrown/proprietary but thought I'd check. That OCR is surprisingly good. At least, it has been for my Scribe scribblings.
|