01-01-2013, 10:44 PM | #1 |
Generate eBook from Bookmarks Archive
Hello all. I regularly bookmark articles of interest for later reference and easy access. However, my reference folder depends on those pages staying online and unchanged. I'd like not only to download them for my records (which I already do), but also to compile them into an eBook for easier reading and convenience. Here's the layout:
Abstract goal - turn some subset of my bookmarks into a logically structured ebook.
Specific solution:
1. Export bookmarks to html, then use a program such as wget or httrack to download all of the raw html files.
2. Filter the html files - removing comments, ads, and extra fluff.
3. Using the html index from (1), with each link pointing to a cleaned html file, compile an ebook with a TOC, possibly recreating the folder structure of my bookmarks.
Main roadblocks:
- for (2) - I'm not sure how to do this. Any ideas for tools?
- for (3) - I need to find a way to preserve my folder structure.
I'd love some comments on the overall plan, which tools to use, and general feedback. Thanks!
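On roadblock (3), one possible starting point is to read the folder structure straight out of the bookmarks export. The sketch below is only an illustration, not a tested tool: it assumes the usual Firefox export layout (one <DT> entry per line, folders as <H3>, nesting expressed with <DL> ... </DL>), and the function name bookmarks_outline is made up for this example. It prints folders in brackets and links indented underneath them, which could then be turned into a nested TOC.

```shell
#!/bin/sh
# Print an indented outline of folders and links from a bookmarks export.
# Assumption: Firefox-style export, one <DT> entry per line, uppercase tags.
bookmarks_outline() {
    awk '
    # Build an indent string of n levels (two spaces per level).
    function indent(n,   s) { s = ""; while (n-- > 0) s = s "  "; return s }

    /<DL>/   { depth++ }          # a folder body opens
    /<\/DL>/ { depth--; next }    # a folder body closes

    /<H3/ {                       # folder heading
        name = $0
        sub(/.*<H3[^>]*>/, "", name)
        sub(/<\/H3>.*/, "", name)
        print indent(depth - 1) "[" name "]"
    }

    /<A HREF="/ {                 # bookmark link
        url = $0
        sub(/.*<A HREF="/, "", url)
        sub(/".*/, "", url)
        title = $0
        sub(/.*<A [^>]*>/, "", title)
        sub(/<\/A>.*/, "", title)
        print indent(depth - 1) title " <" url ">"
    }
    ' "$1"
}
```

Usage would be something like: bookmarks_outline bookmarks.html > outline.txt (bookmarks.html being whatever name the export was saved under).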
01-02-2013, 02:38 AM | #2 |
What I would do is load those HTML files into Sigil and clean them up there. The layout and styling probably don't need to be retained anyway.
I would also ditch the folder structure, since it has no meaning in an ebook. If you want to maintain it, I would advise creating it by hand. You can find the information for that on the wiki or on the jedisaber site. It will be a good read anyway.
01-02-2013, 08:39 AM | #3 |
I would suggest starting with Instapaper or similar to help with the cleanup. I do not know what you mean by folders in a book sense. If you mean chapter and section, that should be easy in something like Sigil. If that is not what you mean, I agree with Toxaris that it is meaningless in an ebook.
01-05-2013, 04:17 PM | #4 |
Updates
I've created an early alpha of a process that seems to work.
1. Export bookmarks as HTML from Firefox and leave only the links of interest (Notepad++ is great for this).
2. Clean the file with the following regular expressions, using find and replace and replacing each match with nothing:
   a) <[^>^A]+>
   b) <A HREF="
   c) " ADD_DATE="[0-9 ]+" LAST_MODIFIED="[0-9 ]+"
   d) >[^<]+<\/A>
   e) <H3 ADD_DATE="[0-9]+" LAST_MODIFIED="[0-9]+">.*
3. Copy the cleaned links to urls.txt.
4. Run this shell script:
Code:
#!/bin/sh
# Email each URL in urls.txt to Instapaper, with the page title as the subject.
while IFS= read -r url; do
    [ -n "$url" ] || continue    # skip blank lines
    # -s keeps curl's progress output out of the title
    title=$(curl -s "$url" | grep -i '<title>.*</title>' | sed -e 's/<[^>]*>//g')
    echo "$url" | mail -s "$title" YOUR_EMAIL@instapaper.com
done < urls.txt
Limitations
It seems that Instapaper only exports the last 20 unread articles, so I've been looking into using a Calibre recipe that would download the newest 20, archive them, and grab the next 20. This loop could be run until I have a pile of epubs, which would later be glued together using some software.
Questions
1. Does anyone know of a prebuilt recipe that can do this?
2. Are there any programs that can automate the process of gluing together multiple eBooks, while also gluing together the TOC?
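As a possible shortcut for steps 2 and 3 above, the find-and-replace passes could probably be collapsed into a single sed command that keeps only the HREF value of each link. This is a rough sketch under the assumption that the export has one <A HREF="..."> entry per line with uppercase tags, as Firefox writes it; the function name extract_urls is invented for this example.

```shell
#!/bin/sh
# Print every bookmark URL from a bookmarks export, one per line.
# Assumption: one <A HREF="..."> entry per line (Firefox-style export).
# Folder headings (<H3>) and other markup have no HREF and are skipped.
extract_urls() {
    sed -n 's/.*<A HREF="\([^"]*\)".*/\1/p' "$1"
}
```

Usage would be something like: extract_urls bookmarks.html > urls.txt, after which the mailing script can read urls.txt as before.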
01-05-2013, 05:58 PM | #5 |
For recipe questions I would suggest you post in the recipe subforum of the Calibre forum.
Tags: bookmarks, html, parsing, scripting