01-23-2016, 09:07 AM | #1 |
Enthusiast
Posts: 35
Karma: 28904
Join Date: Aug 2015
Device: none
|
A Do It Yourself "Read It Later" Service for Koreader
In addition to reading a lot of books, I read a lot of news. I love KOreader, and like to read using that tool as much as I can. I've experimented with a lot of tools: Pocket, Wallabag, Calibre, Calibre2OPDS, COPS and many more. But none of them provided the simple, seamless integration of my reading list with KOreader that I desired. So I pieced together my own using some really nice open source tools.
The tools:

Syncthing
I use Syncthing to sync my books between my computers and my devices running KOreader. I also use it to sync a lot of other things between devices. Syncthing is open source, peer-to-peer (no server required) sync software available for a wide variety of platforms. Even if you aren't interested in the "Read It Later" solution I describe in this post, you should consider using Syncthing to sync your device(s) running KOreader. There is an Android app, instructions for Kindle Touch and, just this week, thanks to tshering, a simple installer for Kobos running KSM. It should be fairly easy to put Syncthing on other e-reader devices. Even if you can't or don't want to install Syncthing on your device, you can use Syncthing for a very easy USB sync solution.

Five Filters
Five Filters offers a variety of content-related tools that may be of interest. The one I use most heavily (and the one used in my "Read It Later" solution) is called "Push to Kindle". Don't worry, despite the name a Kindle is not required. If you submit the URL of a web page to this tool, "Push to Kindle" creates a nicely formatted .epub, .mobi or .pdf which can be emailed to your Kindle device (hence the "Push to Kindle" name) or downloaded to your computer. (Note that if you prefer to run "Push to Kindle" on your own server, an open source release is coming soon.)

Pandoc (optional)
Pandoc is an open source document converter. It is a very powerful (albeit complicated) tool. I use it as a backup to the Five Filters downloads, since for some unknown reason images are stripped from the Five Filters epubs. For some of the websites I follow, the images are very important (e.g., financial charts), so I use Pandoc to generate epubs for them. The downside is that the resulting epubs are not nearly as pretty as their Five Filters counterparts. I'm sure that this could be fixed with stylesheets etc., but I have not looked into this. Again, using Pandoc is entirely optional. It is available for a wide variety of platforms. If you need to install from source (this won't apply to most people), I recommend creating a "relocatable binary".

A Simple Script
Here is a simple script I wrote to use these tools together:
Code:
#!/bin/bash
# a simple script to download an epub version of a given web page from http://fivefilters.org/kindle-it/
# or (optionally) generate an epub version of the given web page using Pandoc (http://pandoc.org/)

# change the next line to the absolute output path where you would like the epub to be saved, including the trailing '/'
savepath="$HOME/Documents/"

# OPTIONAL: the absolute path to the list of domains for which you want epubs with images (less pretty output)
# Use one fully qualified domain name (https://en.wikipedia.org/wiki/Fully_qualified_domain_name) per line.
# Pandoc must be installed to use this feature.
pandoclist="$HOME/.config/pandoclist"

now=$(date +"%s") # store the current time
url=$1 # store the input URL
furl=${url#*://} # remove the 'http://' or 'https://' from the input URL
domain=${furl%%/*} # drop the path, leaving host[:port]
domain=${domain%%:*} # drop the port, if any, to get the domain for checking against the Pandoc list

# the next line contains the options to pass to Five Filters
durl='http://fivefilters.org/kindle-it/send.php?context=download&format=epub&url='
durl+=$furl # construct the full URL of the epub request
oname=$(basename "$url") # save the last part of the URL, which we will use to name the epub
oname="${oname%.*}" # remove the file extension (e.g. .html)
oname+=-"$now" # add a timestamp to prevent overwriting of files with the same name
oname+='.epub' # add the .epub file extension to the output name
opath=$savepath$oname # define the absolute path to the output file

# check for an exact match in the Pandoc list (-s: stay quiet if the list does not exist)
if grep -Fxqs -- "$domain" "$pandoclist"
then
  pandoc -r html "$url" -t epub -o "$opath" # generate the epub and store it in the specified directory
else
  wget -b -q "$durl" -O "$opath" # download the epub and store it in the specified directory
fi
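The two string-handling steps in the script (finding the domain and building the output name) are the easiest to get subtly wrong, so here is a minimal sketch of them as standalone functions that can be tried on their own. The function names `extract_domain` and `epub_name` are hypothetical and not part of the script above, and the sketch assumes plain `scheme://host[:port]/path` URLs (no user names, no IPv6 literals).

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of the script above.

# Print the host part of a URL: drop the scheme, then the path, then any port.
extract_domain() {
  local rest="${1#*://}"        # strip 'http://' or 'https://'
  rest="${rest%%/*}"            # strip everything from the first '/'
  printf '%s\n' "${rest%%:*}"   # strip ':port' if present
}

# Print an epub file name built from the last URL segment ($1) plus a timestamp ($2).
epub_name() {
  local base
  base=$(basename "$1")
  printf '%s-%s.epub\n' "${base%.*}" "$2"
}

extract_domain 'https://example.com:8080/news/story.html'   # prints example.com
epub_name 'https://example.com/news/story.html' 1453540000  # prints story-1453540000.epub
```

Passing the timestamp in as an argument, rather than calling `date` inside the function, keeps the name predictable when you are experimenting.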
Now, at the press of a couple of buttons on your computer, any URL you desire will be turned into an epub and automatically sent to your KOreader device(s). Enjoy! Suggested improvements or alternative approaches welcome. Last edited by gummihuhn; 01-23-2016 at 09:10 AM. Reason: formatting |
01-24-2016, 06:26 AM | #2 |
Guru
Posts: 906
Karma: 149877
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
I usually use rsync on Linux. Had a look at Pandoc quite some time ago. Is it already a viable solution for LaTeX to epub without loss of formatting?
|
01-24-2016, 08:05 AM | #3 |
Enthusiast
Posts: 35
Karma: 28904
Join Date: Aug 2015
Device: none
|
rsync is a nice tool, which I use a lot. But once you want to do two-way sync and/or more than two devices are involved, I find Syncthing works really well.
Sorry to say, I don't know. I've only used Pandoc for converting HTML to epub (haven't looked into what intermediate formats are used for that), and in that use case there is definitely a loss of formatting. I only use it for a couple of sites I follow where images are important, and for me the output is "good enough"-- at least until I find a tool that works better for this. Generating PDFs from the HTML is another option, which I may experiment with when I get some free time. |
01-25-2016, 04:26 AM | #4 |
Evangelist
Posts: 444
Karma: 1084584
Join Date: Aug 2007
Location: Sisak, Croatia
Device: Kobo Aura H2O, Kobo Aura ONE
|
You can check https://dotepub.com/
They also offer easy conversion of web pages into epub with images, but this doesn't give great results either. I haven't tried Pandoc, so I don't know how bad its results are; maybe dotepub gives the same or similar results. Please check and share how it works for you. |
01-25-2016, 06:56 AM | #5 | |
Enthusiast
Posts: 35
Karma: 28904
Join Date: Aug 2015
Device: none
|
Quote:
I did look at dotepub. Using the bookmarklet, I got better results than I've been getting with Pandoc. Unfortunately, it appears that the only way to use dotepub programmatically is to use their API, which requires you to parse the HTML yourself. If I parse the HTML, I've already solved the formatting issues with Pandoc, so dotepub doesn't offer much of an advantage. If I can figure out how to grab the "Printer Friendly Format" link from sites where images are important and pass that URL to Pandoc, that should solve the problem. This probably requires site-specific configurations or "recipes", which I may play with at some point, but this isn't a huge priority for me at the moment. |
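As a rough idea of what grabbing a "Printer Friendly" link could look like, here is a minimal sketch. It is not a tested recipe for any particular site: the function name `find_print_link` is hypothetical, and it assumes the link text contains the word "print", that the whole anchor sits on one line, and that the href is double-quoted. Real sites will likely need the site-specific rules mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and print the href of the first
# link whose visible text mentions "print" (case-insensitive).
# Regex-based, so it only copes with simple single-line anchors.
find_print_link() {
  grep -oiE '<a [^>]*href="[^"]+"[^>]*>[^<]*print[^<]*</a>' \
    | head -n 1 \
    | sed -E 's/.*[hH][rR][eE][fF]="([^"]+)".*/\1/'
}

# Example with an inline snippet instead of a downloaded page:
printf '%s\n' '<a href="/home">Home</a> <a class="tool" href="https://example.com/story?print=1">Printer Friendly Format</a>' \
  | find_print_link
```

With a real page, the output of something like `wget -q -O - "$url"` could be piped into the function, and the printed link handed to Pandoc in place of the original URL. Relative links would still need to be joined with the site's address first.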
|
01-27-2016, 05:22 PM | #6 | |
Enthusiast
Posts: 35
Karma: 28904
Join Date: Aug 2015
Device: none
|
Quote:
An example of the current output is attached. To customize output content for the source website of that epub, here is all I needed: Code:
.date {display:none;}
.tophat {display:none;}
.persistent-header-placeholder {display:none;}
.lede-headline {display:none;}
.social-share {display:none;}
.article-rail {display:none;}
.terminal-tout {display:none;}
.read-this-next {display:none;}
.article-tags__tag {display:none;}
.article-tags__tag-link {display:none;}
.unsupported-browser {display:none;}
.footer {display:none;}
.footer__container {display:none;}

There are still some obvious improvements to be made, but I like the progress. After I've had some time to clean up my script and write up a how-to (probably this weekend), I'll post the updated script with instructions in case anyone is interested in trying it. |
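For anyone wanting to experiment with rules like these before the how-to is posted, one possible way to hand a per-site stylesheet to Pandoc is sketched below. This is an assumption about how such rules could be applied, not a description of how the script above does it. The directory `~/.config/epubcss`, the `<domain>.css` naming, the function name `make_epub` and the `PANDOC` override are all hypothetical. The `--css` option applies to EPUB output in Pandoc 2.0 and later; older releases used `--epub-stylesheet` instead.

```shell
#!/bin/bash
# Hypothetical layout: one stylesheet per site, named <domain>.css, kept in $cssdir.
cssdir="${cssdir:-$HOME/.config/epubcss}"

# Convert a web page to epub, adding the site's stylesheet when one exists.
# $1 = URL, $2 = domain, $3 = output path.
# PANDOC can be set to another command name for a dry run (an assumption of this sketch).
make_epub() {
  local url="$1" domain="$2" out="$3"
  local css="$cssdir/$domain.css"
  if [ -f "$css" ]; then
    "${PANDOC:-pandoc}" -f html -t epub --css "$css" -o "$out" "$url"
  else
    "${PANDOC:-pandoc}" -f html -t epub -o "$out" "$url"
  fi
}

# Usage (needs Pandoc installed and network access):
# make_epub 'https://example.com/news/story.html' example.com "$HOME/Documents/story.epub"
```

Whether `display:none` rules actually remove the unwanted blocks depends on Pandoc keeping the relevant class names in its output, so check the result in KOreader.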
|
01-30-2016, 10:48 AM | #7 |
Enthusiast
Posts: 35
Karma: 28904
Join Date: Aug 2015
Device: none
|
I've reworked my script to make it quite a bit more flexible. As part of that it has become two different scripts.
I haven't yet had time to write up a how-to for customizing website-specific output from Pandoc. I hope to do that over the next week or so, and to push out a few more website-specific formatting rules. I'm mainly doing this to meet my own needs (other tools just weren't cutting it for me), but if you do give it a try, feedback and suggestions are welcome. You can see the latest iteration and follow future developments here: https://github.com/0r0/klemheist |
02-10-2016, 07:09 PM | #8 |
Addict
Posts: 295
Karma: 2139988
Join Date: Nov 2014
Device: bookeen
|
wallabag works great for me but epub export is not automatic. I tend to use wallabag client on Android.
|