A couple of weeks ago I changed my workflow for reading and sending emails. To have more control over my emails I started using offlineimap, which downloads your emails to a directory on your filesystem. I then used Mu, which indexes this directory so you can search your emails offline with queries like “show me unread emails from inboxes”, or search for a word across all your emails in all inboxes. On top of all that I used Mu4e, an email client inside Emacs (my default editor). As I’m using Spacemacs, I added a binding that opens Mu4e with SPC M, and boom, I can see all my emails, I can search with s, and I have a bookmark that shows all unread emails using bi.
Now I want the same for RSS.
Here is the problem: I did some research and couldn’t find similar tools that do the same for RSS, although it should be easier, since no authentication is required as with IMAP/SMTP servers. So I spent an hour or so writing a small script that does the same as offlineimap. On my machine this script is called offlinerss. It’s the first piece of the puzzle and it looks like this:
#!/usr/bin/env ruby
# frozen_string_literal: true

require 'bundler/inline'
require 'open-uri'
require 'fileutils'
require 'digest'
require 'yaml'

gemfile do
  source 'https://rubygems.org'
  gem 'rss'
end

# Create the directory if it is missing and return its path.
def mkdir(*paths)
  path = File.join(*paths)
  FileUtils.mkdir(path) unless Dir.exist?(path)
  path
end

destination = mkdir(File.expand_path('~/rss/'))
inbox = mkdir(destination, 'INBOX')
meta_dir = mkdir(destination, '.meta')

config_file = File.join(destination, 'config.yml')
config = YAML.load_file(config_file)
urls = config['urls']

urls.each do |url|
  url_digest = Digest::SHA1.hexdigest(url)

  URI.open(url) do |rss|
    content = rss.read
    feed = RSS::Parser.parse(content)

    feed.items.each do |item|
      # Atom entries respond to #id, RSS items to #guid.
      id = item.respond_to?(:id) ? item.id : item.guid
      id_digest = Digest::SHA1.hexdigest(id.content)
      file_basename = url_digest + '-' + id_digest + '.xml'

      # Skip items already saved in any directory under ~/rss.
      next unless Dir.glob(File.join(destination, '**', file_basename)).empty?

      filename = File.join(inbox, file_basename)
      File.write(filename, item.to_s)
    end

    # Cut everything from the first entry/item to the last one,
    # so only the feed-level metadata is left in content.
    [{ start_tag: '<entry>', end_tag: '</entry>' }, { start_tag: '<item>', end_tag: '</item>' }].each do |tag|
      next unless content.include?(tag[:start_tag])

      content[content.index(tag[:start_tag])...(content.rindex(tag[:end_tag]) + tag[:end_tag].length)] = ''
    end

    metafile = File.join(meta_dir, url_digest + '.xml')
    File.write(metafile, content)
  end
end
I have a small config file in ~/rss/config.yml which has all the URLs I care about, so far just the main Ruby, Rails, and Go blogs, so I get alerted about the latest versions.
urls:
  - https://server.tld/feed.rss
  - https://server.tld/feed.atom
The script just reads the URLs and saves each entry to a file in ~/rss/INBOX on your machine, if the file doesn’t already exist in any subdirectory of ~/rss. Then it removes all entries/items from the feed and saves the rest to ~/rss/.meta.
The file name of each RSS item is sha1(url)-sha1(item.id).xml and the meta file name is sha1(url).xml. Very simple.
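To make the naming scheme concrete, here is a minimal sketch of it pulled out into two helpers. The helper names item_basename and meta_basename, and the example item ID, are made up for illustration; the script above builds the same strings inline.

```ruby
require 'digest'

# Hypothetical helper: file name for a single feed item,
# sha1(feed url) + '-' + sha1(item id) + '.xml'.
def item_basename(feed_url, item_id)
  "#{Digest::SHA1.hexdigest(feed_url)}-#{Digest::SHA1.hexdigest(item_id)}.xml"
end

# Hypothetical helper: file name for the feed-level metadata, sha1(feed url) + '.xml'.
def meta_basename(feed_url)
  "#{Digest::SHA1.hexdigest(feed_url)}.xml"
end

# The item ID below is an invented example value.
puts item_basename('https://server.tld/feed.rss', 'https://server.tld/posts/1')
puts meta_basename('https://server.tld/feed.rss')
```

Since every item name starts with the digest of its feed URL, all items of one feed share a prefix with that feed's meta file, so a client could probably match them up with a simple glob.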
So now I need to write a client that reads the files in ~/rss/ and renders the XML files, with some actions to create directories under ~/rss/ and to move a file to another directory after it’s read, or when the user wants to move it to read-later or something, just like email directories.
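The client doesn't exist yet, but the folder actions could be as small as this. It's only a sketch under my own assumptions: move_item and unread_items are hypothetical names, and the root directory is passed in as an argument so it can be tried on a scratch directory instead of ~/rss.

```ruby
require 'fileutils'

# Hypothetical action: move one item file into a folder under root
# (for example 'read-later'), creating the folder if it is missing.
# Returns the new path of the file.
def move_item(root, file, folder)
  target_dir = File.join(root, folder)
  FileUtils.mkdir_p(target_dir)
  target = File.join(target_dir, File.basename(file))
  FileUtils.mv(file, target)
  target
end

# Hypothetical action: list the item files still sitting in INBOX.
def unread_items(root)
  Dir.glob(File.join(root, 'INBOX', '*.xml')).sort
end
```

Moving a file keeps its name, and offlinerss looks for that name in every directory under ~/rss before writing, so a moved item should not show up in INBOX again on the next run.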
Another piece of the puzzle is an indexer, like what Mu does for emails.
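I haven't written the indexer, and this is not how Mu works internally, but a naive in-memory version gives an idea of what I mean. It's a sketch with made-up names (build_index, search): it strips tags with a rough regex, splits the text into words, and maps each word to the set of files that contain it.

```ruby
require 'set'

# Hypothetical indexer: map each lowercased word to the set of item files
# under root that contain it. Tags are stripped with a rough regex, which is
# good enough for a sketch but is not a real XML parser.
def build_index(root)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  Dir.glob(File.join(root, '**', '*.xml')).each do |path|
    text = File.read(path, encoding: 'UTF-8').scrub.gsub(/<[^>]+>/, ' ')
    text.downcase.scan(/[[:alnum:]]+/).uniq.each { |word| index[word] << path }
  end
  index
end

# Look up a single word; returns a sorted array of matching file paths.
def search(index, word)
  index.fetch(word.downcase, Set.new).to_a.sort
end
```

A real indexer would need to persist the index and update it when files are added or moved; this one rebuilds everything on each run.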
I was surprised that it was easier to just sit down and write the thing myself than to search for days for a solution.