A Successful Experiment in Self-Hosted RSS Reading

For just over a month, my RSS reading has been self-hosted. Usually I’d write about this kind of thing because there was an interesting challenge or something that I learnt in the process, but it has basically been a completely transparent change.

I’m still using NetNewsWire to do the actual reading, but I’ve replaced Feedly with FreshRSS running on my home server (well, one of them).

I didn’t really have any problems with the quality of the Feedly service—they fetch feeds without any issues, most apps support their API, and their free tier is very generous. I’ve had my Feedly account for years. However, they use their feed-scraping tools to provide anti-union and anti-protest strikebreaking services, which is a bit gross to say the least.

The ease of moving between RSS services is really what makes this an easy project. As Dan Moren wrote on Six Colors, it’s as simple as exporting the OPML file that includes all the feed URLs, and importing that into another service. Dan ended up using the local feed parser offered by NetNewsWire, but I’m morally opposed to having my phone do periodic fetches of 611 feeds when I have a computer sitting at home that could use its wired power and internet to do this work.
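
For the unfamiliar, an OPML file is just an XML list of feed URLs, along the lines of this (with example.com standing in for a real feed):

<opml version="1.0">
  <head>
    <title>Subscriptions</title>
  </head>
  <body>
    <outline text="Example Blog" type="rss"
      xmlUrl="https://example.com/feed.xml" htmlUrl="https://example.com"/>
  </body>
</opml>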

NetNewsWire supports pulling from FreshRSS, which is an open-source self-hosted feed aggregator. It supports running in a container, so naturally all I needed to do was add the config to a pod file:

freshrss:
  name: freshrss
  remote: steve
  image: docker.io/freshrss/freshrss:alpine
  interactive: false
  ports:
    4120: 80
  environment:
    TZ: Australia/Sydney
    CRON_MIN: '*/15'
  volumes:
    freshrss_data: /var/www/FreshRSS/data
    freshrss_extensions: /var/www/FreshRSS/extensions

You just do some basic one-time setup in the browser, import your OPML file, add the account to NetNewsWire, and you’re done.

The most annoying thing is a very subtle difference in how Feedly and FreshRSS treat post timestamps. Feedly will report the time that the feed was fetched, whereas FreshRSS will use the time on the post. So if a blog publishes posts in the past or there is a significant delay between publishing and when the feed is fetched, in Feedly the post will always appear at the bottom of the list, but FreshRSS will slot it in between the existing posts. I want my posts to always appear in reverse chronological order so this is a bit annoying.

An example of a website where the times on posts are not accurate is this very website! I don’t bother putting times on posts—just dates—since in 10 years of posting I only have two posts that fall on the same day. Feedly assigns a best-guess time for when the post was published (when Feedly first saw it), whereas FreshRSS just says they were published at midnight. Which isn’t too far from the truth, as it’s half past ten as I write this.

To avoid exposing FreshRSS to the outside world, it’s only accessible when I’m connected to my VPN, so I don’t have to worry about having a domain name, SSL cert, secure login, and all that.

I haven’t had any reliability issues with FreshRSS yet. Obviously the biggest disadvantage is that I’m signing myself up to be a sysadmin for it, and the time that it breaks will be when I’m away from home without my laptop.

  1. As of the time of writing, that is. 


Scalability and Capability

I thought of this as a single topic, but when I started writing I realised that I was really thinking about two different things—scalability and capability—and that the broader idea I’ve been mulling over needs to include both. So let’s start with:

Scalability

Desktop operating systems are able to scale to cover so many use-cases partly because of their open nature, but also because of the incredible flexibility of windowed GUIs. Every modern mainstream OS has a window manager that works in the same basic way—you have a collection of rectangles that can be moved around the screen, and within each rectangle there are UI elements.

Floating windows are such a good abstraction that they can be used on a huge range of display sizes. My netbook with a tiny 10” screen used the same system as my current 13” laptop. If I connect a huge external monitor, the interactions remain the same—I’ve just got more space to put everything.

What’s really amazing is that there has been almost no change in the window metaphor since its inception. I’m not a computer historian, but I know that if you time-travelled and showed any modern desktop OS to someone using Windows 98 (which ran on the first computer that I used), they would be quite at home. The visual fidelity, speed, and some rearranging of UI elements might be a bit jarring, but “move this window over there” and “make that window smaller” work in exactly the same way.

Characterising it as having had no changes is obviously selling it short. The best change to the core windowing metaphor is the addition of virtual desktops. It fits into the system really well; instead of having windows shown directly on the screen, we just imagine that there are multiple screens in a line, and we’re looking at one of them. In the relationship of “computer” to “windows” we’re just adding a layer in the middle, so a computer has many desktops, and each desktop has many windows. The best part is that the existing behaviour can just be modelled as a single desktop in this new system.

The difficulty is that this introduces the possibility of windows being “lost” on virtual desktops that aren’t currently visible on the screen. Most window managers solve this by adding some kind of feature to “zoom out” from the desktop view and show all the virtual desktops at once, so you can visually search for something you misplaced. macOS calls this “Mission Control” (it grew out of the older “Exposé”) and I use it constantly just to swap between windows on a single desktop.

Tablets haven’t yet managed to re-invent window management for a touch-first era. Allowing multitasking while not breaking the single-full-screen-app model is exceptionally challenging, and what we’ve ended up with is a complicated series of interdependent states and app stacks that even power-users don’t understand. Even the iPad falls back to floating windows when an external monitor is connected, as being limited to two apps on a screen larger than 13” is not a good use of screen real estate.

Capability

Something simultaneously wonderful and boring about computers is that while they continue to get better over time, they don’t really do anything more over time. The computer that I bought from a recycling centre for $20 basically does the same things as the laptop that I’m using to write this very post.

On my netbook I could run Eclipse1 and connect my phone via a USB cable and be doing Android development using the exact same tools as the people that were making “real” apps. Of course it was incredibly slow and the screen was tiny, but that just requires some additional patience. Each upgrade to my computer didn’t fundamentally change this, it just made the things I was already doing easier and faster.

Of course at some point you cross over a threshold where patience isn’t enough. If I was working on a complicated app with significantly more code, the compilation time could end up being so long that it’s impossible to have any kind of productive feedback loop. In fields like computer graphics, where the viewport has to be able to render in real-time to be useful, your computer will need to reach a minimum bar of usability.

However in 2020 I did manage to learn how to use Blender on my 2013 MacBook Air. It could render the viewport fast enough that I could move objects around and learn how to model—so long as the models weren’t too high detail. Actually rendering the images meant leaving my laptop plugged in overnight with the CPU running as hard as it could go.

All those same skills applied when I built a powerful PC with a dedicated graphics card to run renders faster. This allowed me to improve my work much faster and use features like volumetric rendering that were prohibitively slow running on a laptop.

[Image: a render of a small cabin in a foggy forest, with a radio mast next to it and sunlight shining through the trees]

Rendering the fog in this shot would likely have taken days on my laptop, but rendering this at ultra-high quality probably took less than an hour.

I really appreciate using tools that have a lot of depth to them, where the ceiling of their capabilities is vastly higher than you’ll ever reach. One of the awesome things about learning to program is that many of the tools that real software engineers use are free and open source, so you can learn to use the real thing instead of learning on a toy version. This is one of the reasons I wanted to learn Blender—it’s a real tool that real people use to make real movies and digital art (especially after watching Ian Hubert’s incredible “lazy” tutorials). There are apps that allow for doing some of this stuff on an iPad, but none are as capable, nor are they used substantially for real projects.

It’s not just increases in processing speed that can create a difference in capability. My old netbook is—in a very abstract way—just as able to take photos as my phone. The only difference is that it had a 0.3MP webcam, and my phone has a 48MP rear-facing camera. The difference in image quality, ergonomics, and portability makes the idea of taking photos on a netbook a joke, and my phone my most-used camera.

Portability is a huge difference in capability, which has enabled entire classes of application to be viable where they were not before. There’s no reason you couldn’t book a taxi online on a desktop computer, but the ease and convenience of having a computer in your pocket that has sensors to pinpoint your location and cellular connectivity to access the internet anywhere makes it something people will actually do.

My phone is also capable of doing almost everything that a smartwatch does2, but it’s too big to strap to my wrist and wear day-to-day. The device has to shrink below a size threshold before the use-case becomes practical.

Of course the biggest difference between any of the “real computers” I’ve mentioned so far and my phone is that it has capabilities locked by manufacturer policy. It’s much more capable from a computing power standpoint than any of my older computers, and the operating system is not lacking in any major features compared to a “desktop” OS, but since the software that can run on it is limited to being installed from the App Store and the associated rules, if you wanted to write a piece of software you’d be better off with my netbook.

My iPad—which has just as much screen space as my laptop—can’t be used for full-on development of iPad applications. You can use Swift Playgrounds to write an app, but the app is not able to use the same functionality as an app developed on a Mac—the app icon doesn’t appear on the Home Screen, for example. If this was a truly capable platform, you would be able to use it to write an application that can be used to write applications. Turtles all the way down. On a desktop OS I could use an existing IDE like IntelliJ or Eclipse to write my own IDE that ran on the same OS, and then use that IDE to write more software. That’s just not possible on most new platforms.

“Desktop” operating systems are suffering from their own success—they’re so flexible that it’s completely expected for a new platform to require a “real computer” to do development work on for the other platform. This is a shame because it shackles software developers to the old platforms, meaning that the people that write the software to be used on a new device aren’t able to fully embrace said new device.

Once your work gets too complicated for a new platform, you graduate back to a desktop operating system. Whether that’s because the amount of data involved exceeds the storage built into the device (a single minute of ProRes 4K from an iPhone is 6GB), or because you need to process files through multiple different applications, you’re much less likely to hit a limit of capability on a desktop OS. So unlike me, you might start on one platform and then later realise you’re outgrowing it and have to start learning with different tools on a different platform.

Smartphones have made computing and the internet accessible to so many people, but with desktop operating systems as the more-capable older sibling still hanging around, there’s little pressure either to push the capability of new platforms or to improve the capabilities of older ones.

  1. This was before Android Studio, by the way. 

  2. The exception being that it doesn’t have a heart rate and other health-related sensors. 


40th Anniversary Macs

My current M1 MacBook Air, taken in BitCam

The Upgrade Podcast just did a special episode with panellists drafting various Mac-related things for the 40th anniversary of the original Macintosh. Here are my picks:

First Mac

I was looking for an upgrade to my Acer netbook, trawling through second-hand computers. This was in 2011. My main concern when buying second-hand computers is predictability—I didn’t want to spend a bunch of money on something that turned out to be crap. I looked in the “Mac” section and realised that they weren’t that expensive. If I got a Mac, I’d know that it was going to be reasonably well-built, usable on battery, with a half-decent screen, keyboard, and trackpad.

The other advantage of buying a Mac was that it’s easy to know compatibility for installing Linux ahead of time. It’s a shame that the compatibility is “difficult”, but at least you know that up front.

In the end I bought a 2008 MacBook with a Core 2 Duo processor, 160GB hard drive, and 2GB of RAM. The bigger screen and better keyboard made everything easier, compared to my tiny netbook.

I used OS X on it for a while, before installing Ubuntu (I assume version 12.04) on it. I’d occasionally dual-boot but most of my time was spent using Ubuntu. This lasted until probably late 2012 when I realised that Minecraft performed much better on OS X than on Ubuntu, and so ended up spending more time back in OS X.

Favourite Mac

My dad upgraded his 2010 MacBook Pro to a MacBook Air—to reduce weight while travelling—and I got the Pro as a hand-me-down. This ended up being short-lived as he upgraded again to the 11” Air, and I got the previous 13” Air. That 2013 13” MacBook Air, by virtue of being the Mac I used the most and longest, is my favourite Mac. It was my first computer with an SSD, which gave it a huge speed boost compared to the MacBook Pro.

2013 was really when the Air became an awesome all-round computer. The advertised battery life was 12 hours (almost twice that of the previous generation which claimed 7 hours) which meant I could take it to university and leave the power brick at home. At a time when most people had their huge 15” ultra-glossy laptops tethered to a wall outlet, this was awesome.

In a post-dongle world it’s weird to remember the fact that I could plug in power, a mouse, keyboard, headphones, and a display all into the built-in ports on my “entry-level” “consumer” laptop.

Favourite Mac Software

The software that defined my use of the Mac in the 2010s was TextMate. It was the go-to editor for Rails development, and I used it almost exclusively from 2012 to 2017. I’d use an IDE for Java development, but everything else would be done in TextMate.

I still keep it installed in case I just need to do something quickly or wrangle some text with multiple cursors, but most of the time I’ll use Vim to make use of muscle memory and macros.

Favourite Mac Accessory

In 2015 I bought a Magic Trackpad on a bit of a whim. I’d been using the wireless Mighty Mouse when I was working at my desk, but I liked the idea of using a trackpad for everything and must’ve found a good deal on a second-hand one.

Since then I’ve been using trackpads almost exclusively. I replaced the first-generation Magic Trackpad in 2019 since I got sick of the AA batteries running out, and the second-generation trackpad has longer-lasting built-in batteries that can be charged while the trackpad is in use.

I’ve never had any significant RSI issues using the low-profile Magic Keyboard and Magic Trackpad, and so I’m hesitant to make any changes to a setup that works so well.

Hall of Shame

The worst Mac that I’ve used was the 2018 MacBook Pro (with Touch Bar) that I used at work. My first work laptop had to be replaced1 after the “b” key stopped working, but the replacement wasn’t that much better. I didn’t really mind typing on the low-travel butterfly keyboard, but I loathed having no gap between the arrow keys, which made feeling for them with the tips of my fingers more difficult.

In contrast to my experience with the amazing battery life on the 2013 Air, the battery life I would get from the Pro was abysmal. This is in no small part due to the types of work that I was doing on each machine—text editing is a lot less power-hungry than large video calls—but I came to resent the fact that the fans would constantly be maxed out and the battery wouldn’t last through even one hour of meetings.

Thankfully in 2022 I was able to replace this with an M1 MacBook Pro, which has amazing battery life, no fan noise, and never stutters no matter how many browser tabs I have open.

My current personal laptop is an M1 MacBook Air, which I am using to write this post.

  1. Replaced from my perspective, it was evidently easier to just give me a new laptop rather than have me wait on a repair—as much as I would have wanted to keep the exact machine with all my stickers on it. 


The Code-Config Continuum

At some point you’ve probably written or edited a config file that had the same block of config repeated over and over again with just one or two fields changed each time. Every time you added a new block you’d just duplicate the previous block and change the one field. Maybe you’ve wished that the application you’re configuring supported some way of saying “configure all these things in the same way”.

What this is exposing is an interesting problem that I’m sure all sysadmins, devops, SREs, and other “operations” people will appreciate deeply:

Where should something sit on the continuum between config and code?

This follows on from the difficulty of parsing command-line flags. Once your application is sufficiently complex, you’ll either need to use something that allows you to write the flags in a config file, or re-write your application to be configured directly from a config file instead of command-line arguments.

The first logical step is probably to read a JSON file. Parsing JSON is built into most modern languages, and if it’s not then there’s almost certainly a well-tested third-party library that does the job for you. You just need to define the shape of your config data structure (please define this as a statically-typed structure that will fail to parse quickly with a good error message, rather than reading the config file as a big JSON blob and extracting fields as you go, setting yourself up for a delayed failure) and you’re all set.
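
As a rough sketch of what that looks like in Crystal (the AppConfig shape here is entirely made up for illustration):

require "json"

# A hypothetical config shape, purely for illustration
struct AppConfig
  include JSON::Serializable

  property timezone : String
  property http_timeout : String
  property ports : Array(Int32)
end

# Raises a descriptive error immediately if a field is missing
# or has the wrong type, rather than failing somewhere later on
config = AppConfig.from_json(File.read("config.json"))
puts config.timezone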

This file will inevitably grow as more options and complexity are added to the application, and at some point two things will happen: firstly, someone who hasn’t dealt with tonnes of JSON will ask why they can’t add comments to the config file; and secondly, someone will write a script that applies local overrides of configuration options by merging two config files, to allow for easier development in a local environment.

To remedy the first issue you could probably move to something like YAML or TOML. Both are designed as config-first rather than object-representation-first, and so support comments and some other niceties like multi-line strings.
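
For example, both of these are fine in TOML:

# Comments are allowed
timezone = "Australia/Sydney"
http_timeout = "5s"

notes = """
Multi-line strings
work too.
"""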

If you stick with JSON or choose to use TOML, you’ll soon end up with another problem: you need to keep common sections in sync. Say you have something like a set of database connection configs, one for production and one for development (a good example is a Rails database.yml file). You want to keep all the boring bits in sync so that development and production don’t stray too far from one another.

I run into this with my pods.yaml config files. The program I wrote to track helicopter movements around the Sydney beaches has five different container configurations that I can run, and all of them need a handful of common flags:

flags:
  timezone: Australia/Sydney
  point_a:
    lat: -34.570
    long: 152.397
  point_b:
    lat: -32.667
    long: 149.469
  http_timeout: 5s

If this was JSON or TOML I would have to repeat that same block of config five times, and if I ever changed the area I was scanning, I would have to remember to update each place with the same values.

However, YAML is a very powerful config language; you can capture references to parts of the config and then re-use them in other parts of the file:

flags: &default-flags
  timezone: Australia/Sydney
  point_a:
    lat: -34.570
    long: 152.397
  point_b:
    lat: -32.667
    long: 149.469
  http_timeout: 5s

containers:
  my-container:
    name: test-container
    flags:
      <<: *default-flags
  my-other-container:
    name: second-test-container
    flags:
      <<: *default-flags

Here I use the default-flags anchor to set the flags attribute of both containers to the exact same value.

This is quite powerful and very useful, but there are still plenty of things that you can’t express: mathematical operations, string concatenation, and other data transformations. I can’t redefine how I write the configuration to be completely different to what the program that’s parsing the YAML expects.

# Reference a field, and transform it
field: new-$another_field
# Grab an environment variable
field: $USER
# Do some arithmetic using a field
field: 2 * $other_field
# A simple conditional
field: $PRODUCTION ? enabled : disabled

Some things that you can’t do in YAML.

That being said, YAML is far from simple:

The YAML spec is 23,449 words; for comparison, TOML is 3,339 words, JSON is 1,969 words, and XML is 20,603 words. Who among us have read all that? Who among us have read and understood all of that? Who among us have read, understood, and remembered all of that? For example did you know there are nine ways to write a multi-line string in YAML with subtly different behaviour?

Martin Tournoij: YAML: probably not so great after all

YAML is full of surprising traps, like the fact that the presence or absence of quotes around a value changes how it is parsed and so the country code for Norway gets parsed as the boolean value false.

Even if you decide that the power of YAML is worth these costs, you’re still going to run into a wall eventually. noyaml.com is a good entrypoint to the world of weird YAML behaviour.

As your application becomes more complex—or as the interdependence of multiple applications becomes more complex—you’ll probably want to split the config into multiple files1.

A classic example would be doing something like putting all the common flags that are shared between environments in one file, and then the development, staging, and production configurations each in their own file that reference the common one. YAML has no way of supporting this, and so you’ll end up writing a program that either:

  • concatenates multiple YAML files before sending them to the application to be parsed
  • parses a YAML file and reads attributes in it to define a rudimentary #include system
  • generates a single lower-level YAML config file that is given to the application based on multiple higher-level config files

And of course whichever option you choose will be difficult to understand, error-prone, hard to debug, and almost impossible to change once all its idiosyncrasies are being relied upon to generate production configuration.

The sensible thing to do—of course—is to use an existing configuration language that is designed from the ground up for managing complex configuration, like HCL. HCL is a language that has features that look like a declarative config (“inspired by libucl, nginx configuration, and others”) but is basically a programming language. It has function calls, conditionals, and loops so you can write an arbitrary program that translates one config data structure into another before it gets passed to an application.

This is all very good, but now you’ve got another problem: you need to learn and use another programming language. At some point you’re going to say “why doesn’t this value get passed through correctly?” and the solution will be to debug your configuration language. That could involve using an actual debugger, or working out how to printf in your config language.

Chances are pretty high that you’re not very good at debugging this config language that you don’t pay much attention to, and the tooling for debugging it is probably not as good as a “real” programming language that’s been around for 29 years.

If you’ve done any Rails development, then you’ve come across Ruby-as-config before. Ruby has powerful metaprogramming features that make writing custom DSLs fairly simple, and the Ruby syntax is fairly amenable to being written like a config language. If there is a problem with the config then you can use familiar Ruby debugging tools and techniques (assuming you have some of those), but the flip side is that the level of weird metaprogramming hacks required to make a configuration “readable”—or just look slick—are likely outside of the understanding of anyone not deeply entrenched in weird language hacks.

Of course you’re free to choose whichever language you like, they’re all fairly capable of taking some values and translating them to a data structure that the end application can ingest. You could even write your config in Java.

There are a lot of additional benefits to using a real programming language to write your configuration. As well as abstracting away configuration details, you can add domain-specific validation that doesn’t need to exist in the application (perhaps enforcing naming conventions just for your project), or dynamically load config values from another source—perhaps even another config file—before they are passed into the application.

The next iteration is when the config continues to increase in complexity2, and so you decide to make some kind of tool that helps developers make common changes. Adding and removing sections is the obvious use-case. Strictly speaking it doesn’t have to be due to the config being complex, it could just be that you want some automated system to be able to edit the files.

Your problem is that you have no guarantees about the structure of the config. Since it’s a general-purpose programming language, details could be scattered anywhere throughout the program. With JSON, it’s super easy to parse the file, edit the data, and write a well-formatted config back out—you just have to match the amount of indentation and ideally the order of keys too. Doing this for most programming languages is much more difficult (just look at the work that has gone into making rubyfmt).

Even if you can parse and output the config program, the whole point of using a general-purpose language was to allow people to structure their configs in different ways, so to make a tool that is able to edit their configs, you’re going to have to enforce a restricted format that is easier for a computer to understand and edit.

So if you’ve got an application that expects a config file with hostnames and ports in a list, something like this:

[
  {
    "hostname": "steve",
    "port": 4132
  },
  {
    "hostname": "brett",
    "port": 5314
  },
  {
    "hostname": "gavin",
    "port": 9476
  }
]

The simplest translation to a Ruby DSL could look like:

[
  host {
    hostname "steve"
    port 4132
  },
  host {
    hostname "brett"
    port 5314
  },
  host {
    hostname "gavin"
    port 9476
  }
]

If someone was deploying this to a cloud service, they might not want to write all that out, so their config might look like:

zones = ["us-east-1", "us-west-2", "au-east-1", ...]
STANDARD_PORT = 4123

zones.map do |zone|
  host {
    hostname "host-#{zone}"
    port STANDARD_PORT
  }
end

A program that has to edit these files to “add a new host” basically has to understand the intent behind the whole file3. This is an exceptionally difficult job. I read a book about robots as a child that likened computer speech to squeezing toothpaste out of a tube, and speech recognition to pushing the toothpaste back into the tube. Creating the config is like squeezing the toothpaste, having a computer edit the config is like putting the toothpaste back.

There are two paths you can take from here: double down on the programming language and build higher-level abstractions over the existing config to remove the need for the computer to edit the files, or move towards stricter formats for config files to allow computers to edit them.

You’re being forced to pick a position on the code-config continuum, between something that’s bad for people but good for computers, and something that’s better for people and bad for computers. There’s no right answer, and every option trades off between the two ends of the spectrum.

  1. I can imagine a student or junior developer reading this and thinking “when would your configuration ever get too big for one file?”. Trust me, it does. 

  2. It’ll happen to you one day! 

  3. Maybe an LLM could get us there most of the time? 


I'm Dumbfounded by Discord

I find Discord baffling. Not in its popularity for group messaging for a class, team, or friend group—it seems fine at that—but in the other, larger use cases.

In 2020 and 2021 I learnt how to create digital art in Blender, the 3D modelling software. I watched videos from both Clinton Jones (whom I had been following since his time at RocketJump and Corridor Digital) and Blender Bob. It was Clinton’s work and the videos showing his process that taught me you could use computer graphics without ever thinking about video or “VFX”—that’s just where I was exposed to these ideas initially. His Instagram has a mix of both film photography and rendered computer graphics, but since he targets the same aesthetic in both, it’s often hard to tell at a glance which is which.

Anyway. Both of these creators have Discord servers where subscribers could chat, share their work, and potentially get some guidance from people in the community or the creator themselves. When I joined, both were open for anyone to join, but I think that now Clinton’s Discord is for Patreon supporters only.

This is where the bafflement comes in. Discord is designed as a synchronous messaging system. You can obviously view or reply to messages at any time, but the interface expects you to read messages almost as soon as they are received, and reply immediately or never.

For a team or group of friends this makes sense: you’re probably all in the same timezone and share a similar schedule. If you’re not, then at least the group is probably small enough that it’s easy to catch up on anything that you missed. Discords for “fan communities” are basically the exact opposite—they’re large and highly trafficked. The time difference is exacerbated by me being in a significantly different timezone than the typical North American audience.

My experience was that every time I checked the servers, there would be at least tens—if not hundreds—of new messages in every channel, with the topic of conversation having shifted multiple times. Any attempt to ask a question or have a conversation is drowned out in the noise of additional messages and threads.

The Discord app just isn’t designed for reading all the messages. Even if I treated the server as a read-only experience (much like I do with Mastodon1), it’s difficult to go through and look at the history of a channel. If you do, you’re going to be reading it backwards as the app probably isn’t going to perfectly preserve your scroll position (something that I’m especially keen on).

It seems to me that these Discord servers have a few roles: a support forum, a showcase of work, and a space for informal discussion.

You know what works really well as a support forum? An actual forum, with first-class support for topics, threads, and detailed discussion that can happen asynchronously as the question-asker works through their problem. As someone that remembers a time before Stack Overflow, it seems like people have collectively forgotten the experience of describing your problem on a forum, and then a day later having a kind and knowledgeable person ask you to give them some more information so they can pin down the solution.

I’ve seen it mentioned on Mastodon that some software projects use Discord in lieu of a support forum or documentation, which I find absolutely baffling as trying to find something that someone mentioned within a chat conversation—and understanding all the surrounding context, while filtering out any unrelated noise in the channel that was happening alongside it—seems completely impossible. Those conversations are also not going to be indexed by a search engine, so people that aren’t aware of the Discord are almost certainly not going to stumble across it while searching for information about a problem they’re having.

If the infamous discussion about whether there are 7 or 8 days in a week had happened on Discord, I wouldn’t be able to effortlessly find it 16 years later with a single search.

The other two use-cases—showcasing work and having informal discussions—are less well suited to forums, but I think they’d still be passable if implemented that way. However, the actual point of this whole post was to propose an alternative for this kind of fan community: a private Mastodon server.

As web creators move towards sharing their work on their own terms, rather than via an existing platform (an example), a suitably tech-focussed2 creator could offer membership on a private Mastodon server as a perk of being a supporter.

Mastodon’s soft-realtime and Twitter-like flat-threaded structure give it a nice balance of working reasonably well for quick conversations as well as time-delayed asynchronous communication. Since the instance would be private, the “local” timeline would just contain posts made by the community, allowing members to see everything, or create their own timeline by following specific people or topics.

Ideally, Mastodon clients would allow mixing and merging accounts into a single timeline—so I could have the accounts I follow from my main account and accounts on this private instance show up in the same timeline, so I don’t have to scroll through two separate timelines.

The biggest challenge would obviously be explaining that you’re signing up to an instance of a federated social media platform that has disconnected itself from the federated world in order to provide an “exclusive” experience only for supporters of the creator.

I don’t think that Mastodon will reach a level of mainstream success that such a niche use of it could be anything but a support headache, but it’s interesting to think how open platforms could be re-used in interesting ways.

  1. Go on, toot me. 

  2. Read this as “willing to put up with the complexities of Mastodon and able to understand the nuance of having a de-federated instance of a federated system”. 


Build and Install Tools Using Containers

Another challenge in my quest to not have any programming languages installed directly on my computer is installing programs that need to be built from source. I’ve been using jj in place of Git for the last few months1. To install it you can either download the pre-built binaries, or build from source using cargo. When I first started using it there was a minor bug that was fixed on main but not in the latest release, so I needed to build and install it myself instead of just downloading the binary.

Naturally the solution is to hack around it with containers. The basic idea is to use a base image that matches the host OS (Ubuntu images for most languages are not hard to come by), build in that, and copy only the executables out to the host system.

To install jj and scm-diff-editor I make a Containerfile like this:

FROM docker.io/library/rust:latest
WORKDIR /src
RUN apt-get update && apt-get install -y libssl-dev openssl pkg-config
RUN cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli
RUN cargo install --git https://github.com/arxanas/git-branchless scm-record --features scm-diff-editor
COPY install.sh .
ENTRYPOINT /src/install.sh

This just runs the necessary cargo commands to install the two executables in the image. The install.sh script is super simple; it just copies the executables from the image into a bind-mounted folder:

#!/bin/bash
for bin in jj scm-diff-editor; do
  cp "$(which "$bin")" "/output/$bin"
done

So the last part is just putting it all together with a pod config file:

images:
  jj-install:
    tag: jj-install:latest
    from: Containerfile
    build_flags:
      cache-ttl: 24h

containers:
  install:
    name: jj-install
    image: jj-install:latest
    interactive: true
    autoremove: true
    bind_mounts:
      ~/.local/bin: /output

I can then run pod build to create a new image and build new executables with cargo. Then pod run the container to copy them out of the image and into the $PATH on my host system.

This is the same approach I used for the automatic install script for pod itself—except using podman commands directly rather than a pod config. I’ve done the same thing to install rubyfmt since that is only packaged with Brew, or requires Cargo to build from source.

I’m sure at some point an incompatibility between libraries inside and outside of the container will create a whole host of bizarre issues, but until then I will continue using this approach to install things.

  1. Short review, it’s good but has a long way to go. Global undo is excellent, and I like the “only edit commits that aren’t in main yet” workflow. 


Jekyll Blog Tips

This site that you’re reading now is generated by Jekyll and hosted on GitHub Pages. When I originally set this site up, GitHub Pages only supported their own limited set of plugins, and if you wanted to do anything extra you had to generate the HTML content yourself. Since then, support has been added for building the site with a custom GitHub Action, allowing you to run arbitrary code during generation.

In an effort to keep things simple and avoid the temptation to write my own site generator, I have stuck with the basic deploy-on-push system with the standard set of plugins. This has worked fairly well, and the downsides are fairly minor—for example the version of the Rouge syntax highlighter that is used is a few years old and doesn’t know about Swift’s async or await keywords. This is not an issue unless you write a long blog post about concurrency.

I have of course worked out a variety of ways to maximise my use of this constrained environment.

Default Post Attributes

Jekyll allows you to specify the default front matter attributes in the config file. Previously whenever I would read these attributes in a template I would check if they were empty, and put the default right there in the template. Being able to configure a default makes this much easier. The defaults set the layout and OpenGraph metadata.
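
In _config.yml this looks something like the following (the og_image key is just whatever your templates happen to read):

defaults:
  - scope:
      path: ""
      type: "posts"
    values:
      layout: "post"
      og_image: "/images/default-preview.png"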

404 Page

Originally GH Pages didn’t support a custom 404 page (instead just delivering a generic one common to all sites) but you can now create a 404.md file and tell people they’re looking for something that doesn’t exist. This is what mine looks like.
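
A minimal 404.md just needs a permalink in its front matter so the server knows where to find it (this is a generic sketch, not my actual page):

---
permalink: /404.html
layout: default
---

Sorry, whatever you were looking for isn’t here.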

Make The Most of Layouts and Includes

There are four places that posts can appear on the website: the actual post page, the index page, and the two feeds (RSS and JSON). I’m sad to say that despite Liquid supporting the re-use of files, I just copy-pasted the content of the post header between the index page and the post layout. There were definitely a few times where I was making edits to one and getting confused about why I didn’t see any changes on the site.

What I do now is much better: I have a template in _includes for the HTML version of the post that has the styled title and post metadata. This is used on the homepage and individual post pages. The post page is a custom layout that adds a footer that I only want when viewing a single post. The two feeds use a separate template that omits the header (since RSS readers will render a header themselves) but adds a small footer that isn’t present in the HTML version.

The trick with getting this to work was that Jekyll stores the post information in different variables depending on whether you’re rendering a page or a single post. A layout uses {{ content }} to inject the content of the page, but on the index page you’ve got multiple posts, each with their own content that’s accessed with {{ post.content }}. I don’t think you can pass variables to templates, but variables in Liquid templates are seemingly all global anyway, so you can just assign to post and use that in the layout. Now anywhere I need to include a post is just:

# index.html
{% for post in paginator.posts %}
  {% include post.html %}
{% endfor %}

# _layouts/post.html
{% assign post=page %}
{% include post.html %}

Data in _config.yml

The content of _config.yml is basically mapped directly to the site object, so you can define additional configuration knobs instead of setting them multiple times across the site. I use this to define a single date format that is used wherever a human-readable date is shown. I set date_format: "%B %-d, %Y" in the config and whenever I show a date I can access that format: {{ post.date | date: site.date_format }}.

I also use this for some common URLs—not because I’m likely to change them, but to avoid me mistyping them. Or you can dump data directly from the config file into a page, as I did with the webfinger Mastodon trick earlier this year.

Jekyll Admin

Jekyll Admin is a web UI that allows you to edit posts and pages, as well as upload files. Since I write on my laptop but run the Jekyll dev server on my home server, this avoids some awkward scp-ing by allowing me to just paste my posts into a webform.

The killer feature would be for it to have a basic Git integration, so you could commit changes and push them to a remote right from the admin interface. Alas the project isn’t there yet.

I don’t know if it’s a problem with the version of Jekyll that I run (I use locally whatever version GH Pages is using for consistency) but the admin interface shows constant errors when you save a post—despite it never actually failing to do anything. It’s still more convenient than scp, but definitely doesn’t inspire confidence.

Adding this very post to Jekyll Admin showed an error banner that said “Error: Could not update the doc”. The doc had updated without any problems.

Development-Only Content

Liquid has conditional expressions, and Jekyll has jekyll.environment. Smash these two together and you can add extra information that you only want visible when you run the website locally. For example I have a link to Jekyll Admin show as an additional link in the status bar, and every post has an “Edit” link that takes me directly to the Jekyll Admin edit interface for that post. Since the site is statically generated, that information obviously isn’t just hidden on the real site—it’s completely gone.
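
The basic pattern looks like this (jekyll serve uses the development environment by default, while GH Pages builds with JEKYLL_ENV=production):

{% if jekyll.environment == "development" %}
  <a href="/admin">Jekyll Admin</a>
{% endif %}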

A Jekyll issue that’s made worse with Jekyll Admin is the handling of the site URL. If you want to listen on all interfaces—because you’re developing in a container or running the Jekyll dev server on a different machine than the one you want to view the website on—then you set host: 0 either in _config.yml or via command-line arguments. The problem is that this overrides site.url, so any absolute URL will be http://0:80/my_url which is meaningless. Jekyll doesn’t allow you to set the host without overriding the site URL, and Jekyll admin generates a bunch of these URLs that don’t work properly.

Run it in a Container

My website was actually one of the first things that I containerised and saw a real benefit from. Even though Ruby environment management is a pretty well-trodden area, I would still run into dependency issues from time to time. Now I can just pod run and have the server running with basically no effort. Ideally I would use the exact same image GH Pages uses to build the site, but I haven’t set that up yet and to be honest the benefits are probably fairly academic.

Jekyll supports passing a second config file that is merged with the first, which I use to only load the jekyll-admin plugin in development—and avoid any warnings from GH Pages that it isn’t supported.
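
On the command line that’s just a comma-separated list, with later files overriding earlier ones (the dev config filename here is made up):

$ jekyll serve --config _config.yml,_config_dev.yml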


Parsing Flags is Surprisingly Hard

On the topic of “thinking too much about things that you didn’t really want to think about”, have you considered just how hard it is to parse command-line arguments? Most tools—especially the battle-tested standard POSIX command-line tools—have this worked out pretty well, and work in a fairly predictable way. Until you start trying to implement them yourself, you might not notice just how much of a messy job it is.

First off, the abstract problem that flag-parsing has to solve is taking an array of strings and mapping them unambiguously to a set of configuration options. Of course you could make this incredibly easy: just give every option a unique name, and pass every option as --${name}=${value}. Except we add an obnoxious requirement that the input array of strings should be easily human-writable (and readable), so any ultra-verbose and easy-to-implement solution is immediately unsuitable.

The convention for POSIX programs is something like:

Boolean options can be passed like -v to turn them on. They can also be passed like --verbose, -verbose, or --verbose=true. You might even support -V to turn the option off. A single flag could be split into two arguments, like --verbose true (the space means it’s two arguments!) but since shells are unpredictable, you should also support a single argument with a space, in case it was quoted: "--verbose true".

Flags might take arguments, which are often file paths. Like boolean options you could pass --path=/dev/null or -path /dev/null. If it’s a common option then maybe you let users just write -p /dev/null—if you do that you should probably also support -p=/dev/null.

Some flags can accept multiple values, so maybe you should support --search path/one second/path as well as --search=path/one --search=second/path. Of course you should support -s and -search and maybe even mixing and matching all of these.

To reduce the amount of typing users have to do, often the short forms of flags can be shoved together into one flag, so instead of typing -a -b -c you can just do -abc. Hopefully there aren’t so many short options that they could spell out the long form of other flags. Some programs allow using this short form and passing a value for the last flag. So if you had a program that has a boolean flag -b and a string flag -s, you could do -bs value instead of -b -s value.1

If your program is doing a lot of different things, it probably makes sense to group functionality into subcommands, like git clone or tmux attach. You should then support short subcommand names like tmux a, but you’ve also got to match flags to a certain subcommand.

Some flags are going to apply in all cases—things like the log level or config file location—but others will only apply to a specific subcommand. Do you require these flags to be in a certain order, or do you allow them to be mixed? If you allow them to be mixed then you’ll have to defer processing any flags until you know what the subcommand is—since they could behave differently depending on the subcommand.

Let’s consider a program:

$ program --flag "a value" subcommand-one
$ program --flag subcommand-two

If --flag is defined as taking a string for subcommand-one, and being a boolean for subcommand-two, then you can’t decide whether subcommand-two should be a separate argument itself, or a value for --flag. This leads to programs (like podman) having fairly strict orders for their CLI. Any global flags come directly after the command, then there’s the subcommand, then any flags for the subcommand, then the image name, and finally any arguments after the image name are passed into the container.

This can be annoying as you have to remember which flags go where, and specifically with podman you can easily end up doing something like:

$ podman run alpine:latest --interactive

And wonder why you don’t get a shell. The answer is that --interactive is passed into the container since it’s after the image name, and not used to configure your container. echo has almost the inverse problem: it is used to print things, but what if you want to print something that would be interpreted as a flag for echo?

# This works just fine, since -t isn't a flag that echo uses
$ echo -t
-t
# but this will interpret it as a flag
$ echo -e

# quoting doesn't do anything
$ echo '-e'

# you need to know that '-' is special
$ echo - -e
-e

The additional catch is that shells don’t have datatypes; everything passed to a program is a string. So there’s no difference between -e and '-e', the program will always receive the string "-e". Many people get caught out by this: if you’re used to a “normal” programming language, the dash seems special, and wrapping it in quotes feels like it should force it to be treated as a string.

Speaking of the dashes, they’re purely a convention. There’s no reason that you can’t structure your flags and arguments in a completely different way—it would just be confusing. I’ve seen tools that use a trailing colon to write flags instead of leading dashes:

# so this:
$ program --flag value
# would be
$ program flag: value

It’s somewhat neat—maybe easier to type—but will be unfamiliar for most people that are going to use it. This doesn’t really allow you to have boolean flags that don’t have an explicit value.

Something else to consider is that modern shells will provide some level of auto-completion by default, usually just for file paths. If you write flags as a single argument, using = to separate key from value, the shell won’t as easily be able to provide autocompletion, since it will use spaces to separate units to autocomplete, and without spaces it won’t know when to start:

$ program --path=|
$ program --path |

On the first line, the shell has to know to strip away --path= and autocomplete from there (a naive implementation would just look for files starting with --path=). On the second line, the space means --path and the following word are treated as separate units, and so the shell can more easily autocomplete without doing any special handling.

All of this complexity is why I pretty much always outsource this to a library. I usually use clim for my projects; it’s pretty easy to use and offers more out of the box than Crystal’s built-in OptionParser. As soon as you try to make a general solution, you end up having to make some significant assumptions about what the format of the commands will be.
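
As a sketch of what handing it off looks like, here’s Crystal’s built-in OptionParser covering a couple of the cases above (clim does more of this for you, but the idea is the same):

require "option_parser"

verbose = false
paths = [] of String

OptionParser.parse do |parser|
  parser.banner = "Usage: program [flags]"
  # -v and --verbose both toggle the same option
  parser.on("-v", "--verbose", "Enable verbose output") { verbose = true }
  # takes a value, e.g. --path=/dev/null or -p /dev/null
  parser.on("-p PATH", "--path=PATH", "Add a search path") { |path| paths << path }
  parser.on("-h", "--help", "Show this help") do
    puts parser
    exit
  end
  parser.invalid_option do |flag|
    STDERR.puts "Unknown option: #{flag}"
    exit 1
  end
end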


How I Learned to Stop Worrying and Love Concurrency

Doing more than one thing at a time is still a somewhat unsolved problem in programming languages. We’ve largely settled on how variables, types, exceptions, functions, and suchlike usually work, but when it comes to concurrency the options vary between “just use threads” and some version of “green threads” that just allows for something that looks like a thread but takes fewer resources. We’ve also mostly been stuck on whether to actually do more than one thing at a time1, rather than how best to do it.

In this post I’m going to be talking about concurrency—the ability for a program to work through multiple queues of work, switching between them where necessary. This is distinct from parallelism in that no two pieces of work will be happening at the same time. Of course parallelism has its place, but I’m interested in how concurrent programming can be made easier for most programs.

Many applications (I would argue most applications) benefit hugely from concurrency, and less from parallelism since IO is such a large part of many applications. Being able to send multiple network requests or read multiple files “at once” is useful for more applications than having multiple streams of CPU-intense work happening at once.

Exceptions

Before we talk about concurrency, I want to introduce you to my newly-invented programming language. It works just like every other language, except the return keyword is replaced with two new keywords: yeet and hoik. To accompany these two new keywords there are two assignment operators, y= and h= (pronounced “ye” and “he”). y= will be used to receive a yeeted value, h= to receive a hoiked value. If you want to receive both, you can use both in the same expression. So for example:

def get_value(a, b):
  if a == b:
    hoik a
  elif a < b:
    yeet b
  else:
    yeet a

x y= get_value(10, 5)
print(x) # => 10
x h= get_value(5, 5)
print(x) # => 5
p h= l y= get_value(1, 2)
print(p, l) # => None, 2

If a value is hoiked or yeeted but not received by the caller with h= or y=, the hoiking or yeeting will propagate up to the next function.

“Wow Will, that’s so original. That’s just exceptions.” Yes, I know. I’m very clever.

The idea of having two different ways of returning from a function seems bizarre, until you take a step back and realise that most programming languages have two routes out of a function; you just don’t really consider the second one. For example, what does this do:

def parse_file(path):
  contents = read_file(path)
  data = parse_data(contents)
  return data

parse_file("~/config.yaml")

Does parse_data() get called? Well of course not, config.yaml doesn’t exist, and so read_file raises an exception and parse_file re-raises the exception, exiting early. The alternate path(s) through the function are basically invisible and often not given much thought.

Like it or not, humans have a serious thing with the number two. Having two ways of propagating data from a function is no exception (pun absolutely intended), and the ability for most code to ignore the exceptional case is usually convenient. There are obviously some fairly severe downsides—resource usage should be wrapped with a finally (or similar) block to ensure cleanup happens, creating an exception with a trace is not free, and there are plenty of cases where something could be considered a valid return or an exception (like an HTTP response with a 300-block status code). It’s up to the API designer to work out what should be communicated via a return value, and what should be communicated via an exception.

Swift has an interesting approach to exceptions; any call site that can raise an exception must be marked with try or its friends:

  • try will re-raise the exception, forcing the function to be marked with throws and the caller one level up must handle the exception instead.
  • try? will turn any exceptions into an optional, so if an exception is raised you just receive nil.
  • try! converts the exception into a fatal error, stopping the program.

I like having an explicit marker of which calls could cause an exception and alter the flow of the program. It means that the typically-invisible alternate path through the program is clearer, and I know whenever I see try, control flow could be jumping or returning to a different point in the program.

This does have its downsides, however: there is a syntactic cost to marking a function as throws. Every caller must then choose to propagate or handle the exception somehow. In many cases this makes a lot of sense—if the call can fail, mark it as throws and add try. But what about calls that should never fail, but can under some circumstances? Let’s consider this fairly innocuous program:2

let text = "oh no"
let index = text.index(
  text.startIndex, offsetBy: 7)
print(text[index])

I’ve managed to create an index on the string that is outside its bounds. The subscript operator on a string isn’t marked with throws, so its only options to communicate this failure are:

  1. return some sentinel value (like an empty string)
  2. crash the whole program
  3. return invalid garbage and let the program continue running like nothing happened

Swift chooses the second option:

Swift/StringCharacterView.swift:158: Fatal error: String index is out of bounds
Current stack trace:
0    libswiftCore.so    0x00007fe01d488740 _swift_stdlib_reportFatalErrorInFile + 113
1    libswiftCore.so    0x00007fe01d163fe4 <unavailable> + 1458148
2    libswiftCore.so    0x00007fe01d163e64 <unavailable> + 1457764
3    libswiftCore.so    0x00007fe01d163b9a <unavailable> + 1457050
4    libswiftCore.so    0x00007fe01d163720 _assertionFailure(_:_:file:line:flags:) + 253
5    libswiftCore.so    0x00007fe01d29d54c <unavailable> + 2741580
6    swift-test         0x000055b8dbcd7e7a <unavailable> + 3706
7    libc.so.6          0x00007fe01c029d90 <unavailable> + 171408
8    libc.so.6          0x00007fe01c029dc0 __libc_start_main + 128
9    swift-test         0x000055b8dbcd7b55 <unavailable> + 2901

Aside from not giving us a useful stack trace, there’s no way for me to recover from this failure3. If the function isn’t marked as throws, it doesn’t have a good way to report an unexpected failure. The result is that you’re forced to ensure that every value passed to the subscript operator is valid—just like if you were programming in C.
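
For this particular case the standard library does have a non-trapping variant, index(_:offsetBy:limitedBy:), but it’s on you to remember to reach for it. A minimal sketch of the check you end up writing by hand:

let text = "oh no"
// index(_:offsetBy:limitedBy:) returns nil instead of trapping when the
// offset goes past the limit; we also have to exclude endIndex itself.
if let index = text.index(text.startIndex, offsetBy: 7, limitedBy: text.endIndex),
   index < text.endIndex {
  print(text[index])
} else {
  print("index out of bounds")
}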

You could mark all methods like this with throws, but that adds a lot of syntactic noise for something that should never happen. I’m sure that the end result would be most people using try! with a justification of “I know the index is within the bounds”.

Java worked around this by having two types of exceptions, checked and unchecked. It’s up to the developer to decide which is appropriate. You can make an API clearer either by including exceptions in the type system—forcing them to be handled in a similar (if more verbose) way to Swift—or by omitting them from the type system, having them crash the program if unhandled but still able to be caught in the same way as checked exceptions.

I presume the design of Swift’s exceptions was driven by a desire to avoid checking for failure on every single function call. I’m more interested in syntax here; understanding the performance trade-offs is another topic entirely.

Swift is mostly the outlier here; the status quo among mainstream languages is that any function can throw an exception, and that exception will propagate up the stack until a caller catches it. Designers of general-purpose application programming languages have generally decided that automatic error propagation and implicit error checking after each call are worth the performance trade-off. A language doing something different, for example requiring manual error handling, is somewhat noteworthy.

async / await & Concurrency

The most popular4 way of adding concurrency to a language is using two keywords—async and await—to annotate points in the program where it can stop and do something else while something happens in the background. Usually this bridges to a historical API that uses something called a “future” or a “promise”.

The basic idea behind a “future” or “promise” API (I’m just going to call them futures from now on) is that you want to save some code for running later, and often a little bit more code for after that.

The reason this works so well is that most languages don’t have support for pausing execution of a running function and coming back to it later, but they do have support for code-as-data-ish in the form of objects with associated methods, and often those objects can be anonymous5. So in Java land we could always do something like this:

HTTPTool.sendGetRequest(
  "https://example.com",
  new HTTPResponseHandler() {
    @Override
    public void handle(HTTPResponse response) {
      System.out.println(response.getBody());
    }
  });

The code in handle() (and any data that it has access to) is effectively saved for later. There’s a suspension point conceptually in my code, but the actual language doesn’t really know that. It just knows about an HTTPResponseHandler object that it needs to hold a reference to so that sendGetRequest can call the .handle() method.

Where this gets super messy is when you want to do one asynchronous thing after another. Say you want to make a second HTTP request with the result of the first, you’d have to do something like:

HTTPTool.sendGetRequest(
  "https://example.com",
  new HTTPResponseHandler() {
    @Override
    public void handle(HTTPResponse response) {
      HTTPTool.sendGetRequest(
        response.getHeader("Location"),
        new HTTPResponseHandler() {
          @Override
          public void handle(HTTPResponse response) {
            System.out.println(response.getBody());
          }
        });
    }
  });

This results in a Pyramid of Doom where each level of async-ness is another level of indentation. Futures work around this problem by allowing “chaining”, inverting how the callbacks are built and avoiding nested indentations:

HTTPTool.sendGetRequest("https://example.com")
  .then(response ->
    HTTPTool.sendGetRequest(
      response.getHeader("Location")))
  .then(response -> {
    System.out.println(response.getBody());
  });

This is obviously much better with Java lambdas, which are less verbose than writing out a full anonymous class implementation, but are conceptually the same thing. However we’re still using closures to hack around the fact that we can’t pause a function.

Most futures APIs are pretty good at chaining a bunch of requests together, but when you get to anything more complicated, you end up having to use a sub-language that operates on futures: continue when all of these finish, when one of them finishes, do this if one fails, etc. It’s fairly easy to lose track of all your futures and leave one doing work to produce a result that is never used.

What async/await does is allow us to write the closures inline in the body of the function, so our code would end up like this:6

let response = await HTTPTool.sendGetRequest("https://example.com")
let url = response.headers["Location"]
let response2 = await HTTPTool.sendGetRequest(url)
print(response2.body)

The code reads as though the code blocks until a value is available, but what is effectively happening is that at each await, the compiler splits the function in two, and inserts the necessary code to turn the latter half into a callback. This way you can integrate into an existing language without having to change your byte code interpreter—Kotlin does this so it can have concurrency and still interop with Java.
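
As a rough sketch of that split, reusing the hypothetical HTTPTool from before and imagining a callback-taking variant of sendGetRequest, the compiler effectively produces something shaped like this:

HTTPTool.sendGetRequest("https://example.com", completion: { response in
  // Everything after the first await becomes the first callback...
  let url = response.headers["Location"]
  HTTPTool.sendGetRequest(url, completion: { response2 in
    // ...and everything after the second await becomes the second.
    print(response2.body)
  })
})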

When you’re introducing this awesome function-splitting compiler trick, you can’t do it by default for all functions, since anything from before the trick (ie: Java code) won’t know anything about the implicit callbacks and so won’t be able to call them correctly. To solve this problem you introduce function colours—some functions are asynchronous, some functions are synchronous, and there are rules about how they interact. In general it looks like this:

  • Synchronous functions can call synchronous functions
  • Asynchronous functions can call synchronous functions
  • Asynchronous functions can call asynchronous functions
  • Synchronous functions can cast to asynchronous functions

I’m borrowing the term cast here from Elixir/erlang. Casting over in that world is sending a message but not receiving a result. In most languages with async/await, a synchronous function can start an asynchronous function, but it can’t get a result back—since it doesn’t know when the async function will finish, and it can’t be split into a callback to run when the async call finishes.
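
In Swift, that kind of cast is roughly what Task gives you; a quick sketch, again using the hypothetical HTTPTool:

// A synchronous function can kick off async work, but can't await the result.
func refreshButtonTapped() {
  Task {
    let response = await HTTPTool.sendGetRequest("https://example.com")
    print(response.body)   // the result has to be handled inside the async context
  }
  // Execution continues here immediately; there's no value to use yet.
}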

This split system introduces a problem similar to how Swift handles exceptions—you can only do async work from an async context. If you don’t get called from an async context, you can’t do any async work and receive the result. This makes it harder to reach for async as a tool—as soon as you’ve made one major API async, all callers of it must be async, and all callers of them must be async. It will propagate through your codebase like a wildfire.

Unlike exceptions, you can’t safely handle async work in a non-async context without risking deadlocking your program. A function that doesn’t throw an exception can call a function that does throw one, it just needs to handle the failure within its body and return an appropriate result. A synchronous function can’t do this if it needs to call an async function. In some cases it may be able to block the thread while it waits for a result, but in a single-threaded context, the async function never gets an opportunity to run, and so the program deadlocks. In a multi-threaded context, some work might still be constrained to a single thread (ie: the UI thread or a background thread) and if you block on that you will deadlock.

The worst thing is that often blocking the thread will work, but it introduces a possibility of all of your threads blocking on async work at the same time, preventing any of the async work from progressing, deadlocking your program but only sometimes.

So why do we have async and await in the first place? As far as I can see there are two reasons. The first is that we don’t want to break compatibility with non-async code that can’t be automatically split into callbacks. The second is that we want to make it explicit that at an await point, the program can go off and do something else—potentially for an indefinitely long amount of time. Even if you call an async function that only takes two milliseconds to finish, most implementations use co-operative multitasking and so there’s no protection against some function calculating primes in the background preventing a context switch back to your function.

“Co-operative” multitasking means that each function is responsible for ensuring that there are enough points that it yields control back to the scheduler to make progress on some other work. If there’s a huge CPU-intensive calculation going on that doesn’t yield, then nothing will happen concurrently until that calculation is completely finished. “Pre-emptive” multitasking will proactively stop one function if it’s running for too long and do some other queued work.
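
A quick sketch of what that responsibility looks like in practice (in Swift the yield is spelled Task.yield()):

// A CPU-heavy loop that periodically hands control back to the scheduler
// so other tasks can make progress.
func countPrimes(upTo limit: Int) async -> Int {
  var count = 0
  for candidate in 2..<limit {
    if (2..<candidate).allSatisfy({ candidate % $0 != 0 }) {
      count += 1
    }
    if candidate % 1_000 == 0 {
      await Task.yield()   // without this, nothing else runs until the loop finishes
    }
  }
  return count
}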

If you’re making a brand-new language that isn’t saddled with backwards compatibility to an existing language or runtime, would you make this same tradeoff? The best language ever (Crystal) and notable poster-child of concurrency (Go) both omit the need for an async keyword.

In both languages, every function is treated as async. At any point7 in a function, execution can swap to a different function and do some work there before swapping back. Much to the fear of people that like their code to be explicit, at any point in your program, an arbitrarily large gap in execution could happen.

Before I used a language with async/await I had heard people talking about how amazing it was, and always got confused because I was used to writing concurrent code in Crystal (or Go before that) where this was not needed. I felt like I was missing something and that this syntax would unlock some new way of doing things, but the reality is that it’s most often just a way to bridge to an old API because of backwards-compatibility constraints in the language.

Rust is in a particularly tricky situation with async, as their no-runtime and zero-cost abstractions goals mean they can’t just wrap the whole program in an event loop. I don’t know much about Rust—much less writing async code using it—but found these posts to be an interesting look at the history and state of async in Rust:

Using Concurrency

That’s less than half the battle. We can pause a function mid-execution, but we haven’t actually done two things at the same time1. The biggest benefit of non-blocking IO is that you can easily send off two slow requests (eg: over the network) and only wait for the slowest one before continuing, rather than doing them in sequence. This is another API design challenge. The simplest example looks like this:8

        B
      /   \
 o - A     D - o
      \   /
        C

Our function starts on the left, does some processing in A, does B and C at the same time, and then once both have finished does the final step D. There are plenty of ways you could handle this, and the measure of a good API is how easy it is to do the right thing—not introducing race conditions, unexpected behaviour, memory leaks, etc.

The example I’ll use here is something you might see in the world’s most naive web browser—we’re going to load a page and try to also load the favicon for that webpage at the same time. Here’s one example in Go, a language that doesn’t have any notion of async/await because every function can be interrupted at any point:

func loadPage(url string) WebPage {
  pageChan := make(chan []byte)
  faviconChan := make(chan []byte)
  go sendRequest(url, pageChan)
  go sendRequest(url + "/favicon.ico", faviconChan)
  page := <-pageChan
  favicon := <-faviconChan
  return WebPage{page: page, favicon: favicon}
}

And here’s an example of the same function in Swift, that does have async/await:

func loadPage(url: String) async -> WebPage {
  async let page = sendRequest(url)
  async let favicon = sendRequest(url + "/favicon.ico")
  return WebPage(page: await page, favicon: await favicon)
}

Ok I’m going to pause here and say that the following section is basically just my notes on Nathaniel J. Smith’s post Notes on structured concurrency, or: Go statement considered harmful. I recommend it, it’s a good read. You can come back to this later.

The main difference here is that Go doesn’t have any higher-level abstractions for dealing with concurrency as values, just goroutines started with the go keyword, and channels created with the chan keyword. We have to hand-craft any structure in our concurrency with our bare hands. Appropriately, Swift has a keyword for this: instead of immediately await-ing an async function, we can assign it to a variable with async let and then await the value later.

What happens when our code gets a little more complicated? Let’s say we’re writing a program to fetch posts from our favourite blogs. We know that some have an Atom feed, and we should prefer that if it exists, otherwise we should fall back to the RSS feed. This might look something like:

func getFeedsFrom(url string) []Feed {
  atomChannel := make(chan Response)
  rssChannel := make(chan Response)
  go fetchFeed(url + "/atom.xml", atomChannel)
  go fetchFeed(url + "/rss.xml", rssChannel)
  atomResponse := <-atomChannel
  if atomResponse.IsSuccess() {
    return parseItems(atomResponse)
  }
  rssResponse := <-rssChannel
  return parseItems(rssResponse)
}

Seems reasonable? The problem is that go fetchFeed(url + "/rss.xml", rssChannel) can outlive the lifetime of the function if we get a successful response back for the Atom feed first. My program would just have a goroutine running in the background doing useless work that I don’t care about, and there’s nothing in the language to help me do this correctly.9 Some languages with async/await can have the same problem; it’s just spelled slightly differently. Depending on the implementation, if a value is not await-ed, it will continue running in the background and any result or error will be discarded. This JavaScript version, for example, is much more succinct, but it has the same problem in that the RSS fetch will not get cleaned up when the function returns:

async function getFeeds(url) {
  let atom = fetchFeed(url + "/atom.xml")
  let rss = fetchFeed(url + "/rss.xml")

  let atomResult = await atom
  if (atomResult.success) {
    return parseItems(atomResult)
  }
  return parseItems(await rss)
}

You don’t think about it as much since you don’t have the explicit go keyword here, but you are doing the same thing. The control flow splits in two, one fetching the Atom feed and one fetching the RSS feed, and then you wait for the results.

Swift and Kotlin do this very well.10 I’m going to use Kotlin as the example here since it does things a little more explicitly. The only place you can split your function is within a CoroutineScope. By default, the scope will only finish when every coroutine in it has finished. So the previous example would look like:11

suspend fun getFeeds(url: String): List<Feed> {
  return coroutineScope {
    val atomAsync = async {
      fetchFeed(url + "/atom.xml")
    }
    val rssAsync = async {
      fetchFeed(url + "/rss.xml")
    }

    val atom = atomAsync.await()
    if (atom.success) {
      return@coroutineScope parseItems(atom)
    }
    return@coroutineScope parseItems(rssAsync.await())
  }
}

This will wait for rssAsync to finish before coroutineScope returns. Even though we’ve got an early return on a successful fetch of the Atom feed, we’ll still implicitly wait for the RSS feed. If the RSS feed takes ages to respond, our whole function will take ages. This is the price we pay for encapsulation: coroutineScope forces our concurrent code into a diamond pattern, instead of a fork pattern:

Always this:
        B
      /   \
 o - A     D - o
      \   /
        C

Never this:
        - - - - - B - - - - - - ?
      /
 o - A     D - o
      \   /
        C

coroutineScope isn’t something magical, it’s just a function with a block argument12 that exposes the async method and keeps track of anything launched using it. If I find the “wait for everything to finish, even on early return” behaviour to be limiting, I can just write another function that uses the same building blocks to give me the behaviour I want instead:

suspend fun <T> coroutineScopeCancelOnReturn(
    block: suspend CoroutineScope.() -> T): T {
  return coroutineScope {
    // Run the block as normal, with async available on the receiver scope.
    val result = block.invoke(this)
    // Then cancel anything it launched but didn't await, rather than waiting for it.
    currentCoroutineContext().cancelChildren(null)
    return@coroutineScope result
  }
}

As concurrency is tied to a scope, we can use this building block to create our own scopes with different behaviours—mine cancels any outstanding work once the block returns, but you could equally easily make a scope that includes a timeout, or limits the number of async calls happening at any one time. Most of the time you should only need the coroutineScope builder function, but there’s nothing stopping you from having a global variable that’s a scope, and having things work more like Go, where any function can start work in the scope that outlives the life of the function. It’s easier to spot, however, since you just need to look at the cross-references for the global scope to find who’s using it. In Go you would have to manually inspect every function and understand how they handled concurrency to be sure that nothing was leaking.

The usage of scopes to handle concurrency changes how APIs are written. Take a basic HTTP server in Crystal:

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.puts "Hello world!"
end

spawn do
  sleep 5.minutes
  server.close
end

server.bind_tcp "0", 8080
server.listen

After five minutes, what will this do? The documentation for #close says:

This closes the server sockets and stops processing any new requests, even on connections with keep-alive enabled. Currently processing requests are not interrupted but also not waited for. In order to give them some grace period for finishing, the calling context can add a timeout like sleep 10.seconds after #listen returns.

So the fibres spawned by the server (that run the block passed to .new) won’t be cancelled (which makes sense since fibres in Crystal can’t be cancelled) and will be left dangling. If Crystal had scoped coroutines like Kotlin, you could more easily change and reason about the behaviour by passing in a different scope to the server to use for handling requests—currently you have no guarantee that code in the .new block won’t run after .listen returns, or in theory any point after that, since an HTTP connection could take a prolonged time to establish before the handler code is run.

This would support the common use-case of cancelling outstanding requests when the server shuts down, but could easily be changed to add a timeout grace period, or stop the whole server if there is an unhandled exception (instead of printing it and continuing like nothing happened).

Using scopes to control concurrency like this basically allows you to start building towards an Erlang supervisor tree.13

When I was in university I wrote a Slack bot using Elixir. It originally didn’t handle the “someone’s typing” notification from the Slack API, which caused it to crash. The (Elixir) process that ran the bot would crash, and the supervisor would replace it with another identical process. The storage was handled in a separate process, no data was lost and the bot would reconnect after a few seconds. If I had been using almost any other language, the end result probably would have been my whole program crashing, and me having to fix it immediately.

Having language support for cancelling pieces of work is also useful in a lot of other contexts: POSIX processes can be interrupted with a SIGINT, which often triggers some kind of callback in the language, and that callback needs to communicate to anything currently running that it should stop. Cancellation being a first-class citizen could allow for better default behaviour when a program is told to stop. The same concept could apply to applications in resource-constrained environments (ie: phone OSes) so that they can respond effectively to being stopped due to lack of resources.
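
A sketch of what that wiring might look like in Swift, assuming a hypothetical long-running serve() function that periodically checks Task.isCancelled:

import Foundation

let serverTask = Task { await serve() }

// Let a dispatch source receive SIGINT instead of the default handler,
// and translate it into co-operative cancellation of the running work.
signal(SIGINT, SIG_IGN)
let sigint = DispatchSource.makeSignalSource(signal: SIGINT, queue: .main)
sigint.setEventHandler {
  serverTask.cancel()   // serve() decides how to wind down
}
sigint.resume()

dispatchMain()   // keep the process alive servicing the main queue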

Concurrent Data

Once you’ve got the lifetime of your concurrency sorted, you need to work out the lifetime and access for your data. Rust does this with lifetime annotations and more static analysis than you can poke a stick at; Pony has six reference capabilities that define how a variable can be used in which context. Erlang and Elixir just have fully immutable data structures, so you can’t mutate something you shouldn’t—although you can still have “mutable” data in a stateful process, and introduce a race condition by having multiple processes send messages to that stateful process.

When I’m writing stuff in my free time I usually have a fairly cavalier attitude to thread safety. Crystal doesn’t have many guarantees for this, and since it’s currently single-threaded, most of the time it works fine. I’ll write some dirty code that spawns a new fibre that does some work and appends the result to a list. That’s always atomic—right?

I haven’t written enough Rust to appreciate what it’s like working with the borrow checker and lifetime annotations. From what I’ve read (a recent example) the borrow checker is frustrating, to say the least.

What I’d like is—somehow—for concurrent data access to be verified as easily as types are checked in Crystal. I get most of the benefits of static typing and dynamic typing by using Crystal’s type inference, can the lifetimes of variables be inferred in a similar way? I think this would be a very hard problem, and probably only practical if the general population of developers was already used to adding lifetime annotations—like they are with types—so you could just require fewer of them.

For me, the best concurrency system would be one that doesn’t require any tagging of functions, to avoid having to think about function colouring and the syntactic cost of annotating every call site, and a well-defined structured concurrency API that is used throughout the standard library and third party libraries, to give guarantees about the lifetime of concurrent work. This would need to have affordances to handle pending concurrent work as values (like Swift’s async let or Kotlin’s Deferred<>), and enough tools in the standard library to make it easy to handle these values. I don’t have particularly strong opinions about actors, lifetimes, or reference capabilities14 as I’ve not used them much to write any real-world programs.

If you liked this and want to read something by someone who knows what they’re talking about, I would recommend reading Notes on structured concurrency, or: Go statement considered harmful. Reading this was definitely the “ah-ha” moment where I was convinced that just tacking a spawn function in your language wasn’t good enough.

  1. Yeah yeah, I know it’s not actually at the same time, see my note right at the top. But you know what I mean, otherwise you wouldn’t have read the footnote. If you’re the type of person to correct a concurrency-versus-parallelism mistake, you’re also the kind of person that will read a footnote to be absolutely accurate in your correction. 

  2. Credit to @acb for pointing this out. 

  3. Well maybe there is, I’m not a Swift expert. But we’re talking abstractly about syntax here, just roll with it. 

  5. This just means they don’t have a real name, and are typically defined inline where they get passed to a function. 

  6. Part of the joy of reading my blog is getting confused as I change language in the middle of a series of examples. This next one is in Swift, since Java doesn’t have async/await yet, and Kotlin’s implementation is less clear about await-ing things. 

  7. As long as a function yields, see co-operative versus pre-emptive note above. 

  8. Appreciate my effort- and bandwidth-saving ASCII diagram. 

  9. Maybe Go has some library for keeping track of your goroutines, but my basic point is this is not the default and not what I see people doing. 

  10. They basically do what the previously mentioned blog post describes. 

  11. Yes I know my Kotlin function could be more idiomatic and shorter, but then everyone would be getting confused about Kotlin’s weird syntax, instead of getting confused at concurrency. 

  12. Ok Kotlin’s blocks are kinda magic. 

  13. Ignoring the fact you don’t have memory isolation for each process so you’ll never fully get there. 

  14. Perhaps that’s part 2? Subscribe to the RSS feed for more! 


Improvements for Initialising Pod Projects

One of the major usability misses with pod was that it was tricky to set up a new project. My goal was to remove the need for language-specific development tools installed directly onto my computer, but whenever I started a new project with pod, I would need to run crystal init to create the basic project skeleton. With the new pod init command, this is now unnecessary.

To create a new project that wasn’t Crystal (like when I was messing around with Swift websockets) I would manually run a shell in a container using the image for the language and bind-mount my working directory. I’d then use the package manager within the container to set up a project (eg: running swift package init) and then copy-paste some containerfiles from a previous project. This was incredibly fiddly and tedious, so I added functionality to pod that does this automatically.

Now when you run pod init, it asks for a base image to use—I use the latest Crystal Alpine image—and runs a container using that image with the working directory already available as a bind mount. Using the shell in that container you can run whatever tools are needed to set up the files for your project (npm init, crystal init, cargo init, etc). When you exit that shell, pod will create containerfiles and a pods.yaml file for the project, so in most cases you can just build with pod build and then pod run without any further changes.

Another thing that is more difficult in a container-only world is running REPLs inside the project. I don’t do this often—since the Crystal interpreter isn’t shipping in the main release yet—but I really enjoyed this way of working when I was using Elixir or Ruby more. Running an iex shell where I could recompile and interactively test my code was probably the most pleasant development experience I’ve ever had, and I wanted to support that with pod.

This is now possible with pod enter. By default you can run a shell using any of the images in your pods.yaml file, or you can configure entrypoints and jump straight into a REPL by running a particular command. So for example this:

entrypoints:
  iex:
    image: my-elixir-project:dev-latest
    shell: iex -S mix

Will allow me to do this:

$ pod enter iex
Erlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Interactive Elixir (1.15.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

This bind-mounts the working directory in, so your code is available to any tools that run in the entrypoint. If you’ve got something more complicated that requires more customisation of the container (like exposing ports or binding additional directories) you can always make a custom run target that spawns an interactive shell.

You can imagine that if you were working on a Ruby on Rails project, you might set up something like this:

entrypoints:
  console:
    image: my-rails-project:dev-latest
    shell: bin/rails console

I’ve enjoyed working in a container-first and now largely container-only way, and improving pod is what has made this possible for me to do. You can check it out here, specifically the documentation for getting started.