The Code-Config Continuum

At some point you’ve probably written or edited a config file that had the same block of config repeated over and over again with just one or two fields changed each time. Every time you added a new block you’d just duplicate the previous block and change the one field. Maybe you’ve wished that the application you’re configuring supported some way of saying “configure all these things in the same way”.

What this is exposing is an interesting problem that I’m sure all sysadmins, devops, SREs, and other “operations” people will appreciate deeply:

Where should something sit on the continuum between config and code?

This follows on from the difficulty of parsing command-line flags. Once your application is sufficiently complex, you’ll either need to use something that allows you to write the flags in a config file, or re-write your application to be configured directly from a config file instead of command-line arguments.

The first logical step is probably to read a JSON file. It’s built-in to most modern languages, and if it’s not then there’s almost certainly a well-tested third-party library that does the job for you. You just need to define the shape of your config data structure (please define this as a statically-typed structure that will fail to parse quickly with a good error message, rather than just reading the config file as a big JSON blob and extracting out fields as you go, setting yourself up for a delayed failure) and you’re all set.
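To make the fail-fast point concrete, here’s a minimal sketch in Ruby (the `Config` shape and field names are hypothetical): parse into a typed structure up front and reject bad input with a clear message, rather than digging fields out of a blob later and failing who-knows-where.

```ruby
require "json"

# Hypothetical config shape: fail to parse quickly with a clear
# message instead of failing later when a value is first used.
Config = Struct.new(:hostname, :port, keyword_init: true) do
  def self.parse(text)
    data = JSON.parse(text)
    hostname = data.fetch("hostname")
    port = data.fetch("port")
    raise "hostname must be a string" unless hostname.is_a?(String)
    raise "port must be an integer" unless port.is_a?(Integer)
    new(hostname: hostname, port: port)
  rescue KeyError => e
    raise "missing config field: #{e.key}"
  end
end

config = Config.parse('{"hostname": "steve", "port": 4132}')
```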

This file will inevitably grow as more options and complexity are added to the application, and at some point two things will happen: first, someone who hasn’t dealt with tonnes of JSON will ask why they can’t add comments in the config file; and second, someone will write a script that merges two config files to apply local overrides, making development against a local environment easier.
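That override-merging script tends to be a few lines of recursive merge. A minimal sketch in Ruby, assuming JSON configs and override-wins semantics (a real script would read the two files from disk):

```ruby
require "json"

# Recursively merge an override config into a base config, with
# override values winning for any key present in both.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

base     = JSON.parse('{"db": {"host": "prod.example.com", "pool": 5}}')
override = JSON.parse('{"db": {"host": "localhost"}}')
merged   = deep_merge(base, override)
```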

To remedy the first issue you could probably move to something like YAML or TOML. Both are designed as config-first rather than object-representation-first, and so support comments and some other niceties like multi-line strings.

If you stuck with JSON or chose to use TOML, you’ll soon end up with another problem: you need to keep common sections in sync. Say you have something like a set of database connection configs, one for production and one for development (a good example is a Rails database.yml file). You want to keep all the boring bits in sync so that development and production don’t stray too far from one another.

I run into this with my pods.yaml config files. The program I wrote to track helicopter movements around the Sydney beaches has five different container configurations that I can run, and all of them need a handful of common flags:

flags:
  timezone: Australia/Sydney
  point_a:
    lat: -34.570
    long: 152.397
  point_b:
    lat: -32.667
    long: 149.469
  http_timeout: 5s

If this was JSON or TOML I would have to repeat that same block of config five times, and if I ever changed the area I was scanning, I would have to remember to update each place with the same values.

However, YAML is a very powerful config language; you can capture references to parts of the config and then re-use them in other parts of the file:

flags: &default-flags
  timezone: Australia/Sydney
  point_a:
    lat: -34.570
    long: 152.397
  point_b:
    lat: -32.667
    long: 149.469
  http_timeout: 5s

containers:
  my-container:
    name: test-container
    flags:
      <<: *default-flags
  my-other-container:
    name: second-test-container
    flags:
      <<: *default-flags

Here I use default-flags to set the flags attribute of both containers to the exact same value.
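If you want to check what the anchors expand to, you can load the file with Ruby’s YAML library (Psych). One catch: recent Psych versions reject aliases in `safe_load` unless you opt in. Here’s a trimmed-down version of the config above:

```ruby
require "yaml"

doc = <<~YAML
  flags: &default-flags
    timezone: Australia/Sydney
    http_timeout: 5s

  containers:
    my-container:
      flags:
        <<: *default-flags
YAML

# Psych 4 rejects aliases in safe_load by default, so opt in.
config = YAML.safe_load(doc, aliases: true)
```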

This is quite powerful and very useful, but there are still plenty of things that you can’t express: mathematical operations, string concatenation, and other data transformations. I can’t redefine how I write the configuration to be completely different to what the program that’s parsing the YAML expects.

# Reference a field, and transform it
field: new-$another_field
# Grab an environment variable
field: $USER
# Do some arithmetic using a field
field: 2 * $other_field
# A simple conditional
field: $PRODUCTION ? enabled : disabled

Some things that you can’t do in YAML.
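You can of course fake any one of these with a wrapper that rewrites the YAML text before the application parses it—which is exactly the kind of ad-hoc tooling this slope produces. A hypothetical Ruby sketch of the environment-variable case (the `$NAME` syntax is invented here, not part of YAML):

```ruby
require "yaml"

# Hypothetical preprocessor: replace $NAME tokens with values from
# the environment before the real YAML parse happens.
def expand_env(text, env)
  text.gsub(/\$([A-Z_]+)/) { env.fetch(Regexp.last_match(1)) }
end

raw      = "field: prefix-$USER\n"
expanded = expand_env(raw, { "USER" => "alice" })
config   = YAML.safe_load(expanded)
```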

That being said, YAML is far from simple:

The YAML spec is 23,449 words; for comparison, TOML is 3,339 words, JSON is 1,969 words, and XML is 20,603 words. Who among us have read all that? Who among us have read and understood all of that? Who among us have read, understood, and remembered all of that? For example did you know there are nine ways to write a multi-line string in YAML with subtly different behaviour?

Martin Tournoij: YAML: probably not so great after all

YAML is full of surprising traps, like the fact that the presence or absence of quotes around a value changes how it is parsed and so the country code for Norway gets parsed as the boolean value false.
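This is easy to demonstrate with Ruby’s Psych, which follows YAML 1.1’s scalar rules:

```ruby
require "yaml"

# YAML 1.1 treats yes/no/on/off (in any casing) as booleans, so
# the unquoted country code becomes false; quoting keeps a string.
unquoted = YAML.safe_load("country: NO")
quoted   = YAML.safe_load("country: 'NO'")
```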

Even if you decide that the power of YAML is worth these costs, you’re still going to run into a wall eventually. noyaml.com is a good entrypoint to the world of weird YAML behaviour.

As your application becomes more complex—or as the interdependence of multiple applications becomes more complex—you’ll probably want to split the config into multiple files1.

A classic example would be doing something like putting all the common flags that are shared between environments in one file, and then the development, staging, and production configurations each in their own file that reference the common one. YAML has no way of supporting this, and so you’ll end up writing a program that either:

  • concatenates multiple YAML files before sending them to the application to be parsed
  • parses a YAML file and reads attributes in it to define a rudimentary #include system
  • generates a single lower-level YAML config file that is given to the application based on multiple higher-level config files

And of course whichever option you choose will be difficult to understand, error-prone, hard to debug, and almost impossible to change once all its idiosyncrasies are being relied upon to generate production configuration.
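For a flavour of what the second option looks like, here’s a hedged Ruby sketch of a rudimentary include system. The top-level `include` key is a convention I’ve invented for illustration; included files are merged in underneath the including file’s own values:

```ruby
require "yaml"

# Rudimentary include system: a top-level `include` key (an
# invented convention) lists files, relative to the including
# file, whose contents are merged in as a base.
def load_with_includes(path)
  config = YAML.safe_load(File.read(path)) || {}
  included = config.delete("include") || []
  base = included.reduce({}) do |acc, name|
    acc.merge(load_with_includes(File.join(File.dirname(path), name)))
  end
  base.merge(config)
end
```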

The sensible thing to do—of course—is to use an existing configuration language that is designed from the ground up for managing complex configuration, like HCL. HCL is a language that has features that look like a declarative config (“inspired by libucl, nginx configuration, and others”) but is basically a programming language. It has function calls, conditionals, and loops so you can write an arbitrary program that translates one config data structure into another before it gets passed to an application.

This is all very good, but now you’ve got another problem: you need to learn and use another programming language. At some point you’re going to say “why doesn’t this value get passed through correctly?” and the solution will be to debug your configuration language. That could involve using an actual debugger, or working out how to printf in your config language.

Chances are pretty high that you’re not very good at debugging this config language that you don’t pay much attention to, and the tooling for debugging it is probably not as good as that of a “real” programming language that’s been around for 29 years.

If you’ve done any Rails development, then you’ve come across Ruby-as-config before. Ruby has powerful metaprogramming features that make writing custom DSLs fairly simple, and the Ruby syntax is fairly amenable to being written like a config language. If there is a problem with the config then you can use familiar Ruby debugging tools and techniques (assuming you have some of those), but the flip side is that the level of weird metaprogramming hacks required to make a configuration “readable”—or just look slick—are likely outside of the understanding of anyone not deeply entrenched in weird language hacks.

Of course you’re free to choose whichever language you like, they’re all fairly capable of taking some values and translating them to a data structure that the end application can ingest. You could even write your config in Java.

There are a lot of additional benefits to using a real programming language to write your configuration. As well as abstracting away configuration details, you can add domain-specific validation that doesn’t need to exist in the application (perhaps enforcing naming conventions just for your project), or dynamically load config values from another source—perhaps even another config file—before they are passed into the application.
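A naming-convention check, for example, is a couple of lines once your config is generated by code. The convention enforced here is hypothetical:

```ruby
# Hypothetical project-specific validation layered on top of the
# config: enforce a naming convention the application itself
# doesn't know or care about.
def validate_container_name!(name)
  unless name.match?(/\A[a-z]+(?:-[a-z]+)*\z/)
    raise ArgumentError,
      "container name #{name.inspect} must be lowercase words separated by dashes"
  end
  name
end
```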

The next iteration is when the config continues to increase in complexity2, and so you decide to make some kind of tool that helps developers make common changes. Adding and removing sections is the obvious use-case. Strictly speaking it doesn’t have to be due to the config being complex, it could just be that you want some automated system to be able to edit the files.

Your problem is that you have no guarantees about the structure of the config. Since it’s a general-purpose programming language, details could be scattered anywhere throughout the program. With JSON, it’s super easy to parse the file, edit the data, and write a well-formatted config back out—you just have to match the amount of indentation and ideally the order of keys too. Doing this for most programming languages is much more difficult (just look at the work that has gone into making rubyfmt).
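The JSON round trip really is short—parse, edit the data, pretty-print it back out. Ruby hashes preserve insertion order, so the existing key order survives:

```ruby
require "json"

# Parse a JSON config, add an entry, and re-emit it formatted.
hosts = JSON.parse('[{"hostname": "steve", "port": 4132}]')
hosts << { "hostname" => "brett", "port" => 5314 }
output = JSON.pretty_generate(hosts)
```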

Even if you can parse and output the config program, the whole point of using a general-purpose language was to allow people to structure their configs in different ways, so to make a tool that is able to edit their configs, you’re going to have to enforce a restricted format that is easier for a computer to understand and edit.

So if you’ve got an application that expects a config file with hostnames and ports in a list, something like this:

[
  {
    "hostname": "steve",
    "port": 4132
  },
  {
    "hostname": "brett",
    "port": 5314
  },
  {
    "hostname": "gavin",
    "port": 9476
  }
]

The simplest translation to a Ruby DSL could look like:

[
  host {
    hostname "steve"
    port 4132
  },
  host {
    hostname "brett"
    port 5314
  },
  host {
    hostname "gavin"
    port 9476
  }
]
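(For the curious: a DSL like this doesn’t need much machinery. One hypothetical way to wire up `host`/`hostname`/`port` is `instance_eval` on a small builder object—this is an illustration, not any particular library’s implementation.)

```ruby
# A small builder object collects the attribute calls, and
# instance_eval runs the config block in its context.
class HostBuilder
  attr_reader :attrs

  def initialize
    @attrs = {}
  end

  def hostname(value)
    @attrs["hostname"] = value
  end

  def port(value)
    @attrs["port"] = value
  end
end

def host(&block)
  builder = HostBuilder.new
  builder.instance_eval(&block)
  builder.attrs
end

hosts = [
  host {
    hostname "steve"
    port 4132
  },
]
```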

If someone was deploying this to a cloud service, they might not want to write all that out, so their config might look like:

zones = ["us-east-1", "us-west-2", "au-east-1", ...]
STANDARD_PORT = 4123

zones.map do |zone|
  host {
    hostname "host-#{zone}"
    port STANDARD_PORT
  }
end

A program that has to edit these files to “add a new host” basically has to understand the intent behind the whole file3. This is an exceptionally difficult job. I read a book about robots as a child that likened computer speech to squeezing toothpaste out of a tube, and speech recognition to pushing the toothpaste back into the tube. Creating the config is like squeezing the toothpaste, having a computer edit the config is like putting the toothpaste back.

There are two paths you can take from here: double down on the programming language and build higher-level abstractions over the existing config to remove the need for the computer to edit the files, or move towards stricter formats for config files to allow computers to edit them.

You’re being forced to pick a position on the code-config continuum, between something that’s bad for people but good for computers, and something that’s better for people and bad for computers. There’s no right answer, and every option trades off between the two ends of the spectrum.

  1. I can imagine a student or junior developer reading this and thinking “when would your configuration ever get too big for one file?”. Trust me, it does. 

  2. It’ll happen to you one day! 

  3. Maybe an LLM could get us there most of the time? 


I'm Dumbfounded by Discord

I find Discord baffling. Not in its popularity in group messaging for a class, team, or friend group—it seems fine at that—but the other, larger use cases.

In 2020 and 2021 I learnt how to create digital art in Blender, the 3D modelling software. I watched videos from both Clinton Jones (who I had been following since his time at RocketJump and Corridor Digital) and Blender Bob. It was Clinton’s work and the videos showing his process that taught me you could use computer graphics without ever thinking about video or “VFX”—that’s just where I was first exposed to these ideas. His Instagram has a mix of both film photography and rendered computer graphics, but since he targets the same aesthetic in both, it’s often hard to tell at a glance which is which.

Anyway. Both of these creators have Discord servers where subscribers could chat, share their work, and potentially get some guidance from people in the community or the creator themselves. When I joined, both were open for anyone to join, but I think that now Clinton’s Discord is for Patreon supporters only.

This is where the bafflement comes in. Discord is designed as a synchronous messaging system. You can obviously view or reply to messages at any time, but the interface expects you to read messages almost as soon as they are received, and reply immediately or never.

For a team or group of friends this makes sense: you’re probably all in the same timezone and share a similar schedule. If you’re not, then at least the group is probably small enough that it’s easy to catch up on anything that you missed. Discords for “fan communities” are basically the exact opposite—they’re large and highly trafficked. The time difference is exacerbated by my being in a significantly different timezone from the typical North American audience.

My experience was that every time I checked the servers, there would be at least tens—if not hundreds—of new messages in every channel, with the topic of conversation having shifted multiple times. Any attempt to ask a question or have a conversation is drowned out in the noise of additional messages and threads.

The Discord app just isn’t designed for reading all the messages. Even if I treated the server as a read-only experience (much like I do with Mastodon1), it’s difficult to go through and look at the history of a channel. If you do, you’re going to be reading it backwards as the app probably isn’t going to perfectly preserve your scroll position (something that I’m especially keen on).

It seems to me that these Discord servers have a few roles; a support forum, a showcase of work, and a space for informal discussion.

You know what works really well as a support forum? An actual forum with first-class support for topics, threads, and detailed discussion that can happen asynchronously as the question-asker works through their problem. As someone who remembers a time before Stack Overflow, it seems like people have collectively forgotten the experience of describing your problem on a forum, and then a day later having a kind and knowledgeable person ask you for more information so they can pin down the solution.

I’ve seen it mentioned on Mastodon that some software projects use Discord in lieu of a support forum or documentation, which I find absolutely baffling as trying to find something that someone mentioned within a chat conversation—and understanding all the surrounding context, while filtering out any unrelated noise in the channel that was happening alongside it—seems completely impossible. Those conversations are also not going to be indexed by a search engine, so people that aren’t aware of the Discord are almost certainly not going to stumble across it while searching for information about a problem they’re having.

If the infamous discussion about whether there are 7 or 8 days in a week had happened on Discord, I wouldn’t be able to effortlessly find it 16 years later with a single search.

The other two use-cases—showcasing work and having informal discussions—are less well suited to forums, but I think they’d still be passable if implemented that way. However, the actual point of this whole post was to propose an alternative for this kind of fan community: a private Mastodon server.

As web creators move towards sharing their work on their own terms, rather than via an existing platform (an example), a suitably tech-focussed2 creator could offer membership on a private Mastodon server as a perk of being a supporter.

Mastodon’s soft-realtime and Twitter-like flat-threaded structure give it a nice balance of working reasonably well for quick conversations as well as time-delayed asynchronous communication. Since the instance would be private, the “local” timeline would just contain posts made by the community, allowing members to see everything, or create their own timeline by following specific people or topics.

Ideally, Mastodon clients would allow mixing and merging accounts into a single timeline—so I could have the accounts I follow from my main account and accounts on this private instance show up in the same timeline, so I don’t have to scroll through two separate timelines.

The biggest challenge would obviously be explaining that you’re signing up to an instance of a federated social media platform that has disconnected itself from the federated world in order to provide an “exclusive” experience only for supporters of the creator.

I don’t think that Mastodon will reach a level of mainstream success that such a niche use of it could be anything but a support headache, but it’s interesting to think how open platforms could be re-used in interesting ways.

  1. Go on, toot me. 

  2. Read this as “willing to put up with the complexities of Mastodon and able to understand the nuance of having a de-federated instance of a federated system”. 


Build and Install Tools Using Containers

Another challenge in my quest to not have any programming languages installed directly on my computer is installing programs that need to be built from source. I’ve been using jj in place of Git for the last few months1. To install it you can either download the pre-built binaries, or build from source using cargo. When I first started using it there was a minor bug that was fixed on main but not in the latest release, so I needed to build and install it myself instead of just downloading the binary.

Naturally the solution is to hack around it with containers. The basic idea is to use a base image that matches the host OS (Ubuntu images for most languages are not hard to come by), build inside that, and copy only the executable out into the host system.

To install jj and scm-diff-editor I make a Containerfile like this:

FROM docker.io/library/rust:latest
WORKDIR /src
RUN apt-get update && apt-get install -y libssl-dev openssl pkg-config
RUN cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli
RUN cargo install --git https://github.com/arxanas/git-branchless scm-record --features scm-diff-editor
COPY install.sh .
ENTRYPOINT /src/install.sh

This just runs the necessary cargo commands to install the two executables in the image. The install.sh script is super simple: it just copies the executables from the image into a bind-mounted folder:

#!/bin/bash
for bin in jj scm-diff-editor; do
  cp "$(which "$bin")" "/output/$bin"
done

So the last part is just putting it all together with a pod config file:

images:
  jj-install:
    tag: jj-install:latest
    from: Containerfile
    build_flags:
      cache-ttl: 24h

containers:
  install:
    name: jj-install
    image: jj-install:latest
    interactive: true
    autoremove: true
    bind_mounts:
      ~/.local/bin: /output

I can then run pod build to create a new image and build new executables with cargo, and then pod run the container to copy them out of the image and into the $PATH on my host system.

This is the same approach I used for the automatic install script for pod itself—except using podman commands directly rather than a pod config. I’ve done the same thing to install rubyfmt since that is only packaged with Brew, or requires Cargo to build from source.

I’m sure at some point an incompatibility between libraries inside and outside of the container will create a whole host of bizarre issues, but until then I will continue using this approach to install things.

  1. Short review, it’s good but has a long way to go. Global undo is excellent, and I like the “only edit commits that aren’t in main yet” workflow. 


Jekyll Blog Tips

This site that you’re reading now is generated by Jekyll and hosted on GitHub Pages. Originally when I set this site up, GitHub Pages only supported their own limited set of plugins, and if you wanted to do anything extra you had to generate the HTML content yourself. In the interim, you can now write a custom GitHub Action that builds the site, allowing you to run arbitrary code during generation.

In an effort to keep things simple and avoid the temptation to write my own site generator, I have stuck with the basic deploy-on-push system with the standard set of plugins. This has worked fairly well, and the downsides are fairly minor—for example the version of the Rouge syntax highlighter used is a few years old and doesn’t know about Swift’s async or await keywords. This is not an issue unless you write a long blog post about concurrency.

I have of course worked out a variety of ways to maximise my use of this constrained environment.

Default Post Attributes

Jekyll allows you to specify the default front matter attributes in the config file. Previously whenever I would read these attributes in a template I would check if they were empty, and put the default right there in the template. Being able to configure a default makes this much easier. The defaults set the layout and OpenGraph metadata.

404 Page

Originally GH Pages didn’t support a custom 404 page (instead just delivering a generic one common to all sites) but you can now create a 404.md file and tell people they’re looking for something that doesn’t exist. This is what mine looks like.

Make The Most of Layouts and Includes

There are four places that posts can appear on the website; the actual post page, the index page, and the two feeds (RSS and JSON). I’m sad to say that despite Liquid supporting re-using files, I just copy-pasted the content of the post header between the index page and the post layout. There were definitely a few times where I was making edits to one and getting confused why I didn’t see any changes to the site.

What I do now is much better, I have a template in _includes for the HTML version of the post that has the styled title and post metadata. This is used on the homepage and individual post pages. The post page is a custom layout that adds a footer that I only want when viewing a single post. The two feeds use a separate template that omits the header (since RSS readers will make a header themselves) but adds a small footer that isn’t present on the HTML version.

The trick to getting this to work is that Jekyll stores the post information in different variables depending on whether you’re rendering a page or a single post. A layout uses {{ content }} to inject the content of the page, but in the index page you’ve got multiple posts, each with their own content that’s accessed with {{ post.content }}. I don’t think you can pass variables to templates, but variables in Liquid templates are seemingly all global anyway, so you can just assign to post and use that in the layout. Now anywhere I need to include a post is just:

# index.html
{% for post in paginator.posts %}
  {% include post.html %}
{% endfor %}

# _layouts/post.html
{% assign post=page %}
{% include post.html %}

Data in _config.yml

The content of _config.yml is basically mapped directly to the site object, so you can define additional configuration knobs instead of setting them multiple times across the site. I use this to define a single date format that is used wherever a human-readable date is shown. I set date_format: "%B %-d, %Y" in the config and whenever I show a date I can access that format: {{ post.date | date: site.date_format }}.
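That format string is just handed to strftime, so you can check what it produces in plain Ruby. The `%-d` modifier is what drops the zero padding on the day of the month:

```ruby
# The site-wide date format from _config.yml, applied directly
# with strftime: %B is the full month name, %-d the unpadded day.
date_format = "%B %-d, %Y"
formatted = Time.new(2023, 7, 9).strftime(date_format)
```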

I also use this for some common URLs—not because I’m likely to change them, but to avoid me mistyping them. Or you can dump data directly from the config file into a page, as I did with the webfinger Mastodon trick earlier this year.

Jekyll Admin

Jekyll Admin is a web UI that allows you to edit posts and pages, as well as upload files. Since I write on my laptop but run the Jekyll dev server on my home server, this avoids some awkward scp-ing by allowing me to just paste my posts into a webform.

The killer feature would be for it to have a basic Git integration, so you could commit changes and push them to a remote right from the admin interface. Alas the project isn’t there yet.

I don’t know if it’s a problem with the version of Jekyll that I run (I use locally whatever version GH Pages is using for consistency) but the admin interface shows constant errors when you save a post—despite it never actually failing to do anything. It’s still more convenient than scp, but definitely doesn’t inspire confidence.

Adding this very post to Jekyll Admin showed an error banner that said “Error: Could not update the doc”. The doc had updated without any problems.

Development-Only Content

Liquid has conditional expressions, and Jekyll has jekyll.environment. Smash these two together and you can add extra information that you only want visible when you run the website locally. For example I have a link to Jekyll Admin as an additional link in the status bar, and every post has an “Edit” link that takes me directly to the Jekyll Admin edit interface for that post. Since the site is statically generated, that information is not just hidden on the real site—it’s completely gone.

A Jekyll issue that’s made worse with Jekyll Admin is the handling of the site URL. If you want to listen on all interfaces—because you’re developing in a container or running the Jekyll dev server on a different machine than the one you want to view the website on—then you set host: 0 either in _config.yml or via command-line arguments. The problem is that this overrides site.url, so any absolute URL will be http://0:80/my_url, which is meaningless. Jekyll doesn’t allow you to set the host without overriding the site URL, and Jekyll Admin generates a bunch of these URLs that don’t work properly.

Run it in a Container

My website was actually one of the first things that I containerised and saw a real benefit. Even though Ruby environment management is a pretty well-trodden area, I still would run into dependency issues from time to time. Now I can simply just pod run and have the server running with basically no effort. Ideally I would use the exact same image GH Pages uses to build the site, but I haven’t set that up yet and to be honest the benefits are probably fairly academic.

Jekyll supports passing a second config file that is merged with the first, which I use to only load the jekyll-admin plugin in development—and avoid any warnings from GH Pages that it isn’t supported.


Parsing Flags is Surprisingly Hard

On the topic of “thinking too much about things that you didn’t really want to think about”, have you considered just how hard it is to parse command-line arguments? Most tools—especially the battle-tested standard POSIX command-line tools—have this worked out pretty well, and work in a fairly predictable way. Until you start trying to implement them yourself, you might not notice just how much of a messy job it is.

First off, the abstract problem that flag-parsing has to solve is taking an array of strings and mapping them unambiguously to a set of configuration options. Of course you could make this incredibly easy: just give every option a unique name, and pass every option as --${name}=${value}. Except we add an obnoxious requirement that the input array of strings should be easily human-writable (and readable), so any ultra-verbose and easy-to-implement solution is immediately unsuitable.

The convention for POSIX programs is something like:

Boolean options can be passed like -v to turn them on. They can also be passed like --verbose, -verbose, or --verbose=true. You might even support -V to turn the option off. A single flag could be split into two arguments, like --verbose true (the space means it’s two arguments!) but since shells are unpredictable, you should also support a single argument with a space, in case it was quoted: "--verbose true".

Flags might take arguments, which are often file paths. Like boolean options you could pass --path=/dev/null or -path /dev/null. If it’s a common option then maybe you let users just write -p /dev/null—if you do that you should probably also support -p=/dev/null.

Some flags can accept multiple values, so maybe you should support --search path/one second/path as well as --search=path/one --search=second/path. Of course you should support -s and -search and maybe even mixing and matching all of these.

To reduce the amount of typing users have to do, often the short forms of flags can be shoved together into one flag, so instead of typing -a -b -c you can just do -abc. Hopefully there aren’t so many short options that they could spell out the long form of other flags. Some programs allow using this short form and passing a value for the last flag. So if you had a program that has a boolean flag -b and a string flag -s, you could do -bs value instead of -b -s value.1
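Even the clustered-short-flags rule on its own takes a little care to implement. A hedged Ruby sketch of just the expansion step (the flag names here are arbitrary):

```ruby
# Expand clustered short options: "-abc" becomes "-a", "-b", "-c".
# Long options (two leading dashes) don't match the pattern and
# are left untouched, and a value argument after a cluster stays
# where it is, so it ends up following the last expanded flag.
def expand_clusters(args)
  args.flat_map do |arg|
    if arg.match?(/\A-[a-zA-Z]{2,}\z/)
      arg.delete_prefix("-").chars.map { |c| "-#{c}" }
    else
      [arg]
    end
  end
end
```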

If your program is doing a lot of different things, it probably makes sense to group functionality into subcommands, like git clone or tmux attach. You should then support short subcommand names like tmux a, but you’ve also got to match flags to a certain subcommand.

Some flags are going to apply in all cases—things like the log level or config file location—but others will only apply to a specific subcommand. Do you require these flags to be in a certain order, or do you allow them to be mixed? If you allow them to be mixed then you’ll have to defer processing any flags until you know what the subcommand is—since they could behave differently depending on the subcommand.

Let’s consider a program:

$ program --flag "a value" subcommand-one
$ program --flag subcommand-two

If --flag is defined as taking a string for subcommand-one, and being a boolean for subcommand-two, then you can’t decide whether subcommand-two should be a separate argument itself, or a value for --flag. This leads to programs (like podman) having fairly strict orders for their CLI. Any global flags come directly after the command, then there’s the subcommand, then any flags for the subcommand, then the image name, and finally any arguments after the image name are passed into the container.

This can be annoying as you have to remember which flags go where, and specifically with podman you can easily end up doing something like:

$ podman run alpine:latest --interactive

And wonder why you don’t get a shell. The answer is that --interactive is passed into the container since it comes after the image name, and is not used to configure your container. echo has almost the inverse problem: it exists to print things, but what if you want to print something that looks like a flag for echo itself?

# This works just fine, since -t isn't a flag that echo uses
$ echo -t
-t
# but this will interpret it as a flag
$ echo -e

# quoting doesn't do anything
$ echo '-e'

# you need to know that '-' is special
$ echo - -e
-e

The additional catch is that shells don’t have datatypes: everything passed to a program is a string. So there’s no difference between -e and '-e', the program will always receive the string "-e". Many people get caught out by this because, if you’re used to a “normal” programming language, the dash seems special and wrapping it in quotes feels like it should force it to be treated as a string.

Speaking of the dashes, they’re purely a convention. There’s no reason that you can’t structure your flags and arguments in a completely different way—it would just be confusing. I’ve seen tools that use a trailing colon to write flags instead of leading dashes:

# so this:
$ program --flag value
# would be
$ program flag: value

It’s somewhat neat—maybe easier to type—but will be unfamiliar for most people that are going to use it. This doesn’t really allow you to have boolean flags that don’t have an explicit value.

Something else to consider is that modern shells will provide some level of auto-completion by default, usually just for file paths. If you write flags as a single argument, using = to separate key from value, the shell won’t as easily be able to provide autocompletion, since it uses spaces to separate the units it completes, and without a space it won’t know where the value begins:

$ program --path=|
$ program --path |

On the first line, the shell has to know to strip away --path= and autocomplete from there (a naive implementation would just look for files starting with --path=). On the second line, the space means --path and the following word are treated as separate units, and so the shell can more easily autocomplete without doing any special handling.

All of this complexity is why I pretty much always outsource this to a library. I usually use clim for my projects; it’s pretty easy to use and offers more out-of-the-box than the built-in Crystal OptionParser. As soon as you try to make a general solution, you end up having to make some significant assumptions about what the format of the commands will be.


How I Learned to Stop Worrying and Love Concurrency

Doing more than one thing at a time is still a somewhat unsolved problem in programming languages. We’ve largely settled on how variables, types, exceptions, functions, and suchlike usually work, but when it comes to concurrency the options vary between “just use threads” and some version of “green threads” that provides something that looks like a thread but consumes fewer resources. We’ve also mostly been stuck on whether to actually do more than one thing at a time1, rather than how best to do it.

In this post I’m going to be talking about concurrency—the ability for a program to work through multiple queues of work, switching between them where necessary. This is distinct from parallelism in that no two pieces of work will be happening at the same time. Of course parallelism has its place, but I’m interested in how concurrent programming can be made easier for most programs.

Many applications (I would argue most applications) benefit hugely from concurrency, and less from parallelism since IO is such a large part of many applications. Being able to send multiple network requests or read multiple files “at once” is useful for more applications than having multiple streams of CPU-intense work happening at once.

Exceptions

Before we talk about concurrency, I want to introduce you to my newly-invented programming language. It works just like every other language, except the return keyword is replaced by two new keywords: yeet and hoik. To accompany these there are two assignment operators, y= and h= (pronounced “ye” and “he”). y= is used to receive a yeeted value, h= to receive a hoiked value. If you want to receive both, you can use both in the same expression. So for example:

def get_value(a, b):
  if a == b:
    hoik a
  elif a < b:
    yeet b
  else:
    yeet a

x y= get_value(10, 5)
print(x) # => 10
x h= get_value(5, 5)
print(x) # => 5
p h= l y= get_value(1, 2)
print(p, l) # => None, 2

If a value is hoiked or yeeted but not received by the caller with h= or y=, the hoiking or yeeting will propagate up to the next function.

“Wow Will, that’s so original. That’s just exceptions.” Yes, I know. I’m very clever.

The idea of having two different ways of returning from a function seems bizarre, until you take a step back and realise that most programming languages have two routes out of a function, you just don’t really consider the second one. For example, what does this do:

def parse_file(path):
  contents = read_file(path)
  data = parse_data(contents)
  return data

parse_file("~/config.yaml")

Does parse_data() get called? Well of course not, config.yaml doesn’t exist, and so read_file raises an exception and parse_file re-raises the exception, exiting early. The alternate path(s) through the function are basically invisible and often not given much thought.

Like it or not, humans have a serious thing with the number two. Having two ways of propagating data from a function is no exception (pun absolutely intended), and the ability for most code to ignore the exceptional case is usually convenient. There are obviously some fairly severe downsides—resource usage should be wrapped with a finally (or similar) block to ensure cleanup happens, creating an exception with a trace is not free, and there are plenty of cases where something could be considered a valid return or an exception (like an HTTP response with a 300-block status code). It’s up to the API designer to work out what should be communicated via a return value, and what should be communicated via an exception.

Swift has an interesting approach to exceptions; any call site that can raise an exception must be marked with try or its friends:

  • try will re-raise the exception, forcing the function to be marked with throws and the caller one level up must handle the exception instead.
  • try? will turn any exceptions into an optional, so if an exception is raised you just receive nil.
  • try! converts the exception into a fatal error, stopping the program.

I like having an explicit marker of which calls could cause an exception and alter the flow of the program. It means that the typically-invisible alternate path through the program is clearer, and I know whenever I see try, control flow could be jumping or returning to a different point in the program.

This does have its downsides however; there is an implicit syntactic cost to marking a function as throws. Every caller then must choose to propagate or handle the exception somehow. In many cases this makes a lot of sense—if the call can fail, mark it as throws and add try. But what about calls that should never fail, but can under some circumstances? Let’s consider this fairly innocuous program:2

let text = "oh no"
let index = text.index(
  text.startIndex, offsetBy: 7)
print(text[index])

I’ve managed to create an index on the string that is outside its bounds. The subscript operator on a string isn’t marked with throws, so its only options to communicate this failure are:

  1. return some sentinel value (like an empty string)
  2. crash the whole program
  3. return invalid garbage and let the program continue running like nothing happened

Swift chooses the second option:

Swift/StringCharacterView.swift:158: Fatal error: String index is out of bounds
Current stack trace:
0    libswiftCore.so    0x00007fe01d488740 _swift_stdlib_reportFatalErrorInFile + 113
1    libswiftCore.so    0x00007fe01d163fe4 <unavailable> + 1458148
2    libswiftCore.so    0x00007fe01d163e64 <unavailable> + 1457764
3    libswiftCore.so    0x00007fe01d163b9a <unavailable> + 1457050
4    libswiftCore.so    0x00007fe01d163720 _assertionFailure(_:_:file:line:flags:) + 253
5    libswiftCore.so    0x00007fe01d29d54c <unavailable> + 2741580
6    swift-test         0x000055b8dbcd7e7a <unavailable> + 3706
7    libc.so.6          0x00007fe01c029d90 <unavailable> + 171408
8    libc.so.6          0x00007fe01c029dc0 __libc_start_main + 128
9    swift-test         0x000055b8dbcd7b55 <unavailable> + 2901

Aside from not giving us a useful stack trace, there’s no way for me to recover from this failure3. If the function isn’t marked as throws, it doesn’t have a good way to report an unexpected failure. The result is that you’re forced to ensure that every value passed to the subscript operator is valid—just like if you were programming in C.

You could mark all methods like this with throws, but that adds a lot of syntactic noise for something that should never happen. I’m sure that the end result would be most people using try! with a justification of “I know the index is within the bounds”.

Java worked around this by having two types of exceptions, checked and unchecked, and it’s up to the developer to decide which is appropriate. You can make an API clearer either by including exceptions in the type system—forcing them to be handled in a similar (if more verbose) way to Swift—or by omitting them from the type system, so they crash the program if unhandled, while still being catchable in the same way as checked exceptions.

I presume the design of Swift’s exceptions was driven by a desire to avoid checking for failure on every single function call. I’m more interested in syntax here, understanding the performance trade-offs is another topic entirely.

Swift is mostly the outlier here relative to the status quo of mainstream languages. The default exception-handling approach is that any function can throw an exception, and that exception will propagate up the stack until a caller catches it appropriately. Designers of general-purpose application programming languages have generally decided that automatic error propagation and implicit error checking after each call is worth the performance trade-off. A language doing something different, for example requiring manual error handling, is somewhat noteworthy.
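Go is the best-known example of that different route: the second path out of a function is an ordinary return value that you check by hand. Here’s a sketch of the parse_file pseudocode from earlier in that style (os.ReadFile is real; parseData is a hypothetical stand-in):

```go
package main

import (
	"fmt"
	"os"
)

// parseData is a hypothetical stand-in for the real parsing step.
func parseData(contents []byte) string {
	return string(contents)
}

// parseFile mirrors the parse_file pseudocode, but the normally
// invisible second path out of the function is an explicit error
// value that the caller has to check.
func parseFile(path string) (string, error) {
	contents, err := os.ReadFile(path)
	if err != nil {
		// manual propagation: the equivalent of re-raising
		return "", fmt.Errorf("reading %s: %w", path, err)
	}
	return parseData(contents), nil
}

func main() {
	if _, err := parseFile("~/config.yaml"); err != nil {
		fmt.Println("failed:", err) // ~/config.yaml doesn't exist, so this prints
	}
}
```

The alternate path is now fully visible, at the cost of an if err != nil after nearly every call.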

async / await & Concurrency

The most popular implementation of concurrency in a language uses two keywords—async and await—to annotate points in the program where it can stop and do something else while work happens in the background. Usually this bridges to an older API built around something called a “future” or a “promise”.

The basic idea behind a “future” or “promise” API (I’m just going to call them futures from now on) is that you want to save some code for running later, and often a little bit more code for after that.

The reason this works so well is that most languages don’t have support for pausing execution of a running function and coming back to it later, but they do have support for code-as-data-ish in the form of objects with associated methods, and often those objects can be anonymous5. So in Java land we could always do something like this:

HTTPTool.sendGetRequest(
  "https://example.com",
  new HTTPResponseHandler() {
    @Override
    public void handle(HTTPResponse response) {
      System.out.println(response.getBody());
    }
  });

The code in handle() (and any data that it has access to) is effectively saved for later. There’s a suspension point conceptually in my code, but the actual language doesn’t really know that. It just knows about an HTTPResponseHandler object that it needs to hold a reference to so that sendGetRequest can call the .handle() method.

Where this gets super messy is when you want to do one asynchronous thing after another. Say you want to make a second HTTP request with the result of the first, you’d have to do something like:

HTTPTool.sendGetRequest(
  "https://example.com",
  new HTTPResponseHandler() {
    @Override
    public void handle(HTTPResponse response) {
      HTTPTool.sendGetRequest(
        response.getHeader("Location"),
        new HTTPResponseHandler() {
          @Override
          public void handle(HTTPResponse response) {
            System.out.println(response.getBody());
          }
        });
    }
  });

This results in a Pyramid of Doom where each level of async-ness is another level of indentation. Futures work around this problem by allowing “chaining”, inverting how the callbacks are built and avoiding nested indentations:

HTTPTool.sendGetRequest("https://example.com")
  .then(response ->
    HTTPTool.sendGetRequest(
      response.getHeader("Location")))
  .then(response -> {
    System.out.println(response.getBody());
  });

This is obviously much better with Java lambdas, which are less verbose than writing out a full anonymous class implementation, but are conceptually the same thing. However we’re still using closures to hack around the fact that we can’t pause a function.

Most futures APIs are pretty good at chaining a bunch of requests together, but when you get to anything more complicated, you end up having to use a sub-language that operates on futures: continue when all of these finish, or when one of them finishes, do this if one fails, etc. It’s fairly easy to lose track of all your futures and leave one doing work to produce a result that is never used.
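Go doesn’t have a futures API, but the idea maps onto channels: a future is roughly a one-shot channel carrying the eventual result, and that sub-language becomes plain receives and selects. A sketch (the future helper below is invented for illustration, not a standard API):

```go
package main

import "fmt"

// future runs work in the background and returns a one-shot channel
// that will eventually carry the result.
func future(work func() string) <-chan string {
	ch := make(chan string, 1)
	go func() { ch <- work() }()
	return ch
}

func main() {
	a := future(func() string { return "A" })
	b := future(func() string { return "B" })

	// "continue when all these finish": receive from each in turn
	fmt.Println(<-a + <-b) // AB

	// "continue when one of them finishes": race them with select
	c := future(func() string { return "C" })
	d := future(func() string { return "D" })
	select {
	case r := <-c:
		fmt.Println("first:", r)
	case r := <-d:
		fmt.Println("first:", r)
	}
}
```

Note the losing future in the select keeps running in the background with its result discarded—exactly the lose-track-of-your-futures problem.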

What async/await does is allow us to write the closures inline in the body of the function, so our code would end up like this:6

let response = await HTTPTool.sendGetRequest("https://example.com")
let url = response.headers["Location"]
let response2 = await HTTPTool.sendGetRequest(url)
println(response2.body)

The code reads as though it blocks until a value is available, but what effectively happens is that at each await, the compiler splits the function in two and inserts the code needed to turn the latter half into a callback. This way you can integrate into an existing language without having to change your bytecode interpreter—Kotlin does this so it can have concurrency and still interop with Java.
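You can do that split by hand to see roughly what the compiler generates. This Go sketch (with a hypothetical callback-style sendRequestCPS) rewrites the two-request example so that everything after each await has become a closure:

```go
package main

import "fmt"

// sendRequestCPS is a hypothetical callback-style API: instead of
// returning, it delivers the response body by calling done.
func sendRequestCPS(url string, done func(body string)) {
	done("response from " + url) // a real version would be asynchronous
}

// loadPage is roughly what a compiler produces for:
//   response  = await sendRequest(url)
//   response2 = await sendRequest(location)
// each await splits the function, and the remainder becomes a callback.
func loadPage(url string, done func(body string)) {
	sendRequestCPS(url, func(response string) {
		location := url + "/next" // stand-in for reading the Location header
		sendRequestCPS(location, func(response2 string) {
			done(response2)
		})
	})
}

func main() {
	loadPage("https://example.com", func(body string) {
		fmt.Println(body) // response from https://example.com/next
	})
}
```

Mechanically it’s the same Pyramid of Doom as the Java example; async/await just hides the nesting behind syntax.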

When you’re introducing this awesome function-splitting compiler trick, you can’t do it by default for all functions, since anything from before the trick (ie: Java code) won’t know anything about the implicit callbacks and so won’t be able to call them correctly. To solve this problem you introduce function colours—some functions are asynchronous, some functions are synchronous, and there are rules about how they interact. In general it looks like this:

  • Synchronous functions can call synchronous functions
  • Asynchronous functions can call synchronous functions
  • Asynchronous functions can call asynchronous functions
  • Synchronous functions can cast to asynchronous functions

I’m borrowing the term cast here from Elixir/Erlang. Casting in that world means sending a message without waiting for a result. In most languages with async/await you can start an asynchronous function from a synchronous one, but you can’t get a result from it—since you don’t know when it will finish, and your function can’t be split into a callback to run when the async call finishes.

This split system introduces a problem similar to how Swift handles exceptions—you can only do async work from an async context. If you don’t get called from an async context, you can’t do any async work and receive the result. This makes it harder to reach for async as a tool—as soon as you’ve made one major API async, all its callers must be async, and all their callers must be async. It will spread through your codebase like wildfire.

Unlike exceptions, you can’t safely handle async work in a non-async context without risking deadlocking your program. A function that doesn’t throw an exception can call a function that does throw one, it just needs to handle the failure within its body and return an appropriate result. A synchronous function can’t do this if it needs to call an async function. In some cases it may be able to block the thread while it waits for a result, but in a single-threaded context, the async function never gets an opportunity to run, and so the program deadlocks. In a multi-threaded context, some work might still be constrained to a single thread (ie: the UI thread or a background thread) and if you block on that you will deadlock.

The worst thing is that often blocking the thread will work, but it introduces a possibility of all of your threads blocking on async work at the same time, preventing any of the async work from progressing, deadlocking your program but only sometimes.

So why do we have async and await in the first place? As far as I can see there are two reasons. The first is that we don’t want to break compatibility with non-async code that can’t be automatically split into callbacks. The second is that we want to make it explicit that at an await point, the program can go off and do something else—potentially for an indefinitely long amount of time. Even if you call an async function that only takes two milliseconds to finish, most implementations use co-operative multitasking, so there’s no protection against some function calculating primes in the background preventing a context switch back to your function.

“Co-operative” multitasking means that each function is responsible for ensuring that there are enough points that it yields control back to the scheduler to make progress on some other work. If there’s a huge CPU-intensive calculation going on that doesn’t yield, then nothing will happen concurrently until that calculation is completely finished. “Pre-emptive” multitasking will proactively stop one function if it’s running for too long and do some other queued work.

If you’re making a brand-new language that isn’t saddled with backwards compatibility to an existing language or runtime, would you make this same tradeoff? The best language ever (Crystal) and notable poster-child of concurrency (Go) both omit the need for an async keyword.

In both languages, every function is treated as async. At any point7 in a function, execution can swap to a different function and do some work there before swapping back. Much to the fear of people that like their code to be explicit, at any point in your program, an arbitrarily large gap in execution could happen.

Before I used a language with async/await I had heard people talking about how amazing it was, and always got confused because I was used to writing concurrent code in Crystal (or Go before that) where this was not needed. I felt like I was missing something and that this syntax would unlock some new way of doing things, but the reality is that it’s most often just a way to bridge to an old API because of backwards-compatibility constraints in the language.

Rust is in a particularly tricky situation with async, as its no-runtime and zero-cost-abstractions goals mean it can’t just wrap the whole program in an event loop. I don’t know much about Rust—much less about writing async code in it—but the history and current state of async in Rust is an interesting story in its own right.

Using Concurrency

That’s less than half the battle. We can pause a function mid-execution, but we haven’t actually done two things at the same time1. The biggest benefit of non-blocking IO is that you can easily send off two slow requests (eg: over the network) and only wait for the slowest one before continuing, rather than doing them in sequence. This is another API design challenge. The simplest example looks like this:8

        B
      /   \
 o - A     D - o
      \   /
        C

Our function starts on the left, does some processing in A, does B and C at the same time, and then once both have finished does the final step D. There are plenty of ways you could handle this, and the measure of a good API is how easy it is to do the right thing—not introducing race conditions, unexpected behaviour, memory leaks, etc.

The example I’ll use here is something you might see in the world’s most naive web browser—we’re going to load a page and try to also load the favicon for that webpage at the same time. Here’s one example in Go, a language that doesn’t have any notion of async/await because every function can be interrupted at any point:

func loadPage(url string) WebPage {
  pageChan := make(chan []byte)
  faviconChan := make(chan []byte)
  go sendRequest(url, pageChan)
  go sendRequest(url + "/favicon.ico", faviconChan)
  page := <-pageChan
  favicon := <-faviconChan
  return WebPage{page: page, favicon: favicon}
}

And here’s an example of the same function in Swift, that does have async/await:

func loadPage(url: String) async -> WebPage {
  async let page = sendRequest(url)
  async let favicon = sendRequest(url + "/favicon.ico")
  return WebPage(page: await page, favicon: await favicon)
}

Ok I’m going to pause here and say that the following section is basically just my notes on Nathaniel J. Smith’s post Notes on structured concurrency, or: Go statement considered harmful. I recommend it, it’s a good read. You can come back to this later.

The main difference here is that Go doesn’t have any higher-level abstractions for dealing with concurrency as values—just goroutines using the go keyword, and channels using the chan keyword. We have to craft any structure in our concurrency with our bare hands. Swift, appropriately, has syntax for this. Instead of immediately await-ing an async function, we can assign it to a variable with async let and then await the value later.

What happens when our code gets a little more complicated? Let’s say we’re writing a program to fetch posts from our favourite blogs. We know that some have an Atom feed, and we should prefer that if it exists, otherwise we should fall back to the RSS feed. This might look something like:

func getFeedsFrom(url string) []Feed {
  atomChannel := make(chan Response)
  rssChannel := make(chan Response)
  go fetchFeed(url + "/atom.xml", atomChannel)
  go fetchFeed(url + "/rss.xml", rssChannel)
  atomResponse := <-atomChannel
  if atomResponse.IsSuccess() {
    return parseItems(atomResponse)
  }
  rssResponse := <-rssChannel
  return parseItems(rssResponse)
}

Seems reasonable? The problem is that go fetchFeed(url + "/rss.xml", rssChannel) can outlive the function if we get a successful response back for the Atom feed first. My program would just have a goroutine running in the background doing useless work that I don’t care about, and there’s nothing in the language to help me do this correctly.9 Some languages with async/await can have the same problem, it’s just spelled slightly differently. Depending on the implementation, if a value is not await-ed, it will continue running in the background and any result or error will be discarded. For example this JavaScript version is much more succinct, but it has the same problem in that the RSS fetch will not get cleaned up when the function returns:

async function getFeeds(url) {
  let atom = fetchFeed(url + "/atom.xml")
  let rss = fetchFeed(url + "/rss.xml")

  let atomResult = await atom
  if (atomResult.success) {
    return parseItems(atomResult)
  }
  return parseItems(await rss)
}

You don’t think about it as much since you don’t have the explicit go keyword here, but you are doing the same thing. The control flow splits in two, one fetching the Atom feed and one fetching the RSS feed, and then you wait for the results.
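The Go version can be fixed by hand with the context package, cancelling the leftover fetch when the function returns; the point is that nothing forces you to. fetchFeed here is a hypothetical stand-in for the real fetcher:

```go
package main

import (
	"context"
	"fmt"
)

// fetchFeed is a hypothetical fetcher: it delivers a result, or gives
// up early if the context is cancelled.
func fetchFeed(ctx context.Context, url string, out chan<- string) {
	select {
	case out <- "feed from " + url:
	case <-ctx.Done():
		// the caller stopped caring; exit instead of leaking
	}
}

func getFeeds(url string) string {
	// cancelling the context on return tells any still-running fetch
	// that its result is no longer wanted
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	atomCh := make(chan string, 1)
	rssCh := make(chan string, 1)
	go fetchFeed(ctx, url+"/atom.xml", atomCh)
	go fetchFeed(ctx, url+"/rss.xml", rssCh)

	atom := <-atomCh
	if atom != "" { // stand-in for atomResponse.IsSuccess()
		return atom // the deferred cancel releases the RSS goroutine
	}
	return <-rssCh
}

func main() {
	fmt.Println(getFeeds("https://example.com")) // feed from https://example.com/atom.xml
}
```

This works, but it’s a convention you have to remember on every early return, not something the language checks for you.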

Swift and Kotlin do this very well.10 I’m going to use Kotlin as an example here since it does things a little more explicitly. The only place you can split your function is within a CoroutineScope. By default, the scope will only finish when every coroutine in it has finished. So the previous example would look like:11

suspend fun getFeeds(url: String): List<Feed> {
  return coroutineScope {
    val atomAsync = async {
      fetchFeed(url + "/atom.xml")
    }
    val rssAsync = async {
      fetchFeed(url + "/rss.xml")
    }

    val atom = atomAsync.await()
    if (atom.success) {
      return@coroutineScope parseItems(atom)
    }
    return@coroutineScope parseItems(rssAsync.await())
  }
}

This will wait for rssAsync to finish before coroutineScope returns. Even though we’ve got an early return on a successful fetch of the Atom feed, we’ll still implicitly wait for the RSS feed. If the RSS feed takes ages to respond, our whole function will take ages. This is the price we pay for encapsulation. coroutineScope forces our concurrent code into a diamond pattern, instead of the fork pattern:

Always this:
        B
      /   \
 o - A     D - o
      \   /
        C

Never this:
        - - - - - B - - - - - - ?
      /
 o - A     D - o
      \   /
        C

coroutineScope isn’t something magical, it’s just a function with a block argument12 that exposes the async method and keeps track of anything launched using it. If I find the “wait for everything to finish, even on early return” behaviour to be limiting, I can just write another function that uses the same building blocks to give me that behaviour:

suspend fun <T> coroutineScopeCancelOnReturn(
    block: suspend CoroutineScope.() -> T): T {
  return coroutineScope {
    val result = block.invoke(this)
    currentCoroutineContext().cancelChildren(null)
    return@coroutineScope result
  }
}

As concurrency is tied to a scope, we can use this building block to create our own scopes with different behaviours—mine makes it easier for blocks to cancel outstanding work after an early return, but you could just as easily make a scope that includes a timeout, or limits the number of async calls happening at any one time. Most of the time you should only need the coroutineScope builder function, but there’s nothing stopping you from having a global variable that’s a scope, and having things work more like Go, where any function can start work that outlives the life of the function. It’s easier to spot, however, since you just need to look at the cross-references for the global scope to find who’s using it. In Go you would have to manually inspect every function and understand how it handled concurrency to be sure that nothing was leaking.
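For contrast, here’s a sketch of what a minimal coroutineScope-like building block might look like in Go, hand-rolled with a WaitGroup (Scope and WithScope are invented names, not a standard API):

```go
package main

import (
	"fmt"
	"sync"
)

// Scope tracks every goroutine started through it.
type Scope struct {
	wg sync.WaitGroup
}

// Go launches f and records it in the scope.
func (s *Scope) Go(f func()) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		f()
	}()
}

// WithScope is the moral equivalent of Kotlin's coroutineScope: it
// only returns once every goroutine launched inside the block has
// finished, so nothing can leak past it.
func WithScope(block func(*Scope)) {
	var s Scope
	block(&s)
	s.wg.Wait()
}

func main() {
	results := make(chan string, 2)
	WithScope(func(s *Scope) {
		s.Go(func() { results <- "atom" })
		s.Go(func() { results <- "rss" })
	})
	// both goroutines are guaranteed to have finished here
	close(results)
	for r := range results {
		fmt.Println(r)
	}
}
```

The difference from Kotlin is that nothing in Go stops you from writing a bare go statement next to this and leaking anyway—the structure is opt-in rather than the only way to split a function.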

The usage of scopes to handle concurrency changes how APIs are written. Take a basic HTTP server in Crystal:

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.puts "Hello world!"
end

spawn do
  sleep 5.minutes
  server.close
end

server.bind_tcp "0", 8080
server.listen

After five minutes, what will this do? The documentation for #close says:

This closes the server sockets and stops processing any new requests, even on connections with keep-alive enabled. Currently processing requests are not interrupted but also not waited for. In order to give them some grace period for finishing, the calling context can add a timeout like sleep 10.seconds after #listen returns.

So the fibres spawned by the server (that run the block passed to .new) won’t be cancelled (which makes sense since fibres in Crystal can’t be cancelled) and will be left dangling. If Crystal had scoped coroutines like Kotlin, you could more easily change and reason about the behaviour by passing in a different scope to the server to use for handling requests—currently you have no guarantee that code in the .new block won’t run after .listen returns, or in theory any point after that, since an HTTP connection could take a prolonged time to establish before the handler code is run.

This would support the common use-case of cancelling outstanding requests when the server shuts down, but could easily be changed to add a timeout grace period, or stop the whole server if there is an unhandled exception (instead of printing it and continuing like nothing happened).

This implementation that uses scopes to control concurrency basically allows you to start building towards an Erlang supervisor tree.13

When I was in university I wrote a Slack bot using Elixir. It originally didn’t handle the “someone’s typing” notification from the Slack API, which caused it to crash. The (Elixir) process that ran the bot would crash, and the supervisor would replace it with another identical process. The storage was handled in a separate process, no data was lost and the bot would reconnect after a few seconds. If I had been using almost any other language, the end result probably would have been my whole program crashing, and me having to fix it immediately.

Having language support for cancelling pieces of work is also useful in a lot of other contexts. POSIX processes can be interrupted with a SIGINT, which typically triggers some kind of callback in the language, and that callback needs to communicate to anything currently running that it should stop. Cancellation being a first-class citizen could allow for better default behaviour when a program is told to stop. The same concept could apply to applications in resource-constrained environments (ie: phone OSes) so that they can respond effectively to being stopped due to lack of resources.

Concurrent Data

Once you’ve got the lifetime of your concurrency sorted, you need to work out the lifetime of and access to your data. Rust does this with lifetime annotations and more static analysis than you can shake a stick at; Pony has six reference capabilities that define how a variable can be used in which contexts. Erlang and Elixir just have fully immutable data structures, so you can’t mutate something you shouldn’t—though you can still keep “mutable” data in a stateful process, and introduce a race condition when multiple processes send messages to that process.

When I’m writing stuff in my free time I usually have a fairly cavalier attitude to thread safety. Crystal doesn’t have many guarantees for this, and since it’s currently single-threaded, most of the time it works fine. I’ll write some dirty code that spawns a new fibre that does some work and appends the result to a list. That’s always atomic—right?
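In a genuinely parallel runtime it isn’t: two threads can read the old state of the list at the same time and lose one of the appends. A Go sketch of the safe version, guarding the shared list with a mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// appendConcurrently has count goroutines append to a shared slice.
// Without the mutex this is a data race: two goroutines can read the
// same slice header at once and overwrite each other's append.
func appendConcurrently(count int) []int {
	var (
		mu      sync.Mutex
		results []int
		wg      sync.WaitGroup
	)
	for i := 0; i < count; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mu.Lock()
			results = append(results, n)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(len(appendConcurrently(100))) // always 100 with the mutex
}
```

Delete the Lock/Unlock pair and `go run -race` will flag it immediately—which is roughly the guarantee a single-threaded scheduler was silently giving for free.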

I haven’t written enough Rust to appreciate what it’s like working with the borrow checker and lifetime annotations. From what I’ve read (a recent example) the borrow checker is frustrating, to say the least.

What I’d like is—somehow—for concurrent data access to be verified as easily as types are checked in Crystal. I get most of the benefits of static typing and dynamic typing by using Crystal’s type inference, can the lifetimes of variables be inferred in a similar way? I think this would be a very hard problem, and probably only practical if the general population of developers was already used to adding lifetime annotations—like they are with types—so you could just require fewer of them.

For me, the best concurrency system would be one that doesn’t require any tagging of functions, to avoid having to think about function colouring and the syntactic cost of annotating every call site, and a well-defined structured concurrency API that is used throughout the standard library and third party libraries, to give guarantees about the lifetime of concurrent work. This would need to have affordances to handle pending concurrent work as values (like Swift’s async let or Kotlin’s Deferred<>), and enough tools in the standard library to make it easy to handle these values. I don’t have particularly strong opinions about actors, lifetimes, or reference capabilities14 as I’ve not used them much to write any real-world programs.

If you liked this and want to read something by someone who knows what they’re talking about, I would recommend reading Notes on structured concurrency, or: Go statement considered harmful. Reading this was definitely the “ah-ha” moment where I was convinced that just tacking a spawn function in your language wasn’t good enough.

  1. Yeah yeah, I know it’s not actually at the same time, see my note right at the top. But you know what I mean, otherwise you wouldn’t have read the footnote. If you’re the type of person to correct a concurrency-versus-parallelism mistake, you’re also the kind of person that will read a footnote to be absolutely accurate in your correction.  2

  2. Credit to @acb for pointing this out. 

  3. Well maybe there is, I’m not a Swift expert. But we’re talking abstractly about syntax here, just roll with it. 

  4. This just means they don’t have a real name, and are typically defined inline where they get passed to a function. 

  5. Part of the joy of reading my blog is getting confused as I change language in the middle of a series of examples. This next one is in Swift, since Java doesn’t have async/await yet, and Kotlin’s implementation is less clear about await-ing things. 

  6. As long as a function yields, see co-operative versus pre-emptive note above. 

  7. Appreciate my effort- and bandwidth- saving ASCII diagram. 

  8. Maybe Go has some library for keeping track of your goroutines, but my basic point is this is not the default and not what I see people doing. 

  9. They basically implement what the previously mentioned blog post describes. 

  10. Yes I know my Kotlin function could be more idiomatic and shorter, but then everyone would be getting confused about Kotlin’s weird syntax, instead of getting confused at concurrency. 

  11. Ok Kotlin’s blocks are kinda magic. 

  12. Ignoring the fact you don’t have memory isolation for each process so you’ll never fully get there. 

  13. Perhaps that’s part 2? Subscribe to the RSS feed for more! 


Improvements for Initialising Pod Projects

One of the major usability misses with pod was that it was tricky to set up a new project. My goal was to remove the need for language-specific development tools installed directly onto my computer, but whenever I started a new project with pod, I would need to run crystal init to create the basic project skeleton. With the new pod init command, this is now unnecessary.

To create a new project that wasn’t Crystal (like when I was messing around with Swift websockets) I would manually run a shell in a container using the image for the language and bind mount my working directory. I’d then use the package manager within the container to set up a project (eg: running swift package init) and then copy-paste some containerfiles from a previous project. This was incredibly fiddly and tedious, so I added functionality to pod that does this automatically.
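That manual workflow looks roughly like this (a sketch assuming podman as the container runtime; the image and init tool are examples, not what pod itself runs):

```shell
# A sketch of the manual project setup that gets tedious fast.
# Assumes podman; the image and the init tool are just examples.
dev_shell() {
  # Interactive shell in a throwaway container, with the working
  # directory bind-mounted so generated files land on the host.
  podman run --rm -it -v "$PWD":/src -w /src "$1" sh
}

# Usage: dev_shell docker.io/library/swift:latest
# ...then inside the container: swift package init --type executable
# ...then exit, and copy containerfiles over from a previous project.
```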

Now when you run pod init, it asks for a base image to use—I use the latest Crystal Alpine image—and runs a container using that image with the working directory already available as a bind mount. Using the shell in that container you can run whatever tools are needed to set up the files for your project (npm init, crystal init, cargo init, etc). When you exit that shell, pod will create containerfiles and a pods.yaml file for the project, so in most cases you can just build with pod build and then pod run without any further changes.

Another thing that is more difficult in a container-only world is running REPLs inside the project. I don’t do this often—since the Crystal interpreter isn’t shipping in the main release yet—but I really enjoyed this way of working when I was using Elixir or Ruby more. Running an iex shell where I could recompile and interactively test my code was probably the most pleasant development experience I’ve ever had, and I wanted to support that with pod.

This is now possible with pod enter. By default you can run a shell using any of the images in your pods.yaml file, or you can configure entrypoints and jump straight into a REPL by running a particular command. So for example this:

entrypoints:
  iex:
    image: my-elixir-project:dev-latest
    shell: iex -S mix

Will allow me to do this:

$ pod enter iex
Erlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Interactive Elixir (1.15.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

This bind-mounts the working directory in, so your code is available to any tools that run in the entrypoint. If you’ve got something more complicated that requires more customisation of the container (like exposing ports or binding additional directories) you can always make a custom run target that spawns an interactive shell.

You can imagine that if you were working on a Ruby on Rails project, you might set up something like this:

entrypoints:
  console:
    image: my-rails-project:dev-latest
    shell: bin/rails console

I’ve enjoyed working in a container-first and now largely container-only way, and improving pod is what has made this possible for me to do. You can check it out here, specifically the documentation for getting started.


The Best Reading App

Since the start of this year—for some reason, I can’t put my finger on what—I’ve been reading far more RSS feeds and articles that I’ve come across. I’ve sporadically used RSS in the past, but never really got into a groove with it. Currently I’m using NetNewsWire, which is good but doesn’t quite match the experience that I want, and so I’m writing this to manifest into existence the perfect app.

I’m an absolute fiend for a reverse-chronological list of items where my position is perfectly preserved. Tweetbot and Ivory are absolutely perfect for this; I can open the app, scroll a little bit, and then leave and come back later. It’s been part of my daily routine to scroll through the tech news and gossip each morning as I start my day.

Sadly no RSS reader seems to have quite the interface I want. So I’m going to describe it in enough detail that someone can find one for me, or some enterprising developer can implement it.

The main interface would of course be a reverse-chronological list of posts (oldest at the bottom), with the key feature that your scrolling position would remain where you left it. New posts would be loaded “above” your scroll position, so you would just continue to scroll up as you read through the feed. The feed should make use of article and feed images to present a visually engaging view, rather than a simple list of titles.

It’s not a use case that I particularly care for, but if I were making this app it’s something I’d be sure to handle: viewing a single feed. This doesn’t really mesh too well with the main list of posts, but it wouldn’t be an insurmountable UI challenge. What I would probably do is allow viewing a single feed (or group of feeds) like you’d open a user profile on a social media app, except that you’d be put into another reverse-chronological feed in the same position as the main feed—but just for posts from that publication. You could then scroll through the single feed, and once you reached the top there would be an option to clear those posts from the main feed. That way you could swap back to the main feed and continue reading without repeating posts. This would be useful when a single feed has dumped a big collection of posts and you just want to see if there’s anything interesting, then get the rest out of the way.

The second most important feature would be a built-in read-later service. I switched from Instapaper to GoodLinks and am very happy with it so far, but I would be a lot happier if it were built right into my feed reader. I’ll often come across an interesting post, but won’t have time or be in the mood for reading a longer or more technical post. Ideally in this case I could just mark it for reading later, without having to share the post to a different app (even if that other app is very good). This would unlock the ability to read half a post, realise you’ve run out of time, and then just close the article and have it automatically saved for later—with your position already saved.1

Automatically saving posts would definitely be a UX challenge. You don’t want to flag every single post that gets opened as “read later”, but you also don’t want to have the interaction be unreliable. I would probably lean towards just having a very convenient “close and keep for later” button that is just as easily accessible as swiping back to exit the article.

The next UI challenge would be presenting the main feed and the read-later feed in such a way that neither appears to be playing second fiddle to the other, while also making it easy to swap between them. Perhaps you’d automatically switch between the two depending on whether there were new posts? Or maybe that would just end up being annoying.

The app would need all the features of a good read-later app: saving links from other apps, presenting web pages in a friendlier reader view, saving reading progress, and saving pages for offline reading.

A problem that I would like solved—but I’m not sure if link-sharing APIs allow for this—is knowing where the link was shared from. I find myself getting to the bottom of a post that I’ve saved from somewhere and thinking “oh whoever shared this obviously has excellent taste, I should see what else they do” but have no good way to find where I got it from. Alternatively I think of someone that I should share it with, only to find out that they were the person that sent it to me in the first place.

I have considered creating a second Mastodon account and just subscribing to the feeds of websites I follow (or using RSS-to-activitypub translators), and adding this account to Ivory. What stops me from doing this is that it would only get me halfway there—no read-later integration—and Ivory would be doing double duty, meaning I’d have to switch accounts constantly.


If you’re looking for some fresh feeds, I quite like grumpy.website (examples of frustrating UI design), and Pixel Envy (links and commentary on technology with a focus on privacy and open design, which is my jam).

  1. It’s not too uncommon for me to save stuff straight to GoodLinks if I think I might not read it in full immediately, so I don’t have to find where I got up to later. 


The Five Stages of Swift on Linux

Recently I attempted to learn about Swift’s async support by doing my favourite thing—writing an RPC framework. In this case the “RPC framework” is just a request/response abstraction over websockets (which are message-based), which makes the actual RPC bit very simple, as all it’s really doing is wrapping some objects and matching responses to requests.

In doing this, I think I went through all five stages of grief1, which often happens when I try and use Swift on Linux—despite my previous excitement about it.

Denial

So first of all I found the documentation for URLSession#webSocketTask(with:). At first glance the API seemed pretty reasonable. I had a quick read over some blog posts and ended up with some code to test out:

let task = URLSession.shared.webSocketTask(
  with: URL(string: "ws://brett:9080")!)
try! await task.send(.string("test message"))

Don’t use this code as an example. It doesn’t work. That’s the whole point—keep reading.

This seems pretty easy, I create a websocket task and then send a message using it. The message should be received by a simple Crystal HTTP::WebSocketHandler and logged, so I know when it’s working.

I run the program, and it just hangs. No error, no timeout (at least not one that I was patient enough to wait for). Now there isn’t anything that I can see from the documentation that I’m missing (mostly because there is no documentation for send(_:)).

Eventually I look back over the blog posts and see that you need to call resume() on the URLSessionWebSocketTask for it to do anything.
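For completeness, the snippet with the missing call added looks like this (still not working code on Linux, as the next section shows):

```swift
let task = URLSession.shared.webSocketTask(
  with: URL(string: "ws://brett:9080")!)
task.resume()  // tasks start suspended; without this, send(_:) just hangs
try! await task.send(.string("test message"))
```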

Anger

This is very frustrating. If I were writing the documentation for this class, I would make sure that the requirement to call resume() was the first thing anyone saw when looking at the docs. Currently you have to go to the URLSessionTask superclass and find the resume() method docs which state:

Newly-initialized tasks begin in a suspended state, so you need to call this method to start the task.

A friendly API would raise an exception if you tried to use it before it was ready—failing fast is going to reveal your problem more readily than silently doing the wrong thing. However, I don’t know enough about the wider URLSession API to know whether there’s a design tradeoff here that makes failing fast impractical.

Ok so I’ve wasted a bunch of time trying to work out what’s wrong all because my task was suspended. Never mind, at least I know what the problem is now. I add the resume() call and now I get:

Fatal error: 'try!' expression unexpectedly raised an error: Error Domain=NSURLErrorDomain Code=-1002 "(null)"
Current stack trace:
0    libswiftCore.so                    0x00007fecbfa6eb80 _swift_stdlib_reportFatalErrorInFile + 112
1    libswiftCore.so                    0x00007fecbf76043f <unavailable> + 1442879
2    libswiftCore.so                    0x00007fecbf760257 <unavailable> + 1442391

Hmm, an NSURLErrorDomain problem. A -1002 problem, to be precise. This is my first rodeo in Swift-networking-land so I don’t know what a -1002 means off the top of my head. Eventually I find some info that points me to this list of all the error codes. Hilariously it doesn’t include the code in the list—just the name—so you have to open each case one-by-one until you find the one that matches your error code. The fourth from last one turned out to be my error: NSURLErrorUnsupportedURL.
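Incidentally, Foundation can do the number-to-name mapping for you, which beats clicking through each case in the docs (a small aside; on Linux URLError lives in FoundationNetworking):

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLError lives here on Linux
#endif

// Map the raw code back to its named constant instead of
// opening every case in the documentation one-by-one:
let code = URLError.Code(rawValue: -1002)
print(code == .unsupportedURL)  // true
```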

Immediately I start thinking of all the possible ways that a URL could be considered unsupported. Maybe the ws:// scheme should be wss://? Maybe it won’t handle hostnames and needs an IP address? Perhaps I’ve messed something up in my container2 and it’s counting a closed port as an unsupported URL? (A bizarre thing to do, but at this stage all bets were off.)

Bargaining

So maybe URLSessionWebSocketTask is a lost cause, but SwiftNIO is always an option. I won’t go into this too much, but basically I stumbled at the first hurdle when I followed this post to add SwiftNIO as a dependency. I don’t really understand all the moving pieces here but basically:

dependencies: [
  .package(url: "https://github.com/apple/swift-nio", from: "2.58.0"),
],
.executableTarget(
  name: "WebSocketRPC",
  // bad, doesn't work
  dependencies: ["SwiftNIO"],
  // also no good
  dependencies: ["NIOWebSocket"],
  // perfect and excellent
  dependencies: [.product(name: "NIOWebSocket", package: "swift-nio")]
)

Why do I need a .product instead of just a string? No idea, and I couldn’t find this mentioned anywhere in the SPM documentation. I happened to stumble across an NIO example project and looked at the Package.swift file to find this.3

However after learning more about the SwiftNIO websocket implementation, it seems that I would need to handle much more of the underlying protocol and HTTP-to-websocket upgrade than I had expected. The example websocket client has over 200 lines to do the same thing I was hoping to accomplish in two.

Depression

Maybe websockets aren’t that cool anyway, what if I just use plain old HTTP? Maybe this will help me understand whatever I’m doing wrong with the websocket API. While I’m at it, why don’t I translate the callback-based API into an async one—that was the original purpose of this exercise in the first place, right?

func download(url: URL) async throws -> String {
  // Bridge URLSession's callback-based dataTask API into async/await.
  return try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
    let task = URLSession.shared.dataTask(with: url) { (data, response, error) in
      if let error = error {
        continuation.resume(throwing: error)
      } else if let data = data {
        continuation.resume(returning: String(data: data, encoding: .utf8)!)
      } else {
        fatalError("impossible?")
      }
    }
    // Remember: tasks start suspended, nothing happens without this.
    task.resume()
  }
}

print(try! await download(url: URL(string: "https://willhbr.net")!))

And that just works first time? That’s definitely weird. Was this an excuse to include some tidy callback-to-async code? Maybe.

Acceptance

At this point my curiosity got the better of me—would it work on MacOS? Maybe I would get a better error and suddenly understand what was going wrong?

After a bit of an adventure with xcrun (it turns out you can’t use the Swift compiler that’s installed with the Xcode Command Line Tools), I installed Xcode and ran the exact code I had been trying on Linux for hours.

And it worked first time without any issues. The most frustrating result.

Eventually I found this GitHub issue linked from a project’s README:

fatalError when trying to send a message using URLSessionWebSocketTask

That code runs perfectly fine under macOS (using Swift 5.7), but as soon as it’s run on Linux I get the error from above.

A few people chime in saying they see the same issue, and then this comment points to this page of the libcurl documentation:

WebSocket is an EXPERIMENTAL feature present in libcurl 7.86.0 and later. Since it is experimental, you need to explicitly enable it in the build for it to be present and available.

So if your underlying library doesn’t support websockets, it makes sense that a websocket URL is unsupported.


I don’t have much of a conclusion here, apart from the fact that this was a very frustrating journey. I’m sad to see that almost eight years after being open-sourced and supporting Linux, Swift is still full of subtle traps that are hard to debug. Hopefully the Swift Server Working Group is aware of these issues and continues to make improvements—a simple @available annotation would have saved a lot of time.

  1. Yeah I know the titles don’t really match the content, I just did this for a funny title, alright? 

  2. Of course I’m running this in a container. 

  3. While writing this I did end up finding that towards the bottom of the readme for SwiftNIO there is a “Getting Started” section that has the correct incantations. Only after you’ve read past the conceptual overview, repository organisation, and versioning scheme, however. 


Warp Terminal

Yesterday I came across Warp Terminal via their advertisement on Daring Fireball.1 Immediately I was fascinated to know what their backwards-compatibility story was, and how their features were implemented. This is in a similar vein to the difficulties of modernising shells, that I wrote about in more detail last month.

If you’re not sure of the difference between a terminal and a shell, The TTY demystified is a really good read to understand the history and responsibilities of both. Basically the terminal emulator pretends to be a computer from 1978, and the shell runs inside of that.

I only spent about half an hour playing around with Warp, so my impressions are not particularly well-informed, and it’s still in beta, so many of these issues could be on a roadmap to be fixed. I didn’t look at any of the AI or collaboration features; I’m only interested in the terminal emulation and shell integration.

What sets Warp apart from other terminal emulators is that it hooks into the shell and provides a graphical text editor for the prompt, rather than using the TTY. For normal humans, who are used to the standard OS keyboard shortcuts and being able to select and copy text in a predictable way, this is an excellent feature. The output from each command you run lives in a block; blocks stack up and scroll off the screen. In the prompt editor, autocomplete and other suggestions are native UI, not part of the TTY: they can be clicked, support non-monospaced fonts, and benefit from many other UI innovations of the last 40 years.

In their blog post “How Warp Works” there is a brief explanation of how they integrate with the shell.2 Basically they use callbacks within popular shells (ZSH, Bash, and Fish) to know when a command starts. If my interpretation of this is correct, they do away with the shell prompt entirely: the user writes their command in Warp’s non-shell editor, the whole finished command is passed to the shell, and hooks in the shell tell Warp when to cut off the output and create a new block.
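The underlying mechanism here is ordinary shell hooks. A sketch of the bash flavour (the escape codes are illustrative FinalTerm/iTerm2-style marks, not Warp’s actual protocol; ZSH uses precmd/preexec functions instead):

```shell
# Emit a terminal "mark" so the emulator can find command boundaries.
mark() { printf '\033]133;%s\007' "$1"; }

# Runs before each prompt is drawn: the previous command's output ended.
PROMPT_COMMAND='mark D'
# Runs before each command executes: a new block of output is starting.
trap 'mark C' DEBUG 2>/dev/null || true
```

The terminal watches for these marks in the output stream and uses them to decide where one block ends and the next begins.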

What this means is that Warp has some significant limitations on what it can “warpify”. Only input to the shell prompt gets the magic editor experience; if you run another interactive program (like irb) then you’re back to inputting text like it’s the ’70s. You can tell Warp to inject some code into certain commands, but this will only work in the aforementioned shells. If the command doesn’t understand POSIX shell syntax with the functions that Warp expects, it won’t work.

So by default, if you start your login shell and then run bash to start a sub-shell, that sub-shell will miss out on the Warp features. I’m aware that this argument is entirely a “perfect solution” fallacy but hey, someone’s got to advocate for a perfect solution.

What is nice is that if you run a command that uses the “full screen” TTY, it will just work—the block takes up the whole screen while the command is running. You can still run vim and tmux, so if this takes over I’ll still be able to get things done.

The prompt editor is definitely good if you’re not used to working with a traditional shell, but since I’m used to having Vim mode in ZSH, going back to a normal editor feels broken. Also, since the editor is split out from the shell, autocompletions are a separate system. I have a few custom autocompletes set up in ZSH, and not being able to access those in the editor was frustrating. I’d type gcd <TAB>, expecting to see a list of my projects, but instead just get a list of the files in the current directory. I assume there’s some way of piping this information into Warp, but it’s a shame they don’t (yet?) have integration to pull this straight from ZSH.

The autocompletes that I did get were mostly good—files or arguments from my shell history—but I did get a few weird suggestions. I tried ssh and was suggested a bunch of hosts with names that were some base64-encoded junk. None of these appeared in my shell history or SSH config files.

I said I wasn’t going to look at any of the AI features, but then I connected to my server to see how the dialog command worked. The answer was that it wasn’t installed. Warp then said “✨ Insert suggested command: dig 13:02:20”. I don’t know how it made the leap in logic from “command not found” to “do a DNS lookup”, or why it wanted to suggest passing the current time to the DNS lookup—it was 1:02PM UTC when that suggestion popped up.

Warp is another example of how hard it is to modernise things that directly interact with the underlying OS concepts. Perhaps Warp can partner with the nushell developers and reinvent the shell and terminal at the same time.

In the end I’m obviously not going to move away from using iTerm. Warp is solving a bunch of problems that I don’t have, and adding a whole suite of AI features that I have no interest in. If you are a fairly light terminal user, and get frustrated at editing commands in the traditional shell prompt, then maybe Warp is for you. Use my referral code so I can get a free t-shirt.

You get like 80% of the benefit of using Warp’s fancy editor by knowing that in the MacOS terminal, option-click will move the cursor around by sending the appropriate arrow keys to the shell.

  1. Who needs ad personalisation when you can just go directly to your target market? 

  2. If you skip over all the bits about how they render Rust on the GPU and stuff.