{ "version": "https://jsonfeed.org/version/1.1", "title": "Will Richardson", "home_page_url": "https://willhbr.net/", "feed_url": "https://willhbr.net/feed.json", "description": "The blog of Will Richardson, a software engineer and photographer from New Zealand.", "icon": "https://willhbr.net/apple-touch-icon.png", "favicon": "https://willhbr.net/favicon.ico", "expired": false, "language": "en", "items": [{ "id": "https://willhbr.net/2024/03/16/further-adventures-in-tmux-code-evaluation/", "title": "Further Adventures in tmux Code Evaluation","content_html": "
In my previous post I described how I wrote a compiler that turns Python code into a tmux config file, which makes tmux evaluate the program by performing actions while switching between windows. My implementation relies on a feature in tmux called “hooks”, which run a command whenever a certain action happens in tmux. The action that I was using was a pane receiving focus. This worked great, except I had to do some trickery to avoid tmux’s cycle detection in hooks—it won’t run a hook on an action that is triggered by a hook, which is a sensible thing to do.
I don’t want things to be sensible, and I managed to work around this by running every tmux action as a shell command using the tmux run command. I’ve now worked out an even sillier way that this could work by using two tmux sessions¹, each attached back to the other, then using bind-key and send-keys to trigger actions.
You start a tmux session with two windows. The first window just runs any command, a shell or whatever. The second window runs a second instance of tmux (you’d have to unset $TMUX for this to work). That second instance of tmux is attached to a second session, also with two windows. The first window also just runs any command, and the second window attaches back to the original session. Here’s a diagram to make this a bit clearer:
Session A (blue) has two windows: the first, A:1, is just running a shell; the second, A:2, is attached to session B (red), which is showing the first window in session B, B:1. Session B also has two windows: the second (B:2) is attached to session A, and is showing window A:1 from session A.
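Here’s a minimal sketch of how you could wire this up. The session names A and B come from the diagram; everything else is my own assumption about the setup, not commands from the post:

tmux new-session -d -s A    # A:1 runs a shell
tmux new-session -d -s B    # B:1 runs a shell
# Each session gets a second window holding a client attached to the
# other session; clearing $TMUX bypasses the nested-session check.
tmux new-window -t A: 'TMUX= tmux attach -t B'   # A:2 shows session B
tmux new-window -t B: 'TMUX= tmux attach -t A'   # B:2 shows session A
tmux attach -t A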
What this cursed setup allows us to do is use send-keys to trigger keybindings that are interpreted by tmux itself, rather than the program running inside tmux—because tmux is the program running inside tmux.
If you have a tmux pane that’s running a program like Vim and you run send-keys a, the character “a” will be typed into Vim. The key is not interpreted at all by the surrounding tmux pane; even if you send a key sequence that would normally do something in tmux, it goes directly to the program in the pane. For example, if your prefix key is C-z, then send-keys C-z c will not create a new window; it’ll probably suspend the running program and type a literal character “c”.
However, if the program that’s running in tmux is tmux, then the inner tmux instance will interpret the keys just like any other program.
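For example, with the setup above, sending the prefix sequence to the pane that holds the inner client should create a window in the inner session. The target here is my own, reusing the names from the diagram:

# Sent to A:2, where the client attached to session B lives, so the
# inner tmux interprets C-z c and creates a window in session B.
tmux send-keys -t A:2 C-z c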
So if we go back to our diagram, session A uses send-keys to trigger an action in session B. Session B can use send-keys to trigger an action in session A, by virtue of it also having a client attached to session A in one of its panes. The program would be evaluated by each session responding to a key binding, doing an action, and then sending a key binding to the other session to trigger the next instruction. For example, using some of the tricks I described in my previous post:
bind-key -n g {
  set-buffer "1"
  send-keys -t :=2 q
}

bind-key -n q {
  set-buffer "2"
  send-keys -t :=2 w
}

bind-key -n w {
  run 'tmux rename-window "#{buffer_sample}"'
  run 'tmux delete-buffer'
  run 'tmux rename-window "#{e|+:#{buffer_sample},#{window_name}}"'
  run 'tmux delete-buffer'
  run 'tmux set-buffer "#{window_name}"'
  send-keys -t :=2 e
}

# ... program continues with successive bindings
The program starts with the user pressing “g” in session A, which pushes a value onto the stack and sends the key “q” to the second window, which triggers the next action in session B. That next action pushes another value and sends “w” to the second window in session B, which triggers an action back in session A. This action does some juggling of the buffer stack and adds the two values together, putting the result on the stack. It then sends “e” to the second window in session A, triggering whatever the next action would be in session B.
This should also allow the compiler to get rid of the global-expansion trick. In the last post I wrote:

  Wrapping everything in a call to run gives us another feature: global variable expansion. Only certain arguments to tmux commands have variable expansion on them, but the whole string passed to run is expanded, which means we can use variables anywhere in any tmux command.
Since we’re no longer using windows as instructions, it’s much easier to use them as variable storage. This should remove the need for storing variables as custom options, and using buffers as a stack.
The stack would just be a separate, specifically-named session where each window contains a value on the stack. To add a value, you write the desired contents to that pane using either paste-buffer to dump from a buffer, or send-keys to dump a literal value. You can get that value back with capture-pane and put it into a specific buffer with the -b flag.
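As a rough sketch (session name and values are mine, not from the post), pushing and popping against a dedicated “stack” session could look like:

tmux new-session -d -s stack       # the variable-storage session
tmux new-window -t stack:          # push: one window per value
tmux send-keys -t stack: -l '42'   # write a literal value into the pane
tmux capture-pane -t stack: -b top # pop: grab the contents into buffer 'top'
tmux kill-window -t stack: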
Options can be set to expand formats with the -F flag, so you can put the contents of a window-based variable into a custom option with a command like set -F @my_option '#{buffer_sample}'. This would allow for some more juggling without having to use the window and session name, like I did before.
Ideally you would have a different variable-storage session for each stack frame, and somehow read values from it corresponding to the active function call. This might not be possible without global expansion of the command, but if you allowed that then you’d avoid the problems that my current implementation has with having a single global set of variables.
The astute among you might be thinking “wait Will, what happens when you want to have more than 26 or 52 actions, you’ll run out of letters!” Well, tmux has a feature called “key tables” which allow for swapping the set of active key bindings, so all you need to do is have each letter swap to a unique key table, and then the next letter actually does an action, which gives you enough space for 2,704 actions, if you only use upper and lower-case letters. But you can have as many key tables as you want, so you can just keep increasing the length of the sequence of keys required to trigger an action, allowing for more and more actions for larger programs.
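A sketch of the chaining, with table and key names of my own: the first letter only switches tables, and the second letter, resolved in that table, runs the real action:

bind-key -n a switch-client -T table_a   # first letter: pick a key table
bind-key -T table_a b {                  # second letter: do the action
  set-buffer "1"
  send-keys -t :=2 q
}

After a key is resolved in table_a the client drops back to the root table, ready for the next two-letter sequence.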
I don’t think I’ve really worked around the “no global expansion” limitation that I imposed, but I think this shows there are enough different avenues to solve this that you can probably assemble something without the trade-offs that I made originally.

1. Actually you can probably do this with one session connected back to itself, but I only realised this after I’d written up my explanation of how this would work. ↩

You can use features of tmux to implement a Turing-complete instruction set, allowing you to compile code that runs in tmux by moving windows.

I feel like I really have to emphasise this: I’m not running a command-line program in tmux, or using tmux to launch a program. I can get tmux to run real code by switching between windows.

Watch a demo of it in action below or on YouTube:
This whole mess started when I solved an issue I had with a helper script using the tmux wait-for command. I thought to myself “wow tmux has a lot of weird features, it seems like you could run a program in it”. This idea completely took over my brain and I couldn’t think of anything else. I had to know if it was possible.

I spent a week writing a compiler that turns Python(ish) code into a tmux config file, which, when loaded, makes tmux swap between windows super fast and run that code.
If you just want to run your own code in tmux, you can grab the compiler from GitHub or see it in action in this video.

I’m not really a byte-code kinda guy. I’ve tinkered around with plenty of interpreters before, but those were tree-walk interpreters, or they compiled to another high-level language. I haven’t spent much time thinking about byte code instructions and how VMs actually get implemented since my second year of university where we had to implement a simple language that compiled to the JVM. I do own a physical copy of the delightful Crafting Interpreters by Robert Nystrom, which I assume counts for something.

One thing I’m pretty sure I need is a stack. The easiest way to evaluate an arbitrarily-nested expression is to have each operation take the top N items from the stack, process them, and put the result on the top of the stack. The next operation takes another N items, and so on.

At every stage of this project I could think of a solid handful of different tmux features that could be used (or abused) to implement the functionality. For the stack the easiest option was to use buffers.
Buffers are supposed to be used for things like copy-pasting, but the buffer commands have some neat side-effects. If you call set-buffer 'some value' with no buffer name, you get a buffer named bufferN with “some value” in it. Every time you call set-buffer it gets added to the top of the list of buffers. Every time you call delete-buffer (without specifying a buffer name) it’ll delete the topmost buffer from the list. And just to make this even more convenient, there’s a string expansion #{buffer_sample} that will give you the contents of the topmost buffer. We’ve got the perfect feature for implementing a stack.
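To make the stack behaviour concrete, here’s a small sequence you could try at the tmux command prompt (the values are mine):

set-buffer 'first'           # stack: [first]
set-buffer 'second'          # stack: [second, first]
display '#{buffer_sample}'   # shows 'second', the topmost buffer
delete-buffer                # pops 'second'; 'first' is on top again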
Ok, string expansions. Most tmux commands allow for expanding variables so you can inject information about the current pane, window, session, etc into your command. For example to rename a window to the path of the current working directory, you can do:
rename-window '#{pane_current_path}'
These expansions are documented in the “formats” section of the tmux manual. The most obvious use of these is to define the format of your status line. For example the left hand side of my status line looks like:
set -g status-left '#[bold] #{session_name} #[nobold]│ #{host} │ %H:%M '
#{session_name} and #{host} are replaced with the name of the current session, and the hostname of the machine that tmux is currently running on.
If you read the manual in a little more detail, you’ll notice that you can actually do a little more than just inserting the value of a variable. There is a conditional operator, which can check the value of a variable and output one of two different options. I use this to show a “+” next to windows that are zoomed:
set window-status-format ' #I#{?#{window_zoomed_flag},+, }│ #W '
#{window_zoomed_flag} is 1 if the current window is zoomed, so the window gets a + next to the index. If the window is not zoomed, then it gets an empty space next to the index.
There are also operators for arithmetic operations, so #{e|*:7,6} will expand to 42, and #{e|<:1,5} expands to 1 (tmux uses 1 and 0 for true/false).
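These compose, so an arithmetic result can feed the conditional operator. A small sketch of my own:

# Expands to 'small' in windows 1-4 and 'big' from window 5 onwards.
display '#{?#{e|<:#{window_index},5},small,big}'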
Now of course you could just make a huge variable expansion and use that to make a computation, but that is quite limited. You can’t make a loop or have any action that has a side-effect.
The feature that really gets things going is hooks. You can run a tmux command whenever a certain event happens. For example, if you want to split your window every time the window got renamed:

set-hook window-renamed split-window
Now whenever you rename a window, it gains a split! Splendid. I never really found a legitimate use for hooks, otherwise I’d give you a less contrived example.
I did of course find a completely illegitimate use for hooks. There’s a hook called pane-focus-in that is triggered whenever a client switches to that pane. This is the key feature that makes the compiler work. You can set the hook to run multiple commands, so we can say “when you focus on this window, do X, then look at the next window”. Something like:
set-hook pane-focus-in {
  set-buffer 'some value'
  next-window
}
Now this doesn’t actually work for what I want, as tmux is too smart and won’t trigger the pane-focus-in event on the next window, since it wants to avoid accidentally creating cycles in window navigation. This is annoying if you are trying to intentionally create cycles in your window navigation.
However, if you instead wrap the commands in a shell call, that check gets skipped:
set-hook pane-focus-in {
  run "tmux set-buffer 'some value'"
  run 'tmux next-window'
}
Some might say that this is cheating, but the shell is just being used to forward the command back to tmux—I’m not using any features of the shell here.
Wrapping everything in a call to run gives us another feature: global variable expansion. Only certain arguments to tmux commands have variable expansion on them, but the whole string passed to run is expanded, which means we can use variables anywhere in any tmux command. For example:
This will add a buffer containing the literal string '#{session_name}':

set-buffer '#{session_name}'
But this will add a buffer containing whatever the current session name is:
run "tmux set-buffer '#{session_name}'"
The last ingredient we need is some way to store variables. I had considered storing these as window names, but setting and retrieving these would have been a huge pain, even if it was technically possible. I ended up going with the low-effort solution. You can set custom options in tmux as long as they’re prefixed with @. This has the limitation that you’ve got a single set of global variables¹, but it’ll do.

set @some-option "some value"
display "option is: #{@some-option}"
So what does it look like to actually do something? When we run the expression 1 + 2, the result should be stored in the top of the stack.
First we add our two operands to the stack using set-buffer. We could inline them, but I’m going for brute-force predictability here, with absolutely no regard for optimisation.
new-window
set-hook pane-focus-in {
  run "tmux set-buffer '1'"
  run 'tmux next-window'
}

new-window
set-hook pane-focus-in {
  run "tmux set-buffer '2'"
  run 'tmux next-window'
}
The next bit is a little tricky: we need access to two values from the stack to do the addition operation, but we can only access the top using #{buffer_sample}. We can work around this by using the window name as a temporary storage space. We’re not using the window name for anything else, and it only needs to stay there for two instructions.
We rename the next window to be the top of the stack, and delete the top item from the stack. We need to keep track of window indexes for this trick (:=4 targets window number 4), which will also be needed when we implement conditionals and goto.
new-window
set-hook pane-focus-in {
  run 'tmux rename-window -t :=4 "#{buffer_sample}"'
  run 'tmux delete-buffer'
  run 'tmux next-window'
}
We’ve got our two values accessible now—one in buffer_sample and one in window_name. So now we can finally add them together:
new-window
set-hook pane-focus-in {
  run 'tmux rename-window -t :=4 "#{e|+:#{buffer_sample},#{window_name}}"'
  run 'tmux delete-buffer'
  run 'tmux set-buffer "#{window_name}"'
  run 'tmux next-window'
}
We rename the current window to be #{e|+:#{buffer_sample},#{window_name}}, which adds the two numbers together, replacing our window name scratch space. Next we delete the top of the stack (the topmost buffer) since we’ve consumed that value now, and put the result of the operation onto the top of the stack. Finally we advance to the next instruction.
This is the basis of all the operations needed to implement a simple Python-like language. To implement conditionals we just use a conditional expansion to determine which window to change to, instead of always using next-window:
new-window
set-hook pane-focus-in {
  run 'tmux select-window -t "#{?#{buffer_sample},:=6,:=9}"'
  run 'tmux delete-buffer'
}
If buffer_sample is 1 (or any other non-empty and non-zero value) we go to window 6; if it’s 0 or empty, then we go to window 9. Loops are implemented in a similar way, just with an unconditional jump to a window before the current one.
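So the back-edge of a loop would just be an instruction window like this (window index mine for illustration):

# Unconditionally jump back to the window holding the start of the loop.
new-window
set-hook pane-focus-in {
  run 'tmux select-window -t :=3'
}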
The biggest challenge when I implemented the compiler for Shortcuts was the fact that Shortcuts doesn’t really have support for functions. I could have just dumped all the functions into a single tmux session, and jumped around to different window indices when calling different functions. But that seemed too easy.
Instead I made each function its own session, and used switch-client to swap the current client over to the other session. This gets difficult when you want to return back to the calling function.
I don’t know how real byte code does this (see disclaimer above) but I figured that I could just put the return point on the stack before calling a function, and then the function just has to do a little swap of the items on the stack and call switch-client again.
I needed to use both the session name and the window name as scratch storage to get this to work, but the return instruction ends up like this:

new-window
set-hook pane-focus-in {
  # the value to return
  run 'tmux rename-session -- "#{buffer_sample}"'
  run 'tmux delete-buffer'
  # the location to return to
  run 'tmux rename-window -- "#{buffer_sample}"'
  run 'tmux delete-buffer'
  # put return value back on stack
  run 'tmux set-buffer "#S"'
  # restore session name
  run 'tmux rename-session -- "func"'
  run 'tmux switch-client -t "#{window_name}"'
}
The function call instruction is much simpler: you just need to add all the arguments onto the stack, and then do:

# put the return point on the stack
new-window
set-hook pane-focus-in {
  run "tmux set-buffer 'main:3'"
  run 'tmux next-window'
}

# any arguments would be added here

# switch the client to call the function
new-window
set-hook pane-focus-in {
  run 'tmux switch-client -t func:1'
}
I know at compile time the exact instruction to jump back to, so that main:3 is hard-coded into the program to be the name of the current function and the index of the window after the switch-client call.
Since window 0 on every session is “free parking”, you switch directly to window 1 which kicks off the function. The return value from a function is whatever item is on the top of the stack when the function jumps back to the caller.
So I’ve got a subset of Python to run on tmux that can only use numbers. Is this Turing-complete?

I don’t know. I assume it is, or at least it’s close enough that you could make some changes and end up with a Turing-complete language that compiles and runs on tmux. This was enough to satisfy my curiosity and say “yep tmux is probably Turing-complete”, but I don’t want to go on the internet and make that claim without completely backing it up.

So obviously I have to make a full-featured compiler for a Turing-complete language. So I also wrote a Brainfuck-to-tmux compiler.
Brainfuck is exceptionally simple; it only has eight instructions:

- > and < move the data pointer to the right and left
- + and - increment and decrement the byte at the current location
- , reads one byte from the input stream and places it at the data pointer
- . writes the current byte to the output stream
- [ jumps to the matching ] if the current byte is zero, otherwise continues as normal
- ] jumps back to the previous matching [ if the current byte is non-zero, otherwise continues as normal

Initially I thought about using an infinite sequence of windows to represent the data, but then I realised that I could just create numbered variables on the fly, which is much simpler. The session name acts as a data “pointer”, the windows again act as instructions, I pull from a variable for input, and use send-keys to the first window as output.
The instructions look like this:
new-window
set-hook pane-focus-in {
  run 'tmux rename-session -- "#{e|-:#S,1}"'
  run 'tmux next-window'
}

new-window
set-hook pane-focus-in {
  run 'tmux rename-session -- "#{e|+:#S,1}"'
  run 'tmux next-window'
}
< and > (above) are super simple—they just rename the session to be one more or less than the current session name. The default tmux session name is 0 so I don’t even need to set it initially.
new-window
set-hook pane-focus-in {
  run 'tmux set -s "@data-#S" "#{e|%:#{e|+:#{E:##{@data-#S#}},1},256}"'
  run 'tmux next-window'
}

new-window
set-hook pane-focus-in {
  run 'tmux set -s "@data-#S" "#{e|%:#{e|+:#{E:##{@data-#S#}},255},256}"'
  run 'tmux next-window'
}
These two implement + and -. They read from and store their result in the variable @data-#S, #S being the session name which I’m using as the data pointer.
#{E: allows for double-expanding variables, so I can expand @data-#S into something like @data-0 and then expand that into the value stored in that variable. If the variable doesn’t exist it expands to an empty string, and when you add or subtract from an empty string it gets implicitly converted to 0.
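A worked example of the double expansion, reusing the expression from the code above (the session name and stored value are my own): if the session is named 0 and @data-0 holds 65, the first pass produces #{@data-0} and the second pass produces 65.

# First pass: '##' -> '#', '#S' -> '0', '#}' -> '}', giving '#{@data-0}'
# Second pass: '#{@data-0}' expands to the stored value, '65'
display '#{E:##{@data-#S#}}'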
I have to modulo the results by 256 as Brainfuck expects an array of bytes, not arbitrarily large numbers. I didn’t realise this from my extensive research of skimming the Wikipedia page, so it took a bit of head-scratching while my program was looping out of control.
new-window
set-hook pane-focus-in {
  run 'tmux select-window -t ":=#{?#{E:##{@data-#S#}},6,7}"'
}

new-window
set-hook pane-focus-in {
  run 'tmux select-window -t ":=#{?#{E:##{@data-#S#}},5,7}"'
}
I thought that [ and ] would be tricky until I realised that I could pre-compute where they jumped to (I’d only ever implemented Brainfuck as a dumb interpreter before). They use the same select-window logic as the conditionals in the Python compiler.
new-window
set-hook pane-focus-in {
  run 'tmux set -s "@data-#S" "#{=1:@input}"'
  run 'tmux set -s "@input" "#{=-#{e|-:#{n:@input},1}:#{?#{e|==:#{n:@input},1},0,#{@input}}}"'
  run 'tmux next-window'
}

new-window
set-hook pane-focus-in {
  run 'tmux send-keys -t ":=0" "#{a:#{e|+:0,#{E:##{@data-#S#}}}}"'
  run 'tmux next-window'
}
This has some serious tmux expansion going on, but the basic idea is to implement , by taking the first character from the @input option, and then truncating the first character from @input. This is easier said than done, as it requires getting the length and calculating the substring manually.
. is much simpler: I just take the current value and pass it to send-keys, using the #{a: expansion filter to turn the number into an ASCII character.
A limitation of my implementation is that the input will only get interpreted as numbers—tmux doesn’t have a way to convert ASCII characters to their numeric code points.
If you look at any of the compiled example programs in the repo you can see that I’m not exactly generating the most optimised code. For example to run this super simple program:

a = 1
print(a)
The compiler will:

- push 1 onto the stack
- set @a to the top of the stack
- push @a onto the stack
- call display-message with the topmost element from the stack
- push 0 as a “return value” of print to the stack

All that could be replaced with something much simpler: call display-message with the value 1.
But that requires much more analysis of the actual program, and I’m not going for efficiency here, so I accepted generating unnecessary instructions.
Like I mentioned earlier, there are plenty of other ways that the data could be modelled. I was considering using window names to store my variables, but you could also store data in the tmux window buffers themselves—using send-keys and capture-pane to read and write data. Or maybe you could have nested sessions, where the outermost session windows are the instructions, and the inner session windows are the data. Window splits and layouts would be another possibility for storing data. That’s also not even considering the possibility of moving windows around to change how the program runs while it’s running. Perhaps update-environment is a better variable store than custom options?
If you want to continue this project and implement an LLVM backend that targets tmux, or just want to hack around with tmux in general, you can use the -L some-socket flag to run a separate server, so you don’t mess up your actual tmux server. Instead of starting a normal shell in every window, I ran tmux wait-for main. That way I could run tmux wait-for -S main to close every single window at once—since if you try and close them one-by-one you end up running parts of your program. Alternatively, tmux kill-server will probably do the trick.
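Putting those pieces together, a scratch-server workflow might look like this; the socket and config names are mine:

tmux -L scratch -f compiled.conf new-session   # separate server, separate config
tmux -L scratch wait-for -S main               # release every waiting window at once
tmux -L scratch kill-server                    # or just tear the whole thing down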
Overall I’m super happy with how well this ended up working, and how directly the various concepts of a normal instruction set can be mapped to tmux commands.
I ran a benchmark to see how tmux-python compares to Python 3.11.4. I didn’t want to wait around for too long so I just used my is_prime example to check whether 269 is a prime. On my dev machine, Python runs this in 0.02 seconds, whereas my tmux version takes just over a minute.
1. Technically it’s a set of variables per function if you pass the -s flag to set the option only on the current session, but not per function call. So if you have a function f that sets variable a and then calls itself, a will contain the value set from the previous function. ↩
I’m a very heavy user of tmux, and like to share how I make the most of it. This was going to be a short list of some nice things to know and some pointers to features people might not be aware of, but then I realised it’s probably easier to just explain the stuff that I have configured, and so here we are. I grabbed the current version of my tmux.conf and added an explanation after each section.

This assumes that you use tmux in the same way that I do. Some people like to just use it as a way to get a few pre-defined splits in their terminal and they never want to change those splits. Other people just use it in case their ssh connection drops. When I’m working I basically always have a large or full-screen terminal open that’s connected via SSH to a server, with tmux running on that server attached to a session for the specific project that I’m working on. If I work on a different project I’ll just detach from that session and start a new one.
So with that in mind, let’s dive in…

# GENERAL BITS AND BOBS
unbind -T root -aq
unbind -T prefix -aq
unbind -T nested -aq
unbind -T popup -aq
unbind -T copy-mode -aq
The unbind command will remove all bindings in a key table. I do this so that anything I set while tinkering will get unset and replaced with the config (reducing the chances of getting into a weird state), and because I’ve chosen to redefine every key binding myself, this removes any double-ups. This is not something that I’d recommend others do, since you’ve got to be pretty familiar with all the bindings that you use regularly and define them yourself before this is actually practical.
In tmux a key-table is just a set of key bindings. The two most important ones are prefix and root. The prefix table contains all the bindings that can be used after you enter your prefix key, and root contains all the bindings that can be done without having to first enter the prefix.

The prefix key is just tmux’s way of “namespacing” its shortcuts off so you’re not going to have a conflict with another program. tmux doesn’t add any key bindings in the root table by default.

Since I know the programs that I’m going to be using—and know the keys that I’ll use in those programs—I heavily use the root key table to add shortcuts that are faster to activate (and activate repeatedly) without having to first press the prefix.

You can totally abuse the root key-table too; for example you can make a binding so that whenever you press “a”, “b” is what gets sent to the shell:¹

bind-key -T root a send-keys b

bind-key -n is just a short-hand for bind-key -T root.
set -g mode-keys vi
set -g status-position bottom
set -g base-index 1
set -g renumber-windows on
set -g default-terminal 'screen-256color'
set -g history-file ~/._tmux-history
# set -g prompt-history-limit 10000
This is just some fairly basic config for the standard behaviour of tmux. I use vim keybindings for copy-mode since those are the shortcuts I am familiar with. The status bar (with the list of windows, etc) lives at the bottom. Windows are numbered starting from 1 instead of 0, since if I use a “switch to window X” shortcut, having the window indices match the order of the keys on a keyboard is nice. Although I don’t actually use the shortcuts for switching directly to a window by number, since it’s almost always faster for me to just mash “next window” a bunch of times until I’ve got the window I need.
When I first started using tmux I think I had default-terminal incorrectly set to xterm-256color—the standard for most terminal emulators—which caused some background colours to render incorrectly. It should basically always be screen-256color unless you’re doing something weird where you don’t have 256 colours, but that’s unlikely. It might be set to this by default in tmux, but I just keep this here to be sure.
set -g prefix C-z
bind C-z send-prefix
bind r {
  source-file ~/.tmux.conf
  display 'config reloaded'
}
As I’ve mentioned before, I use C-z as my prefix shortcut. It’s more convenient to press than the default C-b, and I don’t suspend tasks using C-z very often (which is what it usually does). If I do need to suspend a task I can just press it twice (courtesy of bind C-z send-prefix), which is not particularly inconvenient.
I’ve bound C-z r to reload my tmux config, which also isn’t something I do that often but it’s more convenient than having to type out the whole source-file command manually. A neat trick that I learnt a while ago is that tmux supports multi-action commands by wrapping them in curly braces. This is super nice both to make the config more readable, as well as allowing for confirmations that the action has happened using the display command.
set -s escape-time 0
set -g focus-events on
set -g allow-rename on
Just some more default settings; I don’t think any of these are particularly important—in fact I could never remember whether that first one should be set -g or set -s (escape-time is a server option, so -s is correct), but evidently it’s not been an issue so it’s remained like this. I can’t remember why I turned focus events on—I think it was to make some vim plugin work? I’m fairly confident that I don’t use the plugin any more, so this is probably obsolete. allow-rename allows an escape sequence to change the window name. I don’t dutifully set meaningful window names, so any program that wants to give me a useful name is more than welcome to.
# SHORTCUTS
bind c new-window -c '#{pane_current_path}' -a -t '{next}'
bind -n M-c new-window -c '#{pane_current_path}' -a -t '{next}'
bind -n M-z resize-pane -Z
On the topic of making common actions really convenient, I bind M-c to open a new window since C-z c is just a tiny bit too slow—although I keep that binding around just in case I’ve got more time on my hands, I guess. I also have set the two options here to open the new window in the same directory as the current pane (doing anything else just doesn’t make sense to me). That -a -t '{next}' means that the window will open directly next to the current one, rather than at the end.
M-z zooms the current pane—hiding all other panes in the same window—which is useful to focus on one thing quickly, or to copy text from the window.
bind x confirm-before -p "kill-pane #P? (y/n)" kill-pane
bind '$' command-prompt -I "#S" { rename-session "%%" }
bind ',' command-prompt -I "#W" { rename-window "%%" }

bind d detach
bind C-d detach
bind : command-prompt
Since I remove every single key binding, I have to add back every operation I want, and sometimes I do just want the default keybinding back. In this case I re-add C-z x to kill a pane, C-z $ and C-z , to rename sessions and windows, C-z d to detach from the session, and C-z : to open the tmux prompt.
It’s neat that these two-step commands that ask for input or confirmation are actually implemented with other tmux commands, rather than being baked into the “dangerous” commands as additional options. This means that if I really wanted, I could add a confirmation step before opening a new window, or detaching from a session.
The smart move in this section is actually bind C-d detach. I would constantly press C-z and then press d just before I’d released the control key, which resulted in nothing happening. Instead of learning to be more careful with my keystrokes, I just added a mapping so that the mistaken keypress also did what I was intending.
bind m {
  set -w monitor-bell
  set -w monitor-activity
  display 'window mute #{?#{monitor-bell},off,on}'
}
This is something I’ve only really added recently. You’ll see below that there’s a window style for windows with activity (ie: their shell has printed output while in the background) as well as windows that have sent a terminal bell, and I use that to change the colour of the window in the status bar. However, sometimes I find this a bit annoying, and I want to just be able to run something (like a server) in the background and not care that it’s printing output, so I have a way to turn off the monitoring for just that window.
If you don’t pass an argument to set for an option that’s a boolean, then it gets toggled. So in this case I’m relying on the fact that I don’t change these options any other way, and that toggling them both won’t ever get them out of sync. I could probably do this “properly” to ensure that they’re consistent, but it’s not really an issue I care to fix.
Another example of multi-line commands making things easier to read.
bind s send-keys -R Enter
bind S {
  send-keys -R Enter
  clear-history
}
Sometimes I want to run a command and then search in the output. It’s really annoying to have previous commands’ output messing up the search, especially if you’re repeatedly running a test or looking at logs and trying to search for some message. I could just open a new pane each time, but it’s easier for me to just wipe out the scrollback history in the current pane.
C-z s (lowercase “s”) is equivalent to the “clear” command, except I can do it while a command is running. C-l in most terminals does the same thing, but I have that re-bound to pane navigation.

C-z S (uppercase “S”) clears the screen and the history, again doable while a command is running.
I send Enter after clearing the screen to force any prompts to re-draw; otherwise you can be left with a completely blank screen.
# NESTED MODE
bind -n M-Z {
  set status
  set key-table nested
  set prefix None
}
bind -T nested M-Z {
  set status
  set key-table root
  set prefix C-z
}
If you’ve messed around with tmux enough you’ve come across the warning:
sessions should be nested with care, unset $TMUX to force
This of course is just a warning, and so naturally I have a whole system to nest tmux sessions. This is useful if you’re always in tmux and ssh from one machine to another. You don’t want to exit out of tmux locally (obviously) and you want to run tmux on the remote computer in case your connection drops so you don’t interrupt any in-progress jobs.
What I’ve done is something like a “more zoomed” mode². This will hide the status bar of the outer tmux session and disable all key bindings except one to get out of this nested mode.
So when I ssh to another machine I can press M-Z and all my local tmux UI disappears, so when I start tmux on the remote machine it looks and behaves like I’m connected directly, not nested. If I need to use the local session, I can press M-Z again and the local tmux UI reappears and the key bindings reactivate, allowing me to move around in the local session, with the remote session being relegated back to its own window.
Where this gets really clever is in my shell wrapper around ssh. It checks that I’m in a tmux session, and automatically switches to the nested mode when I start an ssh connection, so I don’t even have to press a key.
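A sketch of what such a wrapper could look like. This is my own reconstruction from the description, not Will’s actual script, and it sets the options explicitly rather than toggling them:

ssh() {
  if [ -n "$TMUX" ]; then
    # Enter nested mode: hide the status bar, disable local bindings.
    tmux set status off \; set key-table nested \; set prefix None
    command ssh "$@"
    # Back out of nested mode once the connection ends.
    tmux set status on \; set key-table root \; set prefix C-z
  else
    command ssh "$@"
  fi
}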
This doesn’t really work with triply-nested sessions however, since the second time you press M-Z the outer session will un-nest itself, rather than the middle session nesting itself. If I had two separate bindings—one for “nest” and a different one for “unnest”—then it would work, but that would be 100% more nesting-related shortcuts to learn, and I don’t triple-nest enough to justify that.
bind -n M-V split-window -h -c '#{pane_current_path}'
bind -n M-H split-window -v -c '#{pane_current_path}'

bind V move-pane -h
bind H move-pane -v
Creating splits is one of the things I do the most, so naturally I have a no-prefix shortcut for it. I think of splits the way Vim does them, with horizontal/vertical being the way the line goes, rather than the orientation of the panes themselves. So I’ve swapped the letters for the bindings here: M-V gives me a horizontal tmux split, because I think of that as being vertical like :vsp in Vim.
These last two bindings are for moving panes into windows, but I almost never do this because it’s almost always easier to just open a fresh new split.
bind -n M-n next-window
bind -n M-N swap-window -d -t '{next}'
bind -n M-m previous-window
bind -n M-M swap-window -d -t '{previous}'
In Vim I use C-n and C-p to navigate buffers, so I wanted to use M-n and M-p in tmux to navigate windows. But I think for some reason that didn’t work, although I just tried it now and it totally does work. However my muscle memory is now locked onto the completely nonsensical M-m to go to the previous window.
The uppercase versions of both of these bindings move the window; it’s like holding down shift “grabs” the window as you navigate.
bind -n M-s choose-tree -Zs -f '#{?#{m:_popup_*,#S},0,1}' -O name
choose-tree is a neat way of swapping between tmux sessions—some people might use the next and previous session shortcuts, but I’ve settled on the navigable list.
This gets weird with my “popup” sessions (see below and the blog post I wrote about it), so I have a filter to hide them from the list, since they all start with _popup_.
bind C {
  select-pane -m
  display 'pane marked: #{pane_id}, move with <prefix>V or <prefix>H'
}
bind -n M-L break-pane -a -t '{next}'
C-z C is how I would merge panes back into the same window, if I ever actually wanted to do this, but I very rarely do. This works because the default target for move-pane is the marked pane, so this binding is just marking a pane to be the default for moving.
break-pane is super useful, and I like M-L as a shortcut because “l” is “navigate right” in Vim-land, and the pane pops up as a window to the right, so it all makes sense. I’ll often run a command (like a test or build) in a split and then want to continue focussing on my editor, and use break-pane to move the split into a new window without interrupting the running process.
bind Space next-layout
bind Tab rotate-window
next-layout shuffles through a predefined list of layouts for the panes in a window. It’s somewhat useful to avoid having to manually resize splits, or just as something to keep me entertained while I wait for something to finish. rotate-window shuffles the order of the panes while maintaining the same layout, which I basically use as “oh no my editor is on the right and it needs to be on the left because that’s where the editor lives”. C-z Tab, problem solved.
# COPY MODE

bind -n C-o copy-mode
bind -n M-p paste-buffer -p
bind -T copy-mode-vi v send-keys -X begin-selection
bind -T copy-mode-vi y send-keys -X copy-selection
I actually lied earlier: I don’t unbind every single key binding, I leave copy-mode-vi as-is. It basically just uses the standard navigation commands that I’m used to from Vim or less, so I don’t feel a need to change anything. The one thing I do set is using v to start a selection and y to copy that selection. This is what Vim does and so it’s just making things a little more consistent.
Since I don’t use mouse-mode in tmux, entering copy-mode quickly is essential. I chose C-o as it’s close to C-u which is the shortcut to scroll up, so I can quickly press C-o C-u and be scrolling up through the pane output.
bind -n M-1 select-window -t :=1
bind -n M-2 select-window -t :=2
bind -n M-3 select-window -t :=3
bind -n M-4 select-window -t :=4
bind -n M-5 select-window -t :=5
bind -n M-6 select-window -t :=6
bind -n M-7 select-window -t :=7
bind -n M-8 select-window -t :=8
bind -n M-9 select-window -t :=9
As I mentioned before, I don’t actually use these, they’re basically just here for like tradition or something. It’s basically always easier to just press M-n or M-m to cycle through my windows (I’d say I usually have <5 in a session) because that’s what my muscle memory is used to doing.
# STATUSBAR
set -g status-interval 60

set -g status-left-length 100
set -g status-right-length 100

set -g status-style bg=default
set -g status-left-style fg=colour0,bg=colour$HOST_COLOR
set -g status-left '#[bold]#{?#{N/s:_popup_#S},+, }#S #[nobold]│ #h │ %H:%M '
set -g status-right-style fg=colour250
set -g status-right '#[reverse] #(cat /proc/loadavg) '

# WINDOW INDICATORS
set -g window-status-separator ''
set -g window-status-format ' #I#{?#{window_zoomed_flag},+, }│ #W '
set -g window-status-style fg=colour245,bg=default
set -g window-status-activity-style fg=colour$HOST_COLOR,bg=default,bold
set -g window-status-bell-style fg=colour0,bg=colour$HOST_COLOR,bold
set -g window-status-current-format ' #I#{?#{window_zoomed_flag},+, }│ #W '
set -g window-status-current-style fg=colour231,bg=colour240,bold
This is a super dense section, and to be honest a picture is the easiest way to communicate what it’s doing:
All my computers have a unique $HOST_COLOR set, and I use that to set the highlight colour for a bunch of things in tmux as well as my zsh prompt. The screenshot above shows the colour that I use on my main computer, ANSI colour 183, which almost exactly matches the highlight colour for my website in dark mode. This is something I set up when I was in university and my time was split between my laptop and a few servers fairly frequently, so having them be immediately identifiable was really useful. Now it’s just nice that I can change one file and have a new colour.
The left side of the status bar has the session name, host name, and current time. If there is a popup shell (see below) then I get a simple “+” indicator next to the session name (that’s what the #{?#{N/s:_popup_#S},+, } is doing).
The one hard requirement I have for the window indicators is that when I navigate through them, they don’t jump slightly due to the width of the active window indicator being different to the inactive window indicator. This is why I set window-status-separator to '' and make window-status-format and window-status-current-format take up exactly the same number of characters. I differentiate the active window with brighter, bold text and a lighter background.
I’ve been considering adding a bit more info to the window indicators—perhaps removing the window number to give myself some more space—but currently the only additional piece of information is whether the window has a zoomed pane or not: #{?#{window_zoomed_flag},+, } will add a “+” after the window index if there’s a zoomed pane. To me the plus is “there’s more stuff that you might not see immediately”, and I use that both for the popup shells and for zoomed panes.
If a pane has activity, then the text colour changes to $HOST_COLOR which makes it easily noticeable. If there’s a bell, then the background changes to $HOST_COLOR which is even more noticeable. Both will be cleared automatically when you navigate to that window.
I have my build scripts send a bell when they finish so that I can kick them off in another window and then easily see when they finish. I’ve also recently added a neat feature where instead of just sending a bell, they set the tmux bell style to have a green or red background depending on whether the build (or test) passed or failed, and then send the bell. This way I can emotionally prepare myself before switching windows to look at the failure.
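A sketch of how that build wrapper could work; this is my own reconstruction of the described behaviour, with the colour choices assumed:

# Hypothetical wrapper: recolour the bell indicator by build result,
# then ring the bell so the status bar lights up.
if make "$@"; then
  tmux set -w window-status-bell-style 'fg=colour0,bg=colour2,bold'  # green
else
  tmux set -w window-status-bell-style 'fg=colour0,bg=colour1,bold'  # red
fi
printf '\a'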
The right side of the status bar is basically just free space; I have it set to just dump the loadavg there, which I find vaguely interesting to watch as I do a particularly resource-intensive task.
# MESSAGES
set -g message-style fg=colour232,bg=colour$HOST_COLOR,bold

# PANE SPLITS
set -g pane-border-style fg=colour238
set -g pane-active-border-style fg=colour252

# CLOCK AND COPY INDICATOR
set -g clock-mode-colour colour$HOST_COLOR
set -g mode-style fg=colour$HOST_COLOR,bg=colour235,bold
This basically just makes the rest of the tmux UI match my existing styles, using various shades of grey to indicate what’s active vs inactive and the $HOST_COLOR where a non-greyscale colour is needed.
# ACTIVITY
set -g bell-action none
set -g monitor-activity on
set -g monitor-bell on
set -g visual-activity off
set -g visual-bell on
set -g visual-silence off
These basically just set the various options needed to get tmux to listen out for a bell coming from a pane. I think I understood these options when I set them, but if I wanted to change them I’d have to re-read the tmux manual to make sure I got what I wanted.
# POPUP SHELL
bind -n M-J display-popup -T ' +#S ' -h 60% -E show-tmux-popup.sh

set -g popup-border-style fg=colour245
set -g popup-border-lines rounded

# support detaching from nested session with the same shortcut
bind -T popup M-J detach
bind -T popup C-o copy-mode
bind -T popup M-c new-window -c '#{pane_current_path}'
bind -T popup M-n next-window
bind -T popup M-m previous-window

bind -T popup M-L run 'tmux move-window -a -t $TMUX_PARENT_SESSION:{next}'
This is a slight extension of the popup shell I wrote about last year. I changed the shortcut from M-A to M-J as I found that a bit easier to press. I also added a binding to get into copy-mode so I could scroll up in the output.
Against my better judgement I also added bindings for creating and navigating windows. I don’t really use this, but I find the idea of secret hidden windows somewhat amusing.
The same shortcut I use for break-pane will move the window from the popup into the session it is popping up from. Realising that you can move tmux windows between sessions is fun. There are no rules! Isn’t that awesome!
# PUG AND LOCAL
source ~/.pug/source/tmux/pug
if '[ -e ~/.tmux-local.conf ]' {
  source-file ~/.tmux-local.conf
}
I still use my package manager pug, that I wrote in 2017 to manage my shell packages. I’ve since accepted that no one else is going to use it and have just merged it into my dotfiles repo. The only tmux package that this loads is vim-tmux-navigator which I forked from the original in order to make it installable from pug.
It seems a shame to relegate vim-tmux-navigator to the bottom since it’s one of the neatest tricks to make tmux more usable for Vim enthusiasts. But this is what the format demands³. For the uninitiated, it adds shortcuts to Vim and tmux to navigate splits with C-h/j/k/l—so you can navigate the splits interchangeably. I forget that I have it installed; splits are just splits and I don’t have to think about how to navigate them.
All my config files will check for some -local variant and source that if it’s present, which allows me to make per-machine customisations that I don’t want to commit into my dotfiles repo. This is great for work-machine-specific options.
mx Helper Script

My other interaction with tmux is with a script called mx that originally papered over the list-sessions, attach, and new commands but has since gained responsibility for switch and rename-session.
The gist is that I want to be able to type mx my-session from anywhere and then be in a session called “my-session”. The “from anywhere” requires a little bit of thought:
- If we’re outside of tmux, use new-session -A to attach to a session if it exists, or create a new one with that name.
- If there’s only one window in our current session, we probably don’t care about the current session staying around. So if the session we’re trying to switch to exists, move the current window to that session, then switch over to it.
- If we’ve only got one window and the target session doesn’t exist, we can just rename the existing session to the target session name.
- If there’s more than one window in the current session, then create or switch to the new or existing target session and move the current window along with us.
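A sketch of that logic as a shell script; this is my own reconstruction from the list above, not the real mx:

#!/bin/sh
target="$1"
if [ -z "$TMUX" ]; then
  # Outside tmux: attach if the session exists, create it if not.
  exec tmux new-session -A -s "$target"
fi
nwindows=$(tmux display-message -p '#{session_windows}')
if ! tmux has-session -t "=$target" 2>/dev/null; then
  if [ "$nwindows" -eq 1 ]; then
    # One window, no target session: just rename this one.
    exec tmux rename-session "$target"
  fi
  tmux new-session -d -s "$target"
fi
# Take the current window with us, then switch over.
tmux move-window -a -t "$target:{next}"
tmux switch-client -t "$target"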
This is almost certainly unnecessary, but it avoids me leaving a trail of sessions that I’ve finished with and avoids me having to exit out of tmux to switch between sessions, which is what I’d have to do previously to avoid the nested-sessions error, since the script would try to attach while already inside of tmux.
1. If you want to be really naughty, you can do something like this: bind-key -n e if '[ "$(shuf -i 0-1 -n 1)" = 0 ]' send-keys, which will silently swallow 50% of “e”s that get typed. You could do all sorts of naughty things here, like adding a sleep before certain characters are sent, or replacing spaces with non-breaking spaces or some other invisible character. ↩
This is why the shortcut is M-Z
(uppercase “Z”) and my “zoom pane” shortcut is M-z
(lowercase “z”). ↩
3. I am aware that I made up the format and could have chosen to re-order the sections to make this more coherent. ↩
It is an accepted wisdom that it’s more important to write code that is easily read and understood, in contrast to writing code that is fast to write¹. This is typically used in discussions around verbose or statically typed languages versus terser dynamically typed languages.
The kernel of the argument is that it doesn’t take you that much longer to write a longer method name, spell out a variable in full, or import a class. Whereas it can take someone reading the code significantly more time if they have to trace and guess at every single variable name and function call to understand what the code is doing.
The classic examples are Java’s excessively long class names, Ruby’s convoluted one-liners for data manipulation, or Swift’s overly verbose method and argument names. For example here’s how you trim whitespace characters from a string in Swift, from StackOverflow:

let myString = " \t\t Let's trim all the whitespace \n \t \n "
let trimmedString = myString.trimmingCharacters(in: .whitespacesAndNewlines)
Whereas in Ruby it’s just " my string \t".strip.
In Swift, the writer of that code has to know—or look up—the longer method with a potentially non-obvious argument², but it would be incredibly clear to a reader what that method is doing. The writer of the equivalent Ruby code would have to remember a single word, but the reader may have to check what characters are included in the .strip operation.
Another example is Go’s previous lack of support for building generic abstractions³. The counter-example was always to just write the code out by hand, using a classic for loop or if statement. So instead of doing this:

buildings.map(&:height).max
You would do something like:

maxHeight := 0
for _, item := range buildings {
    if item.Height > maxHeight {
        maxHeight = item.Height
    }
}
No hidden behaviour, and super easy to understand.
\n\nI don’t want to try and argue where on this spectrum is best. I have a different metric that I want to optimise for: the ease of manipulation.
\n\nI spend a lot of time changing code to understand how best to implement, refactor, or debug a problem, and languages that are more explicit code end up getting in the way.
I’ll just reach for System.out.println in Java because the fully-productionised logging class requires me to add an import and edit my build config.

I might not use .map and .filter in my final code, but it sure is convenient to have these around to transform data either to print it, or to quickly pass it to another part of the application.
Having static types is absolutely valuable when undergoing a large refactor to build confidence that you haven’t completely messed something up, but when I just want to move some code around to see if I can change some behaviour, having to re-define interface definitions and then contend with anything else that breaks is a frustrating experience. It would be great if I could just turn off type checking in single files while I work.
\n\nAn easy example of this is when you’re doing something that unifies the behaviour of a bunch of objects, and will almost certainly result in defining some common interface for all the classes to implement. However in the interim you just want the compiler to treat all the objects as being the same shape, despite the fact that from the compiler’s point of view they have absolutely nothing in common.
Since I’m a big printf debugger, languages that don’t have a sensible default for printing objects are a huge pain. Remembering to use whatever the method is that turns a Java array into a human-readable string is the absolute worst. Ruby is great here because every object has a .inspect method that will dump the value of all instance variables, which is incredibly convenient. Of course you could attach a debugger, but having it available programmatically allows you to dump it into your application’s UI if necessary, without having to re-run with a debugger attached.
Other times I might want to just:

- skip the InputStreamReader boilerplate
boilerplateSwift’s error handling actually has a few of these features—the try!
and optional unwrap !
syntax are great examples of convenience features for hacking something together that should never get past a code review.4
Of course it’s no surprise that Crystal has a lot of these features (it is of course the best language ever). Being able to punt some best practices to the back seat is incredibly convenient, and not something that I’ve seen included much in discussions on readability versus writability of code.
1. Or even fast to run, in some cases! ↩
2. They’ve also got to know that in: is the argument label. I find this constantly baffling, as charactersIn: seems like it could be an equally good argument label, so you have to remember both the full “trimming characters in” name of the method, and where in that name the arbitrary separator falls between what’s the method name and what’s the argument label. ↩
3. Until Go added support for generics, which I have not yet used. ↩
4. As I wrote before, Swift has some weird trade-offs when it comes to exceptions. ↩
For just over a month, my RSS reading has been self-hosted. Usually I’d write about this kind of thing because there was an interesting challenge or something that I learnt in the process, but it has basically been a completely transparent change.
I’m still using NetNewsWire to do the actual reading, but I’ve replaced Feedly with FreshRSS running on my home server (well, one of them).

I didn’t really have any problems with the quality of the Feedly service—they fetch feeds without any issues and most apps support their API, and their free tier is very generous. I’ve had my Feedly account for years. However they use their feed-scraping tools to provide anti-union and anti-protest strikebreaking services, which is a bit gross to say the least.

The ease of moving between RSS services is really what makes this an easy project; as Dan Moren wrote on Six Colours, it’s as simple as exporting the OPML file that includes all the feed URLs, and importing that into another service. Dan ended up using the local feed parser offered by NetNewsWire, but I’m morally opposed to having my phone do periodic fetches of 61¹ feeds when I have a computer sitting at home that could use its wired power and internet to do this work.
\n\nNetNewsWire supports pulling from FreshRSS, which is an open-source self-hosted feed aggregator. It supports running in a container, so naturally all I needed to do was add the config to a pod
file:
freshrss:\n name: freshrss\n remote: steve\n image: docker.io/freshrss/freshrss:alpine\n interactive: false\n ports:\n 4120: 80\n environment:\n TZ: Australia/Sydney\n CRON_MIN: '*/15'\n volumes:\n freshrss_data: /var/www/FreshRSS/data\n freshrss_extensions: /var/www/FreshRSS/extensions\n
You just do some basic one-time setup in the browser, import your OPML file, add the account to NetNewsWire, and you’re done.
\n\nThe most annoying thing is a very subtle difference in how Feedly and FreshRSS treat post timestamps. Feedly will report the time that the feed was fetched, whereas FreshRSS will use the time on the post. So if a blog publishes posts in the past or there is a significant delay between publishing and when the feed is fetched, in Feedly the post will always appear at the bottom of the list, but FreshRSS will slot it in between the existing posts. I want my posts to always appear in reverse chronological order so this is a bit annoying.
\n\n\n\n\nAn example of a website where the times on posts are not accurate is this very website! I don’t bother putting times on posts—just dates—since in 10 years of posting I only have two posts that fall on the same day. Feedly assigns a best-guess timestamp of when the post was published (when Feedly first saw it) whereas FreshRSS just says they were published at midnight. Which isn’t too far from the truth, as it’s half past ten as I write this.
\n
To avoid exposing FreshRSS to the outside world, it’s only accessible when I’m connected to my VPN, so I don’t have to worry about having a domain name, SSL cert, secure login, and all that.
\n\nI haven’t had any reliability issues with FreshRSS yet, obviously the biggest disadvantage is that I’m signing myself up to be a sysadmin for it, and the time that it will break is when I’m away from home without my laptop.
\nAs of the time of writing, that is. ↩
\nI thought of this as a single topic, but when I started writing I realised I was really thinking about two different things—scalability and capability—and by the halfway point I realised the broader idea needs to include both. So let’s start with:
\n\nDesktop operating systems are able to scale to cover so many use-cases in part by their open nature, but also because of the incredible flexibility of windowed GUIs. Every modern mainstream OS has a window manager that works in the same basic way—you have a collection of rectangles that can be moved around the screen, and within each rectangle there are UI elements.
\n\nThe floating window is such a good abstraction that it can be used on a huge range of display sizes. My netbook with a tiny 10” screen used the same system as my current 13” laptop. If I connect a huge external monitor, the interactions remain the same—I’ve just got more space to put everything.
\n\nWhat’s really amazing is that there has been almost no change in the window metaphor since their inception. I’m not a computer historian, but I know that if you time-travelled and showed any modern desktop OS to someone using Windows 98 (which ran on the first computer that I used), they would be quite at home. The visual fidelity, speed, and some rearranging of UI elements might be a bit jarring, but “move this window over there” and “make that window smaller” work in the exact same way.
\n\nCharacterising it as no changes is obviously selling it short. The best change to the core windowing metaphor is the addition of virtual desktops. It fits into the system really well; instead of having windows be shown on the screen, we just imagine that there are multiple screens in a line, and we’re just looking at one of them. In the relationship of “computer” to “windows” we’re just adding a layer in the middle, so a computer has many desktops, and each desktop has many windows. The best part is that the existing behaviour can just be modelled as a single desktop in this new system.
\n\nThe difficulty is that this introduces a possibility for windows being “lost” on virtual desktops that aren’t currently visible on the screen. Most window managers solve this by adding some kind of feature to “zoom out” from the desktop view, and show all the virtual desktops at once, so you can visually search for something you misplaced. MacOS calls this “Exposé” and I use it constantly just to swap between windows on a single desktop.
\n\nTablets haven’t yet managed to re-invent window management for a touch-first era. Allowing multitasking while not breaking the single-full-screen-app model is exceptionally challenging, and what we’ve ended up with is a complicated series of interdependent states and app stacks that even power-users don’t understand. Even the iPad falls back to floating windows when an external monitor is connected, as being limited to two apps on a screen larger than 13” is not a good use of screen real estate.
\n\nSomething simultaneously wonderful and boring about computers is that while they continue to get better over time, they don’t really do anything more over time. The computer that I bought from a recycling centre for $20 basically does the same things as the laptop that I’m using to write this very post.
\n\nOn my netbook I could run Eclipse1 and connect my phone via a USB cable and be doing Android development using the exact same tools as the people that were making “real” apps. Of course it was incredibly slow and the screen was tiny, but that just requires some additional patience. Each upgrade to my computer didn’t fundamentally change this, it just made the things I was already doing easier and faster.
\n\nOf course at some point you cross over a threshold where patience isn’t enough. If I was working on a complicated app with significantly more code, the compilation time could end up being so long that it’s impossible to have any kind of productive feedback loop. In fields like computer graphics, where the viewport has to be able to render in real-time to be useful, your computer will need to reach a minimum bar of usability.
\n\nHowever in 2020 I did manage to learn how to use Blender on my 2013 MacBook Air. It could render the viewport fast enough that I could move objects around and learn how to model—so long as the models weren’t too high detail. Actually rendering the images meant leaving my laptop plugged in overnight with the CPU running as hard as it could go.
\n\nAll those same skills applied when I built a powerful PC with a dedicated graphics card to run renders faster. This allowed me to improve my work much faster and use features like volumetric rendering that were prohibitively slow running on a laptop.
\n\nI really appreciate using tools that have a lot of depth to them, where the ceiling of their capabilities is vastly higher than you’ll ever reach. One of the awesome things about learning to program is that many of the tools that real software engineers use are free and open source, so you can learn to use the real thing instead of a toy version. This is one of the reasons I wanted to learn Blender—it’s a real tool that real people use to make real movies and digital art (especially after watching Ian Hubert’s incredible “lazy” tutorials). There are apps that allow for doing some of this stuff on an iPad, but none are as capable or used substantially for real projects.
\n\nIt’s not just increases in processing speed that can create a difference in capability. My old netbook is—in a very abstract way—just as able to take photos as my phone, the only difference being that it had a 0.3MP webcam, and my phone has a 48MP rear-facing camera. The difference in image quality, ergonomics, and portability makes the idea of taking photos on a netbook a joke, and my phone my most-used camera.
\n\nPortability is a huge difference in capability, which has enabled entire classes of application to be viable where they were not before. There’s no reason you couldn’t book a taxi online on a desktop computer, but the ease and convenience of having a computer in your pocket that has sensors to pinpoint your location and cellular connectivity to access the internet anywhere makes it something people will actually do.
\n\nMy phone is also capable of doing almost everything that a smartwatch does2, but it’s too big to strap to my wrist and wear day-to-day. The device has to shrink below a size threshold before the use-case becomes practical.
\n\nOf course the biggest difference between any of the “real computers” I’ve mentioned so far and my phone is that it has capabilities locked away by manufacturer policy. It’s much more capable from a computing-power standpoint than any of my older computers, and the operating system is not lacking in any major features compared to a “desktop” OS, but since the software that can run on it is limited by the App Store and its associated rules, if you wanted to write a piece of software you’d be better off with my netbook.
\n\nMy iPad—which has just as much screen space as my laptop—can’t be used for full-on development of iPad applications. You can use Swift Playgrounds to write an app, but the app is not able to use the same functionality as an app developed on a Mac—the app icon doesn’t appear on the Home Screen, for example. If this was a truly capable platform, you would be able to use it to write an application that can be used to write applications. Turtles all the way down. On a desktop OS I could use an existing IDE like IntelliJ or Eclipse to write my own IDE that ran on the same OS, and then use that IDE to write more software. That’s just not possible on most new platforms.
\n\n“Desktop” operating systems are suffering from their own success—they’re so flexible that it’s completely expected for a new platform to require a “real computer” to do development work on for the other platform. This is a shame because it shackles software developers to the old platforms, meaning that the people that write the software to be used on a new device aren’t able to fully embrace said new device.
\n\nOnce your work gets too complicated for a new platform, you graduate back to a desktop operating system. Whether that’s because the amount of data required exceeds that built into the device (a single minute of ProRes 4K from an iPhone is 6GB), or you need to process files through multiple different applications, you’re much less likely to hit a limit of capability on a desktop OS. So unlike me, you might start on one platform and then later realise you’re outgrowing it and have to start learning with different tools on a different platform.
\n\nSmartphones have made computing and the internet accessible to so many people, but with desktop operating systems as the more-capable older sibling still hanging around, there’s little pressure either to push the capability of new platforms or to improve on the capabilities of older ones.
\nThe Upgrade Podcast just did a special episode with panellists drafting various Mac-related things for the 40th anniversary of the original Macintosh. Here are my picks:
\n\nI was looking for an upgrade to my Acer netbook, trawling through second-hand computers. This was in 2011. My main issue when I’m buying second hand computers is having something be predictable—I didn’t want to spend a bunch of money on something that turns out to be crap. I looked in the “Mac” section and realised that they weren’t that expensive. If I got a Mac, I’d know that it was going to be reasonably well-built, usable on battery, with a half-decent screen, keyboard, and trackpad.
\n\nThe other advantage of buying a Mac was that it’s easy to know compatibility for installing Linux ahead of time. It’s a shame that the compatibility is “difficult”, but at least you know that up front.
\n\nIn the end I bought a 2008 MacBook with a Core 2 Duo processor, 160GB hard drive, and 2GB of RAM. The bigger screen and better keyboard made everything easier, compared to my tiny netbook.
\n\nI used OS X on it for a while, before installing Ubuntu (I assume version 12.04) on it. I’d occasionally dual-boot but most of my time was spent using Ubuntu. This lasted until probably late 2012 when I realised that Minecraft performed much better on OS X than on Ubuntu, and so ended up spending more time back in OS X.
\n\nMy dad upgraded his 2010 MacBook Pro to a MacBook Air—to reduce weight while travelling—and I got the Pro as a hand-me-down. This ended up being short-lived as he upgraded again to the 11” Air, and I got the previous 13” Air. That 2013 13” MacBook Air, by virtue of being the Mac I used the most and longest, is my favourite Mac. It was my first computer with an SSD, which gave it a huge speed boost compared to the MacBook Pro.
\n\n2013 was really when the Air became an awesome all-round computer. The advertised battery life was 12 hours (almost twice that of the previous generation which claimed 7 hours) which meant I could take it to university and leave the power brick at home. At a time when most people had their huge 15” ultra-glossy laptops tethered to a wall outlet, this was awesome.
\n\nIn a post-dongle world it’s weird to remember the fact that I could plug in power, a mouse, keyboard, headphones, and a display all into the built-in ports on my “entry-level” “consumer” laptop.
\n\nThe software that defined my use of the Mac in the 2010s was TextMate. It was the go-to editor for Rails development, and I used it almost exclusively from 2012 to 2017. I’d use an IDE for Java development, but everything else would be done in TextMate.
\n\nI still keep it installed in case I just need to do something quickly or wrangle some text with multiple cursors, but most of the time I’ll use Vim to make use of muscle memory and macros.
\n\nIn 2015 I bought a Magic Trackpad on a bit of a whim. I’d been using the wireless Mighty Mouse when I was working at my desk, but I liked the idea of using a trackpad for everything and must’ve found a good deal on a second-hand one.
\n\nSince then I’ve been using trackpads almost exclusively. I replaced the first-generation Magic Trackpad in 2019 since I got sick of the AA batteries running out, and the second-generation trackpad has longer-lasting built-in batteries that can be charged while the trackpad is in use.
\n\nI’ve never had any significant RSI issues using the low-profile Magic Keyboard and Magic Trackpad, and so I’m hesitant to make any changes to a setup that works so well.
\n\nThe worst Mac that I’ve used was the 2018 MacBook Pro (with Touch Bar) that I used at work. My first work laptop had to be replaced1 after the “b” key stopped working, but the replacement wasn’t that much better. I didn’t really mind typing on the low-travel butterfly keyboard, but I loathed having no gap between the arrow keys, which made feeling for them with the tips of my fingers more difficult.
\n\nIn contrast to my experience with the amazing battery life on the 2013 Air, the battery life I would get from the Pro was abysmal. This is in no small part due to the types of work that I was doing on each machine—text editing is a lot less power-hungry than large video calls—but I came to resent the fact that the fans would constantly be maxed out and the battery wouldn’t last through even one hour of meetings.
\n\nThankfully in 2022 I was able to replace this with an M1 MacBook Pro, which has amazing battery life, no fan noise, and never stutters no matter how many browser tabs I have open.
\n\nMy current personal laptop is an M1 MacBook Air, which I am using to write this post.
\n\n“Replaced” from my perspective, that is—it was evidently easier to just give me a new laptop rather than have me wait on a repair, as much as I would have wanted to keep the exact machine with all my stickers on it. ↩
\nAt some point you’ve probably written or edited a config file that had the same block of config repeated over and over again with just one or two fields changed each time. Every time you added a new block you’d just duplicate the previous block and change the one field. Maybe you’ve wished that the application you’re configuring supported some way of saying “configure all these things in the same way”.
\n\nWhat this is exposing is an interesting problem that I’m sure all sysadmins, devops, SREs, and other “operations” people will appreciate deeply:
\n\nWhere should something sit on the continuum between config and code?
\n\nThis follows on from the difficulty of parsing command-line flags. Once your application is sufficiently complex, you’ll either need to use something that allows you to write the flags in a config file, or re-write your application to be configured directly from a config file instead of command-line arguments.
\n\nThe first logical step is probably to read a JSON file. It’s built-in to most modern languages, and if it’s not then there’s almost certainly a well-tested third-party library that does the job for you. You just need to define the shape of your config data structure (please define this as a statically-typed structure that will fail to parse quickly with a good error message, rather than just reading the config file as a big JSON blob and extracting out fields as you go, setting yourself up for a delayed failure) and you’re all set.
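\n\nAs a sketch of the fail-fast version in Swift (the Config fields here are invented for illustration), Codable gives you exactly this: decoding stops immediately with a descriptive error if a field is missing or has the wrong type:
import Foundation\n\nstruct Config: Codable {\n var hostname: String\n var port: Int\n var verbose: Bool\n}\n\n// Throws a DecodingError up front rather than failing later at first use.\nlet data = try Data(contentsOf: URL(fileURLWithPath: \"config.json\"))\nlet config = try JSONDecoder().decode(Config.self, from: data)\nprint(config)\n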
\n\nThis file will inevitably grow as more options and complexity are added to the application, and at some point two things will happen: firstly, someone who hasn’t dealt with tonnes of JSON will ask why they can’t add comments to the config file; and secondly, someone will write a script that applies local overrides of configuration options by merging two config files, to allow for easier development in a local environment.
\n\nTo remedy the first issue you could probably move to something like YAML or TOML. Both are designed as config-first rather than object-representation-first, and so support comments and some other niceties like multi-line strings.
\n\nIf you stuck with JSON or chose to use TOML, you’ll soon end up with another problem: you need to keep common sections in sync. Say you have something like a set of database connection configs, one for production and one for development (a good example is a Rails database.yml
file). You want to keep all the boring bits in sync so that development and production don’t stray too far from one another.
I run into this with my pods.yaml
config files. The program I wrote to track helicopter movements around the Sydney beaches has five different container configurations that I can run, and all of them need the same handful of common flags:
flags:\n timezone: Australia/Sydney\n point_a:\n lat: -34.570\n long: 152.397\n point_b:\n lat: -32.667\n long: 149.469\n http_timeout: 5s\n
If this was JSON or TOML I would have to repeat that same block of config five times, and if I ever changed the area I was scanning, I would have to remember to update each place with the same values.
\n\nHowever, YAML is a very powerful config language; you can capture references to parts of the config and then re-use them in other parts of the file:
\n\nflags: &default-flags\n timezone: Australia/Sydney\n point_a:\n lat: -34.570\n long: 152.397\n point_b:\n lat: -32.667\n long: 149.469\n http_timeout: 5s\n\ncontainers:\n my-container:\n name: test-container\n flags:\n <<: *default-flags\n my-other-container:\n name: second-test-container\n flags:\n <<: *default-flags\n
This is quite powerful and very useful, but there are still plenty of things that you can’t express: mathematical operations, string concatenation, and other data transformations. I can’t redefine how I write the configuration to be completely different to what the program that’s parsing the YAML expects. For example, none of these hypothetical snippets are possible in plain YAML:
\n\n# Reference a field, and transform it\nfield: new-$another_field\n# Grab an environment variable\nfield: $USER\n# Do some arithmetic using a field\nfield: 2 * $other_field\n# A simple conditional\nfield: $PRODUCTION ? enabled : disabled\n
That being said, YAML is far from simple:
\n\n\n\n\nThe YAML spec is 23,449 words; for comparison, TOML is 3,339 words, JSON is 1,969 words, and XML is 20,603 words.\nWho among us have read all that? Who among us have read and understood all of that? Who among us have read, understood, and remembered all of that?\nFor example did you know there are nine ways to write a multi-line string in YAML with subtly different behaviour?
\n\n \n
YAML is full of surprising traps, like the fact that the presence or absence of quotes around a value changes how it is parsed and so the country code for Norway gets parsed as the boolean value false
.
Even if you decide that the power of YAML is worth these costs, you’re still going to run into a wall eventually. noyaml.com
is a good entrypoint to the world of weird YAML behaviour.
As your application becomes more complex—or as the interdependence of multiple applications becomes more complex—you’ll probably want to split the config into multiple files1.
\n\nA classic example would be doing something like putting all the common flags that are shared between environments in one file, and then the development, staging, and production configurations each in their own file that reference the common one. YAML has no way of supporting this, and so you’ll end up writing a program that either:
\n\nimplements some kind of #include
system, substituting the contents of one config file into another, or parses each file separately and merges the resulting structures together.
\n\nAnd of course whichever option you choose will be difficult to understand, error-prone, hard to debug, and almost impossible to change once all its idiosyncrasies are being relied upon to generate production configuration.
\n\nThe sensible thing to do—of course—is to use an existing configuration language that is designed from the ground up for managing complex configuration, like HCL
. HCL is a language that has features that look like a declarative config (“inspired by libucl, nginx configuration, and others”) but is basically a programming language. It has function calls, conditionals, and loops so you can write an arbitrary program that translates one config data structure into another before it gets passed to an application.
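\n\nTo give a flavour, here is a sketch in Terraform’s dialect of HCL (which is where the locals block and the merge function come from; the container block type is invented for illustration):
locals {\n default_flags = {\n timezone = \"Australia/Sydney\"\n http_timeout = \"5s\"\n }\n}\n\ncontainer \"my-container\" {\n # merge() overlays per-container values on the shared defaults\n flags = merge(local.default_flags, { name = \"test-container\" })\n}\n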
This is all very good, but now you’ve got another problem: you need to learn and use another programming language. At some point you’re going to say “why doesn’t this value get passed through correctly?” and the solution will be to debug your configuration language. That could involve using an actual debugger, or working out how to printf
in your config language.
Chances are pretty high that you’re not very good at debugging this config language that you don’t pay much attention to, and the tooling for debugging it is probably not as good as a “real” programming language that’s been around for 29 years.
\n\nIf you’ve done any Rails development, then you’ve come across Ruby-as-config before. Ruby has powerful metaprogramming features that make writing custom DSLs fairly simple, and the Ruby syntax is fairly amenable to being written like a config language. If there is a problem with the config then you can use familiar Ruby debugging tools and techniques (assuming you have some of those), but the flip side is that the level of weird metaprogramming hacks required to make a configuration “readable”—or just look slick—are likely outside of the understanding of anyone not deeply entrenched in weird language hacks.
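\n\nThe core trick is usually instance_eval, which runs a block in the context of an object so that bare method calls become config entries. A minimal sketch (all the names here are hypothetical):
class HostConfig\n attr_reader :values\n\n def initialize(&block)\n @values = {}\n # Evaluate the block as if its body were written inside this object,\n # so `hostname \"steve\"` calls the method below.\n instance_eval(&block)\n end\n\n def hostname(value)\n @values[:hostname] = value\n end\n\n def port(value)\n @values[:port] = value\n end\nend\n\ndef host(&block)\n HostConfig.new(&block).values\nend\n\np host { hostname \"steve\"; port 4132 }\n# => {:hostname=>\"steve\", :port=>4132}\n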
\n\nOf course you’re free to choose whichever language you like; they’re all fairly capable of taking some values and translating them into a data structure that the end application can ingest. You could even write your config in Java.
\n\nThere are a lot of additional benefits to using a real programming language to write your configuration. As well as abstracting away configuration details, you can add domain-specific validation that doesn’t need to exist in the application (perhaps enforcing naming conventions just for your project), or dynamically load config values from another source—perhaps even another config file—before they are passed into the application.
\n\nThe next iteration is when the config continues to increase in complexity2, and so you decide to make some kind of tool that helps developers make common changes. Adding and removing sections is the obvious use-case. Strictly speaking it doesn’t have to be due to the config being complex, it could just be that you want some automated system to be able to edit the files.
\n\nYour problem is that you have no guarantees about the structure of the config. Since it’s a general-purpose programming language, details could be scattered anywhere throughout the program. With JSON, it’s super easy to parse the file, edit the data, and write a well-formatted config back out—you just have to match the amount of indentation and ideally the order of keys too. Doing this for most programming languages is much more difficult (just look at the work that has gone into making rubyfmt
).
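\n\nFor contrast, a computer editing a JSON config is only a few lines (a sketch; hosts.json is a made-up file matching the example below):
require \"json\"\n\nhosts = JSON.parse(File.read(\"hosts.json\"))\nhosts << { \"hostname\" => \"new-host\", \"port\" => 8080 }\n# Write the whole list back out, consistently formatted.\nFile.write(\"hosts.json\", JSON.pretty_generate(hosts))\n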
Even if you can parse and re-output the config program, the whole point of using a general-purpose language was to allow people to structure their configs in different ways, so to make a tool that is able to edit their configs, you’re going to have to enforce a restricted format that is easier for a computer to understand and edit.
\n\nSo if you’ve got an application that expects a config file with hostnames and ports in a list, something like this:
\n\n[\n {\n \"hostname\": \"steve\",\n \"port\": 4132\n },\n {\n \"hostname\": \"brett\",\n \"port\": 5314\n },\n {\n \"hostname\": \"gavin\",\n \"port\": 9476\n }\n]\n
The simplest translation to a Ruby DSL could look like:
\n\n[\n host {\n hostname \"steve\"\n port 4132\n },\n host {\n hostname \"brett\"\n port 5314\n },\n host {\n hostname \"gavin\"\n port 9476\n }\n]\n
If someone was deploying this to a cloud service, they might not want to write all that out, so their config might look like:
\n\nzones = [\"us-east-1\", \"us-west-2\", \"au-east-1\", ...]\nSTANDARD_PORT = 4123\n\nzones.map do |zone|\n host {\n hostname \"host-#{zone}\"\n port STANDARD_PORT\n }\nend\n
A program that has to edit these files to “add a new host” basically has to understand the intent behind the whole file3. This is an exceptionally difficult job. I read a book about robots as a child that likened computer speech to squeezing toothpaste out of a tube, and speech recognition to pushing the toothpaste back into the tube. Creating the config is like squeezing the toothpaste, having a computer edit the config is like putting the toothpaste back.
\n\nThere are two paths you can take from here: double down on the programming language and build higher-level abstractions over the existing config to remove the need for the computer to edit the files, or move towards stricter formats for config files to allow computers to edit them.
\n\nYou’re being forced to pick a position on the code-config continuum, between something that’s bad for people but good for computers, and something that’s better for people and bad for computers. There’s no right answer, and every option trades off between the two ends of the spectrum.
\nI find Discord baffling. Not its popularity for group messaging within a class, team, or friend group—it seems fine at that—but the other, larger use-cases.
\n\nIn 2020 and 2021 I learnt how to create digital art in Blender, the 3D modelling software. I watched both Clinton Jones’s videos (who I had been following from his time at RocketJump and Corridor Digital) and Blender Bob. It was Clinton’s work and the videos showing his process where I learnt that you could use computer graphics without ever thinking about video or “VFX”—that’s just where I was exposed to these ideas initially. His Instagram has a mix of both film photography and rendered computer graphics, but since he targets the same aesthetic in both, it’s often hard to tell at a glance which is which.
\n\nAnyway. Both of these creators have Discord servers where subscribers could chat, share their work, and potentially get some guidance from people in the community or the creator themselves. When I joined, both were open for anyone to join, but I think that now Clinton’s Discord is for Patreon supporters only.
\n\nThis is where the bafflement comes in. Discord is designed as a synchronous messaging system. You can obviously view or reply to messages at any time, but the interface expects you to read messages almost as soon as they are received, and reply immediately or never.
\n\nFor a team or group of friends this makes sense, you’re probably all in the same timezone and share a similar schedule. If you’re not, then at least the group is probably small enough that it’s easy to catch up on anything that you missed. Discords for “fan communities” are basically the exact opposite—they’re large and highly trafficked. The time difference is exacerbated by me being in a significantly different timezone than the typical North American audience.
\n\nMy experience was that every time I checked the servers, there would be at least tens—if not hundreds—of new messages in every channel, with the topic of conversation shifting multiple times. Any attempt to ask a question or have a conversation was drowned out in the noise of additional messages and threads.
\n\nThe Discord app just isn’t designed for reading all the messages. Even if I treated the server as a read-only experience (much like I do with Mastodon1), it’s difficult to go through and look at the history of a channel. If you do, you’re going to be reading it backwards as the app probably isn’t going to perfectly preserve your scroll position (something that I’m especially keen on).
\n\nIt seems to me that these Discord servers have a few roles: a support forum, a showcase of work, and a space for informal discussion.
\n\nYou know what works really well as a support forum? An actual forum with first-class support for topics, threads, and detailed discussion that can happen asynchronously as the question-asker works through their problem. As someone that remembers a time before Stack Overflow, it seems like people have collectively forgotten the experience of describing your problem on a forum, and then a day later having a kind and knowledgeable person ask you to give them some more information so they can pin down the solution.
\n\nI’ve seen it mentioned on Mastodon that some software projects use Discord in lieu of a support forum or documentation, which I find absolutely baffling as trying to find something that someone mentioned within a chat conversation—and understanding all the surrounding context, while filtering out any unrelated noise in the channel that was happening alongside it—seems completely impossible. Those conversations are also not going to be indexed by a search engine, so people that aren’t aware of the Discord are almost certainly not going to stumble across it while searching for information about a problem they’re having.
\n\n\n\n\nIf the infamous discussion about whether there are 7 or 8 days in a week had happened on Discord, I wouldn’t be able to effortlessly find it 16 years later with a single search.
\n
The other two use-cases—showcasing work and having informal discussions—are less well suited to forums, but I think they’d still be passable if implemented that way. However, the actual point of this whole post was to propose an alternative for this kind of fan community: a private Mastodon server.
\n\nAs web creators move towards sharing their work on their own terms, rather than via an existing platform (an example), a suitably tech-focussed2 creator could offer membership on a private Mastodon server as a perk of being a supporter.
\n\nMastodon’s soft-realtime and Twitter-like flat-threaded structure give it a nice balance of working reasonably well for quick conversations as well as time-delayed asynchronous communication. Since the instance would be private, the “local” timeline would just contain posts made by the community, allowing members to see everything, or create their own timeline by following specific people or topics.
\n\nIdeally, Mastodon clients would allow mixing and merging accounts into a single timeline—so I could have the accounts I follow from my main account and accounts on this private instance show up in the same timeline, so I don’t have to scroll through two separate timelines.
\n\nThe biggest challenge would obviously be explaining that you’re signing up to an instance of a federated social media platform that has disconnected itself from the federated world in order to provide an “exclusive” experience only for supporters of the creator.
\n\nI don’t think that Mastodon will reach a level of mainstream success that such a niche use of it could be anything but a support headache, but it’s interesting to think how open platforms could be re-used in interesting ways.
\nAnother challenge in my quest to not have any programming languages installed directly on my computer is installing programs that need to be built from source. I’ve been using jj
in place of Git for the last few months1. To install it you can either download the pre-built binaries, or build from source using cargo
. When I first started using it there was a minor bug that was fixed on main but not the latest release, so I needed to build and install it myself instead of just downloading the binary.
Naturally the solution is to hack around it with containers. The basic idea is to use an base image that matches the host OS (Ubuntu images for most languages are not hard to come by) and build in that, and only copy the executable out into the host system.
\n\nTo install jj
and scm-diff-editor
I make a Containerfile
like this:
FROM docker.io/library/rust:latest\nWORKDIR /src\n# apt needs an updated package list and -y to run non-interactively during a build\nRUN apt-get update && apt-get install -y libssl-dev openssl pkg-config\nRUN cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli\nRUN cargo install --git https://github.com/arxanas/git-branchless scm-record --features scm-diff-editor\nCOPY install.sh .\nENTRYPOINT /src/install.sh\n
This just runs the necessary cargo
commands to install the two executables in the image. The install.sh
script is super simple, it just copies the executables from the image into a bind-mounted folder:
#!/bin/bash\nfor bin in jj scm-diff-editor; do\n cp \"$(which \"$bin\")\" \"/output/$bin\"\ndone\n
So the last part is just putting it all together with a pod
config file:
images:\n jj-install:\n tag: jj-install:latest\n from: Containerfile\n build_flags:\n cache-ttl: 24h\n\ncontainers:\n install:\n name: jj-install\n image: jj-install:latest\n interactive: true\n autoremove: true\n bind_mounts:\n ~/.local/bin: /output\n
I can then run pod build
to create a new image and build new executables with cargo
. Then pod run
the container to copy them out of the image and into the $PATH
on my host system.
This is the same approach I used for the automatic install script for pod
itself—except using podman
commands directly rather than a pod
config. I’ve done the same thing to install rubyfmt
since that is only packaged with Brew, or requires Cargo to build from source.
I’m sure at some point an incompatibility between libraries inside and outside of the container will create a whole host of bizarre issues, but until then I will continue using this approach to install things.
\nShort review, it’s good but has a long way to go. Global undo is excellent, and I like the “only edit commits that aren’t in main yet” workflow. ↩
\nThis site that you’re reading now is generated by Jekyll and hosted on GitHub Pages. Originally when I set this site up, GitHub Pages only supported their own limited set of plugins, and if you wanted to do anything extra you had to generate the HTML content yourself. Since then they’ve added the ability to build the site with a custom GitHub Action, allowing you to run arbitrary code during generation.
\n\nIn an effort to keep things simple and avoid the temptation to write my own site generator, I have stuck with the basic deploy-on-push system with the standard set of plugins. This has worked fairly well, and the downsides are fairly minor—for example the version of the Rouge syntax highlighter that is used is a few years old and doesn’t know about Swift’s async
or await
keywords. This is not an issue unless you write a long blog post about concurrency.
I have of course worked out a variety of ways to maximise my use of this constrained environment.
\n\nJekyll allows you to specify the default front matter attributes in the config file. Previously whenever I would read these attributes in a template I would check if they were empty, and put the default right there in the template. Being able to configure a default makes this much easier. The defaults set the layout and OpenGraph metadata.
\n\nOriginally GH Pages didn’t support a custom 404 page (instead just delivering a generic one common to all sites) but you can now create a 404.md
file and tell people they’re looking for something that doesn’t exist. This is what mine looks like.
There are four places that posts can appear on the website: the actual post page, the index page, and the two feeds (RSS and JSON). I’m sad to say that despite Liquid supporting re-using files, I just copy-pasted the content of the post header between the index page and the post layout. There were definitely a few times where I was making edits to one and getting confused about why I didn’t see any changes on the site.
\n\nWhat I do now is much better, I have a template in _includes
for the HTML version of the post that has the styled title and post metadata. This is used on the homepage and individual post pages. The post page is a custom layout that adds a footer that I only want when viewing a single post. The two feeds use a separate template that omits the header (since RSS readers will make a header themselves) but adds a small footer that isn’t present on the HTML version.
The trick to getting this to work is that Jekyll stores the post information in different variables depending on whether you’re rendering a page or a single post. A layout uses {{ content }}
to inject the content of the page, but in the index page you’ve got multiple posts, each with their own content that’s accessed with {{ post.content }}
. I don’t think you can pass variables to templates, but variables in Liquid templates are seemingly all global anyway, so you can just assign to post
and use that in the layout. Now anywhere I need to include a post is just:
# index.html\n{% for post in paginator.posts %}\n {% include post.html %}\n{% endfor %}\n\n# _layouts/post.html\n{% assign post=page %}\n{% include post.html %}\n
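\n\nThe include itself is then just ordinary template code that reads from the global post variable. Something along these lines (a simplified sketch, not my actual template):
<!-- _includes/post.html -->\n<article>\n <h1><a href=\"{{ post.url }}\">{{ post.title }}</a></h1>\n <time>{{ post.date | date: \"%B %-d, %Y\" }}</time>\n {{ post.content }}\n</article>\n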
_config.yml
The content of _config.yml
is basically mapped directly to the site
object, so you can define additional configuration knobs instead of setting them multiple times across the site. I use this to define a single date format that is used wherever a human-readable date is shown. I set date_format: \"%B %-d, %Y\"
in the config and whenever I show a date I can access that format: {{ post.date | date: site.date_format }}
.
I also use this for some common URLs—not because I’m likely to change them, but to avoid me mistyping them. Or you can dump data directly from the config file into a page, as I did with the webfinger Mastodon trick earlier this year.
\n\nJekyll Admin is a web UI that allows you to edit posts and pages, as well as upload files. Since I write on my laptop but run the Jekyll dev server on my home server, this avoids some awkward scp
-ing by allowing me to just paste my posts into a webform.
The killer feature would be for it to have a basic Git integration, so you could commit changes and push them to a remote right from the admin interface. Alas the project isn’t there yet.
\n\nI don’t know if it’s a problem with the version of Jekyll that I run (I use locally whatever version GH Pages is using for consistency) but the admin interface shows constant errors when you save a post—despite it never actually failing to do anything. It’s still more convenient than scp
, but definitely doesn’t inspire confidence.
\n\n\nAdding this very post to Jekyll Admin showed an error banner that said “Error: Could not update the doc”. The doc had updated without any problems.
\n
Liquid has conditional expressions, and Jekyll has jekyll.environment
. Smash these two together and you can add extra information that you only want visible when you run the website locally. For example I have a link to Jekyll Admin shown as an additional link in the status bar, and every post has an “Edit” link that takes me directly to the Jekyll Admin edit interface for that post. Since the site is statically generated, that information obviously isn’t just hidden on the real site—it’s completely gone.
\n\n\nA Jekyll issue that’s made worse with Jekyll Admin is the handling of the site URL. If you want to listen on all interfaces—because you’re developing in a container or running the Jekyll dev server on a different machine than the one you want to view the website on—then you set
\nhost: 0
either in_config.yml
or via command-line arguments. The problem is that this overridessite.url
, so any absolute URL will behttp://0:80/my_url
which is meaningless. Jekyll doesn’t allow you to set the host without overriding the site URL, and Jekyll admin generates a bunch of these URLs that don’t work properly.
My website was actually one of the first things I containerised where I saw a real benefit. Even though Ruby environment management is a pretty well-trodden area, I would still run into dependency issues from time to time. Now I can just pod run
and have the server running with basically no effort. Ideally I would use the exact same image GH Pages uses to build the site, but I haven’t set that up yet and to be honest the benefits are probably fairly academic.
Jekyll supports passing a second config file that is merged with the first, which I use to only load the jekyll-admin
plugin in development—and avoid any warnings from GH Pages that it isn’t supported.
On the topic of “thinking too much about things that you didn’t really want to think about”, have you considered just how hard it is to parse command-line arguments? Most tools—especially the battle-tested standard POSIX command-line tools—have this worked out pretty well, and work in a fairly predictable way. Until you start trying to implement them yourself, you might not notice just how much of a messy job it is.
\n\nFirst off, the abstract problem that flag-parsing has to solve is taking an array of strings and mapping them unambiguously to a set of configuration options. Of course you could make this incredibly easy, just give every option a unique name, and pass every option as --${name}=${value}
. Except we add an obnoxious requirement that the input array of strings should be easily human-writable (and readable), so any ultra-verbose and easy-to-implement solution is immediately unsuitable.
The convention for POSIX programs is something like:
\n\nBoolean options can be passed like -v
to turn them on. They can also be passed like --verbose
, -verbose
, or --verbose=true
. You might even support -V
to turn the option off. A single flag could be split into two arguments, like --verbose true
(the space means it’s two arguments!) but since shells are unpredictable, you should also support a single argument with a space, in case it was quoted: \"--verbose true\"
.
Flags might take arguments, which are often file paths. Like boolean options you could pass --path=/dev/null
or -path /dev/null
. If it’s a common option then maybe you let users just write -p /dev/null
—if you do that you should probably also support -p=/dev/null
.
Some flags can accept multiple values, so maybe you should support --search path/one second/path
as well as --search=path/one --search=second/path
. Of course you should support -s
and -search
and maybe even mixing and matching all of these.
To reduce the amount of typing users have to do, often the short forms of flags can be shoved together into one flag, so instead of typing -a -b -c
you can just do -abc
. Hopefully there aren’t so many short options that they could spell out the long form of other flags. Some programs allow using this short form and passing a value for the last flag. So if you had a program that has a boolean flag -b
and a string flag -s
, you could do -bs value
instead of -b -s value
.1
If your program is doing a lot of different things, it probably makes sense to group functionality into subcommands, like git clone
or tmux attach
. You should then support short subcommand names like tmux a
, but you’ve also got to match flags to a certain subcommand.
Some flags are going to apply in all cases—things like the log level config file location—but others will only apply to a specific subcommand. Do you require these flags to be in a certain order, or do you allow them to be mixed? If you allow them to be mixed then you’ll have to defer processing any flags until you know the subcommand is—since they could behave differently depending on the subcommand.
\n\nLet’s consider a program:
\n\n$ program --flag \"a value\" subcommand-one\n$ program --flag subcommand-two\n
If --flag
is defined as taking a string for subcommand-one
, and being a boolean for subcommand-two
, then you can’t decide whether subcommand-two
should be a separate argument itself, or a value for --flag
. This leads to programs (like podman
) having a fairly strict argument order for their CLI. Any global flags come directly after the command, then there’s the subcommand, then any flags for the subcommand, then the image name, and finally any arguments after the image name are passed into the container.
This can be annoying as you have to remember which flags go where, and specifically with podman you can easily end up doing something like:
\n\n$ podman run alpine:latest --interactive\n
And wonder why you don’t get a shell. The answer is that --interactive
is passed into the container since it’s after the image name, and not used to configure your container. echo
has almost the inverse problem: it is used to print things, but what if you want to print something that is interpreted as a flag for echo
?
# This works just fine, since -t isn't a flag that echo uses\n$ echo -t\n-t\n# but this will interpret it as a flag\n$ echo -e\n\n# quoting doesn't do anything\n$ echo '-e'\n\n# you need to know that '-' is special\n$ echo - -e\n-e\n
The additional catch is that shells don’t have datatypes; everything passed to a program is a string. So there’s no difference between -e
and '-e'
, the program will always receive the string \"-e\"
. Many people get caught up on this: if you’re used to a “normal” programming language, the dash seems special, and wrapping it in quotes feels like it should force it to be treated as a string.
Speaking of the dashes, they’re purely a convention. There’s no reason that you can’t structure your flags and arguments in a completely different way—it would just be confusing. I’ve seen tools that use a trailing colon to write flags instead of leading dashes:
\n\n# so this:\n$ program --flag value\n# would be\n$ program flag: value\n
It’s somewhat neat—maybe easier to type—but will be unfamiliar to most people that are going to use it. It also doesn’t really allow you to have boolean flags without an explicit value.
\n\nSomething else to consider is that modern shells will provide some level of auto-completion by default, usually just for file paths. If you write flags as a single argument, using =
to separate key from value, the shell won’t as easily be able to provide autocompletion, since it will use spaces to separate units to autocomplete, and without spaces it won’t know when to start:
$ program --path=|\n$ program --path |\n
On the first line, the shell has to know to strip away --path=
and autocomplete from there (a naive implementation would just look for files starting with --path=
). On the second line, the space means --path
and the following word are treated as separate units, and so the shell can more easily autocomplete without doing any special handling.
All of this complexity is why I pretty much always outsource this to a library. I usually use clim for my projects; it’s pretty easy to use and offers more out of the box than the built-in Crystal OptionParser
. As soon as you try and make a general solution, you end up having to make some significant assumptions about what the format of the commands will be.
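\n\nFor a sense of what the built-in parser gives you, a minimal sketch in Crystal (the flags here are invented for illustration):
require \"option_parser\"\n\nverbose = false\npath = \".\"\n\nOptionParser.parse do |parser|\n parser.banner = \"Usage: program [options]\"\n parser.on(\"-v\", \"--verbose\", \"Enable verbose output\") { verbose = true }\n parser.on(\"-p PATH\", \"--path=PATH\", \"Path to search\") { |value| path = value }\n # Unknown flags raise by default; print usage instead.\n parser.invalid_option do |flag|\n STDERR.puts \"Unknown option: #{flag}\"\n STDERR.puts parser\n exit 1\n end\nend\n\nputs \"verbose=#{verbose} path=#{path}\"\n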
Thanks to @postmodern on Mastodon for pointing out this omission! ↩
\nDoing more than one thing at a time is still a somewhat unsolved problem in programming languages. We’ve largely settled on how variables, types, exceptions, functions, and suchlike usually work, but when it comes to concurrency the options vary between “just use threads” and some version of “green threads” that just allows for something that looks like a thread but takes fewer resources. We’ve also mostly been stuck on whether to actually do more than one thing at a time1, rather than how best to do it.
\n\n\n\n\nIn this post I’m going to be talking about concurrency—the ability for a program to work through multiple queues of work, switching between them where necessary. This is distinct from parallelism in that no two pieces of work will be happening at the same time. Of course parallelism has its place, but I’m interested in how concurrent programming can be made easier for most programs.
\n
Many applications (I would argue most applications) benefit hugely from concurrency, and less from parallelism since IO is such a large part of many applications. Being able to send multiple network requests or read multiple files “at once” is useful for more applications than having multiple streams of CPU-intense work happening at once.
\n\nBefore we talk about concurrency, I want to introduce you to my newly-invented programming language. It works just like every other language, except the return
keyword is replaced by two new keywords: yeet
and hoik
. To accompany these two new keywords there will be two assignment operators, y=
and h=
(pronounced “ye” and “he”). y=
will be used to receive a yeeted value, h=
to receive a hoiked value. If you want to receive both, you can use both in the same expression. So for example:
def get_value(a, b):\n if a == b:\n hoik a\n elif a < b:\n yeet b\n else:\n yeet a\n\nx y= get_value(10, 5)\nprint(x) # => 10\nx h= get_value(5, 5)\nprint(x) # => 5\np h= l y= get_value(1, 2)\nprint(p, l) # => None, 2\n
If a value is hoiked or yeeted but not received by the caller with h=
or y=
, the hoiking or yeeting will propagate up to the next function.
“Wow Will, that’s so original. That’s just exceptions.” Yes, I know. I’m very clever.
\n\nThe idea of having two different ways of returning from a function seems bizarre, until you take a step back and realise that most programming languages have two routes out of a function, you just don’t really consider the second one. For example, what does this do:
\n\ndef parse_file(path):\n contents = read_file(path)\n data = parse_data(contents)\n return data\n\nparse_file(\"~/config.yaml\")\n
Does parse_data()
get called? Well of course not, config.yaml
doesn’t exist, and so read_file
raises an exception and parse_file
re-raises the exception, exiting early. The alternate path(s) through the function are basically invisible and often not given much thought.
Like it or not, humans have a serious thing with the number two. Having two ways of propagating data from a function is no exception (pun absolutely intended), and the ability for most code to ignore the exceptional case is usually convenient. There are obviously some fairly severe downsides—resource usage should be wrapped with a finally
(or similar) block to ensure cleanup happens, creating an exception with a trace is not free, and there are plenty of cases where something could be considered a valid return or an exception (like an HTTP response with a 300
-block status code). It’s up to the API designer to work out what should be communicated via a return value, and what should be communicated via an exception.
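\n\nThat invisible second path out of the function is exactly why cleanup needs a finally. In Python-style pseudocode (process is a stand-in for whatever work you’re doing):
f = open(\"data.txt\")\ntry:\n process(f)\nfinally:\n # Runs whether process() returns normally or raises.\n f.close()\n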
Swift has an interesting approach to exceptions; any call site that can raise an exception must be marked with try
or its friends:
try
will re-raise the exception, forcing the function to be marked with throws
and the caller one level up must handle the exception instead.try?
will turn any exceptions into an optional, so if an exception is raised you just receive nil
.try!
converts the exception into a fatal error, stopping the program.I like having an explicit marker of which calls could cause an exception and alter the flow of the program. It means that the typically-invisible alternate path through the program is clearer, and I know whenever I see try
, control flow could be jumping or returning to a different point in the program.
This does have its downsides however; there is an implicit syntactic cost to marking a function as throws
. Every caller then must choose to propagate or handle the exception somehow. In many cases this makes a lot of sense—if the call can fail, mark it as throws
and add try
. But what about calls that should never fail, but can under some circumstances? Let’s consider this fairly innocuous program:2
let text = \"oh no\"\nlet index = str.index(\n text.startIndex, offsetBy: 7)\nprint(text[index])\n
I’ve managed to create an index on the string that is outside its bounds. The subscript operator on a string isn’t marked with throws
, so its only options to communicate this failure are:
\n\nsilently return some arbitrary, incorrect value and carry on, or
\n\nhalt the entire program with a fatal error.
Swift chooses the second option:
\n\nSwift/StringCharacterView.swift:158: Fatal error: String index is out of bounds\nCurrent stack trace:\n0 libswiftCore.so 0x00007fe01d488740 _swift_stdlib_reportFatalErrorInFile + 113\n1 libswiftCore.so 0x00007fe01d163fe4 <unavailable> + 1458148\n2 libswiftCore.so 0x00007fe01d163e64 <unavailable> + 1457764\n3 libswiftCore.so 0x00007fe01d163b9a <unavailable> + 1457050\n4 libswiftCore.so 0x00007fe01d163720 _assertionFailure(_:_:file:line:flags:) + 253\n5 libswiftCore.so 0x00007fe01d29d54c <unavailable> + 2741580\n6 swift-test 0x000055b8dbcd7e7a <unavailable> + 3706\n7 libc.so.6 0x00007fe01c029d90 <unavailable> + 171408\n8 libc.so.6 0x00007fe01c029dc0 __libc_start_main + 128\n9 swift-test 0x000055b8dbcd7b55 <unavailable> + 2901\n
Aside from not giving us a useful stack trace, there’s no way for me to recover from this failure3. If the function isn’t marked as throws
, it doesn’t have a good way to report an unexpected failure. The result is that you’re forced to ensure that every value passed to the subscript operator is valid—just like if you were programming in C.
You could mark all methods like this with throws
, but that adds a lot of syntactic noise for something that should never happen. I’m sure that the end result would be most people using try!
with a justification of “I know the index is within the bounds”.
Java worked around this by having two types of exceptions, checked and unchecked. It’s up to the developer to decide which is appropriate. You can make an API clearer either by including exceptions in the type system—forcing them to be handled in a similar (if more verbose) way to Swift—or omit them from the type system, having them crash the program if unhandled, but still able to be handled in the same way as checked exceptions.
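\n\nRoughly, the two flavours look like this (a sketch; readConfig and parsePort are made-up examples):
import java.io.IOException;\nimport java.nio.file.Files;\nimport java.nio.file.Path;\n\nclass ConfigReader {\n // Checked: part of the signature, so callers must catch it or re-declare it.\n static String readConfig(Path path) throws IOException {\n return Files.readString(path);\n }\n\n // Unchecked: crashes the program if unhandled, but can still be caught.\n static int parsePort(String value) {\n return Integer.parseInt(value); // may throw NumberFormatException\n }\n}\n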
\n\n\n\n\nI presume the design of Swift’s exceptions was driven by a desire to avoid checking for failure on every single function call. I’m more interested in syntax here, understanding the performance trade-offs is another topic entirely.
\n
Swift is mostly the outlier here in terms of the status-quo of mainstream languages. The default exception-handling approach is that any function can throw an exception, and that exception will propagate up the stack until a caller catches appropriately. Designers of general-purpose application programming languages have generally decided that automatic error propagation and implicit error checking after each call is worth the performance trade-off. A language doing something different, for example requiring manual error handling, is somewhat noteworthy.
\n\nasync
/ await
& ConcurrencyThe most popular4 implementation of concurrency into language is using two keywords—async
and await
—to annotate points in the program where it can stop and do something else while something happens in the background. Usually this bridges to a historical API that uses something called a “future” or a “promise”.
The basic idea behind a “future” or “promise” API (I’m just going to call them futures from now on) is that you want to save some code for running later, and often a little bit more code for after that.
\n\nThe reason this works so well is that most languages don’t have support for pausing execution of a running function and coming back to it later, but they do have support for code-as-data-ish in the form of objects with associated methods, and often those objects can be anonymous5. So in Java land we could always do something like this:
\n\nHTTPTool.sendGetRequest(\n \"https://example.com\",\n new HTTPResponseHandler() {\n @Override\n public void handle(HTTPResponse response) {\n System.out.println(response.getBody());\n }\n });\n
The code in handle()
(and any data that it has access to) is effectively saved for later. There’s a suspension point conceptually in my code, but the actual language doesn’t really know that. It just knows about an HTTPResponseHandler
object that it needs to hold a reference to so that sendGetRequest
can call the .handle()
method.
Where this gets super messy is when you want to do one asynchronous thing after another. Say you want to make a second HTTP request with the result of the first, you’d have to do something like:
\n\nHTTPTool.sendGetRequest(\n \"https://example.com\",\n new HTTPResponseHandler() {\n @Override\n public void handle(HTTPResponse response) {\n HTTPTool.sendGetRequest(\n response.getHeader(\"Location\"),\n new HTTPResponseHandler() {\n @Override\n public void handle(HTTPResponse response) {\n System.out.println(response.getBody());\n }\n });\n }\n });\n
This results in a Pyramid of Doom where each level of async-ness is another level of indentation. Futures work around this problem by allowing “chaining”, inverting how the callbacks are built and avoiding nested indentations:
\n\nHTTPTool.sendGetRequest(\"https://example.com\")\n .then(response ->\n HTTPTool.sendGetRequest(\n response.getHeader(\"Location\")))\n .then(response -> {\n System.out.println(response.getBody());\n });\n
This is obviously much better with Java lambdas, which are less verbose than writing out a full anonymous class implementation, but are conceptually the same thing. However we’re still using closures to hack around the fact that we can’t pause a function.
\n\nMost futures APIs are pretty good at chaining a bunch of requests together, but when you get to anything more complicated, you end up having to use a sub-language that operates on futures: continue when all of these finish, when one of them finishes, do this if one fails, etc. It’s fairly easy to lose track of all your futures and leave one doing work to produce a result that is never used.
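In CompletableFuture terms that sub-language looks something like this—again a Kotlin-syntax sketch, where a, b, and c are futures we already hold:
\n\nimport java.util.concurrent.CompletableFuture\n\nfun subLanguage(\n    a: CompletableFuture<String>,\n    b: CompletableFuture<String>,\n    c: CompletableFuture<String>\n) {\n    CompletableFuture.allOf(a, b, c)      // continue when all of these finish\n    CompletableFuture.anyOf(a, b, c)      // continue when one of them finishes\n    a.exceptionally { err -> \"fallback\" } // do this if one fails\n}\n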
\n\nWhat async
/await
does is allow us to write the closures inline in the body of the function, so our code would end up like this:6
let response = await HTTPTool.sendGetRequest(\"https://example.com\")\nlet url = response.headers[\"Location\"]\nlet response2 = await HTTPTool.sendGetRequest(url)\nprintln(response2.body)\n
The code reads as though the code blocks until a value is available, but what is effectively happening is that at each await
, the compiler splits the function in two, and inserts the necessary code to turn the latter half into a callback. This way you can integrate into an existing language without having to change your byte code interpreter—Kotlin does this so it can have concurrency and still interop with Java.
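As a loose sketch—this is not the code Kotlin actually generates (the real transformation builds a state machine with continuation objects), and sendGetRequest is a made-up callback-based API—the split looks something like:
\n\nclass Response(val body: String)\n\n// A stand-in for a callback-based API from the pre-coroutine world.\nfun sendGetRequest(url: String, callback: (Response) -> Unit) { /* ... */ }\n\n// What you write with await support:\n//   val response = await sendGetRequest(url)\n//   println(response.body)\n// Roughly what the compiler produces: the function is cut at the\n// suspension point, and everything after it becomes a callback.\nfun loadPage(url: String, onDone: (Response) -> Unit) {\n    sendGetRequest(url) { response -> // first half of the function ends here\n        println(response.body)        // second half, now running as a callback\n        onDone(response)\n    }\n}\n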
When you’re introducing this awesome function-splitting compiler trick, you can’t do it by default for all functions, since anything from before the trick (ie: Java code) won’t know anything about the implicit callbacks and so won’t be able to call them correctly. To solve this problem you introduce function colours—some functions are asynchronous, some functions are synchronous, and there are rules about how they interact. In general the rules look like this: a sync function can call another sync function and use its result; an async function can call either kind of function and use its result; and a sync function can only “cast” to an async function—it can start it, but can’t use its result.
\n\nI’m borrowing the term cast here from Elixir/Erlang; casting in that world is sending a message but not receiving a result. In most languages with async
/await
you can start an asynchronous function from synchronous code, but you can’t get a result from it—since you don’t know when it will finish, and your synchronous function can’t be split into a callback to run when the async call finishes.
This split system introduces a problem similar to how Swift handles exceptions—you can only do async work from an async context. If you don’t get called from an async context, you can’t do any async work and receive the result. This makes it harder to reach for async as a tool—as soon as you’ve made one major API async, all callers of it must be async, and all callers of them must be async. It will propagate through your codebase like a wildfire.
\n\nUnlike exceptions, you can’t safely handle async work in a non-async context without risking deadlocking your program. A function that doesn’t throw an exception can call a function that does throw one, it just needs to handle the failure within its body and return an appropriate result. A synchronous function can’t do this if it needs to call an async function. In some cases it may be able to block the thread while it waits for a result, but in a single-threaded context, the async function never gets an opportunity to run, and so the program deadlocks. In a multi-threaded context, some work might still be constrained to a single thread (ie: the UI thread or a background thread) and if you block on that you will deadlock.
\n\nThe worst thing is that often blocking the thread will work, but it introduces a possibility of all of your threads blocking on async work at the same time, preventing any of the async work from progressing, deadlocking your program but only sometimes.
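Here’s a sketch of that sometimes-deadlock in Kotlin coroutine terms, assuming a platform that provides a main-thread dispatcher; fetch is a stand-in for work pinned to that thread:
\n\nimport kotlinx.coroutines.*\n\n// A stand-in for async work that is pinned to the main thread.\nsuspend fun fetch(): String =\n    withContext(Dispatchers.Main) { \"result\" }\n\n// Fine from a background thread; deadlocks if called from the main\n// thread itself: runBlocking parks the caller until the coroutine\n// finishes, but the coroutine is waiting for the main thread to be free.\nfun blockingFetch(): String = runBlocking {\n    fetch()\n}\n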
\n\nSo why do we have async
and await
in the first place? As far as I can see there are two reasons, the first is that we don’t want to break compatibility with non-async code that can’t be automatically split into callbacks. The second is that we want to make it explicit that on an await
point, the program can go off and do something else—potentially for an indefinitely long amount of time. Even if you call an async function that only takes two milliseconds to finish, most implementations use co-operative multitasking and so there’s no protection against some function calculating primes in the background preventing a context switch back to your function.
\n\n\n“Co-operative” multitasking means that each function is responsible for ensuring that there are enough points that it yields control back to the scheduler to make progress on some other work. If there’s a huge CPU-intensive calculation going on that doesn’t yield, then nothing will happen concurrently until that calculation is completely finished. “Pre-emptive” multitasking will proactively stop one function if it’s running for too long and do some other queued work.
\n
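In kotlinx.coroutines terms, the co-operative contract looks something like this sketch:
\n\nimport kotlinx.coroutines.*\n\n// On a single-threaded dispatcher this loop would starve every other\n// coroutine, so it has to offer explicit suspension points.\nsuspend fun calculatePrimes() {\n    while (true) {\n        // ... do a chunk of CPU-bound work ...\n        yield() // co-operate: let any other queued coroutines run\n    }\n}\n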
If you’re making a brand-new language that isn’t saddled with backwards compatibility to an existing language or runtime, would you make this same tradeoff? The best language ever (Crystal) and notable poster-child of concurrency (Go) both omit the need for an async
keyword.
In both languages, every function is treated as async. At any point7 in a function, execution can swap to a different function and do some work there before swapping back. Much to the fear of people that like their code to be explicit, at any point in your program, an arbitrarily large gap in execution could happen.
\n\nBefore I used a language with async
/await
I had heard people talking about how amazing it was, and always got confused because I was used to writing concurrent code in Crystal (or Go before that) where this was not needed. I felt like I was missing something and that this syntax would unlock some new way of doing things, but the reality is that it’s most often just a way to bridge to an old API because of backwards-compatibility constraints in the language.
\n\n\nRust is in a particularly tricky situation with async, as their no-runtime and zero-cost abstractions goals mean they can’t just wrap the whole program in an event loop. I don’t know much about Rust—much less writing async code using it—but found these posts to be an interesting look at the history and state of async in Rust:
\n\n\n
\n- The State of Async Rust: Runtimes by Matthias Endler
\n- Why async Rust? by @srrrse
\n- Why you might actually want async in your project by John Nunley
\n
That’s less than half the battle. We can pause a function mid-execution, but we haven’t actually done two things at the same time1. The biggest benefit of non-blocking IO is that you can easily send off two slow requests (eg: over the network) and only wait for the slowest one before continuing, rather than doing them in sequence. This is another API design challenge. The simplest example looks like this:8
\n\n B\n / \\\n o - A D - o\n \\ /\n C\n
Our function starts on the left, does some processing in A
, does B
and C
at the same time, and then once both have finished does the final step D
. There are plenty of ways you could handle this, and the measure of a good API is how easy it is to do the right thing—not introducing race conditions, unexpected behaviour, memory leaks, etc.
The example I’ll use here is something you might see in the world’s most naive web browser—we’re going to load a page and try to also load the favicon for that webpage at the same time. Here’s one example in Go, a language that doesn’t have any notion of async
/await
because every function can be interrupted at any point:
func loadPage(url string) WebPage {\n pageChan := make(chan []byte)\n faviconChan := make(chan []byte)\n go sendRequest(url, pageChan)\n go sendRequest(url + \"/favicon.ico\", faviconChan)\n page := <-pageChan\n favicon := <-faviconChan\n return WebPage{page: page, favicon: favicon}\n}\n
And here’s an example of the same function in Swift, that does have async
/await
:
func loadPage(url: String) async -> WebPage {\n async let page = sendRequest(url)\n async let favicon = sendRequest(url + \"/favicon.ico\")\n return WebPage(page: await page, favicon: await favicon)\n}\n
\n\n\nOk I’m going to pause here and say that the following section is basically just my notes on Nathaniel J. Smith’s post Notes on structured concurrency, or: Go statement considered harmful. I recommend it, it’s a good read. You can come back to this later.
\n
The main difference here is that Go doesn’t have any higher-level abstractions for dealing with concurrency as values, just as goroutines using the go
keyword, and channels using the chan
keyword. We have to hand-craft any structure in our concurrency with our bare hands. Appropriately, Swift has a keyword for this. Instead of immediately await
-ing an async function, we can assign it to a variable with async let
and then await
the value later.
What happens when our code gets a little more complicated? Let’s say we’re writing a program to fetch posts from our favourite blogs. We know that some have an Atom feed, and we should prefer that if it exists, otherwise we should fall back to the RSS feed. This might look something like:
\n\nfunc getFeedsFrom(url string) []Feed {\n atomChannel := make(chan Response)\n rssChannel := make(chan Response)\n go fetchFeed(url + \"/atom.xml\", atomChannel)\n go fetchFeed(url + \"/rss.xml\", rssChannel)\n atomResponse := <-atomChannel\n if atomResponse.IsSuccess() {\n return parseItems(atomResponse)\n }\n rssResponse := <-rssChannel\n return parseItems(rssResponse)\n}\n
Seems reasonable? The problem is that go fetchFeed(url + \"/rss.xml\", rssChannel)
can outlive the lifetime of the function if we get a successful response back for the Atom feed first. My program would just have a process running in the background doing useless work that I don’t care about, and there’s nothing in the language to help me do this correctly.9 Some languages with async
/await
can have the same problem, it’s just spelled slightly differently. Depending on the implementation, if a value is not await
-ed, it will continue running in the background and any result or error will be discarded. This JavaScript version is much more succinct, but it has the same problem in that the RSS result will not get cleaned up when the function returns:
async function getFeeds(url) {\n let atom = fetchFeed(url + \"/atom.xml\")\n let rss = fetchFeed(url + \"/rss.xml\")\n\n let atomResult = await atom\n if (atomResult.success) {\n return parseItems(atomResult)\n }\n return parseItems(await rss)\n}\n
You don’t think about it as much since you don’t have the explicit go
keyword here, but you are doing the same thing. The control flow splits in two, one fetching the Atom feed and one fetching the RSS feed, and then you wait for the results.
Swift and Kotlin do this very well.10 I’m going to use Kotlin as an example here since it does things a little more explicitly. The only place you can split your function is within a CoroutineScope
. By default, the scope will only finish when every coroutine in it has finished. So the previous example would look like:11
suspend fun getFeeds(url: String): List<Feed> {\n return coroutineScope {\n val atomAsync = async {\n fetchFeed(url + \"/atom.xml\")\n }\n val rssAsync = async {\n fetchFeed(url + \"/rss.xml\")\n }\n\n val atom = atomAsync.await()\n if (atom.success) {\n return@coroutineScope parseItems(atom)\n }\n return@coroutineScope parseItems(rssAsync.await())\n }\n}\n
This will wait for rssAsync
to finish before coroutineScope
returns. Even though we’ve got an early return on a successful fetch of the Atom feed, we’ll still implicitly wait for the RSS feed. If the RSS feed takes ages to respond, our whole function will take ages. This is the price to pay for encapsulation. coroutineScope
forces our concurrent code into a diamond pattern, instead of that fork pattern:
Always this:\n B\n / \\\n o - A D - o\n \\ /\n C\n\nNever this:\n - - - - - B - - - - - - ?\n /\n o - A D - o\n \\ /\n C\n
coroutineScope
isn’t something magical, it’s just a function with a block argument12 that exposes the async
method and keeps track of anything launched using it. If I find the “wait for everything to finish, even on early return” behaviour to be limiting, I can just write another function that uses the same building blocks to give me the behaviour I want:
suspend fun <T> coroutineScopeCancelOnReturn(\n block: suspend CoroutineScope.() -> T): T {\n return coroutineScope {\n val result = block.invoke(this)\n currentCoroutineContext().cancelChildren(null)\n return@coroutineScope result\n }\n}\n
As concurrency is tied to a scope, we can use this building block to create our own scopes with different behaviours—mine makes it easier for blocks to cancel outstanding work after an early return, but you could equally easily make a scope that included a timeout, or limited the number of async calls happening at any one time. Most of the time you should only need the coroutineScope
builder function, but there’s nothing stopping you from having a global variable that’s a scope, and having things work more like Go, where any function can start work in the scope that outlives the life of the function. It’s easier to spot however, since you just need to look at the cross-references for the global scope to find who’s using it. In Go you would have to manually inspect every function and understand how they handled concurrency to be sure that nothing was leaking.
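For example, a deadline-enforcing scope is only a few lines, reusing withTimeout from kotlinx.coroutines—a sketch along the same lines as my cancel-on-return function above:
\n\nimport kotlinx.coroutines.*\n\n// Like coroutineScope, but everything launched inside the block is\n// cancelled if the whole scope runs longer than timeoutMillis.\nsuspend fun <T> coroutineScopeWithDeadline(\n    timeoutMillis: Long,\n    block: suspend CoroutineScope.() -> T\n): T = withTimeout(timeoutMillis) {\n    coroutineScope(block)\n}\n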
The usage of scopes to handle concurrency changes how APIs are written. Take a basic HTTP server in Crystal:
\n\nserver = HTTP::Server.new do |context|\n context.response.content_type = \"text/plain\"\n context.response.puts \"Hello world!\"\nend\n\nspawn do\n sleep 5.minutes\n server.close\nend\n\nserver.bind_tcp \"0\", 8080\nserver.listen\n
After five minutes, what will this do? The documentation for #close
says:
\n\n\nThis closes the server sockets and stops processing any new requests, even on connections with keep-alive enabled. Currently processing requests are not interrupted but also not waited for. In order to give them some grace period for finishing, the calling context can add a timeout like
\nsleep 10.seconds
after#listen
returns.
So the fibres spawned by the server (that run the block passed to .new
) won’t be cancelled (which makes sense since fibres in Crystal can’t be cancelled) and will be left dangling. If Crystal had scoped coroutines like Kotlin, you could more easily change and reason about the behaviour by passing in a different scope to the server to use for handling requests—currently you have no guarantee that code in the .new
block won’t run after .listen
returns, or in theory any point after that, since an HTTP connection could take a prolonged time to establish before the handler code is run.
This would support the common use-case of cancelling outstanding requests when the server shuts down, but could easily be changed to add a timeout grace period, or stop the whole server if there is an unhandled exception (instead of printing it and continuing like nothing happened).
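Sketched in Kotlin (Connection and the handler body are stand-ins), a scope-accepting server could be as simple as:
\n\nimport kotlinx.coroutines.*\n\nclass Connection // stand-in for an accepted socket\n\n// Handlers are launched into a scope the caller owns, so the caller\n// decides what happens to in-flight requests on shutdown: cancel the\n// scope immediately, or drain it with a grace period.\nclass Server(private val scope: CoroutineScope) {\n    fun onConnection(conn: Connection) {\n        scope.launch { handle(conn) }\n    }\n\n    private suspend fun handle(conn: Connection) { /* ... */ }\n}\n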
\n\nUsing scopes to control concurrency like this basically allows you to start building towards an Erlang supervisor tree.13
\n\n\n\n\nWhen I was in university I wrote a Slack bot using Elixir. It originally didn’t handle the “someone’s typing” notification from the Slack API, which caused it to crash. The (Elixir) process that ran the bot would crash, and the supervisor would replace it with another identical process. The storage was handled in a separate process, no data was lost and the bot would reconnect after a few seconds. If I had been using almost any other language, the end result probably would have been my whole program crashing, and me having to fix it immediately.
\n
Having language support for cancelling pieces of work is also useful in a lot of other contexts: POSIX processes can be interrupted with a SIGINT
which often triggers some kind of callback in the language, and the callback needs to communicate to any currently-running things that they should stop. Cancellation being a first-class citizen could allow for better default behaviour when a program is told to stop. This same concept could apply to applications in resource-constrained environments (ie: phone OSes) so that they can respond effectively to being stopped due to lack of resources.
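As a sketch of what that might look like with coroutines on the JVM—one root scope, cancelled from a shutdown hook when the process is told to stop:
\n\nimport kotlinx.coroutines.*\n\nval rootScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)\n\nfun main() {\n    // SIGINT/SIGTERM run JVM shutdown hooks; cancelling the root scope\n    // co-operatively stops everything that was launched under it.\n    Runtime.getRuntime().addShutdownHook(Thread { rootScope.cancel() })\n    // ... launch long-running work into rootScope ...\n}\n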
Once you’ve got the lifetime of your concurrency sorted, you need to work out the lifetime and access for your data. Rust does this with lifetime annotations and more static analysis than you can shake a stick at; Pony has six reference capabilities that define how a variable can be used in what context. Erlang and Elixir just have fully immutable data structures, so you can’t mutate something you shouldn’t—though you can still have “mutable” data in a stateful process and introduce a race condition via multiple processes sending messages to that stateful process.
\n\nWhen I’m writing stuff in my free time I usually have a fairly cavalier attitude to thread safety. Crystal doesn’t have many guarantees for this, and since it’s currently single-threaded, most of the time it works fine. I’ll write some dirty code that spawns a new fibre that does some work and appends the result to a list. That’s always atomic—right?
\n\nI haven’t written enough Rust to appreciate what it’s like working with the borrow checker and lifetime annotations. From what I’ve read (a recent example) the borrow checker is frustrating, to say the least.
\n\nWhat I’d like is—somehow—for concurrent data access to be verified as easily as types are checked in Crystal. I get most of the benefits of static typing and dynamic typing by using Crystal’s type inference; can the lifetimes of variables be inferred in a similar way? I think this would be a very hard problem, and probably only practical if the general population of developers was already used to adding lifetime annotations—like they are with types—so you could just require fewer of them.
\n\nFor me, the best concurrency system would be one that doesn’t require any tagging of functions, to avoid having to think about function colouring and the syntactic cost of annotating every call site, and a well-defined structured concurrency API that is used throughout the standard library and third party libraries, to give guarantees about the lifetime of concurrent work. This would need to have affordances to handle pending concurrent work as values (like Swift’s async let
or Kotlin’s Deferred<>
), and enough tools in the standard library to make it easy to handle these values. I don’t have particularly strong opinions about actors, lifetimes, or reference capabilities14 as I’ve not used them much to write any real-world programs.
If you liked this and want to read something by someone who knows what they’re talking about, I would recommend reading Notes on structured concurrency, or: Go statement considered harmful. Reading this was definitely the “ah-ha” moment where I was convinced that just tacking a spawn
function in your language wasn’t good enough.
Yeah yeah, I know it’s not actually at the same time, see my note right at the top. But you know what I mean, otherwise you wouldn’t have read the footnote. If you’re the type of person to correct a concurrency-versus-parallelism mistake, you’re also the kind of person that will read a footnote to be absolutely accurate in your correction. ↩ ↩2
\nWell maybe there is, I’m not a Swift expert. But we’re talking abstractly about syntax here, just roll with it. ↩
\nMeasured entirely on vibes. ↩
\nThis just means they don’t have a real name, and are typically defined inline where they get passed to a function. ↩
\nPart of the joy of reading my blog is getting confused as I change language in the middle of a series of examples. This next one is in Swift, since Java doesn’t have async
/await
yet, and Kotlin’s implementation is less clear about await
-ing things. ↩
As long as a function yields, see co-operative versus pre-emptive note above. ↩
\nAppreciate my effort- and bandwidth- saving ASCII diagram. ↩
\nMaybe Go has some library for keeping track of your goroutines, but my basic point is this is not the default and not what I see people doing. ↩
\nThey basically implement what the previously mentioned blog post describes. ↩
\nYes I know my Kotlin function could be more idiomatic and shorter, but then everyone would be getting confused about Kotlin’s weird syntax, instead of getting confused at concurrency. ↩
\nOk Kotlin’s blocks are kinda magic. ↩
\nIgnoring the fact you don’t have memory isolation for each process so you’ll never fully get there. ↩
\nPerhaps that’s part 2? Subscribe to the RSS feed for more! ↩
\nOne of the major usability misses with pod
was that it was tricky to set up a new project. My goal was to remove the need for language-specific development tools installed directly on my computer, but whenever I started a new project with pod
, I would need to run crystal init
to create the basic project skeleton. With the new pod init
command, this is now unnecessary.
To create a new project that wasn’t Crystal (like when I was messing around with Swift websockets) I would manually run a shell in a container using the image for the language and bind mount my working directory. I’d then use the package manager within the container to set up a project (eg: running swift package init
) and then copy-paste some containerfiles from a previous project. This is incredibly fiddly and tedious. So I added functionality to pod
that does this automatically.
Now when you run pod init
, it asks for a base image to use—I use the latest Crystal Alpine image—and runs a container using that image with the working directory already available as a bind mount. Using the shell in that container you can run whatever tools are needed to setup the files for your project (npm init
, crystal init
, cargo init
, etc). When you exit that shell, pod
will create containerfiles and a pods.yaml
file for the project, so in most cases you can just build with pod build
and then pod run
without any further changes.
Another thing that is more difficult in a container-only world is running REPLs inside the project. I don’t do this often—since the Crystal interpreter isn’t shipping in the main release yet—but I really enjoyed this way of working when I was using Elixir or Ruby more. Running an iex
shell where I could recompile and interactively test my code was probably the most pleasant development experience I’ve ever had, and I wanted to support that with pod
.
This is now possible with pod enter
. By default you can run a shell using any of the images in your pods.yaml
file, or you can configure entrypoints
and jump straight into a REPL by running a particular command. So for example this:
entrypoints:\n iex:\n image: my-elixir-project:dev-latest\n shell: iex -S mix\n
Will allow me to do this:
\n\n$ pod enter iex\nErlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]\n\nInteractive Elixir (1.15.4) - press Ctrl+C to exit (type h() ENTER for help)\niex(1)>\n
This bind-mounts the working directory in, so your code is available to any tools that run in the entrypoint. If you’ve got something more complicated that requires more customisation of the container (like exposing ports or binding additional directories) you can always make a custom run
target that spawns an interactive shell.
You can imagine that if you were working on a Ruby on Rails project, you might setup something like this:
\n\nentrypoints:\n console:\n image: my-rails-project:dev-latest\n shell: bin/rails console\n
I’ve enjoyed working in a container-first and now largely container-only way, and improving pod
is what has made this possible for me to do. You can check it out here, specifically the documentation for getting started.
Since the start of this year—for some reason, I can’t put my finger on what—I’ve been reading far more RSS feeds and articles that I’ve come across. I’ve sporadically used RSS in the past, but never really got into a groove with it. Currently I’m using NetNewsWire, which is good but doesn’t quite match the experience that I want, and so I’m writing this to manifest into existence the perfect app.
\n\nI’m an absolute fiend for a reverse-chronological list of items where my position is perfectly preserved. Tweetbot and Ivory are absolutely perfect for this; I can open the app, scroll a little bit, and then leave and come back later. It’s been part of my daily routine to scroll through the tech news and gossip each morning as I start my day.
\n\nSadly no RSS reader seems to have quite the interface I want. So I’m going to describe it in enough detail that someone can find one for me, or some enterprising developer can implement it.
\n\nThe main interface would of course be a reverse-chronological list of posts (oldest at the bottom), with the key feature that your scrolling position would remain where you left it. New posts would be loaded “above” your scroll position, so you would just continue to scroll up as you read through the feed. The feed should make use of article and feed images to present a visually engaging view, rather than a simple list of titles.
\n\nIt’s not a use case that I particularly care for, but if I were making this app it’s something I’d be sure to handle: viewing a single feed. This doesn’t really mesh too well with the main list of posts, but wouldn’t be an insurmountable UI challenge. What I would probably do is allow viewing a single feed (or group of feeds) like you’d open a user profile on a social media app. Except that you’d be put into another reverse-chronological feed in the same position as the main feed—but just for posts from that publication. You could then scroll through the single feed, and once you were at the top there would be an option to clear those posts from the main feed. That way you could swap back to the main feed and continue reading without repeating posts. This would be useful if a single feed has dumped a big collection of posts and you just want to see if there’s something interesting, otherwise get them out of the way.
\n\nThe second most important feature would be a built-in read-later service. I switched from Instapaper to GoodLinks and am very happy with it so far, but I would be a lot happier if it were built right into my feed reader. I’ll often come across an interesting post, but won’t have time or be in the mood for reading a longer or more technical post. Ideally in this case I could just mark it for reading later, without having to share the post to a different app (even if that other app is very good). This would unlock the ability to read half a post, realise you’ve run out of time, and then just close the article and have it automatically saved for later—with your position already saved.1
\n\nAutomatically saving posts would definitely be a UX challenge. You don’t want to flag every single post that gets opened as “read later”, but you also don’t want to have the interaction be unreliable. I would probably lean towards just having a very convenient “close and keep for later” button that is just as easily accessible as swiping back to exit the article.
\n\nThe next UI challenge would be presenting the main feed and the read-later feed in such a way that neither appears to be playing second fiddle to the other, while also making it easy to swap between them. Perhaps you’d automatically switch between the two depending if there were new posts? Or maybe that would just end up being annoying.
\n\nThe app would need all the features of a good read-later app; saving links from other apps, presenting web pages in a friendlier reader view, saving reading progress, and saving pages for offline reading.
\n\nA problem that I would like solved—but I’m not sure if link-sharing APIs allow for this—is knowing where the link was shared from. I find myself getting to the bottom of a post that I’ve saved from somewhere and thinking “oh whoever shared this obviously has excellent taste, I should see what else they do” but have no good way to find where I got it from. Alternatively I think of someone that I should share it with, only to find out that they were the person that sent it to me in the first place.
\n\nI have considered creating a second Mastodon account and just subscribing to the feeds of websites I follow (or using RSS-to-activitypub translators), and adding this account to Ivory. What stops me from doing this is that it would only get me half way there—no read-later integration—and Ivory would be doing double duty, meaning I’d have to switch accounts constantly.
\n\nIf you’re looking for some fresh feeds, I quite like grumpy.website (examples of frustrating UI design), and Pixel Envy (links and commentary on technology with a focus on privacy and open design, which is my jam).
\nIt’s not too uncommon for me to save stuff straight to GoodLinks if I think I might not read it in full immediately, so I don’t have to find where I got up to later. ↩
\nRecently I attempted to learn about Swift’s async support by doing my favourite thing—writing an RPC framework. In this case the “RPC framework” is just a request/response abstraction over websockets (which are message-based), which makes the actual RPC bit very simple, as all it’s really doing is wrapping some objects and matching responses to requests.
\n\nIn doing this, I think I went through all five stages of grief1, which often happens when I try and use Swift on Linux—despite my previous excitement about it.
\n\nSo first of all I found the documentation for URLSession#webSocketTask(with:)
. At first glance the API seemed pretty reasonable. I had a quick read over some blog posts and ended up with some code to test out:
let task = URLSession.shared.webSocketTask(\n with: URL(string: \"ws://brett:9080\")!)\ntry! await task.send(.string(\"test message\"))\n
This seems pretty easy, I create a websocket task and then send a message using it. The message should be received by a simple Crystal HTTP::WebSocketHandler
and logged, so I know when it’s working.
I run the program, and it just hangs. No error, no timeout (at least not one that I was patient enough to wait for). Now there isn’t anything that I can see from the documentation that I’m missing (mostly because there is no documentation for send(_:)
).
Eventually I look back over the blog posts and see that you need to call resume()
on the URLSessionWebSocketTask
for it to do anything.
This is very frustrating. If I were writing the documentation for this class, I would make sure that the requirement to call resume()
was the first thing anyone saw when looking at the docs. Currently you have to go to the URLSessionTask
superclass and find the resume()
method docs which state:
\n\n\nNewly-initialized tasks begin in a suspended state, so you need to call this method to start the task.
\n
A friendly API would raise an exception if you tried to use it before it was ready—failing fast is going to reveal your problem more readily than silently doing the wrong thing. However, I don’t know enough about the wider URLSession
API to know whether there’s a design tradeoff here that makes failing fast impractical.
Ok so I’ve wasted a bunch of time trying to work out what’s wrong all because my task was suspended. Never mind, at least I know what the problem is now. I add the resume()
call and now I get:
Fatal error: 'try!' expression unexpectedly raised an error: Error Domain=NSURLErrorDomain Code=-1002 \"(null)\"\nCurrent stack trace:\n0 libswiftCore.so 0x00007fecbfa6eb80 _swift_stdlib_reportFatalErrorInFile + 112\n1 libswiftCore.so 0x00007fecbf76043f <unavailable> + 1442879\n2 libswiftCore.so 0x00007fecbf760257 <unavailable> + 1442391\n
Hmm an NSURLErrorDomain
problem. A -1002
problem to be precise. This is my first rodeo in Swift-networking-land so I don’t know what a -1002
means off the top of my head. Eventually I find some info that points me to this list of all the error codes. Hilariously it doesn’t include the code in the list—just the name—so you have to open each case one-by-one until you find the one that matches your error code. The fourth from last one turned out to be my error: NSURLErrorUnsupportedURL
.
Immediately I start thinking of all the possible ways that you could consider a URL unsupported, maybe the ws://
scheme should be wss://
? or maybe it won’t handle hostnames and needs an IP address? Perhaps I’ve messed something up in my container2 and it’s counting a closed port as an unsupported URL? (a bizarre thing to do, but at this stage all bets were off).
So maybe URLSessionWebSocketTask
is a lost cause, but SwiftNIO is always an option. I won’t go into this too much, but basically I stumbled at the first hurdle when I followed this post to add SwiftNIO as a dependency. I don’t really understand all the moving pieces here, but:
dependencies: [\n .package(url: \"https://github.com/apple/swift-nio\", from: \"2.58.0\"),\n],\n.executableTarget(\n name: \"WebSocketRPC\",\n // bad, doesn't work\n dependencies: [\"SwiftNIO\"],\n // also no good\n dependencies: [\"NIOWebSocket\"],\n // perfect and excellent\n dependencies: [.product(name: \"NIOWebSocket\", package: \"swift-nio\")]\n)\n
Why do I need a .product
instead of just a string? No idea, and I couldn’t find this mentioned anywhere in the SPM documentation. I happened to stumble across an NIO example project and looked at the Package.swift
file to find this.3
However after learning more about the SwiftNIO websocket implementation, it seems that I would need to handle much more of the underlying protocol and HTTP-to-websocket upgrade than I had expected. The example websocket client has over 200 lines to do the same thing I was hoping to accomplish in two.
\n\nMaybe websockets aren’t that cool anyway, what if I just use plain old HTTP? Maybe this will help me understand whatever I’m doing wrong with the websocket API. While I’m at it, why don’t I translate the callback-based API into an async
one—that was the original purpose of this exercise in the first place, right?
func download(url: URL) async throws -> String {\n return try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in\n let task = URLSession.shared.dataTask(with: url) { (data, response, error) in\n if let error = error {\n continuation.resume(throwing: error)\n } else if let data = data {\n continuation.resume(returning: String(data: data, encoding: .utf8)!)\n } else {\n fatalError(\"impossible?\")\n }\n }\n task.resume()\n }\n}\n\nprint(try! await download(url: URL(string: \"https://willhbr.net\")!))\n
And that just works first time? That’s definitely weird. Was this an excuse to include some tidy callback-to-async code? Maybe.
\n\nAt this point my curiosity got the better of me—would it work on MacOS? Maybe I would get a better error and suddenly understand what was going wrong?
\n\nAfter a bit of an adventure with xcrun
(it turns out you can’t use the Swift compiler that’s installed with the Xcode Command Line Tools), I installed Xcode and ran the exact code I had been trying on Linux for hours.
And it worked first time without any issues. The most frustrating result.
\n\nEventually I found this GitHub issue linked from a project’s README:
\n\n\n\n\nfatalError when trying to send a message using URLSessionWebSocketTask
\n\n…
\n\nThat code runs perfectly fine under macOS (using Swift 5.7), but as soon as it’s run on Linux I get the error from above.
\n
A few people chime in saying they see the same issue, and then this comment points to this page of the libcurl
documentation:
\n\n\nWebSocket is an EXPERIMENTAL feature present in libcurl 7.86.0 and later. Since it is experimental, you need to explicitly enable it in the build for it to be present and available.
\n
So if your underlying library doesn’t support websockets, it makes sense that a websocket URL is unsupported.
\n\nI don’t have much of a conclusion here, apart from the fact that this was a very frustrating journey. I’m sad to see that almost eight years after being open-sourced and supporting Linux, Swift is still full of subtle traps that are hard to debug. Hopefully the Swift Server Working Group is aware of these issues and continues to make improvements—a simple @available
annotation would have saved a lot of time.
Yeah I know the titles don’t really match the content, I just did this for a funny title, alright? ↩
\nOf course I’m running this in a container. ↩
\nWhile writing this I did end up finding that towards the bottom of the readme for SwiftNIO there is a “Getting Started” section that has the correct incantations. Only after you’ve read past the conceptual overview, repository organisation, and versioning scheme, however. ↩
\nYesterday I came across Warp Terminal via their advertisement on Daring Fireball.1 Immediately I was fascinated to know what their backwards-compatibility story was, and how their features were implemented. This is in a similar vein to the difficulties of modernising shells, that I wrote about in more detail last month.
\n\n\n\n\nIf you’re not sure of the difference between a terminal and a shell, The TTY demystified is a really good read to understand the history and responsibilities of both. Basically the terminal emulator pretends to be a computer from 1978, and the shell runs inside of that.
\n
I only spent about half an hour playing around with Warp, so my impressions are not particularly well-informed; it’s still in beta, so many of these issues could be on a roadmap to fix. I didn’t look at any of the AI or collaboration features—I’m only interested in the terminal emulation and shell integration.
\n\nWhat sets Warp apart from other terminal emulators is that it hooks into the shell and provides a graphical text editor for the prompt, rather than using the TTY. For normal humans that are used to the standard OS keyboard shortcuts, and being able to select and copy text in a predictable way, this is an excellent feature. The output from each command you run lives in a block; blocks stack up and scroll off the screen. In the prompt editor, autocomplete and other suggestions are native UI, not part of the TTY. They can be clicked, support non-monospaced fonts, and include many other UI innovations from the last 40 years.
\n\nIn their blog post “How Warp Works” there is a brief explanation of how they integrate with the shell.2 Basically they use callbacks within popular shells (ZSH, Bash, and Fish) to know when the command is started. If my interpretation of this is correct, they do away with the shell prompt entirely, and instead use their non-shell editor to allow the user to write their command, then they pass the whole finished command to the shell, and use hooks in the shell to know when to cut off the output and create a new block.
\n\nWhat this means is that Warp has some significant limitations on what it can “warpify”. Only the input to the shell prompt gets the magic editor experience, if you run another interactive program (like irb
) then you’re back to inputting text like it’s the ’70s. You can tell Warp to inject some code into certain commands, but this will only work in the aforementioned shells. If the command doesn’t understand POSIX shell syntax with the functions that Warp expects, it won’t work.
So by default, if you start your login shell and then run bash
to start a sub-shell, that sub-shell will miss out on the Warp features. I’m aware that this argument is entirely a “perfect solution” fallacy but hey, someone’s got to advocate for a perfect solution.
What is nice is that if you run a command that uses the “full screen” TTY, it will just work—the block takes up the whole screen while the command is running. You can still run vim
and tmux
, so if this takes over I’ll still be able to get things done.
The prompt editor is definitely good if you’re not used to working with a traditional shell, but since I’m used to having Vim mode in ZSH, going back to a normal editor feels broken. Also since the editor is split out from the shell, autocompletions are in a separate system. I have a few custom autocompletes setup in ZSH, and not being able to access those in the editor was frustrating. I’d type gcd <TAB>
, expecting to see a list of my projects, but instead just get a list of the files in the current directory. I assume there’s some way of piping this information into Warp, but it’s a shame they don’t (yet?) have integration to pull this straight from ZSH.
The autocompletes that I did get were mostly good—files or arguments from my shell history—but I did get a few weird suggestions. I tried ssh
and was suggested a bunch of hosts with names that were some base64-encoded junk. None of these appeared in my shell history or SSH config files.
\n\n\nI said I wasn’t going to look at any of the AI features, but then I connected to my server to see how the
\ndialog
command worked. The answer was that it wasn’t installed. Warp then said “✨ Insert suggested command:dig 13:02:20
”. I don’t know how it made the leap in logic from “command not found” to “do a DNS lookup”, or why it wanted to suggest passing the current time to the DNS lookup—it was 1:02PM UTC when that suggestion popped up.
Warp is another example of how hard it is to modernise things that directly interact with the underlying OS concepts. Perhaps Warp can partner with the nushell
developers and reinvent the shell and terminal at the same time.
In the end I’m obviously not going to move away from using iTerm. Warp is solving a bunch of problems that I don’t have, and adding a whole suite of AI features that I have no interest in. If you are a fairly light terminal user, and get frustrated at editing commands in the traditional shell prompt, then maybe Warp is for you. Use my referral code so I can get a free t-shirt.
\n\n\n\nYou get like 80% of the benefit of using Warp’s fancy editor by knowing that in the MacOS terminal, option-click will move the cursor around by sending the appropriate arrow keys to the shell.
\n
The curse of knowledge is the idea that as you become more of an expert in an area, it becomes harder to explain basic concepts in that area, because your assumed base level of knowledge is much greater than the typical level of understanding. Basically you might try and explain at an undergraduate level, but in reality you need to start from a high school level and build up from there. You forget the difficulty of grasping the key concepts of the topic.
\n\nA similar phenomenon happens when you try and make a “simple” version of something, which requires you to become an expert in the thing you’re attempting to simplify. Once you’ve become an expert, you understand the edge cases, tradeoffs, and other complexities in the system, and often you’re able to use the complex thing without needing it to be simplified, and appreciate why it is not simple in the first place. You’re then left to explain the subtleties of this complex system to people that have yet to make the leap in understanding—and experience the difficulty of explaining it in basic terms.
\n\nI went through this whole process with tmux. Before I was a certified tmux nerd, I wanted a simpler way of configuring and controlling my tmux panes. The binding and manipulation controls seemed too limited, I wanted to be able to send commands to different tabs and split the output of commands to different panes. I managed to do some of this by hacking small scripts together, but I wanted a solution that would unify it all into one system.
\n\n\n\n\nThere are a few projects that do similar things (like
\ntmuxinator
), but they are mostly focussed on automatic pane/window creation, rather than adding scripting to your interaction with tmux.
So I spent months learning the ins and outs of tmux’s command-line interface, and the functionality available in control mode. Eventually I had a program that ran alongside tmux and provided an object-oriented scripting interface to basically the entirety of tmux. You could do something like:
\n\nserver.on_new_session do |session|\n session.on_new_window do |window|\n window.panes.first.split :vertical\n end\nend\n
Under many layers of abstraction, this would listen for events in tmux, run the associated Ruby code, and send any commands back to tmux if the model had changed. It was a wonderful hack, and I’m still very happy with how it all fit together.
\n\nHowever, in doing so I learnt a lot about the tmux CLI, and started to get a fairly in-depth understanding of how it had been designed.
\n\nOk I need to share just how neat the tmux API is. It’s all really well documented on the man page. Control mode outputs tmux events to stdout, so if you read from that process you can receive what’s happening with every tmux session on a server—input, output, layout changes, new windows, etc. You can also write commands into stdin of the control mode process, and their output will be returned as a control mode message.
\n\nMost tmux commands print some kind of output, by default it’s somewhat human-readable, intended to display in a terminal. Take tmux list-sessions
as an example:
$ tmux list-sessions\nflight-tracker: 2 windows (created Fri Jul 28 10:41:53 2023)\npixelfed-piper: 1 windows (created Fri Jul 28 11:14:18 2023)\npod: 3 windows (created Sat Jul 29 03:17:47 2023)\nwillhbr-github-io: 2 windows (created Fri Jul 28 11:13:50 2023) (attached)\n
It would be really annoying to write a script to parse that into a useful data structure (especially for every single command!), and thankfully we don’t have to! Every tmux command that prints output also supports a format string to specify what to print and how to print it:
\n\n$ tmux list-sessions -F '#{session_id}||#{session_name}||#{session_created}'\n$1||flight-tracker||1690540913\n$3||pixelfed-piper||1690542858\n$4||pod||1690600667\n$2||willhbr-github-io||1690542830\n
The only logical thing for me to do was write an RPC-like abstraction over the top of this, with macros to map fields in the generated format string to attributes on the objects that should be returned. This allowed me to build a fairly robust abstraction on top of tmux.
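The idea without the macros is simple enough; here’s a sketch in Kotlin of the mapping for sessions (the real thing was Crystal macros generating this per type):
\n\ndata class Session(val id: String, val name: String, val created: Long)\n\n// Build the -F format string from the fields we want, then map each\n// '||'-separated output line back onto an object.\nval fields = listOf(\"session_id\", \"session_name\", \"session_created\")\nval format = fields.joinToString(\"||\") { \"#{$it}\" }\n\nfun parseSessions(output: String): List<Session> =\n    output.lines().filter { it.isNotBlank() }.map {\n        val (id, name, created) = it.split(\"||\")\n        Session(id, name, created.toLong())\n    }\n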
\n\nAfter that I started learning about all the features that tmux supports. Almost every option can be applied to a single pane (most normal people would apply them globally, but if you want they can be applied to just one session, window, or pane)—so if you want one window with a background that’s unique, you can totally do that. You can also define hooks that run when certain events happen. You can remap keys (not just after the prefix, any key at all) and have arbitrary key “tables” that contain different key remappings. Windows can be linked for some reason—I still don’t know what this would be used for—and you can pipe the output of a pane into a command. Exactly how all these features should be used together is left as an exercise for the user, but they’re all there ready to be used.
\n\nWith this much deeper understanding of how to use the tmux API, I no longer really needed a scripting abstraction, I was able to pull together the existing shell-based API and do the handful of things that I’d be aiming to accomplish (like my popup shell). I’d basically cursed myself with the knowledge of tmux, and now a simple interface wasn’t necessary. So I abandoned the project.
\n\nOne of my software development Hot Takes™ is that git has an absolutely awful command-line interface.1 The commands are bizarrely named, it provides no guidance on the “right” or “recommended” way of using it,2 and because of this it is trivial to get yourself in a situation that you don’t know how to recover from. Most git “apologists” will just say that you should either use a GUI, or just alias a bunch of commands and never deviate from those. The end result being that developers don’t have access to the incredibly powerful version control system that they’re using, and constantly have to bend their workflow to suit the “safe” part of its API.
\n\nThe easiest example of something that I would like to be able to do in git is a partial commit—take some chunks from my working copy and commit them, leaving the rest unstaged. The interface for staging and unstaging files is already fairly obtuse, and then if you want to commit only some of the changes to a file, you’re in for a whole different flavour of frustration.
\n\ngit add
stages a file (either tracked or untracked)git restore --staged
removes a file from being stagedgit restore
discards changes to an unstaged fileWhy we haven’t settled on a foo
/unfoo
naming convention completely baffles me. stage
/unstage
and track
/untrack
tell you what they’re doing. restore --staged
especially doesn’t match what it does—the manual for git-restore
starts out saying it will “restore specified paths in the working tree with some contents from a restore source”, but it’s also used to remove files from the pre-commit staging area? That doesn’t involve restoring the contents of a file at all. Just read the excellent git koans by Steve Losh to understand how I feel trying to understand the git interface.3
What I really want is an opinionated wrapper around git that will make a clear “correct” path for me to follow, with terminology that matches the actions that I want to take. Of course the only correct opinionated wrapper would be my opinionated wrapper, which means I need to make it. And of course for me to make it, I need to have a really good understanding of how git works—so that I can make an appropriate abstraction on top of it.
\n\nSo this is where I’ve ended up, I want to make an abstraction over git, which would require me to learn a lot about git. If I learn enough about git to do this, I will become the thing that I’ve sworn to destroy—someone who counters every complaint about git with “you just have to think of the graph operation you’re trying to achieve”.
\nIs it a hot take when you’re right? I guess not. ↩
\nThis would probably be considered a feature to many people, which I suppose is fair enough. ↩
\nTo be honest, much of this is probably because I forged my git habits back around 2012, and since then a lot of commands have been renamed to make more sense. I’m still doing git checkout -- .
to revert unstaged files and it makes absolutely no sense—isn’t checkout
for changing branches? ↩
Avid readers will know that I like to fly my drone around the beaches in Sydney. The airspace is fairly heavily trafficked, and so I take the drone rules very seriously. This means no flying in restricted airspace (leading to other solutions for getting photos in these areas), no flying in airport departure or arrival paths, and no flying above the 120m ceiling (or 90m in certain areas). This is easily tracked with a drone safety app (I’m a big fan of ok2fly).
\n\nWhat is more difficult is flying a drone in an area that may have other aircraft nearby. The drone rules state:
\n\n\n\n\nIf you’re near a helicopter landing site or smaller aerodrome without a control tower, you can fly your drone within 5.5 kilometres. If you become aware of manned aircraft nearby, you must manoeuvre away and land your drone as quickly and safely as possible.
\n
This basically means that if a helicopter turns up, you should get the drone as low as possible and land as quickly as possible. In theory, crewed aircraft should be above 150m (500ft), with a 30m (100ft) vertical gap between them and the highest drones. However on the occasions where there have been helicopters passing by, to my eye they seem to be much closer than that, which makes me anxious—I want my drone to remain well clear of any helicopters.
\n\nVirtually all aircraft carry an ADS-B transmitter which broadcasts their GPS location to nearby planes and ground stations. They use this location to avoid running into each other, especially in low-visibility conditions. Flight-tracking services like flightradar24 aggregate this data globally and present it on a map.
\n\nMy first idea was to write an app that would stream the ADS-B data from a service like flightradar24 for any aircraft in the nearby airspace, and sound an alert if an aircraft was on a trajectory that would intersect with my location. This would be great, but it would be a lot of work, require some kind of API key and agreement from the data provider, and ongoing use would require paying the annual $99USD/$150AUD Apple developer program fee.1
\n\n\n\n \n\nThe next best idea was to setup a Stratux ADS-B receiver using a Raspberry Pi. This would either allow me to pull data from it to my phone (no need to deal with API keys and suchlike) or do all the processing on the Pi (no need to deal with developer restrictions). While this would have been cool, it would have also cost a bit to get all the components, and working out some kind of interface to an otherwise-headless RPi seemed like a frustrating challenge.
\n\nAfter considering these two options for a while I settled on a completely different third option. Instead of building something to alert me in real time, I could just work out which beaches would have nearby aircraft at what times of day, and avoid flying during those times. This is when I came across the OpenSky Network, a network of ADS-B receivers that provides free access to aircraft locations for research purposes. So all I had to do was get the data from OpenSky for aircraft in Sydney, and then visualise it to understand the flight patterns around the beaches.
\n\nOpenSky has a historical API with an SQL-like query interface, as well as a live API with a JSON REST interface. I requested access to the historical data, but was informed that they only provide access to research institutions due to the cost of querying it. So to make do I wrote a simple program that would periodically fetch the positions of aircraft within the Sydney area. This data was then saved to a local SQLite database so I could query it again later. Since the drone rules also forbid flights during the night, I only needed to fetch data during civil daylight hours.
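The fetcher itself was nothing special. Here’s a sketch of the idea in Kotlin, using OpenSky’s documented /states/all endpoint with a bounding box roughly around Sydney—the coordinates, polling interval, and table schema are all illustrative, and it assumes a SQLite JDBC driver on the classpath:
\n\nimport java.net.URI\nimport java.net.http.HttpClient\nimport java.net.http.HttpRequest\nimport java.net.http.HttpResponse\nimport java.sql.DriverManager\n\nfun main() {\n    val client = HttpClient.newHttpClient()\n    val db = DriverManager.getConnection(\"jdbc:sqlite:flights.db\")\n    db.createStatement().execute(\n        \"CREATE TABLE IF NOT EXISTS states (fetched_at INTEGER, body TEXT)\")\n    while (true) {\n        // Bounding box (lamin/lomin/lamax/lomax) roughly covering Sydney.\n        val request = HttpRequest.newBuilder(URI(\n            \"https://opensky-network.org/api/states/all\" +\n            \"?lamin=-34.1&lomin=150.5&lamax=-33.5&lomax=151.5\")).build()\n        val response = client.send(request, HttpResponse.BodyHandlers.ofString())\n        db.prepareStatement(\"INSERT INTO states VALUES (?, ?)\").apply {\n            setLong(1, System.currentTimeMillis())\n            setString(2, response.body())\n            executeUpdate()\n        }\n        Thread.sleep(60_000)\n    }\n}\n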
\n\nTo visualise the data, I used my hackathon-approved map rendering solution: get a screenshot of Open Street Map and naively transform latitude/longitudes to x/y coordinates. After messing up the calculation a bunch, I got a map with a line for every flight, which looked something like this:
\n\n\n\nEventually after staring at this map2 for a long time, I realised that most helicopter (or rotorcraft as they are referred to in the API) routes went north from the airport, passed along the western side of the city, directly over the Harbour Bridge, did a few loops over the harbour (as seen in the map above), exited the harbour by Watson’s Bay, then turned south and hugged the coastline along the beaches, before finally turning west at Maroubra to get back to the airport.
\n\nI finally had the realisation that probably should have been fairly obvious a long time before this—all these helicopters are tourist flights, repeating the same route over and over again. Sure enough if I search for “helicopter sight seeing Sydney” I find the website for a helicopter tour company that does the exact route I saw plastered over my map. Optimistically I emailed them asking how many flights they usually flew in a day, and what time their earliest flight was—this would give me enough information to make a reasonably informed decision about when was best to fly my drone. Sadly they said they couldn’t share this information with me.
\n\nOk so I would have to do some more data visualisation to work this out for myself. First of all I filtered out any data points that were above 200 metres, since they would be well clear of any drones.
\n\n\n\nThere are some interesting things in this map:
\n\nI then compared that with the same view over the northern beaches:
\n\n\n\nIt’s worth noting that all the maps contain data for just over one month of flights. There is definitely still a large number of flights going up the coast, but they thin out significantly as you get further north, especially past Long Reef—the headland south of Collaroy beach. I was surprised to see that no aircraft fly over the harbour side of Manly, they instead follow the water out the harbour entrance.
\n\nA friend suggested a nice way of visualising the data: plot the time of day on one axis, and the position down the coast on the other, and create a heatmap of the highly-trafficked times/areas. In theory you should be able to see a line for each flight flying down the coast. Sadly my matplotlib skills aren’t that good, so this is the best I could come up with:
The left axis is the latitude (limited in range from Bondi to Maroubra) and the bottom axis is the fraction of the day (eg 0.5 is midday). Using this we can see that the bulk of flights start at 0.4, which is 9.6 hours into the day, or 9:36 AM. That makes sense for tourist flights, since passengers presumably have to sign some waivers and do a safety briefing, and they’re not going to want to get out of bed too early. I added the ability to my map to filter out flights past a certain time of day, and sure enough, if I only look at flights before 10:00am, the sky is much clearer.
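\n\nIf you want to try the same trick, the heatmap is only a few lines of matplotlib. A sketch, reusing the illustrative schema from earlier and a naive fixed UTC+10 offset for Sydney time:
\n\nimport matplotlib.pyplot as plt\n\nrows = db.execute('SELECT time, lat FROM positions WHERE alt <= 200').fetchall()\nTZ_OFFSET = 10 * 3600  # naive Sydney offset from UTC, ignoring daylight saving\nfractions = [((t + TZ_OFFSET) % 86400) / 86400 for t, _ in rows]\nlats = [lat for _, lat in rows]\n\nplt.hist2d(fractions, lats, bins=(96, 60), cmap='hot')\nplt.xlabel('fraction of day (0.5 = midday)')\nplt.ylabel('latitude')\nplt.colorbar(label='position reports')\nplt.show()\n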
\n\nArmed with this new knowledge, I can make some more informed decisions about when to fly my drone around the beaches in Sydney. I’m just not going to bother flying during the middle of the day anywhere between Bondi and Maroubra; if I want to fly there I’ll do it just after sunrise—which will give me better light3 anyway. Flying at the beaches further north is still an option, but I’ll still want to position myself somewhere with a good view up and down the coast to see other aircraft coming. Since the flight paths are much more predictable than I had expected, if I did make some kind of alerting system, I could simply trigger it whenever an aircraft exited the harbour, since their next move is likely to be up or down the coast.
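\n\nThat trigger could be as simple as a bounding box check around the harbour mouth. A sketch; the coordinates here are a rough guess at the gap between North Head and South Head, so don’t trust them:
\n\n# Very rough box around the harbour entrance (guessed coordinates)\nHARBOUR_EXIT = (-33.84, 151.27, -33.80, 151.30)  # (min_lat, min_lon, max_lat, max_lon)\n\ndef exited_harbour(lat, lon):\n    min_lat, min_lon, max_lat, max_lon = HARBOUR_EXIT\n    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon\n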
\n\nOf course the most important thing—and the lesson I hope you take away from this—is to follow the rules, always check airspace restrictions before flying, be aware of your surroundings, and if in doubt just descend and land as promptly as possible. Don’t use a few map screenshots from someone’s blog as guidance on where to fly your drone.
\n\nMap data © OpenStreetMap contributors.
\n\nFlight data from OpenSky:
\n\n\n\nBringing up OpenSky: A large-scale ADS-B sensor network for research\nMatthias Schäfer, Martin Strohmeier, Vincent Lenders, Ivan Martinovic, Matthias Wilhelm\nACM/IEEE International Conference on Information Processing in Sensor Networks, April 2014
\n
The next step in my containerising journey is setting up Prometheus monitoring. I’m not going to use this for alerts or anything fancy yet, just to collect data and see what the load and health of my server is and be able to track trends over time. In doing this I wanted everything to run in containers, with no global config that has to know about every monitoring target.
\n\nThere are plenty of existing posts on setting up Prometheus in a container, so I’ll keep this short. I used pod to configure the containers:
\n\ncontainers:\n prometheus:\n name: prometheus\n image: docker.io/prom/prometheus:latest\n network: prometheus\n volumes:\n prometheus_data: /prometheus\n bind_mounts:\n ./prometheus.yaml: /etc/prometheus/prometheus.yml\n ports:\n 9090: 9090\n labels:\n prometheus.target: prometheus:9090\n\n podman-exporter:\n name: podman-exporter\n image: quay.io/navidys/prometheus-podman-exporter:latest\n bind_mounts:\n /run/user/1000/podman/podman.sock: /var/run/podman/podman.sock,ro\n environment:\n CONTAINER_HOST: unix:///var/run/podman/podman.sock\n run_flags:\n userns: keep-id\n network: prometheus\n labels:\n prometheus.target: podman-exporter:9882\n\n speedtest:\n name: prometheus_speedtest\n image: docker.io/jraviles/prometheus_speedtest:latest\n network: prometheus\n labels:\n prometheus.target: prometheus_speedtest:9516\n prometheus.labels:\n __scrape_interval__: 30m\n __scrape_timeout__: 2m\n __metrics_path__: /probe\n
prometheus contains the actual Prometheus application, which has its data stored in a volume. podman-exporter exports Podman container metrics, accessed by mounting in the Podman socket.1 speedtest isn’t essential, but I was curious to see whether I had any variations in my home internet speed, and running one more container wasn’t difficult. This also forced me to work out how to customise the scraping of jobs configured via Prometheus HTTP service discovery.
To meet my first requirement of having no global config, I needed to set up some kind of automatic service discovery system. Prometheus supports fetching targets via an HTTP API—all you have to do is return a list of jobs to scrape in a basic JSON format. Since I already run a container that shows a status page for my containers (more on that another time, perhaps) I have an easy place to add this endpoint. You just need to add the endpoint into your prometheus.yaml config file once:
scrape_configs:\n - job_name: endash\n http_sd_configs:\n - url: http://my_status_page:1234/http_sd_endpoint\n
That endpoint returns some JSON that looks like this:
\n\n[\n {\n \"targets\": [\"prometheus:9090\"],\n \"labels\": {\n \"host\": \"Steve\",\n \"job\": \"prometheus\",\n \"container_id\": \"4a98073041d6b\"\n }\n },\n {\n \"targets\": [\"prometheus_speedtest:9516\"],\n \"labels\": {\n \"host\": \"Steve\",\n \"job\": \"prometheus_speedtest\",\n \"container_id\": \"db95c10b425cc\",\n \"__scrape_interval__\": \"30m\",\n \"__scrape_timeout__\": \"2m\",\n \"__metrics_path__\": \"/probe\"\n }\n }\n]\n
targets is a list of instances to scrape for a particular job (each container is one job, so there’s only one target in the list). labels defines additional labels added to those jobs. You can use this to override the job name (otherwise it’ll unhelpfully be the name of the HTTP SD config, in my case endash) and set some of the scrape config values, if the target should be scraped on a different schedule.
My status dashboard has an endpoint that will look at all running containers and return an SD response based on the container labels. This allows me to define the monitoring config in the same place I define the container itself, rather than in some centralised Prometheus config. You can see in my pods.yaml file (above) that I use prometheus.target and prometheus.labels to make a container known to Prometheus as a job.
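\n\nThe endpoint itself doesn’t need to be complicated. Here’s a hypothetical sketch of the idea in Python (my status page does a bit more than this, and the prometheus.labels flattening scheme is an assumption): ask Podman for the running containers and turn their prometheus.* labels into an SD response:
\n\nimport json, subprocess\nfrom http.server import BaseHTTPRequestHandler, HTTPServer\n\ndef sd_response():\n    # Ask Podman for all running containers and their labels\n    out = subprocess.run(['podman', 'ps', '--format', 'json'],\n                         capture_output=True, text=True, check=True).stdout\n    jobs = []\n    for c in json.loads(out):\n        labels = c.get('Labels') or {}\n        target = labels.get('prometheus.target')\n        if not target:\n            continue  # container hasn't opted in to scraping\n        # Assumed flattening: a prometheus.labels.X container label becomes label X\n        extra = {k.removeprefix('prometheus.labels.'): v\n                 for k, v in labels.items() if k.startswith('prometheus.labels.')}\n        jobs.append({'targets': [target],\n                     'labels': {'job': c['Names'][0],\n                                'container_id': c['Id'][:13], **extra}})\n    return jobs\n\nclass Handler(BaseHTTPRequestHandler):\n    def do_GET(self):\n        body = json.dumps(sd_response()).encode()\n        self.send_response(200)\n        self.send_header('Content-Type', 'application/json')\n        self.end_headers()\n        self.wfile.write(body)\n\nHTTPServer(('', 1234), Handler).serve_forever()\n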
The thing that really makes this all work is Podman networks. The easiest way to get Prometheus running is to run it on the host network, so that it doesn’t run in its own containerised network namespace. That way, when it scrapes some port on localhost, it’s the host’s localhost, not the container’s. This works reasonably well if all your containers publish a port on the host, and it’s definitely an acceptable way of setting things up, but I wanted to be able to run containers without published ports and still monitor them.
You can do this by creating a Podman network and attaching any monitor-able containers to it, so that they are accessible via their container names:
\n\n> podman network create prometheus\n> podman run -d --network prometheus --name network-test alpine:latest top\n> podman run -it --network prometheus alpine:latest\n$ ping network-test\nPING network-test (10.89.0.16): 56 data bytes\n64 bytes from 10.89.0.16: seq=0 ttl=42 time=0.135 ms\n64 bytes from 10.89.0.16: seq=1 ttl=42 time=0.095 ms\n...\n
The one wrinkle of using a Podman network is that it makes accessing non-container jobs more difficult. I wanted to set up node_exporter to keep track of system-level metrics, and it can’t run in a container as it needs full system access (or at least, it doesn’t make sense to run it in one). Thankfully this ended up being super easy, as I can just install node_exporter via apt:
$ sudo apt install prometheus-node-exporter\n
This will automatically start a service running in the background, serving metrics on localhost:9100/metrics. To access this from our Prometheus container, you can just use the magic hostname host.containers.internal, which resolves to the current host. For example:
> podman run -it alpine:latest\n$ apk add curl\n$ curl host.containers.internal:9100/metrics\n... a whole bunch of metrics\n
So I have to add one static config into my prometheus.yaml file:
scrape_configs:\n - job_name: steve\n static_configs:\n - targets: ['host.containers.internal:9100']\n
So now I’ve got a fully containerised, automatic monitoring system for anything running on my home server. Any new containers will get picked up by podman-exporter, and have their resource usage recorded automatically. If I integrate a Prometheus client library and export metrics, I can just add monitoring config to the pods.yaml file for that project, and my service discovery system will pick it up and scrape it automatically.
\n\nI’ve added a lot of functionality to pod since I first wrote about it; I’m aiming to get it cleaned up and better documented soon.
\n
This obviously gives the exporter full access to do anything to any container, so you’ve just kinda got to trust it’s doing the right thing. ↩
\n