Why Modernising Shells is a Sisyphean Effort

Anyone who knows me is probably aware that I spend a lot of time in the terminal. One of the many things that I have wasted time learning is the various oddities of shell scripting, and so I am cursed with the knowledge of the tradeoffs in their design. It seems to be something that most people don’t appreciate. Your shell has to find a balance between getting out of your way for interactive use, and being the best way to link together multiple unrelated programs to do something useful. The Unix philosophy of having many small tools, each dedicated to one simple job, means that you can more easily replace one with an alternative, and a new tool doesn’t have to reinvent the wheel before it can be useful.

The problem is that to most people, the shell is completely inscrutable. Even experienced programmers who have no problem juggling many other programming languages will get into a muddle with a simple shell script. To be honest, you can’t really blame them. Shell languages are full of bizarre syntax and subtle traps.

The root of the problem is POSIX; it defines the API for most Unix (and Unix-like, e.g. Linux) operating systems. Most important is the process model. A POSIX process receives arguments as an array of strings, input as a stream of bytes, and can produce two streams of output (standard output and standard error). Unless you’re going to redesign the whole operating system1, you’ve got to work within this system.

POSIX does also define the syntax for the shell language, which is why Bash, ZSH, and other shells all work in a similar way. Fish, xonsh, nushell, and Oil are not entirely POSIX compatible, and so are free to alter their syntax.

What sets a shell apart from other languages is that external programs are first-class citizens2: you don’t have to do anything special to launch them. If you type git status the shell will go off and find the git program, and then launch it with a single argument status. If you were to do this in Ruby, you’d have to do system('git', 'status')—more fiddly typing, and completely different from calling a function.

So if you want programs to fit in just the same as shell functions, your functions need to work like POSIX processes. This means they can’t return a value—they just have input and output streams—and their arguments must be handled as strings. This makes implementing a scripting language that can be compared to Ruby or Python basically impossible. The constraint of having all your functions act like processes hampers your ability to make useful APIs.

This makes it really difficult for your shell language to support any kind of strong typing—since everything passed to any command or function needs to be a string, you’re constantly reinterpreting data and risking it being reinterpreted differently. Having everything handled as a string is consistent with how programs run (they have to work out how to interpret their arguments themselves), but it is a constant source of bugs in shell scripts.

My favourite fun fact about shells is that some of the “syntax” is actually just a clever use of the command calling convention. For example, the square bracket in conditionals is actually a program called [.
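
You can check this yourself; on a typical Linux system (the exact paths vary between systems) it looks something like this:

$ type -a [
[ is a shell builtin
[ is /usr/bin/[
$ /usr/bin/[ -d /tmp ] && echo 'it exists'
it exists

The only bit of “syntax” is that the last argument to [ has to be a closing ].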

xonsh is a new shell that merges Python and traditional shell syntax, except it does it by trying to parse the input as a Python expression, and if that doesn’t make sense it assumes it should be in shell mode. This gets scripting and interactive use tantalisingly close, except it seems to me (without having used xonsh) that it would end up being unpredictable, and you would always have to be aware that you’re straddling two different modes.

nushell attempts to solve the problem in a different direction. It requires you to either prefix your command with an escape character or write an external command definition to have it be callable from the shell. This moves away from the typical design of shells, and relegates external programs to be second-class citizens. nu is really a shell in search of a new operating system—to really make the most of their structured-data-driven approach, you’d want a new process model that allowed programs to receive and emit structured data, so that all the features for handling that in the shell could be used on arbitrary programs without writing an external command definition first.

So if we’re too snobby to resort to parser tricks or fancy wrappers, what are we left with? Well, we’ve got some serious constraints. The input space for command arguments is every single letter, number, and symbol. Any use of a special character for syntax makes it potentially harder for people to pass that character to commands. For example, if + and - were used as maths operators, you’d need to quote every flag you passed: git add "--all" instead of git add --all, since the dashes would be interpreted as different syntax.

You’ve probably already come across this using curl to download a URL with query parameters:

$ curl https://willhbr.net/archive/?foo=bar
zsh: no matches found: https://willhbr.net/archive/?foo=bar
$ curl 'https://willhbr.net/archive/?foo=bar'
# ...

Since ? is treated specially in most shells to do filename matches, you have to wrap any string that uses it in quotes. And because so many people are used to dumping arbitrary strings unquoted as command-line arguments, you don’t want to restrict this too much and force people to carefully quote every argument. It’s easy to start an escaping landslide where you keep doubling the number of escape characters needed to get through each level of interpolation.
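
For example, every layer of shell that a string passes through strips one level of quoting, so the backslashes pile up quickly (here sh -c is standing in for ssh, eval, or anything else that re-parses its input):

$ echo "a \"quoted\" word"
a "quoted" word
$ sh -c "echo \"a \\\"quoted\\\" word\""
a "quoted" word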

Oil is the most promising next-generation shell, in my opinion. From a purist perspective it does treat functions and commands slightly differently, as far as I can see, but this looks like it’s done in a very well-thought-out way, where certain contexts take an expression instead of a command. This is best understood by reading this post on the Oil blog.

# the condition is an expression, not a command so it can have operators
# and variables without a `$` prefix.
if (x > 0) {
  echo "$x is positive"
}
# you can still run commands inside the condition
if /usr/bin/false {
  echo 'that is false'
}

Once you’ve split the capabilities of functions and commands, you might as well add a whole set of string-processing builtin functions that make grep, sed, cut, awk and friends unnecessary. Being able to trivially run a code block on any line that matches a regex would be excellent. Or being able to use code to specify a string substitution, rather than just a regex.3

There’s also a third dimension for any shell, and that’s how well it works as an actual interface to type things into. The syntax of the Oil ysh shell is better than ZSH’s, but in ZSH I can customise the prompt from hundreds of existing examples, I can use Vim keybindings to edit my command, I have syntax highlighting, I have integration with tools like fzf to find previous commands, and I have hundreds of lines of existing shell functions that help me get things done. And to top it all off, I can install ZSH on any machine from official package sources. Right now, it’s not worth it for me to switch over and lose these benefits.

  1. Which doesn’t seem to be something many people are interested in; we’re pretty invested in this Linux thing at this point. 

  2. Except for modifying variables and the environment of the shell process. 

  3. I know I can probably somehow do all this with awk. I know that anything is possible in awk. There are some lines I will not cross, and learning awk is one of them. 


Picking a Synology

One of the key characteristics you want from a backup system is reliability. You want to minimise the number of things that can fail, and reduce the impact of each failure for when they do happen. These are not characteristics that would be used to describe my original backup system:

a small computer sitting on a shoebox with an external HDD next to it, surrounded by a nest of cables

The first iteration of my backup system, running on my Scooter Computer via an external hard drive enclosure.

This setup pictured above evolved into a Raspberry Pi (featured unused in the bottom of that photo) with two external 4T hard drives connected to it. All my devices would back themselves up to one of the drives, and then rsnapshot would copy the contents of one drive across to the other, giving me the ability to look back at my data from a particular day. The cherry on top was a wee program1 that ran an HTTP server with a status page, showing the state of my backups:

screenshot of a webpage with a list of backup times in a table

My custom backup status page that told me whether I was still snapshotting my data or not.

Naturally, this system was incredibly reliable and never broke,2 but I decided to migrate it to a dedicated NAS device anyway. Synology is the obvious choice: they’ve got a wide range of devices, and a long track record of making decent, reliable hardware.

With the amount of data that I’m working with (<4T) I could absolutely have gone with a 1-bay model. However, this leaves no room for redundancy in case one disk fails, no room for expansion, and I already had two disks to donate to the cause. Two bays would have been a sensible choice: it would have allowed me to use both my existing disks and have redundancy if one failed. But it would have limited expansion, and once you’re going to two bays you might as well go four… right? If I’m buying something to use for many years, having the ability to expand up to 64T of raw storage capacity is reassuring.

At the time that I was researching, Synology had three different four-bay models that I was interested in: the DS420+, DS418, and DS420j.

The DS420+ is the highest end model that doesn’t support additional drive expansion (there are some DS9xx+ models that have 4 internal bays and then allow you to expand more with eSATA). It runs an x86 chip, supports Btrfs, allows for NVMe flash cache, and can run Docker containers. It has hot-swappable drive bays and was released in 2020 (that’s the -20 suffix on the model name3).

The DS418 is the “value” model; it’s basically just the one they made in 2018 and kept around. It also runs an x86 chip, supports Btrfs, and can run Docker containers. It uses the same basic chassis as the DS420+, so also has hot-swappable drives.

The DS420j is the low-cost entry model, running a low-end ARM chip, no Btrfs support, no Docker, and a cheaper chassis with no hot-swappable drives.

Btrfs is a copy-on-write filesystem that never overwrites partial data. Each time part of a block is written, the whole block is re-written out to an unused part of the disk. This gives it the excellent feature of near-free snapshots. You can record some metadata of which blocks were used (or even just which blocks to use for the filesystem metadata) and with that you get a view into the exact state of the disk at that point in time, without having to store a second copy of the data. Using Btrfs would replace my existing use of rsnapshot, moving that feature from a userspace application to the filesystem.
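
On a plain Linux system (Synology’s DSM wraps this up in its own UI, and these paths are just placeholders) taking a Btrfs snapshot is a single, nearly instant command:

# create a read-only snapshot of the photos subvolume
$ sudo btrfs subvolume snapshot -r /volume1/photos /volume1/snapshots/photos-2023-06-01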

This had initially pointed me towards the DS420+ or DS418. My concern with the 418 was the fact that it was already over 4 years old. I didn’t want to buy a device that was bordering on halfway through its useful lifespan (before OS updates and other software support stopped). The cost of the DS418 was only a little bit less than the DS420+, so if I was going to spend DS418 money, I might as well be getting the DS420+.

The other feature of the DS418 and DS420+ was Docker support—you can run applications (or scripts) inside containers, instead of in the cursed Synology Linux environment. I wasn’t planning on running anything significant on the Synology itself; it was going to be used just for backup and archival storage. Anything that required compute power would run on my home server.

Eventually I decided that the advantages of Btrfs and Docker support were not enough to justify the ~$300 price premium when compared to the DS420j. I already knew and trusted rsnapshot to do the right thing, and I could put that money towards some additional storage. The DS420j is a more recent model, and gives me the most important feature, which is additional storage with minimal hassle.

I’ve had the DS420j for about three months now, it’s been running almost constantly the entire time, and my backup system has moved over to it entirely.

The first thing I realised when setting up the DS420j is that despite the OS being Linux-based, it does not embrace Linux conventions. Critically, it eschews the Linux permission model entirely and implements its own permissions, so every file has to be 777—world readable and writable—for the Synology bits to work. This has knock-on effects on the SSH, SFTP, and rsync features; any user that has access to these has access to the entire drive. Since I’m the only user on the Synology, I’m not that bothered by this. The only reason I’d want different users is to have guarantees that different device backups couldn’t overwrite each other.

The best thing by far with the Synology is how much stuff is built in or available in the software centre. Setting up Tailscale connectivity, archives from cloud storage (e.g. Dropbox), and storage usage analysis was trivial.

The most difficult thing about moving to the Synology was working out how to actually move my data over. Archives of various bits were scattered across external hard drives, my laptop, and my RPi backup system. Since I was using the disks from the RPi in the Synology, I had to carefully sequence moving copies of data between different disks as I added drives to the Synology (since it has to wipe each drive before it can be used).

During the migration, having USB 3 ports on the NAS was excellent; with the RPi I’d have been forced to copy things over the network using another computer, but now I can just plug in directly and transfer in much less time. An unexpected benefit was that I could use an SD card reader to dump video from GoPros directly onto the Synology (since I knew I wasn’t going to get around to editing it). This will probably come in handy if I want to actually pull anything off the Synology.

At the moment I’m using 4.1T of storage (most of that is snapshots of my backups). According to the SHR Calculator I can add two more 4T drives (replacing my 2T drive) to get 12T of usable space, or two 8T drives to get 16T. Since my photo library grows at about 400G per year, I think my expansion space in the DS420j will be sufficient for a long time.4

  1. The program was written in Crystal, and those in the know will be aware just how painful cross-compilation to ARM is! 

  2. It actually only broke once, when one of the disks failed to mount and all my data was spewed onto the mount point on the SD card, filling up the filesystem and grinding the whole thing to a halt. 

  3. Can you really trust your backups to a company that has a naming scheme that is going to break in a mere 77 years? 

  4. Until I get a Sony a7RV and the size of my raw photos almost triples. 


Why Crystal is the Best Language Ever

Crystal is a statically typed language with the syntax of a dynamically typed one. I first used Crystal in 2016—about version 0.20.0 or so. The types of projects I usually work on in my spare time are things like pod, or my server that posts photos to my photos website.

Type System

This is the main selling point of Crystal: you can write code that looks dynamically typed, but it’ll actually get fully type checked. The reality of this is that if I know the type and the method is part of a public interface (for me that’s usually just a method that I’m going to be calling from another file), I’ll put a type annotation there. That way I usually only have to chase down type errors in single files. If I’m extracting out a helper method, I won’t bother with types. You can see this in the code that I write:

private def calculate_update(config, container, remote) : ContainerUpdate
  ...

The three argument types are fairly obvious to anyone reading the code, and since the method is private the types are already constrained by the public method that uses this helper. If I wrote this in Java it would look something like:

private ContainerUpdate calculateUpdate(
  Config config, Container container, Optional<String> remote) {
  ...

There’s a spectrum between language type flexibility and language type safety. Dynamic languages are incredibly flexible: you can pass an object that just behaves like a different object and everything will probably work. The language gets out of your way—you don’t have to spend any time explaining to the compiler how things fit together—it’ll just run until something doesn’t work and then fail. Languages that boast incredible type safety (like Rust) require you to do a bunch of busywork so that they know the exact structure and capabilities of every piece of data before they’ll do anything with it. Crystal tries to bend this spectrum into a horseshoe and basically ends up with “static duck typing”—if it’s duck-shaped at compile time, it will probably be able to quack at runtime.
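
A contrived sketch of what I mean (the types and method here are made up for illustration): the argument to make_noise has no annotation, so the compiler just checks that everything passed to it responds to quack.

# Both types respond to #quack, so the un-annotated method accepts either,
# and the check still happens at compile time.
class Duck
  def quack
    "quack"
  end
end

class Robot
  def quack
    "beep"
  end
end

def make_noise(animal)
  animal.quack
end

puts make_noise(Duck.new)  # => quack
puts make_noise(Robot.new) # => beep

Pass it something without a quack method and the program simply fails to compile, rather than blowing up at runtime.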

It definitely takes some getting used to. The flow that I have settled on is writing code with the types that I know, and then seeing if the compiler can work everything out from there. Usually I’ll have made a few boring mistakes (something can be nil where I didn’t expect, for example), and I’ll either be able to work out where the source of the confusing type is, or I can just add some annotations through the call stack. Doing this puts a line in the sand of where the types can vary, making it easy to see where the type mismatch is introduced.

The Crystal compiler error trace can be really daunting, since it spits out a huge trace of the entire call stack, from where the argument is first passed to a function all the way to where it is used in a way it shouldn’t be. However, once you learn to scroll a bit, it’s not any harder than debugging a NoMethodError in Ruby. At the top of the stack you’ve got the method call that doesn’t work, and each layer of the stack is somewhere that the type is being inferred.

This can get confusing as you get more layers of indirection—like the result of a method call from an argument being the wrong type to pass into a later function—but I don’t think this is any more confusing than the wrong-type failures that you can get in dynamic languages. Plus it’s happening before you even have to run the code.

A downside of Crystal’s type system is that the type inference is somewhat load-bearing. You can’t express the restrictions that the type system will infer when you omit type annotations; the generics are not expressive enough. So very occasionally the answer to fixing a type error is to remove a type annotation and have the compiler work it out.

Standard Library

This is probably the thing that keeps me locked in to using Crystal. Since I’m reasonably familiar with the Ruby standard library, I was right at home using the Crystal standard library from day one. As well as being familiar, it’s also just really good.

Rust—by design I’m pretty sure—has a very limited standard library, so a lot of the common things that I’d want to do (HTTP client and server, data serialisation, for example) require third-party libraries. Since Crystal has a more “batteries included” standard library, it’s easier for my small projects to get off the ground without me having to find the right combinations of libraries to do everything I want.

API design is hard, and designing a language’s standard library is especially difficult, since you want to leave room for other applications or libraries to extend the existing functionality, or for the standard library types to work as an intermediary between multiple libraries that don’t have to be specifically integrated together. This is where I really appreciate the HTTP server and I/O APIs. The HTTP server in the standard library is really robust, but the HTTP::Handler abstraction means that you can fairly easily replace the server with another implementation, or libraries can provide their own handlers that plug into the existing HTTP::Server class.
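
As a rough sketch of what I mean (handler choice and port are arbitrary): the stdlib server takes a chain of HTTP::Handlers plus a block, and a handler from any library can slot into that same chain.

require "http/server"

# ErrorHandler and LogHandler are stdlib handlers; the block at the end
# is the final handler in the chain.
server = HTTP::Server.new([
  HTTP::ErrorHandler.new,
  HTTP::LogHandler.new,
]) do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello from Crystal"
end

server.bind_tcp 8080
server.listen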

The IO API is especially refreshing given how hard it is to read a file in Swift. It’s a great example of making the easy thing easy, while making the more correct thing neither wildly different nor much harder.

# Reading a file as a String is so easy:
contents = File.read(path)
# do something with contents
# And doing the more correct thing is just one change away:
File.open(path) do |file|
  # stream the file in and do something with it
end

And then since all input and output use the same IO interface, it’s just as easy to read from a File as it is to read from a TCPSocket.
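
A little sketch of that (the hostname file and the hand-written HTTP request are just for illustration): the same method happily takes either.

require "socket"

# Any IO works here, whether it's a File, a TCPSocket, or STDIN.
def first_line(io : IO) : String?
  io.gets
end

File.open("/etc/hostname") do |file|
  puts first_line(file)
end

socket = TCPSocket.new("example.com", 80)
socket << "HEAD / HTTP/1.0\r\n\r\n"
puts first_line(socket)
socket.close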

There is definitely a broader theme here; Crystal is designed with the understanding that getting developers to write 100% perfect code 100% of the time is not a good goal. You’re going to want to prototype and you’re going to want to hack, and if you’re forced to make your prototype fully production-ready from the get-go, you’ll just end up wasting time fighting with your tools.

Scaling

I wrote back in 20171 thinking about how well different languages scaled from being used for a small script to being used for a large application. At this point I was still hoping that Swift would become the perfect language that I hoped it could be, but over five years later that hasn’t quite happened.

The design of Crystal sadly almost guarantees that it cannot succeed in being used by large teams on a huge codebase. Monkey-patching, macros, a lack of isolated modules, and compile times make it a poor choice for more than a small team.

Although I remain hopeful that in 10 years developers will have realised that repeatedly writing out type annotations is a waste of time, and perhaps we’ll have some kind of hybrid approach. What about only requiring type annotations for public methods—private methods are fair game? Or enforce that with a pre-merge check, so that developers are free to hack around in the code as they’re making a feature, and then batten down their types when the code is production-ready.

Flexibility

I’m of the opinion that no piece of syntax should be tied in to a specific type in the language. In Java, the only things that can be subscripted are arrays—despite everyone learning at university that you should always use List instead. This limits how much a new type can integrate into the language—everything in Java basically just ends up being a method call, even if an existing piece of syntax (like subscript, property access, operator, etc) would be neater.

Pretty much everything in Crystal is implemented as a special method:

struct MyType
  def [](key)
    ...
  end

  def property=(value)
    ...
  end
end

There are no special types that have access to dedicated syntax (except maybe nil, but that is somewhat special), so you can write a replacement for Array and have it look just like the builtin class. Being able to override operators and add methods to existing classes allows things like 4.hours + 5.minutes, which will give you a Time::Span of 4:05. If you did this in Java2 you’d have something like this, which absolutely does not spark joy:

Duration.ofHours(4).plus(Duration.ofMinutes(5))

Safety

While Crystal’s type system is game-changing, it doesn’t break the status quo in other ways. It has no (im)mutability guarantees, and has no data ownership semantics. I think this is down to the design goal of “Ruby, but fast and type checked”. Ruby has neither of those features, and so nor does Crystal.

An interesting thought is what a future language would look like if it tried to do for data ownership what Crystal has done for type checking. The state of the art in this area seems to be Rust and Pony, although it seems like these are not easy to use or understand (based on how many people ask why the borrow checker is complaining on Stack Overflow). A hypothetical new language could have reference capabilities like Pony does, but have them be inferred from how the data is used.

Macros

Every language needs macros. Even Swift (on a rampage to add every language feature under the sun) is adding them. Being able to generate boring boilerplate means developers can spend less time writing boring boilerplate, and reduces the chance that a developer makes a mistake writing boring boilerplate because they were bored. If my compiled language can’t auto-implement serialisation in different formats (JSON, YAML, MessagePack) then what’s even the point of having a compiler?
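
For what it’s worth, this is a case where Crystal already delivers. A rough sketch (the Photo fields are invented for the example):

require "json"
require "yaml"

# Including the Serializable modules generates the (de)serialisation
# code at compile time from the declared instance variables.
struct Photo
  include JSON::Serializable
  include YAML::Serializable

  getter title : String
  getter width : Int32
  getter height : Int32
end

photo = Photo.from_json(%({"title": "Sunrise", "width": 6000, "height": 4000}))
puts photo.to_yaml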

It’s a shame that Crystal’s macros are a bit… weird. The macro language is not quite the full Crystal language, and you’re basically just generating text that is fed back into the compiler (rather than generating a syntax tree). Crystal macros are absolutely weak-sauce compared to macros in Lisp or Elixir—but those languages have the advantage of a more limited syntax (especially in the case of Lisp) which does make their job easier.

Crystal macros require a fairly good understanding of how to hack the type system to get what you want. I have often found that the naive approach to a macro would be completely impossible—or at least impractical—but if you flip the approach (usually by leaning on macro hooks) you can use the flexible type system to produce working code.

The current macros are good enough to fit the use cases that I usually have, and further improvements would definitely be in the realm of “quality of life” or “academically interesting”. You can always just fall back to running an external program in your macro, which gives you the freedom to do whatever you want.

The Bottom Line

Back in my uni days there would be a new language each week that I was convinced was the future—notable entries include Clojure, Elixir, Haskell, Kotlin, and Go. There are aspects of all these languages that I still like, but each of them has some fairly significant drawback that keeps me from using it3. At the end of the day, when I create a new project it’s always in Crystal.

Other languages are interesting, but I’m yet to see something that will improve my experience working on my own small projects. Writing out interface definitions to appease a compiler sounds just as unappealing as having my program crash at runtime due to a simple mistake.

  1. I’d only dabbled in Crystal for less than a year at this point, and was yet to realise that it was the best language ever. 

  2. After researching for hours which library was the correct one to use. 

  3. Really slow edit/build/run cycle, process-oriented model gets in the way for simple projects, I just don’t think I’m a monad type of guy, experience developing outside of an IDE is bad, lacking basic language features. 


Interfaces of Spatial Photo Editing

How would you import, edit, and export photos using an AR/VR headset? I personally think there is a lot of potential for this to be an exceptional experience, far better than working on a laptop, especially in sub-optimal working conditions. I also think the jump from hand to face is a significant hurdle that you might not want to dive head-first into—I’ve relegated a lot of that to the footnotes.1

As with everyone else, I have been inundated with people’s thoughts on spatial computing. The assumptions that I’ve made here are largely based on information from:2

Let’s cast our minds into the not-too-far future, say about five or six years from now. Spatial computing devices (AR/VR headsets) have gone through the rapid iteration and improvement that happened in the first years of smartphones, and we’ve arrived at a device that is more refined than the first generation. Probably smaller, more robust, and with enough battery life that you don’t really worry about it.

Interface

The interface would obviously depend on what idioms are established over the next few years. On the safe end of the spectrum would be something like the current touch-first interfaces present in the iOS Photos app and Photomator for iOS—a list of sliders that control adjustments, and a big preview image, all contained in a floating rectangle. You’d do some combination of looking at controls and tapping your fingers to make changes to the image.

An obvious problem with using your eyes as a pointer is that you usually want to look at the image while changing the slider, and a naive click-and-drag with your eyes would make this impossible. I’m sure that any sensible developer would realise this immediately, and work out a gesture-based interface where you can look at a control, grab the slider, and then move your hand to change it while your eyes are free to look elsewhere in the image.

Taking the interface one step further, the controls would probably escape the constraints of the app rectangle and become their own floating “window”, allowing you to hold your adjustments like an artist’s palette while your image floats separately in front of you. Sliders to represent adjustments might not even be necessary, each adjustment could just be a floating orb that you select and move your hand to adjust. There are definitely some touch interfaces that use the whole image as a control surface for a slider, and perhaps this will become the norm for spatial interfaces.

Or maybe we’ll go in a less abstract direction; the interface will resemble a sound-mixing board with rows and rows of physical-looking controls, that can be grabbed and moved.

The photo library interface has similar challenges. The safe choice is to simply present a grid of images in a floating rectangle, using the standard gestures for scrolling and selection. Something that I foresee finding frustrating is an insistence for everything to animate, with no alternative. Swapping quickly between two photos to see differences and select the better shot is a common operation, and is made much less useful when there is an animation between the two (this is something I appreciated moving from editing on an iPad to a Mac).

A floating rectangle would get the job done, but doesn’t take advantage of the near-infinite canvas available in your virtual world. Could you grab photos from the grid and keep them floating in space to deal with later—like living directly inside your desktop folder? This will really depend on what the idioms end up being for data management. Perhaps the standard for grouping related objects will be stacks that stay spatially consistent, floating wherever you left them last.

Spatial consistency is obviously very easy to understand, since that’s how the real world works3, but when you start adding more and more data, the limitations of space become more apparent. What I don’t want is for the flexibility of the digital world to be restricted in order to match the limitations of the real world. In the real world an object can’t exist in two places at once, but in the digital world it can be really useful to eschew this and allow putting photos in multiple albums, or creating different views over the same data.

Data Management

I spend a lot of time working out how to get photos from my camera, into my computer, and then back out of my computer. At this point I’ve got fairly good at it. For new spatial computing devices, I think the data management story will be far closer to my experience editing on my iPad than editing on my Mac. Let’s work through it, step by step.

Getting photos from the camera. In the future, I think photographers will still be taking photos on dedicated cameras. The difference in potential quality and flexibility is just down to physics: having a bigger device to hold a bigger sensor and a bigger lens just gives you better photos4. As much as cameras get better each year, the best way to get the photos off them is still by reading the SD card. Wirelessly transferring photos is slow, painful, and tedious.

My Sony a6500, which was announced in 2016 (and still commonly sells for AU$1,200), has USB 2 (over micro USB) for wired transfers and 802.11n WiFi for wireless. The a6600, which was released in late 2019, has the same connectivity. I don’t foresee wired or wireless transfer eclipsing the convenience of reading the SD card directly for the type of cameras that I buy.5

Maybe your headset will support a dongle, but I am not optimistic. Instead you’ll probably do that little dance of connecting to the camera’s wifi network, and then using some half-baked app to import the photos. It’s not really clear to me what “background processing” might look like in a headset. If you’ve got 10GB of photos to import, do you need to keep the headset on while it’s transferring (the same way you’ve got to keep an iPad’s screen on), or can you take it off and have it do work in the background?

Once the photos are on the device you can do the actual fun part of editing them. I assume apps like Photomator will be able to hook into the system photo library just like they do on iOS. Although if you want to do more complicated things that require multiple apps to work together (like stitch a panorama or blend parts of multiple images into one), you’re probably going to have to jump through similar hoops as you do on iOS. The OS might support dragging and dropping images at the flick of your eye, but if the image is silently converted from raw to jpeg in the process, it’s not very useful.

Hand and eye tracking might make the level of precision control more akin to a mouse or trackpad rather than a touchscreen, which could allow apps like Pixelmator Pro to bring their more complicated interface into the headset, but a lack of a wider ecosystem of professional-level tools (and OS features to make data-heavy workflows possible) might cause first movers to shy away.

Once you’ve edited your photos, you can probably share them directly in the headset to friends, social media, or via something like AirDrop to your phone.

Then comes the real scary question: can you reliably back up your data without being locked in to a single cloud storage provider? Again I see this as being more like an iPad than a Mac; backing up to dedicated photo storage services will be relatively easy, but if you want to back up to something you own, or handle storage off-device (on external drives, etc6) you’re probably out of luck.

Even if you choose to back everything up to a cloud service, you’ll have to make sure that the headset is powered on for long enough for the data to transfer. In my neck of the woods, the internet upload speed I can get at a practical cost is 20Mb/s. Perhaps in five years this will have doubled and I’ll have 40Mb/s upload. That’s 5MB/s, so about 5 seconds for a 24MP image, which is about 2 hours to upload all the 1,300 photos from my trip to NZ earlier this year, assuming that the cloud provider can receive the photos that fast, and no one else is using the internet connection. It’s not terrible, but definitely something I’d want to be able to happen in the background while the device isn’t on my face.

Workflow

Let’s imagine that all these problems have been solved (or were never a problem to begin with), how would I see myself using this as my primary photo-editing machine?

Usually I edit photos on my laptop on the couch. I could replace the somewhat small 13” screen with an absolutely huge HDR screen, without even having a desk. The photos could be surrounded by a pure black void, so I could focus entirely on the image, or I could become self-important and place them in a virtual art gallery. Or in the middle of the two, I could edit them in pure black and then see which one would look best framed on my wall.

I’m not sure how I would show my in-progress edits to people, ideally something like my TV could be a bridge between the real and virtual worlds, allowing me to present a single window to it. This would probably work with my TV on my home network, but if I’m at someone else’s house I doubt this would be possible across different platforms—given how fragmented doing this sort of thing is currently. What would probably end up happening is me exporting in-progress photos to my phone and using that to show people, and hopefully remembering to delete them later.7

When I go on a trip I’ll usually bring my laptop so I can sort through my photos and edit some to get some gratification sooner, rather than waiting until I get back home. A headset could be a significant improvement as an on-the-go photo editor: at the very least it’ll be smaller and lighter than my laptop so it’ll take up less of the carry-on allowance and space in my bag8.

Usually my laptop would be left wherever I’m staying, since I can’t realistically use it in bright sunlight or in a vehicle. But a headset could be used in these scenarios, so on the way back from an adventure I could plug myself into the virtual world and edit photos from the back seat of the car or plane or whatever, without having any glare on the screen or getting any debris in the keyboard.

You wouldn’t use your laptop in the back seat of a car going down a windy dirt track from a ski field, but you could totally put on a headset and edit your photos through the car window.


The bottom line for me is that this type of device could be a significant jump from what we have now, decoupling the physical limitation of device size from screen size and quality. Most of the hesitation I have is from a practicality perspective; can this be used for the way I work, or do I have to change what I’m doing to suit it?

Obviously the elephant in the room is the social aspect. People have been looking at markings on things ever since the first cave-dwellers realised that you can make a mark on a rock with a stick. Things have progressed slightly since then, but at its core a book or newspaper isn’t that different to a phone or tablet. They’re held in your hand, and you look at them with your eyes. The jump from hand to face is not something I think should be taken lightly.9

  1. These ones! They’re like little extra treats at the bottom of (or hidden within) each post. 

  2. I also read this hilariously negative post on Wired which doesn’t add much new information, but is a fun read. 

  3. Apart from my AirPods, they seem to just disappear and reappear around my apartment and in my bags without me doing anything. 

  4. Maybe in 5 years we’ll all be taking spatial 3D photos, but until we’re all spending all our time in augmented reality, having photos that can be printed at high quality or viewed on a traditional screen will still be common. 

  5. The Sony a7 line of full-frame cameras have had USB 3 and 802.11ac for a few generations now, but they also cost well over twice as much, and I’d guess that most people that use them still read from the SD card directly. 

  6. Not all data needs to be in an off-site backup, and the detritus of shots that didn’t work out is a good example of something that should be backed up but doesn’t require the same level of redundancy as high-quality edited photos. 

  7. A good gauge on how finicky this can be is to imagine you’re in a holiday house and you want to show something on the TV. The only reliable thing to do is bring an HDMI cable and appropriate dongles, and plug in directly. There is no equivalent in the wireless realm yet. 

  8. Well, it’ll take up a different space in the bag, the laptop is a convenient shape for putting in backpacks, a headset less so. Perhaps this means I need a new bag? 

  9. I haven’t really shared my thoughts about this too much, but my general gist is that I think in order to avoid the chances of descending into a cyberpunk hellscape, bringing technology closer to our senses should be done hesitantly. It’s already difficult to exist in society without a smartphone, and using a smartphone makes your activities in the real world more accessible to the online world. Augmenting your vision is allowing software to control your primary way of experiencing the world, and I don’t think I ever want that to be necessary to operate in society.10 

  10. This is obviously not something that will happen in the foreseeable future, but examples that come to mind are shops removing price tags “since it’s visible in AR anyway”, or bus stops and other public markings being removed or left outdated as the source of truth moves into AR. Once AR is ubiquitous enough to basically be required, then your visual experience in the physical world can be used as advertising space. 


DJI Mini 3 Pro

Most camera reviews are pretty decent when it comes to photo and video quality (although for the type of cameras I buy, photo quality is usually an afterthought1). What seems to be left out are the annoying nits and limitations that you only become aware of after using something for a while. I just upgraded from the DJI Mini 2 to the DJI Mini 3 Pro and oh boy do I have nits to share.

A shot of surfers floating over underwater rocks

Shot on the Mini 3 Pro, this one had plenty of room to play with the colours.

Although I would be remiss if I omitted the difference in image quality. The Mini 3 Pro (which I will just call the “Mini 3” from here to save on typing) is substantially better than the Mini 2. The Mini 2 has a 6.3x4.7mm sensor with a fixed f2.8 aperture, whereas the Mini 3 has a 9.7x7.3mm sensor at f1.7. The much larger sensor (71mm² versus 30mm²) and larger aperture mean the Mini 3 can shoot with a lower ISO and at a faster shutter speed.

Looking at some of the photos I’ve taken at the beach at the same time of day, the Mini 2 shot at 1/1250s and the Mini 3 shot at 1/2500s. I’ve seriously considered whether I should get an ND filter to let me slow the shutter speed down—that’s just how fast it shoots. This gives me a lot of confidence that I can keep the ISO really low during sunset, sunrise, or on an overcast day (since flying at night in Australia is a no-no).

I don’t really understand enough about cameras to accurately explain why you can edit the Mini 3 photos more. There’s more dynamic range? More bits? Deeper bits? Something for me to understand another time. The end result is that editing the Mini 2 photos feels like trying to sculpt almost-dry clay. You can’t really make substantial changes, and if you try too hard you’ll end up breaking something. On the other end of the spectrum is raw files from the a6500, which can be edited like modelling clay. The Mini 3 isn’t nearly as flexible as that, but it’s substantially better than the 2. I can actually recover some of the highlights, and I don’t have to discard photos because the sky is completely blown out.

A photo of an ocean pool with a wave breaking towards it

Shot on the Mini 2, it’s challenging to get the exposure of the waves right without making the whole image too dark. This is easier in the Mini 3, but still easy to get wrong.

On the Mini 2 I would shoot everything with AEB, so I’d get three photos at different exposures. If I messed the exposure up I could use the brighter or darker exposed version (or both). I have been using this with the Mini 3 on occasion, but have found that the range of a single shot is good enough. The trick seems to be to underexpose quite significantly (I’ve been shooting at between -1EV to -2EV during bright days), which retains highlight detail while still keeping good detail in the shadows.

I haven’t used the 48MP mode enough to have a good feeling of when it makes sense to use it. It would be left on all the time if it didn’t take significantly longer to capture a photo compared to the single-shot mode (since low-light isn’t really possible in Australia). The photos do have an impressive amount of detail, so my thinking is that I’ll only turn it on when I’m taking a shot that could look good printed out really big, or needs to be cropped significantly.

The #1 reason to upgrade to the Mini 3 is actually the improved experience of capturing a top-down panorama. Let me explain.

If you want to capture something from higher up than you’re allowed to fly (the legal limit is 120m), you can capture multiple top-down shots and stitch them together, creating the illusion of taking a photo from a much higher altitude. To do this you point the camera straight down, take a photo, fly the drone forwards, take another photo, and repeat in a grid.

When the drone is stationary, the camera points directly down. When it flies forward, the drone body tilts forward and the camera gimbal tilts upwards to compensate—keeping it pointed down. When the drone stops (especially if it stops suddenly) the body tilts backwards to counteract its forward momentum. The gimbal tries to keep the camera pointed down, but the gimbal on the Mini 2 doesn’t rotate far enough to compensate for the backwards angle of the drone. The result is that the camera view appears to be “kicked” upwards whenever the drone comes to a stop, and you have to adjust it back down before taking each shot.

This could be worked around in software on the Mini 2—once the braking move is complete, readjust the gimbal to the position it was in while the drone was in motion. Or you could just re-engineer the next version of the drone to allow the gimbal to rotate far further backwards, allowing it to stay pointing downwards no matter what the drone is doing.

A tasteful topdown drone shot from Newport beach

Shot on the Mini 3 Pro, it’s not too hard to play with the colour from the raw image, and the detail of the footprints in the sand is impressive.

In reality the best thing about the Mini 3 is having true vertical shooting. I don’t know if I’m just a slave to the Instagram 4:5 format or if there’s some other explanation, but I love shooting vertically on the Mini 3. The perspective of a linear feature stretching off into the distance is probably my second favourite drone angle after a top-down shot. With the Mini 2 I would have to throw away almost half of the pixels to get this perspective, but now I can get it for free.

The Rakaia river stretches off into the distance in a low-quality Mini 2 photo

Shot on Mini 2, getting a vertical composition means cropping out the left and right sides of the image.

The next nit is capture speed. The Mini 2 would pause for what seemed like an eternity while it took a photo, and the entire interface would be locked out—including the video feed. The Mini 3 is substantially better: the video feed only drops for a moment and you’re able to see where you’re going while it saves the photo. This is probably the biggest quality-of-life issue I have with the Mini 3; I’d like to be able to just fly around and take photos without having to pause while it reads all the pixels off the sensor.

It hasn’t been too windy around here so I haven’t been able to see how wind-hardy the Mini 3 is, but the Mini 2 would rarely get too bothered by the wind. It would occasionally complain that there was too much wind for it to get home automatically, but you could always keep going and just hope you could fly it home yourself. Probably the most stressed I’ve been while flying a drone was after taking this picture (below), doing another flight further up the valley on the same battery and flying downwind, and then having to fly back against the wind above a huge drop-off with the battery close to running out. Don’t do that, it’s a bad idea. Although the Mini 2 handled the wind like a champ, and I am assured that the Mini 3 is even more capable.

The view down Otira gorge

Shot on Mini 2, you don’t want your battery to run out while over something like this.

My impression of the controller from watching reviews is that the DJI RC—the one with the built-in screen—is completely life-changing and revolutionises the drone-flying experience. I wasn’t convinced, and I only got it because there was a good second-hand price.

I remain unconvinced after using it.

The controller is really good: it’s well-built and comfortable to hold, the buttons are positioned well, and the screen is bright enough to see in sunlight. The shutter button has a two-step press which allows you to focus before you shoot, and there are more customisable buttons than on the standard RC-N1, which let you control the portrait-mode camera and suchlike. These are mostly minor benefits, and not something I’d recommend most people spend the extra money on.

The most common claim that reviewers make is that it’s much faster and more convenient to not need a phone to use the drone. Naturally I conducted a test; I timed how long it took for me to set up the Mini 2 and Mini 3. Starting out with the drone and controller out of my bag, I had to unfurl the drone arms, remove the gimbal cover, power the drone and controller on, and connect the phone. The test ended when the drone sent a video signal back to the controller. It took 45 seconds for me to set up the Mini 3, and a minute to set up the Mini 2 with a phone.

That’s 15 seconds of additional fiddling, which is annoying. But it’s only 15 seconds. In both of these tests I was just setting up the drone at my normal pace, I wasn’t trying to be particularly quick—I could probably make both a little faster by working out which thing to turn on first to ensure the controller is booted up and the drone has a GPS connection as fast as possible.

The biggest disadvantage of the DJI RC is that it is much more delicate than the standard controller. The standard controller is built like a robust game controller, it’s hefty and doesn’t have anything that can break easily. When travelling I keep it right in the bottom of my bag alongside the drone batteries. You could drop it on concrete and it would almost certainly be fine. The DJI RC on the other hand has a large screen right on the front. I keep it in a little microfibre bag in case the drone scratches it. I don’t know what I’ll do when I travel with it; maybe I can put it in the bottom of my bag with the screen facing upwards? (It’s a Peak Design Everyday Zip so it’s very well padded anyway)

Oh and the thumb sticks! Both controllers have thumb sticks that screw into the controller, but what no one told me was that the screw threading is much worse on the DJI RC. On the standard controller, the thumb stick is the bolt and the controller has holes. The thread is quite large, and it’s easy to screw them in even if your hands are cold. On the DJI RC, the hole is on the bottom of the thumb stick itself, and the thread is tiny which makes it much more fiddly to put on. I’m now seriously considering whether I could get some custom low-profile thumb sticks made that can stay on the controller permanently.

The Mini 3 Pro is an impressive jump from the Mini 2, which brings my drone photos much closer in quality to photos from my real camera. You can see both types of photos on my photos website, Pixelfed, or Instagram.

  1. The Sony a6X00 line of cameras are incredibly popular for video, and so it was really hard to find useful information comparing photos when I upgraded from the a6000 to the a6500. 


pod, the container manager

I’ve been working on a project to make development using containers easier (specifically Podman), to remove dependency conflicts, and make it easier to run applications on other servers.

The project is called pod, you can learn more at pod.willhbr.net or willhbr/pod on GitHub. It’s a wrapper around the podman command-line tool, with the aim of reducing the amount of boilerplate commands you have to type.

Local versions of both this website and my photos website have been using pod for a while. This has made it really easy to run a server while I’ve been making changes, as well as allowing me to easily daemonise the server and have it continue to run in the background.

At its core, pod is a YAML file that configures the arguments to a Podman command. Most commands will map one-to-one. The simplest example is something like:

# pods.yaml
containers:
  alpine-shell:
    name: pod-alpine-example
    image: docker.io/library/alpine:latest
    interactive: yes
    args:
      - sh

This defines an interactive container that runs an Alpine Linux shell. You can start it with pod run.
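
For reference, that is roughly the same as running Podman by hand. This is my guess at the equivalent invocation; the exact flags that pod generates may differ:

$ podman run --interactive --tty --name pod-alpine-example \
    docker.io/library/alpine:latest sh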

Where pod really shines is configuring a setup for a development server, and a production server. As I talked about in my previous blog post, having a development container that mounts the source code from the host machine speeds up development massively. The server behind my photo publishing system follows this pattern, with this config:

defaults:
  build: dev
  run: dev
  update: prod

images:
  dev:
    tag: pixelfed-piper:dev-latest
    from: Containerfile.dev
  prod:
    tag: pixelfed-piper:prod-latest
    from: Containerfile.prod

flags: &default-flags
  feed_url: https://pixelfed.nz/users/willhbr.atom
  repo_url: git@github.com:willhbr/sturdy-guacamole.git

containers:
  dev:
    name: pixelfed-piper-dev
    image: pixelfed-piper:dev-latest
    interactive: true
    autoremove: true
    bind_mounts:
      src: /src/src
    ports:
      4201: 80
    flags:
      <<: *default-flags

  prod:
    name: pixelfed-piper
    image: pixelfed-piper:prod-latest
    interactive: false
    ports:
      4200: 80
    flags:
      <<: *default-flags
      check_period: 54m

When I’m ready to deploy a change, I can build a production image with pod build prod—which will make a release Crystal build—and then start a container on a server using that image.

The second half of pod is a simple updating system. It will look at the containers running on your server, match their config against the config in pods.yaml, and update any containers that have changed. So instead of having to stop and start the prod container myself, I can just run:

$ pod update --diff prod

Which will show the difference and then update the running containers to match the intent. pod fully supports podman-remote, so it can handle containers running on a different machine just as easily as it can handle those running locally.
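
podman-remote just needs a connection configured on the client first, something like this (the name, user, host, and socket path are placeholders):

$ podman system connection add my-server \
    ssh://will@my-server.local/run/user/1000/podman/podman.sock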

I’m super happy with what pod is able to do, and plan on using it to manage building and running any container I use. You can find it on GitHub, the project website, or read my previous post explaining some more of the backstory.


Overcoming a Fear of Containerisation

I was first introduced to containers at a Docker workshop during my first software engineering internship. The idea was enticing: the ability to package up your application’s configuration in a standard way, and run that on a server without having to first go through manually installing dependencies and adjusting configuration. This was while I was still deep in Ruby on Rails development, so setting up servers with things like Puma and Unicorn was all too familiar.

However I never really managed to live the containerised dream. The Docker CLI was clunky (you’ve either got to write out the tag for your image every time, or copy-paste the image ID), and I couldn’t find much information on how to deploy a Rails application using docker without going all the way to Kubernetes.

There were tonnes of blog posts that described how to use containers for development, but then said that you should just deploy the old-fashioned way—this was no good! What’s the point of using containers if you still have to toil away compiling extensions into nginx?

Another questionable practice I saw was people using one Dockerfile for development and one for production. To me this seemed to be against the whole point of Docker—your development environment is supposed to match production, having two different configs defeats the whole purpose.

Fast forward to earlier this year, when I decided to have a look at Podman and came to understand more about the tradeoffs in designing a good Containerfile. What I realised was that having one Containerfile is a non-goal. You don’t need to have your development environment match production perfectly. In fact you want to have things like debug symbols, live reloading, and error pages, so the two are never going to be the same anyway.

I shifted my mind from “one config that deploys everywhere” to “multiple configs that deploy anywhere”. Instead of having one Containerfile I’d have multiple, but be able to run any of them in any context. If there’s a problem only appearing in the “production” image, then you should be able to run a container from that image locally and reproduce the issue. It might not be as nice of a development experience, but it’ll work.


So then we get really deep into the land of designing effective Containerfiles. Let me take you on a journey.

We’ll start out with a simple Ruby program:

# main.rb
puts "I'm a simple program"

And we’ll make a fully productionised1 Containerfile for it:

FROM ruby:latest
WORKDIR /src
COPY Gemfile .
RUN bundle install
COPY main.rb .
ENTRYPOINT ["ruby", "main.rb"]

Our development iteration then goes something like:

  1. Make a change to main.rb
  2. Build a new image: podman build -t test .
  3. Run the image: podman run -it --rm test:latest
  4. Observe the results, and go back to 1

Building the image takes a few seconds, and that’s without any dependencies and only one source file. If we’re not careful about the ordering of our commands in the containerfile, we can end up with a really slow build. And we have to do that every time we want to run the container! We’ve just taken the fast iteration of an interpreted language and made it as slow as a compiled one.

This is the point where I had previously lost interest in containers: it seemed like a very robust way to slow down development for the sake of uniformity. However, if we allow ourselves to have multiple images, we can significantly improve our iteration speed.

The key is to use the development image as a bag to hold all of our dependencies, but not our source code. It has all the pieces the application needs to run (a compiler/interpreter and all our libraries) but none of the source code.

We then use a bind mount to mount the source code to the container when we run it—which stops us having to re-build the image every time we make a change to our source files. Development looks something like this now:

  1. Make a change to main.rb
  2. Run the development image:2
    podman run --mount=type=bind,src=.,dst=/src -it --rm test:latest
    
  3. Observe results

Starting a container takes barely any longer than starting a new process, so by omitting the build step we’re working at the same speed as if Ruby were running directly on the host. We only need to do the slow re-build if our dependencies change.

When it comes time to deploy our amazing script, we can use a more naive containerfile that copies the source code into the image—build time doesn’t matter nearly as much here.

Since I’m writing in Crystal most of the time, I’ve ended up with a Crystal containerfile that I’m pretty happy with:

FROM docker.io/crystallang/crystal:latest-alpine
WORKDIR /src
COPY shard.yml .
RUN shards install
ENTRYPOINT ["shards", "run", "--error-trace", "--"]

This installs the dependencies into the image, and sets the entrypoint so that arguments will be passed through to our program, instead of being interpreted by shards. The source files are mounted into the container in the same way as with the Ruby example.
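As a quick sketch of that passthrough (crystal-dev is just an illustrative tag here, and --verbose a made-up flag for the program being run), any arguments after the image name land after the trailing --:

$ podman run --mount=type=bind,src=.,dst=/src -it --rm crystal-dev --verbose
# inside the container this expands to: shards run --error-trace -- --verbose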

I noticed that builds were always a little slower than I would expect, and remembered that Crystal caches some build artefacts, which would get thrown away when the container exited. So I mounted a folder from the host to ~/.cache/crystal in the container, so that the cache would be persisted across invocations of the container. Doing this sped up the builds to be in line with running the compiler directly.
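In practice the run command just grows one more mount. A rough sketch, where the .crystal-cache host directory name and the crystal-dev tag are placeholders of my own, and the in-container path assumes Crystal’s default cache location for the root user:

$ mkdir -p .crystal-cache
$ podman run \
    --mount=type=bind,src=.,dst=/src \
    --mount=type=bind,src=./.crystal-cache,dst=/root/.cache/crystal \
    -it --rm crystal-dev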

This frees me up to have a fairly involved “production” containerfile, optimising for a small final image:

FROM docker.io/crystallang/crystal:latest-alpine AS builder
WORKDIR /src
COPY shard.yml .
RUN shards install
COPY src ./src
RUN shards build --error-trace --release --progress --static

FROM docker.io/alpine:latest
COPY --from=builder /src/bin/my-project /bin/my-project
ENTRYPOINT ["/bin/my-project"]

Living the multi-image lifestyle has meant that I can use containers to run any one of my projects (including when I run this website locally to make changes) in the same way, without a major impact on the development experience.

These commands are quite long, though, and I can’t type that fast or remember all those flags. So I made a command-line tool that makes dealing with multiple images or containers easier. That’s actually what I have been using to do my development, and to run projects on my home server. You can read more about it:

The tl;dr is that with some fairly simple config, I can run any project with just:

$ pod run

Which runs a container with all the right options, even simpler than using shards run.

  1. My productionised Containerfile might not match your standards of productionisation. This is for tinkering on in my free time, so corners are cut. 

  2. Wow that’s a long command, if only we didn’t have to type that every time! 


Photomator for MacOS

Since moving my photo editing to MacOS just over two years ago, I have been using Pixelmator Pro as my photo editor of choice. The move from Pixelmator Photo was easy—the editing controls are the same, and I’m generally familiar with layer-based image editors from a misspent youth using the GIMP.

However, the workflow using Pixelmator Pro with Apple Photos was not ideal—it works as a photo editor extension, so you need to first enter the standard edit mode in Photos, and then open the Pixelmator Pro extension. Once you’re done with your edits you need to save first in Pixelmator and then once again in Apple Photos. While this is by no means a dealbreaker—I’ve been doing this for over two years—it is clunky. On the occasions I deviate from landscape photography and take photos of people, I typically have many photos that require a little bit of editing, rather than a few photos that require a lot of editing. This is where the Pixelmator Pro workflow really falls down.

So of course Photomator for MacOS is the natural solution to my photo-editing problems. It’s been out for just over a week now, and I had been using the beta for a few weeks before the release.

Just like its iOS counterpart, Photomator provides its own view into your photo library, along with the familiar editing interface that is shared with Pixelmator Pro. The key improvement is that you can immediately jump from the library into editing a photo with just a single keypress, since there are no Photos extension limitations at play here. I’d say this saves a good 5 seconds of waiting and clicking on menus per image. It also makes me more likely to try out editing a photo to see what an adjustment looks like, since I don’t have to navigate through any sub-menus to get there.

Previously

My workflow with Pixelmator Pro was fairly simple—I’d import photos into Photos, creating an album for each photo “excursion” I went on. I would flick through the album a few times, favouriting the ones that stood out (in Photos pressing “.” will toggle favourite on a photo).

I’d then switch over to the Favourites view, and for each photo I’d open the edit view, open “Edit with” > “Pixelmator Pro”, and then actually do the editing. After editing I’d click “Done” in the Pixelmator extension, and “Done” again in the Photos edit interface.

Since the extension is the full Pixelmator Pro, you have full layer control and the ability to import other images. So if I’m stacking photos, I would just add a new layer using the Pixelmator photo picker. This is the quickest way of editing multiple photos as layers while staying inside the Photos system (i.e. having your edits referenced to a photo in the photo library).

If I need to create a panorama or stack stars for astrophotography, I’d export the originals to the filesystem and import them into Panorama Stitcher or Starry Landscape Stacker1, and then re-import the result.

Currently

With Photomator, this workflow hasn’t changed that drastically. The main difference is that I don’t have to do multiple clicks to get to the Pixelmator Pro editing interface.

I start out the same way by importing from an SD card into Photos (I could do this in Photomator, but I don’t see a benefit currently). In the album of imported photos I flick through my photos, favouriting the ones worth editing. This is still done in Photos as Photomator has a noticeable (about 400ms) delay between showing a photo and rendering an appropriate-quality version. This is distracting if you’re trying to go quickly, so I stick to doing this part in Photos.

Next I go through the favourites in Photomator (the delay doesn’t matter here as every photo is worth an edit) and apply basic edits. If something requires adjustments that Photomator doesn’t support (basically anything with multiple image layers, like an exposure bracket, or other multi-photo blend) then I’ll go back to Photos and open the Pixelmator Pro extension to make the changes.

With time, I’m sure the shortcomings in Photomator will be patched up, and I’ll be able to simplify my workflow.

Ideally I would import straight into Photomator—perhaps through a Shortcut or other piece of automation to filter out unwanted JPEGs2—and then triage the photos in the Photomator interface. I could then work through my edit-worthy photos, applying quick adjustments and crops right there.

Anything that requires more tweaks could be seamlessly opened in Pixelmator Pro with a reference back to the original image in Photos. When I save the edit in Pixelmator Pro, the original image should be modified with my edits. If I re-open the image in Photomator, it should know that it was edited in Pixelmator Pro and use that as the editing interface.

I could use Photomator full-time without smart albums, but they are such a powerful feature in Photos for keeping things organised that I would almost certainly go back to Photos to use them. A quick search seems to show that NSPredicate supports arbitrary expressions, so it doesn’t seem like there’s an API limitation that prevents Photomator from doing this.

We’re definitely still in the early days of Photomator on MacOS. I’ve had a few crashes (no data loss, thankfully), and there are a few odd behaviours and edge cases that need to be tidied up (the most annoying is that images exported and shared via AirDrop lose their metadata). The team is responsive to feedback and support emails, so I’m confident that this feedback is heard.

So after using the beta for a few weeks, I ended up buying the lifetime unlock (discounted) as soon as the first public release was out. I have edited thousands of photos in Pixelmator Pro on MacOS and Pixelmator Photo on iPadOS, and am quite happy to pay for a more convenient and focussed version of the tool that I’m most familiar with.

Photomator would be my recommendation to anyone who wants something more powerful than the built-in editing tools in Photos, as long as they’re not likely to need the layer-based editing currently only offered by Pixelmator Pro. The pricing is a bit weird, though: a yearly Photomator subscription costs about the same as buying Pixelmator Pro outright. I wouldn’t be surprised if Pixelmator Pro becomes a subscription soon.


A side note that I can’t fit elsewhere: Nick Heer (Pixel Envy) wrote:

For example, the Repair tool is shown to fully remove a foreground silhouette covering about a quarter of the image area. On one image, I was able to easily and seamlessly remove a sign and some bollards from the side of the road. But, in another, the edge of a parked car was always patched with grass instead of the sidewalk and kerb edge.

The Pixelmator folks definitely lean into the ML-powered tools for marketing, but I generally agree with Nick that they don’t work as flawlessly as advertised—at least without some additional effort. You can get very good results by combining the repair and clone tools to guide it to what you want, but expecting to be able to seamlessly remove large objects is unrealistic.

I also found the machine learning-powered cropping tool produced lacklustre results, and the automatic straightening feature only worked well about a quarter of the time. But, as these are merely suggestions, it makes for an effectively no-lose situation: if the automatic repair or cropping works perfectly, it means less work; if neither are effective, you have wasted only a few seconds before proceeding manually.

This is the real key: the ML tools can be a great starting point. They have a very low cost to try (they typically take less than a second to compute on my M1 MacBook), and they’re easy to adjust or revert if they do the wrong thing. Almost all edits that I make start with an ML adjustment to correct the exposure and white balance—and it often does a decent job.

  1. Starry Landscape Stacker does what it says on the tin, but it is also an example of software that requires some significant UX consideration before anyone would enjoy using it. 

  2. Consumer-level DJI drones can shoot in raw, but they can’t shoot only in raw. They will shoot raw+JPEG, so you’re forced to fill your SD card with JPEGs just to delete them before importing the raw files to your computer. 


Complicated Solutions to Photo Publishing

As previously discussed, there have been some challenges keeping the photos on my website up to date. The key constraint here is my insistence on using Jekyll for the website (rather than something with a web-based CMS) and wanting somewhat-efficient photo compression (serving 20MB photos is frowned upon). Obviously I considered writing my own CMS for Jekyll with a web interface that I could access from my phone—this seemed like the natural thing to do—but I quickly realised this would spiral into a huge amount of work.

My intermediate idea was absolutely brilliant—but not very practical, which is why it’s the intermediate idea. The key problem that I had before was that Shortcuts is cursed and every second I spend dragging actions around takes days off my life expectancy due to the increase in stress. The resizing and recompression would have to happen on a Real Computer™. Thankfully I have a few of those.

Something I didn’t mention in my previous blog post was that there was another bug in Shortcuts that made this whole situation more frustrating. The “Convert Image” action would convert an image to the output format (e.g. JPEG), but it would also resize the output file to be about 480px wide. This is what finally broke my will and made me give up on Shortcuts. If I can’t trust any of the actions to do what they say, and instead have to manually verify that they’re doing what they say after every software update… I might as well just do the actions myself.

Speaking of OS updates breaking how images are handled: MacOS 13.3.1 broke rendering of DJI raw files, which stopped me from editing drone photos. This is thankfully now fixed in 13.4.

So the main challenge was how to get the images from my phone to a real computer that could do the conversion and resizing. My brilliant solution was to use a git repository. Not the existing GitHub Pages repo, a second secret repository!

On my phone I would commit a photo—at full resolution—to the secret repo and write the caption and other metadata in the commit message. This would be pushed to a repo hosted on one of my computers, and a post-receive hook would convert the images, add them to the actual git repo, and write out a markdown file with the caption. Truly the epitome of reliability.
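Concretely, the hook would have been something like this rough sketch (every path, branch name, and conversion setting here is a guess on my part):

#!/bin/sh
# post-receive sketch: convert a pushed full-resolution photo into a website post
# (a real hook would read the updated refs from stdin; this just grabs the tip of main)
set -e
SITE=/home/me/website            # local checkout of the GitHub Pages repo
TMP=$(mktemp -d)

git --work-tree="$TMP" checkout -f main      # unpack the pushed photo(s)
caption=$(git log -1 --pretty=%B)            # commit message doubles as the caption

for photo in "$TMP"/*.jpg; do
  convert "$photo" -resize '2048x2048>' -strip -quality 85 "$SITE/photos/$(basename "$photo")"
done

printf '%s\n' "$caption" > "$SITE/_posts/$(date +%F)-photo.md"
(cd "$SITE" && git add . && git commit -m "Add photo" && git push)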

Thankfully, I never actually used this house of cards. I ended up signing up to Pixelfed (you can follow me!), which has a decent app for uploading photos with a caption. Being a good web citizen, Pixelfed publishes an RSS feed of user posts. So all I have to do is read the feed, download the images, copy them over to my website, and publish them.

Naturally the program is written in Crystal (and naturally I came across a serious bug in the stdlib XML parser). It checks if there are any posts in Pixelfed that aren’t already on the website, downloads the photos, uses the ImageMagick CLI (which I think can do just about anything) to resize, strip metadata, and re-encode them, and then commits those to a checkout of the GH pages repository.
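Per image, the ImageMagick step boils down to something like this; the target size and quality here are placeholders rather than the exact values I use:

# shrink only if wider/taller than 2048px, drop metadata, re-encode at quality 85
$ convert original.jpg -resize '2048x2048>' -strip -quality 85 resized.jpg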

This was running via cron on my home server for a while, but I’ve recently containerised it for a bit of additional portability. It does still need access to my SSH keys so it can push the repo as me, since that was just much easier than working out the right incantations to get GitHub to give me a key just for writing to this one repo.
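The SSH access is just another bind mount into the container. A minimal sketch, assuming the container runs as root and the image already trusts GitHub’s host key (pixelfed-sync is a made-up image name):

$ podman run --rm \
    --mount=type=bind,src=$HOME/.ssh,dst=/root/.ssh,ro=true \
    pixelfed-sync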

The biggest drawback of this solution is that images on Pixelfed (or at least pixelfed.nz, the instance that I’m on) are only 1024px wide, which is just a bit narrower than a normal-sized iPhone screen, so the images don’t look amazing.

To be honest, now that I’ve gone through all this effort and have a container running a web server at all times… I might as well just make an endpoint that accepts an image and commits it to the repo for me.

Shortcuts can somewhat reliably send HTTP requests, so it’s just a matter of base64-ing the image (so you don’t have to deal with HTML form formats and whatnot), making a cursed multi-megabyte JSON request, and having the server run the exact same resizing logic on the image it receives.
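From a shell, that request looks roughly like the following; the endpoint, the caption text, and the JSON field names are all invented for the sketch, and the Shortcut just builds the same shape of request:

# base64 the photo, wrap it in JSON, and POST it to the server
$ printf '{"caption":"%s","image":"%s"}' "Sydney skyline" "$(base64 < photo.jpg | tr -d '\n')" \
    | curl -X POST https://example.com/photos \
        -H 'Content-Type: application/json' --data-binary @-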

So if you look at my photo website, you should see that some recent photos are a bit higher quality now:

A drone photo of the Sydney city skyline

You might even be able to zoom in and spot me somewhere in that photo!


DJI DNG Rendering Broken on Ventura

As previously mentioned I use my M1 MacBook Air to edit photos, which I post on my website and on Instagram. This past weekend I went to the beach and flew my drone (a wee DJI Mini 2) around, and got some nice pictures of the rocks, sand and the stunningly clear water.

Well, I thought they were good until I got home and looked at them on my laptop—every one of them had a horrible grid of coloured blobs overlaid on it, which made them basically unsalvageable. This is not something I’d come across before with my drone, so it was definitely a surprise. Naturally I started debugging.

A photo from my DJI Mini 2 with coloured splotches over it in a grid

My first thought was that it was glare from the polarising filter that I usually have attached; however, it was present on all the photos—not just the ones facing towards the sun. Nevertheless I powered the drone up and took some shots without the polariser on. My next thought was that there was a bad software update, and that another software update would fix the issue. There was an update available, so I applied that and took a few more test shots.

When I had the images loaded onto my laptop I could see that photos without the polariser, and even with the software update, still had the issue. JPEGs were unaffected, so this was just a raw image problem. Very strange. Thankfully I have plenty of other images from my drone in similar situations, so I could compare and see if maybe I was missing something. There weren’t any issues with any of my old photos, but then I remembered that Photos is probably caching a decoded preview, rather than reading the raw file every time. That meant that if I exported a DNG file and tried to preview it, it should fail.

Gotcha! It’s a bug in MacOS! If I export any raw file from my drone and preview it on Ventura, it renders with terrible RGB splotches in a grid all over it. The silver lining is that the photos I took at the beach are still intact—I just can’t do anything with them right now.

I wondered if other DNG files have the same issue, so I took a photo with Halide on my iPhone and downloaded a DJI Mini 3 Pro sample image from DPReview. The iPhone photo rendered fine, and the Mini 3 photo was even more broken than my photos:

Sample photo from Mini 3 Pro with brightly coloured vertical lines all the way across the image

Naturally the next thing to do is to try and work out how widespread the issue is while chatting with support to see if they can tell me when it might be fixed. I only managed to work out that my old laptop (Big Sur) has no issues, and that there are some forum posts if you already know what to search for (“DNG broken MacOS” didn’t get me many relevant results at the start of this escapade).

Support says that I just need to wait until there’s a software update that fixes it. So no drone photos until then.

Update: MacOS 13.4 appears to have fixed this issue.