Every so often, I get pulled into a discussion about how to identify a great systems programmer, mostly because I hang out with other systems programmers and we’re evaluating a candidate for a job opening. Then the usual discussion about interview questions and projects emerges. In fact, there is a Quora question that I answered on the topic. The conversation usually devolves into the ability to understand things like the hardware-software interface, kernel internals, asynchronous behavior, etc.
Just recently, at a meeting at Juniper, it struck me that we never talk about the truly rare skill of systems architecture. And more importantly, how do you find and recognize that skill?
So what is systems architecture?
Systems architecture is the ability to understand the abstract system architecture of the problem, understand what kind of hardware options exist and then define a software architecture that is able to exploit the hardware in ways that add tremendous business value.
System architecture is what takes something on the left of the picture and then defines something on the right.
Mouthful, ain’t it?
So let’s break this up a little bit.
Abstract system architecture
When you consider a system of any kind, there exists a way to describe that system that is decoupled from any implementation yet at the same time is readily recognized by experts in the art.
Let’s consider something like a file system. A trivial file system has a block virtualization layer that maps logical blocks to physical blocks, a way to organize virtual disk blocks into containers (which can in turn be organized into various interesting hierarchies), a mechanism for writing blocks to disk, and a mechanism for reading blocks from disk into memory.
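To make that description concrete, here is a minimal sketch of those pieces in Python. Everything in it (the class name `ToyFileSystem`, the method names, the 16-byte block size) is made up for illustration, and it glosses over exactly the kind of stuff an expert would complain about: there is no overwrite handling, no freeing of blocks, and no persistence.

```python
class ToyFileSystem:
    """Toy model of the pieces described above; all names are hypothetical."""

    def __init__(self, num_physical_blocks, block_size=16):
        self.block_size = block_size
        # The "disk": a flat array of fixed-size physical blocks.
        self.disk = [bytes(block_size)] * num_physical_blocks
        self.free = list(range(num_physical_blocks))
        # Block virtualization: logical block number -> physical block number.
        self.block_map = {}
        self.next_logical = 0
        # Containers (files): (directory, name) -> (logical blocks, byte length).
        self.files = {}
        # Hierarchy: directory -> set of file names it contains.
        self.dirs = {"/": set()}

    def write_file(self, directory, name, data):
        """Split data into blocks, place each on disk, and record the mapping."""
        logical_blocks = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            physical = self.free.pop()  # raises IndexError when the disk is full
            self.disk[physical] = chunk.ljust(self.block_size, b"\0")
            logical = self.next_logical
            self.next_logical += 1
            self.block_map[logical] = physical
            logical_blocks.append(logical)
        self.files[(directory, name)] = (logical_blocks, len(data))
        self.dirs.setdefault(directory, set()).add(name)

    def read_file(self, directory, name):
        """Follow logical -> physical mappings and reassemble the bytes."""
        logical_blocks, length = self.files[(directory, name)]
        raw = b"".join(self.disk[self.block_map[lb]] for lb in logical_blocks)
        return raw[:length]
```

A 27-byte write with 16-byte blocks consumes two physical blocks and two map entries, and reading it back goes through the map rather than touching physical block numbers directly.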
Immediately, anyone who is expert in the art will point out a shit-load of stuff that I glossed over. That discussion is important, because system architecture is about agreeing on which important things are always there and, more intriguingly, which things that used to be important can be dropped.
But beyond just being able to articulate the abstract system architecture, you also have to have a keen insight into the relative computational complexity of the pieces. For example, how much memory and CPU does maintaining a map consume versus doing a read or write?
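As a rough illustration of the kind of back-of-the-envelope reasoning this takes, the sketch below estimates the memory cost of a flat block map. The function name and the figures (1 TiB of storage, 4 KiB blocks, 8-byte map entries) are assumptions picked for easy arithmetic, not numbers from any real system.

```python
def map_memory_bytes(capacity_bytes, block_size_bytes, entry_bytes):
    """Memory needed for a flat map with one entry per block (hypothetical model)."""
    num_blocks = capacity_bytes // block_size_bytes
    return num_blocks * entry_bytes


TIB = 2**40
GIB = 2**30

# Assumed figures for illustration only.
size = map_memory_bytes(1 * TIB, 4096, 8)
print(size / GIB)  # 2.0
```

Under those assumptions the map alone costs 2 GiB of memory per TiB of storage, which is the sort of ratio that decides whether the map lives in memory or has to be paged.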
And beyond the computational complexity, understanding which pieces must be very robust and which can be less robust is also important. For example, an individual block is unimportant unless it contains map information.
Understand what kind of hardware options exist
Given an abstract system architecture, the next question is how to manifest that model. The truth is that systems architecture is the ultimate wrong-tool problem. The perfect abstract system cannot be implemented without huge compromises to business value, whether in performance, cost, or availability.
For example, in the case of file systems, there is the choice of processor, memory, and what kind of storage you will use. The more CPU and memory you have, the more computation you can do per IO and the less IO you need from the underlying physical sub-system. The faster the physical storage system you have, the more performance you need out of the CPU, because you have less time to do work on each IO.
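The second half of that tradeoff is easy to see with a little arithmetic. The sketch below computes how many CPU cycles you get per IO if a single core has to keep a device saturated. The function name and every number in it (a 3 GHz core, a slow device at 200 IO/s, a fast one at 500,000 IO/s) are hypothetical, chosen only to show the shape of the problem.

```python
def cpu_cycles_per_io(cpu_hz, ios_per_second):
    """Cycles one core can spend on each IO while keeping the device busy."""
    return cpu_hz // ios_per_second


CPU_HZ = 3_000_000_000  # assumed: one 3 GHz core

# Assumed device speeds, for illustration only.
slow = cpu_cycles_per_io(CPU_HZ, 200)      # 15,000,000 cycles per IO
fast = cpu_cycles_per_io(CPU_HZ, 500_000)  # 6,000 cycles per IO

print(slow // fast)  # 2500
```

With these made-up numbers the faster device leaves the software 2,500 times fewer cycles per IO, so work that was free on the slow device now has to be designed out or spread across more cores.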
Understanding the tradeoffs and trends is really important, as is understanding the different kinds of options within a category.
What’s not important is understanding the exact details until you get to actually building a specific instance of your architecture.
Knowing that CPUs of type A perform 3x better than those of type B, and knowing the projected performance curve over the next 5 years (from vendor roadmaps), is crucial, especially if type A has a different architecture than type B, with different tradeoffs in how you write software.
Knowing the ratio of performance between disk and memory is important.
And knowing how all of these ratios work with each other is also really important.
Add tremendous business value
So I am jumping ahead because this is the most peculiar statement. The thing about abstract systems architectures is that they can be viewed as an end goal in and of themselves. And there is a lot of value in pursuing that research and understanding.
In my mind, systems is really about building something in the here and now, and something that is built in the here and now needs to add value in some material way. And, for better or for worse, I use the term business value to define what material means.
Perhaps more prosaically, a better definition is that systems architectures that are interesting have to deliver better top-line performance that is sustainable, better availability, or better price/performance, where price can now include power consumption.
Define a software architecture
Most software architectures look like a collection of boxes that have arrows that point to each other.
Fundamentally, a systems software architecture is a decomposition of a software system that maps to how the hardware can be taken advantage of to deliver exceptional business value. A systems software architecture will not, almost by definition, look like the abstract system architecture, because implementation in the real world requires tradeoffs to deliver value.
This kind of decomposition, in my mind, is the essential difference between systems architecture and user application programming. Systems architecture considers how the hardware behaves and decomposes the software to take advantage of the hardware capabilities. User application programming considers how people behave and decomposes the software around that axis.
So why is this important?
Massive revenue and value opportunities exist when you are able to take an abstract system architecture, and then define a software architecture that leverages new hardware that allows you to get a 100x improvement along some kind of business value axis.
NetApp, back in the day, is an example of such an opportunity. Hitz and others were able to see a huge opportunity around RAID and were able to articulate a software architecture that exploited RAID in a unique way that then delivered massive business value.
Not every systems architecture is that valuable, mind you, but some are.
And this skill of defining architecture applies whether you’re building an OS, a radically new kind of PHP compiler, or a cloud application. It is, in my mind, not a skill we spend enough time defining, examining, and interviewing for.