Very early in my career, I read Paul Graham's essay "Beating the Averages" (https://www.paulgraham.com/avg.html), which argued that the technology you choose matters. He observed that choosing the right technology could put you on a profoundly different curve. I have looked for such examples ever since; they create permanent curves, not transitory ones.
Years ago, while at Juniper, I investigated ARM processors as replacements for x86 processors in high-performance tasks.
I consulted hardware architects and CPU experts, who agreed that ARM consumed less power due to its limited features and said that as its capabilities expanded, power consumption would increase.
This was reassuring until I spoke with a kernel expert who pointed out that the Mac’s power advantage came from macOS’s design, which intentionally reduced power use per task. His analysis was credible.
When Apple released the M1 through M4 laptops, I tested this hypothesis. Indeed, the macOS machines outperform x86 Windows laptops and surpass them in power efficiency.
In summary, Apple engineered a system with a fundamentally different power utilization curve than Windows, meaning Microsoft would need significant re-engineering to compete.
VMware customers often wonder about Nutanix’s simplicity. Is the system simple because it is genuinely simple, or is it engineered to be simple even as capability grows?
Many have faced the vSphere complexity curve. In 2015, deploying vSphere was so challenging that VMware engineering spent nine years improving the upgrade and lifecycle on ESX and vCenter. While vSphere can be as simple as Nutanix, VCF is not as simple as NCI.
Is Nutanix simple because it is simple? I thought so, but I was mistaken.
Nutanix’s architecture removes complexity. It features a distributed database underpinning the control plane, running in user space, with separate data, management, and control planes. The database is scoped to a single cluster or, via Prism Central, to multiple clusters.
The database functions as the data path and control plane interface, allowing Nutanix systems to switch data paths without sacrificing control features. Moreover, since the data path works in user space, it can be adapted to Kubernetes.
This is simplicity by design.
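As a toy illustration of that claim (all names here are hypothetical, not Nutanix APIs): a control plane that talks to its data path only through an interface and a shared metadata model can swap data-path implementations without changing any control features.

```python
# Hypothetical sketch: a control plane written against a data-path
# interface, so implementations can be swapped without losing features.
from abc import ABC, abstractmethod


class DataPath(ABC):
    """The part that touches physical infrastructure."""

    @abstractmethod
    def provision_volume(self, name: str, size_gb: int) -> str: ...


class LocalHCIDataPath(DataPath):
    def provision_volume(self, name: str, size_gb: int) -> str:
        return f"hci://{name}:{size_gb}GB"


class ExternalStorageDataPath(DataPath):
    def provision_volume(self, name: str, size_gb: int) -> str:
        return f"external://{name}:{size_gb}GB"


class ControlPlane:
    """Control features are written once, against the interface."""

    def __init__(self, data_path: DataPath):
        self.data_path = data_path
        self.inventory = {}  # metadata: volume name -> location

    def create_volume(self, name: str, size_gb: int) -> str:
        location = self.data_path.provision_volume(name, size_gb)
        self.inventory[name] = location  # same metadata model either way
        return location


# The same control plane works unchanged over either data path.
hci = ControlPlane(LocalHCIDataPath())
ext = ControlPlane(ExternalStorageDataPath())
print(hci.create_volume("db01", 100))  # hci://db01:100GB
print(ext.create_volume("db01", 100))  # external://db01:100GB
```

The point of the sketch is the last two lines: the control feature (`create_volume` plus its metadata bookkeeping) never changes when the data path does.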
The architecture is cloud native because the founders were from Google. A cloud-native application is a distributed database with microservices, precisely what Nutanix is. Adding more management and control plane features can be done as quickly as with any modern application, and by design, the system scales. Porting the system is straightforward. And the proof is in the results.
The team quadrupled Prism Central’s scale in a few years, rebuilt the networking control plane, and added support for external storage and Kubernetes. A company a fifth the size of pre-Broadcom VMware delivered a system with a feature set comparable to VCF.
the nutanixist 07: delivering the first real SDDC
Nutanix announced external storage support for its platform.
And for those who live in the vSphere bubble, it feels a lot like “whatever” or, more boringly, “Really?”
It’s about perspective. Nutanix has delivered Raghu’s vision of an SDDC in a way VMware and VCF have not.
When Raghu stood before everyone in 2012 to argue for SDDC, he envisioned a software control plane that would allow infrastructure programmability.
Realizing that vision forced Nutanix and VMware to focus on limiting customer choice to provide simplicity. It was a necessary first phase of the market.
VMware and Nutanix adopted a solution in which customers had to give up their choice of hardware to gain the benefits of the SDDC. The software systems imposed various constraints on workloads and on data center construction.
VMware made some progress with VCF, but VCF only fulfills that promise with a very brittle system and a complex set of software choices that only partially realize the potential.
VCF’s answer to the need for customer choice was to provide that choice and simultaneously sacrifice the simplicity and vision of SDDC. You want choice, you get choice, but the integration of the system is sacrificed.
VCF’s software architecture forces customers either to forgo a feature or to forgo the value of VCF.
That VCF and early Nutanix fell short did not mean there was no market demand for an SDDC that delivers programmability of infrastructure, choice, and availability.
And that’s what Nutanix has delivered. The Nutanix system can now use Dell PowerFlex storage, which gives customers a consistent operating model across SDDCs and hardware choices without sacrificing features.
Nutanix, because the software system is built using a distributed database with microservices that interact on a consistent data model, has—again—taken a better approach.
VCF forces you to choose between the simplicity of vSAN, the hardware choices of VMFS, and the complexity of SDDC manager and NSX while requiring new hardware and software infrastructure to deploy any of these. Furthermore, the feature set of vSphere is artificially constrained in a VCF deployment due to the limitations of the overall solution.
Nutanix offers an elegant model that works fantastically for HCI and now for external storage. Customers now possess a software platform that delivers the SDDC and is not tied to a specific hardware architecture.
External storage with all the availability, simplicity, and advantages of HCI. Because it’s Nutanix, it also offers the availability and recoverability that such a system requires.
Most importantly, Nutanix has delivered an infrastructure that allows applications more choice in the hardware trade-offs they need while retaining a consistent interaction model.
Calling what Nutanix did “external storage support” is like calling the telephone a device for exchanging pleasantries: accurate, and entirely missing the change.
the nutanixist 06: why modern workloads need an infrastructure with a clustered control plane.
Since working at Zynga, I have been trying to build a clustered infrastructure control plane for on-prem infrastructure to simplify application development.
As I dug into Prism Element and Prism Central’s architecture, it struck me that my holy grail, an easy-to-use, easy-to-configure clustered control plane, was a solved problem. Nutanix had what I was trying to build.
Why was it my holy grail?
A simple principle of software engineering is that it’s easier to write code on top of robust systems than on top of fragile systems. If the system is robust, then it’s forgiving of programming errors. A great example is performance; your code doesn’t have to be as efficient if your hardware is fast enough.
If that intuition doesn’t land, consider this: when you write code, you introduce bugs, and it’s much easier to debug the new code you just wrote than the code you depend on but may have no access to.
If it still doesn’t land, consider a server running many applications, onto which you install a new one. It’s much easier to debug the newest application’s availability if the server without it has predictable availability.
Traditional applications that run in a single VM rely on the OS being more reliable than the application itself. So when the application crashes, its operators can debug it in isolation.
One of the astonishing engineering outcomes of the last twenty years was the emergence of hypervisors that are more reliable than the OS running the application.
That enabled significant consolidation, engineering efficiency, and operational efficiency.
The challenge with clustered applications is that they have no analog to the hypervisor that manages their infrastructure.
As a result, the application cluster control plane manages the application and the infrastructure.
The recent re-infatuation with bare metal is a side effect of this problem. If the virtualization layer can’t offer a clustered control plane, then the application must. And if the application can do it, why do I need the virtualization layer? Virtualization provides a container on a server; that container and the server are disposable. So why am I paying the license and the operational cost?
If your hypervisor doesn’t have a clustered control plane, you may be right.
And since I assumed Nutanix didn’t have one, I planned to build one here.
Imagine my surprise when I discovered that it had been shipping for 10 years.
Part of the elegance of the Nutanix VDI solution is the Prism Element’s clustered control plane.
Using Nutanix infrastructure, you can rely on the underlying infrastructure control plane to be more reliable than the application control plane. More importantly, you can offload some of the complexity of the application control plane. Most importantly, you can share your infrastructure easily.
Welcome to the future.
the nutanixist 05 – Nutanix is the RDS of your enterprise.
I was chatting with one of the many brilliant folks I’ve worked with over the years about what makes Nutanix easy. As we talked, it hit me: if you reduce the operational challenge of managing all of your enterprise software to managing a single database, operations become much simpler.
Why is that even theoretically possible?
An enterprise infrastructure system has three elements:
1. The software that interacts with the physical infrastructure (the data path)
2. Metadata that describes that infrastructure
3. Stateless software that interacts with the metadata to configure and monitor the physical infrastructure (the control and management planes)
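Only the second element holds state. A minimal sketch of why that matters (hypothetical names, not the actual Nutanix design): because the control software keeps nothing local, any instance of it can crash and be replaced without losing anything.

```python
# Hypothetical sketch: stateless control software over a metadata store.
# All state lives in the store, so a control service is disposable.

class MetadataStore:
    """Element 2: the only stateful piece (stands in for the distributed DB)."""

    def __init__(self):
        self.records = {}


class ControlService:
    """Element 3: stateless; reads and writes metadata, keeps nothing local."""

    def __init__(self, store: MetadataStore):
        self.store = store

    def configure(self, entity: str, config: dict) -> None:
        self.store.records[entity] = config

    def monitor(self, entity: str) -> dict:
        return self.store.records[entity]


store = MetadataStore()
svc = ControlService(store)
svc.configure("host-1", {"role": "storage", "state": "healthy"})

# Simulate the service dying and a fresh instance taking over:
del svc
svc2 = ControlService(store)
print(svc2.monitor("host-1"))  # full state recovered from metadata alone
```

This is why the whole operational problem collapses into one question: how do you protect the metadata?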
A fundamental problem for any enterprise infrastructure system is protecting metadata from being lost.
At enterprise scale, the operational challenge is this: the more you want to do with software, the more you must invest in operationalizing it, which leaves less money and time to actually do things.
You spend so much time setting up software that setup becomes the bottleneck to doing more.
Databases are a great example. Standing up databases and getting them to work, backing them up, and setting up DR get in the way of actually using a database.
RDS addressed that operational challenge, and now, more folks in an enterprise use databases than they would otherwise.
Nutanix solved the really hard problem of building a distributed database. And can now reap what it has sown.
Nutanix, by encapsulating all infrastructure metadata into a reliable, scalable, and transparent database, reduced the most complex operational problem of metadata management to a database operations problem.
The operational challenges of a single database can be solved. It’s not easy, but it can be.
So, when you look at Nutanix, you see very small customers deploying SDNs. Why? Because the deployment’s operational complexity is slight. Why? Because managing the SDN’s metadata database is easy.
It’s easy not because the SDN system has its own database and the rest of the control system has another one but because they are—literally—the same one.
Solving the operational challenge once means a new feature doesn’t require standing up another database, solving its operational problems, or keeping two databases in sync.
And so the complexity of deploying the SDN is significantly lower. This makes the consumption of the capability less costly and easier to do.
And the value of an SDN is real.
And if you’re in the business of running a business, shouldn’t you be spending your time using stuff instead of figuring out how to deploy it?
the nutanixist 04: the magical distributed database
Everyone has a take on their company’s magical mystery sauce. For me, it’s without a doubt the distributed database that underpins the storage, management, and control planes of Nutanix’s products.
At its most reductionist core, a cloud is a distributed database with microservices that implement business logic and an IO path.
The hard problem in the cloud is building a distributed database that is transparent to the microservices’ consumers.
The problem is so complex that the public cloud providers have struggled to bring their platforms to on-prem.
The complexity is along three dimensions: the first is building such a thing, the second is getting it to work, and the third is making it work within a large number of unique customer deployments where the customer does the deployment and life cycle.
The cloud vendors solved the problem by owning the database’s deployment and lifecycle. They could control the hardware and software deployment and, by doing that, achieved astonishing scale and availability. They also have a tremendously sophisticated engineering team that can operate those systems at scale.
Nutanix, because of its origin in HCI, built a clustered database, and you can find out more about it here: nutanixbible.com
That database has then become the basis of their control and management planes.
Because of its origins, that database is—from the management software perspective—infinitely available and infinitely reliable. Its existence is transparent to the customer.
Working at Nutanix, I am struck by the fact that, unlike every other competitor in the space, our system is architected from the ground up to be a control and management plane for the cloud.
That difference delivers real business value. For example, backup and restore of the control and management plane are trivial, and recovery is possible with a minimum of fuss. That is telling when you compare it to other enterprise-class products, especially other infrastructure products.
Because of how the system is built, you need one backup of one system to get all of the microservices’ states. Restoring is done with a single point-in-time copy, so you don’t need to restore multiple different databases.
Or consider DR and HA; there is precisely one way to do it, and it works for all services.
For systems like backup, DR, and HA, the complexity of the backup/recovery process, the HA process, or the DR process intrinsically affects the system’s availability; Nutanix has a shockingly good system.
In fact, when I joined, I was stunned by how good it was.
It’s the power of that fundamental core architectural building block that makes Nutanix a magical platform.
the nutanixist 03: AHV is more reliable for a broad range of workloads than alternative systems.
Let me clarify upfront: ESXi is a fantastic operating system. Its availability is exceptional. The team that maintains it is outstanding. The management leadership that ensures quality is superb.
However, the argument that it is more available is very narrow and insists that one definition of availability is the only one that matters.
Consider a single VM. If the VM relies solely on external storage, then ESXi’s local and stateful control plane guarantees the VM will keep running as long as the server has power and the storage functions properly.
What AHV offers, compared to systems like OpenShift, vSphere, and others, is the guarantee that the clustered control plane remains available to all hosts within a quorum.
This is a fundamentally different and compelling guarantee, and it is critical for the correct operation of a workload.
Consider any modern workload that depends on the infrastructure control plane, such as any Kubernetes workload. If the Kubernetes system cannot allocate a persistent volume because the infrastructure control plane is down, then the workload is impacted.
Or consider a scenario where a set of hosts gets partitioned. If the workload must run within the same partition, systems lacking a clustered control plane cannot ensure they stay in the same partition. Thus, the VMs might be running, but the workload itself isn’t.
A clustered system guarantees that the VMs and the workload run within a single partition.
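That partition guarantee boils down to a majority-quorum rule. A hedged sketch (not Nutanix's actual algorithm, just the standard quorum idea): after a network split, only the side holding a strict majority of hosts may run the workload, so the workload can never end up split across both sides.

```python
def quorum_partition(partitions: list, cluster_size: int):
    """Return the one partition allowed to run workloads: the strict
    majority side, if any. At most one partition can hold a strict
    majority, so the workload is never split across a network partition."""
    for hosts in partitions:
        if len(hosts) > cluster_size // 2:
            return hosts
    return None  # no majority anywhere: nothing runs, rather than split-brain


# A 5-host cluster splits 3 / 2: only the 3-host side keeps the workload.
split = [{"h1", "h2", "h3"}, {"h4", "h5"}]
print(quorum_partition(split, cluster_size=5))  # the 3-host side

# A 4-host cluster splits 2 / 2: neither side has a majority, so neither runs.
print(quorum_partition([{"h1", "h2"}, {"h3", "h4"}], cluster_size=4))  # None
```

The second case is the interesting one: a system without a clustered control plane would happily keep both halves running.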
Similarly, any workload requiring a clustered service as part of the infrastructure, like HCI or SDN, depends on a control plane external to the local OS. If that control plane becomes unavailable, the workload is also unavailable.
Additionally, running a workload effectively involves maintaining system balance. If the OS on the server is running but the load balancer is not, then the system will run in an unbalanced state until the load balancer goes online. During that time, performance will be impacted. And if performance is affected, then the workload’s availability will be impacted.
What is clear is that a workload depends on the local host and the cluster control plane being operational.
In all these cases, AHV’s guarantee that the clustered control plane stays available offers clear advantages. Its control plane for Kubernetes can tolerate a single host failure and continue running. Its control plane supports non-disruptive upgrades. AHV will only start VMs within an active partition. The storage cluster’s availability is measured at 99.999% uptime, and it is fully autonomous; vSAN, on the other hand, requires vCenter for critical functions like upgrades. AHV’s load balancer remains operational as long as a quorum of hosts exists.
While at VMware, I tried hard to fix this in vSphere. I initiated a series of projects that were consistently deprioritized to address this critical functional gap.
A funny story: I had a 1:1 with Hock Tan. It was a fun meeting. He asked me what I was working on, and I replied, “Well, I am working on making ESXi clustered, but it got canceled.” He was about to explain to me that the reason it got canceled was VMware’s inability to set priorities.
And of course, I couldn’t resist and said – “Well, actually no. What happened was that you bought the company, and so we decided to use that time to pivot to subscription revenue.”
Hock looked at me in a way that I interpreted as “Well, I was about to give you this lecture, and you deprived me of it.”
He recovered, however. There’s a reason why he’s who he is, and he said, “We’ll get to it after the acquisition closes.”
I was hopeful. Unfortunately, things didn’t quite work out for me and the project.
My critique of single-node OSes for clustered systems is a long-standing one.
the nutanixist 02: the problems with stateful hypervisors
I recently wrote about how AHV is deeply misunderstood. What struck me is how deeply misunderstood AHV, ESXi, and OpenShift all are.
ESXi is amazing software with a dedicated team and satisfied customers. Its architecture enables continuous operation of VMs even if hosts become disconnected, thanks to a local control plane on each host that manages registered VMs independently, relying solely on storage. However, this control plane maintains state, so if it fails or becomes unavailable, ESXi becomes inaccessible even though the VMs continue running. The control plane relies on user-space services, such as hostd and vpxa, which can fail for various reasons. This architectural design, while effective, has inherent limitations. Notably, this architecture serves as the foundation for every other commercial hypervisor except AHV. OpenShift faces similar issues, due to kubevirt and kubectl.
I will now focus on the architecture rather than the products.
When you build a clustered system, you are creating a clustered control plane, and you have two design choices: build on top of each host’s local control plane, or build directly on the data plane.
If you build on the local control plane, you have two challenges. The first is that you must detect any actions the local control plane takes and reconcile them. The second is that there are plenty of scenarios where a VM is running but the local control plane is down and can’t be recovered. It becomes very tricky to determine whether a host is up or down, because as the cluster control plane, you don’t know whether the VM is running or not.
A clustered control plane built this way is almost always running in a quasi-split-brain mode: it hopes it knows just enough of the local state to make the right decisions, and it expects the local control plane won’t make decisions that break it. Not being able to determine a host’s state deterministically, whether it is up or down, makes the system fragile. Why? Because a host that “seems” down can be up. While it’s in this disconnected mode, the host can be modified; at the same time, the cluster state can also change. When the host rejoins the cluster, a human must reconcile a state that cannot be reconciled automatically. Although this happens infrequently, it can’t be ruled out, so all this complexity exists solely to handle issues that arise from the basic guarantee of stateful hypervisors.
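A minimal sketch of that reconciliation trap (hypothetical, for illustration only): while a host is disconnected, both its local state and the cluster's desired state can change independently, and when both sides changed, the reconciler can only flag the conflict for a human.

```python
def reconcile(cluster_view: dict, host_view: dict) -> dict:
    """Compare the cluster's record of a host's VMs with what the host
    actually reports after rejoining. One-sided differences can be
    converged automatically; two-sided divergence cannot, because
    neither view is authoritative."""
    conflicts, actions = {}, {}
    for vm in cluster_view.keys() | host_view.keys():
        want = cluster_view.get(vm)  # what the cluster decided while the host was away
        have = host_view.get(vm)     # what the host did on its own
        if want == have:
            continue
        if want is None or have is None:
            # only one side knows about this VM: pick a convergence policy
            actions[vm] = f"converge to {'host' if want is None else 'cluster'} view"
        else:
            conflicts[vm] = (want, have)  # both sides changed: a human must decide
    return {"actions": actions, "conflicts": conflicts}


# While host-1 was partitioned, the cluster marked vm-a stopped (it moved it
# elsewhere), but the host kept running it; the host also resized vm-b locally.
result = reconcile(
    cluster_view={"vm-a": "stopped", "vm-b": "2vcpu"},
    host_view={"vm-a": "running", "vm-b": "4vcpu"},
)
print(result["conflicts"])  # both VMs land in the human-reconciliation bucket
```

A stateless design sidesteps the whole function: there is no independent host view to diverge in the first place.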
So, are clustered control planes good, and are stateful ones bad? No. That’s simplistic. However, if your system relies on a clustered service, such as HCI, or if the applications are clustered, or if you have modern workloads that require interaction with the underlying control plane to operate, a clustered control plane is necessary. And if it is essential, then the choices for building it and the implications of those choices matter.
the nutanixist 01: the deeply misunderstood AHV
Eleven years ago, when Nutanix announced AHV, my initial reaction was: This will fail.
The idea that someone could successfully introduce a new commercial hypervisor into the market seemed ridiculous.
But 11 years later, it has proven me wrong.
Despite this, AHV remains deeply misunderstood because of its uniqueness.
Most operating systems are stateful, meaning that the system’s state is stored within the OS, and when the server restarts, the OS retrieves its state from disk.
AHV, however, is stateless. So what?
Consider a VM. For example, with ESX, you can create a VM through the ESX console, and if ESX crashes, the VM’s state is saved on local disks that ESX reads to restart it.
Much of vCenter’s job is to monitor what’s happening in ESXi and respond to an environment that doesn’t match its expectations of what ESXi was doing.
With AHV, creating a VM requires accessing the cluster control plane, which runs on the CVM—a special VM that manages the cluster’s state. For more details, see The Nutanix Bible.
Thus, when the OS boots, the cluster control plane determines its state.
Why is this so powerful?
During a reboot, the control plane doesn’t need to determine what AHV considers to be running, nor does it need to stop or start processes.
More importantly, the AHV state can’t be out of sync with what the control plane believes.
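To make the contrast concrete, here is a toy sketch (hypothetical names, not real AHV or ESXi interfaces) of the two boot models. The stateful host trusts its local disk, which may have diverged while it was down; the stateless host has nothing to trust but the cluster.

```python
# Hypothetical sketch contrasting stateful and stateless host boot.

class ClusterControlPlane:
    """Stands in for the clustered control plane (the CVMs): the single
    source of truth for which VMs each host should run."""

    def __init__(self, placements: dict):
        self.placements = placements

    def desired_vms(self, host: str) -> list:
        return self.placements.get(host, [])


def stateful_boot(local_disk: list) -> list:
    # ESXi-style: the host restarts whatever its local disk says, and a
    # layered manager (vCenter) must later detect and repair any drift.
    return local_disk


def stateless_boot(cluster: ClusterControlPlane, host: str) -> list:
    # AHV-style: the host has no opinion; it runs exactly what the cluster
    # control plane says. Drift is impossible by construction.
    return cluster.desired_vms(host)


cluster = ClusterControlPlane({"host-1": ["vm-a", "vm-b"]})
stale_disk = ["vm-a", "vm-c"]  # vm-c was moved away while the host was down

print(stateful_boot(stale_disk))           # ['vm-a', 'vm-c'] -- stale view
print(stateless_boot(cluster, "host-1"))   # ['vm-a', 'vm-b'] -- authoritative
```

In the stateful case, something now has to notice that vm-c shouldn’t be there; in the stateless case, there is nothing to notice.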
This setup also significantly simplifies everything. Systems like ESXi or Linux require building a layered control plane on top of the OS’s control system. This layered system must interpret and respond to the actions of the underlying control plane. If it needs to stop or change something, it is at the mercy of the underlying controls.
Most of the time, this isn’t an issue because complex software, such as kubevirt or hostd, tries to reconcile conflicting control goals.
With AHV, that intermediary layer doesn’t exist.
This leads to the misunderstanding: AHV is stateless and can’t be directly compared to ESXi or Linux/KVM. Instead, you should compare it to ESXi + vCenter or Linux with something like OpenShift.
When you make this comparison, you realize that AHV provides a level of availability and control that’s unmatched.
For example, vCenter manages multiple clusters, so a vCenter failure affects workloads across all of them. This pushes customers to deploy numerous vCenters, which in turn leads to vCenter sprawl and increased operational overhead.
With AHV, the boundary is a cluster. Each cluster is isolated from the others.
Thanks to Prism Central, it’s more feasible to run multiple workloads on separate, isolated clusters.
Because the AHV control plane is clustered, its availability is tied to the availability of Nutanix clusters, unlike vCenter, which runs as a single VM.
All of this is possible because AHV’s stateless design allows the creation of a cluster control plane with a small team, achieving something once thought impossible.
