the nutanixist 03: AHV is more reliable for a broad range of workloads than alternative systems.

Let me clarify upfront: ESXi is a fantastic operating system. Its availability is exceptional. The team that maintains it is outstanding. The management leadership that ensures quality is superb.

However, the argument that it is more available is very narrow and insists that one definition of availability is the only one that matters.

Consider a single VM. If the VM relies solely on external storage, then ESXi’s local and stateful control plane guarantees the VM will keep running as long as the server has power and the storage functions properly.

What AHV offers, compared to systems like OpenShift, vSphere, and others, is the guarantee that the clustered control plane remains available to all hosts within a quorum.

This is a fundamentally different and compelling guarantee, and it is critical for the correct operation of a workload.

Consider any modern workload that depends on the infrastructure control plane, such as any Kubernetes workload. If the Kubernetes system cannot allocate a persistent volume because the infrastructure control plane is down, then the workload is impacted.

Or consider a scenario where a set of hosts gets partitioned. If the workload must run within the same partition, systems lacking a clustered control plane cannot ensure they stay in the same partition. Thus, the VMs might be running, but the workload itself isn’t.

A clustered system guarantees that the VMs and the workload run within a single partition.

Similarly, any workload requiring a clustered service as part of the infrastructure, like HCI or SDN, depends on a control plane external to the local OS. If that control plane becomes unavailable, the workload is also unavailable.

Additionally, running a workload effectively involves maintaining system balance. If the OS on the server is running but the load balancer is not, then the system will run in an unbalanced state until the load balancer goes online. During that time, performance will be impacted. And if performance is affected, then the workload’s availability will be impacted.

What is clear is that a workload depends on the local host and the cluster control plane being operational.

In all these cases, the AHV guarantee of the availability of the clustered control plane offers clear advantages. Its control plane for Kubernetes can tolerate a single host failure and continue running. Its control plane supports non-disruptive upgrades. AHV will only start VMs within an active partition. The storage cluster’s availability is measured in 99.999% uptime and is fully autonomous. vSAN, on the other hand, requires vCenter for critical functions like upgrades. AHV’s load balancer remains operational as long as a quorum of hosts exists.

While at VMware, I tried hard to fix this in vSphere. I initiated a series of projects that were consistently deprioritized to address this critical functional gap.

A funny story. I had a 1:1 with Hock Tan. It was a fun meeting. He asked me what I was working on, and I replied, “Well, I am working on making ESXi clustered, but it got canceled.” And he was about to explain to me that the reason it got canceled was because of VMware’s inability to set priorities.

And of course, I couldn’t resist and said – “Well, actually no. What happened was that you bought the company, and so we decided to use that time to pivot to subscription revenue.”

Hock looked at me in a way that I interpreted as “Well, I was about to give you this lecture, and you deprived me of it.”

He recovered, however. There’s a reason why he’s who he is, and he said, “We’ll get to it after the acquisition closes.”

I was hopeful. Unfortunately, things didn’t quite work out for me and the project.

My critique of single-node OS’s for clustered systems is a long-standing one.

Trackbacks

the nutaxinist 13: x86 virtualization may not be what you think it is, bare metal is roaring back and why you need a different platform like AHV says:

July 24, 2025 at 1:49 pm

[…] I wrote this a while ago, and since then, I have learned a great deal more about what makes AHV special. And although I talk about the database here, it’s not just about the database; it’s also about the kind of OS and the availability models of that system. […]

Share this:

Like this:

Trackbacks

Leave a ReplyCancel reply