The Nutanixist 25: Why Nutanix’s single database approach eliminates SDDC orchestration

Over the past 14 years, across three companies, I have been trying to figure out how to build a deterministic software-defined data center (SDDC). An SDDC that, when you reconfigured it, stayed reconfigured and didn’t require any human intervention.

I failed. I had to come to Nutanix to see what I was missing.

What’s the problem?

To control an SDDC, you must interact with a software control system.

And what is a control system?

Logically, every software control system has three elements: an API, a database, and the control software.

The API/UI/CLI updates the database, and then the control system reads the database state and updates the datapath.
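To make those three elements concrete, here is a minimal sketch in Python. Everything in it (the `ControlSystem` class, `api_set`, `api_delete`, `reconcile`, the entity names) is a hypothetical illustration, not any vendor's actual API. The only thing it tries to show is the shape: the API writes to the database, and the control software reads the database and pushes the datapath toward it.

```python
from dataclasses import dataclass, field


@dataclass
class ControlSystem:
    """Toy control system (hypothetical): an API, a database, control software."""
    database: dict = field(default_factory=dict)  # desired state
    datapath: dict = field(default_factory=dict)  # what is actually running

    def api_set(self, entity: str, config: str) -> None:
        # The API/UI/CLI only ever writes to the database.
        self.database[entity] = config

    def api_delete(self, entity: str) -> None:
        self.database.pop(entity, None)

    def reconcile(self) -> None:
        # The control software reads the database and updates the datapath.
        for entity, config in self.database.items():
            if self.datapath.get(entity) != config:
                self.datapath[entity] = config
        # Anything in the datapath that the database no longer lists is removed.
        for entity in list(self.datapath):
            if entity not in self.database:
                del self.datapath[entity]


cs = ControlSystem()
cs.api_set("vm-1", "cpus=2")
before = dict(cs.datapath)   # the datapath has not changed yet
cs.reconcile()
after = dict(cs.datapath)    # now it matches the database
```

With a single control system, this loop is enough: the database is the only source of truth, and the datapath converges to it.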

But what if you have multiple control systems?

And that’s where I failed.

See, the original idea I had was that you could solve the problem by controlling the control systems via their APIs.

We call that orchestration.

But that didn’t work, for two reasons:

1) Each control system has its own database, meaning it can get out of sync with the orchestration layer.

2) When an operation needs to be performed by multiple control systems on the same entity, drift is inevitable because each system keeps that entity in its own database.

The first point is obvious: anyone can call the controller’s API directly and reconfigure the control system, bypassing the orchestration layer and thus breaking the orchestration layer’s model.
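A small sketch of that first failure mode, under the assumption that the orchestrator keeps its own model of what it configured. The `Controller` and `Orchestrator` classes and the `vlan-10` entity are made up for illustration.

```python
class Controller:
    """Hypothetical control system with its own database and API."""
    def __init__(self):
        self.database = {}

    def api_set(self, entity, config):
        self.database[entity] = config


class Orchestrator:
    """Hypothetical orchestration layer that drives a controller via its API."""
    def __init__(self, controller):
        self.controller = controller
        self.model = {}  # the orchestrator's own copy of the desired state

    def set(self, entity, config):
        self.model[entity] = config
        self.controller.api_set(entity, config)

    def drifted(self):
        # Entities where the orchestrator's model and the controller disagree.
        keys = self.model.keys() | self.controller.database.keys()
        return sorted(
            k for k in keys
            if self.model.get(k) != self.controller.database.get(k)
        )


controller = Controller()
orchestrator = Orchestrator(controller)
orchestrator.set("vlan-10", "mtu=1500")
drift_before = orchestrator.drifted()

# Someone reconfigures the controller directly, bypassing the orchestrator.
controller.api_set("vlan-10", "mtu=9000")
drift_after = orchestrator.drifted()
```

After the direct call, `drift_after` lists `vlan-10`: the orchestrator still believes in `mtu=1500`, and nothing in this setup tells it otherwise until someone goes looking.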

The second point is less obvious. If two controllers need to operate on the same entity and that entity appears in two different databases, unless you are using transactional updates, each control system is working on a similar but not identical entity.
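Here is a rough sketch of the second failure mode. It assumes two hypothetical controllers ("compute" and "network") that each store their own record of the same VM, and an update that is written to each in turn with no transaction around it. The simulated failure is just a flag; real outages are messier.

```python
class ControllerDB:
    """Hypothetical per-controller database holding its own copy of an entity."""
    def __init__(self, name):
        self.name = name
        self.records = {}
        self.fail_on = None  # entity whose writes fail (simulated outage)

    def write(self, entity, attrs):
        if entity == self.fail_on:
            raise RuntimeError(f"{self.name}: write failed for {entity}")
        self.records[entity] = dict(attrs)


def update_everywhere(dbs, entity, attrs):
    # Non-transactional: each write commits on its own, with no rollback.
    for db in dbs:
        db.write(entity, attrs)


compute = ControllerDB("compute")
network = ControllerDB("network")
update_everywhere([compute, network], "vm-1", {"host": "node-a"})

# The second controller fails partway through the next update.
network.fail_on = "vm-1"
try:
    update_everywhere([compute, network], "vm-1", {"host": "node-b"})
except RuntimeError:
    pass  # the orchestration "fails", but compute has already committed

views = {db.name: db.records["vm-1"]["host"] for db in (compute, network)}
```

At the end, `views` holds two different hosts for the same VM. Each controller is internally consistent, and each is working on a similar but not identical entity.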

And so what happens is that inevitably both (1) and (2) require a human to figure things out. Which means the orchestration fails, and now someone has to debug it.

And maybe I stumbled on that, but what I completely missed is the database for the Hypervisor/OS and the server image.

See, the Hypervisor has a database of its own: its local configuration and the workloads it is running. It also has an image.

And if the image and the configuration are not controlled by the control plane, then, by definition, the Hypervisor can also be out of sync.

So what do you do?

Well, you create a stateless hypervisor that relies on the control plane to tell it what to do.

Which brings me back to non-Nutanix systems like OpenShift, VCF, and Proxmox. None of them controls the entire system. ~~They control parts of the system because the Hypervisor has its own database—the configuration files and the list of processes it starts up on reboot.~~

So I wrote that statement about databases, and Vytautas Jankevičius was right to say I was wrong. The way I wrote it isn’t accurate. For example, OpenShift has a single database that pushes configuration down. What I meant was something a lot more abstract: there was a persistent universe where, depending on who you asked, you got a different answer. For example, with OpenShift, if the machine doesn’t reboot, the VM will keep running. During that time, depending on who you ask, you will get different answers. And it’s that “different answers until human intervention” that creates a split-brain universe. And to resolve that inconsistency, you need a third system, or an orchestration layer, and a human. In my mind, if I can get two answers, and the only way to fix it is to have a human intervene, then there exist two databases that are only sometimes in sync.

My point was too abstract, and I phrased it — well, wrong. In my defense, I was trying to squeeze this into the limited text window of LinkedIn.com …

Nutanix took a very radical approach: What if there was precisely one database? If there is exactly one database, every control system has the same view of the entity. And therefore every control system sees the same updates in the same order. Nutanix maintains that property by having the hypervisor shut down if it can’t reach the control plane. In other words, there is no arbitrary period during which the hypervisor is running and the control plane can’t control it.
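A toy model of that idea, and only a toy: this is not how Nutanix implements it, and every name here (`SingleDatabase`, `StatelessHypervisor`, `sync`) is an assumption made for illustration. It shows two properties from the paragraph above: every reader replays the same updates in the same order, and a hypervisor that cannot reach the database stops running rather than carrying on with its own view.

```python
class SingleDatabase:
    """Hypothetical single source of truth: one ordered log of updates."""
    def __init__(self):
        self.log = []
        self.reachable = True

    def append(self, entity, config):
        self.log.append((entity, config))

    def read_from(self, offset):
        if not self.reachable:
            raise ConnectionError("control plane unreachable")
        return self.log[offset:]


class StatelessHypervisor:
    """Hypothetical hypervisor that holds no state the database did not give it."""
    def __init__(self, db):
        self.db = db
        self.offset = 0
        self.applied = []   # updates in the order they were applied
        self.running = {}
        self.powered_on = True

    def sync(self):
        try:
            updates = self.db.read_from(self.offset)
        except ConnectionError:
            # Cannot reach the control plane: stop instead of diverging.
            self.running.clear()
            self.powered_on = False
            return
        for entity, config in updates:
            self.running[entity] = config
            self.applied.append((entity, config))
        self.offset += len(updates)


db = SingleDatabase()
hv_a, hv_b = StatelessHypervisor(db), StatelessHypervisor(db)
db.append("vm-1", "cpus=2")
db.append("vm-1", "cpus=4")
hv_a.sync()
hv_b.sync()
same_order = hv_a.applied == hv_b.applied == db.log

db.reachable = False
hv_a.sync()  # hv_a shuts itself down; there is no window of independent state
```

The design choice is the trade: you give up the hypervisor's ability to keep running on its own, and in exchange there is never a period where two parts of the system can give different answers.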

Obviously, the next challenge is scaling that system. And for that, I recommend The Nutanix Bible.
