In an earlier post, I observed how VCF’s state was like the cat in Schrodinger’s box. It was impossible to determine the state because each observer had a different view of the state, and there was no consensus protocol.
Nutanix’s engineering team took a fundamentally better approach than VCF.
As always, read the https://nutanixbible.com for more details.
Both the Nutanix Cloud Platform and VCF have the same problem: how do you share state in a distributed system?
In particular, given a set of programs in a distributed system, they must all agree on the value of any shared state. If they don’t agree and don’t know that they don’t agree, then each program will make different decisions based on its view of the value. For more https://lnkd.in/gVHePnME)
Where it gets gnarly is when you have failures. And that for everyone to agree on the value, everyone has to see the same updates to the value in the same order (there are variations on this requirement).
For example, suppose I have three databases, and each database has a copy of my bank account.
My starting balance is 100$, I deposit 100$ and withdraw 150$
With no consensus protocol, the following is possible:
Database 1 thinks I have 50$
Database 2 adds an overdraft charge because it saw the withdrawal of 150$ after it saw the deposit of 100$
Database 3 thinks I have a balance of 200$ because it never saw the withdrawal
With a consensus protocol, there is only one possible outcome, namely that each database thinks I have the 50$ in my bank account.
Both VCF and NCP are distributed systems. VCF has a set of central databases (NSX db, vCenter Postgres, Operations) and a set of edge databases on ESX hosts. NCP has a single centralized database and a set of edge databases in the form of clusters.
I already discussed VCF, so today, let’s focus on NCP.
So how does Nutanix maintain consensus between the central database and the clusters?
Each cluster database notifies the central database of all updates in the cluster in the order they were made.
As a result, the central system always has a complete and consistent view of the state of the cluster.
And all of the products built on the central system have a single, consistent view of the state of each other and the clusters.
This doesn’t sound like much, and it’s why restore actually works on NCP.
When I restore Prism Central from a backup, it has a consistent view of every cluster. It is not possible (modulo a bug) that the backup will contain a state of the environment that never existed. Nor is it possible for different services to have a different view of the environment.
It’s why you can restore from backup and recover an environment, whereas with VCF, you must do a rebuild.
The problem with this system, however, is not just backup, it fundamentally affects scale and availability.
Why does this matter?
Obviously, because backup 🙂
But it also affects scale. And correctness.
The VCF system works because, although theoretically the databases are out of sync, the system is working very hard to keep them in sync. And so as long as changes to the environment occur less frequently than the time needed for each database to figure out what is going on, the system works.
So what? Doesn’t every consensus protocol impose some cost? Yes, but the VCF consensus protocol, such that it is, doesn’t guarantee that the state is consistent; it says the state should be consistent. So if you scale the system incorrectly, instead of the system becoming slower, it will behave incorrectly.

Leave a Reply