I usually don’t discuss business models, but what Broadcom did is a good example of how thinking you have an irreplaceable product and not understanding your customers can cause problems.
One of the main challenges with VMware was the variation in business models, which made license portability difficult.
That variation also led engineering teams to go to great lengths to avoid collaborating.
In many ways, VMware was like several companies in one, each with its own distinct business model, selling layered products on top of vSphere.
Changing this corporate structure has been the main driver behind Broadcom’s changes to VMware’s product setup.
At the same time, we live in a very complex world. Corporations have very complicated budgets.
The core of selling goods is to meet your customers where they are.
Broadcom’s goal is to remove customer choice. The idea is that by forcing customers to work with the team that owns VCF credits, workloads will be pushed back into on-premises environments or not moved to the cloud.
Here’s how it works:
Think about “Big Corp Co.” with two teams. Team A wants to run some workloads in the cloud, while Team B is responsible for on-premises virtualization. Team A has workloads on vSphere that need to be migrated to the cloud, and the most cost-effective way to do this is to utilize some of their corporate credits.
But now, Team A can’t use those credits.
Since running the stack on VCF requires VCF credits, they will be directed to the internal VCF team, Team B, which will tell them that instead of running in the cloud, they should run on-premises.
Team A might protest, but Team B, which controls the budget, will explain that there isn’t enough budget for the cloud deployment and offer an on-premises alternative.
Therefore, Team A, because it is locked into VCF for those workloads, will theoretically have to move any workload that could have run in the public cloud back on-premises.
This approach assumes Team A has no options.
And that’s where this model of limiting choices fails.
Customers always have options.
And when you force someone to do something, they will quickly find ways to choose differently.
It’s why Nutanix has been adding many customers lately.
the nutanixist 18: relying on infrastructure instead of application-specific availability

From 1996 to 2009, application availability was treated as an infrastructure issue, so improving reliability meant making infrastructure more resilient. Tandem’s NonStop systems symbolized the ideal of reliable infrastructure, and machines like the SGI Origin 2000, with a single system image and NUMA, represented the peak of scalable computing. In the mid-1990s, however, Gregory Pfister’s book “In Search of Clusters” argued that building such infrastructure was too complex, advocating instead for clustering.
At the time, when distributed systems were first being deployed, this idea seemed truly absurd.
As a result, infrastructure vendors continued to focus on making single systems more resilient.
When the cloud emerged, infrastructure architects like myself viewed it skeptically because of its lack of guaranteed availability. “How would applications run on it?” we wondered.
What we didn’t realize was that software naturally seeks to operate on cheaper hardware, and because of this, new technologies have arisen to make that easier.
For me, the pivotal moment came in 2009, when Cafeville was running on an effectively 1000-node cluster. The team combined various components with some critical innovations.
This marked the beginning of an era where availability shifted from being purely an infrastructure concern to an application problem because infrastructure became less reliable.
My critique of vSphere and similar systems that aren’t natively clustered is that they are inherently less reliable than what applications require. Consequently, application teams must write code assuming infrastructure instability rather than depending on the system’s reliability.
What do I mean by infrastructure instability?
In the pre-cloud era, infrastructure was assumed to either work or fail. In the cloud era, uncertainty in infrastructure became acceptable, as long as the application, its developers, and the operations team could identify what went wrong.
The problem was that this increased the cost of maintaining and supporting applications and slowed down development, as teams spent more time on infrastructure issues than on the applications themselves.
At Zynga, when my team provided a reliable infrastructure, team sizes decreased, and productivity for the game teams increased.
Our team ensured there would be no ambiguity about how the infrastructure was performing. We provided guarantees.
By stating that infrastructure needs to be more robust, I mean that it must ensure that the application and its components are operational, that data is available, that no unexpected infrastructure changes have occurred, and that the system can, if needed, recover from a backup without being rebuilt.
In an era of clustered applications, clustered infrastructure that gives those guarantees by default, like Nutanix, is the only way forward.
the nutanixist 17: the consistent nutanix cloud platform
In an earlier post, I observed that VCF’s state was like the cat in Schrödinger’s box: impossible to determine, because each observer had a different view of the state and there was no consensus protocol.
Nutanix’s engineering team took a fundamentally better approach than VCF.
As always, read the https://nutanixbible.com for more details.
Both the Nutanix Cloud Platform and VCF have the same problem: how do you share state in a distributed system?
In particular, given a set of programs in a distributed system, they must all agree on the value of any shared state. If they don’t agree, and don’t know that they don’t agree, each program will make different decisions based on its own view of the value. (For more, see https://lnkd.in/gVHePnME.)
Where it gets gnarly is when you have failures. For everyone to agree on the value, everyone has to see the same updates to the value in the same order (there are variations on this requirement).
For example, suppose I have three databases, and each database has a copy of my bank account.
My starting balance is $100; I deposit $100 and then withdraw $150.
With no consensus protocol, the following is possible:
Database 1 thinks I have $50.
Database 2 adds an overdraft charge because it saw the withdrawal of $150 before it saw the deposit of $100.
Database 3 thinks I have a balance of $200 because it never saw the withdrawal.
With a consensus protocol, there is only one possible outcome: every database agrees that I have $50 in my bank account.
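The three outcomes above can be reproduced with a toy simulation (my illustration, not any real database engine): each replica starts at $100, applies whatever updates it happened to see, in whatever order it saw them, and charges a hypothetical $35 fee if its balance ever goes negative.

```python
def apply_ops(start, ops, overdraft_fee=35):
    """Apply (op, amount) updates in the given order; charge a fee on overdraft."""
    balance, fees = start, 0
    for op, amount in ops:
        balance += amount if op == "deposit" else -amount
        if balance < 0:
            fees += overdraft_fee  # this replica saw the balance go negative
    return balance - fees

deposit = ("deposit", 100)
withdraw = ("withdraw", 150)

# Without consensus, each replica sees a different subset or order of updates.
db1 = apply_ops(100, [deposit, withdraw])  # saw both updates, in order
db2 = apply_ops(100, [withdraw, deposit])  # saw the withdrawal first: overdraft fee
db3 = apply_ops(100, [deposit])            # never saw the withdrawal

print(db1, db2, db3)  # 50 15 200 -- three replicas, three different balances

# With consensus, every replica sees the same updates in the same order,
# so every replica computes the same balance.
consensus = [apply_ops(100, [deposit, withdraw]) for _ in range(3)]
print(consensus)  # [50, 50, 50]
```

The point of the toy is that reordering is not harmless even when the arithmetic commutes: the overdraft side effect depends entirely on the order in which updates were observed.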
Both VCF and NCP are distributed systems. VCF has a set of central databases (NSX db, vCenter Postgres, Operations) and a set of edge databases on ESX hosts. NCP has a single centralized database and a set of edge databases in the form of clusters.
I already discussed VCF, so today, let’s focus on NCP.
So how does Nutanix maintain consensus between the central database and the clusters?
Each cluster database notifies the central database of all updates in the cluster in the order they were made.
As a result, the central system always has a complete and consistent view of the state of the cluster.
And all of the products built on the central system have a single, consistent view of the state of each other and the clusters.
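A minimal sketch of the idea, assuming a simple sequence-numbered update log (my simplification, not Nutanix’s actual wire protocol; see the Nutanix Bible for the real design): the central database applies each cluster’s updates strictly in commit order, so its view is always a consistent prefix of that cluster’s history.

```python
from collections import defaultdict

class ClusterLog:
    """A cluster's edge database: updates recorded in commit order."""
    def __init__(self, name):
        self.name = name
        self.seq = 0
        self.entries = []  # (seq, (key, value)) in commit order

    def commit(self, key, value):
        self.seq += 1
        self.entries.append((self.seq, (key, value)))

class CentralDB:
    """The central database: applies each cluster's log strictly in order."""
    def __init__(self):
        self.applied = defaultdict(int)  # cluster name -> last seq applied
        self.state = defaultdict(dict)   # cluster name -> key/value view

    def sync(self, cluster):
        for seq, (key, value) in cluster.entries:
            if seq == self.applied[cluster.name] + 1:  # never skip or reorder
                self.state[cluster.name][key] = value
                self.applied[cluster.name] = seq

cluster = ClusterLog("cluster-1")
cluster.commit("vm-42", "running")
cluster.commit("vm-42", "stopped")

central = CentralDB()
central.sync(cluster)
print(central.state["cluster-1"])  # {'vm-42': 'stopped'}
```

Because the central database only ever advances one sequence number at a time, it can be behind a cluster, but it can never hold a state the cluster was never actually in.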
This doesn’t sound like much, but it’s why restore actually works on NCP.
When I restore Prism Central from a backup, it has a consistent view of every cluster. It is not possible (modulo a bug) that the backup will contain a state of the environment that never existed. Nor is it possible for different services to have a different view of the environment.
It’s why you can restore from backup and recover an environment, whereas with VCF, you must do a rebuild.
The problem with the VCF approach, however, is not just backup; it fundamentally affects scale and availability.
Why does this matter?
Obviously, because backup 🙂
But it also affects scale. And correctness.
The VCF system works because, although theoretically the databases are out of sync, the system is working very hard to keep them in sync. And so as long as changes to the environment occur less frequently than the time needed for each database to figure out what is going on, the system works.
So what? Doesn’t every consensus protocol impose some cost? Yes, but the VCF consensus protocol, such as it is, doesn’t guarantee that the state is consistent; it only says the state should be consistent. So if you scale the system incorrectly, instead of the system becoming slower, it will behave incorrectly.
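That failure mode is easy to model. The toy simulation below (my illustration, not VCF code) has a primary database and a follower that copies the primary’s state on a fixed interval: when changes arrive more slowly than the sync interval, the follower is almost always consistent; when they arrive faster, it is almost always stale, and anything reading it makes decisions on wrong data.

```python
def stale_fraction(change_interval, sync_interval, ticks=1000):
    """Fraction of time the follower's view disagrees with the primary."""
    primary = follower = 0
    stale = 0
    for t in range(1, ticks + 1):
        if t % change_interval == 0:
            primary += 1              # the environment changed
        if t % sync_interval == 0:
            follower = primary        # the follower catches up
        stale += follower != primary  # count ticks spent out of sync
    return stale / ticks

slow_churn = stale_fraction(change_interval=100, sync_interval=5)
fast_churn = stale_fraction(change_interval=2, sync_interval=100)
print(slow_churn, fast_churn)  # ~0.0 vs ~0.98
```

Nothing in the model enforces consistency; it just happens to hold while churn is low. Scale the rate of change past the sync rate and the same system, with no code change, becomes wrong nearly all the time.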
the nutanixist 15: schrödinger’s VCF
In my previous post, I talked about irreducible control planes and hand-waved their implications. But those implications explain why Nutanix is so fundamentally different.
An irreducible control plane is the minimal control plane that must exist for a service to exist. For VMs on ESX, that’s ESX. With the ESX host running and access to the ESX host, you can perform any actions you want on a VM.
Every other control plane adds value but is not essential.
vCenter is an optional system. And thus, the first peculiar part of vSphere is the split brain between vCenter and ESXi. vCenter does not control ESXi entirely. It has to account for the possibility that the state of an ESXi host changed while it was running.
In effect, vSphere states that if ESX is running and you can authenticate to the ESXi host, you have complete control of the system, even if all other software is not running. It also states that you can always log into the ESXi host and take action, even if all other systems are running.
Thus, ESXi and vCenter are permanently running in a split-brain mode, and it’s vCenter’s job to react to what happened on ESXi.
The design principle itself is a good one. And I have argued that this singular design principle is why ESXi is the best system for running traditional single-VM legacy workloads, where all you care about is that the VM runs.
But if you care about anything other than the VM running, then how vCenter learns the state of the ESXi host becomes super interesting. If vCenter and ESXi have different views of the state of the host, and those views get out of sync, what happens?
For example, when an ESXi host gets disconnected from vCenter, the cluster concludes that the host has failed, and vSphere HA restarts a VM that is still running on the disconnected host, what happens?
What does vCenter know and when did it know it?
And if vCenter doesn’t know and you can’t explain what it knows and when it knows it, how does a system layered on vCenter know what vCenter knows?
So, for example, NSX relies on vCenter to know things about the ESXi host. And if vCenter has wrong information, then NSX has bad information.
And maybe that might be okay, if NSX and vCenter had consistent, but wrong, information.
But they don’t.
vCenter talks to ESXi via hostd and vpxa. And those agents don’t return all the information on a host. And they are certainly not the only way to manipulate the host state.
NSX has its own agent, which reports the host’s networking state directly back to NSX.
Now NSX depends on what vCenter thinks is running on ESXi to make decisions about the state of the VMs running on ESXi.
So the complete state of the system is:
What NSX thinks the ESXi host thinks.
What vCenter thinks the ESXi host thinks.
What NSX thinks vCenter thinks the ESXi host thinks.
And nobody can explain precisely the consistency guarantees of what each system thinks.
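A toy model makes the layering concrete (my illustration, not VCF internals): the host changes state over time, vCenter refreshes its view of the host on one interval, and NSX refreshes its view of vCenter on another. At several instants, the layers give different answers to the same question.

```python
# True host state at six points in time:
timeline = ["running", "running", "stopped", "stopped", "running", "running"]

vcenter_view = None
nsx_view = None
snapshots = []
for tick, host_state in enumerate(timeline):
    if tick % 2 == 0:
        vcenter_view = host_state  # vCenter refreshes from the host
    if tick % 3 == 0:
        nsx_view = vcenter_view    # NSX refreshes from vCenter
    snapshots.append((host_state, vcenter_view, nsx_view))

# (host, vCenter's view, NSX's view) at each tick:
for snap in snapshots:
    print(snap)
```

At tick 2, the host and vCenter say "stopped" while NSX still says "running"; at tick 4, the host and vCenter say "running" while NSX says "stopped". Each layer is behaving correctly by its own rules, yet no single component holds the truth at any given instant.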
And thus we have Schrödinger’s VCF.
ESXi has one point of view, vCenter another, and NSX a third.
And most of the time, things are okay, but when they aren’t, it’s impossible for any mere mortal to tell you what exactly the state of the system is.
To demonstrate that point, when you recover a vCenter from backup, the NSX system can’t just trivially recover, because the world suddenly changed under its feet. From ESXi, it has one point of view, and from vCenter, it has an entirely different point of view. And until vCenter is reconciled with the current ESXi state, and then NSX is reconciled with the new state of vCenter, the system is in an indeterminate state.
That flaw is precisely what Nutanix’s architecture eliminates, while simultaneously addressing the fundamental problem of cluster availability and cluster autonomy when Prism Central is unreachable.
the nutanixist 14: the PSC, identity and architecture
When I joined VMware, the company had just released version 6.0, and along with it, the external PSC (Platform Services Controller).
The idea behind the PSC was to centralize a set of core services, upon which the rest of the product portfolio could be built, including vCenter.
The problem was availability.
vCenter 6.0 depended on the PSC to be available so you could log in.
If the PSC were down or unreachable, you would be unable to log in.
You encountered a peculiar complexity: vCenter was up, but the PSC was unreachable, and you couldn’t log into vCenter to resolve the issue with the PSC (and yes, vSphere HA can’t help).
In short, the failure domains of the PSC and vCenter were distinct, and because they were different, your infrastructure only worked when both the PSC and vCenter were available and reachable.
As you add more vCenters to a PSC, in what became known as the MxN configuration, everything had to be working for your entire environment to work as expected.
So? If your environment consists of 10 vCenters and one PSC, and any vCenter’s connection to the PSC is broken, then that vCenter isn’t functioning correctly.
So? Just use the local account.
Except, the whole point of the PSC was not to use local accounts.
Because not being able to log into vCenter was unacceptable, customers had to maintain both local accounts and PSC accounts.
So what? It’s just an occasional thing. Except it isn’t. As the number of vCenters increases, the probability of any one vCenter being unable to reach its PSC increases, which means some part of your infrastructure is always requiring you to use a local password.
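The arithmetic behind that claim is simple. If each vCenter independently has probability p of being unable to reach its PSC at any given moment, the chance that at least one of N vCenters is degraded is 1 - (1 - p)^N (the numbers below are illustrative, not measured):

```python
def p_any_degraded(p, n):
    """Probability that at least one of n vCenters can't reach the PSC,
    assuming each fails independently with probability p."""
    return 1 - (1 - p) ** n

# Even a 1% per-vCenter failure chance compounds quickly with scale:
for n in (1, 10, 100):
    print(n, round(p_any_degraded(0.01, n), 3))
# 1 -> 0.01, 10 -> ~0.096, 100 -> ~0.634
```

At 100 vCenters, a 1% per-vCenter outage rate means the odds are roughly 2 in 3 that some vCenter, somewhere, is forcing a local-password login right now.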
Where things got messy was during the restoration from backup. The PSC didn’t just include your identities; it also included tag definitions. If you restored a PSC from backup, then from the point of view of all the vCenters, it appears to have reverted to an earlier state. And so the tag definitions could suddenly disappear. And if they did, then any operational tooling that depended on the tags would break.
From 6.5 to 7.0, the vCenter team rectified that architectural flaw. We worked to make the transition as seamless as possible, driven by one consideration: never break the fault domain between vCenter and login.
What surprised me is that, according to the VCF 9.0 documentation (which I probably misread), VCF now requires a single centralized broker and also requires local identities.
So while VCF has gone around and around on this topic, Nutanix chose a different path; instead of relying on a single global broker, each cluster and system has its own broker, which can federate with one another.
I wondered why the two companies took different paths. At the core, it comes down to the fact that Nutanix considers the cluster the irreducible control plane, while VMware considers ESX. A cluster can run an identity broker, whereas every individual host cannot. Details in architecture matter.
the nutanixist 13: x86 virtualization may not be what you think it is, bare metal is roaring back, and why you need a different platform like AHV
I wrote this a while ago, and since then, I have learned a great deal more about what makes AHV special. And although I talk about the database here, it’s not just about the database; it’s also about the kind of OS and the availability models of that system.

One of the more enduring mysteries about x86 virtualization is how profoundly misunderstood it is by those who are distant from it, considering its widespread use.
Folks with a passing understanding assume that something intercepts every instruction between the actual processor and the workload and translates it on the fly.
Except that hasn’t been the case since more or less the beginning.
What VMware and most other vendors did was virtualize a processor’s control instructions, not the workload instructions.
A processor has a set of instructions and capabilities for running workloads and a set of capabilities and instructions for managing the hardware.
The OS-processor interface is peculiar and continuously evolving. By its very nature, it was initially engineered to assume that only one OS ever interacted with it.
VMware virtualized that OS-Processor interface, enabling multiple different OSs to run on the same x86 hardware.
Once the processor’s control plane was virtualized, it became possible to build an OS (ESXi) that treated VMs as first-class abstractions.
ESXi enabled far more sophisticated control and sharing of the physical resources. It could do that because the control plane was virtualized, and when it needed to intervene in a running guest, it could.
Nowadays, every OS does the same thing—it virtualizes the OS-processor interface and, using that abstraction, can run multiple VMs on a single processor.
Unfortunately, we take this for granted. It is an astonishing technical result, and we are not nearly impressed enough by it.
And given that every OS on the planet, including many free ones, can do this, and that customers who wanted to run mixed workloads were nonetheless choosing an inferior form of virtualization, the cognitive dissonance between “I like bare metal” and “virtualization isn’t good enough” hurt my head.
So I dug into it.
What those customers are saying is that a modern application’s control plane is a distributed system. They want a distributed infrastructure control plane on which multiple applications can rely. Virtualization does provide a mechanism for sharing a server, but that’s not useful without a distributed infrastructure control plane that applications can share.
The industry-leading virtualization does not have a clustered control plane. So, customers naturally look towards bare metal Kubernetes (K8s) because it has a distributed database.
And then the same customers use kube-virt to create VMs. The shift to bare metal is thus not about virtualization, but about what control plane virtualization you need. Today’s applications require the infrastructure control plane to be virtualized.
The next generation of infrastructure management will depend on vendors who figure out how to virtualize the interface from the K8s API server to the underlying infrastructure itself.
To achieve this, you need a distributed database that is more reliable than etcd. Why? Because etcd is what you don’t have to pay for.
Fortunately, Nutanix has one of those.
the nutanixist 12: the deeply misunderstood SPOF and availability
I recently had a conversation with a friend of mine about a company he worked for years ago that had no backups for its production environment.
The team there argued that they didn’t need backups because they had synchronous mirroring.
My friend pulled his hair out and asked, “So if a table gets corrupted, what then?”
What I find odd in the IT industry is that we think of SPOF as a hardware failure.
With modern systems, the total outage after a hardware failure lasts only minutes. Although that’s unfortunate, it’s not disastrous.
In contrast, recovering from a backup can take days.
Furthermore, recovering from a backup can result in data loss.
Therefore, the cost, complexity, and downtime associated with restoring from backup are so high that people should consider backups the most crucial part of their availability strategy.
But they don’t.
I would say, “But this is a SPOF,” and folks would respond, “Well, that is 5 minutes of downtime, not a big deal.”
And I realized that what I was actually saying was, “Since this cannot be restored from backup, we are one human mistake away from destroying all of this infrastructure and taking days, or at best hours, to recover.”
When I became the architect of VCF, I saw a system that could not be restored from backup without intensive support from VMware customer support. A typical VCF instance comprises two NSX deployments with their respective databases, an SDDC manager with its database, and several vCenters, each with its internal database. Even worse, upon examining the products, they contain multiple internal databases and configuration files. Products like the supervisor, Operations, and Automation are further dependent on all these systems having the same view of the state to work correctly.
While I was the architect of vCenter, I ensured that backups worked effectively. I reviewed the file-based backup and restore feature. Every feature had to explain what would happen after a restore. I led the effort to make MOIDs stable so that, after a restore, VMs retain their original IDs. I pushed as hard as possible to get a DKVS (distributed key-value store) so that restore would work without breaking clusters.
What VCF has can be made to work. There is a prodigious amount of research in distributed systems that would allow this to work. However, simply reading the documentation, asking Google, and consulting Reddit will reveal how fragile the current system is.
What astonishes me about Nutanix is that Prism Central is routinely restored from backup by customers. Because it’s so easy to restore from backup, customers will restore from backup rather than try to figure out what broke.
Does that mean Prism Central always works? No. However, there is a qualitative difference between something that cannot work without painful intervention and something that mostly just works.
Why does Prism Central work? Because the team did the hard systems work that takes years to implement.
It is possible to architect a system that can recover from a backup. It is possible to build a system that minimizes data loss. The DKVS was part of such a system.
It’s just hard and takes time.
What astonished me is that Prism Central has such a system.
the nutanixist 11: Nutanix Cloud Native Architecture and NCM Disaggregation
I joined Nutanix because of great people, inspiring leadership, a solid business model, innovative technology, and a lovely 15-minute commute.
In my early days here, I was a bit puzzled about how we managed to thrive, especially considering the significant energy I had previously devoted to trying to put Nutanix out of business.
Everything changed when I had the chance to meet the CTO Ambassadors. During a conversation, Joe Garvey shared his insights about how Nutanix customers often restored from backup too quickly instead of tackling the challenges they faced with Prism Central.
I was genuinely taken aback.
It was amusing because, from my viewpoint, the ability to restore from backup felt remarkable. The Nutanix team seemed a bit puzzled, unsure why “restoring” from backup was such a significant consideration.
This experience launched me on a fascinating journey of discovery, revealing just how incredibly special Nutanix truly is.
Their management and control plane is a cloud-native application, easily packaged and installed by customers on-premises.
And I have talked about this before (the nutanixist 05 – Nutanix is the RDS of your enterprise, the nutanixist 08: Is it simple because it’s simple, or because it’s always engineered to be simple? A parable about ARM and Nutanix, or why Paul Graham is right, and the nutanixist 04: the magical distributed database).
So, what truly defines something as cloud-native? The essential components include scalability, seamless upgrades, no single points of failure, the use of microservices, and an API-centric architecture. Prism Central beautifully incorporates all of these features.
As I delved into the system, I uncovered a very contemporary application architecture that, in some aspects, seemed ahead of its time compared to the Kubernetes platform design. However, the challenge we faced was that the industry ecosystem had evolved alongside Nutanix, making it practical to shift to the standard platform to optimize our investments in the broader ecosystem.
As a result, we recognized the need to replatform our products. In our case, replatforming was a monumental effort, but it wasn’t about rewriting everything. From a business logic perspective, it ended up being mostly transparent.
So, what benefits do we gain?
Today, Prism Central stands as a single scale-out cloud-native application that can scale to three VMs. With the introduction of the new disaggregated platform, we successfully migrated some Prism Central services to their own Kubernetes cluster, allowing us to scale these services independently.
Just like with Prism Central, our customers don’t need to be platform administrators; we utilize Kubernetes to keep the platform experience seamless and transparent.
What’s even more remarkable is that we didn’t need to entirely shift to Kubernetes to harness all of that value.
It’s truly impressive to think that Nutanix managed to evolve from a modern yet somewhat outdated platform to a newer one in just two years. This transformation speaks volumes about the product architecture and its alignment with future advancements, rather than working against them.
the nutanixist 10: the differences between Nutanix SDDC, VCF, and external storage
I made an audacious claim about Nutanix: they had the first real SDDC when they added external storage.
Of course, many folks quickly and correctly pointed out the large feature gaps between Nutanix’s offerings and VMware’s. vSphere supports a broader set of storage offerings, NSX has a wider feature set, and vCenter has folders.
Having been the architect behind vCenter (versions 6.5, 7.0, and 8.0, as well as VCF 4.x up to 5.0), I really have a good understanding of the gaps between these products.
My time at VMware was spent trying to build a real SDDC. I have a long list of attempts, code words, and projects I pushed, some with success and some without, within VMware.
I was highlighting another perspective. To me, an SDDC offers a cohesive collection of services that simplifies the physical infrastructure beneath it. This means that, no matter the hardware in use, I can operate consistently and seamlessly. While various hardware might bring a few additional features, my team’s operations remain unchanged and steady.
vSphere’s approach to storage involves creating a new filesystem for each type of storage. vSphere includes VMFS for block-based external storage. For NFS storage, it features a distinct NFS client that operates differently than VMFS. Additionally, vSphere has vSAN to pool the performance and capacity of both flash and HDD. Lastly, vSphere introduced yet another filesystem, vSAN ESA, designed to meet the performance requirements of NVMe drives.
Each point product was brilliant, feature-rich, and incompatible, requiring forklift upgrades.
Nutanix took its entire SDDC feature set and ran it on top of external storage. It didn’t create a new file system, data management layer, API, etc.; it just added another storage type.
That simplicity adds business value. What makes it shocking, to me at least, is that they did it in a record amount of time with very few resources.
As a Nutanix customer, incremental new hardware capabilities do not create pools of incompatible infrastructure. It all works the same way; the different hardware just gives you different choices.
There is power and value in that. And that Nutanix could do it with two radically different kinds of storage, HCI and external storage, was incredible.
the nutanixist 09: why does Nutanix have the only SDDC and has rewritten the rules?
A while ago, someone asked the question: given that VMware and Nutanix have similar capabilities, what makes Nutanix the only real SDDC?
It was a great question. And to be quite honest, I was stumped. What was it? Was I just being a fan of my employer, or was there something there? And I came up with the following answer:
It’s about the vision and the reality.
If you aim to assemble pieces of software, orchestrate them, and deal with their complexity, then you’re trading off hardware complexity for software complexity.
If your goal is a control plane that absorbs different hardware infrastructure, has a simple-to-deploy and operate model, and offers a consistent API and user experience, that’s the SDDC.
And yes – if you are willing to fight with the complexity of the VCF stack and work around its limitations, you can make it do incredible things.
But that wasn’t the SDDC that I envisaged.
That isn’t programmable, dynamic, flexible infrastructure.
Worse, the intrinsic limits on the availability of VCF’s control plane make it a poor solution when the control plane’s availability defines the infrastructure’s availability.
SDDC is more than just a bag of features you can script together. It’s an operating model of infrastructure that is programmable and works. Nutanix delivers on that.
At VMware, I passionately argued for such a system. And ultimately, the company had other priorities.
VCF is on a long journey to get there.
Nutanix is on a long journey to add the features.
