wrong tool

You are finite. Zathras is finite. This is wrong tool.


24 architecturalist papers: how to not engage the A-team.

February 18, 2020 by kostadis roussos

There are many reasons to change jobs. Some are better than others. But the best is when your peers or your boss don’t engage you for your best work.

Let me start with a story. In 2009, I was a technical director at NetApp. At many other companies, this role is analogous to a senior technical leader with a pay grade that is equivalent to that of a director-level manager.

At the time, NetApp was engaged in a multiyear effort to converge the operating system of Spinnaker, a company it had acquired, with its own platform, ONTAP. After the first unsuccessful attempt with the product known as ONTAP GX, the technical and business leadership of the company rallied around a strategy that eventually became what is now known as ONTAP 8.0.

This effort required vast amounts of synchronization. At one point, I was the architect for the data protection portion of the business. The overall architect for the effort sent me an email with a detailed task breakdown of what the data protection team needed to accomplish over the next two years. As I looked at the list, I had some concerns with some of the details and with the general direction, and so I flippantly responded with, "This is so detailed that I'm not sure what value I'm going to add. Why don't you just send it directly to the management team and the product managers?" And the author of the email took my response and forwarded the email along with his comment: "you're right."

After that, it took very little for me to want to leave NetApp.

Over the years I have wondered why this particular exchange was so critical in my leaving NetApp. NetApp was at the time treating me well. I would've made more money at NetApp than I did at Zynga. I had five weeks of vacation at NetApp. I was well respected by the then-CTO and chief science officer. My boss at the time and I had some differences of opinion, but those differences were resolvable. And the problem space remained interesting.

So why did I leave?

I left because at the end of the day that other architect did not want to engage with my best self. He was not interested in working with me to come up with the right answer. He just wanted me to do exactly as I was told and to take accountability for his decisions.

In short, what I heard him say is, “I don’t need you to think. I need you to do exactly as you’re told and to make sure that the things I need done are done.”

Over the years, I have seen this pattern play out again and again, sometimes with me on the receiving end of that message and sometimes with me as the author of such a message. What I have concluded is that if you're engaging someone to solve a critical business problem and you talk to them in a way that demonstrates you do not see value in their abilities, then you're asking them to leave.

But there is another, more insidious problem that can also occur. If you're trying to engage another team whose systems you happen to know a lot about, it is tempting to bypass the current architect and simply dictate the tasks. The problem is that you are not engaging the team to solve your problems. This kind of communication discourages precisely the people you need: the people who want the scope to think and to imagine possible solutions. They decide they do not want to work on your problems.

If the problem is small in scope, then this is not necessarily a bad thing. If the problem is significant in scope, this is a calamitous decision. You've traded away a deeper understanding of a hard, complex problem for the delivery of a small set of critical capabilities. You have reduced a team that could add value by understanding the problem and thinking about it long and hard to a team that will do exactly as it is told and no more.

“But that wasn’t the goal,” I have said on more than one occasion.

No, it wasn't. But that is precisely what I have achieved in the past. Because the people who could think realized that there was no room for them to think, only to do. So instead of looking at a problem and trying to solve it, they saw a list of activities that could be done by someone who couldn't think as deeply as they could. At the end of the day, knowledge workers have a tremendous amount of freedom to choose what problems they work on. They also have a tremendous amount of freedom to decide how long and how hard they want to work on an issue. The single most calamitous decision you can make as an architect is to engage with another architect while treating them as less than an equal.

Over the years, I have made this mistake. To those I treated poorly: my loss is massive, but it is nothing compared to how poorly I treated you. All I can say is that life is about getting better and learning new things. And maybe this goes a little distance as an apology.


Filed Under: Architecturalist Papers

23 architecturalist papers: latency kills

January 17, 2020 by kostadis roussos

While at NetApp, I saw the incredible effort that became known as ONTAP 8.0, which grew out of the Spinnaker acquisition.

From that experience, I learned a few seminal things that continue to resonate. The short version is that latency kills.

Let me start by saying that the hard problem in storage is how to deliver both low latency and durability. Enterprise storage vendors earn their 70% gross margins because of the complexity of solving two goals that appear to conflict: durability requires a copy, and making a copy slows things down.

The solution was, and is, to use algorithms, in-memory data structures, and CPU cycles to deliver low latency and durability.

When Spinnaker was acquired, there was a belief within the storage industry that single-socket performance had reached a tipping point, and that performance could only be improved if we threw more sockets at the problem.

And, in retrospect, they were right. Except we collectively missed another trend: although single-thread performance was no longer going to double at the same rate, media performance was going to go through a discontinuity and radically improve.

But at the time, this wasn’t obvious.

And so many folks concluded that you could only improve performance through scale-out architectures.

The problem with scale-out architectures is that although the latency of an access served by the local node can be as good as that of a single-node system, remote latency is always worse than local latency.

And application developers prefer, for simplicity, to write code that assumes uniform latency of the infrastructure.

And so applications tend to be engineered for the worst-case latency.

And single-node systems were able to compete with clustered systems. As media got faster, and as single-node performance improved, application performance on non-scale-out architectures was always better.

In short, the scale-out architectures delivered higher throughput, but worse latency.
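
To make that trade-off concrete, here is a toy back-of-the-envelope sketch in Python. All of the numbers and the cluster size are invented for illustration; they are not measurements from ONTAP or any real system.

```python
# Toy model: why scale-out helps throughput but hurts the latency an
# application has to plan for. All numbers are made up for illustration.

LOCAL_LATENCY_MS = 0.5   # hypothetical latency when the data is on the local node
REMOTE_LATENCY_MS = 2.0  # hypothetical latency when the request crosses the cluster

def average_latency(remote_fraction: float) -> float:
    """Average latency for a workload that hits remote nodes some of the time."""
    return (1 - remote_fraction) * LOCAL_LATENCY_MS + remote_fraction * REMOTE_LATENCY_MS

# Single-node system: every access is local, so the worst case equals local latency.
print(f"single node, worst case: {LOCAL_LATENCY_MS:.2f} ms")

# Four-node cluster with data spread evenly: 3 out of 4 accesses are remote.
print(f"4-node cluster, average:  {average_latency(0.75):.2f} ms")

# An application written for uniform latency budgets for the worst case,
# which in a cluster is the remote latency, regardless of the average.
print(f"4-node cluster, worst case the app plans for: {REMOTE_LATENCY_MS:.2f} ms")

# Throughput, ignoring coordination costs, still scales with node count,
# which is why scale-out wins on throughput even as latency gets worse.
print("relative throughput of the 4-node cluster: ~4x a single node")
```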

And it turns out that throughput workloads are not, generally, valuable.

And so scale-out for performance has its niche, but it was not able to disrupt non-scale-out architectures.

Over time, clustered storage systems added value in ways other than performance, but the whole experience taught me that customers will always pay for better latency. And that if there is enough money to be made in the problem space, it will be solved in a way that avoids requiring applications to change.


Filed Under: Architecturalist Papers, Storage

22 architecturalist papers: multi-tenancy and quotas

January 2, 2020 by kostadis roussos

Over the last years, I have gotten into a series of protracted debates about multi-tenancy.

What I have begun to understand is that it is essential to define the objectives of multi-tenancy before one starts to talk about it.

And even before we get to that, we need to define what multi-tenant means.

Consider a piece of hardware, say a server with four sockets. An individual owns the server. Another individual owns the building in which the server resides.

In effect, when there are two actors Mary and Tom, that have access to a system, that system is said to be multitenant if Mary and Tom do not trust each other.

But how much do they trust each other? The degree of trust determines how much the system must protect Mary from Tom and vice versa. For example, suppose Mary trusts Tom completely. Then Mary doesn't care that Tom has physical access to the hardware, and she takes no actions to protect her data or her applications running on that server. In effect, Mary and Tom are the same tenant; they just have different roles.

But suppose Mary trusts Tom, yet wants to ensure that Tom cannot damage her system accidentally. Here identities and roles play a factor. What Mary would like is a role that Tom can use that allows him to do the things he needs to do to Mary's server and no more.

And so this is where things get complicated. There are two basic approaches. The first is to bake into the system the set of controls that Tom has access to, and to use some role-based access system, integrated with some identity system, that determines what Tom can do. The problem with such an approach is that if Tom needs to do something that is not in the system, he has no way to do it and has to ask Mary. Now, if Mary is okay with that, all good; however, Mary may not want to do the task herself and may wish to allow Tom to do the job. But if the system has no way for her to express that, then she is forced to give him access to more controls than he ought to have.
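
A minimal sketch of that first approach, with hypothetical role and operation names of my own invention: the set of controls is fixed when the system is built, so anything Tom needs that is not already on the list forces him back to Mary.

```python
# Approach one, sketched: the operations and roles are baked into the system.
# All names here are hypothetical, purely for illustration.

BAKED_IN_OPERATIONS = {"reboot", "view_logs", "replace_disk"}

ROLES = {
    "operator": {"reboot", "view_logs"},  # the role Mary grants to Tom
    "owner": BAKED_IN_OPERATIONS,         # Mary herself
}

def perform(role: str, operation: str) -> str:
    if operation not in BAKED_IN_OPERATIONS:
        # The system has no notion of this operation at all, so there is no
        # role Mary can grant for it; Tom has to go back and ask Mary.
        return f"'{operation}' cannot be delegated by this system"
    if operation not in ROLES[role]:
        return f"role '{role}' may not '{operation}'"
    return f"performed '{operation}'"

print(perform("operator", "reboot"))            # works as intended
print(perform("operator", "replace_disk"))      # denied by the role
print(perform("operator", "rotate_tls_certs"))  # the painful case: not in the system
```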

The second approach is to use layering. You create a net new interface that interacts with Mary's system through some APIs, and that net new interface is what Tom uses. Thus, when Mary wants to enable Tom to do something new, Tom can extend his tool to do it. The problem with this approach is that Tom now has access to a whole bunch of operations he shouldn't have. The only things preventing Tom from using those operations are his adherence to procedure and the fact that, at the end of the day, Tom isn't malicious. He's a good guy.

My observation is that approach one doesn't work. The reason it doesn't work is that the set of operations that Tom needs to perform is ever-evolving. Worse, the collection of activities that Mary wishes Tom to do is ever-expanding. And as a result, they end up using the second approach.

Okay, so what?

The problem is that too many people attempt to build the first model. For example, suppose I have an interface for interacting with the system, and that interface allows me to create, delete, or modify objects. Then somebody decides that the hierarchy of those objects should reflect some authorization scheme. What happens next is that Tom and Mary can't do their jobs, because the hierarchy, or the complexity of configuring the hierarchy and setting up authorization, cannot express what they need. In effect, the hierarchy and system that lets you create, edit, and manipulate objects for one task is not the same hierarchy you would use for another.

And so, ultimately, what you do is you create a tool that has a specific set of operations that Tom needs. Mary and Tom configure the tool so that it only does what it needs to do.
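
Here is an equally minimal sketch of the layered approach, again with invented names: Tom's tool wraps Mary's full API and exposes only the operations Tom is supposed to use, which makes it easy to add a new operation later, but leaves nothing except convention stopping Tom from calling the underlying API directly.

```python
# Approach two, sketched: a layered tool over Mary's general-purpose API.
# Class and operation names are hypothetical, for illustration only.

class MarysSystemAPI:
    """Mary's system: a powerful, general-purpose API."""
    def do(self, operation: str, **kwargs) -> str:
        return f"system executed '{operation}' with {kwargs}"

class TomsTool:
    """The layered interface Tom actually uses day to day."""
    EXPOSED = {"reboot", "view_logs", "rotate_tls_certs"}

    def __init__(self, api: MarysSystemAPI):
        self._api = api

    def run(self, operation: str, **kwargs) -> str:
        if operation not in self.EXPOSED:
            raise PermissionError(f"Tom's tool does not expose '{operation}'")
        return self._api.do(operation, **kwargs)

tool = TomsTool(MarysSystemAPI())
print(tool.run("rotate_tls_certs", host="db-01"))  # adding a new operation is one line

# The weakness the post describes: nothing but procedure stops Tom from
# reaching past the tool and calling the full API directly.
print(MarysSystemAPI().do("delete_everything"))
```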

But, the advocates of the first system point out that the second approach is less secure. And they are right. Or I’ll take them at their word.

They ask: what if Coke and Pepsi want to run their software on the same physical servers? I always found that to be an absurd question. Even if we could assume that the system was entirely secure, there is human error. I thought that Coke and Pepsi would always buy their own servers. What is interesting is that the market seems to be doing exactly that, even in the public cloud. The Nitro hardware that Amazon has produced mainly provides physical instances on a shared server. And this was before we discovered that there are architectural holes in our systems that allow data to leak between programs that run on the same physical server but belong to different tenants.

And so, my assumption has always been that if you care about security, air gaps are about the only thing you should trust.

What does this mean?

Consider the server. With no software, it can do anything you could imagine. The minute you start running software, the set of things you can do becomes increasingly limited. It turns out that there is a whole slew of user interfaces that are a lot more useful than just starting with the hardware. Over time, a set of interfaces for using a system and controlling access to that system has emerged. And we have figured out over many years how to make them address both Tom and Mary's needs. A great example is the use of root and less privileged users on most operating systems.

A user interface for the system that is handy to both Mary and Tom is incredibly powerful. And therefore, whenever a new way of interacting with servers emerges, there is a temptation to try to figure out what the boundary between the two tenants should be. The reality is that the existing interfaces developed after years of hard work and operational experience. Therefore, in a new system, you are most likely to draw the boundary in the wrong place. And thus, in my mind, how you access a system should be independent of how you control access.

Okay, I’m dangerously close to talking about security, and when it comes to security, I know that I know nothing.

The problem is that even if you don't care about security, another critical use case of multi-tenancy is to reduce costs for the infrastructure provider; call her Indigo. What Indigo wants is to assign quotas to Mary, Jane, and Tom, and to ensure that Mary, Jane, and Tom never exceed their quotas.

Amazon's solution was to create an infinite supply of servers and bill Mary, Jane, and Tom for their usage.

The limitation of such a system is that you can only buy the set of servers that Amazon has decided to make available. The other limitation is that it assumes an infinite infrastructure.

If, however, Indigo does not have access to an infinite infrastructure or it’s inappropriate for their use case, what to do?

In my opinion, they should choose approach number two. What does this mean? There is a set of objects that Mary, Jane, and Tom use to do their jobs. Indigo has a set of quotas that it assigns to Mary, Jane, and Tom. Mary, Jane, and Tom need to refer to the quotas and use them transparently, and so there is a temptation to encode the quotas in the objects. But instead of enforcing quotas at every access to the objects, enforcement should be done lazily, unless you exceed some threshold.
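
A sketch of what lazy enforcement might look like, with made-up limits: the hot path counts usage without consulting the quota, and an out-of-band reconciliation pass warns at a soft threshold and blocks only after a hard threshold has been crossed.

```python
# Lazy quota enforcement, sketched with hypothetical limits. Allocations are
# not checked against the quota on every call; usage is reconciled out of
# band, and only a crossed hard limit blocks further allocations.

class TenantQuota:
    def __init__(self, soft_limit: int, hard_limit: int):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.usage = 0
        self.blocked = False

    def allocate(self, amount: int) -> bool:
        if self.blocked:
            return False        # only already-flagged tenants pay a cost here
        self.usage += amount    # the common path: just count, no quota lookup
        return True

    def reconcile(self) -> None:
        """Run periodically (e.g., by a background job), not on every allocation."""
        if self.usage >= self.hard_limit:
            self.blocked = True  # the equivalent of "that requires a phone call"
        elif self.usage >= self.soft_limit:
            print(f"warning: usage {self.usage} of {self.hard_limit}")

mary = TenantQuota(soft_limit=8_000, hard_limit=10_000)
for _ in range(9):
    mary.allocate(1_000)
mary.reconcile()             # warns: Mary is past the soft limit
print(mary.allocate(2_000))  # still True; laziness allows a brief overrun
mary.reconcile()             # now she is over the hard limit and gets blocked
print(mary.allocate(1_000))  # False
```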

And if you look at what Amazon does, they do the same thing.

If you want to use one more server, they will give it to you. If you’re going to employ 10,000 more servers, that involves a phone call. They have their quotas that are lazily enforced and, at some point in time, block access.

In effect, Amazon has decoupled secure isolation from quota enforcement.

And so, when we talk about multi-tenancy, what we need to do is ask, are we trying to solve for secure isolation, or are we trying to solve for quota enforcement? The requirements for security depend on the customer, the trust, the legal requirements, etc. How you do quotas is independent of all of those security restrictions and should be treated as such.


Filed Under: Architecturalist Papers

21 architecturalist papers: always be right, right now.

January 1, 2020 by kostadis roussos

One of the particular challenges of being a software architect is that the average manager and vice-president of engineering assume you are incapable of adding value, right now.

Engineering managers have releases they need to deliver now. They have engineers who have problems now. The managers have budget squabbles now. The teams have debates with product management now.

An architect has visions of the future, and those visions are often years away from delivery, and worse, even more years away from solving any of the immediate problems engineering management has.

And so the VP of engineering begins to see the architect as a distraction. The architect distracts in two ways. The first is that they are always complaining that the team is not investing in the future. The second is that the things they want are unimplementable. And worse, the architect is typically unwilling or unable to go figure anything out. PowerPoint gets produced, Confluence pages get edited, wiki pages get updated, and in some cases, small prototypes get written, but no useful code gets changed.

And thus, the VP tries to manage that distraction. There are two fundamental approaches. The first is to keep them away from anything that matters and fire them at some opportune moment. The second is to focus them on a small problem. A small problem keeps the architect from distracting other people and makes it easy to measure their delivery of value. In other words, force the PowerPoint jockey to write code, and either they figure out how to write code, or they get fired.

And you know what, the VP of engineering is right.

The thing about being a software architect is that you always have to be right. What I mean is that you are guiding a team, and the direction has to be correct. If it’s not correct, then the team is heading towards a catastrophe. You can change course over time, but it always has to be the right course.

In addition to always being right, you also have to be sufficiently high-level that everyone can do the right thing. Pulling this off is another tricky thing to master. If the correct answer does not include the entire organization, then somebody is doing something wrong. Furthermore, if it's too low-level, then there is no white space for engineers to innovate.

Another customer for the long term architecture is the CEO/GM and Product Management team. They have to see enough value that they can talk about it to their customers.

In short, you need to be right, and high-level enough that you can’t be wrong, but if that’s where you end, you fail.

The software architect must also add value right now. What does that mean? It means if the head of product management chooses to fund X features, all X are stepping stones to your architectural vision. If the engineers have to design something, it should be evident from the high-level architecture what they should do. If the managers have to figure out how to trade off long term vs. short-term execution, they should make that decision in the context of the global vision.

But it's more than that. Managers operate in terms of things that need to get done with some set of resources. And so it's vital that the architecture be broken down into discrete tasks.

And so my mantra,

  • I must always be right.
  • I must be sufficiently high-level to be always right.
  • I must be right, right now.


Filed Under: Architecturalist Papers

19 architecturalist papers: why doing the right thing matters, a tale of Facebook and charities.

October 1, 2019 by kostadis roussos

When I was at Zynga, Mark Pincus and the executive team had this brilliant idea on how to raise money for charity, selling virtual goods.

The idea was pretty simple: there was a virtual good, that virtual good was relevant to the game, and if you bought it with real money, we gave a portion of the money to some charity.

This technique generated a lot of money for charities. And, to be fair, it was great for Zynga as well. Even if we did not keep the money, getting people to spend on a free game was hard, but once you got them to pay, it was straightforward to get them to pay more.

But we had to stop.

Why?

Facebook Credits.

See, Facebook and Zynga signed a deal to have Zynga use Facebook Credits instead of real dollars. Feels a lot like Libra, but I am bitter. And because we used Facebook Credits, we needed Facebook to do some back-office paperwork.

So I got the foundation to agree to do anything and everything that Facebook needed.

And they said, no.

I said that I would write a blog raking them over the coals for not prioritizing doing good over incremental revenue.

And they said, “Do it, we do not care.”

So I worked with our MarComm team to put something together. And then we had layoffs, and our business was imploding, and they asked me not to post it. They had so many other fires to put out that this felt over the top.

And I agreed.

And I was wrong to agree.

Because, since then, no one has done this. Not one single freemium game has done this. Nada. Not one.

At Zynga, we pioneered a lot of the pay-to-play game mechanics. But Facebook’s payment team of the time pioneered the idea that charity was not a business priority.

It's my fault for not having a spine six years ago. I wonder whether, if I had written that blog, things would be different. How many people would be alive if I had just done what was right?

When Facebook started its "charitable" giving on its timeline, I puked. I got so angry that I donated $1,000 to Mother Jones because they were the only publication that was willing to call out Facebook. Heck, I offered to give another $500 as a matching donation. No one from Mother Jones asked; I just did it. I went on Twitter and said that if people sent me a note with proof of a donation, I would donate $500 to Mother Jones; I was that angry. And while we are here, give to Mother Jones; they are an excellent liberal paper that fights the power.

I screwed up.

So why am I writing now? Because a friend of mine saw a freemium game that did something for charity, and it made me happy. It meant that some games were trying to do the right thing again.

The Elder Scrolls Online (@TESOnline): "Thousands of Dragons have been slain since Elsweyr released – but now you can continue defeating them for a good cause! Raise money for real-world charities that support pets in need with each Dragon kill in #ESO. beth.games/2oVobFW #SlayDragonsSaveCats"

And I also wanted to remind everyone that there are consequences to not doing the right thing. I get angry when I see folks ask how we can incentivize tech companies to do the right thing. We should be asking them what kind of moral bankruptcy exists that says the right thing to do isn't something you just do. But I didn't do the right thing. And the industry is different as a result. And worse, a lot of people are not better off because I didn't bother to write that blog.

As software architects, we make choices, and we are accountable for those choices.


Filed Under: Architecturalist Papers, Facebook, Zynga

18 architecturalist papers: As was foretold.

June 13, 2019 by kostadis roussos

Recently, a coworker of mine approached me and said, “All you have to do is figure out what the final right answer is.”

And I stared at him, and I was surprised at how ridiculous the comment sounded.

I turned to him and I said, “anyone can do that!”

In fact, it was at that point that I realized how little I had understood about the nature of system architecture. Figuring out how to build the correct answer is the least interesting part. Figuring out how to build the next part, while preserving the optionality to build the correct solution later, is the real job.

The job is to see what is being done today, understand where you want to go, and course correct efforts that are going in the wrong direction.

System architecture lives between the now and the perfect future, an area of complete grey. And the challenge is that in that grey area, there are no correct answers.

The Minbari Grey Council is a perfect metaphor. Their job was to stand between the now and the future that they knew was coming and to make the hard choices. That space between the now and the future was unclear and uncertain. And they chose, when confronted with the future, not to make a choice.

The job of the system architect is to know when the right thing to do is to break the Grey Council and when it is not.

The challenge for the system architect is that when you see a project that is going off the rails, you need to understand how much you need to get involved. Is this a project that, if it succeeds, will take the company in the wrong direction? Or is this an effort that will open new opportunities that currently don't exist? The height of hubris is to assume you know the answer. But to do nothing is to say yes to everything.

Ultimately, system architecture is a reflection of the taste of the architect. And like fake Turing Machines, not all taste is correct for all problems.

The challenge is to understand when your taste is getting in the way of a new opportunity and when your taste is telling you that something is going wrong.


Filed Under: Architecturalist Papers

17 architecturalist papers: go fast and build things

May 17, 2019 by kostadis roussos

Over the last ten years, I have struggled with a Facebook motto: Go Fast and Break Things.

It reminds me too much of the Great Gatsby quote:

They were careless people, Tom and Daisy—they smashed up things and . . . then retreated back into their money . . . and let other people clean up the mess they had made.

The process “go fast and break things” does not describe a method for creating value; it represents a process for an adrenaline high, an excuse to do whatever you want.

Let me ground this into a real-world example. Suppose I have a system, and I want to radically re-imagine what the system can do.

There are two paths.

The first is to create a brand new system that is entirely incompatible and breaks everything. What do I mean by everything? Any system is part of an ecosystem of tools, operations, and people that interact with the system. When you break the system, you are breaking that web of relationships and interactions. The net outcome is a radical change to that web.

So why do it? Well, because the cost of that change is borne by the people who use the system, not by the people who built the system. The more powerful the market position, the easier it is for the entrenched system to break things.

For example, Facebook used to break APIs all of the time. And that was okay for Facebook, because their captive audience had no choice but to change to use the new APIs. The consumer owned the full cost of the disruption.

The second approach is to evolve the system in a way that doesn’t break anything. In this model, instead of forcing the world to adapt to your system, you figure out how to integrate into their world.
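
A tiny, made-up example of the difference, not taken from any real product: the breaking path replaces the old call outright and forces every consumer to rewrite on your schedule, while the evolving path keeps the old entry point working and adds the new capability as an optional extension.

```python
# Two ways to add a capability to a made-up API. Names are hypothetical.

# Path one: break things. The old call is replaced, so every existing caller
# must be rewritten before they can upgrade.
def fetch_user_v2(user_id: str, *, include_profile: bool) -> dict:
    return {"id": user_id, "profile": {} if include_profile else None}

# Path two: evolve without breaking. The old signature keeps working exactly
# as before; the new capability is an optional addition.
def fetch_user(user_id: str, include_profile: bool = False) -> dict:
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}  # new behavior, only for callers who ask for it
    return user

print(fetch_user("alice"))                        # callers written years ago still work
print(fetch_user("alice", include_profile=True))  # new callers opt in
```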

Intel is an excellent poster child for both of those. The first time was when they delivered the Pentium. At the time of the 486, there were a bunch of RISC processors, like MIPS, Alpha, and SPARC, that those of us in the field thought had a legitimate chance of dethroning Intel, because a CISC core was intrinsically slower than RISC systems. But getting off of Intel meant breaking things. Instead, Intel did something that was a surprise to casual followers of the industry: they embedded a RISC-like core into their processor while preserving the CISC instructions. By choosing not to break things, they won the CPU core wars.

Ironically, a little later, Intel pursued what was, in retrospect, a foolish strategy with the Itanium. The thing about the Itanium was that at the time there was no 64-bit Intel processor, and Intel planned to move away from the x86 instruction set to a new kind of instruction set. The switch was highly disruptive. And AMD delivered what the market wanted, a 64-bit x86 processor, and captured a huge market opportunity.

In both cases, the winner deliberately chose to break as little as possible and add value in a way that did not disrupt the consumers of the technology.

To me, that is the best kind of engineering.


Filed Under: Architecturalist Papers

16 architecturalist papers: you work for the future GM

March 23, 2019 by kostadis roussos

One of the most challenging parts of the job of a strategic software architect is that your job is to think about the future, and the GM's job is to think about the present. And what's worse, your planning horizon typically extends beyond the planning horizon of the current GM.

Why is that a problem? Because we hate our future selves.

There is a lot of behavioral research suggesting we hate our future selves. We will do things that optimize for current happiness over future happiness. This explains so many things about our choices.

This, for example, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5611653/ or a more accessible and possibly more useful version here: https://www.anderson.ucla.edu/faculty-and-research/anderson-review/future-self-health

And this leads to my favorite story about the conflict between GMs and their Strategic Software Architects.

In 2006, I sat in a room with Guy Churchward while at NetApp. He was the new GM, and it was our first 1:1. And I told him: Look, Guy, you're going to be gone in 18 months. And I'm going to be here for 3-5 years. My job is to make sure you don't screw this technology up. If you do something I think is stupid, I assure you it will not happen, because I will make sure the dumbest, worst engineers are working on it. If you want to do something smart, I will make sure it succeeds. So you need to get me on board. And even if you get the smart guys to work on it, I will undermine their success, because it's my job to make sure there is a technology that the next guy sitting across from me has to go to market with.

It was a breathtakingly arrogant comment. But it was a true comment. Guy looked at me, wondering if Technical Directors at NetApp could be fired.

Unfortunately for NetApp, he left his job way too soon. And I left shortly thereafter.

He left not because he failed, but because the tenure of a GM is, by design, shorter than the tenure of a strategic software architect. We want the GM to be more short-term focused, and we want the strategic software architect to take the longer view.

The challenge is that we are working for the next GM. And the current GM is not interested in helping the future GM, who is probably going to be somebody different.

So the architect, in some sense, is the person who gets in the way of the current GM's plans in order to help the future GM, someone the current GM hates (even if it's him in 2 years).

So then what?

As architects, we are constantly fighting our current boss.

How does this manifest itself concretely? If I am a GM and I have a product, to hit my numbers, I only need junior engineers. But if none of those turn into senior engineers in 1-2 years, then the product will have problems in 4. And if none of them turn into architects in 3, the product is dead in 7.

We can argue about the dates and ranges, but the story holds true. If you don't have senior technologists thinking about the future, then you miss the future.

So now what?

As strategic software architects, our job is to make the current GM think that the future GM is working for him.

Here's how I always try to do that.

1. Make sure that the Strategic Software Architecture is something the current GM will profit from. A GM has to make sales to companies who want to believe the company has a future, and he has to attract technologists to build the current stuff; a compelling technology strategy is very useful for both.

2. Make sure that the Strategic Software Architecture adds value to the current GM all of the time. This means that future pieces deliver value now.

3. Hire the people you need to build the future, and have them build the present. This is a weird thing. You bring in someone to build your future and have them work on the immediate problem. While you are doing that, you wonder if you are making a mistake. I think of it as a twofer. The new gal learns something new AND gains credibility AND will build the future thing better.

4. Be flexible in planning. Every new GM will have new priorities, so be willing to change what you recommend to be built.


Filed Under: Architecturalist Papers

15 architecturalist papers: The Turing Machine Fallacy

February 17, 2019 by kostadis roussos

One of the most enduring mistakes software architects make is what I call the Turing Machine Fallacy. The argument goes like this.

1. My system solves this fantastic class of problems.

2. This system will address all problems.

3. Therefore, there is no room for any other system.

The most recent example of this fallacy was the cloud. Four years ago, when I joined VMware, everyone assumed that the public cloud was the future. The assumption was that a small set of public cloud providers would provide the infrastructure that everyone would consume. That computing infrastructure was fundamentally undifferentiated and, therefore, not something that would be worth investing in.

I didn’t, don’t, and never will agree.

I believed, back when VMware's stock price was in the mid-$50s, that this was a ridiculous proposition.

In short, my point is the following. If you believe that a single computing infrastructure will meet all computing needs, you also believe that all software will run on a single computing platform and that that computing platform is X.

We have a name for X, it’s called a Turing Machine, and last I checked, the guys selling paper were not making a lot of money.

I believe that the minute the industry has coalesced around yet another fake Turing Machine, I need to start looking for the thing that will replace it.

But why do I believe this?

Because, fundamentally, software is an approximation of the real world. Software is a model of the real world. It is not the real world. And the real world changes. And when the world changes, the software approximations become increasingly ill-fitting until they no longer fit. And changing software is very, very hard.

Changing software is so hard that new software fits into the gaps between the current software and the real world.

The strategic system software architect's job is to see where the approximations are ill-fitting and to cause investment to happen in solutions that will fit into those gaps.

And those investments are software that is – by definition – different than the current winning system architecture. And those investments will drive hardware investments to support that software. And the hardware architecture, which is, in turn, an approximation of reality, will change. And as the hardware architecture changes to adapt to the change in software architecture, the winning system stops being the final answer to software system architecture.

We realize that the system architecture is not a universal Turing Machine but just another computer.

Hardware evolves to support software, which is continuously evolving to support a changing world. Performance, form factor, power consumption, and legal requirements are in constant flux, and therefore the needs change as well.

And so, to end on a proof point: in 2015, everyone assumed that Amazon would own The Cloud. And yet, here we are, and what is clear is that there will be a plethora of clouds (private, public, authoritarian, IoT, etc.), each with its own optimizations for its specific requirements.

The universal Turing Machine is an excellent mathematical abstraction, but there is no such thing in the real world where I live and work.


Filed Under: Architecturalist Papers

14 architecturalist papers: the imaginary architect

February 7, 2019 by kostadis roussos

One of the trickiest parts of the job is how to start. If it’s a brand new system, at a brand new company, I am not sure I have any useful advice, and more practically, that’s not the kind of role this blog series is trying to describe.

After all, doing multi-product system architecture for a single product with a single engineer is putting the cart before the horse.

Over the years, this process of how to become productive has tortured me through a series of jobs. In fact, in 2009, I almost got pushed out of Zynga because I hadn’t figured it out. And in 2008 I got my worst professional review because I hadn’t figured it out.

The crux of the problem is that when someone hires a very senior technical person, what they are hiring is an entire leadership org chart. They don't realize that at the time. Worse, they don't even realize how many new people they will have to hire. And worse still, the woman being hired has no clue as to how many people she will have to hire or fire.

At Zynga, we had a high infant mortality rate with senior folks, who would quit the company relatively quickly because we had a really bad onboarding process. We paid them like they were senior, but they didn't operate as senior folks, because they lacked context and we didn't help them get the context.

When I arrived at Zynga, Michael Luxton, whom I had mentored in the past, helped me. He basically told me the following: on the day I arrive, I am an MTS 2 and should be given MTS 2 tasks. In about a year, I will be doing the job I was hired to do. It turned out to be a little faster, not because I was particularly capable, but because circumstances and luck made it go faster.

It was a bitter pill. And I almost didn’t swallow it. In fact, I was so frustrated that I almost walked out of the company one fine evening, frustrated that I was failing. But another great engineer talked me out of it.

At Juniper, and later at VMware, my bosses, who were great VPs of engineering, provided a lot of structure in my onboarding and basically followed the same process.

But before I get to the process, let’s talk a little bit about what the problem is. The problem is that you don’t know where a company is on the technology curves. Is it a bleeding edge company that needs you to go invent something radically new? Or is it a laggard that needs you to fill feature gaps? Can your team write code but can’t architect? Or can it architect but not write code?

Put pedantically: can they write a block of code correctly? Can they write a function correctly? A module? A system? A product? Depending on where they are, the problem is different.

Is the problem not the code being written or the architectures being proposed, but the release process? The build system? The schedules?

Is the problem the set of features the product team wants?

Does the team have the people who can build the features the product team wants? If you want to build a distributed system that needs world-class distributed systems engineers, and you don't have them, you have the wrong team or the wrong product idea.

And the only way to figure that out is to get down and dirty.

So here’s the process I used for myself and I use for people who join the team. This is a process that my bosses codified when I joined VMware and Juniper and Michael Luxton intuited and mentored me through.

An architect is an imaginary architect, meaning she's done nothing for you. And an imaginary architect can't possibly be trusted to do anything. So step one is to make the imaginary architect do something that is not imaginary, for example, deliver code. The next step is to deliver a feature. The next step is to architect something. The next step is to sell something big they want to build. And to remind them every step of the way that they are imaginary until they accomplish all of the above.

In general, my perspective is that every new architect should spend a month fixing bugs in the core product they are working on. They should fix a bug and check it in. Ideally several bugs. They should attend the scrums, etc. And they shouldn't pontificate, but they should prove to their team that they can fix the bugs and do the work that an engineer can do.

In fact, at VMware, I only started to be taken seriously when I showed folks I could use a debugger. I remember one of the architects looking at me: "Wow, you know how to use a debugger." And I almost exploded in frustration, because of course I know how to use one. But then I realized that maybe if I had done that on day one, I could have saved myself some grief.

Once the architect has crossed that threshold, then and only then can we expand their scope.

Meanwhile, we are feeding her the fire hose of everything she needs to do.

Let’s get even more granular.

After the architect has been hired and settled into their desk, you tell them: you don't have any credibility with anyone on my team. Worse, you have anti-credibility, because your arrival means someone who thought they had the job no longer has it.

To get that credibility, you need to go fix some bugs on product X, Y, Z. You need to own a feature in that product. You need to prove to people you can ship shit.

Until she does that, people won't take her seriously.

She needs to demonstrate that she understands more about the system than they do. In pedantic detail.

The next step in my mentoring process is to tell the imaginary architect that she needs to get the team to buy into building something new. I tell her, “I don’t care what it is.” As far as I am concerned, the technical team must want to build the next thing you are selling.

Thirdly, you need to sell the thing you are building to product management.

And if they can't sell it to product management, tell them, "I won't overrule product management." And if they say, "What if I can't?" then, like George Washington told Alexander Hamilton, the only correct response is: "Well, I suppose we'll have to find an architect who can. Go figure it out!"

And at that point, you no longer have an imaginary architect; you have a real one. And guess what: during that entire process, the architect has been delivering value, gaining credibility with the team, and learning the corporate values, so when they tell the team to go left, the team is willing to follow them.


Filed Under: Architecturalist Papers
