wrong tool

You are finite. Zathras is finite. This is wrong tool.


29 architecturalist papers: running out of time

September 12, 2020 by kostadis roussos

Photo by Robert Hrovat on Unsplash

Several years ago, in a post on Quora, I wrote that the only thing I feared as a software engineer was running out of money.

It was an odd realization on my part, and it has been the central guiding principle of my approach to software architecture.

So, I think it’s worth expanding on why I feel that way and how I use it to drive decision-making. At NetApp and Zynga, I had seen how financial decisions killed technology. I spent four years of my life working on a product called NetCache. The last thing I built was a multithreaded system that offered unique safety properties. It was the most intellectually satisfying thing I ever made. And it sits in a code repository collecting dust. Here’s the patent, https://patents.justia.com/patent/7373640, for those who care.

At Zynga, I watched how a brilliant team that had built up a brilliant set of tools for operationalizing private and public clouds got destroyed in the space of four months because we couldn’t pay the bills.

My one personal regret is that I did not appreciate how much value there was in offering servers on demand.

What the experience taught me is that unless you are making money every quarter, you are going out of business. Any plan has to be making money now, or the organization will get destroyed.

So I take a very pragmatic view of software architecture. There are two critical questions.

1. What is the right answer to the problem?

As computer science is a science, we can reason about correctness. For a given problem, a given set of constraints on computation, and a set of desired outcomes, it is possible to articulate a correct answer that is independent of staffing or resourcing or timelines. Without knowledge of the right answer, it is impossible to know whether the problem has a solution.

So what?

Without the knowledge of a correct answer, it is easy to spend a lot of time building the wrong things. I am not talking about technical debt, but something so wrong that the only path forward is a complete and utter rewrite.

Let me give an example. Suppose you have a system where two entities must synchronize before the system converges to correct behavior. If the system assumed a human operator would observe and take action if the convergence doesn’t happen, and you wanted it to work without a human being, you need a new system.

Another example I like to use is the fallacy that it is possible to build a distributed system from a monolithic system by just adding RPCs. This approach doesn’t work, because an RPC is different from a function call within a shared address space. When failure modes get introduced in new places or the timings of functions change, the system starts to break in all sorts of wonderfully wondrous ways. In my career, I have had to stop four different projects at four companies where this was the proposed direction.
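To make the RPC point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Account` class, the `FlakyRemoteAccount` stub, and the `Timeout` error are illustrations, not any real RPC library. It shows one failure mode a local call never has: the remote side does the work, the reply gets lost, and a retry that looks harmless applies the change twice.

```python
class Timeout(Exception):
    """The caller gave up waiting; the remote side may or may not have run."""


class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # In-process call: it either returns or raises, and the caller knows which.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


class FlakyRemoteAccount:
    """Hypothetical RPC stub: same signature as Account, different failure modes."""

    def __init__(self, account, lose_reply_on_calls=()):
        self._account = account
        self._lose = set(lose_reply_on_calls)
        self._calls = 0

    def withdraw(self, amount):
        self._calls += 1
        result = self._account.withdraw(amount)  # the "server" applies the change
        if self._calls in self._lose:
            # The reply is lost: the client cannot tell whether the change applied.
            raise Timeout("reply lost")
        return result


def naive_retry(target, amount, attempts=3):
    """Retry logic that is safe for a local call and wrong for a remote one."""
    for _ in range(attempts):
        try:
            return target.withdraw(amount)
        except Timeout:
            continue
    raise Timeout("gave up")


local = Account(100)
local_balance = naive_retry(local, 10)  # one withdrawal applied

remote = FlakyRemoteAccount(Account(100), lose_reply_on_calls={1})
remote_balance = naive_retry(remote, 10)  # the withdrawal is applied twice
```

The same calling code leaves different balances depending on whether the call crossed a network. Fixing that takes design work such as idempotent operations or request identifiers, which is why adding RPCs to a monolith is a redesign and not a refactoring.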

Understanding whether the problem you’re trying to solve has a correct answer and that the proposal you’re making is correct is a critical element to any architecture.

In many ways, software architecture is proof of overall system correctness. For those who are knowledgeable in the field and understand the system well, it is possible to understand by reading the architecture spec whether the system is or is not correct.

So this brings me to the second important question.

2. What is the right answer, right now?

Money is what pays the bills. Often, I have found myself in debates over the “long-term answer” versus the “short-term answer.” On the one hand are architects who are upset that we are taking shortcuts instead of building the correct answer. On the other hand, some are frustrated that there is immediate business value that could be derived if we were willing to make some sensible shortcuts.

I believe that any architecture that cannot deliver value right now is of no value. But wait, I hear you say you can’t deliver something in six months if it takes two years to build. True. But if you know what the long-term answer is, you can make better short-term trade-offs that move you along the long-term trajectory.

In effect, I believe that knowing what the correct answer is allows you to evaluate whether the trade-offs of the short-term answer are worth it.

Furthermore, understanding what the correct answer is allows you to look at a short-term answer and determine what set of use cases it will work for and what set of use cases it will not.

Finally, that understanding allows you to determine the business value of the short term answer. For example, if the short term answer is for 70% of the use cases, and that represents 97% of the revenue, then it may be okay. However, if it addresses 97% of the revenue that is shrinking rapidly and 0% of the revenue that is exploding, it may be a waste of time.
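The arithmetic behind that judgment can be sketched in a few lines of Python. The use cases, revenue figures, and growth rates below are made-up assumptions chosen to mirror the 97% example, and `covered_share` is a hypothetical helper, not a standard formula.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    revenue: float   # current annual revenue, arbitrary units (assumed)
    growth: float    # assumed year-over-year growth, e.g. -0.5 means halving
    covered: bool    # does the short-term answer handle this use case?


def covered_share(use_cases, years=0):
    """Fraction of revenue, projected `years` ahead, that the short-term answer serves."""
    def projected(u):
        return u.revenue * (1 + u.growth) ** years

    total = sum(projected(u) for u in use_cases)
    covered = sum(projected(u) for u in use_cases if u.covered)
    return covered / total


cases = [
    UseCase("legacy", revenue=97.0, growth=-0.5, covered=True),
    UseCase("emerging", revenue=3.0, growth=3.0, covered=False),
]

today = covered_share(cases)                   # share of revenue covered now
in_three_years = covered_share(cases, years=3)  # share covered after three years
```

With these assumed numbers the short-term answer covers 97% of revenue today and under 10% three years out, which is the "waste of time" case described above.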

In conclusion, the notion that there is a trade-off between the long-term and the short-term is a fallacy. The long-term is always changing, and the short-term is what pays the bills, and software is a very malleable substance. Understanding the long-term allows you to make the right trade-offs in the short term such that the correct long-term answer is always within reach and that every step of the way, the exchanges are being made explicitly and not blindly. And software can always be changed. The question is how much.

Filed Under: Architecturalist Papers

28 architecturalist papers: titles and money matter

August 30, 2020 by kostadis roussos

In my professional career, one thing that troubled me is the statement from folks that “titles and money” don’t matter. That if you do good work, both will come, and that fixating on them was a bad thing.

My professional career has taught me the exact opposite – titles matter a lot and money matters. The only thing I learned over the years is that how much they matter depends.

Let me offer a personal perspective.

[edit – forgot one more piece of the narrative]

In 2005, I was ready to quit NetApp. As a Senior Engineer at NetApp, I was never in the room where it happened. And so all of my ideas went nowhere. I had good intuition on what the storage management team needed to do, but my role made it impossible to move the ball forward.

After my wedding, and realizing that I could spend the next few years frustrated, I called an old friend and former boss and said – “I’m out because I can’t impact the company the way I want to.”

He took it upon himself to get me the role and title I wanted. And I spent the next four years at NetApp doing some amazing stuff.

Until the accumulated frustration and powerlessness to affect product strategy pushed me out again. I didn’t have the role and title to get myself heard about what the company wanted to do.

In 2009, I was looking for a job. I thought that NetApp’s strategy was wrong. I will observe that ten years later, the company seems to be pursuing a better strategy than what they were doing then.

When I went looking for a job, I decided that, money be damned, I wasn’t taking a step back in role. Companies that could not offer me a similar position were just not interesting. Zynga’s org structure at the time had precisely the job I wanted, that of a CTO of a team with a large amount of operational freedom.

The cash money was lousy, and the equity was good.

In 2013, after Zynga changed its CEO to Don Mattrick, the company had to choose who to make the CTO. There were three excellent choices, and Don made a fantastic choice in picking Nick Tornow. I was disappointed it wasn’t me, and at the same time, I know Nick was a better choice. In fact, he was such a better choice that I spent several years trying to recruit him.

After that, I quit.

Why? Because the title mattered. Why did it matter? It mattered because it was a recognition and validation of the blood, sweat, toil, and tears I had put into the company, and of the successes I had been part of. It was a public statement of my accomplishments that my new boss had to acknowledge. When he didn’t, it was a personal statement about me. He disagreed with my contributions, and more importantly, didn’t see me the way I saw myself.

The money wasn’t that important. In fact, if I look back at the money I walked away from at Zynga, it was more than the money I made after Zynga until I lucked into a job at VMware.

After Zynga – I had to find a new job.

At the time, I was stuck between a rock and a hard place. Thanks to Zynga, I had made some money, and for a large but not insurmountable amount more, I could have had enough of a nest egg, that when I turned 55-ish, I could think of retiring as long as I didn’t dip into my savings.

The painful experience of that time was that I mismanaged my career. I came up with this cutesy title, “Chief Engineer,” instead of a title like GM or VP of Engineering. As a result, when I went looking for a job, and recruiters applied their ML algorithms to look for people, no one looked at me. I spent more time explaining why I had that title than what I did. And when I explained to them that I was a GM, but had no GM title, you could imagine the credibility gap.

I went looking.

And I had a CTO dream job with an old friend, but the guaranteed money of Juniper mattered more. It was a non-trivial amount of money, and my ability to fund my retirement and my kid’s college education meant I had to take the job that paid more.

The role at Juniper was weird. I was basically working for a GM whose job it was to turn around a company. I had no experience in security or networking, but I knew a lot about how to motivate teams and build software. The thesis was that I would help with the team and software, and we would surround me with networking and security experts.

The money was excellent. And it solved a particular personal problem, if I could stick it out for three years.

Long story short, thanks to a hedge fund, a new CEO was hired, and then the new CEO made a series of unbelievably boneheaded decisions that led to my layoff in about one year.

Because of the way the deal was structured, the three years of money was obtained in one year.

Having gotten that money, I was interested in what kind of job, impact, and title I could get.

In 2015 – VMware and Nimble made competing offers. Nimble’s VP of Engineering created a great offer. But the then GM at VMware, who was looking to hire me, made an excellent point – that being a VP of engineering matters. And that having that title on my resume from a company like VMware mattered.

He pointed out that as a VP, you have access to information, and you are at tables that you are not invited to as a non-VP. And lastly, he pointed out how it would help with my next job.

And so when I weighed the opportunity, I chose VMware. I believed at VMware I could do great things. But I also think I could have done great things at Nimble. The title VMware offered made it clear how much more the scope of impact was at VMware.

What I have learned from my experience and continue to learn from that experience is that titles matter. Maybe not for your current job, but for the next one. And money matters, because it’s how you choose what to do next.

And most importantly of all, titles are given to people who make certain decisions. And those decisions drive strategy.

Every career decision is very personal and context-dependent. There are times when I felt that someone made a horrible decision to pursue a title. But I had no idea what was going on in their lives, and so I respected their decision even though I didn’t understand it. My lack of comprehension was more about me not being them.

Filed Under: Architecturalist Papers

27 architecturalist papers: the four laws of infrastructure or why private clouds exist

August 24, 2020 by kostadis roussos

 

When I joined VMware, friends asked why? Wasn’t the future public cloud?

And I rejected that hypothesis.

The list below is an abridged version of a lot of deep thinking. These points have served me well over the years as I think about infrastructure.

One: Capex is more efficient spend than op-ex for things that can survive for more than 3 years

It’s taxed more efficiently, frees up cash flow, and so on. As power moves from gas to solar, and storage moves from spinning media to silicon, hardware can last about a decade. The systems don’t fail and the cost of running goes to 0.

If you know your capacity, you can literally buy once, capitalize for 3 years, and run it for free for 7.

Mainframes continue to survive for that reason.
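As a back-of-the-envelope illustration of that first law, the sketch below spreads a one-time purchase over its years in service. The purchase price and the zero running cost are assumptions for illustration only, and `owned_cost_per_year` is a hypothetical helper, not an accounting rule.

```python
def owned_cost_per_year(purchase_price, years_in_service, running_cost_per_year=0.0):
    """Average yearly cost of hardware you bought once and kept running."""
    return purchase_price / years_in_service + running_cost_per_year


# Assumed figures: a purchase of 300 (arbitrary units), near-zero running cost.
three_year_view = owned_cost_per_year(300.0, 3)   # cost per year if retired at 3 years
ten_year_view = owned_cost_per_year(300.0, 10)    # cost per year if it runs for a decade
```

Under these assumptions the same purchase costs 100 per year over three years and 30 per year over ten. A real comparison would also have to include power, space, staff, and the cost of capital.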

Two: The Turing machine can run any program, and yet we have all kinds of hardware.

Why?

For any given workload, there is optimal hardware that will deliver the desired performance/reliability at an optimal cost.

Cloud doesn’t offer that hardware.

Three: Any prototype of a new system is best done in a typeless scripting language; any understood system is best done in a typed, compiled language leveraging hardware

Every Python project ever written that required performance or reliability had to be rewritten in C/C++.

Cloud optimizes for agility, not optimal execution.

Four: Computer systems that are inherently reliable are cheaper to operate than computer systems that are not

The single most important variable in making a system unreliable is how often it changes. A system that never changes, never breaks. The more people that touch a system, the more unreliable it becomes and the more costly it is to operate.

Cloud infrastructure is always in flux, therefore less reliable.

 

Filed Under: Architecturalist Papers

Those AI classes turned out to be useful

July 29, 2020 by kostadis roussos

I sat in a meeting the other day where someone said, “well, computer scientists are obsessed with determinism and refuse to recognize non-determinism.”

And it got me thinking, again, about something I wrote about many, many years ago (2012).

What I wrote was that the history of thought was about moving from a universe where everything was understandable to a world where everything could not be understood. And that article can be found here https://www.forbes.com/sites/quora/2012/06/04/why-would-anyone-want-to-work-for-zynga/#7846a1cb658d. There are a lot of things that I wrote that are embarrassing. I was naive. I was optimistic. And yet, I was right in echoing the thoughts of much smarter people.

Later on, I synced up with an old friend, and we wrote an essay on the limitations of human understanding: that Homo sapiens are inherently limited in their ability to understand the universe, and that this limitation makes revelation, the intuition of truth without the ability to prove the truth, not a failure of reason, but an indication of its limits.

And so ten years later, I found myself giving a talk to a bunch of engineers about desired state systems.

The core of the discussion was that planning algorithms that attempt to search a state space exhaustively are inherently flawed if the system is exposed to unknown external inputs. When trying to change the state of such a system, if you assume you know how to go from the current state to the desired state, you are wrong, because the state you planned against is already out of date by the time you act on the plan.
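A toy Python sketch of that argument follows. The `World` class and the replica counts are hypothetical, and the loop is a generic observe-then-act reconciler used for illustration, not the system from the talk. It contrasts a plan computed once from a snapshot with a loop that re-reads the state before every step.

```python
class World:
    """A tiny system whose state can change behind the planner's back."""

    def __init__(self, replicas):
        self.replicas = replicas

    def read(self):
        return self.replicas

    def apply(self, delta):
        self.replicas += delta


def plan_once(observed, desired):
    """Compute the whole plan from a single snapshot of the state."""
    return desired - observed  # replicas to add (negative means remove)


def reconcile(read_state, apply_delta, desired, max_steps=10):
    """Re-observe before every step instead of trusting an old snapshot."""
    for _ in range(max_steps):
        current = read_state()
        if current == desired:
            return True
        apply_delta(1 if current < desired else -1)
    return False


# Plan from a snapshot, then an external event lands before the plan runs.
stale = World(3)
delta = plan_once(stale.read(), desired=5)
stale.replicas -= 1          # unknown external input: one replica dies
stale.apply(delta)           # the plan executes against a state that no longer exists

# Same disturbance, but the loop observes the real state on each step.
fresh = World(3)
fresh.replicas -= 1
converged = reconcile(fresh.read, fresh.apply, desired=5)
```

The snapshot plan ends one replica short, while the loop converges because it never assumes the state it saw a moment ago is still true.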

Thirty years ago, I remember sitting in a class learning about planning and the then-recent research on machine learning and POMDPs, and thinking: what does this have to do with anything?

It turns out, everything.

Filed Under: Uncategorized

26 architecturalist papers: gaslighting

July 12, 2020 by kostadis roussos

If you read about gaslighting, and you’re a half-decent human being, you may think you never gaslight. But if you are in a position of authority, unless you are careful, you do it all of the time.

As a junior engineer, you can ask a question, but as a senior architect, every issue is loaded. The person on the other side has a very different model for what is going on.

Authority implicitly changes every question.

Let’s take my favorite – “What do you think about my idea, X?”

The person on the receiving end is going to feel gaslit. If the big cheese is asking the question, and you say, “No,” does that end your career? And if you say “Yes,” and it turns out to be a bad idea, does that end your career? And is the big cheese asking your opinion or is he trying to get information about you and your boss?

You are pretending that they have an opinion when they don’t. It’s like asking someone, “Do you think I am fat?”

The other favorite is the “can we do this crazy idea alpha?” The person on the other side has no idea how to respond. If it’s an unfortunate but not damaging idea, the right thing may be to say yes and hope the big boss forgets what you said. If it’s a bad and dangerous idea, then you have to argue with the boss. And that sounds fun, but if the boss is committed to their concept, you become a naysayer and can find yourself trying to keep your role and job.

As a leader, when you propose a solution to the problem, the debate has implicitly ended. And if the idea is a bad one, people are trying to figure out how to understand a bad idea. And thinking about a bad idea feels like your brain is being attacked, that your ability to think is being targeted.

What to do? The right thing to do as a boss is to frame the problem and ask for solutions, not propose them. Or, if you have a solution in mind, phrase it differently.

Instead of “what do you think about idea x,” a person in authority says, “I have thought a lot about this idea that I intend to implement, and I am trying to get a few more perspectives. I would love it if I could run it by you to see if I missed anything and to get your view.”

No longer is the person in authority faking a level of equality that does not exist. Instead, they are telling the truth. And the truth is they don’t care what that other person thinks, but they are interested in knowing if they forgot or missed anything.

But what if the person in authority is frustrated that they can’t get honest feedback? What if they feel that their subordinates are unnecessarily frightened? What if they do want to be challenged and are not?

It must be the spinelessness of their subordinates.

Nope.

It’s not the other’s fault; it’s the boss’ fault. For example, have they created an inclusive environment? If someone objects do they get attacked? Etc.

In a corporate environment, gaslighting occurs when the boss pretends your opinion matters, but it doesn’t. Gaslighting occurs when the person in charge acts like they want to hear someone’s opinion, but doesn’t. Gaslighting occurs when the leader says every idea is on the table, but it isn’t. Gaslighting occurs when the decision-maker says that they will consider every idea reasonably and then attacks every suggestion that they don’t agree with.

Learning how not to gaslight, and I am first among equals of those who have to improve, is a critical part of being a technology leader.

Filed Under: Architecturalist Papers

Being a professional and Albert Speer

June 28, 2020 by kostadis roussos

When I started my career, I was a mercenary. I cared about the money and the puzzle. I didn’t give a damn about what the stuff I built was used for.

My first job was at SGI, and my first bit of tech helped design stuff at Labs that were so secret I couldn’t ask any questions. All I knew was that the answer to every question was, “I’m sorry we can’t tell you.”

My next job was at NetApp, where I built streaming media caches. The first use of those systems was for porn. The whole point of the internet at that time was porn. I used to find it amusing that I helped people see porn and enjoy porn.

Twenty-one years ago, porn was seen as – well – bad. And being a sex worker was seen as -bad-. And I’ve changed that point of view. But back then, I liked being part of the bad industry and being able to claim, like Albert Speer, my position is apolitical.

My mom would ask me what I did, and I would stare at her mischievously and tell her that I helped people.

My line was, “I am a professional. If the problem was how to build baby torture devices, and it was interesting, and the pay was good, I would do it.”

But somewhere, in the back of my head, the story of Albert Speer scared me. See, Albert Speer was the guy a whole generation of Europeans used to justify their silence and blindness in the face of the Holocaust. He was just a technocrat. A man that you could almost admire.

In my head, he was the guy who made the evil possible. He was the representative of the worst kind of human being, who was the professional without whom the madmen would never have been able to kill at scale.

One thing about growing up is that you can sometimes have two contradictory thoughts in your head until revelation happens.

In my case, it was a rebirth of my Christian faith. And a realization that being that professional was wrong. Those actions mattered.

But revelation and action take a long time.

And over time, I have started to make decisions about who I work for and where I work based on the principles of leadership and their willingness to take action on things I care about.

My Christian faith makes it impossible for me to expect Saints, but it also demands that I look for better leaders.

After I left NetApp, I went to Zynga. And there, I discovered Mark Pincus, who, despite all of his flaws, showed that being a principled leader was possible. I won’t forget his decision to insist that the Mafia Wars design team delete a creepy scene from Mafia Wars II. There were other decisions, but that one still sticks out.

And my personal success makes it possible to take risks that others can’t.

I don’t want to be Albert Speer.

Growing up, I couldn’t understand how people would worship that man. And the lesson I had learned was that you could have it all if you knew how to ignore the evil you helped create.

He was the consummate professional. And I could have it all if I were like him. I could be a technologist bereft of a moral compass and have it all.

But as I grew up, I became disgusted with myself for being like him, and I started to change.

In the back of my head, the fact that his reputation survived galled me. It meant that amoral professionals never got their due.

It’s with great satisfaction that the latest biographies of Albert Speer make it clear he was evil and should have been hanged. It’s with great relief that I see his reputation crumble, and the people who fell for him have their reputations crumble alongside him.

It’s 2020, and we technologists enable systems that create harm, like Facebook. Without us, Mr. Zuckerberg could not choose to allow hate to spew. Our systems allow him to make choices that are questionable at best, evil at worst.

And Mr. Zuckerberg is not alone. There are others. Our personal morality cannot be entirely divorced from our profession. Being a professional doesn’t absolve you from not knowing.

And if we think history will be kind, let’s remind ourselves of Albert Speer. History was not kind.

Filed Under: Uncategorized

25 architecturalist papers: playing chess while everyone else plays checkers

February 25, 2020 by kostadis roussos

One of the tough challenges of acting as a strategic software architect is that it’s not precisely an understood job. Many people ask me what I do, and after several fumbling minutes, I point them to this blog.

The most recent analogy that became somewhat useful is that strategic software architecture is about playing chess while the rest of the world plays checkers. And what I mean is that the world is looking at short time horizons, planning the next step, while the role requires planning two or more steps—and simultaneously making moves during the current phase.

Strategic software architecture is not about the stuff that is shipping now. It’s about the thing that will ship in the future. And the job is not to deliver the current thing. The job is to make sure that the team can deliver the current stuff without you.

As that strategic software architect, when you are working on an immediate deliverable, what it means is you failed your team a long, long time ago. If they need your help, it means that you either didn’t provide them the resources, the people, or strategy that would ensure their success. Fail enough times, and there is a new strategic software architect.

In most software companies, the planning cycle doesn’t extend out longer than 18 months. The world changes too fast for anything more than that. And so nobody is thinking past 18 months.

There is one group that is thinking past 18 months: these strategic thinkers. They are a critical, largely unrecognized group across a large number of business functions, but this is about engineering, so I am focusing on that. As this role doesn’t formally exist and isn’t recognized, and there are no rewards for long-term success, it raises the question: does it exist?

There are many people like that at a company. They are the ones who seem to generate magic just when the company needs it. They continue to deliver value, and nobody knows why or how. As engineers, they do it with well-timed code reviews, speaking whispers to the right people, working on the right project, checking in something that nobody expected. They call meetings to discuss things in private, and thereby create a social network that is impenetrable and built around the respect they have earned and the reputation they have acquired.

And over time, the company strategy is the strategic thinkers’ strategy, even if the company thinks otherwise, for many reasons, beginning with hiring that is shaped by their biases. What is easy and hard to build is what they and their social network think is easy and hard. What can be created is controlled by their tastes. What is easy and hard to do is shaped by the myriad small technical decisions that make change very hard. And their software architecture ossifies their decisions through org charts that can endure long after the code choices that formed them have stopped being relevant.

Currently, this entire area is left to chance. We are lucky to hire people who can do the job. And I have seen it in my career: there are groups that, seemingly out of nowhere, keep doing the right thing. Things keep getting better, but I can’t figure out why. And finally, someone turns up who has a plan. That plan exists in their head, or on a piece of paper, or on a Confluence page that nobody reads but that everybody is working on.

I believe that a company that figures out how to do strategic software architecture as a discipline and incorporates it into the 18-month planning has a decisive advantage. I also believe that if they can couple this approach with a rabid focus on immediate delivery, they can’t lose.

Filed Under: Architecturalist Papers

24 architecturalist papers: how to not engage the A-team.

February 18, 2020 by kostadis roussos

There are many reasons to change jobs. Some are better than others. But the best is when your peers or your boss don’t engage you for your best work.

Let me start with a story. In 2009, I was a technical director at NetApp. At many other companies, this role is analogous to a senior technical leader with a pay grade that is equivalent to that of a director-level manager.

At the time, NetApp was engaged in a multiyear effort to converge the operating system of Spinnaker, the company they had acquired, with their own platform, OnTAP. After the first unsuccessful attempt with the product known as OnTAP GX, the technical and business leadership of the company rallied around a strategy that eventually became what is now known as OnTAP 8.0.

This effort required vast amounts of synchronization. At one point, I was the architect for the data protection portion of the business. The overall architect for the effort sent me an email with a detailed task breakdown of what the data protection team needed to accomplish over the next two years. As I looked at the list, I had some concerns with some of the details and the overall general direction, and so I flippantly responded with, “This is so detailed that I’m not sure what value I’m going to add. Why don’t you just send it directly to the management team and the product managers?” And so the author of the email took my response and forwarded the email with his comment, “you’re right.”

After that, it took very little for me to want to leave NetApp.

Over the years I have wondered why this particular exchange was so critical in my leaving NetApp. NetApp was at the time treating me well. I would’ve made more money at NetApp than I did at Zynga. I had a five-week vacation at NetApp. I was well respected by the then CTO and chief science officer. My boss at the time and I had some differences of opinion, but those differences of opinion were resolvable. And the problem space remained interesting.

So why did I leave?

I left because at the end of the day that other architect did not want to engage with my best self. He was not interested in working with me to come up with the right answer. He just wanted me to do exactly as I was told and to take accountability for his decisions.

In short, what I heard him say is, “I don’t need you to think. I need you to do exactly as you’re told and to make sure that the things I need done are done.”

Over the years, I have seen this pattern play out again and again, sometimes with me on the receiving end of that message and sometimes with me as its author. What I have concluded is that if you’re engaging someone to solve a critical business problem and you talk to them in a way that demonstrates you do not see value in their abilities, then you’re asking them to leave.

But there is another insidious problem that can also occur. If you’re trying to engage with another team and you happen to know a lot about how that team’s systems are built, it is tempting to bypass the current architect and just tell the team to do this and this. The problem is that you are not engaging the team to solve your problems. This kind of communication discourages precisely the people you need: the people who want scope to think and to imagine possible solutions. They decide that they do not want to work on your problems.

If the problem is small in scope, then this is not necessarily a bad thing. If the problem is significant in scope, this is a calamitous decision. You’ve gained the delivery of a small set of critical capabilities at the expense of a deeper understanding of a hard, complex problem. You have traded a team that could add value by understanding the problem and thinking about it long and hard for a team that will do exactly as they’re told and no more.

“But that wasn’t the goal,” I have said on more than one occasion.

No, it wasn’t. But that is precisely what I have achieved in the past. Because the people who could think realized that there was no room for them to think, only to do. So instead of looking at a problem and trying to solve it, they saw a list of activities that could be done by someone who couldn’t think as deeply as they could. At the end of the day, knowledge workers have a tremendous amount of freedom to choose what problems they work on. They also have a tremendous amount of freedom to decide how long and how hard they want to work on an issue. The single most calamitous decision you can make as an architect is to engage with another architect by treating them as less than an equal.

Over the years, I have made this mistake. To those that I treated poorly, my loss is massive but nothing as compared to how poorly I treated you. All I can say is that life is about getting better and learning more new things. And maybe this goes a little distance as an apology.



Filed Under: Architecturalist Papers

23 architecturalist papers: latency kills

January 17, 2020 by kostadis roussos 2 Comments

While at NetApp, I saw the incredible effort that became known as ONTAP 8.0 and was part of the Spinnaker acquisition.

From that experience, I learned a few seminal things that continue to resonate. The short version is that latency kills.

Let me start by saying that the hard problem in storage is how to deliver both low latency and durability. Enterprise storage vendors earn their 70% gross margin because of the complexity in solving two issues that appear to conflict. The conflict is that durability requires a copy, and making a copy slows things down.

The solution was, and is, to use algorithms, in-memory data structures, and CPU cycles to deliver both low latency and durability.

When Spinnaker was acquired, there was a belief within the storage industry that single-socket performance had reached a tipping point, and that performance could only be improved if we threw more sockets at the problem.

And, in retrospect, they were right. Except, we collectively missed another trend. Although single-thread performance was no longer going to double at the same rate, storage media was about to go through a discontinuity and radically improve its performance.

But at the time, this wasn’t obvious.

And so many folks concluded that you could only improve performance through scale-out architectures.

The problem with scale-out architectures is that although single-node latency can be as good as local latency, remote latency is worse than local latency.

And application developers prefer, for simplicity, to write code that assumes uniform latency of the infrastructure.

And so applications tend to be engineered for the worst-case latency.
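A minimal sketch of that reasoning, with made-up numbers: the `Cluster` class, its field names, and the latency figures are all hypothetical and only illustrate why an application that assumes uniform latency ends up budgeting for the slowest path, even when most requests are served locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of a storage system; all numbers are illustrative."""
    local_latency_ms: float   # latency when the data lives on the node asked
    remote_latency_ms: float  # latency when another node must be contacted
    remote_fraction: float    # share of requests that go to a remote node

    def mean_latency_ms(self) -> float:
        # Average latency across local and remote requests.
        return ((1 - self.remote_fraction) * self.local_latency_ms
                + self.remote_fraction * self.remote_latency_ms)

    def planning_latency_ms(self) -> float:
        # An application that assumes uniform latency has to plan
        # for the slowest path it can actually hit.
        if self.remote_fraction > 0:
            return max(self.local_latency_ms, self.remote_latency_ms)
        return self.local_latency_ms


single_node = Cluster(local_latency_ms=0.5, remote_latency_ms=0.5, remote_fraction=0.0)
scale_out = Cluster(local_latency_ms=0.5, remote_latency_ms=2.0, remote_fraction=0.25)

print(single_node.planning_latency_ms())
print(scale_out.mean_latency_ms(), scale_out.planning_latency_ms())
```

In this toy model the scale-out system has a mean latency close to the single node, but the number the application developer designs around is the remote one.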

And single-node systems were able to compete with clustered systems. As media got faster, and as single-node performance improved, application performance on non-scale-out architectures was always better.

In short, the scale-out architectures delivered higher throughput, but worse latency.

And it turns out that throughput workloads are not, generally, valuable.

And so scale-out for performance has its niche, but it was not able to disrupt non-scale-out architectures.

Over time, clustered storage systems added value other than performance, but the whole experience taught me that customers will always pay for better latency. And that if there is enough money to be made in the problem space, it will be solved in a way that does not require applications to change.


Filed Under: Architecturalist Papers, Storage

22 architecturalist papers: multi-tenancy and quotas

January 2, 2020 by kostadis roussos Leave a Comment

Over the last few years, I have gotten into a series of protracted debates about multi-tenancy.

What I have begun to understand is that it is essential to define the objectives of multi-tenancy before one starts to talk about it.

And even before we get to that, we need to define what multitenant means.

Consider a piece of hardware, say a server with four sockets. An individual owns the server. Another individual owns the building in which the server resides.

In effect, when there are two actors, Mary and Tom, that have access to a system, that system is said to be multitenant if Mary and Tom do not trust each other.

But how much do they trust each other? The level of trust determines how much the system must protect Mary from Tom and vice versa. For example, suppose Mary trusts Tom. Then Mary doesn’t care that Tom has physical access to the hardware. And Mary takes no actions to protect her data or her applications running on that server. In effect, Mary and Tom are the same person; they merely have different roles.

Now suppose Mary trusts Tom, but Tom doesn’t want to damage Mary’s system accidentally. This is where identities and roles come into play. What Mary would like to do is have a role that Tom can use that allows him to do the things he needs to do to Mary’s server and no more.

And so this is where things get complicated. There are two basic approaches. The first is to bake into the system the set of controls that Tom has access to and to use some role-based access system, integrated with some identity system, that determines what Tom can do. The problem with such an approach is that if Tom needs to do something that is not in the system, he has no way to do it and has to ask Mary. If Mary is okay with that, all is good. However, Mary may not want to do the task herself and may wish to allow Tom to do the job. But if the system has no way for her to do that, then she is forced to give him access to more controls than he is capable of using.

The second approach is to use layering. You create a net new interface that interacts with Mary’s system through some APIs, and that net new interface is what Tom uses. Thus, when Mary wants to enable Tom to do something new, Tom can extend his tool to do that. The problem with this approach is that Tom now has access to a whole bunch of operations he shouldn’t have. The only thing preventing Tom from using those operations is his adherence to procedure and the fact that at the end of the day, Tom isn’t malicious. He’s a good guy.

My observation is that approach one doesn’t work. The reason it doesn’t work is the set of operations that Tom needs to perform is ever-evolving. Worse, the collection of activities that Mary wishes Tom to do is ever-expanding. And as a result, they end up using the second approach.

Okay, so what?

The problem is that too many people attempt to build the first model. For example, suppose I have an interface for interacting with the system. That interface allows me to create, delete, or modify objects. Then somebody decides that the hierarchy of those objects should reflect some authorization scheme. The result is that Tom and Mary can’t do their jobs, because the hierarchy they need, or the complexity of configuring it and setting up authorization, is not expressible by the system. In effect, the hierarchy that allows you to create, edit, and manipulate objects for one task is not the same hierarchy you would use for another.

And so, ultimately, what you do is you create a tool that has a specific set of operations that Tom needs. Mary and Tom configure the tool so that it only does what it needs to do.
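A minimal sketch of the layered approach, assuming a hypothetical `ServerAPI` that stands in for Mary’s full management interface and a hypothetical `OperatorTool` that stands in for Tom’s tool. None of these names come from a real product; they only illustrate how the tool narrows what Tom does while the underlying access stays broad.

```python
class ServerAPI:
    """Hypothetical stand-in for the full set of operations on Mary's system."""

    def restart_service(self, name: str) -> str:
        return f"restarted {name}"

    def rotate_logs(self) -> str:
        return "logs rotated"

    def wipe_disk(self) -> str:
        return "disk wiped"


class OperatorTool:
    """Tom's tool: a thin layer exposing only the operations he needs."""

    def __init__(self, api: ServerAPI, allowed: set[str]):
        # The tool holds a handle to the FULL api. Nothing in the system
        # itself stops Tom from bypassing the tool; only procedure does.
        self._api = api
        self._allowed = set(allowed)

    def allow(self, operation: str) -> None:
        # Mary and Tom extend the tool when Tom's job changes.
        self._allowed.add(operation)

    def run(self, operation: str, *args):
        if operation not in self._allowed:
            raise PermissionError(f"{operation} is not configured for this tool")
        return getattr(self._api, operation)(*args)


tool = OperatorTool(ServerAPI(), allowed={"restart_service"})
print(tool.run("restart_service", "web"))

tool.allow("rotate_logs")  # the set of tasks grows without changing the server
print(tool.run("rotate_logs"))
```

The design choice is that the allowed set lives in the tool’s configuration and not in the server’s object hierarchy, so it can change as fast as Tom’s job does.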

But, the advocates of the first system point out that the second approach is less secure. And they are right. Or I’ll take them at their word.

They ask, what if Coke and Pepsi want to run their software on the same physical servers? I always found that to be an absurd question. Even if we could assume that the system was entirely secure, there is human error. I thought that Coke and Pepsi would always buy their own servers. What is interesting is that the market seems to be doing that even in the public cloud. The Nitro hardware that Amazon has produced mainly provides physical instances on a shared server. And this was before we discovered that there are architectural holes in our systems that allow data to leak between programs running on the same physical server that belong to different tenants.

And so, my assumption has always been that if you care about security, air gaps are about the only thing you should trust.

What does this mean?

Consider the server. With no software, it can do anything you could imagine. The minute you start running software, the set of things you can do becomes increasingly more limited. It turns out that there is a whole slew of user interfaces that are a lot more useful than just starting with the hardware. Over time, a set of interfaces for using a system and controlling access to that system have emerged. And we have figured out over many years how to make them address both Tom and Mary’s needs. A great example is the use of root and less privileged users on most operating systems.

A user interface for the system that is handy to both Mary and Tom is incredibly powerful. And therefore, whenever a new way of interacting with servers emerges, there is a temptation to try to figure out what the boundary between the two tenants should be. The reality is that such interfaces developed after years of hard work and experience in operational practicality. Therefore, in a new system, you are most likely to draw the boundary in the wrong place. And thus, in my mind, how you access a system should be independent of how you control access.

Okay, I’m dangerously close to talking about security, and when it comes to security, I know that I know nothing.

The problem is that even if you don’t care about security, another critical use case of multi-tenancy is to reduce the cost for the infrastructure provider, Indigo. What Indigo wants to be able to do is assign quotas to Mary and Jane and Tom. And what they want is to ensure that Mary, Jane, and Tom don’t ever exceed their quotas.

Amazon’s solution was to create an infinite supply of servers and bill Mary, Jane, and Tom for their usage.

The limitation of such a system is that you can only buy the set of servers that Amazon has decided to make available. The other limitation is that it assumes an infinite infrastructure.

If, however, Indigo does not have access to an infinite infrastructure or it’s inappropriate for their use case, what to do?

In my opinion, they should choose approach number two. What does this mean? There is a set of objects that Mary, Jane, and Tom use to do their jobs. Indigo has a set of quotas that they assign to Mary, Jane, and Tom. Mary, Tom, and Jane need to refer to the quotas and use them transparently. And so there is a temptation to encode them in the objects. But instead of having quota enforcement done at every access to the objects, it should be done lazily unless you exceed some threshold.

And if you look at what Amazon does, they do the same thing.

If you want to use one more server, they will give it to you. If you want to use 10,000 more servers, that involves a phone call. They have their quotas that are lazily enforced and, at some point in time, block access.
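A minimal sketch of lazily enforced quotas, under assumptions of my own: the `LazyQuotaLedger` class, the `block_factor` parameter, and the numbers are hypothetical and are not how Amazon or anyone else implements this. Small requests are granted and merely recorded, going over quota is only noticed at a later audit, and a request is blocked only once it would cross a much higher ceiling.

```python
from collections import defaultdict


class LazyQuotaLedger:
    """Hypothetical ledger: record usage now, enforce quota later."""

    def __init__(self, quotas: dict[str, int], block_factor: float = 1.5):
        self.quotas = dict(quotas)
        # Assumption: usage may run up to quota * block_factor before
        # a request is refused outright (the "phone call" case).
        self.block_factor = block_factor
        self.usage: dict[str, int] = defaultdict(int)

    def request(self, tenant: str, amount: int) -> bool:
        ceiling = self.quotas[tenant] * self.block_factor
        if self.usage[tenant] + amount > ceiling:
            return False  # far beyond quota: block immediately
        self.usage[tenant] += amount  # otherwise grant and just record it
        return True

    def audit(self) -> list[str]:
        # Run periodically, not on every access: who is over quota?
        return sorted(t for t, used in self.usage.items()
                      if used > self.quotas[t])


ledger = LazyQuotaLedger({"mary": 100, "jane": 100, "tom": 100})
ledger.request("mary", 100)
print(ledger.request("mary", 1))       # one more server: granted
print(ledger.request("mary", 10_000))  # 10,000 more: refused
print(ledger.audit())                  # mary is flagged after the fact
```

Nothing in the ledger knows who trusts whom, which mirrors the point that quota enforcement can be designed separately from isolation.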

In effect, Amazon has decoupled secure isolation from quota enforcement.

And so, when we talk about multi-tenancy, what we need to do is ask, are we trying to solve for secure isolation, or are we trying to solve for quota enforcement? The requirements for security depend on the customer, the trust, the legal requirements, etc. How you do quotas is independent of all of those security restrictions and should be treated as such.


Filed Under: Architecturalist Papers
