wrong tool

You are finite. Zathras is finite. This is wrong tool.


Scaling efficiently instead of scaling up or out.

November 11, 2015 by kostadis roussos

Over the last few months, I’ve been involved in a lot of discussions about how to make software systems more efficient.

When we look at making software go faster, there are three basic approaches (a toy sketch follows the list):

  1. Pick a better algorithm
  2. Rearchitect the software to take advantage of hardware
  3. Write more efficient software
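
To make the three approaches concrete, here is a toy sketch in Python (my illustration, not from the discussions the post describes; the task and data are hypothetical): the same job done with a bad algorithm, a better algorithm, and then with smaller constants.

    # Hypothetical sketch: the same task at three levels of performance.
    # Task: count how many entries in `words` occur more than once.
    from collections import Counter

    def baseline(words):
        # O(n^2): words.count() rescans the whole list for every word.
        return sum(1 for w in words if words.count(w) > 1)

    def better_algorithm(words):
        # Approach 1: a better algorithm -- one O(n) counting pass.
        counts = Counter(words)
        return sum(1 for w in words if counts[w] > 1)

    def more_efficient(words):
        # Approach 3: same O(n) algorithm, smaller constants -- walk the
        # distinct keys once instead of re-hashing every word.
        return sum(c for c in Counter(words).values() if c > 1)

    words = ["up", "out", "up", "efficient", "out", "up"]
    assert baseline(words) == better_algorithm(words) == more_efficient(words) == 5

(Approach 2, rearchitecting to exploit the hardware, is the multi-core and scale-out story the rest of this post tells.)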

From about 1974, when Intel introduced the original 8080, until 2004, conventional wisdom was that writing more efficient software was a losing proposition. By the time the more efficient software was written, Intel's next-generation processor would be released, improving your code's performance anyway. The time you spent making your software go faster represented a lost opportunity to add features.

As a result, a generation of software engineers was taught about the evils of premature optimization.

Textbooks and teachers routinely admonished their students to write correct code, not efficient and correct code.

Starting in 2005, with the shift to multi-core processors, making software go fast became about taking advantage of multiple cores.

Software developers had to adapt their systems to be multi-threaded.

At the same time, software developers noticed that the number of cores per system was limited, and that to get ever-increasing scale they had to be able to leverage multiple systems.

And thus the era of scale out distributed architectures began.

In this era, software engineers had to create new algorithms and new software architectures, and writing efficient code was still not viewed as an important part of delivering ever-faster software.

See, from 1974 to 2015, the name of the game was to use more and more hardware to make your software go faster, without any consideration of how efficient the software was. From 1974 to 2004, you just waited for the next processor. From 2004 to 2015, you re-architected your software to take advantage of more cores, and then later to scale out to more systems.

And by 2012, writing large-scale distributed systems was easy. A combination of established frameworks and patterns made it straightforward to build a system that scaled to hundreds of machines.

Software engineering had discovered the magic elixir of ever-increasing performance. We could harness an increasingly large number of systems, combined with multi-threaded code, to get infinite performance.

If the 1974–2004 era made writing efficient code of dubious value, the scale-out age made it even more questionable, because you could just add more systems to improve performance.

High-level languages, coupled with clever system architectures, let almost anyone deliver an application at scale with minimal effort.

Was this the end of history?

No.

It turns out that large scale-out systems are expensive. Much as processors hit a power wall, massive data centers that consume huge amounts of energy hit a cost wall. And companies started to wonder: how do I reduce the power bill?

And the answer was to make the code run more efficiently. We saw things like HipHop and Rust emerge: HipHop, Facebook's PHP compiler, tried to optimize existing code, while Rust tries to provide a better language for writing efficient code. In parallel, we see platforms like Node.js and languages like Go become popular because they allow for more efficient code.

Software efficiency has become valuable again. The third pillar of software performance, after a 40-year wait, is the belle of the ball.

And what is interesting is that the software systems of the last 40 years are ridiculously inefficient. Software engineers assumed hardware was free, and because of that assumption, large chunks of our software are very inefficient.

The challenge facing our industry is that to improve the efficiency of software we will either have to rewrite the software or figure out how to automatically improve performance without relying on hardware. No white knight is coming to save us.

And we are now looking at a world where performance and scale are not just a function of the algorithms and the architectures, but of the constants. In this brave new world, writing efficient and correct code will be the name of the game.

We will not only have to scale out and up, we will also have to do so efficiently.

Put differently, perhaps there is no longer such a thing as premature optimization?


Filed Under: Software

Square fails

November 6, 2015 by kostadis roussos

First of all, congratulations to the Square team for getting to an IPO.

https://recode.net/2015/11/06/square-takes-an-ipo-bullet-for-all-of-the-overpriced-unicorns/

Not-so-good news for employees, whose equity is now worth about a third less… The number was pure fiction before, and at least now it's a real number.

Unicorns: the latest technology for taking money from the working stiff!


Filed Under: Jobs

Thank you Scott!

November 5, 2015 by kostadis roussos

When I started my career, my first boss, on my first day, told another engineer:

Are you doing anything useful or are you just breathing my air?

He later turned to me and informed me that there was this five-year rule. And I asked, what is the five-year rule?

And he said:

You keep your mouth shut for five years.

And so I learned to model that behavior. I thought being a technology leader meant being a dick.

And then I met Scott Schoenthal at NetApp and learned that you could be a polite, civil, and compassionate leader. That being nice was a better way to lead. That ripping people to shreds, publicly shaming them, and calling them names wasn't the only way to get your point across.

I didn't always model my behavior after Scott – Lord knows how many rants I have produced in this life… but I am glad I learned about that other way to lead.

As an engineer, it's easy to see the cantankerous asshole email and say: I want to be the guy who writes that email. Writing that email means you have arrived. It means you are the man.

Except it doesn’t. It means you are an asshole. You are reveling in your ability to abuse someone who is defenseless. You are modeling poor behavior.

Just Don’t Do It.

Scott Schoenthal was the first person to show me another way….


Filed Under: Random Fun

How Stanford Screws the Middle Class

October 4, 2015 by kostadis roussos

One of my enduring personal mysteries was why the top 100 private colleges all charge about the same amount for tuition.

Given their wide variance in size, location, and endowments, you would expect to see a wide variance in price.

Except you don’t. The list price for a college education is about the same.

And then I spoke to someone deep in the bowels of Stanford's budget process and figured out exactly how Stanford is screwing the middle class.

Let's begin with the following startling observation: Stanford has two main sources of revenue. The first is a draw on its endowment. The second is its ability to issue bonds, borrowing to build (check out http://bondholder-information.stanford.edu/home.html). Tuition is a drop in the proverbial budget, a rounding error.

Just to make it real, the draw on the endowment is about 5% a year, so:

$21.4 billion × 5% = $1.07 billion

Student tuition = $14k/quarter × 3 quarters × ~7,000 students ≈ $294 million

Ah, you say, look! It's about 30% of that draw… except that about 4,679 students get some kind of tuition reduction, so let's cut the number in half: roughly $150 million, or about 15%. A drop in the proverbial bucket in a billion-dollar budget.
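
The same arithmetic as a back-of-envelope script; the figures are the post's, and the ~7,000-student count is inferred from the multiplication.

    # Back-of-envelope with the post's numbers (student count inferred).
    endowment = 21.4e9
    draw = 0.05 * endowment                  # ~5% annual draw
    tuition = 14_000 * 3 * 7_000             # $14k/quarter x 3 quarters x ~7k students

    print(f"draw:    ${draw / 1e9:.2f}B")    # $1.07B
    print(f"tuition: ${tuition / 1e6:.0f}M") # $294M

    net = tuition / 2                        # roughly half is given back as aid
    print(f"net tuition vs draw: {net / draw:.0%}")  # ~14%, the post's "about 15%"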

Let me think about this for a moment. Stanford benefits from tax exemptions on gifts, benefits from tax advantages when borrowing money, and all the while demands money it doesn't need from parents' after-tax income.

Hmm…

Let me repeat: the tuition that basically destroys a college graduate's ability to buy a house, or devastates a parent's retirement, is a rounding error in Stanford's budget and comes from your taxable income.

Is this just about Stanford? Certainly not. It's about Harvard and Yale and Brown, and by implication every institution of higher learning that charges more money because it can.

Why does your college education destroy your life and your parents' retirement? Because we're stupid enough to pay for it.


Filed Under: innovation, Jobs

Apple v Google – c’est la guerre

September 19, 2015 by kostadis roussos

Just before Greece and Italy went to war, the Italian ambassador came to General Metaxas and told him the terms for peace. Metaxas apparently replied in French, a language they both spoke: "Alors, c'est la guerre."

What had happened was that irreconcilable differences between the two great nations created war. The goals of the Fascist Italian leadership and the national goals of Metaxas made war inevitable, and this in spite of the ideological alignment between the two leaders: Mussolini and Metaxas were both fascists.

Apple's decision to enable ad blockers was inevitable. Apple is primarily aligned with the folks who buy its devices. It is absolutely committed to building a compelling experience on its platform, at the expense of everyone else on that platform. Customers pay for the platform, and Apple's biggest source of revenue is that platform. That single-minded focus on the platform experience had to lead to ad blocking, because ads are the single biggest source of irritation in the web ecosystem.

Because Google was aligned with maximizing ad revenue rather than with maximizing the web-ecosystem experience, it was left vulnerable, and by extension the web was left vulnerable, to Apple's decision.

Google had about 15 years to make the web better; instead it focused on extracting ever better revenues from the web through more efficient ads. Google never took ownership of the web platform. And it's unclear that Google could have taken ownership or stewardship, because Google didn't own the platform.

This outcome, an assault on the ad system that powers the web, to the detriment of the user experience, was inevitable. The only option is for Google and the ad networks it supports to figure out how to make ads better.


Filed Under: innovation

Packet re-ordering is bad.

September 13, 2015 by kostadis roussos

One of the weirdest things at Juniper was the obsession the networking teams had with reordering packets. They kept talking about how applications could not tolerate reordering.

And this confused me to no end.

After all, TCP was designed around the assumption that packets can be reordered and arrive out of sequence, and it is supposed to survive that mess, right?

And then it was explained to me, as if to the networking NOOB that I am. The problem is that when a packet gets reordered, TCP doesn't perform as well as when packets arrive in order. And there are scenarios where TCP will assume the network is congested if a packet doesn't arrive in time, and will slow down the connection.
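
Here is a toy model of that heuristic (mine, not from the post, and nothing like a real TCP stack; the function names are invented). A receiver ACKs cumulatively, so a merely reordered segment generates duplicate ACKs, and per RFC 5681 three duplicate ACKs make the sender retransmit and cut its congestion window even though nothing was lost.

    # Toy model of TCP's duplicate-ACK heuristic (illustrative only).

    def receiver_acks(arrivals):
        # The receiver always ACKs the next in-order sequence number it expects.
        expected, buffered, acks = 0, set(), []
        for seq in arrivals:
            buffered.add(seq)
            while expected in buffered:
                expected += 1
            acks.append(expected)            # cumulative ACK
        return acks

    def sender_reaction(acks, cwnd=10):
        dup, last = 0, None
        for ack in acks:
            if ack == last:
                dup += 1
                if dup == 3:                 # three dup ACKs => presumed loss
                    cwnd = max(cwnd // 2, 1) # fast retransmit, halve cwnd
                    print(f"dup ACKs for {ack}: retransmit, cwnd -> {cwnd}")
            else:
                dup, last = 0, ack
        return cwnd

    # Segment 1 is reordered, not lost -- but the sender cannot tell:
    acks = receiver_acks([0, 2, 3, 4, 1, 5])   # ACKs: [1, 1, 1, 1, 5, 6]
    sender_reaction(acks)                      # slows down anyway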

And so, to work around the TCP protocol thinking it understands what is going on in the network, ASIC engineers perform heroics to ensure that packets flow through routers in order.

Then I read this today and I was reminded of those conversations:

http://gafferongames.com/2015/09/12/is-it-just-me-or-is-networking-really-hard/

There are all sorts of very interesting applications running over the Internet that really are just pumping packets and want them arriving in order or not at all.

And because of these applications, the design of routers is vastly more complex than it would be if the layers above the network assumed nothing about reordered packets.


Filed Under: Hardware, Software

Facebook hitting a billion people – is this the day that open messaging died?

September 4, 2015 by kostadis roussos

In the 1980s, visionary technologists created e-mail as an open, non-proprietary messaging system. It allowed anyone to communicate with anyone on open networks.

With Facebook hitting a billion users, and WhatsApp hitting 900 million, we now have a new proprietary network with the reach that email has.

In the open messaging world, messages were owned by the sender and the recipient, and they were portable. Your social network – the set of people you interact with – and your chats were owned by the people who created them.

Facebook and WhatsApp have now up-ended that open communication channel. They own your social network, they own your messages, and they have the reach to displace open communications. A private entity owns your friends and your relationship to your friends.

And hence Snapchat and Wickr. If there is no portability and durability independent of the service provider, then you may as well treat the messages as ephemeral.

Who would have thought that my most private and important data would be owned by someone else?

Oh brave new world!


Filed Under: Facebook, innovation

The completely misunderstood IOPS

September 1, 2015 by kostadis roussos

I was recently in a meeting about performance where the discussion turned to how many IOPS the database was doing.

And what was interesting was how much of our thinking about performance was formed in a world where IOPS were a scarce resource because the underlying media was soooo slow.

In the modern, post-spinning-rust world, IOPS are practically free. SSDs, and later technologies like 3D XPoint memory (what a horrible, horrible name for such an important technology), offer essentially free IOPS. The bottleneck is no longer the media (the disk drive) but the electronics that sit in front of it.

Those electronics include things like networks, memory buses, and CPUs. We are now bandwidth- and CPU-constrained, no longer media-constrained. What that means is, of course, interesting.

One practical consequence is that optimizing to reduce IOPS is no longer a worthy effort. Instead, we should be looking at the CPU and memory cost per IOP, and we should be willing to trade some CPU and memory for more IOPS to improve overall system behavior.
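
As a rough illustration of that cost-per-IOP lens (my sketch; the numbers are invented, and in practice you would pull them from your own CPU and I/O counters):

    # Invented numbers -- illustrative only.
    def cpu_seconds_per_iop(cpu_busy_seconds, iops):
        # CPU time burned per I/O operation completed.
        return cpu_busy_seconds / iops

    # Two designs, each observed for one second at 100k IOPS:
    io_avoider = cpu_seconds_per_iop(0.9, 100_000)    # caches hard to avoid I/O
    io_embracer = cpu_seconds_per_iop(0.3, 100_000)   # issues cheap I/Os freely

    print(f"io_avoider:  {io_avoider * 1e6:.0f} us CPU per IOP")    # 9 us
    print(f"io_embracer: {io_embracer * 1e6:.0f} us CPU per IOP")   # 3 us
    # When the media's IOPS are essentially free, the second design wins:
    # CPU, not the disk, is the scarce resource being spent.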

For folks like myself, who grew up working really hard to avoid doing disk operations, embracing IO is going to be hard…

And as a buddy of mine once said, these material scientists keep inventing new exotic technologies that keep us systems software engineers busy.

It’s a good time to work in systems.


Filed Under: Hardware, Software, Storage

Metrics over usability 

August 30, 2015 by kostadis roussos

[Screenshot: Facebook's status composer pushing its new collage feature.]
This is the kind of shit that drove Zynga customers nuts.

In an attempt to drive metrics for other features… we add friction to the top activity… I didn't know about collages, nor do I care to know about them, and I certainly don't want to be reminded of them all the time.

I used to be able to just enter a status; now I have to pick one.

This is just another example of an egregious, metric-driven Facebook feature, like the hyper-aggressive attempts to get me to turn on notifications.


Filed Under: Software

The end of storage tiers

July 2, 2015 by kostadis roussos

I wrote about this in 2008 on my now-defunct corporate blog at NetApp. It's fun to be working at a company that can actually create the IOPS tier.

Flash has once again thrown into stark relief the absurd classification of storage into tiers.

Talk to a storage vendor, and Tier 1 is their most expensive stuff. Talk to a storage architect, and Tier 1 is the storage for their most critical applications. If you're lucky, there is some overlap.

Then we have Flash. Is it Tier 0? Does Flash make disk Tier 5? What are the roles of Flash and disk? Is disk the new tape? And do we need a Tier -1 for storage that is faster than Flash?

Then there is the whole notion that disk storage is "secondary storage." Secondary to what?

I never really did get all of those classifications of storage into tiers. I tend to think of storage in terms of how it is used.

So instead, let me propose a new model for storage tiers based on the application's CPU and memory, the IOPS required, and the capacity needed: the ratio CPU:Memory:IOPS:Capacity.

Based on that ratio, there are three storage tiers (a rough classification sketch in code follows the list):

  1. Captive IOPS, where the IOPS are all dedicated to a single application. In this deployment the ratio is 1:1:1:1. Add more CPU and memory and you add more IOPS and capacity. Because of the nature of the application and how many IOPS it consumes, there is nothing left over for another application.
  2. Shared IOPS, where IOPS are shared across a collection of applications. In this deployment the ratio is M:N:1:1. As you add more CPU and memory, the number of IOPS increases, but not at the same rate, so you can share the IOPS across a number of applications rather than dedicating them to a single one.
  3. Capacity Efficient, where the number of IOPS is dwarfed by the capacity requirements. In this deployment the ratio is M:N:1:Q, where as M and N increase, Q increases but IOPS do not. A good example is a backup server: as more data gets backed up, you need more capacity, but you don't actually need more IOPS. Another good example is a home directory, where capacity needs increase but actual IOPS do not.
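
As promised above, a rough classification sketch (mine, with invented cutoffs; the post does not define exact thresholds, and the Workload type and tier function are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class Workload:
        cpu: float        # relative CPU demand
        memory: float     # relative memory demand
        iops: float       # relative IOPS demand
        capacity: float   # relative capacity demand

    def tier(w: Workload) -> str:
        if w.capacity > 10 * w.iops:
            # Capacity dwarfs IOPS: backup servers, home directories.
            return "Capacity Efficient"
        if w.iops >= min(w.cpu, w.memory):
            # IOPS grow in lockstep with compute: nothing left to share.
            return "Captive IOPS"
        # IOPS grow slower than compute: share them across applications.
        return "Shared IOPS"

    print(tier(Workload(cpu=1, memory=1, iops=1, capacity=1)))     # Captive IOPS
    print(tier(Workload(cpu=8, memory=6, iops=1, capacity=1)))     # Shared IOPS
    print(tier(Workload(cpu=2, memory=2, iops=1, capacity=100)))   # Capacity Efficient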

Next, I’ll explore the implications of these three tiers.


Filed Under: Uncategorized
