Storage

23 architecturalist papers: latency kills

January 17, 2020 by kostadis roussos 2 Comments

While at NetApp, I saw the incredible effort that became known as ONTAP 8.0 and was part of the spinnaker acquisition.

From that experience, I learned a few seminal things that continue to resonate. The short version is that latency kills.

Let me start by saying, that the hard problem in storage is how to deliver low-latency and durability. Enterprise storage vendors earn their 70% gross margin because of the complexity in solving two issues that appear to conflict. The conflict is that durability requires a copy, and making a copy slows things down.

The solution was, and is, to use algorithms, in-memory data structures, and CPU cycles to deliver latency and durability.

When Spinnaker was acquired, there was a belief within the storage industry that single-socket performance had reached a tipping point, and that performance could only be improved if we threw more sockets at the problem.

And, in retrospect, they were right. Except, we collectively missed another trend. Although the single-thread performance was no longer going to double at the same rate, the performance of media was going to go through a discontinuity and radically improve its performance.

But at the time, this wasn’t obvious.

And so many folks concluded that you could only improve performance through scale-out architectures.

The problem with scale-out architectures is that although single node-latency can be as good as local latency, remote latency is worse than local latency.

And application developers prefer, for simplicity, to write code that assumes uniform latency of the infrastructure.

And so applications tend to be engineered for the worst-case latency.

And single-node systems were able to compete with clustered systems. As media got faster, and as single-node performance improved, application performance on non-scale-out architectures was always better.

In short, the scale-out architectures delivered higher throughput, but worse latency.

And it turns out that throughput workloads are not, generally, valuable.

And so scale-out for performance has it’s a niche, but it was not able to disrupt non-scale out architectures.

Over time, clustered storage systems added different value than performance, but the whole experience taught me that customers will always pay for better latency. And that if there is enough money to be made in the problem space, it will be solved in such a way to avoid applications from changing.

Open Facebook API or what to do about Facebook

December 28, 2019 by kostadis roussos Leave a Comment

When I left Zynga in 2013, I was convinced that Facebook was a malevolent entity run by leaders who could not be trusted. But I was also bitter about a 6$ stock price and my life choices.

Fast-forward to 2019, and it turns out that what I thought was just sour grapes, undersold the net harm Facebook has created.

An option that isn’t considered very seriously is the following simple proposal. Don’t break up Facebook, but regulate the access to and control of the friend graph and the ability to use the friend graph to publish information.

In 2012, when Facebook and Zynga stood off, the debate that was at the heart of the disagreement was ownership of the friend graph. Facebook believe they owned the friend graph and by extension owned how it could be used. We disagreed. In the end, we caved. I know this because I worked on the software systems necessary to create a parallel friend graph of people who were friends with other people who played Zynga games.

Facebook would love for us to spend time talking about breaking things up, instead of talking about the one thing that matters, a regulated open-api and regulated data portability.

Consider the messenger space. Because the friend graph is in my personal address book, it’s trivial to talk to several dozen different friends. Because the content is on my phone, typically pictures or documents, I can share anything with anyone.

Consider how many more messenger apps there are, versus how many social networks there are.

But let’s look to the past. During the failed MSFT anti-trust trial, a peculiar part of the agreement said that MSFT could no longer have private APIs, and that they had to communicate changes in a very specific public way.

This ruling enabled NetApp, which had built a reverse engineered CIFS server to survive and thrive. Because MSFT was losing the CIFS business, it also pushed MSFT to look for alternatives to CIFS, like SharePoint for document sharing and collaboration.

But over the long term, it enabled companies like Box and Google Drive and other file-sharing companies to emerge. Without the guarantee that a single man couldn’t break an API, a healthy and vibrant ecosystem in data storage has emerged.

If we had an open-social graph, and an open api, and data portability then I suspect that over time new social networks would emerge. Every social network would probably cater to different kinds of people.

In many ways Facebook does this today with Facebook Groups. For example, I happen to have joined two Facebook groups, one dedicated to old-school rpg, and another to 5E. The two groups hate each other. But because my social graph is portable, I can communicate to both groups within facebook.

Or we can even go back to Facebook’s origins. When Mr. Zuckerberg opened up the API, he promised it was going to be open and portable. He lied, of course, but not before Mark Pincus and Zynga figured out how to exploit the graph to grow Facebook’s business. Once, Mr. Zuckerberg figured out that owning the graph and how you communicate with it was very valuable, he squashed us like a bug. And destroyed the Facebook app eco-system.

Which brings me to regulation, we can’t trust Mr. Zuckerberg . Like we couldn’t trust Mr. Gates. And breakups don’t always work. Look at ATT, 40 years after the breakup, they control everything, again.

The completely misunderstood IOPS

September 1, 2015 by kostadis roussos Leave a Comment

I was recently in a meeting about performance where the discussion turned to how many IOPS was the database doing.

And what was interesting was how much of our thinking about performance is formed in a world where IOPS are a scarce resource because the underlying media was soooo slow.

In the modern, post spinning rust world, IOPS are practically free. The bottleneck is not the underlying media, SSD’s and later things like 3D Xpoint memory (what a horrible, horrible name for such an important technology) have essentially free IOPS. The bottleneck is no longer the media (disk drive) but instead the electronics that sit in front of the media.

The electronics include things like networks, memory busses, and CPU’s. We are now bandwidth and CPU constrained, no longer media constrained. What that means is – of course – interesting.

One practical consideration is that looking to optimize IOPS is no longer a worthy effort. Instead, we should be looking at CPU and Memory cost per IOP. And we should be willing to trade off some CPU and Memory for more IOPS to improve overall system behavior.

For folks, like myself, who grew up working really hard to try and avoid doing disk operations, embracing IO is going to be hard…

And like a buddy of mine once said, these material scientists keep investing these new exotic technologies that keep us system software engineers busy.

It’s a good time to work in systems.

Inside Onedrive

January 21, 2015 by kostadis roussos Leave a Comment

Ever since Microsoft gave away storage for a product I would buy anyways, I have been working to move ~1TB of storage to the cloud and have encountered many of the limitations of the service and learned a little bit of the technical underpinnings.

I suspect when the folks who created OneDrive imagined the service they thought pictures. And pictures have a reasonable size (1-2MB) and a small number of them. They did not imagine using one-drive to backup entire multi-100GB file systems.

And that impedance mismatch has been tough.

OneDrive uploads a small number of files at at time (2-4). OneDrive scans the entire FileSystem to find new files.

All of this will get fixed over time, I am sure although things are now kind of rough.

One of the more interesting things is that I have gotten some insight into how OneDrive is built.

OneDrive has four core elements

Skydrive.exe that is the sync engine that actually copies the data to the cloud
WSearch the search engine for windows doubles as a way for onedrive to keep track of the files on the file system.
The use-of-stubs to manage offline files and provide the illusion of a single file system. Possibly the only good use of hierarchical storage management in the history of the snake oil known as HSM.
A pretty UI

All of these technologies have been around for many many years and OneDrive is really a repackaging of all of them.

OneDrive, as a product, has the property of something that was cobbled together over time without any of the architectural integrity of competing products like DropBox or Google drive. This probably reflects the ambivalence Microsoft had towards cloud services. I am encouraged to read for Windows 10 Microsoft is working to improve Onedrive significantly.

One of the challenges for a company like Microsoft is that building a product that has the feature set of DropBox is easy, but building a competing product is a completely different can of worms. A competing product requires a deeper level of engineering than the cobbling of re-purposed technologies that the current Onedrive product is.

Microsoft’s decision to embrace DropBox may reflect that reality.

Is this really about Google Docs?

October 27, 2014 by kostadis roussos Leave a Comment

With respect to MSFT today, Google and Box have been offering free unlimited storage for a while now. The difference is that Google and Box required you to spend about more than 40$ a month for a business plan.

My theory is that this latest announcement is really about Google.

I am a happy user of Google products. Even though Google Docs are less capable than their MSFT counterparts their freeness makes up for that.

Now that my storage requirements are starting to push me to pay for a larger storage plan, I am shopping for an alternative.

And MSFT new pricing for storage is going to make me look at MSFT for the first time in ages.

My theory is that MSFT is figuring that there are enough people like me who when confronted with the choice may buy the MSFT plan because it’s cheaper. This may help slow down the growth of Google Docs.

DropBox may be collateral damage.

Dropbox will start showing ads Real Soon Now

October 27, 2014 by kostadis roussos Leave a Comment

With MSFT dropping the marginal price of storage for consumers to 0, what does this mean for Dropbox?

The old business model that was really awesome was that each user represented a permanent annuity. As each user consumed more storage, over time, the user paid more. And as the user consumed more, the ability to move the users data declined. And with features like photo-sharing and data sharing, the ability to move data became harder and harder and harder.

Although I am sure that DropBox assumed the cost per GB would drop over time, the assumption in the plan has to be that it never got to 0 and it always increased as people stored more.

This was a sound business model until or unless the annuity goes away.

And that is what MSFT just did. They eliminated the annuity business. I am sure that DropBox will resist. But here is what will happen: as people who start pushing into the higher and higher tiers of cost start looking at their bills, the desire to move to cheaper solutions will outweigh the inconvenience. They will either move all of their data or start moving parts of their data to newer cheaper solutions.

The net effect is that with a cost of 0 dollars, it makes a lot of sense to use the free DropBox offering and then when you have to pay go to MSFT for any excess data.

Now Dropbox has to come up with a new plan. Their annuity strategy is crippled.

And the new plan may be advertising. DropBox was a storage company that offered file sharing on the cloud. Now they are a content repository with some nifty content management and content sharing tools for consumers. Companies that provide tools for consumers that can not grow their revenue as an annuity will turn to trying to monetize their customers more efficiently. And with all of that user data, the temptation to use it to advertise will be great.

Gmail made it okay to have your email automatically scanned for advertising, – i wish I could have seen the ads on General Petraeus account, you have to believe DropBox customers will be okay with this as well…

And now unlimited – MSFT lays down the gauntlet

October 27, 2014 by kostadis roussos 4 Comments

Microsoft just announced that they are offering unlimited OneDrive storage for ~7$ a ~~year~~ month along with Office 360.

Monetization of capacity in the storage industry is very hard. The storage industry was able to do that because of the cabling limitations of controllers and disk drives. Eventually you needed to buy a new controller because you could add no more disk drives to a controller. In the cloud the consumer never has to buy another controller, so the requirement to buy stuff to increase capacity never happens.

The fact that capacity is now going to be free – you’re paying for Office360 is showing that to be true.

After all the cost per GB is 0.03, the ability that you can charge 3x the cost of the media (30$ vs 100$) for a terabyte is unsustainable.

First movers in this space offered the novelty of cloud capacity, now that the capability has been commoditized, the end game for vendors in this space is going to be – interesting.

Other vendors will have to react to this change. How is going to be very unclear.

Google will quickly follow with a similar offer. I expect Box to be forced to do the same since they are competing with Google and MSFT for the same customers. DropBox will fight the hardest to avoid doing this but they too will eventually collapse.

Edit: Fixed the pricing to be monthly instead of yearly.

23 architecturalist papers: latency kills

Like this:

Open Facebook API or what to do about Facebook

Like this:

The completely misunderstood IOPS

Like this:

Inside Onedrive

Like this:

Is this really about Google Docs?

Like this:

Dropbox will start showing ads Real Soon Now

Like this:

And now unlimited – MSFT lays down the gauntlet

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: