Software – Page 3

How do box vendors get disrupted?

October 22, 2014 by kostadis roussos

One of the more interesting questions confronting anyone who works at a box company, like I do, is: what causes a vendor to get disrupted?

There are a lot of business and technical reasons, covered in a wide variety of sources…

My pet theory is the following:

A box vendor produces a box that has to be deployed as a box because of what it does. For example, to switch you need a box that can sit between three physical cables and make decisions about where to forward the packets.

Deploying a box is a pain in the ass. Replacing a box is hard.

And the problem is that once you, as a customer, deploy a box, you realize that you need the box to do more stuff.

And the vendor starts adding software features into the box to meet that need.

And at some point in time, the box vendor believes that the value is the software and not the box. And they are partly right, except that the only reason the customer is buying the software from the box vendor is because they must buy the box.

And the box over time becomes an increasingly complex software system that can do more and more and more and more.

And software engineers hate complexity. And where there is complexity there is opportunity to build something simpler. And competition tries to break into the market by making a simpler box.

The problem with the simpler box is that if the set of the things a customer needs to do is A, and you can do A/2 – you’re simpler and incomplete. Inevitably you will become as complex as the original box.

What causes the disruption is when the customer no longer needs to deploy the box.

To pick an example that I can talk about, many vendors in the storage industry used spinning rust disk drives to store data. When customers decided that they no longer wanted to use spinning rust to store data, vendors like Nimble and Pure started to win in the market because they stored data in flash.

Nimble and Pure certainly didn’t have the feature set of their competitors; how could they? They won deals because the customer’s decision criterion wasn’t software. It was the desire to store data on a different kind of physical media: flash. That desire, coupled with a simpler box, made it possible for Nimble and Pure to win in the marketplace.

To put it differently, Pure may, for all I know, have A/5 of the features of the competition, but if the first-order decision is that you want to store data on flash in an external array, then that is irrelevant because you’re not comparing Pure to a spinning rust array, but Pure to another flash array. And there Pure has an advantage.

The networking industry has stubbornly resisted disruption for years. And part of the reason is that the physical box hasn’t really changed over time. Parts of the industry have changed, but overall the same leaders are still winning.

However, there is a possibility of a disruption in the networking industry, in particular, in the modern cloud data center.

The reason is that, for the first time in a long time, the fundamental network stack may be rewired in a genuinely new way.

In an earlier post, I discussed the Network Database. In a traditional network, every network element has to be a full-fledged participant in the Network Database.

And like traditional applications that have to interact with a database to do anything interesting, network services must also interact with the Network Database to do anything interesting.

And it turns out that building an application that uses the Network Database is hard, unless your application fits into that model and … well … runs on the network element.

Companies like to whine that network vendors are slow. Maybe they are, or maybe the problem they are trying to solve, in the way they are trying to solve it, is just hard and takes time. Having worked with folks in this industry, I am convinced of the hardness thesis rather than the laziness thesis.

SDN has the potential to disrupt the model of software applications being built as distributed services running on multiple network elements, for one reason: it actually makes building network applications easier because it aligns with how the vast majority of programmers think. Building applications out of distributed protocols is hard. Building applications on a centralized database is easy. Some claim that you will need multiple databases to scale, but it turns out that is easy too; after all, that’s what the web guys have been doing for years.
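To make the contrast concrete, here is a minimal sketch of a network application written as an ordinary query against one central table. Everything in it is an assumption for illustration: `NetworkDatabase`, `add_link` and `reachable` are made-up names, not modeled on any real SDN controller API.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>

// Hypothetical centralized "Network Database": one table recording which
// network elements are directly linked. Not any real controller's API.
struct NetworkDatabase {
    std::map<std::string, std::set<std::string>> links;

    // Record a bidirectional link between two network elements.
    void add_link(const std::string& a, const std::string& b) {
        links[a].insert(b);
        links[b].insert(a);
    }
};

// A "network application" written as a plain query: breadth-first search
// over the central table to decide whether dst is reachable from src.
inline bool reachable(const NetworkDatabase& db, const std::string& src,
                      const std::string& dst) {
    std::set<std::string> seen{src};
    std::queue<std::string> frontier;
    frontier.push(src);
    while (!frontier.empty()) {
        std::string node = frontier.front();
        frontier.pop();
        if (node == dst) return true;
        auto it = db.links.find(node);
        if (it == db.links.end()) continue;  // node has no recorded links
        for (const std::string& next : it->second) {
            // insert() reports whether the node was newly seen
            if (seen.insert(next).second) frontier.push(next);
        }
    }
    return false;
}
```

The point of the sketch is only that the application logic is a dozen lines of ordinary code once the state lives in one place; the distributed-protocol equivalent would have every element exchanging messages to reach the same answer.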

And that creates an interesting disruption in the network stack, one that is different from flash and disk drives but potentially as massive.

The value of the software stack that the traditional vendors have built over time begins to diminish as more services get built using a different model. One argument is that it will take time for the new services to be as complete as the old model. And that is true. If you believe, however, that the new programming model is more efficient and expands the pool of programmers by a step function, then the gap may be closed significantly faster.

Having said all of that, I am reminded of a saying:

Avec des si et des mais, on mettrait Paris dans une bouteille. (With ifs and buts, one could put Paris in a bottle.)

The network box vendors are making their strategic play as well. The industry will change, and we will most likely still see the same players on top ….

Lambda C++

September 27, 2014 by kostadis roussos

After a long gap between large C++ systems, I have been catching up on the language. Feels like meeting a high-school friend who you didn’t friend on Facebook.

One of the things that got me to realize that this was not my childhood’s C++ was the existence of lambdas.

At first, I was like: EWWWW… First we had Java envy and now we have Scala envy… does anything ever change?

Except now that I am starting to dig into this little feature, the fact that you can write this piece of code is wicked convenient:

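#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v;
    v.push_back( 1 );
    v.push_back( 2 );
    //...
    // Apply the lambda to every element, printing each value.
    for_each( v.begin(), v.end(), [] (int val)
    {
        cout << val;
    } );
}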

My personal frustration with using STL may be finally overcome…
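Lambdas get more useful once they capture local variables and plug into other STL algorithms. A minimal sketch, assuming C++11 or later; the helper names `count_above` and `sorted_descending` are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements of v strictly greater than
// threshold. The lambda captures threshold by value via [threshold].
inline std::ptrdiff_t count_above(const std::vector<int>& v, int threshold) {
    return std::count_if(v.begin(), v.end(),
                         [threshold](int val) { return val > threshold; });
}

// Hypothetical helper: returns a copy of v sorted from largest to smallest.
// The lambda stands in for a hand-written comparison functor.
inline std::vector<int> sorted_descending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
    return v;
}
```

Before lambdas, each of these would have needed a separate functor class or a free function defined away from the call site, which is a good part of what made the STL algorithms feel clumsy.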

Debugging an archaeological find

September 20, 2014 by kostadis roussos

When confronted with a bug in a piece of software whose authors are lost in the mists of time, and whose internal workings are opaque and mysterious, debugging can be a challenge.

But that’s why we became engineers: we like challenges.

The first problem is to understand what the nature of the bug is. Typically you get some crash that has a signature that suggests that something went wrong in the machinery of the ancients.

Our first reaction, because we are human, is to cry out:

That’s not true. That’s impossible!

How can the machinery of the ancients be broken? It never breaks!

The first challenge is to understand the nature of the breakage. Sitting in a sea of memory with no clue what is going on… you have to begin the process of sifting through the code and the memory to understand what exactly has happened. Not what the bug is, but the sequence of events that produced the crash.

The goal is to create a hypothesis that explains how the crash occurred, not the why, but the how.

The strategy has two interlinked parts. The first part is to start reading code and analyzing core files, and looking to see if similar bugs got reported in the past and got swept under the rug. The second part is to desperately and frantically try to reproduce the bug.

Essentially what you are trying to do is gather experimental evidence to guide the analysis of the software and the ancient bug reports.

Once you have figured out how the bug occurred (e.g., memory got corrupted, which caused this set of instructions to execute), the next step is to begin the process of working out why.

This turns out to be both trickier and easier. Easier because you now know how the crash happens. Trickier because you now need to understand an increasingly larger scope of the system …

In the case of corruptions there are at least two possibilities: structural or wild.

A structural corruption is caused by the code that manipulates the data structure … This is easy because the problem is localised. Tricky because as an archaeologist you may need to go several layers away from where the crash occurred, requiring more analysis to follow the code all the way to the source of the flaw.

Put less elegantly: there is a core data structure that is busted, and there are other data structures related to it, either as inputs or as outputs. Seeing how those dependent data structures look compared to the corrupted one can guide your investigation. You are looking for the source of the corruption, and the state of the input and output data structures can tell you where to start looking. Sometimes there are no input and output data structures, just lots of dependent ones, but the principle holds.
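One way to narrow down a structural corruption is a small checker that walks the structure and reports the first place an invariant fails. The sketch below assumes a doubly linked list purely for illustration; `Node` and `first_broken_link` are hypothetical names, and a real investigation would encode whatever invariants the ancient data structure actually has.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node of a doubly linked list under investigation.
struct Node {
    Node* prev = nullptr;
    Node* next = nullptr;
};

// Walks the list from head and returns the index of the first node whose
// successor does not point back at it, or -1 if every visited link is
// consistent. max_nodes bounds the walk so that a corrupted cycle cannot
// make the checker loop forever.
inline long first_broken_link(const Node* head, std::size_t max_nodes) {
    long index = 0;
    for (const Node* n = head; n != nullptr && max_nodes-- > 0;
         n = n->next, ++index) {
        if (n->next != nullptr && n->next->prev != n) return index;
    }
    return -1;
}
```

Knowing the index where the structure first goes wrong tells you which insertions or removals to read first, rather than staring at the crash site.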

In this case you want the testing to get narrower and narrower to find the bug faster and in a more focused way. As you get a better understanding of the code and the processes and the internal data structures, the testing goes from using a thermometer to using an MRI…

Wild corruptions are caused by two unrelated pieces of code causing each other harm. Some random piece of code is causing the ancients’ code to get fubar’ed. And if the code is vast and large, understanding where that can happen can be hard but not impossible. To attack that problem, a useful approach is a brute-force attack on the code to see what combination of features, executing in parallel or in isolation, can cause the bug. Your goal is to find the places that are running and narrow them down to the one piece of code that is doing the wrong thing. This is why reproduction of the bug remains the single most important process in debugging archaeology.
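The brute-force idea can be sketched as a loop over every combination of features. This is an illustration under assumptions, not a real test harness: `failing_combinations` is a made-up name, each feature is one bit in a mask, and `reproduces_bug` is a hypothetical callback that would run the system with those features enabled and report whether the crash showed up.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Tries every non-empty combination of n_features features, encoded as a
// bitmask, in increasing mask order. Returns the masks for which the
// hypothetical reproduces_bug callback reports a failure.
inline std::vector<unsigned> failing_combinations(
    unsigned n_features, const std::function<bool(unsigned)>& reproduces_bug) {
    assert(n_features < 32);  // keeps the shift below well-defined
    std::vector<unsigned> failing;
    for (unsigned mask = 1; mask < (1u << n_features); ++mask) {
        if (reproduces_bug(mask)) failing.push_back(mask);
    }
    return failing;
}
```

The number of combinations doubles with every feature, so this only works for a handful of suspects at a time; the bits common to every failing mask point at the features worth reading first.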

The nice outcome of this process is that in the end the understanding of the mysterious ancient technology is revealed. And with that comes a moment of personal satisfaction. You are now one with the ancients.

And then a desire to rewrite their code overcomes you… and then you too become a new ancient one….
