kostadis roussos

The real story behind net-neutrality?

September 22, 2014 by kostadis roussos

There is a lot of dialogue about net neutrality, but what isn’t really discussed is this problem:

Jeff Reading, a communications director for Mayor Ed Murray, told MyNorthwest.com that the city wants people to limit their “non-essential mobile conversation” so that cell networks can stay unclogged in case of emergencies.

Basically, the mobile internet was not built to handle video and the huge volume of images that we’re creating.

Everyone involved in creating rich media applications knows this, or should…

There is a debate raging over how we pay for the necessary infrastructure upgrade.

This is a classic IT vs. business group debate with no CEO in charge to make a decision. In a traditional company, the business team wants something out of IT, and IT is happy to provide it as long as the business group provides the budget… These debates go on for months until someone finds another way to solve the problem or someone caves.

In a more socialist country, the state would just foot the bill and we would have better pipes. In our less socialist country, corporate entities argue amongst themselves, and appointed officials make decisions based on interpretations of the law that result in further litigation that ultimately results in someone paying the bill…

The basic problem is the following:

The content management and content distribution companies want to force IT to pay the bill out of IT’s own budget.
IT wants to get a piece of the action and wants the content management businesses to pay more for the infrastructure build-out.

And billions of dollars of wealth for the owners and senior managers of those companies are at stake. This will take a while to resolve. The infrastructure build-out to support the new network is going to be colossal. We are going to need new handsets, towers, gateways, and backbones, and all of that is going to be very expensive to replace and upgrade.

The one company taking an orthogonal point of view in this debate is Google. Google is basically telling IT: Screw you… if you won’t build it, we will… And so we have Google Fiber. Google’s action may force traditional IT to pay for the upgrade through internal operational optimization rather than new sources of revenue… which proves, to me at least, that Google is one of the most interesting companies on the planet.

Was a technology-conscious prick

September 20, 2014 by kostadis roussos

http://www.theonion.com/articles/iphone-6-plus-vs-samsung-galaxy-s5,36969/

Now planning to become an appearance-conscious asshole.

Debugging an archaeological find

September 20, 2014 by kostadis roussos

When confronted with a bug in a piece of software whose authors are lost in the mists of time, and whose internal workings are opaque and mysterious, debugging can be a challenge.

But that’s why we became engineers: we like challenges.

The first problem is to understand what the nature of the bug is. Typically you get some crash that has a signature that suggests that something went wrong in the machinery of the ancients.

Our first reaction, because we are human, is to cry out:

That’s not true. That’s impossible!

How can the machinery of the ancients be broken? It never breaks!

The first challenge is to understand the nature of the breakage. Sitting in a sea of memory with no clue what is going on… you have to begin sifting through the code and the memory to understand exactly what has happened: not what the bug is, but the sequence of events that produced the crash.

The goal is to create a hypothesis that explains how the crash occurred, not the why, but the how.

The strategy has two interlinked parts. The first is to start reading code, analyzing core files, and looking to see whether similar bugs were reported in the past and swept under the rug. The second is to desperately and frantically try to reproduce the bug.

Essentially what you are trying to do is gather experimental evidence to guide the analysis of the software and the ancient bug reports.

Once you have figured out how the bug occurred (e.g., memory got corrupted, and that caused this set of instructions to execute), the next step is to figure out why.

This turns out to be both trickier and easier. Easier because you now know how the crash happens. Trickier because you now need to understand an increasingly large part of the system…

In the case of corruptions, there are at least two possibilities: structural or wild.

A structural corruption is caused by the code that manipulates the data structure… This is easier because the problem is localized. It is trickier because, as an archaeologist, you may need to go several layers away from where the crash occurred, which requires more analysis to follow the code all the way to the source of the flaw.

Put less elegantly: there is a core data structure that is busted, and there are other data structures related to it, either as inputs or as outputs. Seeing how the dependent data structures look compared to the corrupted one can guide your investigation. You are looking for the source of the corruption, and the state of the input and output data structures can tell you where to start looking. Sometimes there are no input and output data structures, just lots of dependent ones, but the principle holds.

In this case, you want the testing to get narrower and narrower so that you find the bug faster and in a more focused way. As you gain a better understanding of the code, the processes, and the internal data structures, the testing goes from using a thermometer to using an MRI…
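As a minimal sketch of cross-checking a corrupted structure against its dependents, assume a hypothetical structure: a doubly linked list that is also indexed by a dict from key to node. The checker walks the primary structure and reports every place where it disagrees with its own back links or with the dependent index; the first disagreement hints at which code to read first. `Node`, `IndexedList`, and `check_consistency` are illustrative names, not from any real system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    key: str
    # repr=False avoids infinite recursion when printing prev <-> next cycles.
    prev: Optional["Node"] = field(default=None, repr=False)
    next: Optional["Node"] = field(default=None, repr=False)


@dataclass
class IndexedList:
    """Hypothetical structure: a doubly linked list plus a dependent key -> node index."""
    head: Optional[Node] = None
    index: Dict[str, Node] = field(default_factory=dict)

    def append(self, key: str) -> Node:
        node = Node(key)
        if self.head is None:
            self.head = node
        else:
            tail = self.head
            while tail.next is not None:
                tail = tail.next
            tail.next = node
            node.prev = tail
        self.index[key] = node
        return node


def check_consistency(lst: IndexedList) -> List[str]:
    """Walk the list and report where it disagrees with itself or with the index."""
    problems: List[str] = []
    seen = set()
    prev = None
    node = lst.head
    while node is not None:
        if id(node) in seen:
            problems.append(f"cycle detected at {node.key!r}")
            break
        seen.add(id(node))
        if node.prev is not prev:
            problems.append(f"bad prev link at {node.key!r}")
        if lst.index.get(node.key) is not node:
            problems.append(f"index disagrees with list at {node.key!r}")
        prev = node
        node = node.next
    # Index entries the primary structure can no longer reach.
    for key, indexed in lst.index.items():
        if id(indexed) not in seen:
            problems.append(f"index entry {key!r} not reachable from head")
    return problems
```

If something clobbers `b.next` in a list `a -> b -> c`, the walk itself looks clean, but the dependent index still remembers `c`, which points you at whatever code touches the link between `b` and `c`.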

Wild corruptions are caused by two unrelated pieces of code harming each other: some random piece of code is causing the ancients’ code to get fubar’ed. If the code base is vast, understanding where that can happen is hard but not impossible. To attack that problem, a useful approach is a brute-force attack on the code: see what combination of features, executing in parallel or in isolation, can cause the bug. Your goal is to find the code that is running and narrow it down to the one piece of code that is doing the wrong thing. This is why reproducing the bug remains the single most important process in debugging archaeology.
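A minimal sketch of the brute-force attack, assuming you can wrap "run the system with exactly these features active and report whether the crash happened" as a predicate. The `reproduces` harness and the feature names are hypothetical. Trying the smallest combinations first means the first hit is also a minimal reproduction.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


def find_minimal_trigger(
    features: Iterable[str],
    reproduces: Callable[[FrozenSet[str]], bool],
    max_size: Optional[int] = None,
) -> Optional[FrozenSet[str]]:
    """Try feature combinations smallest-first; return the first that reproduces the bug.

    `reproduces` stands in for whatever harness runs the system with exactly
    those features active (in parallel or alone) and reports whether it crashed.
    """
    pool = sorted(set(features))
    limit = len(pool) if max_size is None else min(max_size, len(pool))
    for size in range(1, limit + 1):
        for combo in combinations(pool, size):
            candidate = frozenset(combo)
            if reproduces(candidate):
                return candidate
    return None  # nothing up to max_size reproduced the bug
```

The number of combinations grows exponentially, so in practice you would cap `max_size` and rerun each combination several times to catch flaky reproductions; the sketch leaves both concerns to the harness.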

The nice outcome of this process is that, in the end, the workings of the mysterious ancient technology are revealed. And with that comes a moment of personal satisfaction. You are now one with the ancients.

And then a desire to rewrite their code overcomes you… and then you too become a new ancient one…

Software Archaeologists

September 17, 2014 by kostadis roussos

Work at a large company with a massive code base that has evolved over many years, and you eventually have to engage in Software Archaeology.

Software archaeology is the process of trying to understand software systems that are poorly documented, poorly understood, or not understood at all, yet super-critical to your system.

Imagine you have some module that you are dependent on that has worked for years, and then a bug is uncovered.

You have to go and learn what exactly the module does. By the time you look at the code, the original authors and their children have all left the company, or, even if they haven’t, it’s been years since they worked on the code… Sometimes it’s been years since they looked at code, period; they are now managers or directors or vice-presidents…

At times you can feel like you’re one of those modern explorers violating some tribal rules by exploring in areas that are forbidden. And if you have to modify the code, you wonder if you are like Indiana Jones about to remove the gold idol…

The problem isn’t understanding the structure of the code, software is software. The problem is understanding the intent of the code: understanding where choices were made that were well reasoned, and where choices were made that were expedient. And the problem isn’t even the software that you have to inspect, but that it’s part of a broader sea of software that the ancients wrote that is equally opaque and mysterious.

And the real problem is that static analysis is fine, but what you really need to understand is how the running system behaves: how it uses memory, how it uses the CPU, how the data structures grow and shrink, what the heartbeat of the system is…

At SGI in the late ’90s, I did some compiler research as a master’s student to try to address this specific problem. In particular, how do you infer things like locking hierarchies in a multi-million-line code base when the original authors of the code have since left? Just reading the code is insufficient. Knowing that locks are taken is important, but you also need to understand contention, how frequently locks are taken, and the interplay between systems that are so widely split apart as to be mysterious…

When I tried to solve the problem, I looked at using compilers to insert code everywhere something looked like a lock, and then using the testing infrastructure to find the locks and their hierarchies…

And then I discovered that the code of the ancients, because it had always worked, had no tests.
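The SGI work used a compiler to insert instrumentation at lock sites. As a much simpler, runtime-only stand-in (not that approach), here is a Python sketch that wraps `threading.Lock`, records "lock A was held when lock B was acquired" edges per thread, and then looks for a cycle in the resulting graph, since a cycle means the code has no consistent lock hierarchy. `LockOrderRecorder` and `TracedLock` are hypothetical names.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Optional, Set


class LockOrderRecorder:
    """Records 'lock A was held while lock B was acquired' edges from every thread."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)
        self._guard = threading.Lock()   # protects self.edges
        self._local = threading.local()  # per-thread stack of held lock names

    def _held(self) -> List[str]:
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def on_acquire(self, name: str) -> None:
        stack = self._held()
        with self._guard:
            for outer in stack:
                self.edges[outer].add(name)
        stack.append(name)

    def on_release(self, name: str) -> None:
        self._held().remove(name)

    def find_cycle(self) -> Optional[List[str]]:
        """Return one ordering cycle such as ['a', 'b', 'a'], or None if the order is consistent."""
        with self._guard:
            graph = {src: sorted(dsts) for src, dsts in self.edges.items()}
        visiting: Set[str] = set()
        done: Set[str] = set()
        path: List[str] = []

        def dfs(node: str) -> Optional[List[str]]:
            visiting.add(node)
            path.append(node)
            for nxt in graph.get(node, []):
                if nxt in visiting:
                    return path[path.index(nxt):] + [nxt]
                if nxt not in done:
                    found = dfs(nxt)
                    if found:
                        return found
            visiting.remove(node)
            path.pop()
            done.add(node)
            return None

        for start in sorted(graph):
            if start not in done:
                found = dfs(start)
                if found:
                    return found
        return None


class TracedLock:
    """Wraps threading.Lock and reports acquisitions and releases to a recorder."""

    def __init__(self, name: str, recorder: LockOrderRecorder) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._recorder = recorder

    def __enter__(self) -> "TracedLock":
        self._lock.acquire()
        self._recorder.on_acquire(self.name)
        return self

    def __exit__(self, *exc_info) -> None:
        self._recorder.on_release(self.name)
        self._lock.release()
```

For example, if one code path takes `inode` then `cache` and another takes `cache` then `inode`, `find_cycle()` returns `['cache', 'inode', 'cache']` even if the two paths never actually deadlocked during the test run. The catch is the one the post runs into: the graph is only as complete as the tests that exercise the code.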

Doing your run-time analysis involves figuring out how to test things that always worked. And then you uncover not just one bug, but hundreds… Or are your tests broken because you don’t understand the code? And you wonder, as you play with this mysterious gadget: are you about to destroy the world, or save it? How many of these bugs are real? And how many of them are real, but other pieces of software have worked around them, making the whole system work?

And what will be the blow-back of fixing them…

And while you sit there with the svn commit about to change the structure of the code of the ancients, you wonder if your hubris is about to bite you in the ass… How could code that has worked for so long be broken?

And your management team looks at you the way those tribal leaders looked at the explorers: with suspicion, doubt, and fear. And you can hear them telling the village youth to arm themselves and kill the interloper before he causes too much harm… Or are they like the hot woman or man begging the evil villain not to destroy the world, or asking the hero:

Are you sure this is going to work?

No fear, the ancients were humans, just like you, you tell them… And that bag of sand is about the same weight as the idol…

Run Indy! Run!

The iPhone did change everything: revisiting my predictions from 7 years ago

September 12, 2014 by kostadis roussos

7 years ago, I wrote a post questioning the Apple fanboys’ claims that the iPhone changed everything.

I got some things very right. And I, obviously, got some things very wrong.

My assumption that, unlike with the iPod, Apple would not rule the cellphone market turned out to be correct. My other assumption, that the iPhone would push the design of phones the way the original Mac did, also turned out to be right.

My original assumption that MS and the cellphone vendors would create a viable alternative ecosystem that would own a much broader chunk than Apple turned out to be partially correct. Instead of MS, the real winner was Google, which released Android.

Let’s look at what I said and got right and wrong.

The iPhone was crucial to Apple. And yes it was.
The iPhone was going to push technology trends. Oh boy, did it ever.
That it was going to be a marginal player. Oh boy was I wrong! WRONG. The share of profits is staggering.
That integration with laptops was important. WRONG! WRONG! WRONG! Obviously I had no idea how much more important integration with cloud services was going to be.
That Microsoft was going to be the bigger threat. OOOPS! Google wasn’t even mentioned! Of course, I wrote the post before Android shipped so that’s my defense. I could not imagine that it would take MS so frigging long to build a credible OS for the phone. And I suspect that Nadella will end that experiment soon.
That the mobile phone providers were going to compete and push Apple into a niche. Right, but I guessed wrong about who would do that. Samsung did. Nokia made the strategically boneheaded decision to go with Windows Phone instead of Android. A bunch of other players did some good stuff.
The laptop market that I thought was important turns out to be less important than I ever could have imagined for Apple.
And of course, I completely misunderstood the app economy.

Seven years later, it’s fun to point out how wrong everyone else got it; it’s even more fun to see how wrong YOU got it 😉

Grand Moff Tarkin Didn’t Want to Pay for Defense in Depth

September 8, 2014 by kostadis roussos

In Episode IV of Star Wars, Han Solo, Luke Skywalker, Obi-Wan, and Chewbacca are trapped on the Death Star after their jump from hyperspace.

The Storm Troopers are quickly overwhelmed, and our heroes can access a physical terminal. R2D2 is then able to plug into the computer systems of the Death Star.

R2D2 can quickly access all of the information, including schematics and prisoner information.

In the post-mortem on Coruscant, I can imagine the dialogue:

Palpatine: How were they able to access all of the information?

CISO for the fleet: Well, the Grand Moff decided that adding firewalls and security systems to partition the network was too costly. He chose to rely on a big-ass external firewall. His priority was his teams’ ability to access the information, not protecting it.

Palpatine: A single droid was able to quickly and trivially get all of our operational information … Because we had no firewall?

CISO for the fleet: (visibly sweating) Well, it’s more complicated than that. A firewall would have delayed the attack, or at the very least made it harder, but nothing could have protected us against a determined attack.

Palpatine: A single bot that was put behind our firewall was able to get everything…

CISO for the fleet: Grand Moff Tarkin felt that it was impossible for a bot to escape the station or communicate externally…

Palpatine: Grand Moff is dead?

CISO for the fleet: Yes, Grand Moff is dead.

Palpatine: Pity. At least we won’t need to replace the commander of our space station. I suppose we’ll need a new CISO for the fleet.

Blue lightning crackles from the Emperor’s hand. The CISO for the fleet crumbles. His second in command steps forward…

New CISO for the fleet: Emperor, we’ll re-organize our security protocols immediately.

At Zynga, our security team was, actually, ahead of the curve. Our strategy was not to rely just on a hard shell. We also created internal segmentation of our systems: basically, we created firewalls around each of our games and each of our systems. This kind of internal segmentation was a layer of protection that I thought was standard practice. More honestly, I thought this kind of protection was unnecessary. The recent disasters show that it is not. Too many people rely on a single external hard shell… unfortunately, once you get through the hard shell, everything is available.
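To make the hard-shell versus internal-segmentation difference concrete, here is a toy model; it is not Zynga’s actual setup, which the post doesn’t describe. Each host lives in a named segment, and traffic is allowed only within a segment or along an explicit (source segment, destination segment) rule. The "blast radius" is everything a compromised host can eventually reach. Host and segment names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Network:
    segment_of: Dict[str, str]  # host name -> segment name
    allowed: Set[Tuple[str, str]] = field(default_factory=set)  # permitted (src, dst) segment pairs
    segmented: bool = True  # False models "one big external firewall only"

    def can_reach(self, src: str, dst: str) -> bool:
        """Is traffic from host src to host dst permitted once src is inside the perimeter?"""
        if not self.segmented:
            return True  # behind the hard shell, everything trusts everything
        a, b = self.segment_of[src], self.segment_of[dst]
        return a == b or (a, b) in self.allowed

    def blast_radius(self, compromised: str) -> Set[str]:
        """All hosts reachable from a compromised host, following permitted hops transitively."""
        reached = {compromised}
        frontier = [compromised]
        while frontier:
            host = frontier.pop()
            for other in self.segment_of:
                if other not in reached and self.can_reach(host, other):
                    reached.add(other)
                    frontier.append(other)
        return reached
```

With a droid plugged into a hangar terminal, the perimeter-only model hands over the schematics and the detention records; the segmented model confines it to the hangar unless a rule explicitly lets hangar traffic through.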

This kind of internal segmentation is not as yet standard practice across the industry, nor was it standard in a galaxy far, far away…

And in all cases, the results were not that pretty…

Market signals and the shortage of CS majors

August 11, 2014 by kostadis roussos

One of the ongoing themes in the tech industry is the shortage of qualified engineers and what can be done about it.

Having lived through one bubble, I thought I might share this picture:

This image was taken from a talk delivered by Ed Lazowska at the NCWIT 10th anniversary Summit in 2014. The notes are mine.

The general theme of the talk was what to do about the increase in computer science students, and whether this increase was a one-time event or part of broader secular growth.

What was of interest to me was the sudden increase in computer science graduates coinciding with a sudden perceived increase in the financial outcomes of CS students.

Very smart people have a lot of options. And, all things being equal, people will choose things that have a higher payout. If the payout is significantly larger than that of other options, they will even pick a sub-optimal option to get the money.

Philip Greenspun has a really good post about this. The money quote is the following:

A good career is one that pays well, in which you have a broad choice of full-time and part-time jobs, in which there is some sort of barrier to entry so that you won’t have to compete with a lot of other applicants, in which there are good jobs in every part of the country and internationally, and in which you can enjoy job security in middle age and not be driven out by young people willing to work 100 hours per week.

This is how people actually make professional decisions. And as long as the tech industry compares unfavorably to other career choices there will be a smaller number of people in tech than other fields. And that shortage is not just about exposure to CS, it is also about financial outcomes. And yes, articles about ageism don’t help.

Update: A friend of mine made another interesting observation about the cyclicality of software. Given that stability is important, the cyclical nature of software also pushes people away from this industry: people who are unwilling to tolerate a risk profile that might include protracted periods of unemployment or work at substantially lower pay. Combine salaries with job uncertainty and ageism, and it’s a frigging miracle that anyone is in this industry… Of course, if you read the rest of Greenspun’s article about why men predominate in science, you realize that there is an analogous argument to be made about the computer industry. Hmm….

As a personal anecdote, in 1994 a very famous professor of computer science told everyone in the room who was studying for a CS degree at Brown that we were all doomed: our jobs were going to go to India. Unsurprisingly, the total number of CS graduates that year was 13.

By 2001 the total number of graduates at my school had gone over 100 (in a class size of 2001) a 10x increase.

The dot-com bubble had made CS and the web the way to make money.

And then it collapsed in 2004.

I always wondered if that was a Brown University phenomenon. Turns out it is an industry wide phenomenon.

And … here we go again… As the payout increases, the number of CS majors increases…

My one – obviously self-serving – observation is that if tech companies really are experiencing a shortage of employees, they do have a strategic option open to them – dramatically increase the salaries of folks in tech to compete with other industries like finance.

Storage Arbitrage

July 22, 2014 by kostadis roussos

Have been looking at options for cloud storage at the 1TB capacity limit.

Google offers 1TB for $9.99 a month.

MSFT offers 1TB for $69 a year (not a month, oops!) with MS Office apps, and 1TB per user for 5 users for $100.

Dropbox offers 500GB for $499 a year (not $99, oops!).
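Normalizing the quoted prices to dollars per TB per year makes the arbitrage easier to see. The prices are the ones quoted above; treating the 5-user Microsoft price of $100 as annual is an assumption, since the post doesn’t state the billing period.

```python
# Prices as quoted in the post (2014).
# ASSUMPTION: the 5-user Microsoft price is annual, like the single-user one.
offers = {
    # name: (price in USD, billing period in months, total capacity in TB)
    "Google 1TB": (9.99, 1, 1.0),
    "MSFT 1 user": (69.00, 12, 1.0),
    "MSFT 5 users": (100.00, 12, 5.0),
    "Dropbox 500GB": (499.00, 12, 0.5),
}


def usd_per_tb_year(price: float, months: int, tb: float) -> float:
    """Annualize the price, then divide by total capacity."""
    return price * (12 / months) / tb


# Cheapest per TB first.
for name, terms in sorted(offers.items(), key=lambda kv: usd_per_tb_year(*kv[1])):
    print(f"{name:<14} ${usd_per_tb_year(*terms):8.2f} per TB per year")
```

Under that assumption the figures come out to $20 (MSFT, 5 users), $69 (MSFT, 1 user), $119.88 (Google), and $998 (Dropbox) per TB per year, a spread of roughly 50x, although the 5TB in the Microsoft bundle is split across five separate 1TB accounts.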

Why I hate the Great Filter known as Code Reviews

July 20, 2014 by kostadis roussos

In the wrong environment, code reviews have enabled engineers to relieve themselves of accountability for writing good code, turned technology leaders into powerless whiners, and enabled managers to ship crappy products on time while remaining blameless for the crap.

Much like the Great Filter that supposedly eliminates civilizations, code reviews are supposed to eliminate bad code, and in the process they eliminate good code as well. The net effect is that the code reviewer is reviewing crap all day. And as a technologist, that is very, very annoying.

Let’s step back.

A lot of management culture in many large organizations is focused on making sure the lazy ass employees do their jobs correctly. The theory being that at scale, the average employee is … average.

Given the average nature of the average employee, how do you make sure that the quality doesn’t degrade below average?

Hold the managers accountable for the quality of their work.

In tech companies, a key part of the work of engineers is writing code.

The approach some companies take is to have the managers accountable for the quality of the code.

The problem is that managers are also accountable for shipping on time. And the pressure to ship on time creates an unbearable pressure on the managers to create a culture that puts the date over code quality.

The solution is to create a separate set of leaders who are responsible for technology quality, people like me who have titles like Distinguished Engineer, Architect, etc.

The theory being that the tension between the technology leader and the manager will result in solutions that meet the business requirements for both date and quality.

An unfortunate outcome of this process is that the technology leaders are then tasked with ensuring that the code quality is good enough because that is their job. And the process that is used to ensure that code is good enough is the code review.

The problem, and this drives me nuts, is that in the wrong hands the accountability for the quality of the code lands on the reviewer, not the author.

A big bug escapes and the first question isn’t:

Who wrote the code

but

Who reviewed it?

Who tested it?

This creates a perception that the author of the code isn’t actually accountable for the bug. The author is, even though they wrote the code, blameless.

This makes the reviewer of the technology the organization’s bitch. On the one hand, you’re accountable for the quality of the technology; on the other hand, no one reports to you, and managers decide bonuses.

Guess what loses?

Yup, the quality of the technology…

How do you fix this mess?

First, you need to change the culture around what a code review is for, and you need to change the relationship between the manager and the technologist.

The code review has to be not about finding bugs but about improving the overall quality of the product being produced. The way I like to describe it: everyone is supposed to be doing their best work, and reviews exist to make great work better, not to filter out bad work.

The manager and the technologist have to both be responsible for the quality of the technology and the date. If the project misses the date or the technology sucks they both failed.

Returning to the title of my post: code reviews that act as a filter instead of a booster reflect a dysfunctional organization that is broken at its core. And the reason I have hated code reviews is that the problem wasn’t the review; the problem was the underlying first principles that motivated its existence.

Rethinking the Internet of Things

July 11, 2014 by kostadis roussos

Over the last month I’ve been struggling with the Internet of Things. My scale has an internet connection, my TV has an internet connection, my toe has an internet connection, and soon my watch will have an internet connection, and managing the Wi-Fi passwords and connectivity has been a Pain-in-the-Ass.

I kept telling my wife that we need a better solution.

Turns out two very interesting technology trends are going to radically remake the internet of things.

The first is that 4G LTE chips are really cheap. For those not in the know, building a chip that could do cellular used to be black magic, but with 4G LTE this is no longer the case. Thus the question “how do I connect to the internet?” becomes “how do I make a cellular call?”, and that’s a much simpler user experience.

The second is that Bluetooth LE is transformative. If you think about the internet of things, the things actually generate a small amount of data, either very frequently or infrequently. For devices that don’t have enough power to justify a 4G LTE chip, Bluetooth LE is a really interesting alternative. Bluetooth LE, if you believe the marketing buzz, will allow a device to transmit for its entire useful life without requiring a battery replacement. Given the range of Bluetooth LE and the existence of the ultimate Bluetooth LE receiver, your cell phone, I can totally imagine Bluetooth LE devices talking transparently to your cell phone, and your cell phone talking transparently to the internet to transmit the data.
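The phone-as-gateway idea can be sketched as a small relay: low-power devices hand the phone tiny readings, and the phone buffers them and forwards them over its own internet connection. No real Bluetooth LE or cellular API is used here; `Reading`, `PhoneRelay`, `on_device_report`, and the `uplink` callback are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Reading:
    device_id: str    # e.g. "scale" or "watch"
    timestamp: float  # seconds since the epoch
    value: float      # a few bytes of sensor data


class PhoneRelay:
    """Buffers small readings from nearby devices and forwards them in batches."""

    def __init__(self, uplink: Callable[[List[Reading]], None], batch_size: int = 10) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._uplink = uplink  # stand-in for "send over the phone's data connection"
        self._batch_size = batch_size
        self._buffer: List[Reading] = []

    def on_device_report(self, reading: Reading) -> None:
        """Called when a device pushes a reading to the phone (stand-in for a BLE callback)."""
        self._buffer.append(reading)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far; does nothing if the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._uplink(batch)
```

Batching is just one plausible design choice: the devices only ever talk to the phone, and the phone decides when it is worth using its internet connection.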

The core objection I had to Internet of Things seems to be addressable with existing technology that is going to market right now.

Cool.
