Identifying the problem requires understanding what correlation is (and what it isn’t)

Lately where I work (though I won’t be there much longer) we’ve been grappling with performance issues. Just this morning I deployed a new component which I had measured to be about 10x as efficient as the component it replaced, based on both execution speed and memory usage. But later in the day my boss called me to say that it seemed to hurt performance, and to ask me to revert it.

[Image: a CPU usage graph. Caption: “Aaahhh! My poor CPU!”]

Now, before I go on, a concession: performance really is in the eye of the user. If an application appears slower to the user, then for practical purposes it is slower. This is a well-documented phenomenon: copying files in Windows Vista is faster than in Windows XP but feels slower; Google Chrome appears to load much faster than Mozilla Firefox, even though Firefox is nearly as fast; and so on. So I cannot claim that my boss was “wrong” for wanting to revert to the previous version.

To be honest, what bugged me most about it—aside from the fact that I’m a proud person, and I do tend to get a bit defensive when I feel my work is being judged unfairly—was the metric used to determine that “performance” was worse: CPU usage.

Now, I have two points to make, really. The first is pretty straightforward: to say that a piece of software is performing poorly because it is causing high CPU usage is kind of like declaring that a project team is performing poorly because everybody’s working. Can you imagine if this same metric were applied to employees in an office environment?

“The QA team seems to be very inefficient; everybody’s always doing something.”

“The dev team, on the other hand, is running quite optimally; only half of them are working at a time!”

So that bugged me a little bit. But let’s be practical: high CPU usage certainly can be a symptom of a poorly performing piece of software. And if you see the CPU usage go up, and then you notice other indicators that your software is acting sluggish even under the hood (e.g., in the case of my company—a trading firm—we started to fall behind the market in our pricing calculations), it’s very reasonable to associate high CPU use with poor performance.

Now here’s my second point, which is a little more subtle: correlation is not the same as causation.

[Image: xkcd comic about how correlation is not the same as causation]

What does this mean? Whenever I think about this common misunderstanding, I’m reminded of a comment my biology teacher made to the class when I was in ninth grade. He walked into the classroom one day and announced: “If any of you is curious to know how you can live longer: get a dog. Apparently they did a study and people with dogs live longer, healthier lives.”

Even at the time, as a stupid teenager, I knew there was something fishy about this claim. “Do dogs actually make you live longer,” I thought, “or are people with dogs just the sort of people who live longer anyway?”

In case you’re not getting my question, let me give a more obvious example. Let’s think up some harmless generalization… for instance, most of the best basketball players are very tall. Say we were to graph this trend on a pair of axes where X is a player’s basketball skill and Y is his/her height. It might look like this:

[Image: a graph of height versus skill]

Now, suppose I look at this chart and conclude: “So if you want to become taller, get really good at basketball!”

Wait, no. That’s backwards, isn’t it? Just because two things are correlated (i.e., they tend to occur together, in a statistically significant way), that doesn’t necessarily say anything about a causal relationship between them. Obviously, getting good at basketball is not going to magically make you taller. If anything, it’d be the other way around: being taller might make you better at basketball (though I can’t really bring myself to declare something so simplistic as if it’s an actual fact).
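One way to see why the chart can’t tell you which way the arrow points: the usual measure of correlation, the Pearson coefficient, is symmetric. Here is a minimal sketch in Python. The heights and skill scores are made-up numbers, generated purely for illustration; nothing here is real basketball data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: heights in cm, and a skill score that (by assumption)
# rises with height plus some random noise.
heights = [rng.gauss(195, 10) for _ in range(500)]
skills = [0.5 * (h - 195) + rng.gauss(0, 5) for h in heights]

r_skill_height = pearson(skills, heights)
r_height_skill = pearson(heights, skills)

# Swapping the two variables gives the same number.
print(r_skill_height, r_height_skill)
```

Whichever variable you put on which axis, you get the same coefficient. The number describes how the two move together; it carries no information about which one (if either) is doing the causing.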

This is why I was so skeptical of my biology teacher: is it really that having a dog causes one’s health to magically improve? Maybe. I’m not saying that it’s definitely untrue. But doesn’t it seem just as reasonable, if not more so, to suggest that healthy people, who eat better and get more exercise than the average member of the population, might just be the sort of people who are more likely to own dogs?
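To make that suspicion concrete, here is a small simulation sketch in Python. Everything in it is an assumption invented for illustration: the hidden “active lifestyle” trait, the ownership rates, and the lifespans. It is not the study my teacher mentioned. In this toy world, dogs have no effect on lifespan at all, yet dog owners still come out ahead.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

people = []
for _ in range(20000):
    active = rng.random() < 0.5  # hidden trait (assumed): active lifestyle
    # Assumed rates: active people are more likely to own a dog.
    owns_dog = rng.random() < (0.6 if active else 0.2)
    # Lifespan depends ONLY on lifestyle here, never on the dog.
    lifespan = rng.gauss(82 if active else 76, 5)
    people.append((active, owns_dog, lifespan))

def mean_lifespan(rows):
    return sum(r[2] for r in rows) / len(rows)

# Naive comparison: owners vs. non-owners across the whole population.
owners = [p for p in people if p[1]]
non_owners = [p for p in people if not p[1]]
naive_gap = mean_lifespan(owners) - mean_lifespan(non_owners)

def gap_within(active):
    """Owner vs. non-owner gap among people with the same lifestyle."""
    group = [p for p in people if p[0] == active]
    with_dog = [p for p in group if p[1]]
    without_dog = [p for p in group if not p[1]]
    return mean_lifespan(with_dog) - mean_lifespan(without_dog)

print("naive gap (years):", naive_gap)
print("gap among active people:", gap_within(True))
print("gap among inactive people:", gap_within(False))
```

The naive comparison shows owners living a couple of years longer, but once you compare people with the same lifestyle the gap all but disappears. The dog was never the cause; it just tended to show up alongside the real one.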

So when I hear, “Performance has gotten worse because the CPU use has gone up,” my first instinct is to ask the question: “What is really going on here?” And if the reasoning for this conclusion is, “Whenever CPU usage goes up, the performance seems to get worse,” I have to wonder if this is conflating correlation with causation in the same way my bio teacher did with dogs and longevity.


One thought on “Identifying the problem requires understanding what correlation is (and what it isn’t)”

  1. Kathryn says:

    A quick Google search didn’t reveal what I was looking for, but I *swear* I read a study done once about how elderly people who were given houseplants to take care of lived longer than the control group. I don’t remember if I drew this conclusion, or if the study did, but in my mind it was because taking care of something gives you a sense of purpose. Could it be the same for taking care of dogs?
