The trouble with stop-the-line CI

http://philosopherdeveloper.com/posts/trouble-with-stop-the-line-ci.html

2 thoughts on “The trouble with stop-the-line CI”

  1. Mark says:

    Hey Dan,

    Firstly, I think it’s always valuable to challenge beliefs and assumptions (something which you do so well!) so from that perspective I applaud your article.

    Given that, some points that I would make in response:

    To me your post suggests that a red build means that all development comes to a halt. I’m pretty sure you don’t actually mean this, but that’s how it came across to me and I think it’s worth emphasizing that this is not the case; someone (or some pair) should be assigned to investigate and resolve the issue, but everyone else should be able to continue working (especially when using a DVCS, which allows people to continue committing locally). If the red build is blocking people from continuing work then there should be a mechanism in place to revert the commit that caused it. This is easy in Git, perhaps not so in other source control systems; this is definitely an area that could have been improved on in some of the teams I’ve worked on.

    In the case of low-priority bugs, I think it’s better that the team is aware of the bug than not. A decision could be made that there is no need to fix it (at least for now), and so the failing test can be set to be ignored by CI (with an appropriate comment added to say why it is ignored). Personally I’d rather that than be rushing to diagnose a problem at 3 o’clock in the morning because an issue has occurred and nobody knows why when everything’s been working fine for so long.
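
    One way to ignore a failing test while recording why is a skip marker that carries the reason. The sketch below uses Python's standard `unittest` module; `parse_discount`, `DiscountTests` and `run_suite` are hypothetical names invented for illustration, and the "bug" is made up. It is not taken from any project discussed here.

```python
import unittest


def parse_discount(text):
    # Hypothetical function with a known low-priority bug:
    # it does not handle a trailing "%" sign.
    return int(text)


class DiscountTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_discount("15"), 15)

    # The skip reason is the "appropriate comment": it is reported by the
    # test runner, so the deferred defect stays visible in the CI output.
    @unittest.skip("Known low-priority bug: '%' suffix not handled; fix deferred by the team")
    def test_percent_suffix(self):
        self.assertEqual(parse_discount("15%"), 15)


def run_suite():
    # Run the tests programmatically and return the TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

    The build stays green because a skipped test does not count as a failure, but the skip and its reason still show up in the runner's report (`result.skipped`).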

    In my experience, red builds are usually caused by one of two things:
    1) Failure to run the tests locally before pushing to the CI server. This *shouldn’t* be happening, and if it happens a lot then perhaps there are other issues on the team that need resolving (e.g. why are the tests not being run – too slow, devs who don’t see the need or fuss, etc.?)
    2) Flaky tests. These should be identified and addressed sooner rather than later (of course this may be easier said than done ;-))
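
    As a rough illustration of identifying a flaky test, one simple approach is to run the same test several times and flag it when the outcomes disagree. This is only a sketch under that assumption; `classify_test` and its labels are hypothetical, and real flakiness often depends on timing or environment that simple reruns may not reproduce.

```python
def classify_test(test_fn, runs=5):
    """Run test_fn several times and label it by the consistency of its outcomes.

    Hypothetical helper: a test "passes" if it returns without raising
    AssertionError, and "fails" if it raises one.
    """
    outcomes = []
    for _ in range(runs):
        try:
            test_fn()
            outcomes.append(True)
        except AssertionError:
            outcomes.append(False)

    if all(outcomes):
        return "stable-pass"
    if not any(outcomes):
        return "stable-fail"
    # Mixed results across identical runs: the test is flaky.
    return "flaky"
```

    A test labelled "flaky" is a candidate for quarantine or repair, whereas "stable-fail" points at a real defect.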

    On performance issues, I agree that these can be harder to identify using CI, which could be a reason to push continuous deployment – the sooner the build gets into the hands of real users, the sooner they’ll start complaining about performance, and the sooner you’ll know there is a problem and can do something about it. Techniques like canary deployments can be used so that the entire user base isn’t adversely affected by a bad build.

    As always, a thought-provoking post!

    Mark

    • Daniel says:

      Mark,

      You make some great points. On a high level, I agree with what I feel is the general theme of your comment: that I have been a little too hard on CI, and in reality teams that use it practice a great deal of common sense which, for the most part, mitigates the issues I've discussed.

      On a lower level, though, I do have responses to your points (saw that coming, didn't you?):

      First, you're right that on typical projects a red build does not literally cause everyone on the team to stop working (though we both know it can, in rare cases). However, it does prevent developers from committing code, which is not an insignificant problem. Really, though, the case of the entire team halting was just an extreme example, meant to illustrate the point that a policy of treating a red build as a stop-the-line event—whether for the whole team or just one pair (or one dev)—fails to distinguish between important and less-important defects. And we're not even addressing the sadly common case of a red build being a false positive altogether.

      I would also agree that we might as well know about low-priority bugs. But to me, that actually supports the argument that a red build should not necessarily stop the team (or even a pair). The alternatives are (1) that a red build always stops someone, in which case low-priority defects covered in CI might not pay for themselves; or (2) your suggestion, that the team discusses a defect and decides to ignore it in CI. But I would prefer a "yellow" build, for a couple of reasons:

      – Stopping to discuss a defect and decide its priority reactively still costs time

      – Ignoring a test in CI just makes the build go green again, making the defect easily forgotten

      I recently spoke with Sameer on this topic, and he made the point that the fundamental value of CI is that it informs us whether the current state of the code is "deployable" (green) or not (red). I actually like this dichotomy; but from this perspective, my argument could be summarized as: the state of the code is not binary. If anything, I'd prefer to think of the build as either not deployable (red: high-value specs are failing) or maybe deployable (green or yellow: there might be some small defects).
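
      The red/yellow/green idea can be sketched as a small classification rule. This is a minimal sketch, assuming each spec is tagged in advance as high-value or not; `SpecResult`, `classify_build` and `is_maybe_deployable` are hypothetical names, not part of any CI tool.

```python
from dataclasses import dataclass
from enum import Enum


class BuildStatus(Enum):
    GREEN = "green"    # nothing failing
    YELLOW = "yellow"  # only lower-priority specs failing
    RED = "red"        # at least one high-value spec failing


@dataclass(frozen=True)
class SpecResult:
    name: str
    passed: bool
    high_value: bool  # assumption: priority is decided up front, not after a failure


def classify_build(results):
    """Map a list of SpecResult objects to a three-state build status."""
    failures = [r for r in results if not r.passed]
    if any(r.high_value for r in failures):
        return BuildStatus.RED
    if failures:
        return BuildStatus.YELLOW
    return BuildStatus.GREEN


def is_maybe_deployable(status):
    # Green and yellow both mean "maybe deployable"; only red blocks.
    return status is not BuildStatus.RED
```

      Tagging priority ahead of time is what avoids the reactive discussion: a failing low-priority spec turns the build yellow, which keeps the defect visible without stopping anyone.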

      Maybe we can grab some drinks soon and discuss it further… or fight about it!

      Dan
