Designing a continuous deployment system: cautious deployment

This article is about a deployment scheduling technique that I call operationally cautious deployment. (I didn't pick the name so the acronym would end up being OCD! Honest! I swear!)

Note: This isn't new. I didn't invent it. The fact that a large number of people around the globe with similar problems have come up with this idea independently, and that most of them are happy with the results, speaks for itself: this is mostly just common-sense thinking applied to fixing a real problem. It's one of those things that sound obvious once they're explained to you, but that you probably wouldn't come up with unless you'd spent quite a bit of time pondering the problem.

So you have a wonderful continuous deployment system! Congratulations! You deploy a new version, monitor your cluster of hosts, and create wonderful graphs:

[[posterous-content:NBFS9fiUjfTCi8Xr5KIW]]

Except, of course, it's not that simple. Your test suite, no matter how much you've honed and polished it, is imperfect. It will contain subtle assumptions that just aren't true in deployment. (When I say "subtle assumptions", I mean the kind that you don't know are assumptions until everything blows up.) If you're going to be deploying software all the time, deploying something with a fatal mistake in it is a matter of when, not if. So one day, the real graph is going to look something like this:

[[posterous-content:5k8ymL4PxLDrYfbFgL8h]]
It would be much nicer if we could deploy more cautiously: try it out on a few hosts first, and only go through with the complete deployment if it seems to work. A sort of live test, deploying with a feedback loop. That might look a bit like this:

[[posterous-content:apvQ27GHbg2rEh2CWgEa]]
That deploys the new version to 2**N servers at a time, with N starting at 0 and increasing until all servers run the target version. The batch size is exponential because that gives us the behavior we wanted: slow and cautious at first, and rapid as soon as we think it'll work.

Of course, that doesn't give you an example of a failure mode. Failures look a bit like this:
[[posterous-content:tibBqAUzF8OCiIOWmnDZ]]
Just like in the previous picture, a roll-out of a new version starts. Except this time, the new version doesn't actually work. The system catches this, and reverts the hosts that have already been converted.
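The scheduling loop described above -- exponential batches, with a revert on bad feedback -- might be sketched roughly like this. Note that deploy_to, revert and cluster_healthy are hypothetical stand-ins for whatever your actual deployment tooling and health metrics provide:

```python
import time

def cautious_deploy(hosts, version, deploy_to, revert, cluster_healthy,
                    settle_time=60):
    """Roll `version` out in exponentially growing batches; revert on failure."""
    remaining = list(hosts)
    converted = []
    batch_size = 1  # 2**N with N starting at 0
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for host in batch:
            deploy_to(host, version)
        converted.extend(batch)
        time.sleep(settle_time)  # give the health metrics time to react
        if not cluster_healthy():
            # Negative feedback: undo everything we've converted so far.
            for host in converted:
                revert(host)
            return False
        batch_size *= 2  # positive feedback: be bolder next round
    return True
```

The interesting knobs are settle_time and whatever cluster_healthy actually measures; getting that feedback right is the hard part.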

Of course, the package is still broken! You haven't fixed the problem yet, you've just reduced its impact. Your dev room should still look roughly like a control room from a bad Cold War movie right after the other party's launched a bunch of nukes.

It's just another bug, albeit a particularly nasty, show-stopping one. Like all bugs, it's not so much a bug in the software as a bug in the test suite, which incorrectly specified the requirements for the software. That becomes painfully obvious here: something is broken, but your test suite didn't catch it. Hopefully, it's something that's testable at all.

The difference between this state of panic and the original state of panic is that, ideally, your users never noticed there was a problem.

There's one obvious problem with all of this. Being able to continue deployment on positive feedback and revert deployments on negative feedback relies on actually having feedback. Useful health metrics for a cluster are halfway between art and science, and are definitely outside of the scope of this article.

The next article will be about another benefit of cautious deployment: continuous availability. That one is probably going to be halfway between an article about continuous deployment and a rant about how people need to learn more statistics.

Designing a continuous deployment system: deployment formats and content

Hi and welcome again to my articles on continuous deployment! In this article, I'm going to talk about what exactly I think you should deploy and what format you should use to deploy it in.

Again, the stuff here probably isn't super controversial. I'm probably not kicking any shins in this one unless you've already done continuous deployment and picked something else. As with the last article, if you don't really care about deployment formats, save yourself the time and don't bother reading.

I got in touch with a few people I know who are doing continuous deployment, and there are a number of things I've heard them say about formats and what exactly to deploy. Here's what they used:
  1. Distribution packages (debs, rpms...)
  2. Language's native packages (eggs, gems, ...)
  3. The development environment
  4. The test environment
(Although it appears that this list conflates two things (formats and content), that's not entirely true. These packages are generated using standard packaging practices with standard content, almost as if they were part of a distribution/PyPI/Rubyforge...)

Some people have reported success deploying native packages. It definitely has its advantages in terms of accounting, plus you get to use all of the tools already available for managing package installation on large networks. (As a practical example: PPAs and apt repositories in general, together with Landscape or other tools for managing updates for Ubuntu servers.) Hence, the choice for native packages is understandable: in big existing companies, using existing infrastructure is a big plus.

When greenfielding, however, I would prefer to avoid both distribution packages and native packages. One problem is that pretty much all of these are designed for system-wide installations. There are methods of getting around this, such as Python's virtualenvs, but that makes the system specific to a particular language and set of deployment tools, and I've already argued in a previous post why I'd rather avoid that unless there really is no better option. Since nowhere near every reasonable development platform has such a sandboxing tool, it's not possible to form an abstraction over it.

A related problem is that it's generally impossible to reliably install multiple versions of these concurrently in a widely supported way (again, with the exception of language-specific tricks like virtualenvs). That's not necessarily a huge problem, but it does make reverting to an older version slightly harder. That might not seem like a big deal, but easy reverts are a big part of cautious deployment (more about that in the next installment). Additionally, there's nothing to prevent a bad/broken package from having side effects that break older installs as well -- as usual, shared mutable state is asking for trouble.
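For what it's worth, here's a minimal sketch of the side-by-side alternative: each version gets its own directory, and a `current` symlink points at the live one, so a revert is just re-pointing the symlink. The layout and function names here are mine, purely for illustration:

```python
import os

def activate(base, version):
    """Point base/current at base/versions/<version>, atomically on POSIX."""
    target = os.path.join(base, "versions", version)
    if not os.path.isdir(target):
        raise ValueError("version %s was never installed" % version)
    link = os.path.join(base, "current")
    tmp = link + ".tmp"
    os.symlink(target, tmp)
    os.rename(tmp, link)  # rename over the old link: no half-switched state

def current_version(base):
    """Which version is live right now?"""
    return os.path.basename(os.readlink(os.path.join(base, "current")))
```

Since every version stays on disk untouched, a broken deploy can't stomp on the old install, and reverting is as cheap as deploying.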

Another thing I've seen people package is their development environment. Unless you're also using this as the base for your tests, I think this is a very bad idea. I think we've all had cases where something worked fine on a development box, and started breaking for new checkouts. During development, all sorts of stuff accumulates in an environment, and your code can silently come to depend on it. Some languages/development environments are more susceptible to these problems than others, but they're prevalent enough to want to avoid the problem altogether.

Even if you are testing your development environment, I dislike it for purity reasons. The people doing this said they did that because it was the easiest way to get stuff running, and they just went along with the path of least resistance. As much as I'm willing to believe that's true, I also believe that if you need a developer to chant some magic runes and sacrifice a chicken over a checkout of your code before it'll actually work, you're doing it wrong.

I think repackaging test environments for deployment is a sensible idea. I talked this over with Holger Krekel, the guy who wrote Tox, and he immediately pointed out a flaw in that plan: it'll only work reliably if the testing environment is identical to the deployment environment in terms of platform, architecture, versions of available software... I thought this was a limitation at first, but I'm now convinced this is a useful limitation (another idea I stole from Holger ;)). If you didn't test it in environment X, what business do you have deploying it there?

I think I've got most of the annoying administrative stuff out of the way now, so maybe we can move on to more interesting things that are actually specific to continuous deployment :-) That starts with the next article, which, as I've already hinted above, will be about "cautious deployment", a robust technique for scheduling deployment.

Well now, this changes everything.

The previously planned crazy installment (I was going to install my own Launchpad instance) has been cancelled (or maybe deferred) to bring you this important announcement.

For those who still haven't heard yet (sorry about the late blog post -- this happened around 4AM my time), Atlassian have bought Bitbucket and are doing crazy awesome things with it.

They changed some pricing plans, too. If you remember my graph from last time, it now looks a bit more like this:
[[posterous-content:MGLlpFXYHW3ksEDqpcWR]]
If you remember my previous rant, the big problem I had with Launchpad was pricing. Specifically, I think they charged based on the wrong property: number of projects versus size of team.

So Bitbucket went out and did just that. Every Bitbucket plan now gets an unlimited amount of space and repositories (both public and private!). What they charge on is size of the team. Get this: everyone now gets infinite private repositories that they get to share with 5 people. That pretty much means that Atlassian just gave 80% of the small commercial dev houses I know free hosting. Even better, if you change your plan before October 3, you get 10 users for a year. That's 90% of the small commercial dev houses I know right there.

That's pretty darn cool.

Disclaimer about the graph: my idea of "big dev house" probably isn't your idea of big dev house. Obviously the $80/month = $960/yr you'd pay at Bitbucket for unlimited private collaborators is about four times what you'd pay at Launchpad for an equally sized (that means "more than fifty", mind you) team if they only work on one project. Since they charge based on different properties it's very hard to make a sensible price comparison: I think Launchpad has to compete based on features, and not on price, because fifty people working constantly on one project doesn't sound like the kind of dev team I'd like to see more of :)

I think it's time to revisit Bitbucket. Even if most of the tools end up being horrible, you just can't argue with 'free'. If feature X in Bitbucket ends up being unusably bad, I can just install my own code review tool. I've been told Bitbucket repositories are accessible over SVN, so it may even be possible to hook it up to Rietveld and end up with something where I do zero hosting.

Regardless of what happens to Bitbucket, I'd like to congratulate Jesper on Atlassian buying Bitbucket and wish him the best of luck in whatever he does next, be it Bitbucket or something else.

Designing a continuous deployment system: on being language agnostic

Hi, and welcome again to my series of rants about continuous deployment :) In the next few posts, I'll try to scope out what I think a hypothetical greenfielded continuous deployment system should and shouldn't do, and then get that down to what a feasible continuous deployment system should and shouldn't do. Hopefully there won't be too much cutting.

Fair word of warning: there are some ground rules to cover first before I get to talk about anything really interesting. This article, specifically, is about being language agnostic. It's not exactly a very controversial thing, and is mostly here so I have something to point to when someone asks me why I did X rather than Y. If you're willing to concede that the added complexity of supporting arbitrary languages and build/deployment systems is worth it, feel free to skip this one.

Python's great and all, but lock-in is pretty much always bad. There are two big reasons why I think this is worth doing, despite the potential added complexity:
  1. You want other people to use your system.
  2. You want to be able to use different things, now or in the future.
The first point is pretty straightforward. For something as critical as a continuous deployment system, you'll want a lot of other people using it and contributing to it. In CI systems, this sort of traction is what made Buildbot great and Hudson popular. Selling people something that works with whatever it is they have is a lot easier than selling people something that works with whatever it is you have.

The second point is just because nothing can be the best way to express every possible problem all the time. You might want to use different tools, if only purely for the sake of experiment.

If your architecture makes it possible, why not allow a programmer to foray into the slightly unknown? Who knows, you might develop something nice. Programmers are typically creative people: giving them a lot of freedom to tinker tends to repay itself many times over (Google figured this out with 20% time).

The bottom line is simple: don't make it work for Python only. Or, which is more like what I'm going to do: make it Python only for now, but don't make it hard for other people to do whatever they like with it.

The next article is going to be about what to deploy and the format to deploy it in.

A continuous deployment interlude: rolling your own development stack

Hi, and welcome to the second part of my ranting about development stacks. In my previous post, I talked about the issues with some of the popular hosted solutions (Github, Bitbucket and Launchpad). In this post, I'll try to roll my own dev stack that hopefully doesn't suck too much.

Recall from the last post that I'm looking for three things:
  1. version control
  2. issue tracking
  3. preferably some form of code review
I think it's a good idea if these things are integrated. Specifically, I think it's a great idea for all of these things to live in the repository. That makes them very easily accessible at every step along the way: your continuous deployment system shouldn't deploy software that has known major bugs in the issue tracker, for example. Code review is integral to resolving tickets, because resolving tickets generally involves changing or adding code (and in both cases, you really should have code review).

It's fine for this code metadata to be available in ways other than reading it out of the repository, as long as it satisfies some useful definition of "available". You could just speak HTTP to your issue tracker or code review tool's web page. That just adds another layer that can break -- people who've had to manage these tools can probably imagine the problems when your fancy new CI system reports failed builds, and it's really about Trac being down, or some software upgrade destroying your ability to scrape a web page.
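To make that concrete, here's what "reading it out of the repository" might look like, assuming a hypothetical in-repo tracker that keeps one file per ticket under .issues/ with the status on its first line. Real tools (Bugs Everywhere, Ditz, Fossil) each have their own actual layout; the point is just that there's no HTTP involved:

```python
import os

def open_tickets(checkout):
    """List tickets whose first line says "open" -- hypothetical layout."""
    issues_dir = os.path.join(checkout, ".issues")
    open_ones = []
    for name in sorted(os.listdir(issues_dir)):
        with open(os.path.join(issues_dir, name)) as f:
            status = f.readline().strip()
        if status == "open":
            open_ones.append(name)
    return open_ones

def safe_to_deploy(checkout):
    # The deployment system reads plain files straight from the checkout;
    # there's no web page to scrape and no tracker daemon that can be down.
    return not open_tickets(checkout)
```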

With that in mind, let's look at the available stuff for rolling your own stack. First, I filtered and ordered tools by the amount I'm willing to put up with them, and came up with:
  1. Bazaar
  2. Fossil
  3. Mercurial
(That doesn't mean I think everything else won't work. It means I didn't consider them. I'm only one guy -- I barely have enough time to research this, let alone produce an exhaustive list.)

For Bazaar, well, outside of Launchpad life isn't very interesting, and you come up with the usual suspects. For issue tracking and project management you've got stuff like Trac and Redmine. In terms of distributed bug tracking you've got Bugs Everywhere and Ditz (I can't figure out which is nicer -- I'm defaulting to BE because it's Python, so easier for me to extend -- if anyone has experience with both, please chime in). In terms of code review, well, there's Review Board. None of these things are Bzr-specific (which is great -- less lock-in is better), except some of them don't have Bazaar support either, which is annoying if it's the tool you're going to be using all day, every day.

Next up: Fossil. Fossil is pretty obscure; it's an SCM built on top of SQLite, written by the same author. That's a big plus: if you didn't know it yet, SQLite is some of the most extensively tested, well-engineered yet still simple code out there. Fossil works well with my earlier argument about as much metadata as possible being managed inside your repository. It comes with an integrated wiki and issue tracker, and the entire repository is stored in a SQLite database. A nice feature that it shares with Trac is the ability to create your own queries over tickets using SQL. Twisted has used that successfully to improve the code review part of their issue system (see {15}, which lists review tickets in the order they should be reviewed). Fossil is also very easy to host: in fact, Fossil's website is Fossil serving itself. I've had a brief chat with Zed Shaw about Fossil, since he's used it quite extensively in real projects such as Mongrel2 (note that that's Fossil serving that webpage again), and he seemed pretty happy with how it works in general. Fossil is easy enough to use for hosting multiple repositories itself: there's a somewhat unfortunate but understandable choice of CGI, so you can use pretty much anything that can serve CGI to serve Fossil web pages for you.

Last up: Mercurial. When it comes to code review, I've already explained in a previous post why I think Bitbucket's pull requests are a missed opportunity. (Short version: they make a distinction between local branches and branches in different repositories, which is in my opinion unnecessary. I think the way Launchpad allows merge proposals between arbitrary branches is much better. Github has recently copied this behavior somewhat with Pull Requests 2.0.) However, the main reason I wanted that feature was as a point to introduce code review. Georg Brandl (birkenfeld) pointed out that Mercurial has the pretty impressive hg-review extension. It has a sleek, usable web interface (try it!). It also has the advantage of the point I made earlier: the data is included with your repository, so you don't need to go fish for it. Even if you don't use that now, it has the great advantage of letting you do cool things with it later (such as identifying recurring problems in the review process and pointing those out to the reviewer/author).

Stay tuned for the next installment, where I try something totally crazy.

A continuous deployment interlude: picking a development stack

At some point in time, you're going to have to pick a set of tools to do your dev work with. You probably want a bunch of things, including:
  1. A version control system. Preferably not CVS, Visual SourceSafe, ClearCase, Perforce, RCS, SCCS...
  2. A ticketing system. Preferably not Trac.
  3. A code review system, possibly just informal code review as part of the ticketing system.
Like pretty much everyone, I'm not without bias. I love Bazaar. It's the best version control system I've ever used (and I've used most of the popular ones). I've got a number of reasons why I prefer it over git and hg, but I'm not going to debate VCSes here. All of the version control systems I've just named are pretty great pieces of software, far superior to what we had to put up with before we had them. That's unfortunate, because then Parkinson's law of triviality comes into effect, and people who really don't care very much either way will feel the need to defend their favorite piece of software to the death. I think we should come to terms with VCS wars being the new editor wars. (The inverse relation between clue and volume is an unfortunate perennial in these things.)

At my old place of work, we found a way around that. We had an SVN trunk, and no human was allowed to commit to it. There was a bot that took patches and committed them to trunk -- and patches were only sent after two (later three) people signed off on the code review. One obvious advantage is that you completely avoid religious wars: I don't have to convince people to switch to $AWESOME_VCS. There's also the problem of authorship erasure: because the bot committed everything and this was a very simplistic bespoke system, tools like svn blame stopped working. At first sight, this seems pretty bad. I'm not entirely convinced: it's not great, but on the other hand, it is a sort-of neat way of enforcing collective code ownership. Overall, I think this was a nice experiment, but it's about time we stop mucking around and build some better tools.

I started by looking at all of the hosted services and the DVCS that powers each of them:
  1. Launchpad with Bazaar.
  2. Bitbucket with Mercurial
  3. Github with git (and hub)
Because of my affinity for Bazaar, Launchpad obviously came first. Alas, there's a problem with Launchpad for small commercial projects. Pricing.

[[posterous-content:shOrCG05NvDQO8UBylbu]]
Here's how it works. Github (in dark red) has a bunch of plans for private repositories. Bitbucket (in light blue) has a bunch of plans for private repositories. Launchpad (in yellow) has a one size fits all $250/yr/project plan.

I am in no way saying $250 is an insurmountable amount of money for a company. I am saying that at the low end of the spectrum, you get a whole lot more bang for your buck at Github and especially Bitbucket than you get at Launchpad. The services they provide are not equivalent: Launchpad's merge proposals are quite a bit better than pull requests in both Github and Bitbucket (although Github's Pull Requests 2.0 are halfway there, their pull-request-is-an-issue philosophy makes it impossible to use pull requests as proposed solutions to tickets), and its issue tracker feels a lot more polished than both Github's and Bitbucket's (although between those two, Bitbucket probably wins that round). I'm not saying it's not better. I'm saying I'm having a hard time justifying that it's around $200/year/project better. It's not that Launchpad's expensive so much as that the competition is really, really cheap.

Let's say I'm a tiny bootstrapping startup with somewhere between $10k and $30k to spend. If you've got one project, $250/yr is quite all right. However, building your big app as a set of smaller ones has a number of advantages in terms of administration and deployment. Once I've got 5+ tiny projects (which, put together, make up the code in my startup), it's about $1250/yr. Still not insurmountable, but I've got more interesting things to spend $1k+ on than Launchpad. $1k/year for hosting is not a sensible way to spend your money when most of those projects only get a few tickets a week.

So, just to be clear: I'm not saying Launchpad sucks. I think Launchpad is great. I'm not saying Launchpad's pricing scheme is asinine. I'm saying their pricing scheme works out very badly for what I want to do, and that I'm having a hard time convincing other people the extra cost is worth the difference in functionality.

In the next post, I'll talk about rolling your own development stack.

Building a continuous deployment system, part three: preparing for continuous deployment

This post is just about the process of moving a dev shop to continuous deployment. If you're just interested in the things I'd like to build, feel free to skip this one.

Despite the ominous messages in my last post about continuous deployment changing your dev shop dramatically, I do think it's possible to prepare properly. That means that you get up to the point where you could be doing continuous deployment, but you just aren't. Or, to put it differently, imagining you had to release right now, and getting trunk up to the point where that doesn't raise your blood pressure.

The first step is getting your testing up to an adequate level. I can't stress this enough -- and I'm hoping it won't be a tough sell, considering the number of people who've hammered on it before me -- given today's tools, there is very little excuse left for not having an extensive suite of things that check your code. I'm deliberately not saying "unit tests", because that suite encompasses a lot more than just unit tests:

  1. unit tests (These still go up as number one because they're what you're dealing with most. Plus, out of the things in your code that are just plain broken, unit tests will help you find at least 90%.)
  2. static analysis (pyflakes is great, but Python is quite hard to do sensible static analysis on -- statically typed languages, for example, can expect to reap much greater benefits)
  3. integration tests (although these are generally part of my normal test suite and are run together with unit tests, they're really quite different)
  4. user interface tests (if you're building a webapp, Selenium is wonderful)
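The glue that ties all of those checks together doesn't have to be fancy. Here's a toy runner where each check is just a command line and the build is only green if every one of them exits zero -- the commands shown are illustrative, so substitute your real test runner, pyflakes invocation, Selenium suite, and so on:

```python
import subprocess
import sys

def run_checks(checks):
    """Run each (name, command) pair; record whether it exited zero."""
    results = {}
    for name, cmd in checks:
        results[name] = subprocess.call(cmd) == 0
    return results

def all_green(results):
    """Deploy only when every single check passed."""
    return all(results.values())

if __name__ == "__main__":
    # Illustrative commands only -- use whatever your project actually runs.
    checks = [
        ("unit tests", [sys.executable, "-m", "unittest", "discover"]),
        ("static analysis", [sys.executable, "-m", "pyflakes", "."]),
    ]
    print(all_green(run_checks(checks)))
```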

Testing is very important, but it's hardly the whole story. Testing (with the minor exception of some forms of static analysis) makes sure your code works -- it makes no guarantees about that code being any good. For now, computers aren't too great at judging code in such a subtle fashion, and you really want humans to do it for you. Especially in languages that let you do pretty much anything, code review is very important if you want to keep your code sane. Like test-driven development, a lot of people much smarter than me have driven this one home for a long time, so I won't try to convince you too much.

Another advantage of code review is collective code ownership and a resulting increase in bus number. High bus numbers are very important, especially so when you're doing continuous deployment. You can't, or at least shouldn't, delay a release. Once something does go wrong, fixing the situation should be everyone's immediate priority. Only one person understanding how a particular part of the project works slows things down. That sounds like a restriction, and in some ways it is, but it's the kind of restriction that forces you to do the right thing.

I have found that a lot of people pick a single kind of code review and contrast the alternatives. I've had good experiences with doing extensive and multiple kinds of code review. Most people believe that to be a waste of time and resources. I'm not convinced. First of all, there's Linus' Law arguing for many reviewers. Secondly, I've found that different kinds of review (and different kinds of reviewers) spot different problems, because they care about different things:

  1. A pair programmer will spot nitpicks about code quality and save you a bunch of "duh" moments
  2. A reviewer working on the same project reviewing a feature branch will pay more attention to interoperability issues with the rest of the code
  3. Someone who's not even a programmer (yep, "everybody writes code" works out great) will spot documentation issues
  4. ... (this is not, by any means, an exhaustive list)

Adding a form of code review takes a load off all of the other forms of code review. The curve is sublinear: adding more code review doesn't make code review take proportionally longer (it won't make it shorter either, of course).

A lot of people dislike being criticized about their code. Stamp this behavior out as soon as possible. Everyone's only human, and plenty of ugly, broken or otherwise inadequate code gets written. The point is to get it out, not to berate the person who wrote it.

Pressing the big red button once you're done will still be scary. You might even have a hiccup or two. But ideally, nothing will blow up. After a while (and probably in less time than you think it will), releases will become, as Timothy Fitz puts it, a non-event.

Building a continuous deployment system, part two: what is it?

Hi! This is part two of my continuous deployment rant series, which started here. In this part, I'll try to explain what I'm talking about when I say continuous deployment.

Strictly speaking, continuous deployment just means that you push code out to production servers all the time, practically as it gets written.

Think about that for a second. Are you scared? If so, why? Don't feel too bad if you are. Most people I've talked to about this are somewhere halfway between intrigued and terrified, and are mostly glad it's not happening to their code. A lot of those people also produce great software anyway. The people that don't feel anxious at all either haven't produced enough production software yet, or are already busy doing continuous deployment.

People who already practice continuous integration, with all sorts of tests and code quality metrics, often talk about how trunk is always ready for production. Continuous deployment is about putting your money where your mouth is and actually doing just that. The basic idea is similar to that of continuous integration: test as many things as you sensibly can and fail as quickly as possible once things go awry. Hopefully, that'll contain minor hiccups before they become real problems (generally defined as "the kind that costs revenue").

However, I don't think the strict definition is a very useful thing to do or talk about purely by itself. I've learned that continuous deployment doesn't come alone. Much more so than continuous integration, which feels like a natural extension of TDD, continuous deployment pretty much turns your development process upside down. Doing continuous deployment without a disciplined team to back it up and a testing infrastructure to constantly check your work is a bit like having a transmission without an engine and a steering wheel. (I apologize for the car analogy, and promise to try and not make a habit out of it.)

From what I've read and my own experience, teams who have successfully applied continuous deployment see these two things come back inseparably. If you want to make sure your code isn't plain broken, you'd better have a wide array of tests. If you want to make sure the code you're now responsible for in production isn't terrible, you'd better be doing code review (pair programming and/or tool-assisted code review is great -- we've had success doing both). There's nothing quite like the first time you commit a file and then watch your shiny new piece of code do something on a production server less than a minute later.

So, I guess the series is partially about continuous deployment proper, and partly about greenfielding a development process that rocks.

Building a continuous deployment system, part one

Lately, I've been thinking a lot about building a continuous deployment system.

I was really glad to find out that Timothy Fitz was doing continuous deployment over at IMVU (partially because Timothy and I like some of the same software). Eric Ries, former CTO and co-founder of IMVU, has had a lot of interesting things to say about it as well. I used to do this at my old day job, and it's been an absolute eye-opener -- yes, it does require quite a bit of discipline, but I'm convinced it pays off handsomely, especially in the long run.

Despite my newfound conviction, the system we used there wasn't all that fancy. That has its upsides: the less complex it is, the less can go wrong. Unfortunately, being a horrendous hodgepodge (I'd call it an unholy alliance, but there's no VB or COBOL) of bash, perl and Python distributing software with rsync, it's not very suitable for showing off.

The goal of this series of blog posts is to end up with a continuous deployment system people can agree on. (If you're laughing, yes, I realize that's probably a little naive -- I'm hoping to come out relatively unscathed and with a continuous deployment system I can agree on.)