Designing a continuous deployment system: deployment formats and content

Hi and welcome again to my articles on continuous deployment! In this article, I'm going to talk about what exactly I think you should deploy and what format you should deploy it in.

Again, the stuff here probably isn't super controversial. I'm probably not kicking any shins in this one unless you've already done continuous deployment and you picked something else. As with the last article, if you don't really care about deployment formats, save yourself the time and don't bother reading.

When I got in touch with a few people I know who are doing continuous deployment, I heard a number of things about formats and what exactly to deploy. Here's what they used:
  1. Distribution packages (debs, rpms...)
  2. Language-native packages (eggs, gems, ...)
  3. The development environment
  4. The test environment
(It may look like this list conflates two things, formats and content, but that's not entirely true. These packages are generated using standard-practice packaging rules with standard content, almost as if they were part of a distribution/PyPI/Rubyforge...)

Some people have reported success deploying native packages. It definitely has its advantages in terms of accounting, plus you get to use all of the tools already available for managing package installation on large networks. (As a practical example: PPAs and apt repositories in general, together with Landscape or other tools for managing updates for Ubuntu servers.) Hence, the choice of native packages is understandable: in big existing companies, using existing infrastructure is a big plus.

When greenfielding, however, I would prefer to avoid both distribution packages and native packages. One problem is that pretty much all of these are designed for system-wide installations. There are methods of getting around this, such as Python's virtualenvs, but that makes the system specific to a particular language and set of deployment tools, and I've already argued in a previous post why I'd rather avoid that unless there really is no better option. Since nowhere near every reasonable development platform has such a sandboxing tool, it's not possible to form an abstraction over it.

A related problem is that it's generally impossible to reliably install multiple versions of these concurrently in a widely supported way (again, with the exception of language-specific tricks like virtualenvs). That makes reverting to an older version slightly harder. It might not seem like a big deal, but easy reverts are a big part of cautious deployment (more about that in the next installment). Additionally, there's nothing to prevent a bad or broken package from having side effects that break older installs as well. As usual, shared mutable state is asking for trouble.
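
For contrast, here's a rough sketch of what side-by-side installs with cheap reverts could look like. This is not how any of the package managers above work, and it isn't the scheme from the next installment either; it's just an illustration. The function names (install_release, activate, active_version) and the releases/ plus current layout are made up for this example, and the symlink swap assumes a POSIX filesystem.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def install_release(root: Path, version: str, files: Mapping[str, str]) -> Path:
    """Write a release into its own directory, leaving other releases alone."""
    release_dir = root / "releases" / version
    # exist_ok=False: refuse to overwrite a release that is already installed.
    release_dir.mkdir(parents=True, exist_ok=False)
    for name, content in files.items():
        (release_dir / name).write_text(content)
    return release_dir


def activate(root: Path, version: str) -> None:
    """Point the 'current' symlink at an installed release."""
    target = root / "releases" / version
    if not target.is_dir():
        raise FileNotFoundError(f"release {version} is not installed")
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # clean up after an interrupted earlier switch
    tmp_link.symlink_to(target)
    # Renaming over the old symlink replaces it in one step on POSIX.
    os.replace(tmp_link, root / "current")


def active_version(root: Path) -> str:
    """Return the version name the 'current' symlink resolves to."""
    return (root / "current").resolve().name


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    install_release(root, "1.0", {"app.txt": "old"})
    install_release(root, "1.1", {"app.txt": "new"})
    activate(root, "1.1")
    activate(root, "1.0")  # reverting is just another switch
    print(active_version(root))
```

Because every release lives in its own directory, a broken 1.1 can't scribble over 1.0, and reverting doesn't involve reinstalling anything.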

Another thing I've seen people package is their development environment. Unless you're also using this as the base for your tests, I think this is a very bad idea. I think we've all had cases where something worked fine on a development box, and started breaking for new checkouts. During development, all sorts of stuff ends up in the environment that a fresh checkout doesn't have. Some languages/development environments are more susceptible to these problems than others, but they're prevalent enough to want to avoid the problem altogether.

Even if you are testing your development environment, I dislike it for purity reasons. The people doing this said they did that because it was the easiest way to get stuff running, and they just went along with the path of least resistance. As much as I'm willing to believe that's true, I also believe that if you need a developer to chant some magic runes and sacrifice a chicken over a checkout of your code before it'll actually work, you're doing it wrong.

I think repackaging test environments for deployment is a sensible idea. I talked this over with Holger Krekel, the guy who wrote Tox, and he immediately pointed out a flaw in that plan: it'll only work reliably if the testing environment is identical to the deployment environment in terms of platform, architecture, versions of available software... I thought this was a limitation at first, but I'm now convinced this is a useful limitation (another idea I stole from Holger ;)). If you didn't test it in environment X, what business do you have deploying it there?
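
One way to hold yourself to that limitation is to record a fingerprint of the environment the tests ran in and refuse to deploy anywhere that doesn't match. Here's a minimal sketch, assuming a Python project. The function names and the choice of fields (OS, architecture, Python minor version) are mine, purely for illustration; this is not something tox does for you, and a real check would probably also want to compare installed library versions.

```python
import platform
import sys
from typing import Dict, Optional, Tuple


def environment_fingerprint() -> Dict[str, str]:
    """Describe the environment this process is running in."""
    return {
        "system": platform.system(),    # e.g. "Linux"
        "machine": platform.machine(),  # e.g. "x86_64"
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    }


def mismatches(
    tested: Dict[str, str], target: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (tested value, target value)} for every differing field."""
    keys = sorted(set(tested) | set(target))
    return {
        key: (tested.get(key), target.get(key))
        for key in keys
        if tested.get(key) != target.get(key)
    }


def check_deployable(tested: Dict[str, str], target: Dict[str, str]) -> None:
    """Raise if the target environment differs from the tested one."""
    diff = mismatches(tested, target)
    if diff:
        details = ", ".join(
            f"{key}: tested {old!r}, target {new!r}"
            for key, (old, new) in diff.items()
        )
        raise RuntimeError(f"refusing to deploy to an untested environment ({details})")


if __name__ == "__main__":
    tested = environment_fingerprint()
    # In practice the target fingerprint would be collected on the deployment host.
    check_deployable(tested, environment_fingerprint())
    print("environments match")
```

The test run would store its fingerprint next to the repackaged environment, and the deployment step would compare it against the target host before unpacking anything.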

I think I've got most of the annoying administrative stuff out of the way now, so maybe we can move on to more interesting things that are actually specific to continuous deployment :-) That starts with the next article, which, as I've already hinted above, will be about "cautious deployment", a robust technique for scheduling deployment.