The road to full deployment automation

[post-meta]

As much as we like writing code and building cool solutions, our lines of code are worth nothing until they run in production. On the other hand, time spent putting things in production is time not spent on new features. An interesting dilemma, and one we can solve with what we love most: automation!

Automating your deployments can feel scary. After all, if you’re not there to catch production when it falls over, it will hit the ground and shatter into a thousand pieces. But in our experience, the most successful companies deploy as often as possible.

While many roads lead to Rome, there are a couple of common borders we all have to cross to get there. In this post, we want to take you partway along one of these roads, stopping at some of the common milestones you will want to visit. Along the way, we will tell you how we implemented each step in one of our projects. Even without traveling all the way to a fully automatic CI/CD flow from development to production, the milestones we hit in this first part of the journey can help any development team!

Deployable artifacts

To start with, we should ensure that our project ends up packaged in a single artifact that can be deployed. For every part we manually superglue to the side of this artifact when we bring it to any deployment environment, we add an additional manual step that can go wrong at the worst possible time. Even in the best case, it’s still added work for whoever is lucky enough to have to deploy it.

A deployable artifact might be something like a JAR or WAR file for Java code, or its equivalent for other languages. It could be a container image. But consider this: does your project consist of code alone, or are there other components that can be considered part of the solution and deployed as a whole?

If your company works with a public cloud, the deployable artifact might just as well be a Terraform module or CDK project. It could create the required DNS entries, file storage or other network or compute dependencies. By defining this as Infrastructure as Code, your application-specific infrastructure becomes as much a deployable package as the application itself.

Data migrations might also be a part of your deployable application. Tools like Flyway or Liquibase give you the ability to include the data migration in your application deployment, ensuring your database schema is always up-to-date.
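As an illustration of how that looks with Flyway: migrations are plain SQL files whose version-numbered names determine the order in which they run, and a Spring Boot application will typically apply any pending migrations at startup once Flyway is on the classpath. The file name and schema below are made up for the example.

```bash
# Flyway's default convention in a Spring Boot project: versioned SQL files under
# src/main/resources/db/migration on the classpath, applied in order at startup.
# The file name and table definition are purely illustrative.
mkdir -p src/main/resources/db/migration
cat > src/main/resources/db/migration/V1__create_customer_table.sql <<'SQL'
CREATE TABLE customer (
    id   BIGSERIAL PRIMARY KEY,
    name TEXT      NOT NULL
);
SQL
```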

Our reference project is a Java application written in Spring Boot, which is deployed as a container running on Kubernetes. This means that our main deployable artifact is a container image. Alongside the application code, we also ship a Terraform module, which gives us a consistent way to deploy that application to the Kubernetes cluster.
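To make that concrete, the build step for a project like ours might look something like the minimal sketch below. The registry, image name and tag scheme are placeholders of our own; the point is that one script produces the one artifact every environment will run.

```bash
#!/usr/bin/env bash
# Sketch of an artifact build: produce one image that every environment will run.
# The registry, image name and tag scheme are illustrative assumptions.
set -euo pipefail

VERSION="$(git rev-parse --short HEAD)"           # tie the artifact to the commit it was built from
IMAGE="registry.example.com/my-app:${VERSION}"

./mvnw -B clean package                           # build the Spring Boot JAR (or use your build tool of choice)
docker build -t "${IMAGE}" .                      # bake the JAR into a container image
docker push "${IMAGE}"                            # publish it, so every environment deploys the same bits
```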

The important part here is to have something that’s repeatable and consistent, no matter where we build or run it.

One-click deployments

Once you have these artifacts, the next step is to make the deployment itself boring.

By automating the deployment of your app and infrastructure, you make it repeatable and reliable. Write one bash script that does the deployment. One terraform apply. One run of your Jenkins pipeline or GitHub Actions workflow. Whatever you pick, make it so that you can run that one command and bring a fresh or existing environment fully up to date with your application.

Note that when we say “automatic deployments”, we do not mean you need to deploy every artifact you deliver to every environment. All we mean is to make your deployments as close to one-click as possible – and in almost every case, “as close as possible” simply boils down to “one click”. It doesn’t matter that your actual deployment involves several steps – we simply need to ensure these are always executed in the exact same way. And we do that by writing automation scripts, not by writing guides for humans to follow!
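As a rough sketch, assuming a Terraform-based setup like the one our reference project uses (described below), such a one-click script could look like this. The directory layout, var-file naming and variable names are illustrative, not a prescription:

```bash
#!/usr/bin/env bash
# deploy.sh <environment> <image-tag>: a sketch of a one-click deployment.
# Directory layout, var-file naming and variable names are illustrative assumptions.
set -euo pipefail

ENVIRONMENT="${1:?usage: deploy.sh <environment> <image-tag>}"
IMAGE_TAG="${2:?usage: deploy.sh <environment> <image-tag>}"

# The Terraform module lives next to the application code and describes
# everything the application needs in a given environment.
cd infrastructure/
terraform init -input=false

# One command brings the environment up to date: per-environment settings come
# from a var-file, and the artifact to roll out is passed in as a variable.
terraform apply -input=false -auto-approve \
  -var-file="environments/${ENVIRONMENT}.tfvars" \
  -var="image_tag=${IMAGE_TAG}"
```

The same script serves both a fresh and an existing environment; Terraform reconciles whatever is already there with what the module describes.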

This is something you should use during your development and test cycles as much as possible – continuously, ideally. By using your one-click deploy for every stage, it becomes routine, you build trust in it, and you work out all of the kinks.

Because our reference project already has a Terraform definition, making this deployment a one-click operation was reasonably simple: a terraform apply is, in theory, enough. We made use of this by automatically running that apply in our Jenkins pipeline to deploy the latest commit on our integration branch to our development environment. The same module can be deployed to other environments, with slightly tweaked variables to account for the differences between them.
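In practice, those tweaked variables amount to a small var-file per environment, fed to the same hypothetical deploy.sh from the sketch above:

```bash
# Each environment gets its own var-file; the module itself stays the same.
# For example (contents purely illustrative):
#
#   infrastructure/environments/dev.tfvars         replica_count = 1
#   infrastructure/environments/production.tfvars  replica_count = 3
#
# What our pipeline effectively runs after every merge to the integration branch:
./deploy.sh dev "$(git rev-parse --short HEAD)"
```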

The goal is to minimise manual steps as much as possible. Every human intervention in the process is a step that can be inconsistently executed or forgotten entirely once you get to production.

Constant verification

We all know how to write unit tests that verify our code in isolation. We also all know that this is only part of the story, and that it will miss the more complex problems.

Most companies implement at least parts of the testing pyramid. The unit tests are almost universally present, and depending on the maturity of the product, so are tests that take multiple pieces of code of a single application and test them together (integration tests). The step after that takes an entire application and tests it as a black box, preferably in an environment similar to production. These tests should hit our most important flows, so we can always be sure that the steps that would hurt us the most when they fail – like signups, logins or payments – remain functional.

This next step towards the top of the pyramid is often more troublesome to perform automatically. But by making deployments boring and automatable, we enable continuous end-to-end testing!

There are two flavours of automating our end-to-end tests that the previous steps have enabled, and which one works best for you depends a lot on your environment and application:

The simplest is to automatically run end-to-end tests against a given environment (let’s say “test” in the traditional DTAP street) once it has been deployed. We can add these tests as one of the final steps in our one-click deployment script, as a step in our Jenkins pipeline, GitHub Actions workflow or other CI/CD flow. This guarantees that every deployment to this environment will be end-to-end tested, helping us catch our bugs earlier.
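What that final step could look like for a Spring Boot application is sketched below. We assume the Actuator health endpoint is exposed and that the end-to-end suite sits behind a Maven profile, both of which are assumptions made for the sake of the example:

```bash
#!/usr/bin/env bash
# Sketch of a post-deployment verification step for the test environment:
# wait until the freshly deployed application reports healthy, then run the
# end-to-end suite against it. The URL, the Actuator health endpoint, the Maven
# profile and the property name are illustrative assumptions.
set -euo pipefail

BASE_URL="https://my-app.test.example.com"

healthy=false
for _ in $(seq 1 30); do
  if curl -fsS "${BASE_URL}/actuator/health" > /dev/null; then
    healthy=true
    break
  fi
  sleep 10
done

if [ "${healthy}" != true ]; then
  echo "application never became healthy, aborting" >&2
  exit 1
fi

# Run the end-to-end tests against the environment we just deployed.
./mvnw -B verify -Pe2e -Dapp.base-url="${BASE_URL}"
```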

One step further up the CI/CD ladder would be to deploy every build, at least before integrating it with our main branch, and test it. Depending on the duration of your deployments, the complexity of your application’s environment and the cost of running your application for a few minutes, this might be almost as easy as the previous step!
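One way to sketch this is with Terraform workspaces, so that each short-lived environment keeps its own state. The script and variable names are again illustrative, and whether this is practical depends on the factors just mentioned:

```bash
#!/usr/bin/env bash
# One possible way to give a branch or pull request its own short-lived environment,
# using Terraform workspaces to keep state separate per environment. The variable
# name environment_name and the var-file are illustrative assumptions.
set -euo pipefail

ENV_NAME="${1:?usage: deploy-preview.sh <name>}"   # e.g. a branch name or PR number

cd infrastructure/
terraform init -input=false
terraform workspace select "${ENV_NAME}" || terraform workspace new "${ENV_NAME}"
terraform apply -input=false -auto-approve \
  -var-file="environments/dev.tfvars" \
  -var="environment_name=${ENV_NAME}"

# ... run the end-to-end tests here, then tear the environment down again:
terraform destroy -input=false -auto-approve \
  -var-file="environments/dev.tfvars" \
  -var="environment_name=${ENV_NAME}"
```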

For our reference Spring Boot project, the constant verification consists of two parts. On every single commit, we run a suite of unit tests, static code analysis and a verification of the Terraform module. After merging to our integration branch (main), we deploy the project using the Terraform module and run an end-to-end test against the deployed application. This ensures that both our core flows and our Terraform-based deployment are verified on every change. We considered deploying every feature branch as well, but this proved troublesome due to some downstream third-party dependencies in this particular application.
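A simplified sketch of the commit-stage half of that pipeline is shown below; these approximate the kind of commands such a pipeline runs, with the directory name and the static-analysis invocation left as placeholders:

```bash
#!/usr/bin/env bash
# Sketch of the checks that run on every commit, before anything is deployed.
# The directory name and the commented-out analysis step are placeholders; wire
# these into whichever CI server you use (a Jenkins pipeline, in our case).
set -euo pipefail

# Unit and integration tests for the Spring Boot application.
./mvnw -B verify

# Static code analysis; the exact tool and invocation are project-specific, e.g.:
# ./mvnw -B sonar:sonar

# Verify the Terraform module that ships with the application.
terraform -chdir=infrastructure init -backend=false -input=false
terraform -chdir=infrastructure fmt -check
terraform -chdir=infrastructure validate
```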

If you make sure that failures of these pipelines are hard to ignore or miss, you should catch issues with your code integration early, preventing days of frantic coding to salvage a production release just before it happens.

Looking back (and ahead)

By taking these steps, you should already have a very solid foundation for a reliable deployment process and delivery pipeline. You now know that every release you deliver to production has been tested for its most important features. One of these features is that the deployment itself will succeed – something many of us know is not at all a given!

Depending on where you are along this road, what we’ve described here might sound trivial or daunting. However, many companies, large and small, have taken leaps by excelling at these steps.

We’re far from finished at this point! From here on, there is a lot more ground we can cover. We can make sure that every environment is deployed to automatically, either on approval by a person, or even fully automatically. We might want to perform an automatic rollback if an unexpected number of errors is logged after a deployment. That also requires us to think about rolling back the data migrations we have performed – or possibly separating data migrations from application deployments entirely. When we discover more fragile or essential flows, we can expand our integration and unit test suites to ensure we catch these issues even earlier.

That said – while the steps may seem trivial to some, what you achieve with them should not be underestimated. By creating consistent artifacts with an easy deployment procedure, you minimise downtime or time spent deploying outside of working hours. By automatically verifying your application as a whole, you catch big issues before QA is forced to abort a release verification early.

The final destination, a fully automatic CI/CD flow, is not one all companies aim for. We’ve only crossed some of the borders on our way to Rome, and the ones that remain are not all trivial. But at the point along the road we’ve reached now, the view is already a lot more beautiful, and the weather a lot more appealing, than at the place where we started our journey.

[post-meta]

Erik Steenman