Devops best practices: Immutability

Nov 13, 2016

FP Complete builds on cutting-edge open-source devops technologies, providing devops solutions and consulting to a number of companies in life sciences & health, financial services, and secure Internet services. This exposure has given us a chance to work with some of the best engineering practices in devops.

As we bring more companies forward into the world of devops, we will continue to share lessons learned and best practices with the IT community. Today’s best practice: immutability.

What is immutable?

As a software engineering concept, immutability means that once you assign a value or configuration to some entity, you never modify it in place -- if you want it to change, you make a new one and (optionally) tear down the old one.

As we know from functional programming (FP) and our work with Haskell, immutability boosts the reliability and predictability of system behavior -- preventing bugs and downtime, and increasing the speed and reproducibility of software development and deployment. A variable, program, or server is immutable if we guarantee, once it’s set up, that we won’t modify it in place. We can do this if replacements and deployments are cheap.
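
At the level of a single value, the idea can be sketched in a few lines of Python. This is only an illustration: `ServerConfig` and its fields are hypothetical names we made up for the example, not part of any real tool.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical server description; the field names are illustrative."""
    image: str
    replicas: int


old = ServerConfig(image="app:1.0", replicas=2)

# Modifying the value in place is rejected by the frozen dataclass...
try:
    old.image = "app:1.1"
    mutated = True
except FrozenInstanceError:
    mutated = False

# ...so a change means building a new value from the old one.
new = replace(old, image="app:1.1")
```

The old value is still intact after the change, so it can be inspected, compared, or discarded whenever we like.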

Devops has made it so much cheaper to build and deploy new software services, and even whole servers and clusters, that some companies are now taking advantage of immutability -- leading to much more reliable online services with less downtime and more frequent, reliable, repeatable updates.

Old-fashioned servers are mutable

Mutability describes how most traditional software, and most traditional server operations, are done. It means that once you create something, to avoid the terrible cost of creating another one, you just keep modifying the one you already put in place. Software patches, configuration file changes, even changing the value of a variable that’s already in use -- all these are examples of mutability. It’s always been a bit risky, but can be economical when (1) the cost of creating an entity is very high, and (2) the cost of bugs and mistakes is very low.

Unfortunately, mutability is a key cause of bugs and mistakes. Consider this with a person instead of a computer. If I break my arm (an accidental mutation of my state), we could try to fix the problem using an immutable method: make a new Aaron, identical to the old one, but with a non-broken arm -- then tear down the old Aaron who is no longer needed. Obviously we just don’t have the technology to do this -- it’s beyond unaffordable -- so we are forced to use mutability. We patch up my arm, and wait for it to heal.

That’s way better than giving up, but now our managed service (Aaron) is in an unprecedented, irreproducible state. For weeks my arm is offline, in repair mode, while the rest of me runs. And for the rest of my life I may have to keep track of the fact that this arm is a little weaker, and there are some things I cannot do. My boss now has to remember: Aaron has this special flag that says he can’t lift some kinds of heavy boxes. What a pain. At least humans are flexible, so my colleagues won’t just break with an "Error 504" when they try to shake my hand.

If only I could reconstruct the arm in its original state -- or even a whole new Aaron -- life would be so much easier. We may never have that for humans, certainly not in my lifetime. But thanks to modern devops technologies, we do have that option for servers. They don’t need to be modified in place, and they don’t need to run in unprecedented, irreproducible configurations that lead to many of today’s sysadmin emergencies, security breaches, and downtime.

How do we make servers immutable?

Our FP Deploy approach to devops is based on heavy use of containers (notably Docker), virtual machines, and where feasible, virtual networking, cloud computing (AWS, Azure, etc.), and virtual private clouds (VPCs). Every one of these technologies has something in common: it allows us to abstract away the work of creating a running online service. Configurations can be written declaratively, put under source control (say, in a Git repository), and run at any time (using, say, Kubernetes).

You want another server? Just run the deploy command again. You want another whole cluster (a “device” made of multiple servers and associated networking and data connections)? Just run that deploy command again.

By slashing the cost of deployment, we make it possible to create a whole new server painlessly, cheaply, and reproducibly. Developers just delivered a bug fix? Don’t patch the application server! Bring up a new instance based on the new software build. Once you’re happy that it’s running properly, bring down the old, less-good instance.
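
The replace-instead-of-patch sequence can be sketched as follows. This is a toy model under stated assumptions, not how FP Deploy or any particular platform works: `Fleet` is an in-memory stand-in for a cloud or container platform, and `Instance`, `roll_forward`, and the `healthy` callback are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict


@dataclass(frozen=True)
class Instance:
    """A running server, identified by an id and the build it was created from."""
    instance_id: int
    build: str


class Fleet:
    """In-memory stand-in for a real platform (assumption for this sketch)."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.running: Dict[int, Instance] = {}

    def launch(self, build: str) -> Instance:
        inst = Instance(next(self._ids), build)
        self.running[inst.instance_id] = inst
        return inst

    def terminate(self, inst: Instance) -> None:
        del self.running[inst.instance_id]


def roll_forward(fleet: Fleet, old: Instance, new_build: str,
                 healthy: Callable[[Instance], bool]) -> Instance:
    """Bring up a new instance; retire the old one only if the new one is healthy."""
    candidate = fleet.launch(new_build)
    if healthy(candidate):
        fleet.terminate(old)
        return candidate
    # The new build failed its check: discard it and keep the old instance untouched.
    fleet.terminate(candidate)
    return old


fleet = Fleet()
v1 = fleet.launch("build-41")
live = roll_forward(fleet, v1, "build-42", healthy=lambda inst: True)
still_live = roll_forward(fleet, live, "build-43", healthy=lambda inst: False)
```

Note that neither path ever modifies a running instance. A bad build is simply thrown away, and the old instance keeps serving exactly as it was.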

(In a future post we’ll talk about blue-green deployments and canary deployments -- cost-effective, easy techniques for making this transition cautiously and gracefully.)

An immutable server has a known, well-understood, source-controlled, reproducible configuration. No footnotes. We can be confident that the new production servers are the same as the engineering test servers, because they were created by running exactly the same deployment files -- not by a series of manual admin tweaks that could be incorrect or have latent side effects.
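
One way to make "the same deployment files" checkable is to fingerprint the declarative configuration and compare fingerprints across environments. The sketch below assumes the configuration can be represented as JSON-serializable data; the `fingerprint` helper and the example fields are our own illustration.

```python
import hashlib
import json


def fingerprint(spec: dict) -> str:
    """Hash a canonical JSON rendering, so equal specs always hash the same."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical deployment description, as it might be stored in source control.
deployment = {"image": "app:1.1", "cpu": 2, "env": {"LOG_LEVEL": "info"}}

test_server = fingerprint(deployment)
prod_server = fingerprint(dict(deployment))  # built from the same files

# A manual tweak on one server produces a configuration nobody checked in.
hand_tweaked = fingerprint({**deployment, "env": {"LOG_LEVEL": "debug"}})
```

Matching fingerprints only show that two servers were described identically. They say nothing about changes made by hand after deployment, which is exactly why immutable servers forbid those changes.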

This also makes it easy to recover from disaster, or scale up for increased load, by redeploying a new server from the same deployment files.

We can afford to do this only because modern devops makes it so cheap to create new servers. When doing it right is automated, predictable, repeatable, and inexpensive, there’s no longer any reason to do it wrong.

Can whole clusters, or distributed devices, be immutable?

It’s easy to assert that an application server can be made immutable, because well-architected web app servers are fairly self-contained and fairly stateless. But what if you are making larger changes to a whole distributed device, consisting of many servers and network connections? What if, for example, you have made matching changes in both your front-end server and your back-end server? Or in any group of services in a service-oriented architecture (SOA)?

Then you step up to immutable devices: immutable clusters, VPCs, or distributed systems, using exactly the same methodology. At FP Complete we routinely create whole devices of 10+ virtual machines on command, even just for testing purposes, because we’ve automated it with FP Deploy. Again, why not do it right? Why not be confident that the whole distributed system is in a known, reproducible state? We can be much more assured that what worked in test is going to work in production.

However, sometimes we don’t have that luxury, for example if we are retrofitting modern devops onto parts of an older system that has not been uniformly upgraded -- or in any case where the service we are upgrading is far cheaper to redeploy than another, perhaps more stateful, service in the same distributed system. That’s OK: we can have immutable servers as parts of a mutable device. The administrator of the distributed system now has to track which services have been replaced with newer versions, but at least for any given service the servers can be immutable.
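
The bookkeeping for such a mixed device can stay simple. Here is a minimal sketch, assuming the device is recorded as a read-only mapping from service name to deployed version; the service names, versions, and helper functions are all hypothetical.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Tuple


def upgrade(device: Mapping[str, str], service: str, version: str) -> Mapping[str, str]:
    """Return a new record of the device with one service replaced."""
    if service not in device:
        raise KeyError(f"unknown service: {service}")
    return MappingProxyType({**device, service: version})


def changed_services(before: Mapping[str, str],
                     after: Mapping[str, str]) -> Dict[str, Tuple[str, str]]:
    """List the services whose version differs between two records."""
    return {
        name: (before[name], after[name])
        for name in after
        if before.get(name) != after[name]
    }


before = MappingProxyType(
    {"frontend": "1.4", "backend": "2.0", "legacy-billing": "0.9"}
)
after = upgrade(before, "frontend", "1.5")
replaced = changed_services(before, after)
```

Here only `frontend` was redeployed, while the harder-to-replace `legacy-billing` service is left alone and the record says so explicitly.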

What about the database server?

Once we have made it cheap to rebuild and replace servers at any time, immutability can be a reality, and reliability and reproducibility go way up. However, not all servers are so easily replaced. In fact, servers providing key input and output channels may be outside the scope of our control -- so all we can do is treat them as external to the immutable device, and understand that hooking up the inputs and outputs needs to be part of the declarative script used to bring up any new version of our device.
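
A rough sketch of what "hooking up the inputs and outputs" might look like inside a declarative description follows. The names `DeviceSpec`, `externals`, and `render_env` are assumptions for illustration, as is the example database address; the point is only that external connections are declared alongside the build rather than wired up by hand afterwards.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass(frozen=True)
class DeviceSpec:
    """Hypothetical device description: what we build, plus what we connect to."""
    build: str
    externals: Mapping[str, str]  # name -> address of a service we do not own


def render_env(spec: DeviceSpec, required: Iterable[str]) -> Dict[str, str]:
    """Turn declared external connections into environment variables.

    Fails before deployment if a required connection was not declared.
    """
    required = list(required)
    missing = sorted(set(required) - set(spec.externals))
    if missing:
        raise ValueError(f"undeclared external services: {missing}")
    return {f"{name.upper()}_URL": spec.externals[name] for name in required}


spec = DeviceSpec(
    build="build-42",
    externals={"orders_db": "postgres://db.internal.example:5432/orders"},
)
env = render_env(spec, ["orders_db"])
```

Because the connection is part of the spec, every new version of the device comes up attached to the same external services, and a forgotten connection is caught before anything is launched.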

The strongest example of this may be an enterprise database server. We typically have no intention of building a whole new database as part of our application build-and-deploy process. Databases are typically long-running and, fundamental to their purpose, they are extremely stateful, extremely mutable. Cloud services such as RDS make it easy to spin up new database instances, but often the contents of an enterprise database are too large or fast-changing to want to rehost -- or we just don’t have permission to do so. Instead, we leave it in place and accept that its contents are very mutable.

So even when we use immutability to make our application server clusters easier to upgrade and less prone to errors, we need to understand that they will almost always be connected up to other servers that lack this golden property. Automated deployment with modern devops, ideally with truly immutable servers, can ensure that your system looks just like the system that worked in test -- eliminating a lot of surprise downtime and deployment failures. But even at companies with modern devops, failures still happen -- and when they do, it’s often because the test system was not exposed to the same kinds of inputs and outputs, and the same database state, as the production system. In a future blog post, we’ll look at some best practices for testing and quality control.

Have a look at your own software deployment practices. Are there too many (more than zero) manual steps involved in bringing up a new or updated server? Do you allow, or even require, sysadmins to make changes to servers that are already up and running? Maybe it’s time to use automated, reproducible deployment, and make the move to immutable servers.

For more assistance

In addition to our customizable FP Deploy devops solution, FP Complete offers consulting services, from advice to hands-on engineering, to help companies migrate to modern devops.