Automate it All: Taking TASS to the next level

by Ali Tayarani


Auto scaling groups are fantastic for handling instance scaling, but they can be a bear to change.  The most common example for us is replacing the AMI: even after the new AMI is built, rolling it out is a multi-step process.

A little over a month ago, we introduced TASS to the world; however, even with it, updating auto scaling groups requires some intervention.

Updating an auto scaling group by hand

  1. Using TASS, update the launch configuration with the new AMI id

  2. Update the auto scaling group with the newly recreated launch configuration

  3. Scale up the cluster to test that the new configuration didn’t negatively affect the bootstrap process

  4. Repeat for each cluster that needs the new id

The problems

From a high-level view, the major problems are that we need to do a lot of testing to make sure everything works, and that we have to manually trigger the processes that let us do that testing.

Both of these can be solved by changing our mindset.  Instead of thinking of them as Ops tasks, we started thinking about them as parts of a continuous delivery pipeline: specifically, test suites and ease of deployment.

Testing with ServerSpec

Since we already used Test Kitchen for Chef, testing our servers with ServerSpec was a natural next step.

Instead of verifying that our AMIs work by waiting for a new one to finish building and then testing it on an auto scaling group, we added the specs to the Packer run itself.  We split the specs out on a per-app basis, which also sets us up to treat each app as a container if we decide to go down that path in the future.

Where appropriate, we’ve added lines like this to the AMI build:
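As a rough, hypothetical sketch (the task and app names here are illustrative rather than our real ones), the per-app suites can be exposed as RSpec rake tasks that a Packer shell provisioner then invokes:

    # Rakefile baked into the AMI build -- a hypothetical sketch, not the exact
    # tasks we ship. One ServerSpec suite per app keeps the specs isolated.
    require 'rspec/core/rake_task'

    APPS = %w(web worker)    # app names are illustrative

    namespace :spec do
      APPS.each do |app|
        desc "Run the ServerSpec suite for #{app}"
        RSpec::Core::RakeTask.new(app) do |t|
          t.pattern = "spec/#{app}/*_spec.rb"
        end
      end
    end

    desc 'Run every app suite'
    task 'spec:all' => APPS.map { |app| "spec:#{app}" }

In this sketch, the Packer shell provisioner would run "bundle exec rake spec:all", and a non-zero exit fails the AMI build.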

The specs themselves call various methods in the spec helper.  An example spec helper method that checks the installation of an apt package would look like this:
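Here is a minimal sketch of such a helper; the method name is illustrative, and it simply wraps ServerSpec's package resource:

    # spec/spec_helper.rb -- `check_apt_package` is an illustrative name, not
    # necessarily the exact helper method in our spec helper.
    require 'serverspec'

    set :backend, :exec

    def check_apt_package(name, version = nil)
      describe package(name) do
        it { should be_installed }
        it { should be_installed.with_version(version) } if version
      end
    end

An app spec can then declare its dependencies with one-liners such as check_apt_package('nginx').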


Similarly, the userdata bootstrap scripts that are added to the auto scaling groups’ launch configurations needed some testing.  An example of what this looks like, if you expected a ‘/mnt’ directory to be created during the bootstrap process, would be:
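A minimal sketch of such a bootstrap spec, assuming the same ServerSpec setup as above:

    # spec/bootstrap/mnt_spec.rb -- the path is illustrative. The userdata
    # bootstrap script is expected to have created /mnt before this spec runs.
    require 'spec_helper'

    describe file('/mnt') do
      it { should be_directory }
      # it { should be_mounted }  # if /mnt is also expected to be a mount point
    end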

In earlier versions of this workflow, all we had was the bootstrap log output, which can be hard to parse visually.  Now, if a new instance fails any spec, we get an email with the spec failure summary.

Simplifying deployment

While Packer makes deployment of the AMI fairly easy, we still needed to figure out how to make the deployment of the TASS configuration files easier.

We decided that the easiest approach here would be to write a build script to do the things we need during deployment, and use the infrastructure we have for deploying slugs via Slugforge.

The build script first updates any existing auto scaling groups with new launch configurations and creates any new auto scaling groups as needed.  By updating every auto scaling group on each deploy, we ensure that none grow outdated; however, doing this would make the number of launch configurations skyrocket.

It is for that reason that after we create a new launch configuration and apply it to the auto scaling group, we delete the old one.  If we need to roll back, we can either redeploy an earlier slug or roll forward with a revert on a single launch configuration.

Finally, we scale each group up by one to give the bootstrap specs a chance to run, so that we can feel comfortable that the changes to the launch configuration have no adverse effect on their operational status.
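Under the hood, these steps are plain Auto Scaling API calls.  Below is a rough sketch of the per-group rotation using the aws-sdk gem; the client methods are standard SDK calls, but the structure, names, and values are illustrative rather than our actual build script:

    # deploy_step.rb -- a hypothetical sketch of rotating one group's launch
    # configuration, not our actual build script. It covers the update path
    # only; creating brand-new groups would be a similar call.
    require 'aws-sdk-autoscaling'   # aws-sdk v3; with v2, require 'aws-sdk'
    require 'base64'

    def rotate_launch_configuration(client, group_name:, old_lc:, new_lc:, ami_id:)
      # 1. Create the replacement launch configuration with the new AMI id
      #    and the userdata bootstrap script.
      client.create_launch_configuration(
        launch_configuration_name: new_lc,
        image_id: ami_id,
        instance_type: 'm3.large',                                   # illustrative
        user_data: Base64.strict_encode64(File.read('bootstrap.sh')) # illustrative path
      )

      # 2. Point the auto scaling group at the new launch configuration.
      client.update_auto_scaling_group(
        auto_scaling_group_name: group_name,
        launch_configuration_name: new_lc
      )

      # 3. Delete the old launch configuration so they don't pile up; rollback
      #    is an earlier slug or a revert of the configuration file.
      client.delete_launch_configuration(launch_configuration_name: old_lc)

      # 4. Scale up by one so a fresh instance boots and the bootstrap specs
      #    run against the new configuration (assumes max_size has headroom).
      group = client.describe_auto_scaling_groups(
        auto_scaling_group_names: [group_name]
      ).auto_scaling_groups.first
      client.set_desired_capacity(
        auto_scaling_group_name: group_name,
        desired_capacity: group.desired_capacity + 1
      )
    end

    # Example invocation; all names and ids here are illustrative.
    client = Aws::AutoScaling::Client.new(region: 'us-east-1')
    rotate_launch_configuration(client,
                                group_name: 'web-asg',
                                old_lc: 'web-lc-old',
                                new_lc: 'web-lc-new',
                                ami_id: 'ami-12345678')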

Using these methods, we were able to transition from a multi-step manual process to a system that does the heavy lifting for us.  Our only interaction with the system is updating the configuration files and deploying them as we would any other application in our infrastructure.  With this, we have turned a project that would take hours into a task that can be accomplished in under 30 minutes.


Think this is interesting? We're hiring! See our current openings here.