ECS ChatOps with CodePipeline and Slack

January 1, 2018

I’m currently working on migrating a Rails application to ECS at work. The current system uses a heavily customized Capistrano setup that’s showing its signs, especially when deploying to more than 10 instances at once.

While patiently waiting for EKS, I decided to use ECS over manage my own Kubernetes cluster on AWS using something like kops. I was initially planning on using Lambda to create the required task definitions and update ECS services, but native CodePipeline deploy support for ECS was announced right before I started planning the project, which greatly simplified the deploy step.

The current setup we have now is: a few Lambda functions to link CodePipeline and Slack together, two CodePipeline pipelines per service (one for production and one for staging), and the associated ECS resources.

First, a deploy is triggered by saying “deploy [environment] [service]” in the deploy channel. Slack sends an event to Lambda (via API Gateway), and Lambda starts an execution of the CodePipeline pipeline if it is not already in progress (because of the way CodePipeline API operations work, it’s hard to work with multiple concurrent runs). This Lambda function also records some basic state in DynamoDB – namely, the Slack channel, user, and timestamp. This information is used to determine what channel to send replies to, and what user to mention if something in the deploy process goes awry.

CodePipeline then starts CodeBuild, which is configured to create Docker image(s) and a simple JSON file that is used to tell CodePipeline’s ECS integration the image tags the task definition should be updated with.

When CodeBuild is finished, a “manual approve” action is used to request human approval before continuing with the deploy. In the example here, I have it turned on for staging environments, but it’s usually only used in production. In production, we normally have 3 stages in the release cycle – the first canary deployment, followed by 25%, then the remaining 75%.

[video width=“1096” height=“828” mp4="/wp-content/uploads/2017/12/approve.mp4" webm="/wp-content/uploads/2017/12/approve.webm" poster="/wp-content/uploads/2017/12/approve-cover.png" loop=“on”][/video]

The rest is relatively straightforward – just CodePipeline telling ECS to deploy images. If errors are detected along the way, a “rollback” command is used to manually roll back changes.

When the deploy is finished, a Lambda function is used to send a message to the deploy channel.