AWS Application Auto Scaling for ECS with Terraform

Update: Target tracking scaling is now available for ECS services.
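
For reference, a target tracking policy is a single aws_appautoscaling_policy resource in Terraform. A minimal sketch (the resource name, service, and 50% CPU target below are placeholders):

resource "aws_appautoscaling_policy" "green_cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = "ecs"
  resource_id        = "service/default-production/green"
  scalable_dimension = "ecs:service:DesiredCount"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    # Scale out/in to keep the service's average CPU around this value.
    target_value = 50
  }
}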

I’ve been setting up autoscaling for ECS services recently; here are a couple of notes from managing Application Auto Scaling for ECS with Terraform.

Creating multiple scheduled actions at once


Suppose you declare two scheduled actions for the same ECS service and apply them together. The plan looks fine:

Terraform will perform the following actions:

  + aws_appautoscaling_scheduled_action.green_evening
      id:                                    
      arn:                                   
      name:                                  "ecs"
      resource_id:                           "service/default-production/green"
      scalable_dimension:                    "ecs:service:DesiredCount"
      scalable_target_action.#:              "1"
      scalable_target_action.0.max_capacity: "20"
      scalable_target_action.0.min_capacity: "2"
      schedule:                              "cron(0 15 * * ? *)"
      service_namespace:                     "ecs"

  + aws_appautoscaling_scheduled_action.green_morning
      id:                                    
      arn:                                   
      name:                                  "ecs"
      resource_id:                           "service/default-production/green"
      scalable_dimension:                    "ecs:service:DesiredCount"
      scalable_target_action.#:              "1"
      scalable_target_action.0.max_capacity: "20"
      scalable_target_action.0.min_capacity: "3"
      schedule:                              "cron(0 23 * * ? *)"
      service_namespace:                     "ecs"

This fails with:


* aws_appautoscaling_scheduled_action.green_evening: ConcurrentUpdateException: You already have a pending update to an Auto Scaling resource.

To fix this, the scheduled actions need to be created serially. An explicit depends_on makes Terraform wait for the first action before creating the second:


resource "aws_appautoscaling_scheduled_action" "green_morning" {
  name               = "ecs"
  service_namespace  = "${module.green-autoscaling.service_namespace}"
  resource_id        = "${module.green-autoscaling.resource_id}"
  scalable_dimension = "${module.green-autoscaling.scalable_dimension}"
  schedule           = "cron(0 23 * * ? *)"

  scalable_target_action {
    min_capacity = 3
    max_capacity = 20
  }
}

resource "aws_appautoscaling_scheduled_action" "green_evening" {
  name               = "ecs"
  service_namespace  = "${module.green-autoscaling.service_namespace}"
  resource_id        = "${module.green-autoscaling.resource_id}"
  scalable_dimension = "${module.green-autoscaling.scalable_dimension}"
  schedule           = "cron(0 15 * * ? *)"

  scalable_target_action {
    min_capacity = 2
    max_capacity = 20
  }

  # Application AutoScaling actions need to be executed serially
  depends_on = ["aws_appautoscaling_scheduled_action.green_morning"]
}

ECS ChatOps with CodePipeline and Slack

I’m currently working on migrating a Rails application to ECS at work. The current system uses a heavily customized Capistrano setup that’s showing its age, especially when deploying to more than 10 instances at once.

While patiently waiting for EKS, I decided to use ECS rather than managing my own Kubernetes cluster on AWS with something like kops. I was initially planning on using Lambda to create the required task definitions and update ECS services, but native CodePipeline deploy support for ECS was announced right before I started planning the project, which greatly simplified the deploy step.

The setup we have now is: a few Lambda functions to link CodePipeline and Slack together, two CodePipeline pipelines per service (one for production and one for staging), and the associated ECS resources.

First, a deploy is triggered by saying “deploy [environment] [service]” in the deploy channel. Slack sends an event to Lambda (via API Gateway), and Lambda starts an execution of the CodePipeline pipeline if it is not already in progress (because of the way CodePipeline API operations work, it’s hard to work with multiple concurrent runs). This Lambda function also records some basic state in DynamoDB — namely, the Slack channel, user, and timestamp. This information is used to determine what channel to send replies to, and what user to mention if something in the deploy process goes awry.
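
The trigger function itself boils down to a handful of API calls. A minimal sketch in Python (the event shape, the pipeline naming scheme, and the DEPLOY_TABLE variable are hypothetical):

import os
import time

import boto3

codepipeline = boto3.client("codepipeline")
dynamodb = boto3.resource("dynamodb")

def handler(event, context):
    # event is the parsed "deploy [environment] [service]" command from Slack
    pipeline = "{service}-{environment}".format(**event)

    # Bail out if the pipeline is already running; concurrent executions are
    # hard to track with the CodePipeline API.
    state = codepipeline.get_pipeline_state(name=pipeline)
    for stage in state["stageStates"]:
        if stage.get("latestExecution", {}).get("status") == "InProgress":
            return {"text": "A deploy is already in progress."}

    execution = codepipeline.start_pipeline_execution(name=pipeline)

    # Record who started the deploy and where, so later steps know which
    # channel to reply to and which user to mention on failure.
    dynamodb.Table(os.environ["DEPLOY_TABLE"]).put_item(Item={
        "execution_id": execution["pipelineExecutionId"],
        "channel": event["channel"],
        "user": event["user"],
        "timestamp": int(time.time()),
    })

    return {"text": "Deploying {service} to {environment}...".format(**event)}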

CodePipeline then starts CodeBuild, which is configured to create Docker image(s) and a simple JSON file that is used to tell CodePipeline’s ECS integration the image tags the task definition should be updated with.
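
For the standard ECS deploy action, that JSON file is imagedefinitions.json, a list mapping each container name in the task definition to the image it should run. Something like this (the container name and ECR URI here are made up):

[
  {
    "name": "web",
    "imageUri": "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/myapp:abc1234"
  }
]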

When CodeBuild is finished, a manual approval action is used to request human approval before continuing with the deploy. In the example here, I have it turned on for staging environments, but it’s usually only used in production. In production, we normally have 3 stages in the release cycle — a canary deployment first, followed by 25%, then the remaining 75%.

The rest is relatively straightforward — just CodePipeline telling ECS to deploy images. If errors are detected along the way, a “rollback” command is used to manually roll back changes.

When the deploy is finished, a Lambda function is used to send a message to the deploy channel.

If you’re interested in building systems like these, we’re looking for infrastructure / DevOps engineers (on-site in Tokyo; Japanese required). Get in touch via the contact form for details.

2017: A year in review

A few years ago when I was doing client work, we would regularly host clients’ sites and apps for them. During this time, I was responsible both for development and for keeping everything up and running as much as possible. Since most of the money was in new development, it was difficult to prioritize improving the operations of existing applications. In this period, I wanted an “operations person” to teach me how to make new applications that would need minimal operations support from the beginning. Failing that, I decided to become “the operations person” myself.

Following that decision, I found myself working at BizReach at the end of 2016, on the infrastructure team for HRMOS, a Software-as-a-Service product focused on applicant tracking for medium to large enterprises. From there, I moved to a small startup, dely, as a Site Reliability Engineer for their flagship product Kurashiru, a recipe video app for iOS and Android.

This is the first full year I’ve been working full-time as a dedicated infrastructure / operations / SRE / DevOps engineer, and I feel like I’ve grown a lot. On the technical side, I was able to lead the migration of complex legacy monolith systems to scalable and resilient independent systems. On the not-so-technical side, I’ve experienced different company cultures and managerial styles, and I’ve gotten accustomed to working with teams of engineers — the experience I had up until this year was mostly in extremely small teams.

While I do have a passion for making, maintaining, and improving services, I am also very interested in company culture — what makes it and what breaks it — especially when it comes to remote work. I believe most technical engineering work can be done just as efficiently (if not more so) remotely, but there are definite challenges that need to be addressed before I can start leading a change in any position I’m in.

Here’s to 2018! 🎉

Shipping Events from Fluentd to Elasticsearch

We use fluentd to process and route log events from our various applications. It’s simple, safe, and flexible. With at-least-once delivery by default, log events are buffered at every step before they’re sent off to the various storage backends. However, there are some caveats with using Elasticsearch as a backend.

Currently, our setup looks something like this:

The general flow of data is from the application, to the fluentd aggregators, then to the backends — mainly Elasticsearch and S3. If a log event warrants a notification, it’s published to a SNS topic, which in turn triggers a Lambda function that sends the notification to Slack.

The fluentd aggregators are managed by an auto-scaling group, but are not behind a load balancer. Instead, a Lambda function subscribed to the auto-scaling group’s lifecycle notifications updates a DNS round-robin record with the private IP addresses of the fluentd aggregator instances.
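
A sketch of what that Lambda function does (the tag filter, hosted zone, and record name are my own assumptions; error handling omitted):

import os

import boto3

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")

def handler(event, context):
    # Invoked via the auto-scaling group's lifecycle notifications.
    # Collect the private IPs of all running aggregator instances.
    reservations = ec2.describe_instances(Filters=[
        {"Name": "tag:Role", "Values": ["fluentd-aggregator"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])["Reservations"]

    ips = [i["PrivateIpAddress"]
           for r in reservations
           for i in r["Instances"]]

    # Replace the round-robin A record with the current set of IPs.
    route53.change_resource_record_sets(
        HostedZoneId=os.environ["HOSTED_ZONE_ID"],
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "fluentd.internal.example.com.",
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": ip} for ip in ips],
            },
        }]},
    )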

We use the fluent-plugin-elasticsearch plugin to output log events to Elasticsearch. However, because this plugin uses the bulk insert API and does not validate whether events have actually been successfully inserted into the cluster, it is dangerous to rely on it exclusively (thus the S3 backup).
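
The fan-out to Elasticsearch and S3 is done with fluentd’s copy output. A sketch of the relevant part of the aggregator config (the match tag, endpoint, and bucket name are placeholders):

<match app.**>
  @type copy

  # Primary store: Elasticsearch. Bulk insert results are not verified
  # per event, hence the backup below.
  <store>
    @type elasticsearch
    host elasticsearch.internal.example.com
    port 9200
    logstash_format true
  </store>

  # Backup store: raw events archived to S3.
  <store>
    @type s3
    s3_bucket my-log-archive
    s3_region ap-northeast-1
    path logs/
  </store>
</match>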

macOS Sierra

Here are the headline features of Sierra, and my thoughts about them.

Siri

Don’t use it on the phone, won’t use it on the Mac. It would be nice if I could use it by typing, though.

Universal Clipboard

This seems like a massive security risk. It works via iCloud, so if someone (like my daughter) is using my iPad, then it will sync the clipboard to that, too. How do I turn it off?

Auto Unlock


I can’t use it, because “two-factor authentication” is apparently completely different from “two-step verification”, and the two are completely incompatible. I can switch to 2FA, but that means I have to log out of and back into everything, right? Will that help my iMessages that are already not syncing between my Mac and my iPhone?

(Side rant: I can’t use any of the iMessage features because I’m completely on Telegram now, because of the aforementioned troubles with iMessages being unreliable. Sometimes they come to the Mac. Sometimes the iPhone. Sometimes both. Who knows! I need some kind of cloud forecast app to know. Or learn how to read the stars. Or something. Is this the kind of stuff they put in horoscopes?)

Apple Pay

Not available in Japan yet (but will be coming soon, to the “major” credit card brands: JCB and MasterCard. VISA isn’t “major”, I suppose).

Photos

For some reason, the photos themselves sync between machines, but the facial recognition metadata doesn’t? Progress means reducing the number of things we have to babysit, not increasing it. I don’t want to babysit my photos by correcting facial recognition guesses on both machines, across 5,000+ photos.

Messages

Hmm.

iCloud Drive

Is it better than Dropbox yet?

Optimized Storage

All of a sudden, the voice of one of my podcast panelists simply vanished from the mix. I quit and re-launched Logic, only to be told that the file in question was missing. Sure enough, a visit to Finder revealed that Sierra had “optimized” my storage and removed that file from my local drive.

macOS Sierra Review by Jason Snell at sixcolors.com

No thanks.

Picture in Picture

Okay, this might be useful sometimes.

Tabs

“Finally”.

Conclusion

2/10

Look, I don’t mean to degrade the excellent work of the team of engineers responsible for macOS at Apple. They’re doing a good job. The upgrade process to Sierra was smooth. There are just a lot of features in this release that don’t excite me. I wish they had fixed other things, like that silly facial recognition sync thing. Maybe they will in the future. I really hope so.

Also: Don’t say “then just use Linux”, because I already do. It’s not as elegant as macOS, but it’s reliable for the work I do (web programming) and when something does go wrong, I know how to fix it.

If you want to get into an argument with me about how I’m wrong, please make an App.net account and contact me there; I’m @keita.

Convox

I stumbled upon Convox a couple weeks ago, and found it pretty interesting. It’s led by a few people formerly from Heroku, and it certainly feels like it. A simple command-line interface to manage your applications on AWS, with almost no AWS-specific configuration required.

An example of how simple it is to deploy a new application:

$ cd ~/my-new-application
$ convox apps create
$ convox apps info
Name       my-new-application
Status     creating
Release    (none)
Processes  (none)
Endpoints  
$ convox deploy
Deploying my-new-application
Creating tarball... OK
Uploading... 911 B / 911 B  100.00 %
RUNNING: tar xz
...
... wait 5-10 minutes for the ELB to be registered ...
$ convox apps info
Name       my-new-application
Status     running
Release    RIIDWNBBXKL
Processes  web
Endpoints  my-new-application-web-L7URZLD-XXXXXXX.ap-northeast-1.elb.amazonaws.com:80 (web)

Now, you can access your application at the ELB endpoint listed in the “Endpoints” section.

I haven’t used Convox with more complex applications, but it definitely looks interesting. It uses a little more infrastructure than I would like for small personal projects (a dedicated ELB for the service manager, for example). However, when you’re managing multiple large deploys of complex applications, the time saved by Convox doing the infrastructure work for you seems like it would pay for itself.

The philosophy behind Convox:

The Convox team and the Rack project have a strong philosophy about how to manage cloud services. Some choices we frequently consider:

  • Open over Closed
  • Integration over Invention
  • Services over Software
  • Robots over Humans
  • Shared Expertise vs Bespoke
  • Porcelain over Plumbing

I want to focus on “Robots over Humans” here — one of AWS’s greatest strengths is that almost every single thing can be automated via an API. However, I feel like its greatest weakness is that the APIs are not very user-friendly — they’re disjointed and inconsistent between services. The AWS-provided GUI “service dashboard” is packed with features, and you can see some similarity in UI elements across services, but it basically stops there. Look at the Route 53, ElastiCache, and EC2 dashboards — they’re completely different.

Convox, in my limited experience, abstracts all of this unfriendliness away and presents you with a simple command line interface to allow you to focus on your core competency — being an application developer.

I, personally, am an application / infrastructure developer (some may call me DevOps, but I’m not particularly attached to that title), and Convox excites me because it has the potential to take away half of the work necessary to get a secure, private application cluster running on AWS.

A month with Linux on the Desktop

It’s been a bit over a month since I built a PC and installed Linux as my main desktop OS, replacing OS X on a (cylinder) Mac Pro. I installed Ubuntu MATE 16.04.

Here are my general thoughts:

  • Linux has come a long way in the 6 years since I last used it full-time on the desktop.
  • There are Linux versions of popular software that is vital to my workflows — Firefox, Chrome, Dropbox, Slack, Sublime Text, etc.
    • When there isn’t a direct equivalent, there is usually a clone that gets the job done (Zeal and Meld, for example).
  • It still is definitely not for the casual user.
  • Btrfs.
  • Lacks the behavioral consistency of OS X.
  • Some keyboard shortcuts take some getting used to (but most of the time, they’re completely configurable).
  • Steam is available for Linux! (10 of the 11 titles in my library run on Linux. Does that say something about the games I play, or are Linux ports popular these days?)
  • If something is broken, it can be fixed*.

(*) maybe, probably. Sometimes. It depends.

Some thoughts specific to the development work I do:

  • Docker is as easy to use as it is on a Linux server. Because the kernel is exactly the same. 🙂
  • I can quickly reproduce server environments locally with minimal effort.
  • Configuration files are in the same place as any Ubuntu 16.04 server.

Some things really surprised me. For example, I plugged my iPhone into USB to charge it, and it automatically launched the photo importer and started the tethering connection. I did not expect that on a clean install.

It hasn’t been all peaches and roses, though — there are some specific complaints I have about the file browser (Caja, a Nautilus fork) and the MATE Terminal — so much so that I have replaced the MATE Terminal with GNOME 3’s terminal emulator. I haven’t gotten around to trying other file browsers because most of the time I’m browsing files, I’m in the terminal.

Other nice-to-have things that don’t relate to the OS itself, but rather to building your own PC (I’m aware of Hackintosh-ing, but my issues were mainly with software, not hardware):

  • The particular case I’m using has space for 2 large (optical drive-sized) bays and 8 3.5 inch hard drive bays. That’s a lot of storage. It currently holds 2 SATA SSDs (and one M.2 SSD, but that doesn’t take up any room in the case).
  • Access to equipment that is much newer / faster than anything you can get via the Apple Store. (I’m planning on getting an Nvidia GTX 1080 at some point; for now, I’m using an i7-6700K quad-core CPU at 4.0GHz.)

Conclusion: I’m enjoying it. I realize that I’m a special case, and I strongly discourage anyone from using Linux on the desktop unless they really know what they’re doing. In my case, I regularly manage Linux servers professionally, so I know how to fix things when they go wrong (most of the time). I still use a MacBook Pro with OS X for when I’m on the go or need something Mac-specific, but it stays asleep most of the time.