Categories
Code Snippets English

A quick shortcut to open a Ruby gem in VS Code

While working on a Ruby project, I often find myself referring to the code of various libraries when it’s easier than looking up the documentation. For this, I used to use code (bundle show GEM_NAME), but recently I’ve been getting this warning:

[DEPRECATED] use `bundle info $GEM_NAME` instead of `bundle show $GEM_NAME`

Okay, that’s fine, but bundle info returns a bunch of stuff that would confuse VS Code:

> bundle info devise
  * devise (4.7.1)
	Summary: Flexible authentication solution for Rails with Warden
	Homepage: https://github.com/plataformatec/devise
	Path: /Users/keita/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/devise-4.7.1

Luckily there’s bundle info $GEM_NAME --path. code (bundle info devise --path) is kind of long to type out every time, though, so I decided to make an alias.

I use the Fish shell, so the code here is written for that shell. Adapt it to your shell as required. You’ll also need the VS Code terminal integration installed for this to work.

function bundlecode
  if test -e ./Gemfile
    code (bundle info $argv[1] --path)
  else
    set_color -o red
    echo "Couldn't find `Gemfile`. Try again in a directory with a `Gemfile`."
    set_color normal
  end
end

Usage:

> bundlecode devise
# VS Code opens!
Categories
English

How I use Git

I’ve been using Git at work for around 10 years now. I started using Git with a GUI (Tower — back when I was eligible for the student discount!), but now I use the CLI for everything except complicated diffs and merges, where I use Kaleidoscope.

A question I get asked by my coworkers often is: “how in the world do you manage using Git without a GUI?”. This blog post is supposed to answer this question.

First, I use the Fish shell. It fits with the way I think. A lot of you probably use bash or zsh, that’s fine, there is lots of documentation on how to integrate Git with those shells. This is the relevant part of .config/fish/config.fish:

set __fish_git_prompt_show_informative_status 'yes'
set __fish_git_prompt_color_branch magenta
set __fish_git_prompt_color_cleanstate green
set __fish_git_prompt_color_stagedstate red
set __fish_git_prompt_color_invalidstate red
set __fish_git_prompt_color_untrackedfiles cyan
set __fish_git_prompt_color_dirtystate blue

function fish_prompt
  # ... 
  set_color normal
  printf ' %s' (prompt_pwd)
  printf '%s' (__fish_git_prompt)
  printf ' > '
end

I’ve omitted the irrelevant portions (status checking, # prompt when root, etc. If you want to see the full file, I’ve posted it as a gist.

On a clean working directory (that is, no changed files that haven’t been committed to the repository), this looks like this:

When updating some files, it will change to something like this:

This prompt doesn’t change in real time, so changes from other terminals won’t automatically change this prompt. I have a habit of tapping the “return” key to update the prompt.

To commit these changes:

The commands I use most often:

  • git add (if you only want to add a portion of a file, git add -p is your friend) / git commit
  • git push / git pull (git pull --rebase for feature branches being shared with other devs)
  • git diff @ — show all changes, staged or not, between the working directory and the latest commit of the branch you are on (@ is a alias for HEAD)
  • git diff --cached — show only changes that are being staged for the next commit
  • git status
  • git difftool / git mergetool (this will open Kaleidoscope)

This was obviously a very cursory, high-level look at how I use Git, but I hope it was useful. It’s been a long time since I’ve used a Git GUI full time, but whenever I do use one (for example, when helping a coworker), it feels clunky compared to using the CLI (that’s not saying I don’t have my complaints about the CLI — that’s another blog post 😇).

If you have any more questions, leave a comment or contact me on Twitter, and I’ll update this post with the answers.

Categories
English Tools Useful Utilities

“Logging in” to AWS ECS Fargate

I’m a big fan of AWS ECS Fargate. I’ve written in the past about managing ECS clusters, and with Fargate — all of that work disappears and is managed by AWS instead. I like to refer to this as quasi-serverless. Sorta-serverless? Almost-serverless? I’m open to better suggestions. 😂

There are a few limitations of running in Fargate, and this blog post will focus on working around one limitation: there’s easy way to get an interactive command line shell within a running Fargate container.

The way I’m going to establish an interactive session inside Fargate is similar to how CircleCI or Heroku does this: start a SSH server in the container. This requires two components: the SSH server itself, which will be running in Fargate, and a tool to automate launching the SSH server. Most of this blog post will be about the tool to automate launching the server, called ecs-fargate-login.

If you want to skip to the code, I’ve made it available on GitHub using the MIT license, so feel free to use it as you wish.

How it works

This is what ecs-fargate-login does for you, in order:

  1. Generate a temporary SSH key pair.
  2. Use the ECS API to start a one-time task, setting the public key as an environment variable.
    • When the SSH server boots, it reads this environment variable and adds it to the list of authorized keys.
  3. Poll the ECS API for the IP address of the running task. ecs-fargate-login supports both public and private IPs.
  4. Start the ssh command and connect to the server.

When the SSH session finishes, ecs-fargate-login will make sure the ECS task is stopping.

Use cases

Most of my clients use Rails, and Rails provides an interactive REPL (read-eval-print loop) within the Rails environment. This REPL is useful for running one-off commands like creating new users or fixing some data in the database, checking and/or clearing cache items, to mention a few common tasks. Rails developers are accustomed to using the REPL, so while not entirely necessary (in the past, I usually recommended fixing data using direct database access or with one-time scripts in the application repository), it is a nice-to-have feature.

In conclusion

I don’t use this tool daily, but probably a few times a week. A few clients of mine use it as well, and they’re generally happy with how it works. However, if you have any recommendations about how it could be improved, or how the way the tool itself is architected could be improved, I’m always open to discussion. This was my first serious attempt at writing Golang code, so there are probably quite a few beginner mistakes in the code, but it should work as expected.

Categories
AWS English

Hosting a Single Page Application with an API with CloudFront and S3

I’ve written about how to host a single page application (SPA) on AWS using CloudFront and S3 before, using the CloudFront “rewrite not found errors as a 200 response with index.html” trick.

Recently, working on a few serverless apps, I’ve realized that this trick, while quick, isn’t perfect. The specific case where it broke down was when the API is configured as a behavior on CloudFront (I usually scope the API to /api on the same domain as the frontend, so CORS and OPTIONS requests aren’t necessary). If the API returned a 404 Not Found response, CloudFront would rewrite it to 200 OK index.html, and the front-end application would get confused. Unfortunately, CloudFront doesn’t support customized error responses per behavior, so the only way to fix this was to use Lambda@Edge instead.

Here’s the code for the Lambda function:

'use strict'

const path = require('path')

exports.handler = (evt, context, cb) => {
  const { request } = evt.Records[0].cf

  const uriParts = request.uri.split("/")

  if (
    // Root resource with a file extension.
    (
      uriParts.length === 2 && path.extname(uriParts[1]) !== ""
    ) ||
    // Anything inside the "static" directory.
    uriParts[1] === "static"
  ) {
    // serve the original request to S3
  } else {
    // change the request to index.html
    request.uri = '/index.html'
  }

  cb(null, request)
}

This code assumes all requests to a root request with a file extension, or anything in the /static/ directory is a static file that should be served from S3. All other requests will be rewritten to index.html. These are the defaults for create-react-app, but you’ll probably need to change them to meet your requirements. (Remember, Lambda@Edge functions need to be created in us-east-1)

Attach this Lambda function to the CloudFront behavior responsible for serving from the S3 origin as origin-request, and you should be good to go. Don’t forget to remove the 404-to-200 rewrite.

Categories
AWS English WordPress

Serverless WordPress on AWS Lambda

Update 2020/07/29: AWS recently announced EFS support for Lambda, which makes running WordPress in Lambda easier, with fewer limitations. Here’s the new article about how to run WordPress in Lambda using EFS.

There are a few ways to run WordPress “serverless” on AWS. I’m going to talk about running WordPress on Lambda for this article. If you’re interested in how you can run WordPress serverless-ly on Fargate, I’m working on a post about that too.

Keep in mind that while it is possible to do this, it’s not for everyone. It’s probably not for me. Probably not for you. Use at your own risk!

Before we start, there is a core feature of Lambda that make running WordPress in Lambda quite troublesome: Read-only file system. WordPress expects a writable, persistent, local file system. We’ll be using the S3 Uploads plugin by Human Made to handle media uploads. However, core and plugin updates will not work. There’s no workaround for this, so to install / update files, we’ll need to make a new Lambda deployment.

So: let’s go! First, you’ll want to clone my boilerplate repository. I’ve prepared a WordPress installation and a simple glue script to actually boot WordPress.

$ git clone https://github.com/keichan34/wordpress-on-lambda

My plan of attack is: run WordPress in the Lambda function using a PHP custom runtime, make uploads work with S3 instead of the local filesystem, and wire up the database. In the repository above, I’ve configured static assets to be served from S3 as well.

Now, let’s prepare the database. Lambda has two networking modes: public and VPC mode. In public mode, the Lambda has default access to the public internet, but nothing else. In VPC mode, the Lambda is booted inside the VPC, and doesn’t have public internet access by default. Because WordPress requires public internet access we have to either run it in public mode, or run it in VPC mode and prepare a NAT gateway (about $30 to $50 a month, depending on the region). If Lambda runs in public mode, the database must also be publicly accessible — something that is frowned upon from a security standpoint. You should choose the option that fits your risk and price profile. In my case, I’m going with the NAT gateway route.

Now we’ve got the messy stuff out of the way, we’ll have to assemble the Lambda runtime. AWS has an article on their blog detailing how to make a PHP custom runtime, but Stackery provides a batteries-included PHP layer. It includes everything you need to make a PHP application that assumes it’s running in a traditional server environment run in AWS Lambda.

# Replace "km-wordpress-on-lambda-deployment-201906" with something that makes sense for you. It's globally unique, so copying and pasting this will result in an error.
# Make sure you're in the same region as your database!

$ DEPLOY_BUCKET="km-wordpress-on-lambda-deployment-201906"
$ aws s3 mb "s3://$DEPLOY_BUCKET"
$ cd <the directory you cloned the GitHub repository to>

Now, it’s time to install WordPress! We’ll add the WordPress files to the deployment package. As usual, copy wp-config-example.php to wp-config.php. Enter your database details. If you have a hostname that you’re going to use with CloudFront, enter it now. If not, you’ll have to wait until after the CloudFront distribution is created, then try again.

Now, let’s deploy. This will create a new CloudFront distribution and S3 bucket for public assets, so maybe it’s a good time to make a cup of coffee. If you haven’t installed the SAM CLI, do that before the next block.

$ sam package --template-file template.yaml --output-template-file serverless-output.yaml --s3-bucket "$DEPLOY_BUCKET"
$ sam deploy --template-file serverless-output.yaml --stack-name wordpress-on-lambda --capabilities CAPABILITY_IAM
$ aws s3 sync ./src/php s3://deploy-bucket-XXXXX --exclude "*.php" --exclude "*.ini"

I’ll be using the default CloudFront domain for this demo. If you’re going to be using your own domain, you need to modify the template.yaml file to add the an alias to the CloudFront distribution. Use the following command to show the CloudFront domain name.

$ aws cloudformation describe-stacks --stack-name wordpress-on-lambda | jq '.Stacks[0].Outputs'

OK! Now, you should be able to access the CloudFront URL, and you’ll get redirected to the friendly WordPress installer! If you’ve set up your wp-config.php correctly, the installation should go smoothly.

The site I set up for this post is available here: https://dskhgdbzphjkm.cloudfront.net/

Lessons Learned

This is for almost no-one. I think the only valid use case (in this current form) for running WordPress in AWS Lambda is a site that gets periodic, unpredictable spikes of intense traffic — a use case where Lambda’s scalability and price model pays off. This is also a use case where, presumably, the benefits of the scalability trumps the inconvenience of not being able to use the online updaters and installers (also, I’m assuming the database will be able to keep up with the load).

However, if updating and installing themes or plugins could be managed outside of the Lambda environment (say, with wp-cli), with deployments automated… Then, it may be a little more applicable to a larger audience.

If you’re looking for a cheap solution to host your personal blog (like me!), you might just want to bite the bullet and check out any of the hosted WordPress solutions out there.

If you liked this post, or you’d like to provide some input, please do so in the comments. My favorite AWS service is Lambda, and I like pushing it a bit, so look forward to similar posts in the future. If you find bugs in the boilerplate, or you can make improvements, please open an issue or PR!

Miscellaneous Tidbits

  • Aurora Serverless sounds like it would be the best match for this setup. It probably is. Just keep in mind that Aurora Serverless doesn’t support publicly accessible clusters. To use it, you’ll need to go the Lambda-in-VPC, NAT gateway route.
  • Regarding public / private access and NAT gateways, if you’re like me and believe in the future of IPv6 and think that you can just use an egress-only internet gateway – you’re wrong! Lambda doesn’t seem to support IPv6 at this time.
  • You can actually use a NAT instance if the NAT gateway is overkill. However, I would recommend using the NAT gateway if you can. It comes with automatic scalability and redundancy, so you don’t have to babysit your NAT instance. (If you need more than one NAT instance, use the gateway. Seriously.)
  • At time of writing, my patches to php-lambda-layer haven’t been merged yet, so you can use my patched version (the boilerplate repository has this applied already).
  • If you’re really going all-in, consider using an Application Load Balancer rather than API Gateway to save money. API Gateway has zero fixed costs, but there is a point where ALB will become cheaper than API Gateway.
  • Doing some crude calculations, you should be able to handle an average of a few hundred users per day under the perpetual free tier. Your highest bill may be data transfer to the user.
Categories
AWS English

Managing ECS clusters, 4 years in.

Throughout these past 4 years since AWS ECS became generally available, I’ve had the opportunity to manage 4 major ECS cluster deployments.

Across these deployments, I’ve built up knowledge and tools to help manage them, make them safer, more reliable, and cheaper to run. This article has a bunch of tips and tricks I’ve learned along the way.

Note that most of these tips are rendered useless if you use Fargate! I usually use Fargate these days, but there are still valid reasons for managing your own cluster.

Spot Instances

ECS clusters are great places to use spot instances, especially when managed by a Spot Fleet. As long as you handle the “spot instance is about to be terminated” event, and set the container instance to draining status, it works pretty well. When ECS is told to drain a container instance, it will stop the tasks cleanly on the instance and run them somewhere else. I’ve made the source code for this Lambda function available on GitHub.

Just make sure your app is able to stop itself and boot another instance in 2 minutes (the warning time you have before the spot instance is terminated). I’ve experienced overall savings of around 60% when using a cluster exclusively comprised of spot instances (EBS is not discounted).

Autoscaling Group Lifecycle Hooks

If you need to use on-demand instances for your ECS cluster, or you’re using a mixed spot/on-demand cluster, I recommend using an Autoscaling Group to manage your cluster instances.

To prevent the ASG from stopping instances with tasks currently running, you have to write your own integration. AWS provides some sample code, which I’ve modified and published on GitHub.

The basic gist of this integration is:

  1. When an instance is scheduled for termination, the Autoscaling Group sends a message to an SNS topic.
  2. Lambda is subscribed to this topic, and receives the message.
  3. Lambda tells the ECS API to drain the instance that is scheduled to be terminated.
  4. If the instance has zero running tasks, Lambda tells the Autoscaling Group to continue with termination. The Autoscaling Group terminates the instance at this point.
  5. If the instance has more than zero running tasks, Lambda waits for some time and sends the same message to the topic, returning to step (2).

By default, I set the timeout for this operation to 15 minutes. This value depends on the specific application. If your applications require more than 15 minutes to cleanly shut down and relocate to another container instance, you’ll have to set this value accordingly. (Also, you’ll have to change the default ECS StopTask SIGTERM timeout — look for the “ECS_CONTAINER_STOP_TIMEOUT” environment variable)

Cluster Instance Scaling

Cluster instance scale-out is pretty easy. Set some CloudWatch alarms on the ECS CPUReservation and MemoryReservation metrics, and you can scale out according to those. Scaling in is a little more tricky.

I originally used those same metrics to scale in. Now, I use a Lambda script that runs every 30 minutes, cleaning up unused resources until a certain threshold of available CPU and memory is reached. This technique further reduces service disruption. I’ll post this on GitHub sometime in the near future.

Application Deployment

I’ve gone through a few application deployment strategies.

  1. Hosted CI + Deploy Shell Script
    • Pros: simple.
    • Cons: you need somewhere to run it, easily becomes a mess. Shell scripts are a pain to debug and test.
  2. Hosted CI + Deploy Python Script (I might put this on GitHub sometime)
    • Pros: powerful, easier to test than using a bunch of shell scripts.
    • Cons: be careful about extending the script. It can quickly become spaghetti code.
  3. Jenkins
    • Pros: powerful.
    • Cons: Jenkins.
  4. CodeBuild + CodePipeline
    • Pros: simple; ECS deployment was recently added; can be managed with Terraform.
    • Cons: Subject to limitations of CodePipeline (pretty limited). In our use case, the sticking points are not being able to deploy an arbitrary Git branch (you have to deploy the branch specified in the CodePipeline definition).

Grab-bag

Other tips and tricks

  • Docker stdout logging is not cheap (also, performance is highly variable across log drivers — I recently had a major problem with the fluentd driver blocking all writes). If your application blocks on logging (looking at you, Ruby), performance will suffer.
  • Having a few large instances yields more performance than many small instances (with the added benefit of having the layer cache when performing deploys).
  • The default placing strategy should be: binpack on the resource that is most important to your application (CPU or memory), AZ-balanced
  • Applications that can’t be safely shut down in less than 1 minute do not work well with Spot instances. Use a placement constraint to make sure these tasks don’t get scheduled on a Spot instance (you’ll have to set the attribute yourself, probably using the EC2 user data)
  • Spot Fleet + ECS = ❤️
  • aws update-service help for service administration commands. I use --force-new-deployment and --desired-count quite often.
  • If you manage your own EC2 instances with Auto Scaling Groups: aws autoscaling terminate-instance-in-auto-scaling-group --instance-id "i-XXX" --no-should-decrement-desired-capacity will start a new EC2 instance and perform termination lifecycle hooks on it. This is what I use to switch out old EC2 instances with new launch configurations.
Categories
English

“Truth in bots”

The bots should announce, “I’m not a person, or if I am, I’m not allowed to act like one.”

Or, if there’s no room or time for that sentence, perhaps a simple bot at the top of the conversation. That way, we can save our human emotions for the humans who will appreciate them.

Truth in bots | Seth’s Blog

“If you can’t tell the difference, does it matter?”

Quacking like ducks, et cetera.

The point of the post is a bit different (it’s predicated on there being able to tell the difference — “… only a minute or two into the interaction that you realize you’re being fooled by an AI, not a caring human”), but what happens when you can’t tell the difference? Should AIs always announce themselves as AIs if they are indistinguishable from a human? Why?

Categories
AWS English

AWS Application Auto-scaling for ECS with Terraform

Update: Target tracking scaling is now available for ECS services.

I’ve been working on setting up autoscaling settings for ECS services recently, and here are a couple notes from managing auto-scaling for ECS services using Terraform.

Creating multiple scheduled actions at once


Terraform will perform the following actions:

  + aws_appautoscaling_scheduled_action.green_evening
      id:                                    
      arn:                                   
      name:                                  "ecs"
      resource_id:                           "service/default-production/green"
      scalable_dimension:                    "ecs:service:DesiredCount"
      scalable_target_action.#:              "1"
      scalable_target_action.0.max_capacity: "20"
      scalable_target_action.0.min_capacity: "2"
      schedule:                              "cron(0 15 * * ? *)"
      service_namespace:                     "ecs"

  + aws_appautoscaling_scheduled_action.wapi_green_morning
      id:                                    
      arn:                                   
      name:                                  "ecs"
      resource_id:                           "service/default-production/green"
      scalable_dimension:                    "ecs:service:DesiredCount"
      scalable_target_action.#:              "1"
      scalable_target_action.0.max_capacity: "20"
      scalable_target_action.0.min_capacity: "3"
      schedule:                              "cron(0 23 * * ? *)"
      service_namespace:                     "ecs"

This fails with:


* aws_appautoscaling_scheduled_action.green_evening: ConcurrentUpdateException: You already have a pending update to an Auto Scaling resource.

To fix, the scheduled actions need to be executed serially.


resource "aws_appautoscaling_scheduled_action" "green_morning" {
  name               = "ecs"
  service_namespace  = "${module.green-autoscaling.service_namespace}"
  resource_id        = "${module.green-autoscaling.resource_id}"
  scalable_dimension = "${module.green-autoscaling.scalable_dimension}"
  schedule           = "cron(0 23 * * ? *)"

  scalable_target_action {
    min_capacity = 3
    max_capacity = 20
  }
}

resource "aws_appautoscaling_scheduled_action" "green_evening" {
  name               = "ecs"
  service_namespace  = "${module.green-autoscaling.service_namespace}"
  resource_id        = "${module.green-autoscaling.resource_id}"
  scalable_dimension = "${module.green-autoscaling.scalable_dimension}"
  schedule           = "cron(0 15 * * ? *)"

  scalable_target_action {
    min_capacity = 2
    max_capacity = 20
  }

  # Application AutoScaling actions need to be executed serially
  depends_on = ["aws_appautoscaling_scheduled_action.green_morning"]
}
Categories
AWS English

ECS ChatOps with CodePipeline and Slack

I’m currently working on migrating a Rails application to ECS at work. The current system uses a heavily customized Capistrano setup that’s showing its signs, especially when deploying to more than 10 instances at once.

While patiently waiting for EKS, I decided to use ECS over manage my own Kubernetes cluster on AWS using something like kops. I was initially planning on using Lambda to create the required task definitions and update ECS services, but native CodePipeline deploy support for ECS was announced right before I started planning the project, which greatly simplified the deploy step.

The current setup we have now is: a few Lambda functions to link CodePipeline and Slack together, two CodePipeline pipelines per service (one for production and one for staging), and the associated ECS resources.

First, a deploy is triggered by saying “deploy [environment] [service]” in the deploy channel. Slack sends an event to Lambda (via API Gateway), and Lambda starts an execution of the CodePipeline pipeline if it is not already in progress (because of the way CodePipeline API operations work, it’s hard to work with multiple concurrent runs). This Lambda function also records some basic state in DynamoDB — namely, the Slack channel, user, and timestamp. This information is used to determine what channel to send replies to, and what user to mention if something in the deploy process goes awry.

CodePipeline then starts CodeBuild, which is configured to create Docker image(s) and a simple JSON file that is used to tell CodePipeline’s ECS integration the image tags the task definition should be updated with.

When CodeBuild is finished, a “manual approve” action is used to request human approval before continuing with the deploy. In the example here, I have it turned on for staging environments, but it’s usually only used in production. In production, we normally have 3 stages in the release cycle — the first canary deployment, followed by 25%, then the remaining 75%.

The rest is relatively straightforward — just CodePipeline telling ECS to deploy images. If errors are detected along the way, a “rollback” command is used to manually roll back changes.

When the deploy is finished, a Lambda function is used to send a message to the deploy channel.

Categories
English

2017: A year in review

A few years ago when I was doing client work, we would regularly host clients’ sites and apps for them. During this time, I was responsible for both development and keeping them up and running as much as possible. Most of the money being in new development, it was difficult to assign priority to improving the operations of existing applications. In this period, I wanted an “operations person” to teach me how to make new applications that would need minimal operations support from the beginning. Failing this, I decided to become “the operations person” myself.

Following that decision, I found myself working at BizReach on the infrastructure team for HRMOS, a Software-as-a-Service product focused on applicant tracking for medium to large enterprises, in the end of 2016. Following that job, I then went to a small startup, dely, as a Site Reliability Engineer for their flagship product Kurashiru, a recipe video app for iOS and Android.

This is the first full year I’ve been working full-time as a dedicated infrastructure / operations / SRE / DevOps engineer, and I feel like I’ve grown a lot. On the technical side, I was able to lead the migration of complex legacy monolith systems to scalable and resilient independent systems. On the not-so-technical side, I’ve experienced different types of company cultures, managerial styles, and I’ve gotten accustomed to working with teams of engineers — the experience I’ve had up until this year was mostly working in extremely small teams.

While I do have a passion for making, maintaining, and improving services, I am also very interested in company culture — what makes it and what breaks it — especially when it comes to remote work. I believe most technical engineering work can be done as efficiently (if not more) remote, but there are definite challenges that need to be addressed before I can start leading a change in any position I’m in.

Here’s to 2018! 🎉