
Rails on AWS: Do you need nginx between Puma and ALB?

When I set up Rails on AWS, I usually use the following pattern:

(CloudFront) → ALB → Puma

I was wondering: Is it always necessary to put nginx between the ALB and Puma server?

My theory behind not using nginx is that nginx has its own request queue, while the ALB does not queue requests at all (the Classic Load Balancer had a very limited “surge queue”). That queue helps get responses back to the user under load, at the cost of increased latency, but it also hides backpressure from the metrics the ALB uses for autoscaling and for choosing which backend to route a request to, such as RejectedConnectionCount.

I couldn’t find any in-depth articles about this, so I decided to prove my theory (in)correct by myself.

In this test, the application servers will be running using ECS on Fargate (platform version 1.4.0). It’s a very simple “hello world” app, but I’ll give it a bit of room to breathe with each instance having 1 vCPU and 2GB of RAM. I’ll be using Gatling on a single c5n.large instance (“up to 25 gigabits” should be enough for this test).
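For reference, the Puma configuration on each instance would look roughly like this (a sketch: the 20-thread pool matches the test setups described below, while the worker count and port are my assumptions):

# config/puma.rb (sketch)
threads 20, 20  # fixed pool of 20 threads per instance, matching the test setup
workers 0       # single process, since each task only has 1 vCPU
port 3000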

In this test, I wanted to try out a few configurations that mimic characteristics of applications I’ve worked on: short and long requests, usually IO-bound. A short request just renders a simple HTML template. A long request takes 300ms. The requests are ramped from 1 request/sec to 1000 requests/sec over 5 minutes.
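The two endpoints might look something like this (a sketch, assuming the 300ms request is simulated with a sleep; controller and template names are illustrative):

# A hypothetical controller backing the two scenarios.
class BenchmarkController < ApplicationController
  # “Short” request: just render a simple HTML template.
  def short
    render :hello
  end

  # “Long” request: simulate 300ms of IO-bound work before rendering.
  def long
    sleep 0.3
    render :hello
  end
end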

[Figure: Response Time Percentiles over Time (OK responses), simple render — 4 instances, 20 threads each, connected directly to the ALB]
[Figure: Response Time Percentiles over Time (OK responses), simple render — 4 instances, 20 threads each, using Nginx]

As you can see, for the simple render scenario, the direct-to-ALB and Nginx setups performed almost identically. As load approached 1000 requests/sec, latency started to get worse, but all requests were completed with an OK status.

The 300ms scenario was a little more grim.

[Figure: Number of responses per second (green OK, red error), 300ms response — 4 instances, 20 threads each, connected directly to the ALB]
[Figure: Number of responses per second (green OK, red error), 300ms response — 4 instances, 20 threads each, using Nginx]

My theory that Puma would fail fast, returning errors to the ALB once it reached capacity, was right. The theoretical maximum throughput is 4 instances * 20 threads * (1000ms / 300ms) ≈ 266 requests/sec. Connected directly, Puma handles about 200 requests/sec before returning errors; with Nginx in front, errors don’t start until around 275 requests/sec, but by that point requests are already queueing and response times are spiking.

Remember, these results are for this specific setup, and results for your own use case will probably differ. It’s always important to do load testing tailored to your environment, especially for performance-critical areas.


“Logging in” to AWS ECS Fargate

I’m a big fan of AWS ECS Fargate. I’ve written in the past about managing ECS clusters, and with Fargate, all of that work disappears and is managed by AWS instead. I like to refer to this as quasi-serverless. Sorta-serverless? Almost-serverless? I’m open to better suggestions. 😂

There are a few limitations of running in Fargate, and this blog post will focus on working around one of them: there’s no easy way to get an interactive command-line shell within a running Fargate container.

The way I’m going to establish an interactive session inside Fargate is similar to how CircleCI and Heroku do it: start an SSH server in the container. This requires two components: the SSH server itself, which runs in Fargate, and a tool to automate launching it. Most of this blog post will be about that tool, called ecs-fargate-login.

If you want to skip straight to the code, I’ve made it available on GitHub under the MIT license, so feel free to use it as you wish.

How it works

This is what ecs-fargate-login does for you, in order:

  1. Generate a temporary SSH key pair.
  2. Use the ECS API to start a one-time task, setting the public key as an environment variable.
    • When the SSH server boots, it reads this environment variable and adds it to the list of authorized keys.
  3. Poll the ECS API for the IP address of the running task. ecs-fargate-login supports both public and private IPs.
  4. Start the ssh command and connect to the server.

When the SSH session finishes, ecs-fargate-login makes sure the ECS task is stopped.
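To make the flow concrete, here is a minimal Ruby sketch of those steps using the aws-sdk-ecs and aws-sdk-ec2 gems (the real tool is written in Go; the cluster, task definition, container name, subnet, and SSH_PUBLIC_KEY variable below are all illustrative assumptions):

require "aws-sdk-ecs"
require "aws-sdk-ec2"
require "tmpdir"

ecs = Aws::ECS::Client.new
ec2 = Aws::EC2::Client.new

# 1. Generate a temporary SSH key pair.
key_path = File.join(Dir.mktmpdir, "id_rsa")
system("ssh-keygen", "-t", "rsa", "-N", "", "-f", key_path, exception: true)

# 2. Start a one-time task, passing the public key as an environment variable.
task_arn = ecs.run_task(
  cluster: "my-cluster",
  task_definition: "ssh-server",
  launch_type: "FARGATE",
  network_configuration: {
    awsvpc_configuration: { subnets: ["subnet-12345"], assign_public_ip: "ENABLED" }
  },
  overrides: {
    container_overrides: [{
      name: "ssh",
      environment: [{ name: "SSH_PUBLIC_KEY", value: File.read("#{key_path}.pub") }]
    }]
  }
).tasks.first.task_arn

# 3. Wait for the task to run, then look up its IP via the attached ENI.
ecs.wait_until(:tasks_running, cluster: "my-cluster", tasks: [task_arn])
task = ecs.describe_tasks(cluster: "my-cluster", tasks: [task_arn]).tasks.first
eni_id = task.attachments.first.details.find { |d| d.name == "networkInterfaceId" }.value
eni = ec2.describe_network_interfaces(network_interface_ids: [eni_id]).network_interfaces.first
ip = eni.association&.public_ip || eni.private_ip_address

# 4. Start the ssh session, then make sure the task goes away afterwards.
system("ssh", "-i", key_path, "root@#{ip}")
ecs.stop_task(cluster: "my-cluster", task: task_arn)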

Use cases

Most of my clients use Rails, and Rails provides an interactive REPL (read-eval-print loop) within the Rails environment. This REPL is useful for one-off tasks like creating new users, fixing data in the database, or checking and clearing cache items. Rails developers are accustomed to using the REPL, so while it’s not strictly necessary (in the past, I usually recommended fixing data via direct database access or with one-time scripts in the application repository), it is a nice-to-have feature.

In conclusion

I don’t use this tool daily, but I do use it a few times a week. A few of my clients use it as well, and they’re generally happy with how it works. However, if you have any recommendations about how the tool or its architecture could be improved, I’m always open to discussion. This was my first serious attempt at writing Go code, so there are probably quite a few beginner mistakes in it, but it should work as expected.


Web App Development and Caching

Any web developer who works with external services or databases (that’s probably almost every web developer) has run into performance problems. Your code by itself is pretty fast; databases and external services/APIs are very slow. Waiting on an external API is basically the computer equivalent of waiting for a brontosaurus to walk a kilometer.

As web developers, we have a very powerful tool called caching. The computer you’re reading this sentence on uses it constantly, with several levels of caching between the CPU, memory, and hard drive (or SSD). Caching is the act of saving the result of a slow operation somewhere that is fast to access. In this case, we will be talking about caching database results and API results.

There are only two hard things in computer science: cache invalidation and naming things.

-- Phil Karlton (Adapted)

Cache invalidation is a hard problem. Let me illustrate:

  1. Server A fetches Record A and associated records from the database, and caches it.
  2. Server B updates Record A.
  3. Server A continues serving its cached copy from step 1 (until it expires).

There are a few ways to solve this problem.

You can manually invalidate caches:

  1. Server A fetches Record A and associated records from the database, and caches it.
  2. Server B updates Record A, notifying all servers to remove cached copies of Record A.
  3. Server A receives the notification and removes its cached copy, so the next request fetches fresh data from the database.

This is tenable with one or two application servers, but it clearly doesn’t scale: every update has to send purge requests to every server that might hold a cached copy.

Then there’s my favorite — key-based cache invalidation:

  1. Server A fetches Record A, and looks up the cache key “Record A [timestamp when Record A was updated]”. It doesn’t exist, so it fetches associated records and stores everything in the cache.
  2. Server B updates Record A.
  3. Server A fetches Record A again, and looks up the cache key “Record A [new timestamp]”. It doesn’t exist, because the timestamp changed, so it fetches the associated records and caches everything under the new key. The old cache entry is simply never read again.

This method has some drawbacks – it still requires one query to the canonical data store, and you need to remember to update the updated_at attribute of your record when any associated records change. If you’re using Rails, this is trivial:

class MyRecord < ActiveRecord::Base
  has_many :associated_records
end

class AssociatedRecord < ActiveRecord::Base
  # touch: true bumps my_record.updated_at whenever this record is saved or destroyed
  belongs_to :my_record, touch: true
end
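With that in place, a cache lookup keyed on the record’s timestamp might look like this (a sketch; Rails.cache.fetch builds the cache key from the array, and the names are illustrative):

record = MyRecord.find(params[:id])
payload = Rails.cache.fetch(["my_record", record.id, record.updated_at.to_i]) do
  # Only runs on a cache miss, i.e. the first time this version of the record is seen.
  record.associated_records.map(&:attributes)
end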

Another drawback is that your cache fills up with stale keys as records are updated. Luckily, there are caches that already deal with this! LRU, or Least Recently Used, is a cache eviction policy that removes the least recently used entries first, making room for new ones. Redis can be configured as an LRU cache, Memcached is approximately LRU, and the Rails Memory Cache Store also uses an LRU algorithm.
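To illustrate the idea, here is a toy LRU cache (illustrative only; Ruby’s Hash preserves insertion order, so the first key is always the least recently used):

class LRUCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)  # re-insert to mark as most recently used
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size  # evict the least recently used entry
  end
end

cache = LRUCache.new(2)
cache[:a] = 1
cache[:b] = 2
cache[:a]      # touch :a, making :b the least recently used
cache[:c] = 3  # evicts :b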

“Caching sounds great! How do I use it?”

Caching is not something you should “tack on” to an app. There are awesome tools, such as Varnish, that are built around bolting caching on from the outside, but that approach is not ideal. The ideal web application is designed from the ground up with caching in mind, even in the development environment. If you’re writing tests, make sure your test environment is connected to a cache, then test cache invalidation and lookup. Ideally, use the same cache you use in production in both development and test environments.
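In Rails, for example, that can be as simple as pointing every environment at the same kind of store (a sketch; the Memcached host is an assumption):

# config/environments/development.rb (and similarly for test and production)
Rails.application.configure do
  config.cache_store = :mem_cache_store, "localhost:11211"
end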


My Experiences with Rubinius

Rubinius is an implementation of the Ruby language spec. I’ve been using it recently for a project, and I’ve been enjoying it so far. Here are a few thoughts from my time with it.

Philosophy

The Core

Rubinius, at its core, is written in C++ and uses LLVM (Low Level Virtual Machine). Without getting too technical, it translates the Ruby code you write into efficient machine code, then executes that machine code directly on the CPU. The architecture is very similar to that of Google’s V8 (and this design is one of the reasons Google Chrome is such a fast browser).

The Concept

Now for the concept of “Ruby”. “Ruby” is a programming language specification, not a program or compiler. The standard reference implementation is called MRI (Matz’s Ruby Interpreter, not magnetic resonance imaging). MRI is used in many production environments and is very stable, and the latest 2.0.0 release in particular introduces many performance improvements (all of our new Rails apps are on 2.0.0).

MRI is written mostly in C.

Rubinius’ tagline, “Rubinius: Use Ruby™”, summarizes its intent. Use Ruby! Because the Rubinius core is as fast as it is, and the Ruby you write ends up as machine code anyway, the standard library (the basic functions of the language) can itself be written in Ruby. Use Ruby!!

Speed

In development, Rubinius seems to be a little slower than MRI, especially on first load. Rails is a big library.

In production, however, especially with a threaded app server, Rubinius is extremely fast: it has no “Global Interpreter Lock” and supports real threads. I’ve been using Puma.

You do need to write thread-safe code, but the payoffs are enormous.
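A contrived sketch of what thread safety means in practice: without the Mutex below, concurrent increments can be lost when threads actually run in parallel:

counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter  # always 10000 with the lock; without it, increments can be lost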

Conclusion

This little project I’m working on will probably not see a “real production” environment anytime soon, so I really wanted to try out some alternative Rubies (there are a few). All in all, my experience with Rubinius has been very good. Can’t wait for the production release of Rubinius!