Categories
English GIS

Creating Vector Tiles from Scratch using TypeScript

I currently work at a Japanese mapping startup called Geolonia. We specialize in customizing and displaying geospatial data, and a big part of this is making vector tiles. There are a lot of ways to do this, the most popular being tippecanoe from Mapbox, ST_AsMVT in PostGIS, OpenMapTiles, and tilemaker. We normally use tippecanoe to convert GeoJSON data to vector tiles and tilemaker to generate OpenStreetMap-based tiles, but there isn’t really an easy way to generate arbitrary vector tiles in plain old TypeScript / JavaScript.

Recently, I’ve been working on a proxy for converting data directly to vector tiles on the fly, and that is exactly what I need to do — generate arbitrary tiles without relying on outside processes like tippecanoe or PostGIS (this specific proxy will be running in AWS Lambda).

The Mapbox Vector Tile (MVT) spec is the de-facto standard for delivering tiled vector data, and that’s what we’ll be using. MVT tiles are Protocol Buffers-encoded files that include the drawing instructions for map renderers. The MVT spec handily includes the protobuf definition file, so we can use this to compile to JavaScript and generate the TypeScript definition files.

For the following, I’m assuming you have a git repository with the basic JavaScript and TypeScript essentials already set up.

Working with the MVT format in TypeScript

First, we create a submodule to bring in the MVT protobuf definition.

git submodule add https://github.com/mapbox/vector-tile-spec.git

This should create a new directory vector-tile-spec, with the contents of the MVT vector tile specification repository inside. Now, we can set up the protobuf-to-JS compiler.

My project uses ES modules, so these instructions will generate ES modules, but if you need something else, check out the protobufjs documentation.

In package.json, create a new script called build:protobuf:

"scripts": {
  ...
  "build:protobuf": "pbjs -t static-module -w es6 -o src/libs/protobuf.js vector-tile-spec/2.1/vector_tile.proto && pbts -o src/libs/protobuf.d.ts src/libs/protobuf.js"
}

Install protobufjs:

npm install --save protobufjs

Now, we’re ready to generate the code.

npm run build:protobuf

This command will generate two files: src/libs/protobuf.d.ts and src/libs/protobuf.js. Now, we’re ready to generate tiles.

The Mapbox Vector Tile format

Mapbox provides a very good guide on how data is laid out in the MVT container in their documentation. I highly recommend reading through it, but here are a few takeaways especially relevant to us:

  • Feature property values and keys are deduplicated and referred to by index.
  • Coordinates are encoded using “zigzag” encoding to save space while supporting negative coordinates. Starting from zero, 0, -1, 1, -2, 2, -3, 3…
  • Drawing points, lines, and polygons differ from GeoJSON in a few major ways:
    • Instead of absolute coordinates, drawing is done using a cursor and relative coordinates to the last operation.
    • Instead of declaring whether a geometry is a point, line, or polygon, MVT uses commands to imperatively draw geometry.
    • Coordinates are local to the tile, where (0,0) refers to the top-left corner of the tile.

Now that we’re ready to jump in to actually building up these tiles, let’s try some code.

import { vector_tile } from "../libs/protobuf";

// zigzag encoding
const zz = (value: number) => (value << 1) ^ (value >> 31);

const tile = vector_tile.Tile.create({
  layers: [
    {
      version: 2, // This is a constant
      name: "test", // The name of the layer
      extent: 256, // The extent of the coordinate system local to this tile. 256 means that this tile has 256×256=65536 pixels, from (0,0) to (256,256).
      features: [
        {
          // id must be unique within the layer.
          id: 1,
          type: vector_tile.Tile.GeomType.POLYGON,
          geometry: [
            // We start at (0,0)
            ((1 & 0x7) | (1 << 3)), // MoveTo (1) for 1 coordinate.
              zz(5), zz(5), // Move to (5,5)
            ((2 & 0x7) | (3 << 3)), // LineTo (2) for 3 coordinates.
              zz(1),  zz(0), // Draw a line from (5,5) to (6,5)
              zz(0),  zz(1), // Draw a line from (6,5) to (6,6)
              zz(-1), zz(0), // Draw a line from (6,6) to (5,6)
            15, // Close Path (implicitly closes the line from (5,6) to (5,5))
          ],
          tags: [
            0, // test-property-key-1
            0, // value of key 1

            1, // test-property-key-2
            1, // value of key 2 and 3

            2, // test-property-key-3
            1, // value of key 2 and 3
          ],
        }
      ],
      keys: [
        "test-property-key-1",
        "test-property-key-2",
        "test-property-key-3"
      ],
      values: [
        {stringValue: "value of key 1"},
        {stringValue: "value of key 2 and 3"},
      ],
    }
  ]
});

I tried to annotate the code as much as possible. Can you figure out what shape this draws?

Now, to convert this to binary:

const buffer: Buffer = vector_tile.Tile.encode(tile).finish();

And that’s it!

Conclusion

I was initially pretty scared about creating vector tiles from scratch — I’ve found them pretty hard to work with in the past, leaning on tools like vt2geojson to first convert them to GeoJSON. I was pleasantly surprised to find out that it wasn’t as hard as I thought it was going to be. I still got stuck on a few parts — it took me a few trial and error runs to figure out that my absolute-to-relative math was off — but once everything was working, it was definitely worth it.

Let me know what you think.

Categories
English GIS

Serving real-time, tiled, point data directly from DynamoDB

Recently, I’ve been interested in how to serve and work with geographic data with the least amount of “work” possible. Everything here can be done pretty easily with PostGIS and a large enough server, but I’m always up for the challenge of doing things in a slightly different way.

I’m working on an app that will store lots of points around the world — points that people have submitted. Users will open this app, and be presented with a map with points that have been submitted in their vicinity. Once they zoom out, I want to show clusters of points to the users, with a number showing how many points are represented in that cluster.

There are two major ways to do this:

  • Get the bounding box (the coordinates of what the user’s map is currently displaying), send it to the server, and the server will return a GeoJSON object containing all the points in that bounding box.
  • Split the data in to predefined tiles and the client will request the tiles that are required for the current map.

There are pros and cons for both of these methods:

  • GeoJSON would contain the raw data for all points within the bounding box. When the user is zoomed in, this is not a big problem, but when they’re zoomed out, the client will have to do a lot of work calculating clusters. Also, as the user pans around, new queries would have to be made to piece together the point data for areas they don’t have the data for. Not only can this data get pretty big, the user experience is subpar — data is normally loaded after the map pan finishes.
  • Tiles at the base zoom level (where all data is shown, without any clustering) can be generated on-the-fly easily, but tiles at lower zoom levels need to use all of the information in the tile bounds to generate clusters.

This basically boils down to two solutions: let the client process the data, or let the server process the data. Serving the whole GeoJSON document may work up to a point (I usually use something like 1-2MB as a general limit), datasets larger than that need to be tiled. I use Mapbox’s tippecanoe for this, but tippecanoe takes a GeoJSON file and outputs a static mbtiles file with precomputed tiles. Great for static data, but updating this means regenerating all the tiles in the set.

This is where the title of this post comes in. I’ve been thinking about how to serve multiple large tilesets directly from DynamoDB, and I think I have a pretty good idea of what might work. If there’s something like this that already exists, or you find something wrong with this implementation, please let me know!

Here is the basic data structure:

PKSK
Raw DataTileset#{tileset ID}#Data#{first 5 characters of geohash}{full geohash}#{item ID}
Pregenerated TileTileset#{tileset ID}#Tile#{z}/{x}/{y}{generated timestamp}

This is how it’s supposed to work: raw point data is stored conforming to the “Raw Data” row. Updates to raw data are processed by DynamoDB streams, and a message to update the tiles that point exists in is enqueued.

Because this potentially can get very busy (imagine a case where hundreds or thousands of points in the same tile are updated within seconds of each other), the queue would have a delay associated with it . The processor that generates tiles based on messages in the queue would take a look at the tile it’s about to generate, and compare the generated timestamp with the timestamp of the item update. If the tile timestamp is newer then the item, that item can be safely disregarded, because another process has already picked it up.

I initially thought about using tile numbers to index the raw data as well, but decided to use geohash instead. The reasoning behind this is because geohash is a Z-order grid, we can use this property to optimize DynamoDB queries spanning multiple cells. Another problem with using tile numbers is the ambiguity of points that lie on tile boundaries. In contrast, the precision of a geohash can be arbitrarily increased without affecting queries.

Is there a way to encode tile numbers in to a Z-order curve? Maybe interpolating the X and Y values bitwise? How can this account for the zoom parameter? Am I overthinking this? Can I get away with using a sufficiently high zoom parameter (like z=22 or 23)? (How about a crazier idea: can the geohash algorithm be massaged in to corresponding directly to tiles? Is it worth it?)

Anyways, this is sort of something that I’ve been thinking about for a while, and I want to get started on a proof-of-concept implementation in the near future. Let me know on Twitter if you have any input / comments, or even better, if you want to collaborate with me on making something like this.

Categories
English

For those times you don’t want to eval…

I wanted to make some advanced logic available, easily configurable via a database, in a couple apps that I’ve been working on recently.

Honestly, I could have just stored the code in the database and eval’d it — but no, I don’t want to take the risk of arbitrarily executing code. I could have done some gymnastics like running it in a network-less container with defined inputs and outputs. I could have made the configuration more capable. What I decided to do in the end, though, was to write an extremely compact domain-specific language (DSL).

To write this DSL, I chose a Lisp-style syntax due to its dead-simple parsing. The basic idea is to parse the string for tokens, generate an abstract syntax tree (AST), then just recurse through the AST and run whatever code is required.

In one example, I wanted to have an extendible SQL WHERE clause, with a very limited set of operators — AND, OR, LIKE, =.

(and (like (attr "person.name") (str "%Keita%")) (= (attr "person.city) (str "Tokyo")))

This example will generate the SQL WHERE clause:

((person.name LIKE '%Keita%') AND (person.city = 'Tokyo'))

Here’s pseudocode for how I write the parser / interpreter for this:

COMMANDS = {
  "and": (left, right) => { return f"(({left}) AND ({right}))" }
  "or":  (left, right) => { return f"(({left}) OR ({right}))" }
  "like":(left, right) => { return f"(({left}) LIKE ({right}))" }
  "=": (left, right) => { return f"(({left}) = ({right}))" }
  "str": (str) => { return escape_sql(str) }
  "attr": (str) => { return escape_sql_for_attr_name(str) }
}

def parse_ast(code):
  # parse the "code" string into nested arrays:
  # "(1 (2 3))" becomes ["1", ["2", "3"]]
  ...

def execute_node(ast):
  cmd = ast[0]
  argv = ast[1:]
  resolved_argv = [ execute_node(x) for x in argv ]
  return COMMANDS[cmd](*resolved_argv)

def execute(code):
  ast = parse_ast(code)
  execute_node(ast)

As you can see, this is a very simple example that takes the DSL and transforms it in to a SQL string. If you wanted to do parameterized queries, you might return a tuple with the string as the first element and a map of parameters for the second, for example.

The ability to map the language so closely to the AST, and being able to evaluate the AST just by recursion, makes this implementation simple and easy to write, easy to extend, and easy to embed in existing applications. While I probably won’t be switching to writing Common Lisp full time (for practical reasons), I definitely do get the appeal of the language itself.

This tool isn’t something I use all the time. It’s probably something that should be used very sparingly, and in specific circumstances. That said, it’s a good tool in my toolbox for those times for when I want to have on-the-fly customizable logic without the security concerns of using eval, or the complexity of creating a sandboxed environment for potentially unsafe code.

Last note: while this solution may be more secure than eval, it is definitely not 100% secure. In the simple example above, we do escape strings so SQL injection shouldn’t be a problem, but it doesn’t check if the column defined by the attr function is valid, or if the user is allowed to query information based on that column or not (although something like that would be possible). I would not use something like this to process completely untrusted input.

Categories
AWS English

My Brief Thoughts on the AWS Kinesis Outage

There have been multiple analyses about the recent (2020/11/25) outage of AWS Kinesis and its cascading failure mode, taking a chunk of AWS services with it — including seemingly unrelated Cognito — due to dependencies hidden to the user. If you haven’t read the official postmortem statement by AWS yet, go read it now.

There are an infinite amount of arguments that can made about cascading failure; I’m not here to talk about that today. I’m here to talk about a time a few years ago I was evaluating a few systems to do event logging. Naturally, Kinesis was in consideration, and our team interviewed an AWS Solution Architect about potential design patterns we could implement, what problems they would solve, what hiccups we may encounter on the way, et cetera.

At the time, I didn’t think much of it, but in hindsight it should have been a red flag.

ME: “So, what happens when Kinesis goes down? What kind of recovery processes do you think we need to have in place?”

SA: “Don’t worry about that, Kinesis doesn’t go down.”

The reason I didn’t think much of it at that time was that our workload would have been relatively trivial for Kinesis, and I mentally translated the reply to “don’t worry about Kinesis going down for this particular use case”.

We decided to not go with Kinesis for other reasons (I believe we went with Fluentd).

Maybe my mental translation was correct? Maybe this SA had this image of Kinesis as a system that was impervious to fault? Maybe it was representative of larger problem inside AWS of people who overestimated the reliability of Kinesis? I don’t know. This is just a single data point — it’s not even that strong. A vague memory from “a few years ago”? I’d immediately dismiss it if I heard it.

The point of this post is not to disparage this SA, nor to disparage the Kinesis system as a whole (it is extremely reliable!), but to serve as a reminder (mostly to myself) that one should be immediately suspicious when someone says “never” or “always”.

Categories
English

Venturing in to the realm of Hackintosh-ing

Like you, I’ve been finding myself working from home more often than not. These days, I probably go to an office once a month. I have a 16 inch MacBook Pro, but using it in clamshell mode, all the time, connected to a 4K monitor was… not ideal. It would often thermally throttle way down (often, it would be really sluggish — I wondered, how fast is this running? 800MHz. 90C. Fans at 100%.)

The 16″ MacBook Pro was a great machine for when I’d go to an office 3-5 days a week, but it just doesn’t make sense when it’s essentially used as a desktop.

This is where I thought, why don’t I just get a desktop Mac, then? That gives me a few choices: iMac, iMac Pro, Mac Pro, Mac Mini.

iMac (Pro): Pretty good performance, but a little expensive for my target price. Also, I had just gotten a new 4K monitor, and I didn’t want to get rid of it so soon.

Mac Pro: Way too expensive.

Mac Mini: Probably pretty similar in performance and thermal characteristics to my MacBook Pro.

I didn’t want to spend too much on it — especially with the Apple Silicon Macs coming out, meaning Intel support is on its way out, limiting the longevity of whatever I’d be buying. I’ll probably get a 2nd or 3rd generation Apple Silicon Mac, so what I wanted is something high performance that makes sense to bridge the gap of a couple years.

This led me down the rabbit hole of building a Hackintosh. My target price was “less than $2,000” (the price also is due to tax reasons — I thought it would be more fun to build a PC than it would be to calculate depreciation over 4 years). r/hackintosh on Reddit was a huge help — it was an invaluable resource when picking parts. I would look through everyone’s success stories and pick similar (or identical) parts. Here’s my entry (from when it had Catalina — I have since upgraded it to Big Sur).

Using some parts from my previous PC, the new parts came out to about $1,500. Not bad for a 10-core i9-10900k with 32GB RAM and a 1TB SSD.

I’ve never done water cooling before, and it sounded pretty cool, so I got an all-in-one unit that was really easy to install. Building a custom water cooling loop sounds fun, but really time and tool intensive — not something I have a lot of here in Tokyo. Maybe next time.


All in all, doing this Hackintosh was a fun little project. There are a lot of excellent resources that hold your hand through the whole process. If you’re interested in trying it out for yourself, I recommend reading the OpenCore Install Guide and looking through the aforementioned Reddit community.

When I started out to write this blog post, I installed Catalina 10.15.7 with OpenCore 0.6.2. Since then, I’ve upgraded to Big Sur 11.0.1 with OpenCore 0.6.3. I’ve only done one major upgrade, but it was relatively smooth.

One word of warning, though: Hackintosh-ing is definitely not for the faint of heart, or someone who is not prepared to spend hours debugging some issue that involves reading a bunch of white text on a black background. I would not recommend this to anyone who doesn’t like fiddling with computers. I had a bunch of small, strange issues (Ethernet instability, hardware video decoding, to name a couple) that required multiple reboots and trial-and-error with the config.plist configuration file.

Thanks for reading! If you have any questions, don’t hesitate to leave a comment or send me a Tweet.

Categories
AWS English WordPress

WordPress on AWS Lambda (EFS Edition)

I previously wrote a post about running WordPress on AWS Lambda, but it was before EFS support was announced (EFS is a managed network file system AWS provides). Being able to use EFS completely changes the way WordPress works in Lambda (for the better!), so I felt it warranted a new blog post.

In addition, this time I’m using Terraform instead of SAM. This matches the existing infrastructure-as-code setup I use when I deploy infrastructure for clients. Here’s the Terraform module (source code).

Summary

It works. It’s OK. Check it out, it’s running here. It’s not the best, but it isn’t bad, either. The biggest performance bottleneck is the EFS filesystem, and there’s no getting around that. PHP is serving static assets bundled with WordPress as well, which adds to some latency (in this configuration, CloudFront is caching most of these files, however). Tuning opcache to cache files in memory longer helped a lot.

Because EFS is synchronized across all the instances of Lambda, online updates, installs, and uploads work as expected.

What You’ll Need

In this setup, Lambda is only used for running PHP — installing the initial WordPress files is done on an EC2 instance that has the EFS volume mounted. This is a list of what you’ll need.

  1. An AWS account.
  2. A VPC with Internet access through a NAT gateway or instance (comparison). This is important because EFS connectivity requires Lambda to be set up in a VPC, but it won’t have Internet access by default.
  3. Terraform (the module uses v0.12 syntax, so you’ll need to use v0.12.)
  4. A MySQL database (I’m using MySQL on RDS using the smallest instance available)
  5. An EC2 instance to perform the initial setup and install of WordPress.

For a list of the resources that Terraform will provision, take a look at the Resources page here.

Steps

These steps assume you’re running this Terraform module standalone — if you want to run it in the context of an existing Terraform setup, prepare to adjust accordingly.

If you’re following this step-by-step, be sure to choose the us-west-2 region. Lambda Layer that I’m using for this is only published in the us-west-2 region. I’m working on getting the layer published in other regions, but in the meantime, use my fork of the php-lambda-layer to create your own in the region of your choosing.

1. Start the EC2 instance.

(If it isn’t already running)

I’m using a t3a.nano instance. Install the amazon-efs-utils package to get ready for mounting the EFS volume.

Also, while you’re in the console, note down the ID of a Security Group that allows access to RDS and the IDs of the private subnets to launch Lambda in.

2. Get Terraform up and running.
$ git clone https://github.com/KotobaMedia/terraform-aws-wordpress-on-lambda-efs
$ cd ./terraform-aws-wordpress-on-lambda-efs

Create a file called local.auto.tfvars, and put the following contents in to it:

# An array of the Security Group IDs you listed in step 1.
security_group_ids = ["sg-XXX"]

# An array of the Subnet IDs you listed in step 1.
subnet_ids = ["subnet-XXX", "subnet-XXX", "subnet-XXX"]

If you want to use a custom domain name (instead of the default randomly-generated CloudFront domain name), set the acm_certificate_arn and domain_name variables as well.

Now, you’re ready to create the resources.

$ terraform apply

If you’re asked for your AWS credentials, Ctrl-C and try setting the authentication information via environment variables. I manage a lot of AWS accounts, so I use the AWS_PROFILE environment variable.

Terraform will ask you if you want to go ahead with the apply or not — look over the changes (the initial apply should not have any modifications or deletions), then respond yes.

When the apply has finished, you should see some outputs. If you don’t (or you already closed the window), you can always run terraform output. Keep this window open, you’ll need it in the next step.

3. Mount EFS on the EC2 instance.

First, we need to give the EC2 instance access to the EFS filesystem. Terraform created a security group for us (it’s in the efs_security_group_id output), so attach that to your EC2 instance.

Log in to your EC2 server, then mount the EFS filesystem (replace fs-XXXXX with the value of the efs_file_system_id output):

$ sudo -s
# mkdir /mnt/efs
# mount -t efs fs-XXXXX:/ /mnt/efs

If you’re having trouble mounting the filesystem, double check the security groups and take a look at the User Guide.

4. Install WordPress.

Now that the filesystem is mounted, we can finally proceed to install WordPress. Terraform automatically created a directory in the EFS filesystem (/mnt/efs/roots/wp-lambda-$RANDOM_STRING), so cd to there first. Download the latest version of WordPress, then extract the files there.

Now, you can go ahead with the famous five-minute install like you would with any other WordPress site! If you didn’t set a custom domain name, your site should be accessible at the domain name outputted at cloudfront_distribution_domain_name. If you did set a custom domain, then set a CNAME or alias to the CloudFront distribution domain name, then you should be able to access the site there.

Where to go from here

Here are some ideas for performance improvements that I haven’t tried, but should have some potential.

  • Upload files to S3 instead of WordPress. I use this plugin by Human Made: humanmade/S3-Uploads.
  • Experiment with adjusting the opcache settings in src/php.ini.
  • Use a lightweight nginx server to serve static assets from EFS to CloudFront.
  • Experiment with setting Cache-Control headers in handler.php for static files.

Limitations

There are a couple hard limits imposed by AWS due to the technical limitations of the infrastructure.

Here are some other limitations that you’ll have to keep in mind.

  • No FTP / SSH access — you’ll need to manage an EC2 instance if you need command line or direct file access.
  • All the considerations of accessing a connection-oriented database from Lambda. You can try using Aurora Serverless if you run in to connection problems. RDS Proxy may also be able to provide you with a solution.

Thanks!

Thanks for reading! If you have any questions or comments, please don’t hesitate to leave a comment or send me a tweet.

Categories
English

Habits I’ll be keeping after COVID-19

During the COVID-19 pandemic, schools and daycares have been closed, so my family decided to use this as an opportunity to make some habits to make sure we can get through this period with minimal interruptions to life and work. Here are some habits that have worked so well for us that we’re planning on keeping them, even after the kids go back to school / daycare.

Keeping a schedule

We use a schedule to make sure the time we eat, sleep, and do activities are at regular times every day. At first, I thought this would be a good tool to let the kids know when we were working and when we could play together, but it’s proven to be a useful tool to both adults and children. I’ve written a blog post in Japanese about this as well.

Cleaning the house every day

This is related to the schedule, but we clean up the house every day at 5 PM, start the Roomba, take a bath, then get ready for dinner. Before, we would clean up whenever we felt like it, and as you can imagine, the room got pretty messy. We’d probably vacuum once a week or so. Now, the house stays clean and it’s much less stressful.

Taking the trash out

No, this isn’t a euphemism. Before, we would let the trash (especially recyclables like glass bottles, cans, etc) stock up before bringing it to the trash room, but now that we’re a little more flexible on time, we can take it down to the trash room immediately.

Categories
Code Snippets English

A quick shortcut to open a Ruby gem in VS Code

While working on a Ruby project, I often find myself referring to the code of various libraries when it’s easier than looking up the documentation. For this, I used to use code (bundle show GEM_NAME), but recently I’ve been getting this warning:

[DEPRECATED] use `bundle info $GEM_NAME` instead of `bundle show $GEM_NAME`

Okay, that’s fine, but bundle info returns a bunch of stuff that would confuse VS Code:

> bundle info devise
  * devise (4.7.1)
	Summary: Flexible authentication solution for Rails with Warden
	Homepage: https://github.com/plataformatec/devise
	Path: /Users/keita/.asdf/installs/ruby/2.7.0/lib/ruby/gems/2.7.0/gems/devise-4.7.1

Luckily there’s bundle info $GEM_NAME --path. code (bundle info devise --path) is kind of long to type out every time, though, so I decided to make an alias.

I use the Fish shell, so the code here is written for that shell. Adapt it to your shell as required. You’ll also need the VS Code terminal integration installed for this to work.

function bundlecode
  if test -e ./Gemfile
    code (bundle info $argv[1] --path)
  else
    set_color -o red
    echo "Couldn't find `Gemfile`. Try again in a directory with a `Gemfile`."
    set_color normal
  end
end

Usage:

> bundlecode devise
# VS Code opens!
Categories
English

How I use Git

I’ve been using Git at work for around 10 years now. I started using Git with a GUI (Tower — back when I was eligible for the student discount!), but now I use the CLI for everything except complicated diffs and merges, where I use Kaleidoscope.

A question I get asked by my coworkers often is: “how in the world do you manage using Git without a GUI?”. This blog post is supposed to answer this question.

First, I use the Fish shell. It fits with the way I think. A lot of you probably use bash or zsh, that’s fine, there is lots of documentation on how to integrate Git with those shells. This is the relevant part of .config/fish/config.fish:

set __fish_git_prompt_show_informative_status 'yes'
set __fish_git_prompt_color_branch magenta
set __fish_git_prompt_color_cleanstate green
set __fish_git_prompt_color_stagedstate red
set __fish_git_prompt_color_invalidstate red
set __fish_git_prompt_color_untrackedfiles cyan
set __fish_git_prompt_color_dirtystate blue

function fish_prompt
  # ... 
  set_color normal
  printf ' %s' (prompt_pwd)
  printf '%s' (__fish_git_prompt)
  printf ' > '
end

I’ve omitted the irrelevant portions (status checking, # prompt when root, etc. If you want to see the full file, I’ve posted it as a gist.

On a clean working directory (that is, no changed files that haven’t been committed to the repository), this looks like this:

When updating some files, it will change to something like this:

This prompt doesn’t change in real time, so changes from other terminals won’t automatically change this prompt. I have a habit of tapping the “return” key to update the prompt.

To commit these changes:

The commands I use most often:

  • git add (if you only want to add a portion of a file, git add -p is your friend) / git commit
  • git push / git pull (git pull --rebase for feature branches being shared with other devs)
  • git diff @ — show all changes, staged or not, between the working directory and the latest commit of the branch you are on (@ is a alias for HEAD)
  • git diff --cached — show only changes that are being staged for the next commit
  • git status
  • git difftool / git mergetool (this will open Kaleidoscope)

This was obviously a very cursory, high-level look at how I use Git, but I hope it was useful. It’s been a long time since I’ve used a Git GUI full time, but whenever I do use one (for example, when helping a coworker), it feels clunky compared to using the CLI (that’s not saying I don’t have my complaints about the CLI — that’s another blog post 😇).

If you have any more questions, leave a comment or contact me on Twitter, and I’ll update this post with the answers.

Categories
English Tools Useful Utilities

“Logging in” to AWS ECS Fargate

I’m a big fan of AWS ECS Fargate. I’ve written in the past about managing ECS clusters, and with Fargate — all of that work disappears and is managed by AWS instead. I like to refer to this as quasi-serverless. Sorta-serverless? Almost-serverless? I’m open to better suggestions. 😂

There are a few limitations of running in Fargate, and this blog post will focus on working around one limitation: there’s easy way to get an interactive command line shell within a running Fargate container.

The way I’m going to establish an interactive session inside Fargate is similar to how CircleCI or Heroku does this: start a SSH server in the container. This requires two components: the SSH server itself, which will be running in Fargate, and a tool to automate launching the SSH server. Most of this blog post will be about the tool to automate launching the server, called ecs-fargate-login.

If you want to skip to the code, I’ve made it available on GitHub using the MIT license, so feel free to use it as you wish.

How it works

This is what ecs-fargate-login does for you, in order:

  1. Generate a temporary SSH key pair.
  2. Use the ECS API to start a one-time task, setting the public key as an environment variable.
    • When the SSH server boots, it reads this environment variable and adds it to the list of authorized keys.
  3. Poll the ECS API for the IP address of the running task. ecs-fargate-login supports both public and private IPs.
  4. Start the ssh command and connect to the server.

When the SSH session finishes, ecs-fargate-login will make sure the ECS task is stopping.

Use cases

Most of my clients use Rails, and Rails provides an interactive REPL (read-eval-print loop) within the Rails environment. This REPL is useful for running one-off commands like creating new users or fixing some data in the database, checking and/or clearing cache items, to mention a few common tasks. Rails developers are accustomed to using the REPL, so while not entirely necessary (in the past, I usually recommended fixing data using direct database access or with one-time scripts in the application repository), it is a nice-to-have feature.

In conclusion

I don’t use this tool daily, but probably a few times a week. A few clients of mine use it as well, and they’re generally happy with how it works. However, if you have any recommendations about how it could be improved, or how the way the tool itself is architected could be improved, I’m always open to discussion. This was my first serious attempt at writing Golang code, so there are probably quite a few beginner mistakes in the code, but it should work as expected.