Shipping Events from Fluentd to Elasticsearch

We use fluentd to process and route log events from our various applications. It’s simple, safe, and flexible. With at-least-once delivery by default, log events are buffered at every step before they’re sent off to the various storage backends. However, there are some caveats with using Elasticsearch as a backend.

Currently, our setup looks something like this:

The general flow of data is from the application, to the fluentd aggregators, then to the backends (mainly Elasticsearch and S3). If a log event warrants a notification, it’s published to an SNS topic, which in turn triggers a Lambda function that sends the notification to Slack.

The fluentd aggregators are launched by an auto-scaling group, but they aren’t behind a load balancer. Instead, a Lambda function subscribed to the auto-scaling group’s lifecycle notifications updates a DNS round-robin record with the private IP addresses of the fluentd aggregator instances.
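
The Lambda function itself is specific to our setup, so I won’t include it here, but the update it performs is essentially a Route 53 UPSERT. As a rough sketch (the hosted zone ID, record name, and addresses below are placeholders, not real values), the equivalent AWS CLI call looks like this:

$ aws route53 change-resource-record-sets \
    --hosted-zone-id ZEXAMPLE123 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "fluentd.internal.example.com.",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [
            {"Value": "10.0.1.10"},
            {"Value": "10.0.2.11"}
          ]
        }
      }]
    }'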

We use the fluent-plugin-elasticsearch plugin to output log events to Elasticsearch. However, because this plugin uses the bulk insert API and does not verify whether events were actually inserted into the cluster successfully, it is dangerous to rely on it exclusively (hence the S3 backup).
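
The output side of the aggregators is a copy-style match that writes each event to both backends. A rough sketch (the hostname, bucket, and buffer paths are placeholders, and the exact options depend on your fluentd and plugin versions):

<match app.**>
  @type copy
  <store>
    # primary destination: Elasticsearch via the bulk API
    @type elasticsearch
    host elasticsearch.internal.example.com
    port 9200
    logstash_format true
    buffer_type file
    buffer_path /var/log/td-agent/buffer/es
    flush_interval 10s
  </store>
  <store>
    # backup destination: raw events archived to S3
    @type s3
    s3_bucket example-log-archive
    s3_region ap-northeast-1
    path logs/
    buffer_path /var/log/td-agent/buffer/s3
    time_slice_format %Y/%m/%d/%H
  </store>
</match>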

IAM Policy for KMS-Encrypted Remote Terraform State in S3

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket name>/*",
                "arn:aws:s3:::<bucket name>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:GenerateDataKey"
            ],
            "Resource": [
                "<arn of KMS key>"
            ]
        }
    ]
}
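
With this policy attached to the user or role running Terraform, pointing Terraform at the encrypted remote state looks something like this (the key and region below are placeholders, not actual values):

$ terraform remote config \
    -backend=s3 \
    -backend-config="bucket=<bucket name>" \
    -backend-config="key=terraform.tfstate" \
    -backend-config="region=ap-northeast-1" \
    -backend-config="encrypt=true" \
    -backend-config="kms_key_id=<arn of KMS key>"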

Don’t forget to update the KMS key policy, too. I spent a bit of time trying to figure out why it wasn’t working, until CloudTrail helpfully told me that the kms:GenerateDataKey permission was also required. Turn CloudTrail on today, even if you don’t need the auditing; it’s an excellent permissions-debugging tool.
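
Concretely, the key policy needs a statement granting those same kms: actions to the principal that runs Terraform. A minimal sketch (the principal ARN is a placeholder):

{
    "Effect": "Allow",
    "Principal": {
        "AWS": "<arn of IAM user or role>"
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "*"
}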

Authenticating Linux logins via LDAP (Samba / Active Directory)

I’ve been working on the infrastructure for a fleet of a few dozen Amazon EC2 instances for the past week, and with a rapidly growing team, we decided it was time to set up a central authentication / authorization service.

So, that meant setting up some sort of LDAP server.

I was a bit intimidated at first (the most I’d done was watch people manage, and complain about, Active Directory), but I finally got it set up. Here are the components:

  • AWS Directory Service (Simple Directory) is used as the directory server.
  • A t2.large Windows Server instance used to administer the directory (usually stopped).
  • A bunch of VPC settings to make the directory service the default DNS resolver of the VPC.
  • An Ansible play I made to do the following (a rough sketch of the resulting configuration appears after this list):
    • Join the instance to the directory.
    • Configure sshd to pull public keys from the directory.
    • Add an access filter to allow access for users who are members of the appropriate groups.
    • Update sudoers to allow sudo for users who are members of the appropriate groups.

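For the last three items, the play mostly drops configuration like the following onto each instance. A rough sketch, assuming realmd/sssd (which is what realm join uses) and placeholder domain and group names:

# /etc/ssh/sshd_config: have sshd ask SSSD for a user's public keys
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser nobody

# /etc/sssd/sssd.conf (fragment): only allow members of certain groups to log in
[domain/example.com]
access_provider = simple
simple_allow_groups = linux-users, linux-admins

# /etc/sudoers.d/directory-admins: sudo for members of the admin group
%linux-admins ALL=(ALL) ALL
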
The first three weren’t too hard; Amazon’s documentation and tutorials cover them well. I recommend reading them in this order:

  1. Tutorial: Create a Simple AD Directory.
  2. Create a DHCP Options Set.
  3. Joining a Windows Instance to an AWS Directory Service Domain — read the limitations and prerequisites (you’ll need a special EC2 IAM Role) — then skip to “Joining a Domain Using the Amazon EC2 Launch Wizard”.
  4. Delegating Directory Join Privileges — this is important for security.
  5. Manually Add a Linux Instance.

On step 5, the realm join command will prompt for a password. I spent a few days figuring out the best way to automate this. I tried creating a Kerberos keytab and using it for authentication, but I wasn’t getting consistent results (for reasons that are probably clear to someone who knows a lot about Kerberos, the realm join would work, but after a realm leave, Kerberos would complain that the join account no longer existed, even though I couldn’t find any differences in the AD admin tools). I eventually decided to encrypt the directory join account password in an Ansible vault and use the Ansible expect module to automate the password entry.
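
The resulting task ends up looking roughly like this (the domain, join account, and vault variable name are placeholders):

- name: Join the instance to the directory
  expect:
    command: realm join --user=join-account example.com
    responses:
      # answer the interactive password prompt with the vaulted secret
      (?i)password: "{{ vault_directory_join_password }}"
  no_log: true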

To do

I’m currently using the Active Directory “Users & Groups” administration tool to administer users, but this involves booting a Windows instance every time a change to the directory is made. Ideally, I want a simple web-based tool to add/remove/change users, their SSH public keys, and groups. There are a few web-based tools out there already, but the ones I’ve come across are either too complicated or don’t also manage SSH keys.

Upgrading PostgreSQL on Ubuntu

I recently started using Ubuntu Linux on my main development machine. That means that my PostgreSQL database is running under Ubuntu, as well. I’ve written guides to upgrading PostgreSQL using Homebrew in the past, but the upgrade process under Ubuntu was much smoother.

These steps assume that you’re on Ubuntu 16.04 LTS with an existing PostgreSQL 9.5 cluster, and that PostgreSQL 9.6 has already been installed via apt.

  1. Stop the postgresql service.
    $ sudo service postgresql stop
    
  2. Rename the newly created (and still empty) PostgreSQL 9.6 cluster out of the way.
    $ sudo pg_renamecluster 9.6 main main_pristine
    
  3. Upgrade the 9.5 cluster (this creates a new 9.6/main cluster from the 9.5 data).
    $ sudo pg_upgradecluster 9.5 main
    
  4. Start the postgresql service.
    $ sudo service postgresql start
    

Now, when running pg_lsclusters, you should see something like the following:

Ver Cluster       Port Status Owner    Data directory                        Log file
9.5 main          5434 online postgres /var/lib/postgresql/9.5/main          /var/log/postgresql/postgresql-9.5-main.log
9.6 main          5432 online postgres /var/lib/postgresql/9.6/main          /var/log/postgresql/postgresql-9.6-main.log
9.6 main_pristine 5433 online postgres /var/lib/postgresql/9.6/main_pristine /var/log/postgresql/postgresql-9.6-main_pristine.log

Verify everything is working as expected, then feel free to remove the 9.5/main and 9.6/main_pristine clusters (pg_dropcluster).
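
For example, this stops and removes both clusters:

$ sudo pg_dropcluster --stop 9.5 main
$ sudo pg_dropcluster --stop 9.6 main_pristine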

These cluster commands may be available in other distros, but I haven’t been able to check them. YMMV. Good luck!

macOS Sierra

Here are the headline features of Sierra, and my thoughts about them.

Siri

Don’t use it on the phone, won’t use it on the Mac. It would be nice if I could use it by typing, though.

Universal Clipboard

This seems like a massive security risk. It works via iCloud, so if someone (like my daughter) is using my iPad, then it will sync the clipboard to that, too. How do I turn it off?

Auto Unlock

I can’t use it, because “two-factor authentication” is apparently completely different from “two-step verification”, and the two are incompatible. I can switch to 2FA, but that means I have to log out of and back in to everything, right? Will that help my iMessages, which already aren’t syncing between my Mac and my iPhone?

(Side rant: I can’t use any of the iMessage features because I’m completely on Telegram now, because of the aforementioned troubles with iMessages being unreliable. Sometimes they come to the Mac. Sometimes the iPhone. Sometimes both. Who knows! I need some kind of cloud forecast app to know. Or learn how to read the stars. Or something. Is this the kind of stuff they put in horoscopes?)

Apple Pay

Not available in Japan yet (but will be coming soon, to the “major” credit card brands: JCB and MasterCard. VISA isn’t “major”, I suppose).

Photos

For some reason, the photos themselves sync between machines, but the facial recognition metadata doesn’t? Progress should mean reducing the number of things we have to babysit, not increasing it. I don’t want to babysit my photos by correcting facial recognition guesses on both machines, across 5,000+ photos.

Messages

Hmm.

iCloud Drive

Is it better than Dropbox yet?

Optimized Storage

All of a sudden, the voice of one of my podcast panelists simply vanished from the mix. I quit and re-launched Logic, only to be told that the file in question was missing. Sure enough, a visit to Finder revealed that Sierra had “optimized” my storage and removed that file from my local drive.

(From the macOS Sierra review by Jason Snell at sixcolors.com.)

No thanks.

Picture in Picture

Okay, this might be useful sometimes.

Tabs

“Finally”.

Conclusion

2/10

Look, I don’t mean to disparage the excellent work of the team of engineers responsible for macOS at Apple. They’re doing a good job, and the upgrade process to Sierra was smooth. There are just a lot of features in this release that don’t excite me. I wish they had fixed other things instead, like that silly facial recognition sync issue. Maybe they will in the future. I really hope so.

Also: Don’t say “then just use Linux”, because I already do. It’s not as elegant as macOS, but it’s reliable for the work I do (web programming), and when something does go wrong, I know how to fix it.

If you want to get into an argument with me about how I’m wrong, please make an App.net account and contact me there; I’m @keita.

Convox

I stumbled upon Convox a couple of weeks ago and found it pretty interesting. It’s led by a few people formerly of Heroku, and it certainly feels like it: a simple command-line interface to manage your applications on AWS, with almost no AWS-specific configuration required.

An example of how simple it is to deploy a new application:

$ cd ~/my-new-application
$ convox apps create
$ convox apps info
Name       my-new-application
Status     creating
Release    (none)
Processes  (none)
Endpoints  
$ convox deploy
Deploying my-new-application
Creating tarball... OK
Uploading... 911 B / 911 B  100.00 % 0       
RUNNING: tar xz
...
... wait 5-10 minutes for the ELB to be registered ...
$ convox apps info
Name       my-new-application
Status     running
Release    RIIDWNBBXKL
Processes  web
Endpoints  my-new-application-web-L7URZLD-XXXXXXX.ap-northeast-1.elb.amazonaws.com:80 (web)

Now you can access your application at the ELB address listed under “Endpoints”.
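
Convox builds the app from the Dockerfile and docker-compose.yml in the project directory. A minimal docker-compose.yml for a single web process might look like this (the service name and port here are just an assumed example, not from my app):

web:
  build: .
  ports:
    - 80:3000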

I haven’t used Convox with more complex applications, but it definitely looks interesting. It uses a little more infrastructure than I’d like for small personal projects (a dedicated ELB for the service manager, for example). However, when you’re managing multiple large deploys of complex applications, the time saved by having Convox do the infrastructure work for you seems like it would pay for itself.

The philosophy behind Convox:

The Convox team and the Rack project have a strong philosophy about how to manage cloud services. Some choices we frequently consider:

  • Open over Closed
  • Integration over Invention
  • Services over Software
  • Robots over Humans
  • Shared Expertise vs Bespoke
  • Porcelain over Plumbing

I want to focus on “Robots over Humans” here. One of AWS’s greatest strengths is that almost everything can be automated via an API. However, I feel its greatest weakness is that those APIs are not very user-friendly; they’re disjointed and inconsistent between services. The AWS-provided GUI “service dashboard” is packed with features, and you can see some similarity in the UI elements, but it basically stops there. Look at the Route 53, ElastiCache, and EC2 dashboards: they’re completely different.

Convox, in my limited experience, abstracts all of this unfriendliness away and presents you with a simple command line interface to allow you to focus on your core competency — being an application developer.

I, personally, am an application / infrastructure developer (some may call me DevOps, but I’m not particularly attached to that title), and Convox excites me because it has the potential to eliminate half of the work necessary to get a secure, private application cluster running on AWS.