Choria Playbooks DSL

I previously wrote about Choria Playbooks. As a reminder, they are playbooks written in YAML format that can orchestrate many different kinds of tasks, data, inputs and discovery systems, not exclusively ones from MCollective. They integrate with tools like terraform, consul, etcd, Slack, Graphite, Webhooks, Shell scripts, Puppet PQL and of course MCollective.

I mentioned in that blog post that I did not think a YAML based playbook is the way to go.

I am very pleased to announce that with the release of Choria 0.6.0 playbooks can now be written with the Puppet DSL. I am so pleased with this that, effective immediately, the YAML DSL is deprecated and set for a rather short lifetime.

A basic example can be seen here, it will:

  • Reuse a company specific playbook and notify Slack of the action about to be taken
  • Discover nodes using PQL in a specified cluster and verify they are using a compatible Puppet Agent
  • Obtain a lock in Consul ensuring only 1 member of the team performs critical tasks related to the life cycle of the Puppet Agent at a time
  • Disable Puppet on the discovered nodes
  • Wait for up to 200 seconds for the nodes to become idle
  • Release the lock
  • Notify Slack that the task completed

# Disables Puppet and waits for all in-progress catalog compiles to end
plan acme::disable_puppet_and_wait (
  Enum[alpha, bravo] $cluster
) {
  choria::run_playbook(acme::slack_notify, message => "Disabling Puppet in cluster ${cluster}")

  $puppet_agents = choria::discover("mcollective",
    discovery_method => "choria",
    agents => ["puppet"],
    facts => ["cluster=${cluster}"],
    uses => { puppet => ">= 1.13.1" }
  )

  $ds = {
    "type" => "consul",
    "timeout" => 120,
    "ttl" => 60
  }

  choria::lock("locks/puppet.critical", $ds) || {
    choria::task(
      "action" => "puppet.disable",
      "nodes" => $puppet_agents,
      "fail_ok" => true,
      "silent" => true,
      "properties" => {"message" => "restarting puppet server"}
    )

    choria::task(
      "action"    => "puppet.status",
      "nodes"     => $puppet_agents,
      "assert"    => "idling=true",
      "tries"     => 10,
      "silent"    => true,
      "try_sleep" => 20,
    )
  }

  choria::run_playbook(acme::slack_notify,
    message => sprintf("Puppet disabled on %d nodes in cluster %s", $puppet_agents.count, $cluster)
  )
}

As you can see we can re-use playbooks and build up a nice cache of utilities that the entire team can use. The support for locks and data sharing ensures safe and coordinated use of this style of system.

You can get this today if you use Puppet 5.4.0 and Choria 0.6.0. Refer to the Playbook Docs for more details, especially the Tips and Patterns section.

Why Puppet based DSL?

The Plan DSL as you’ll see in the Background and History part later in this post is something I have wanted a long time. I think the current generation Puppet DSL is fantastic and really suited to this problem. Of course having this in the Plan DSL I can now also create Ruby versions of this and I might well do that.

The Plan DSL though has many advantages:

  • Many of us already know the DSL
  • There are vast amounts of documentation and examples of Puppet code, you can get trained to use it.
  • The other tools in the Puppet stable support plans – you can use puppet strings to document your Playbooks
  • The community around the Puppet DSL is very strong, I imagine soon rspec-puppet might support testing Plans and so by extension Playbooks. This appears to be already possible but not quite as easy as it could be.
  • We have a capable and widely used way of sharing these between us in the Puppet Forge

I could not compete with this in any language I might want to support.

Future of Choria Playbooks

As I mentioned the YAML playbooks are not long for this world. I think they were an awesome experiment and I learned a ton from them, but these Plan based Playbooks are such a massive step forward that I just can’t see the YAML ones serving any purpose whatsoever.

This release supports both YAML and Plan based Playbooks, the next release will ditch the YAML ones.

At that time a LOT of code will be removed from the repositories and I will be able to very significantly simplify the supporting code. My goal is to make it possible to add new task types, data sources, discovery sources etc really easily, perhaps even via Puppet modules, so the ecosystem around these will grow.

I will be doing a bunch of work on the Choria Plugins (agent, server, puppet etc) and these might start shipping small Playbooks that you can use in your own Playbooks. The one that started this blog post would be a great candidate to supply as part of the Choria suite and I’d like to do that for this and many other plugins.

Background and History

For many years I have wanted Puppet to move in a direction that might one day support scripts – perhaps even become a good candidate for shell scripts, not at the expense of the CM DSL but as a way to reward people for knowing the Puppet Language. I wanted this for many reasons but a major one was because I wanted to use it as a DSL to write orchestration scripts for MCollective.

I did some proof of concepts of this late in 2012, you can see the fruits of this POC here, it allowed one to orchestrate MCollective tasks using Puppet DSL and a Ruby DSL. This was interesting but the DSL as it was then was no good for this.

I also made a pure YAML Puppet DSL that deeply incorporated Hiera and remained compatible with the Puppet DSL. This too was interesting and in hindsight given the popularity of YAML I think I should have given this a lot more attention than I did.

Neither of these really worked for what I needed. Around the time Henrik Lindberg started talking about massive changes to the Puppet DSL and I think our first ever conversation covered this very topic – this must have been back in 2012 as well.

More recently I worked on YAML based playbooks for Choria, a sample can be seen in the old Choria docs. This is about the closest I got to something workable: we have users in the wild using these and having success with them. As an exploration they were super handy and taught me loads.

Fast forward to Puppet Conf 2017 and Puppet Inc announced something called Puppet Plans. These are basically script-like, uncompiled (kind of), top-down executed and aimed at use within your CLI much like you would a script. This was fantastic news. Unfortunately the reality ended up with these locked up inside their new SSH based orchestrator called Bolt. Due to some very unfortunate technical direction and decision making Plans are entirely unusable by Puppet users without Bolt. Bolt vendors its own Puppet and Facter and so it’s unaware of the AIO Puppet.

Ideally I would want to use Plans as maintained by Puppet Inc for my Playbooks but the current status of things is that the team just is not interested in moving in that direction. Thus in the latest version of Choria I have implemented my own runner, result types, error types and everything needed to write Choria Playbooks using the Puppet DSL.


I am really pleased with how these playbooks turned out and am excited for what I can provide to the community in the future. There are no doubt some rough edges today in the implementation and documentation, your continued feedback and engagement in the Choria community around these would ensure that in time we will have THE Playbook system in the Puppet Eco system.

Prometheus experiments with docker-compose

As 2018 rolls along the time has come to rebuild parts of my homelab again. This time I’m looking at my monitoring and metrics setup, which is based on sensu and graphite, and planning some experiments and evaluations using Prometheus. In this post I’ll show how I’m setting up my tests and provide the Prometheus experiments with docker-compose source code in case it makes your own experiments a little easier to run.

My starting requirements were fairly standard. I want to use containers where possible. I want to test lots of different backends and I want to be able to pick and choose which combinations of technologies I run for any particular tests. As an example I have a few little applications that make use of redis and some that use memcached, but I don’t want to be committed to running all of the backing services for each smaller experiment. In terms of technology I settled on docker-compose to help keep the container sprawl in check while also enabling me to specify all the relationships. While looking into compose I found Understanding multiple Compose files and my basic structure began to emerge.

Starting with prometheus and grafana themselves I created the prometheus-server directory and added a basic prometheus config file to configure the service. I then added configuration for each of the things it was to collect from; prometheus and grafana in this case. Once these were in place I added the prometheus and grafana docker-compose.yaml file and created the stack.

docker-compose -f prometheus-server/docker-compose.yaml up -d

> docker-compose -f prometheus-server/docker-compose.yaml ps
Name                            Command         State   Ports
prometheusserver_grafana_1      /               Up      ->3000/tcp
prometheusserver_prometheus_1   /bin/prom ...   Up      ->9090/tcp

After manually configuring the prometheus data source in Grafana, all of which is covered in the README, you have a working prometheus scraping itself and grafana, and a grafana that allows you to experiment with presenting the data.

While this is a good first step I need visibility into more than the monitoring system itself, so it’s time to add another service. Keeping our goal of being modular in mind I decided to break everything out into separate directories and isolate the configuration. Adding a new service is as simple as adding a redis-server directory and writing a docker-compose file to run redis and the prometheus exporter we use to get metrics from it. This part is simple as most of the work is done for us. We use third party docker containers and everything is up and running. But how do we add the redis exporter to the prometheus targets? That’s where docker-compose’s merging behaviour shines.

In our base docker-compose.yaml file we define the prometheus service and the volumes assigned to it:

services:
  prometheus:
    image: prom/prometheus:v2.1.0
    ports:
      - 9090:9090
    networks:
      - public
    volumes:
      - prometheus_data:/prometheus
      - ${PWD}/prometheus-server/config/prometheus.yml:/etc/prometheus/prometheus.yml
      - ${PWD}/prometheus-server/config/targets/prometheus.json:/etc/prometheus/targets/prometheus.json
      - ${PWD}/prometheus-server/config/targets/grafana.json:/etc/prometheus/targets/grafana.json
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

You can see we’re mounting individual target files into prometheus for it to probe. Now in our docker-compose-prometheus/redis-server/docker-compose.yaml file we’ll reference back to the existing prometheus service and add to the volumes array.

services:
  prometheus:
    volumes:
      - ${PWD}/redis-server/redis.json:/etc/prometheus/targets/redis.json

Rather than overriding the array, this incomplete service configuration adds another element to it, allowing us to build up our config over multiple docker-compose files. In order for this to work we have to run the compose commands with each config specified every time, resulting in the slightly hideous:

docker-compose \
  -f prometheus-server/docker-compose.yaml \
  -f redis-server/docker-compose.yaml \
  up -d

Once you’re running a stack with 3 or 4 components you’ll probably reach for aliases and add a base docker-compose replacement:

alias dc='docker-compose -f prometheus-server/docker-compose.yaml -f redis-server/docker-compose.yaml'

and then call that with actual commands like dc up -d and dc logs. Adding your own application to the testing stack is as easy as adding a backing resource. Create a directory and the two config files and everything should be hooked in.
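
If the alias gets unwieldy, the same idea can be scripted. The sketch below is a hypothetical helper, not part of the repository: the function name is made up, and it assumes every component lives in its own directory containing a docker-compose.yaml, as in the layout described above. It only builds the argument list and leaves running it to you.

```python
def compose_command(components, *args):
    """Build a docker-compose argv that layers one compose file per component."""
    command = ["docker-compose"]
    for component in components:
        # Assumption: each component directory holds its own docker-compose.yaml
        command += ["-f", f"{component}/docker-compose.yaml"]
    return command + list(args)


if __name__ == "__main__":
    # Print the command rather than running it
    print(" ".join(compose_command(["prometheus-server", "redis-server"], "up", "-d")))
```

The resulting list can be handed to subprocess.run if you want the script to start the stack itself.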

It’s early in the process and I’m sure to find issues with this naive approach but it’s enabled me to create arbitrarily complicated prometheus test environments and start evaluating its ecosystem of plugins and exporters. I’ll add more to it and refine where possible (the manual steps should be reduced by Grafana 5, for example) and hopefully it’ll remain a viable way for myself and others to run quick, ad hoc tests.

Replicating NATS Streams between clusters

I’ve mentioned NATS before, the fast and lightweight message broker, but I haven’t yet covered its sister product NATS Streaming, so first some intro.

NATS Streaming is in the same space as Kafka. It’s a stream processing system and, like NATS, it’s super lightweight: it is delivered as a single binary and you do not need anything like Zookeeper. It uses normal NATS for communication and on top of that builds streaming semantics. Like NATS, and because it uses NATS, it is not well suited to running over long cluster links, so you end up with LAN local clusters only.

This presents a challenge since very often you wish to move data out of your LAN. I wrote a Replicator tool for NATS Streaming which I’ll introduce here.


First I guess it’s worth covering what Streaming is, I should preface also that I am quite new in using Stream Processing tools so I am not about to give you some kind of official answer but just what it means to me.

In a traditional queue like ActiveMQ or RabbitMQ, which I covered in my Common Messaging Patterns posts, you do have message storage, persistence etc, but those who consume a specific queue are effectively a single group of consumers: messages either go to all of them or are load shared between them, all at the same pace. You can’t really go back and forth over the message store independently as a client. A message gets ack’d once and once it’s been ack’d it’s done being processed.

In a Stream your clients each have their own view over the Stream, they all have their unique progress and point in the Stream they are consuming and they can move backward and forward – and indeed join a cluster of readers if they so wish and then have load balancing with the other group members. A single message can be ack’d many times but once ack’d a specific consumer will not get it again.

This is to me the main difference between a Stream processing system and just a middleware. It’s a huge deal. Without it you will find it hard to build very different business tools centred around the same stream of data since in effect every message can be processed and ack’d many many times vs just once.
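
To make the difference concrete, here is a toy in-memory model of those semantics. It is only an illustration and says nothing about how NATS Streaming is implemented; the class and method names are invented for this sketch. The log is append-only and every consumer keeps its own cursor, so an ack by one consumer never hides a message from another.

```python
class Stream:
    """Toy stream: an append-only log plus one cursor per consumer."""

    def __init__(self):
        self.log = []
        self.cursors = {}  # consumer name -> index of the next message to deliver

    def publish(self, message):
        self.log.append(message)

    def next(self, consumer):
        """Return the next unacked message for this consumer, or None."""
        position = self.cursors.get(consumer, 0)
        return self.log[position] if position < len(self.log) else None

    def ack(self, consumer):
        """Ack the current message for this consumer only."""
        position = self.cursors.get(consumer, 0)
        if position < len(self.log):
            self.cursors[consumer] = position + 1

    def seek(self, consumer, position):
        """Move one consumer backward or forward without affecting the others."""
        self.cursors[consumer] = position
```

Two consumers reading the same Stream each see every message, and either of them can seek back to 0 and replay the data without disturbing the other.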

Additionally Streams tend to have well defined ordering behaviours and message delivery guarantees and they support clustering etc. much like normal middleware has. There’s a lot of similarity between streams and middleware so it’s a bit hard sometimes to see why you won’t just use your existing queueing infrastructure.

Replicating a NATS Stream

I am busy building a system that will move Choria registration data from regional data centres to a global store. The new Go based Choria daemon has a concept of a Protocol Adapter which can receive messages on the traditional NATS side of Choria and transform them into Stream messages and publish them.

This gets me my data from the high frequency, high concurrency updates from the Choria daemons into a Stream – but the Stream is local to the DC. Indeed in the DC I do want to process these messages to build a metadata store there but I also want to process these messages for replication upward to my central location(s).

Hence the importance of the properties of Streams that I highlighted above – multiple consumers with multiple views of the Stream.

There are basically 2 options available:

  1. Pick a message from a topic, replicate it, pick the next one, one after the other in a single worker
  2. Have a pool of workers form a queue group and let them share the replication load

At the basic level the first option will retain ordering of the messages – order in the source queue will be the order in the target queue. NATS Streaming will try to redeliver a message that timed out delivery and it won’t move on till that message is handled, thus ordering is safe.

With the 2nd option you have multiple workers, so you have no way to retain ordering of the messages: workers will go at different rates and retries can happen in any order. It will be much faster though.

I can envision a 3rd option where I have multiple workers replicating data into a temporary store where on the other side I inject them into the queue in order but this seems super prone to failure, so I only support these 2 methods for now.
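
The ordering guarantee of the single worker option comes down to never moving past a message that has not been delivered. A minimal sketch of that loop follows; it is not the replicator's code, and the function name, the publish callable and the retry limit are all assumptions made for illustration.

```python
def replicate_in_order(messages, publish, max_attempts=5):
    """Copy messages one at a time, retrying each until it is accepted.

    `publish` is any callable returning True on success. Because the loop
    does not advance until the current message is delivered, the target
    receives messages in source order.
    """
    for message in messages:
        for _ in range(max_attempts):
            if publish(message):
                break
        else:
            # Hypothetical policy: stop rather than skip, so order is never broken
            raise RuntimeError(f"gave up on {message!r} after {max_attempts} attempts")
```

A pool of workers runs many such loops at once, which is where the speed comes from and also why retries can reorder messages.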

Limiting the rate of replication

There is one last concern in this scenario, I might have 10s of data centres all with 10s of thousands of nodes. At the DC level I can handle the rate of messages but at the central location where I might have 10s of DCs x 10s of thousands of machines if I had to replicate ALL the data at near real time speed I would overwhelm the central repository pretty quickly.

Now in the case of machine metadata you probably want the first piece of metadata immediately but from then on it’ll be a lot of duplicated data with only small deltas over time. You could be clever and only publish deltas but you have the problem then that should a delta publish go missing you end up with an inconsistent state – this is something that will happen in distributed systems.

So instead I let the replicator inspect your JSON, if your JSON has something like fqdn in it, it can look at that and track it and only publish data for any single matching sender every 1 hour – or whatever you configure.

This has the effect that this kind of highly duplicated data is handled continuously at the edge but only gets a snapshot replication upwards once an hour for any given node. This solves the problem neatly for me without any risk of deltas being lost, and it’s also a lot simpler to implement.
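
The inspect-and-age idea can be sketched in a few lines. This is an illustration of the technique only, not the replicator's implementation: the class name, the choice to pass untrackable messages through, and the use of plain seconds for the age are assumptions.

```python
import json


class AgeLimiter:
    """Let one message per sender through every `max_age` seconds."""

    def __init__(self, inspect_key, max_age):
        self.inspect_key = inspect_key
        self.max_age = max_age
        self.last_sent = {}  # sender value -> time of the last replicated message

    def should_replicate(self, raw_message, now):
        try:
            sender = json.loads(raw_message).get(self.inspect_key)
        except (ValueError, AttributeError):
            sender = None
        if not isinstance(sender, str):
            # Assumption: messages we cannot track are replicated as-is
            return True
        previous = self.last_sent.get(sender)
        if previous is not None and now - previous < self.max_age:
            return False
        self.last_sent[sender] = now
        return True
```

With AgeLimiter("fqdn", 3600) a node's first message is replicated immediately and later ones are dropped until an hour has passed, so the central store always receives full snapshots rather than deltas.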

Choria Stream Replicator

So finally I present the Choria Stream Replicator. It does all that was described above with a YAML configuration file, something like this:

debug: false                     # default
verbose: false                   # default
logfile: "/path/to/logfile"      # STDOUT default
state_dir: "/path/to/statedir"   # optional
topics:
    cmdb:                        # label for this replication job (assumed name)
        topic: acme.cmdb
        source_url: nats://source1:4222,nats://source2:4222
        source_cluster_id: dc1
        target_url: nats://target1:4222,nats://target2:4222
        target_cluster_id: dc2
        workers: 10              # optional
        queued: true             # optional
        queue_group: cmdb        # optional
        inspect: host            # optional
        age: 1h                  # optional
        monitor: 10000           # optional
        name: cmdb_replicator    # optional

Please review the README document for full configuration details.

I’ve been running this in a test DC with 1k nodes for a week or so and I am really happy with the results, but be aware this is new software so due care should be given. It’s available as RPMs, has a Puppet module, and I’ll upload some binaries on the next release.

A short 2017 review

It’s time for a little 2017 navel gazing. Prepare for a little self-congratulation and a touch of gushing. You’ve been warned. In general my 2017 was a decent one in terms of tech. I was fortunate to be presented a number of opportunities to get involved in projects and chat to people that I’m immensely thankful for and I’m going to mention some of them here to remind myself how lucky you can be.

Let’s start with conferences. I was fortunate enough to attend a handful of them in 2017. Scale Summit was, as always, a great place to chat about our industry. In addition to the usual band of rascals I met Sarah Wells in person for the first time and was blown away by the breadth and depth of her knowledge. She gave a number of excellent talks over 2017 and they’re well worth watching. The inaugural Jeffcon filled in for a lack of Serverless London (fingers crossed for 2018) and was inspiring throughout, from the astounding keynote by Simon Wardley all the way to the after conference chats.

I attended two DevopsDays, London, more about which later, and Stockholm. It was the first in Sweden and the organisers did the community proud. In a moment of annual leave burning I also attended Google Cloud and AWS Summits at the Excel centre. It’s nice to see tech events so close to where I’m from. I finished the year off with the GDS tech away day, DockerCon Europe and Velocity EU.

DevopsDays holds a special place in my heart as the conference and community that introduced me to so many of my peers that I heartily respect. For me the biggest lasting contribution of Patrick’s is building those bridges. When the last “definition of Devops” post is made I’ll still cherish the people I met from that group of very talented folk. That’s one of the reasons I was happy to be involved in the organisation of my second London DevOps. You’d be amazed at the time, energy and passion the organisers, speakers and audience invest in a DevopsDays event. But it really does show on the day(s).

I was also honoured to be included in the Velocity Europe Program Committee. Velocity has always been one of the important events of our industry and to go from budgeting most of a year in advance to attend to being asked to help select from the submitted papers, and even more than that, be a session chair, was something I’m immensely proud of and thankful to James Turnbull for even thinking of me. The speakers, some of whom were old hands at large events and some giving their first conference talk (in their second language no less!), were a pleasure to work with and made a nerve wracking day so much better than I could have hoped. It was also a stark reminder of how much I hate speaking in front of a room full of people.

Moving away from gushing over conferences, I published a book. It was a small experiment and it’s been very educational. It’s sold a few copies, made enough to pay for the domain for a few years and led to some interesting conversations with readers. I also wrote a few Alexa skills. While they’re not the more complicated or interesting bits of code from last year they have a bit of a special significance to me. I’m from a very non-technical background so it’s nice for my family to actually see, or in this case hear, something I’ve built.

Other things that helped keep me sane were tech reviewing a couple of books, hopefully soon to be published, and reviewing talk submissions. Some for conferences I was heavily involved in and some for events I wasn’t able to attend. It’s a significant investment of time but nearly every one of them taught me something. Even about technology I consider myself competent in.

I still maintain a small quarterly Pragmatic Investment Plan (PiP), which I started a few years ago, and while it’s more motion than progress these days it does keep me honest and ensure I do at least a little bit of non-work technology each month. Apart from Q1 2017 I surprisingly managed to read a tech book each month, post a handful of articles on my blog, and attend a few user groups here and there. I’ve kept the basics of the PiP for 2018 and I’m hoping it keeps me moving.

My general reading for the year was the worst it’s been for five years. I managed to read, from start to finish, 51 books. Totalling under 15,000 pages. I did have quite a few false starts and unfinished books at the end which didn’t help.

Oddly, my most popular blog post of the year was Non-intuitive downtime and possibly not lost sales. It was mentioned in a lot of weekly newsletters and resulted in quite a bit of traffic. SRE weekly also included it, which was a lovely change of pace from my employer being mentioned in the “Outages” section.

All in all 2017 was a good year for me personally and contained at least one career highlight. In closing I’d like to thank you for reading UnixDaemon, especially if you made it this far down, and let’s hope we both have an awesome 2018.

Terraform testing thoughts

As your terraform code grows in both size and complexity you should invest in tests and other ways to ensure everything is doing exactly what you intended. Although there are existing ways to exercise parts of your code I think Terraform is currently missing an important part of testing functionality, and I hope by the end of this post you’ll agree.

I want puppet catalog compile testing in terraform

Our current terraform testing process looks a lot like this:

  • precommit hooks to ensure the code is formatted and valid before it’s checked in
  • run terraform plan and apply to ensure the code actually works
  • execute a sparse collection of AWSSpec / InSpec tests against the created resources
  • Visually check the AWS Console to ensure everything “looks correct”

We ensure the code is all syntactically valid (and pretty) before it’s checked in. We then run a plan, which often finds issues with module paths, names and such, and then the slow, all encompassing, and cost increasing apply happens. And then you spot an unexpanded variable. Or that something didn’t get included correctly with a count.

I think there is a missed opportunity to add a separate phase, between plan and apply above, to expose the compiled plan in an easy to integrate format such as JSON or YAML. This would allow existing testing tools, and things like custom rspec matchers and cucumber test cases, to verify your code before progressing to the often slow, and cash consuming, apply phase. There are a number of things you could usefully test in a serialised plan output. Are your “fake if” counts doing what you expect? Are those nested data structures translating to all the tags you expect? How about the stringified splats and local composite variables? And what are the actual values hidden behind those computed properties? All of this would be visible at this stage. Having these tests would allow you to catch a lot of more subtle logic issues before you invoke the big hammer of actually creating resources.
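
To give a flavour of what such tests could look like, here is a sketch against a purely hypothetical serialised plan. Terraform does not emit this format; the structure of EXAMPLE_PLAN and the helper names are invented for illustration and only show the kind of checks that would become cheap.

```python
# Hypothetical plan structure, invented for this example
EXAMPLE_PLAN = {
    "resources": [
        {"type": "aws_instance", "name": "web.0",
         "attributes": {"tags": {"Name": "web-0", "Env": "test"}}},
        {"type": "aws_instance", "name": "web.1",
         "attributes": {"tags": {"Name": "web-1", "Env": "test"}}},
    ]
}


def resources_of_type(plan, resource_type):
    """Return every planned resource of the given type, e.g. to check a count."""
    return [r for r in plan["resources"] if r["type"] == resource_type]


def unexpanded_values(plan):
    """Return (resource, attribute path, value) for strings still containing '${'."""
    found = []

    def walk(resource, path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(resource, f"{path}.{key}" if path else key, child)
        elif isinstance(value, str) and "${" in value:
            found.append((resource, path, value))

    for resource in plan["resources"]:
        walk(resource["name"], "", resource["attributes"])
    return found
```

A test suite could then assert that resources_of_type returns the expected number of instances and that unexpanded_values returns an empty list, all before a single resource is created.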

I’m far from the first person to request this and upstream have been fair and considerate but it’s not something that’s on the short term road map. Workarounds do exist but they all have expensive limitations. The current plan file is in a binary format that isn’t guaranteed to be backwards compatible to external clients. Writing a plan output parser is possible but “a tool like this is very likely to be broken by future Terraform releases, since we don’t consider the human-oriented plan output to be a compatibility constraint” and hooking the plan generation code, an approach taken by palantir/tfjson, will be a constant re-investment as terraform’s core rapidly changes.

Adding a way to publish the plan in an easy to process way would allow many other testing tools and approaches to bloom and I hope I’ve managed to convince you that it’d be a great addition to terraform.

Show server side response timings in chrome developer tools

While trying to add additional performance annotations to one of my side projects I recently stumbled over the exceptionally promising Server-Timing HTTP header and specification. It’s a simple way to add semi-structured values describing aspects of the response generation and how long they each took. These can then be processed and displayed in your normal web development tools.

In this post I’ll show a simplified example, using Flask, to add timings to a single page response and display them using Google Chrome developer tools. The sample python flask application below returns a web page consisting of a single string and some fake information detailing all the actions assembling the page could have required.

# cat

from flask import Flask, make_response
app = Flask(__name__)


# The route path is assumed; the original decorator was lost
@app.route('/')
def hello():

    # Collect all the timings you want to expose
    # each string is how long it took in microseconds
    # and the human readable name to display
    sub_requests = [
        'redis=0.1; "Redis"',
        'mysql=2.1; "MySQL"',
        'elasticsearch=1.2; "ElasticSearch"'
    ]

    # Convert timings to a single string
    timings = ', '.join(sub_requests)

    # Placeholder body; the original response string is not shown
    resp = make_response('Hello World!')
    resp.headers.set('Server-Timing', timings)

    return resp

Once you’ve started the application, with flask run, you can request this page via curl to confirm the header and values are present.

    $ curl -sI | grep Timing
    Server-Timing: redis=0.1; "Redis", mysql=2.1; "MySQL", elasticsearch=1.2; "ElasticSearch"
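
In a real application the values would be measured rather than hard coded. The sketch below is one hypothetical way to collect them and render the header in the same draft syntax as the example above. The class name is invented, and the unit is an assumption: time.perf_counter returns seconds, so convert to whatever the version of the spec you target expects.

```python
import time
from contextlib import contextmanager


class ServerTimings:
    """Collect named timings and render them as a Server-Timing value."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # injectable so the class can be tested
        self.entries = []

    def add(self, name, value, description):
        self.entries.append(f'{name}={value}; "{description}"')

    @contextmanager
    def measure(self, name, description):
        """Time the enclosed block and record it, even if it raises."""
        started = self.clock()
        try:
            yield
        finally:
            self.add(name, round(self.clock() - started, 3), description)

    def header(self):
        return ', '.join(self.entries)
```

Inside the view you would wrap each backend call in a with timings.measure('redis', 'Redis') block and finish with resp.headers.set('Server-Timing', timings.header()).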

Now we’ve added the header, and some sample data, to our tiny Flask application let’s view it in Chrome devtools. Open the developer tools with Ctrl-Shift-I and then click on the network tab. If you hover the mouse pointer over the coloured section in “Waterfall” you should see an overlay like this:

Chrome devtools response performance graph

The values provided by our header are at the bottom under “Server Timing”.

Support for displaying the values provided with this header isn’t yet widespread. The example, and screenshot, presented here are from Chrome 62.0.3202.75 (Official Build) (64-bit) and may require changes as the spec progresses from its current draft status. The full potential of the Server-Timing header won’t be obvious for a while but even with only a few supporting tools it’s still a great way to add some extra visibility to your projects.

Containers will not fix your broken culture

From DevOps, Docker, and Empathy by Jérôme Petazzoni (title shamelessly stolen from a talk by Bridget Kromhout):

The DevOps movement is more about a culture shift than embracing a new set of tools. One of the tenets of DevOps is to get people to talk together.

Implementing containers won’t give us DevOps.

You can’t buy DevOps by the pound, and it doesn’t come in a box, or even in intermodal containers.

It’s not just about merging “Dev” and “Ops,” but also getting these two to sit at the same table and talk to each other.

Docker doesn’t enforce these things (I pity the fool who preaches or believes it) but it gives us a table to sit at, and a common language to facilitate the conversation. It’s a tool, just a tool indeed, but it helps people share context and thus understanding.

Sorting terraform variable blocks

When developing terraform code, it is easy to end up with a bunch of variable definitions that are listed in no particular order.

Here's a bit of python code that will sort terraform variable definitions. Use it as a filter from inside vim, or as a standalone tool if you have all your variable definitions in one file.


tf_sort < >

Here's the code:

#!/usr/bin/env python3
# sort terraform variables

import sys
import re

# this regex matches terraform variable definitions
# we capture the variable name so we can sort on it
pattern = r'(variable ")([^"]+)(" {[^{]+})'


def process(content):
    # sort the content (a list of tuples) on the second item of the tuple
    # (which is the variable name)
    matches = sorted(re.findall(pattern, content), key=lambda x: x[1])

    # iterate over the sorted list and output them
    for match in matches:
        print(''.join(map(str, match)))

        # don't print the newline on the last item
        if match != matches[-1]:
            print()


# check if we're reading from stdin
if not sys.stdin.isatty():
    stdin = sys.stdin.read()
    if stdin:
        process(stdin)

# process any filenames on the command line
for filename in sys.argv[1:]:
    with open(filename) as f:
        process(f.read())

Use your GitHub SSH key with AWS EC2 (via Terraform)

Like most people I have too many credentials in my life. Passwords, passphrases and key files seem to grow in number almost without bound. So, in an act of laziness, I decided to try and remove one of them. In this case it’s my AWS EC2 SSH key and instead reuse my GitHub public key when setting up my base AWS infrastructure.

Once you start using EC2 on Amazon Web Services you’ll need to create, or supply an existing, SSH key pair to allow you to log in to the Linux hosts. While this is an easy enough process to click through I decided to automate the whole thing and use an existing key, one of those I use for GitHub. One of its lesser known features is that GitHub exposes a users SSH public keys. This is available from everywhere, without authenticating against anything and so seemed like a prime candidate for reuse.
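
You can see the feature for yourself by requesting https://github.com/USERNAME.keys, which returns one public key per line. The sketch below shows that lookup in Python; it is not how the Terraform module is implemented, and the function names are made up for illustration.

```python
from urllib.request import urlopen


def github_keys_url(user):
    """URL where GitHub publishes a user's SSH public keys."""
    return f"https://github.com/{user}.keys"


def parse_keys(text):
    """Split the response body into one public key per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_keys(user):
    """Fetch and parse the keys; needs network access, no authentication."""
    with urlopen(github_keys_url(user)) as response:
        return parse_keys(response.read().decode("utf-8"))
```

A user with no keys gets an empty list, which is worth checking for before you try to create a key pair from the first entry.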

The terraform code to do this was a lot quicker to write than the README. As this is for my own use I could use a newer version of 0.10.* and harness the locals functionality to keep the actual resources simpler to read by hiding all the variable composing in a single place. You can find the results of this, the terraform-aws-github-ssh-keys module on GitHub, and see an example of its usage here:

module "github-ssh-keys" {
  source = "deanwilson/github-ssh-keys/aws"

  # fetch the ssh key from this user name
  github_user = "deanwilson"

  # create the key with a specific name in AWS
  aws_key_pair_name = "deanwilson-from-github"
}

I currently use this for my own test AWS accounts. The common baseline setup of these doesn’t get run that often in comparison to the services running in the environment so I’m only tied to GitHub being available occasionally. Once the key’s created it has a long life span and has no external network dependencies.

After the module was (quite fittingly) available on GitHub I decided to go a step further and publish it to the Terraform Module Registry. I’ve never used it before so after a brief read about the module format requirements, which all seem quite sensible, I decided to blunder my way through and see how easy it was. The Answer? Very.

Screen shot of my module on the Terraform registry

The process was pleasantly straight forward. You sign in using your GitHub account, select your Terraform modules from a drop down and then you’re live. You can see how github-ssh-keys looks as an example. Adding a module was quick, easy to follow, and well worth finishing off your modules with.