The 4PM stand-up

I’m not a morning person. I never have been and I doubt it’ll suddenly become one of my defining characteristics. In light of this I’ve always disliked having the daily stand-up first thing in the morning; over the years I’ve instead come to much prefer having it at about 4PM.

A late afternoon stand-up isn’t a common thing. Some people absolutely hate the idea, and with no scientific studies to back me up I’m essentially just stating an opinion, but I do have a few reasons.

People are sometimes late in the mornings. Having the stand-up at the very start of the day means that anyone having issues getting to work, dropping the kids off at school or dealing with tube delays for example, will probably miss it. When things are slightly off, having the added pressure of your team all standing there with Trello up as you stumble in, soaked and stressed, doesn’t exactly give the best opening to the day.

My second main point is the lack of situational awareness. A lot of my day will change based on what the rest of the departments are doing. To understand what impact that will have it helps to give other people some time to start disseminating information. Did we have a small on-call issue last night? Is anyone off sick on my team or the ones they deal with? Is there a security alert for Nginx?

By having my stand-ups at a much later point, such as 4PM, all the urgent issues have normally been raised at the start of the day, and hopefully dealt with. People know about unusual circumstances and who’s not in. At the stand-up itself people are less aspirational and actually get to talk about what they’ve done, not what they intended to, and I’ve still got some time to try and get anything blocked sorted before the next day. Later sessions can also work better if you’re dealing with Americans. It’s not too early to have to deal with them, and the time zones sync up more. But then you can rightly say, “They are having their stand-ups first thing in the morning!” and you’d be right, but they have bear claws available (thanks ckolos!), so it all balances out in the end.

There are downsides to a later time. Team wide issues might lie hidden for longer in the morning. People might leave early to pick the kids up and some people will find the later slot more disruptive to their afternoon flow. It’s not going to be for everyone but if a morning slot isn’t working for the team then maybe it’s time to shake things up a little and try a later time. Maybe 4PM.

Mass Provisioning Choria Servers

The Choria Server is the agent component of the Choria Orchestrator system; it runs on every node and maintains a connection to the middleware.

Traditionally we’ve configured it using Puppet along with its mcollective compatibility layer, and we intend to keep this model for the foreseeable future. Choria Server has many more uses though – it’s embeddable so it can be used in IoT devices, in tools like our go-backplane, as sidecars in Kubernetes and more. In these and other cases the Puppet model does not work:

  • You do not have CM at all
  • You do not own the machines Choria runs on; you provide an orchestration service to other teams
  • You are embedding the Choria Server in your own code, perhaps in an IoT device where Puppet does not make sense
  • Your scale makes using Puppet not an option
  • You wish to have very dynamic decision making about node placement
  • You wish to integrate Choria into your own Certificate Authority system

In all these cases there are real, complex problems to solve in configuring Choria Server. We’ve built a system that can help: it’s called the Choria Server Provisioner and this post introduces it.

Server side design

The Provisioner is inspired by old school PXE bootstrap networks – unconfigured nodes would join a VLAN where they network boot and get their configuration; once configured they reboot into the right VLAN where they become production servers.

As in that model, Choria has a mode called Provisioning Mode where it will use compiled-in defaults for its configuration.

% choria buildinfo
Choria build settings:
 
Build Data:
     Version: 0.5.1
     Git SHA: 7d0a215
  Build Date: 2018-08-10 10:26:42 +0000
     License: Apache-2.0
  Go Version: go1.10.2
 
Network Broker Settings:
       Maximum Network Clients: 50000
  Embedded NATS Server Version: 1.2.0
 
Server Settings:
            Provisioning Brokers: prov.example.net:4222
            Provisioning Default: true
      Default Provisioning Agent: true
                Provisioning TLS: false
  Provisioning Registration Data: /etc/choria/metadata.json
              Provisioning Facts: /etc/choria/metadata.json
              Provisioning Token: set
 
Agent Providers:
        Golang MCollective Agent Compatibility version 0.0.0
 
Security Defaults:
            TLS: true
  x509 Security: true

Here under Server Settings you can see the compiled-in defaults. When this server starts up without a configuration that specifically prevents provisioning mode it will connect to prov.example.net:4222 without TLS. In that mode it will only connect to the provisioning sub collective and it will periodically publish its /etc/choria/metadata.json to the topic choria.provisioning_data.
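Should you want to eyeball that registration stream, any plain NATS client can subscribe to it since the provisioning broker is connected to without TLS. A rough sketch using the nats-sub example client from the NATS Go client repository (the broker address is the compiled-in default shown above):

# watch unprovisioned servers periodically publishing their metadata
nats-sub -s nats://prov.example.net:4222 choria.provisioning_data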

It will have an agent called choria_provision running that exposes actions to request a CSR, configure it, restart it and more.

It will then wait until some process starts interacting with it and eventually gives it a configuration file and asks it to restart. Once restarted it will join its real home and continue there as a normal server. This is where the Choria Server Provisioner comes in.

Choria Server Provisioner

As you saw above, the Choria Server will connect to a specific broker and sit in a provisioning sub collective waiting to be managed. We wrote a generic, high performance manager that lets you plug your own logic into it, and it will configure your nodes. In our tests with a very fast helper script this process is capable of provisioning many thousands of machines a minute – many more than any cloud will allow you to boot.

The provisioner’s basic flow is this:

On startup it will:

  • start to listen for events on choria.provisioning_data
  • do a discovery against the provisioning sub collective and repeat it at regular intervals

Any node identified by either of these two methods is added to the work queue, where one of the configured number of workers will start provisioning it. The per-worker flow is:

  1. Fetch the inventory using rpcutil#inventory
  2. Request a CSR if the PKI feature is enabled using choria_provision#gencsr
  3. Call the helper with the inventory and CSR, expecting to be configured
    1. If the helper sets defer to true the node provisioning is ended and next cycle will handle it
  4. Configure the node using choria_provision#configure
  5. Restart the node using choria_provision#restart

You can see this is a generic flow where all the magic is left up to a helper, so let’s look at the helper in detail.

The helper is simply a script or program, written in any language you like, that receives node specific JSON on STDIN and returns JSON on its STDOUT.

The input JSON looks something like this:

{
  "identity": "node1.example.net",
  "csr": {
    "csr": "-----BEGIN CERTIFICATE REQUEST-----....-----END CERTIFICATE REQUEST-----",
    "ssldir": "/path/to/ssldir"
  },
  "inventory": "{"agents":["choria_provision","choria_util","discovery","rpcutil"],"facts":{},"classes":[],"version":"0.5.1","data_plugins":[],"main_collective":"provisioning","collectives":["provisioning"]}"
}

In this example the PKI feature is enabled and the CSR seen here was created by the node in question – it kept its private key secure, never transferring it anywhere. The inventory is what you would get from the rpcutil#inventory action – for example via mco rpc rpcutil inventory -I node1.example.net – and here the main thing you’d look at is the facts, which would be all the metadata found in /etc/choria/metadata.json.

The helper then is any program that outputs JSON resembling this:

{
  "defer": false,
  "msg": "Reason why the provisioning is being defered",
  "certificate": "-----BEGIN CERTIFICATE-----......-----END CERTIFICATE-----",
  "ca": "-----BEGIN CERTIFICATE-----......-----END CERTIFICATE-----",
  "configuration": {
    "plugin.choria.server.provision": "false",
    "identity": "node1.example.net"
  }
}

Here’s a bit of Ruby showing CFSSL integration and country-specific configuration:

require "json"

request = JSON.parse(STDIN.read)
request["inventory"] = JSON.parse(request["inventory"])
 
reply = {
  "defer" => false,
  "msg" => "",
  "certificate" => "",
  "ca" => "",
  "configuration" => {}
}
 
identity = request["identity"]
 
if request["csr"] && request["csr"]["csr"]
  ssldir = request["csr"]["ssldir"]
 
  # save the CSR
  File.open("%s.csr" % identity, "w") do |f|
    f.puts request["csr"]["csr"]
  end
 
  # sign the CSR using CFSSL
  signed = %x[cfssl sign -ca ca.pem -ca-key ca-key.pem -loglevel 5 #{identity}.csr 2>&1]
  signed = JSON.parse(signed)
  abort("No signed certificate received from cfssl") unless signed["cert"]
 
  # Store the CA and the signed cert in the reply
  reply["ca"] = File.read("ca.pem")
  reply["certificate"] = signed["cert"]
 
  # Create security configuration customised to the SSL directory the server chose
  reply["configuration"].merge!(
    "plugin.security.provider" => "file",
    "plugin.security.file.certificate" => File.join(ssldir, "certificate.pem"),
    "plugin.security.file.key" => File.join(ssldir, "private.pem"),
    "plugin.security.file.ca" => File.join(ssldir, "ca.pem"),
    "plugin.security.file.cache" => File.join(ssldir, "cache")
  )
end

With that out of the way let’s create the rest of our configuration; we’re going to look at per-country brokers here:

case request["inventory"]["facts"]["country"]
when "mt"
  broker = "choria1.mt.example.net:4223"
when "us"
  broker = "choria1.us.example.net:4223"
else
  broker = "choria1.global.example.net:4223"
end
 
reply["configuration"].merge!(
  "identity" => identity,
  "plugin.choria.middleware_hosts" => broker,
  "classesfile" => "/opt/puppetlabs/puppet/cache/state/classes.txt",
  "collectives" => "mcollective",
  "loglevel" => "warn",
  "plugin.yaml" => "/tmp/mcollective/generated-facts.yaml",
  "plugin.choria.server.provision" => "false",
)
 
puts reply.to_json

The configuration is simply Choria configuration as key-value pairs – all strings. With provisioning mode on by default you must disable it specifically, so be sure to set plugin.choria.server.provision=false.

You can see you can potentially integrate with any CA you wish and employ any logic or data source for building the configuration. In my case I use the CFSSL API and I integrate with our asset databases to ensure a node goes with the rest of its POD – we have multiple networks per DC and this helps our orchestrators perform better.
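Because the helper is just a filter from STDIN to STDOUT it’s also easy to exercise on its own before wiring it into the provisioner. A rough sketch, assuming the Ruby above is saved as helper.rb and you’ve captured a sample request as request.json (both names are just examples, and jq is optional):

# run a sample provisioning request through the helper and pretty print the reply
ruby helper.rb < request.json | jq .

If the reply looks sane – a certificate, a CA and a configuration hash with plugin.choria.server.provision set to "false" – it’s ready to be pointed at real nodes.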

The provisioner exposes its statistics in Prometheus format and it embeds our Choria Backplane so you can perform actions like circuit breaking fleet-wide.

A dashboard for these statistics is available in the GitHub repository.

Demonstration

I made a video explainer that goes into more detail and shows the system in action.

Conclusion

This is a quick introduction to the process; there’s a lot more to know – you can write your own custom provisioner and even your own custom agent – and the provisioning-agent GitHub repository has all the details. The provisioner detailed above is released as RPMs on our Package Cloud repo.

It’s early days for this tool – personally I will soon roll it out to tens of data centres where it will manage hundreds of thousands of nodes, so expect a few more hardening changes to be made. In the future we’ll also support Choria version upgrades as part of this cycle.

Find all variables used in a terraform module

Want to make sure all the variables declared in a terraform module are actually used in the code?

This code lists all variables used in each of the sub-directories containing terraform code.

It started off as a one-liner but, as usual, the code to make it look pretty is bigger than the main functional code!

#!/usr/bin/env bash

set -euo pipefail

default_ul_char=-

main() {
  process
}

print_underlined () {
  local text="$1" ; shift
  local ul_char
  if [[ -n ${1:-} ]] ; then
    ul_char="$1" ; shift
  else
    ul_char=$default_ul_char
  fi
  printf '%s\n%s\n' "$text" "${text//?/$ul_char}"
}

process() {
  # loop over all directories
  while read -r dir ; do
    pushd "$dir" >/dev/null
    echo
    print_underlined "$dir" 
    # get a unique list of variables used in all .tf files in this directory
    sort -u < <(
      perl -ne 'print "$1\n" while /var\.([\w-]+)/g' ./*.tf
    )
    popd > /dev/null
  done < <(
    # get a unique list of directories containing terraform files
    # starting in the present working directory
    sort -u < <(
      find . -name '*.tf' -exec dirname {} \;
    )
  )
}

main "$@"

Some talk submission thoughts

The summer conference submission season is slowly subsiding and after reading through a combined total of a few thousand submissions I’ve got some hastily compiled thoughts. But before we get started, a disclaimer: I don’t publicly present. My views on this are from the perspective of a submission reviewer and audience member. And remember, we want to say yes. We have slots to fill and there’s nothing more satisfying than giving a new speaker a chance and seeing the feedback consist of nothing but 10’s. Hopefully some of these points will help me say yes to you in the future.

Firstly I’ll address one of the most common issues, even when it’s not a completely fair one - people submitting on the subject their employer focuses on. As an organiser you want your speakers to have solid and wide experience in their chosen topic and it’s often easier to find those depths in people that live and breathe a certain thing day in and day out. However it’s easy to submit what appears to be a vendor sales pitch. In non-anonymised submissions there will always be a moment of “This company has a product / service in that area. Is this a sales pitch?” Audiences have paid for their tickets and being trapped for a 45 minute white paper spiel is a sure way to fill the Twitter stream with complaints.

To balance that I’m much more careful when dealing with those kinds of submissions. Despite my defensive watchfulness there are things you can do to make it easier to say yes. If it’s unrelated to your paid work but in the same industry, say so. You should also state how the talk relates to your product. Is it a feature overview for enterprise customers or all generic theory anyone can use? Be explicit about how much of the talk is product specific; “20 minutes on the principles, 10 on the open source offerings and 10 on the enterprise product additions” might not be exactly what I want to see but it’s better than my assumption. I should also note that no matter how much it hurts your chances you should be honest. Event organisers chat. A lot of the Velocity program chairs know each other outside of work, there’s a lot of crossover between DevOpsDays events and London isn’t that big. If you’re given the benefit of the doubt and you were less than honest then good luck in the future. As an aside this also applies to sponsors. We know who’s a joy to deal with and who’s going to keep us dangling for 8 months.

Onto my next bugbear. Submissions that include things like “8 solutions to solving pipeline problems.” If you have a number in your submission topic or introduction and don’t tell me what they are in the body of the submission I’ll assume you don’t know either. Context and content are immensely important in submissions and it’s so hard to highly rate a talk with no actual explanation of what it’s covering. If you say “The six deadly secrets of the monkey king” then you better list those six secrets with a little context on each or expect to be dropped a point or three. The organisers probably won’t be in the session and without enough context to know what the audience will be seeing, neither will you.

My third personal catch is introducing a new tool at a big event. Unless you’re someone like HashiCorp or AWS then you need to be realistic about your impact. I will google technologies and programs I don’t recognise in submissions and if the entire result set is your GitHub page and some Google Group issues then it’s probably not ready for one of the bigger events. Instead start at a user group or two, a couple of blog posts, and then maybe something bigger at a site like DZone or The New Stack. Build some buzz and presence so I can tell that people are adopting it and finding merit. There’s often an inadvertent benefit to this: a lot of user groups record and upload their sessions and this is a great benefit after the anonymised stage of the reviews. Being able to see someone present and know that they can manage an audience and don’t look like they are about to break into tears for 25 minutes is reassuring, and a great presentation style can help boost your submission.

Other than my personal idiosyncrasies there are a few things you should always consider. What’s the audience getting out of this? Why are you the person to give it to them? What’s the actionable outcome from this session? You don’t have to be a senior Google employee but you do need to have an angle on the material. This is especially true on subjects like career paths or health issues where it’s easy to confuse personal anecdotes with data. Does your employer have evangelists or advocates that spend a large amount of their time presenting or reviewing submissions? If so reach out and ask them for a read through. It’s in their interests to not see 10 submissions from the same company that all get rejected for not having enough information to be progressed. I wouldn’t normally single someone out but if, as an example, you’re working for Microsoft and submitting to a conference, especially a DevOpsDays, and you’ve not asked Bridget Kromhout to review your submission then you’re missing a massive opportunity. She’s seen everything get submitted at least once and can nearly always find something constructive to improve. There’s probably a similar person at many large tech companies and getting their opinion will almost always help the process.

In general it’s a pleasure to read so many thoughtful submissions but with just a little bit more effort in the right places it becomes a lot easier to get the reviewers to say yes. And then comes the really difficult part for us.

pre-commit hooks and terraform- a safety net for your repositories

I’m the only infrastructure person on a number of my projects and it’s sometimes difficult to find someone to review pull requests. So, in self-defence, I’ve adopted git pre-commit hooks as a way to ensure I don’t make certain tedious mistakes before burning through people’s time and goodwill. In this post we’ll look at how pre-commit and terraform can be combined.

pre-commit is “A framework for managing and maintaining multi-language pre-commit hooks” that has a comprehensive selection of community written extensions. The extension at the core of this post will be pre-commit-terraform, which provides all the basic functionality you’ll need.

Before we start you’ll need to install pre-commit itself. You can do this via your package manager of choice. I like to run all my python code inside a virtualenv to help keep the versions isolated.

$ pip install pre-commit --upgrade
Successfully installed pre-commit-1.10.4

To keep the examples realistic I’m going to add the pre-commit hook to my Terraform SNS topic module. Mostly because I need it on a new project, and I want to resolve the issue raised against it.

# repo cloning preamble
git clone git@github.com:deanwilson/tf_sns_email.git
cd tf_sns_email/
git checkout -b add_precommit

With all the preamble done we’ll start with the simplest thing possible and build from there. First we add the basic .pre-commit-config.yaml file to the root of our repository and enable the terraform fmt hook. This hook ensures all our terraform code matches what would be produced by running terraform fmt over the codebase.

cat <<EOF > .pre-commit-config.yaml
- repo: git://github.com/antonbabenko/pre-commit-terraform
  rev: v1.7.3
  hooks:
    - id: terraform_fmt
EOF

We then install the pre-commit hook within this repo so it can start to provide our safety net.

$ pre-commit install
pre-commit installed at /tmp/tf_sns_email/.git/hooks/pre-commit

Let the pain commence! We can now run pre-commit over the repository and see what’s wrong.

$ pre-commit run --all-files
[INFO] Initializing environment for git://github.com/antonbabenko/pre-commit-terraform.
Terraform fmt............................................................Failed
hookid: terraform_fmt

Files were modified by this hook. Additional output:

main.tf
outputs.tf
variables.tf

So, what’s wrong? Only everything. A quick git diff shows that it’s not actually terrible. My indentation doesn’t match that expected by terraform fmt, so we accept the changes and commit them in. It’s also worth adding .pre-commit-config.yaml too, to ensure anyone else working on this branch gets the same pre-commit checks. Once the config file is committed you should never again be able to commit incorrectly formatted code, as the pre-commit hook will prevent it from getting that far.
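Accepting and committing those changes is just the usual dance (the commit message is only an example):

git add .pre-commit-config.yaml main.tf outputs.tf variables.tf
git commit -m 'Run terraform fmt and add the pre-commit config'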

A second run of the hook and we’re back in a good state.

$ pre-commit run --all-files
Terraform fmt..............Passed

The first base is covered, so let’s get a little more daring and ensure our terraform is valid as well as nicely formatted. This functionality is only a single line of code away as the pre-commit extension does all of the work for us:

cat <<EOF >> .pre-commit-config.yaml
    - id: terraform_validate_with_variables
EOF

This line of config enables another of the hooks. This one ensures all terraform files are valid and that all variables are set. If you have more of a module than a project and are not supplying all the possible variables you can change terraform_validate_with_variables to terraform_validate_no_variables and it will be much more lenient.
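For reference, after that append the whole .pre-commit-config.yaml looks like this:

$ cat .pre-commit-config.yaml
- repo: git://github.com/antonbabenko/pre-commit-terraform
  rev: v1.7.3
  hooks:
    - id: terraform_fmt
    - id: terraform_validate_with_variables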

With the new config in place we rerun the hooks and prepare to be disappointed.

> pre-commit run --all-files
Terraform fmt..................................Passed
Terraform validate with variables..............Failed
hookid: terraform_validate_with_variables


Error: 2 error(s) occurred:

* provider.template: no suitable version installed
  version requirements: "(any version)"
  versions installed: none
* provider.aws: no suitable version installed
  version requirements: "(any version)"
  versions installed: none

And that shows how long it’s been since I’ve used this module; it predates the provider extraction work. Fixing these issues requires adding the providers, a new variable (aws_region) to allow specification of the AWS region, and some defaults. Once we fix those issues the pre-commit hook will fail due to the providers being absent, but that’s an easy one to resolve.

...
* provider.template: no suitable version installed
  version requirements: "1.0.0"
  versions installed: none
...

> terraform init

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "template" (1.0.0)...
- Downloading plugin for provider "aws" (1.30.0)...

One more precommit run and we’re in a solid starting state.

Terraform fmt.............................Passed
Terraform validate without variables......Passed

With all the basics covered we can go a little further and mix in the magic of terraform-docs too. By adding another line to the pre-commit config -

cat <<EOF >> .pre-commit-config.yaml
    - id: terraform_docs
EOF

And adding a placeholder anywhere in the README.md -

+### Module inputs and outputs
+
+<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
+<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
+
+

terraform-docs will be invoked and add generated documentation to the README for all of the variables and outputs. If they ever change you’ll need to review and commit the differences but the hooks will stop you from ever going out of sync. Now we have this happening automatically I can remove the manually added, and error prone, documentation for variables and outputs. And be shamed into adding some useful descriptions.

pre-commit hooks will never replace a competent pull request reviewer but they help ensure basic mistakes are never made and allow your peers to focus on the important parts of the code, like structure and intent, rather than formatting and documentation consistency. All of the code changes made in this post can be seen in the Add precommit pull request.

Managing AWS Default VPC Security Groups with Terraform

When it comes to Amazon Web Services support Terraform has coverage that’s second to none. It includes most of Amazon’s current services, rapidly adds newly released ones, and even helps granularise existing resources by adding terraform-specific extensions for things like individual rules with aws_security_group_rule. This awesome coverage makes it even more jarring when you encounter one of the rare edge cases, such as VPC default security groups.

It’s worth taking a step back and thinking about how Terraform normally works. When you write code to manage resources, terraform expects to fully own the created resource’s life cycle. It will create it, ensure that changes made are correctly reflected (and remove those made manually), and when a resource’s code is removed from the .tf files it will destroy it. While this is fine for 99% of the supported Amazon resources, the VPC default security group is a little different.

Each Amazon Virtual Private Cloud (VPC) created will have a default security group provided. This is created by Amazon itself and is often undeletable. Rather than leaving it unmanaged, which happens all too often, we can instead add it to terraform’s control with the special aws_default_security_group resource. When used, this resource works a little differently than most others. Terraform doesn’t attempt to create the group; instead it’s adopted under its management umbrella. This allows you to control what rules are placed in this default group and stops the “security group already exists” errors that will happen if you try to manage it as a normal group.

The terraform code to add the default VPC security group looks surprisingly normal:

resource "aws_vpc" "myvpc" {
  cidr_block = "10.2.0.0/16"
}

resource "aws_default_security_group" "default" {
  vpc_id = "${aws_vpc.myvpc.id}"

  # ... snip ...
  # security group rules can go here
}

One nice little tweak I’ve found useful is to customise the default security group to only allow inbound access on port 22 from my current (very static) IP address.

# use the swiss army knife http data source to get your IP
data "http" "my_local_ip" {
    url = "https://ipv4.icanhazip.com"
}

resource "aws_security_group_rule" "ssh_from_me" {
  type            = "ingress"
  from_port       = 22
  to_port         = 22
  protocol        = "tcp"
  cidr_blocks     = ["${chomp(data.http.my_local_ip.body)}/32"]

  security_group_id = "${aws_default_security_group.default.id}"
}
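Nothing extra is needed to trigger the adoption; a normal init and apply is enough, and terraform takes over the existing group rather than trying to create a second one:

terraform init
terraform apply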

Automatic Terraform documentation with terraform-docs

Terraform code reuse leads to modules. Modules lead to variables and outputs. Variables and outputs lead to massive amounts of boilerplate documentation. terraform-docs lets you shortcut some of these steps and jump straight to consistent, easy to use, automatically generated documentation instead.

Terraform-docs, a self-contained binary implemented in Go, and released by Segment, provides an efficient way to add documentation to your terraform code without requiring large changes to your workflow or massive amounts of additional boilerplate. In its simplest invocation it reads the descriptions provided in your variables and outputs and displays them on the command line:

    /**
    *
    * A sample terraform file with a variable and output
    *
    */

variable "greeting" {
  type        = "string"
  description = "The string used as a greeting"
  default     = "hello"
}

output "introduction" {
  description = "The full, polite, introduction"
  value       = "${var.greeting} from terraform"
}

Running terraform-docs against this code produces:

A sample terraform file with a variable and output

  var.greeting (hello)
  The string used as a greeting

  output.introduction
  The full, polite, introduction

This basic usage makes it simpler to use existing code by presenting the official interface without over-burdening you with implementation details. Once you’ve added descriptions to your variables and outputs, something you should really already be doing, you can start to expose the documentation in other ways. By adding the markdown option -

terraform-docs markdown .

you can generate the docs in a GitHub friendly way that provides an easy, web based, introduction to what your code accepts and returns. We used this quite heavily in the GOV.UK AWS repo and it’s been invaluable. The ability to browse an overview of the terraform code makes it simpler to determine if a specific module does what you actually need without requiring you to read all of the implementation.

[Screenshot: a collection of terraform variables and their defaults]

When we first adopted terraform-docs we hit issues with the code being updated without the documentation changing to match it. We soon settled on using git precommit hooks, such as this terraform-docs githook script by Laura Martin or the heavy handed GOV.UK update-docs script. Once we had these in place the little discrepancies stopped slipping through and the reference documentation became a lot more trusted.
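If neither of those fits your workflow the core check is tiny to roll yourself. A rough sketch, assuming the generated docs are committed as docs/terraform.md (the path is just an example):

#!/usr/bin/env bash
set -euo pipefail

# regenerate the documentation and fail if it no longer matches what's committed
terraform-docs markdown . > /tmp/terraform-docs.md
diff -u docs/terraform.md /tmp/terraform-docs.md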

As an aside, if you plan on using terraform-docs as part of your automated continuous integration pipeline you’ll probably want to create a terraform-docs package. I personally use FPM Cookery for this and it’s been an easy win so far.

I’ve become a big fan of terraform-docs and it’s great to see a self-contained tool like this making such a positive impact on the terraform ecosystem. If you’re writing tf code for consumption by more than just yourself (and even then) it’s well worth a second look.

Automatic datasource configuration with Grafana 5

When I first started my Prometheus experiments with docker-compose, one of the most awkward parts of the process, especially to document, was the manual steps required to click around the Grafana dashboard in order to add the Prometheus datasource. Thanks to the wonderful people behind Grafana there has been a push in the newest major version, 5 at the time of writing, to make Grafana easier to automate. And it really does pay off.

Instead of forcing you to load the UI and play clicky clicky games with vague instructions to go here, and then the tab on the left, no, the other left, down a bit… you can now configure the data source with a YAML file that’s loaded on startup.

# from datasource.yaml
apiVersion: 1

datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  isDefault: true
  url: http://prometheus:9090
  # don't set this to true in production
  editable: true

Because I’m using this code base in a tinkering lab I set editable to true. This allows me to make ad hoc changes. In production you’d want to make this false so people can’t accidentally break your backing store.

It only takes a little code to link everything together: add the config file and expose it to the container. You can see all the changes required in the Upgrade grafana and configure datasource via a YAML file pull request. Getting the exact YAML syntax right, and confusing myself over access proxy vs direct, was the hardest part. It’s only a single step along the way to a more automation-friendly Grafana but it is an important one and a positive sign that they are heading in the right direction.
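For the curious, the wiring boils down to making the YAML visible at Grafana’s provisioning path inside the container. The pull request does it with docker-compose; the plain docker equivalent looks roughly like this (the image tag is just an example):

docker run -d -p 3000:3000 \
  -v "$(pwd)/datasource.yaml:/etc/grafana/provisioning/datasources/datasource.yaml" \
  grafana/grafana:5.2.2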

Aqua Security microscanner – a first look

I’m a big fan of baking testing into build and delivery pipelines, so when a new tool pops up in that space I like to take a look at what features it brings to the table and how much effort it’s going to take to roll out. The Aqua Security microscanner, from a company you’ve probably seen at least one excellent tech talk from in the last year, is a quite new release that surfaces vulnerable operating system packages in your container builds.

To experiment with microscanner I’m going to add it to my simple Gemstash Dockerfile.

FROM ubuntu:16.04
MAINTAINER dean.wilson@gmail.com

RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y \
      build-essential \
      ruby \
      ruby-dev \
      libsqlite3-dev \
      curl \
    && gem install --no-ri --no-rdoc gemstash

EXPOSE 9292

HEALTHCHECK --interval=15s --timeout=3s \
  CMD curl -f http://localhost:9292/ || exit 1

CMD ["gemstash", "start", "--no-daemonize"]

This is a conceptually simple Dockerfile. We update the Ubuntu package list, upgrade packages where needed, add dependencies required to build our rubygems and then install gemstash. From this very boilerplate base we only need to make a few changes for microscanner to run.

> git diff Dockerfile
diff --git a/gemstash/Dockerfile b/gemstash/Dockerfile
index 741838f..bab819a 100644
--- a/gemstash/Dockerfile
+++ b/gemstash/Dockerfile
@@ -2,7 +2,6 @@ FROM ubuntu:16.04
 MAINTAINER dean.wilson@gmail.com

 RUN apt-get update && \
-    apt-get -y upgrade && \
     apt-get install -y \
       build-essential \
       ruby \
@@ -11,6 +10,14 @@ RUN apt-get update && \
       curl \
     && gem install --no-ri --no-rdoc gemstash
 
+ARG token
+RUN apt-get update && apt-get -y install ca-certificates wget && \
+    wget -O /microscanner https://get.aquasec.com/microscanner && \
+    chmod +x /microscanner && \
+    /microscanner ${token} && \
+    rm -rf /microscanner
+

Firstly we remove the package upgrade step, as we want to ensure vulnerabilities are present in our container. We then use the newer ARG directive to tell Docker we will be passing a value named token in at build time. Lastly we add microscanner and its dependencies in a single image layer. As we’re using the wget and ca-certificates packages it does have a small impact on container size, but microscanner itself is added, used and removed without a trace.

You’ll notice we specify a token when running the scanner. This grants access to the Aqua scanning servers, and is rate limited. How do you get a token? You request it by calling out to the Aqua Security container with your email address:

docker run --rm -it aquasec/microscanner --register foo@mailinator.com
# ... snip ...
Aqua Security MicroScanner, version 2.6.4
Community Edition

Accept and proceed? Y/N:
y
Please check your email for the token.

Once you have the token (mine came through in seconds) you can build the container:

docker build --build-arg=token=A1A1Aaa1AaAaAAA1 --no-cache .

For this experiment I’ve taken the big hammer of --no-cache to ensure all the packages are tested on each build. This has a build time performance impact and should be considered along with the other best practices. If your container has vulnerable package versions you’ll get a massive dump of JSON in your build output. Individual packages will show their vulnerabilities:

{
      "resource": {
        "format": "deb",
        "name": "systemd",
        "version": "229-4ubuntu21.1",
        "arch": "amd64",
        "cpe": "pkg:/ubuntu:16.04:systemd:229-4ubuntu21.1",
        "name_hash": "2245b39c177e93fc015ba051be4e8574"
      },
      "scanned": true,
      "vulnerabilities": [
        {
          "name": "CVE-2018-6954",
          "description": "systemd-tmpfiles in systemd through 237 mishandles symlinks present in non-terminal path components, which allows local users to obtain ownership of arbitrary files via vectors involving creation of a directory and a file under that directory, and later replacing that directory with a symlink. This occurs even if the fs.protected_symlinks sysctl is turned on.",
          "nvd_score": 7.2,
          "nvd_score_version": "CVSS v2",
          "nvd_vectors": "AV:L/AC:L/Au:N/C:C/I:C/A:C",
          "nvd_severity": "high",
          "nvd_url": "https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-6954",
          "vendor_score": 5,
          "vendor_score_version": "Aqua",
          "vendor_severity": "medium",
          "vendor_url": "https://people.canonical.com/~ubuntu-security/cve/2018/CVE-2018-6954.html",
          "publish_date": "2018-02-13",
          "modification_date": "2018-03-16",
          "fix_version": "any in ubuntu 17.04",
          "solution": "Upgrade operating system to ubuntu version 17.04 (includes fixed versions of systemd)"
        }
      ]
}

You’ll also see some summary information: the total number of issues, run time and container operating system values, for example.

  "vulnerability_summary": {
    "total": 147,
    "medium": 77,
    "low": 70,
    "negligible": 6,
    "score_average": 4.047619,
    "max_score": 5,
    "max_fixable_score": 5,
    "max_fixable_severity": "medium"
  },

If any of the vulnerabilities are considered to be high in severity then the build should fail, preventing you from going live with known issues.
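That behaviour needs no extra plumbing in CI – a failing /microscanner step fails the docker build itself, so checking the build’s exit status is enough. A minimal sketch (the token variable name and image tag are just examples):

# block the pipeline if microscanner finds high severity vulnerabilities
docker build --build-arg=token="${MICROSCANNER_TOKEN}" --no-cache -t gemstash:scan . ||
  { echo "image build blocked by vulnerability scan"; exit 1; }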

It’s very early days for microscanner and there’s a certain amount of inflexibility that will shake out with use, such as being able to fail builds on medium or even low severity issues, or only showing packages with vulnerabilities, but it’s a very easy way to add this kind of safety net to your containers and worth keeping an eye on.