Managing multiple puppet modules with modulesync

With the exception of children, puppies and medical compliance frameworks, managing one of something is normally much easier than managing a lot of them. If you have a lot of puppet modules, and eventually you always will, you’ll get bitten by this and find yourself spending as much time managing the supporting functionality as the puppet code itself.

Luckily you’re not the first person to have a horde of puppet modules that share a lot of common scaffolding. The fine people at Vox Pupuli had the same issue and maintain an excellent tool, modulesync, that solves this very problem. With modulesync and a little YAML you’ll soon have a consistent, easy to iterate on, set of modules.

To get started with modulesync you need three things, well four if you count the puppet module horde you want to manage.

I’ve been using modulesync for some of my projects for a while but we recently adopted it for the GDS Operations Puppet Modules so there’s now a full, but nascent, example we can look at. You can find all the modulesync code in our public repo.

First we set up the basic modulesync config in modulesync.yml -

---
git_base: 'git@github.com:'
namespace: gds-operations
branch: modulesync
...
# vim: syntax=yaml

This YAML mostly controls how we interact with our upstream. git_base is the base of the URL to run git operations against. In our case we explicitly specify GitHub (which is also the default) but this is easy to change if you use Bitbucket, GitLab or a local server. We treat namespace as the GitHub organisation the modules live under. As we never push directly to master we specify a branch our changes should be pushed to for later processing as a pull request.

The second config file, managed_modules.yml, contains a list of all the modules we want to manage:

---
- puppet-aptly
- puppet-auditd
- puppet-goenv

By default modulesync will perform any operations against every module in this file. It’s possible to filter this down to specific modules but there’s only really value in doing that as a simple test. After all, keeping the modules in sync is pretty core to the tool’s purpose.

The last thing to configure is a little more abstract. Any files you want to manage across the modules should be placed in the moduleroot directory and given a .erb extension. At the moment we’re treating all the files in this directory as basic, static files, but modulesync does expand them as templates and provides a @configs hash, which contains any values you specify in the base config_defaults.yml file. These values can also be overridden with more specific values stored alongside the module itself in the remote repository.
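To make the template expansion a little more concrete, here is a minimal sketch using Ruby's standard ERB library. It is not modulesync's own rendering code: the TemplateContext class, the template text and the license_holder and year keys are all invented for the example. It only shows the general idea of merging default values with per-module overrides and exposing the result to a template as @configs.

```ruby
require 'erb'

# Hypothetical stand-in for the object a template is rendered against.
# It exposes the merged settings to the template as @configs.
class TemplateContext
  def initialize(configs)
    @configs = configs
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

# Defaults, as might be loaded from config_defaults.yml (assumed keys).
defaults = { 'license_holder' => 'Example Org', 'year' => 2017 }

# More specific values kept alongside one module (also assumed).
overrides = { 'license_holder' => 'Example Team' }

template = "Copyright <%= @configs['year'] %> <%= @configs['license_holder'] %>\n"

# The more specific value wins when the two hashes are merged.
rendered = TemplateContext.new(defaults.merge(overrides)).render(template)
puts rendered
```

A file with no ERB tags in it passes through this kind of expansion unchanged, which is why static files such as a LICENSE work fine as a starting point.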

Once you’ve created the config files and added at least a basic file to moduleroot, a LICENSE file is often a safe place to start, you can run modulesync to see what will be changed. In this case I’m going to be working with the gds-operations/puppet_modulesync_config repo.

bundle install

# run the module sync against a single module and show potential changes
bundle exec msync update -f puppet-rbenv --noop

This command will filter the managed modules (using the -f flag to select them), clone the remote git repo(s), placing them under modules/, change the branch to either master or the one specified in modulesync.yml, and then present a diff of changes from the expanded templates contained in moduleroot against the cloned remote repo. None of the changes are actually made thanks to the --noop flag. If you’re happy with the diff you can add a commit message (with -m message), remove --noop and then run the command again to push the amended branch.

bundle exec msync update -m "Add LICENSE file" -f puppet-rbenv

Once the branch is pushed you can review and create a pull request as usual.

Screen shot of GitHub pull request from modulesync change

We’re at a very early stage of adoption so there is a large swathe of functionality we’re not using yet and that I’ve not mentioned. If you’re using the moduleroot templates as actual templates you can have a local override, in each remote module / GitHub repo, that localises the configuration and is correctly merged with the main configuration. This allows you to push settings out to where they’re needed while still keeping most modules baselined. You can also customise the syncing workflow to specify bumping the minor version, updating the CHANGELOG and a number of other helpful shortcuts provided by modulesync.

Once you get above half-a-dozen modules it’s a good time to take a step back and think about how you’re going to manage dependencies, versions, spec_helpers and such in an ongoing, iterative way and modulesync presents one very helpful possible solution.

Job applications and GitHub profile oddities

I sift through a surprising number, to me at least, of curricula vitae / resumes each month and one pattern I’ve started to notice is the ‘fork only’ GitHub profile.

There’s been a lot written over the last few years about using your GitHub profile as an integral part of your job application. Some in favour, some very much not. While each side has valid points, when recruiting I like to have all the information I can to hand, so if you include a link to your profile I will probably have a rummage around. When it comes to what I’m looking for there are a lot of different things to consider. Which languages do you use? Is the usage idiomatic? Do you have docs or tests? How do you respond to people in issues and pull requests? Which projects do you have an interest in? Have you solved any of the same problems we have?

Recently, however, I’ve started seeing a small but growing percentage of people that have an essentially fork only profile. The forks are often of the bigger, trendier projects (Docker, Kubernetes and Terraform for example) and there will be no contributed code. In the most blatant case there were a few amended CONTRIBUTORS files with the applicant’s name and email but no actual changes to the code base.

Although you shouldn’t place undue weight on an applicant’s GitHub profile in most cases, and in the Government we deliberately don’t consider it in any phase past the initial CV screen, it can be quite illuminating. In the past it provided an insight into people’s attitudes, aptitudes and areas of interest. Now it also serves as a warning sign that someone may be more of a system gamer than a system administrator.

AWS Trivia – Broken user data and instance tag timing

Have you ever noticed in the AWS console, when new instances are created, the “Tags” tab doesn’t have any content for the first few seconds? A second or two before values are added may not seem like much but it can lead to elusive provisioning issues, especially if you’re autoscaling and have easily blamed network dependencies in your user data scripts.

A lot of people use tag values in their user data scripts to help ‘inflate’ AMIs and defer some configuration, such as which config management classes to apply, to run time when the instance is started, rather than embedding them at build time when the AMI itself is created. In the vast majority of cases everything will work exactly as you expect. Instances will start, tags will be applied and user data will determine how to configure the instance based on their values. However, very rarely, the user data script will begin before the tags are applied to the instance.

If your script requires these tag values then you need to consider this rare issue and decide how to handle it. You can ignore it, as it’s very rare. If you’re using tags to assign config management roles or similar, you can provide sensible defaults such as applying the base class. It’s possible to ensure that instances that don’t detect their tags fail their health checks and are marked as defective and terminated before they come into service. You can also stack the odds a little more in your favour by having the tag reading happen a little later in your user data: for instance, run that apt-get update, or the curl that installs the AWS agent, before fetching the tags, to give them more time to be applied.
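Two of those options, waiting a little and falling back to defaults, combine quite naturally. The sketch below is only an illustration of the pattern: tags_with_fallback is a made-up helper, and the fetch_tags lambda is a stand-in that pretends the tags turn up on the third attempt, where a real script would query the EC2 API.

```ruby
# Hypothetical helper: call the tag-fetching block a few times, pausing
# between attempts, and fall back to the defaults if no tags ever appear.
def tags_with_fallback(defaults, attempts: 5, delay: 2)
  attempts.times do |attempt|
    tags = yield
    # Any tags we do find override the defaults.
    return defaults.merge(tags) unless tags.nil? || tags.empty?
    sleep(delay) if attempt < attempts - 1
  end
  defaults
end

# Stand-in for a real lookup. It returns nothing twice and then the
# tags, to mimic them being applied shortly after the instance starts.
responses = [{}, {}, { 'role' => 'webserver' }]
fetch_tags = -> { responses.shift || {} }

tags = tags_with_fallback({ 'role' => 'base' }, delay: 0) { fetch_tags.call }
puts tags['role']
```

If the tags never arrive the instance still gets the base role, which is usually a better failure mode than a half-configured machine entering service.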

Tagging is often a simple afterthought but in the cloud you need a very firm understanding of which things are atomic units and which are separate services that can fail independently. Although tags may seem like a direct property of the instance they are actually handled (I think) by a completely different service, which can always fail. Understanding this split also explains why you can’t read tags and their values from the local metadata service. Which as an aside can, even more rarely, be unavailable. That was a fun afternoon.

I’ll leave you with a closing comment from the days when you could only have 10 tags. Tag values can be complex strings, for example, JSON objects. Possibly even compressed and base64 encoded JSON objects. Just putting that out there.
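For the curious, here is roughly what that trick looks like using only Ruby's standard library. The helper names and the settings hash are invented for the example. Tag values have a length limit, so it's worth checking the size of the encoded string, and for small payloads the compression and base64 overhead can make the result longer than the plain JSON.

```ruby
require 'json'
require 'zlib'

# Pack a hash into a single string: JSON, then zlib, then base64.
# pack('m0') produces base64 without any line breaks.
def encode_tag_value(data)
  [Zlib::Deflate.deflate(JSON.generate(data))].pack('m0')
end

# Reverse the steps above to get the original hash back.
def decode_tag_value(value)
  JSON.parse(Zlib::Inflate.inflate(value.unpack1('m0')))
end

# Made-up settings that would otherwise need three separate tags.
settings = { 'role' => 'webserver', 'classes' => %w[base nginx], 'env' => 'prod' }

encoded = encode_tag_value(settings)
puts encoded.length
puts decode_tag_value(encoded) == settings
```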

Over engineering a badly thought out terraform data provider

All the well managed AWS accounts I have access to include some form of security group control over which IP addresses can connect to them. I have a home broadband connection that provides a dynamic IP address. These two things do not play well together.

Every now and again my commands will annoyingly fail with ‘access denied’. I’ll run a curl icanhazip.org, raise a new PR against the isolated bootstrap project that controls my access, get it reviewed and after running terraform, restore my access. This process has to be improvable right? I know, more code will fix it!

As an experiment in writing a custom data provider for Terraform, the real reason I did any of this, I decided to try and remove the IP address from the code base completely and instead make it a run time determined value. The, never to be merged, Icanhazip data source pull request that implements this is still available and shows how to add a simple data source to terraform. Becoming a little more familiar with the code base, and how to test it properly (thanks to Richard Clamp of the Terraform GitLab provider for lots of pointers on testing), was worth the time invested even with the rejected PR.

Was this data provider a good idea? No, not really. The HTTP data source solution proposed by Martin Atkins is a much better approach and requires no changes to terraform itself. The code is easy to follow:

# main.tf

# use the swiss army knife http data source to get your IP
data "http" "my_local_ip" {
    url = "https://ipv4.icanhazip.com"
}

# write it to a local file to prove everything's fine
resource "local_file" "my_ip" {
    content  = "${chomp(data.http.my_local_ip.body)}"
    filename = "/tmp/my_ip"
}

and it does exactly what my pile of Golang does -

$ terraform apply
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ cat /tmp/my_ip
312.533.143.224

The more time that passes since this little experiment, the more I think the whole idea was a terrible one. My use case, bootstrapping AWS access with security groups, is at best a very niche one. It assumes your bootstrap tool isn’t restricted and only works if everyone executes terraform from the same location. Was it a complete waste of time? Not really. I learned a lot about how data sources work and how I’d implement a sensible one in the future. I also know the Terraform PR reviewers are quick, courteous and good at spotting well meaning mistakes, which as a user of the tool itself is quite reassuring.

AWS security audits with Scout2

Inspired by a link in the always excellent Last Week in AWS I decided to investigate Scout2, a “Security auditing tool for AWS environments”. Scout2 is a command line program, written in Python, that runs against your AWS account, queries your configuration data and presents common issues and misconfigurations via a set of local HTML files.

The dashboard itself is simple, but effective, and displays a nice overview of all the checks Scout2 ran.

Screen shot of the Scout2 dashboard

Installing the program and generating a report against your own infrastructure is remarkably easy and has no external requirements. In my experiments I decided to run it locally under a virtualenv against AWS using an existing profile.

cd /tmp
virtualenv scout
cd scout/
source bin/activate

pip install awsscout2

# set up your access here
Scout2 --profile <your profile name> --regions eu-west-1

In the above example I use a named profile from ~/.aws/credentials rather than specifying the values in environment variables. As an aside: I have two profiles defined for each of my AWS accounts, one with permissions to use all the list, read and describe functions but nothing that allows changes (which I used for this experiment), and another with more admin powers. If you’re running Scout2 in AWS you can use an IAM profile with the default Scout2 IAM policy.

Once you’ve run the tool there’s a pleasant little trick where the report is opened in your local web browser, unless you’re running under something like Jenkins, in which case you should specify --no-browser. Behind the dashboard there are per service pages with the configs that require attention. Here’s a peek at the IAM services in my experimentation VPC.

Scout2 IAM service dashboard

Although I’ve not tried to extend Scout2 yet the default reports highlighted a couple of configuration details that I’ll have to think about, which shows that it provides some immediate value. It’s been quite an easy tool to set up and run and I highly recommend taking it for a spin.

A Terraform equivalent to CloudFormation’s AWS::NoValue?

Sometimes, when using an infrastructure as code tool like Terraform or CloudFormation, you only want to include a property on a resource under certain conditions, while always including the resource itself. In AWS CloudFormation there are a few CloudFormation Conditional Patterns that let you do this but, and this is the central point of this post, what’s the Terraform equivalent of using AWS::NoValue to remove a property?

Here’s an example of doing this in CloudFormation. If InProd is false the Iops property is completely removed from the resource. Not set to undef, no NULLs, simply not included at all.

    "MySQL" : {
      "Type" : "AWS::RDS::DBInstance",
      "DeletionPolicy" : "Snapshot",
      "Properties" : {
        ... snip ...
        "Iops" : {
          "Fn::If" : [ "InProd",
            "1000",
            { "Ref" : "AWS::NoValue" }
          ]
        }
        ... snip ...
      }
    }

While Terraform allows you to use the, um, ‘inventive’, count meta-parameter to control if an entire resource is present or not -

resource "aws_security_group_rule" "example" {
    count = "${var.create_rule}"
    ... snip ...
}

It doesn’t seem to have anything more fine grained.

One example of when I’d want to use this is writing an RDS module. I want nearly all the resource properties to be present every time I use the module, but not all of them. I’d only want replicate_source_db or snapshot_identifier to be present when a certain variable was passed in. Here’s a horrific example of what I mean:

resource "aws_db_instance" "default" {
    ... snip ...
    # these properties are always present
    storage_type         = "gp2"
    parameter_group_name = "default.mysql5.6"
    # and then the optional one
    replicate_source_db = "${var.replication_primary | absent_if_null}"
    ... snip ...
}

But with a nice syntax rather than that horrible made up one above. Does anyone know how to do this? Do I need to write two nearly identical modules with one param different or, slightly better, have two database resources, one with the extra parameter present, and use a count to choose one of those? Help me Obi-internet! Is there a better way to do this?

Refreshing a keyboard and mouse – 2017

After having some work done at home I recently found myself in need of both a new keyboard and mouse on very short notice. Also wallpaper paste and electronics, not good friends. I’m very set in my ways when it comes to peripherals and over the years I’ve grown very fond of a Das Keyboard and, as a left handed mouse user, Microsoft IntelliMouse Optical combination.

The keyboard should’ve been an easy replacement. Unfortunately Das keyboards take a few weeks to be delivered and these days are inching closer and closer to the 200 GBP price point. The cheap plastic, dead flesh feeling, standby was starting to annoy me so I went for a browse through Amazon Prime and its next day delivery section and settled on a Cooler Master MasterKeys. You can see the two keyboards together here:

Photo of a Das Keyboard and a Cooler Master Masterkeys

The Cooler Master has a number of fancy features that I’ll probably never investigate but it does have nice Cherry Brown switches. They are comfortable to type on and make about as much noise as my old Das, which I think has Cherry Blue switches. I did start to investigate other options in a little more depth before I placed the order but when keyboard reviews talk about on board CPU specs I started to zone out a little. It’s also half the price of the Das.

I’ve been using it for a week or so and currently have no complaints. Other than one evening coding with the keyboard back light on full, which was bright enough to work by and should make on call a little more pleasant for everyone else in the house, I’m using it as a solid, dumb keyboard.

Selecting a new mouse was more of an issue. In a nearly unforgivable move Microsoft stopped selling the IntelliMouse Optical quite a few years ago. I’ve always considered it to be the pinnacle of mouse technology (although I also consider all UIs after Windows 2000 to be superfluous so I’m not to be trusted) and so I spent a chunk of time trying to hunt one down. The second hand market has stupidly high markups and the idea of using a second hand mouse was a little unsettling so I had to find an alternative. That could be used comfortably in the left hand.

The first attempt was a Logitech M220, which I bought on the recommendation of a left handed friend. Who apparently has tiny, tiny hands. And bad taste in mice. I like a sharp click and the accompanying noise when I click, but the M220 key presses are very soft and squidgy with no real click sensation. I found myself second guessing if the click had taken. It was also way too small for me to use comfortably. It felt like I was dragging most of my hand over the desk when I was using it. I very nearly surrendered and bought a Razer DeathAdder, the mouse I used to play games with quite a lot a few years ago, but the left handed model seems to have a lot fewer features than the right handed one so I hesitated and asked a few groups of techies for recommendations. A couple of people, who were kind enough to measure their hands for me, suggested a Roccat Kova, which should be fine for either hand and has very good, community supplied drivers and config software for Linux.

I’ve put all three mice in one photo here. If you can’t see the Logitech one it’s because Ghost Rider is holding it.

Photo of a Intellimouse and a Roccat Kova

The Roccat is a little smaller, has quite a few more buttons and has been very comfortable to use for the few weeks I’ve had it. I’ve tried to avoid getting too tweaky with it but I’ve remapped a few of the extra buttons to run certain commands and it’s been very solid, on or off a mouse mat. Some left handed mice are very uncomfortable for right handed users but I’ve had no complaints about the Roccat yet. I don’t know if it’ll last as long as the Intellimouse, which has seen nearly a decade of daily use, but it wasn’t too expensive, feels comfortable in use and means I can buy another one for the office.

I know this post might seem like a lot of words over something very trivial but if you’re going to use a few tools for 6-12 hours a day it can pay dividends to find decent ones, even if they cost a little more than the default plastic ones you get free with every PC.

I did consider ordering the newest model Das Keyboard for use in the office but then I noticed the ‘The Cloud Connected Keyboard’ tag line and removed it from my basket. I don’t even use a wireless keyboard, a cloud connected one… Really?

Testing multiple Puppet versions with TravisCI

When it comes to running automated tests of my public Puppet code TravisCI has long been my favourite solution. It’s essentially a zero infrastructure, second pair of eyes, on all my changes. It also doesn’t have any of my local environment oddities and so provides a more realistic view of how my changes will impact users. I’ve had two Puppet testing scenarios pop up recently that turned out to be the same technical issue once I started exploring them: running tests against the Puppet version I use and support, and against others I’m not so worried about.

This use case came up as I have code written for Puppet 3 that I need to start migrating to Puppet 4 (and probably to Puppet 5 soon) and on the other hand I have code on Puppet 4 that I’d like to continue supporting on Puppet 3 until it becomes too much of a burden. While I can do the testing locally with overrides, rvm and gemfiles, I wanted the same behaviour on TravisCI.

It’s very easy to get started with TravisCI. Once you’ve signed up (probably with GitHub auth) it only requires two quick steps to get going. The first step is to enable your repo on the TravisCI site.

Enable repo UI

You should then add a .travis.yml file to the repo itself. This contains the what and how of building and testing your code. You can see a very minimal example, that just runs rake spec with a specific ruby version, below:

---
language: ruby
rvm:
  - 2.1.0
script: "bundle exec rake spec"

This provides our basic safety net, but now we want to allow multiple versions of puppet to be specified for testing. First we’ll modify our Gemfile to install a specific version of the puppet gem if an environment variable is passed in via the TravisCI build config. If this is missing we’ll just install the newest and run our tests using that. The lines that implement this, the last five in our sample file, are the important ones to note.

#!ruby
source 'https://rubygems.org'

group :development, :test do
  gem 'json'
  gem 'puppetlabs_spec_helper', '~> 1.1.1'
  gem 'rake', '~> 11.2.0'
  gem 'rspec', '~> 3.5.0'
  gem 'rubocop', '~> 0.47.1', require: false
end

if puppetversion = ENV['PUPPET_GEM_VERSION']
  gem 'puppet', puppetversion, :require => false
else
  gem 'puppet', :require => false
end

Now we’ve added this capability to the Gemfile we’ll modify our .travis.yml file to take advantage of it. Add an env array, with a version from each of the two major versions we want to test under, with the same variable name as we use in our Gemfile.

---
language: ruby
rvm:
  - 2.1.0
bundler_args: --without development
script: "bundle exec rake spec SPEC_OPTS='--format documentation'"
env:
  - PUPPET_GEM_VERSION="~> 3.8.0"
  - PUPPET_GEM_VERSION="~> 4.10.0"
notifications:
  email: dean.wilson@gmail.com

Now our .travis.yml is getting a little more complicated you might want to lint it to confirm it’s valid. You can use the online TravisCI linter or install the TravisCI YAML gem and work offline. The example file above will trigger two separate builds when TravisCI receives the trigger from our change. If you want to explicitly test under two versions of Puppet, and fail the tests if anything breaks under either version, you are done. Congratulations!

If, however, you’d like to test against an older, best effort but unsupported version, or start testing a newer version that you’re willing to accept failures from while you migrate (assuming the main version still passes), you’ll need to add another config option to your .travis.yml file - matrix.

matrix:
  allow_failures:
    - env: PUPPET_GEM_VERSION="~> 3.8.0"

In this case (in combination with the config file above) failures under Puppet 4 fail the build, but we allow, and essentially ignore, failures against Puppet 3 as we no longer explicitly support it. If we were planning a move to Puppet 5 we’d add its version here and even on builds that failed we’d start to collect information on what needs to be investigated and fixed while still ensuring our code passes tests under Puppet 4.

I’d also recommend adding an explicit fast_finish to your matrix config if you allow_failures. This allows TravisCI to signal when required tests are finished, even if the results of those allowed to fail are not yet known, as you don’t need them to know if a run has been successful or not.

matrix:
  fast_finish: true
  allow_failures:
    - env: PUPPET_GEM_VERSION="~> 3.8.0"
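If it helps to see the pass / fail rule spelt out, the sketch below models it in plain Ruby. It is only an illustration of the rule described above, not how TravisCI implements it: the build_passes? helper and the results hash are made up, and the YAML is a trimmed copy of the config from this post.

```ruby
require 'yaml'

# A trimmed copy of the .travis.yml settings discussed in this post.
config = YAML.safe_load(<<~TRAVIS)
  env:
    - PUPPET_GEM_VERSION="~> 3.8.0"
    - PUPPET_GEM_VERSION="~> 4.10.0"
  matrix:
    fast_finish: true
    allow_failures:
      - env: PUPPET_GEM_VERSION="~> 3.8.0"
TRAVIS

# Envs listed under allow_failures don't count towards the result.
allowed  = config.fetch('matrix', {}).fetch('allow_failures', []).map { |entry| entry['env'] }
required = config['env'] - allowed

# Hypothetical helper: the build passes if every required env passed.
def build_passes?(results, required)
  required.all? { |env| results[env] == :passed }
end

# Made-up results: Puppet 3 fails, Puppet 4 passes.
results = { config['env'][0] => :failed, config['env'][1] => :passed }

puts required.inspect
puts build_passes?(results, required)
```

With fast_finish enabled the overall result can be reported as soon as every env in required has finished, which is why the allowed failures no longer hold things up.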

Here’s an example of a build with Allowed Failures in the UI:

Build with allowed failures

Little ruby libraries – Testing with Timecop

When it comes to rubygems that help with my testing I’m a massive fan of the relatively unknown Timecop. It’s a well written, highly focused gem that lets you control and manipulate the date and time returned by a number of ruby methods. In specs where testing requires certainty of ‘now’ it’s become my favoured first stop.

The puppet deprecate function is a good example of when I’ve needed this functionality. The spec scenarios should exercise a resource with the time set to before and after the deprecation time in separate tests. The two obvious options are to hard code the dates, which won’t work here as we’re black box testing the function, or to mock the calls, something Timecop excels at and saves you writing yourself.

require 'timecop'

# explicitly set the date. Time.local takes the year, month and day
# as separate arguments rather than a single date string.
Timecop.freeze(Time.local(2015, 1, 24))

...
  # success: we've explicitly set the date above to be before 2015-01-25
  # so this resource hasn't been deprecated
  should run.with_params('2015-01-25', 'Remove Foo at the end of the contract.')
...
  # failure: we're using a date older than that set in the freeze above
  # so we now deprecate the resource
  should run.with_params('2015-01-20', 'Trigger expiry')
...

# reset the time to the real now
Timecop.return

This allows us to pick an absolute point in time and use literal strings in our tests that relate to the point we’ve picked. No more intermediate variables with manually manipulated date objects to ensure we’re 7 days in the future or 30 days in the past. Removing this boilerplate code itself was a win for me. If you need to ensure all your specs run with the same time set you can call the freeze and return in the before and after methods.

before do
  # all tests will have this as their time
  Timecop.freeze(Time.local(1990))
end

after do
  # return to normal time after the tests have run
  Timecop.return
end

I’ve shown the basic, and for me most commonly used, functionality above, but there are a few helper methods that elevate Timecop from “I could quickly write that myself” to “this deserves a place in my gemfile”. The ability to freeze time in the future with a simple Timecop.freeze(Date.today + 7) is handy and the block syntax, which automatically returns time to normal, is pure user experience refinement. The Timecop.scale function, which lets you define how much time passes for every real second, isn’t something you need every day, but when you do you’ll be very glad you don’t have to write it yourself.

Announcing multi_epp – Puppet function

As part of refreshing my old puppet modules I’ve started to convert some of my Puppet templates from the older ERB format to the newer, and hopefully safer, Embedded Puppet (EPP).

While it’s been a simple conversion in most cases, I did quickly find myself lacking the ability to select a template based on a hierarchy of facts, which I’ve previously used multitemplate to address. So I wrote a Puppet 4 version of multitemplate that wraps the native EPP function, adds matching lookup logic and then imaginatively called it multi_epp. You can see an example of it in use here:

class ssh::config {

  file { '/etc/ssh/sshd_config':
    ensure  => present,
    mode    => '0600',
    # note the array of files.
    content => multi_epp( [
                            "ssh/${::fqdn}.epp",
                            "ssh/${::domain}.epp",
                            'ssh/default_sshdconfig.epp',
                          ], {
                                'port'          => 22222,
                                'ListenAddress' => '0.0.0.0',
                          }),
  }

}
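The lookup side of this is simple enough to sketch in plain Ruby. The code below is not the multi_epp implementation: first_existing_template is a made-up helper, it assumes the first match wins, and it checks a temporary directory, where the real function has to resolve templates through Puppet before handing the winner to the native EPP function.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return the first candidate that exists under
# template_dir, or nil if none of them do.
def first_existing_template(template_dir, candidates)
  candidates.find { |name| File.exist?(File.join(template_dir, name)) }
end

chosen = nil

Dir.mktmpdir do |dir|
  # Only the domain level and default templates exist in this example.
  FileUtils.mkdir_p(File.join(dir, 'ssh'))
  %w[ssh/example.org.epp ssh/default_sshdconfig.epp].each do |name|
    File.write(File.join(dir, name), '')
  end

  chosen = first_existing_template(dir, [
    'ssh/web01.example.org.epp', # most specific: the fqdn
    'ssh/example.org.epp',       # then the domain
    'ssh/default_sshdconfig.epp' # finally the default
  ])
end

puts chosen
```

Ordering the array from most to least specific is what gives you the hierarchy, so the default template should always go last.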

This was the first function I’ve written using the new Puppet 4 function API and in general it feels like an improvement over the previous API. The dispatch blocks and related functions encourage you to keep the individual sections of code quite small and isolated but will require some diligence to ensure you don’t duplicate a lot of near-identical code between signatures. I also couldn’t quite do what I wanted (a repeating set of params followed by one optional) in the API but I’ve worked around that by requiring all the files to check to be given as an array, which works but is a little icky. I’ve not gone full “all the shiny” yet and included things like function return values and types but I can see myself converting some of my other functions over to gain the benefit of easier parameter checking and basic types.

So what’s next on the path to EPP? For me it’ll be to get my no ERB template puppet-lint check running cleanly over a few local modules and to double check I don’t slip back into old habits.