Managing multiple puppet modules with modulesync

With the exception of children, puppies and medical compliance frameworks, managing one of something is normally much easier than managing a lot of them. If you have a lot of puppet modules, and you’ll eventually always have a lot of puppet modules, you’ll get bitten by this and find yourself spending as much time managing the supporting functionality as the puppet code itself.

Luckily you’re not the first person to have a horde of puppet modules that share a lot of common scaffolding. The fine people at Vox Pupuli had the same issue and maintain an excellent tool, modulesync, that solves this very problem. With modulesync and a little YAML you’ll soon have a consistent, easy to iterate on, set of modules.

To get started with modulesync you need three things, well four if you count the puppet module horde you want to manage.

I’ve been using modulesync for some of my projects for a while but we recently adopted it for the GDS Operations Puppet Modules so there’s now a full, but nascent, example we can look at. You can find all the modulesync code in our public repo.

First we set up the basic modulesync config in modulesync.yml:

---
git_base: 'git@github.com:'
namespace: gds-operations
branch: modulesync
...
# vim: syntax=yaml

This YAML mostly controls how we interact with our upstream. git_base is the base of the URL to run git operations against. In our case we explicitly specify GitHub (which is also the default) but this is easy to change if you use Bitbucket, GitLab or a local server. We treat namespace as the GitHub organisation modules are under. As we never push directly to master we specify a branch our changes should be pushed to, for later processing as a pull request.

The second config file, managed_modules.yml, contains a list of all the modules we want to manage:

---
- puppet-aptly
- puppet-auditd
- puppet-goenv

By default modulesync will perform any operations against every module in this file. It’s possible to filter this down to specific modules but there’s only really value in doing that as a simple test. After all, keeping the modules in sync is pretty core to the tool’s purpose.

The last thing to configure is a little more abstract. Any files you want to manage across the modules should be placed in the moduleroot directory and given a .erb extension. At the moment we’re treating all the files in this directory as basic, static files, but modulesync does expand them as templates and provides a @configs hash, which contains any values you specify in the base config_defaults.yml file. These values can also be overridden with more specific values stored alongside the module itself in the remote repository.

Once you’ve created the config files and added at least a basic file to moduleroot (a LICENSE file is often a safe place to start) you can run modulesync to see what will be changed. In this case I’m going to be working with the gds-operations/puppet_modulesync_config repo.

bundle install

# run the module sync against a single module and show potential changes
bundle exec msync update -f puppet-rbenv --noop

This command will filter the managed modules (using the -f flag to select them), clone the remote git repo(s), placing them under modules/, change the branch to either master or the one specified in modulesync.yml, and then present a diff of the changes from the expanded templates contained in moduleroot against the cloned remote repo. None of the changes are actually made, thanks to the --noop flag. If you’re happy with the diff you can add a commit message (with -m message), remove --noop and run the command again to push the amended branch.

bundle exec msync update -m "Add LICENSE file" -f puppet-rbenv

Once the branch is pushed you can review and create a pull request as usual.
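
If you prefer to stay on the command line, the pull request can be opened with the GitHub CLI – an assumption on my part rather than part of the modulesync workflow; any of the usual tools will do:

# open a pull request from the branch modulesync pushed (title is just an example)
gh pr create --head modulesync --title "Add LICENSE file" --body "Managed by modulesync"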

Screen shot of GitHub pull request from modulesync change

We’re at a very early stage of adoption, so there’s a large swathe of functionality we’re not using and I’ve not mentioned. If you’re using the moduleroot files as actual templates you can have a local override, in each remote module/GitHub repo, that localises the configuration and is correctly merged with the main configuration. This allows you to push settings out to where they’re needed while still keeping most modules baselined. You can also customise the syncing workflow to bump the minor version, update the CHANGELOG and use a number of other helpful shortcuts provided by modulesync.

Once you get above half-a-dozen modules it’s a good time to take a step back and think about how you’re going to manage dependencies, versions, spec_helpers and such in an ongoing, iterative way and modulesync presents one very helpful possible solution.

The Choria Emulator

In my previous posts I discussed what goes into load testing a Choria network: what connections are made, what subscriptions are made, and so on.

From this it’s obvious the things we should be able to emulate are:

  • Connections to NATS
  • Subscriptions – which implies number of agents and sub collectives
  • Message payload sizes

To make it realistically affordable to emulate many more machines than I have, I made an emulator that can start numbers of Choria daemons on a single node.

I’ve been slowly rewriting the MCollective daemon side in Go, which means I already had all the networking and connectors available there, so a daemon was written:

usage: choria-emulator --instances=INSTANCES [<flags>]
 
Emulator for Choria Networks
 
Flags:
      --help                 Show context-sensitive help (also try --help-long and --help-man).
      --version              Show application version.
      --name=""              Instance name prefix
  -i, --instances=INSTANCES  Number of instances to start
  -a, --agents=1             Number of emulated agents to start
      --collectives=1        Number of emulated subcollectives to create
  -c, --config=CONFIG        Choria configuration file
      --tls                  Enable TLS on the NATS connections
      --verify               Enable TLS certificate verifications on the NATS connections
      --server=SERVER ...    NATS Server pool, specify multiple times (eg one:4222)
  -p, --http-port=8080       Port to listen for /debug/vars

You can see here it takes a number of instances, agents and collectives. The instances will all respond with ${name}-${instance} on any mco ping or RPC commands. They can be discovered using the normal mc discovery, though only the agent and identity filters are supported.

Every instance will be a Choria daemon with the exact same network connection and NATS subscriptions as real ones. Thus 50 000 emulated Choria instances will put exactly the same load on your NATS brokers as normal ones would. Performance wise, even with high concurrency the emulator does quite well – it’s many orders of magnitude faster than the Ruby Choria client anyway, so it’s real enough.

The agents they start are all copies of this one:

emulated0
=========
 
Choria Agent emulated by choria-emulator
 
      Author: R.I.Pienaar <rip@devco.net>
     Version: 0.0.1
     License: Apache-2.0
     Timeout: 120
   Home Page: http://choria.io
 
   Requires MCollective 2.9.0 or newer
 
ACTIONS:
========
   generate
 
   generate action:
   ----------------
       Generates random data of a given size
 
       INPUT:
           size:
              Description: Amount of text to generate
                   Prompt: Size
                     Type: integer
                 Optional: true
            Default Value: 20
 
 
       OUTPUT:
           message:
              Description: Generated Message
               Display As: Message

You can see this has a basic data generator action – you give it a desired size and it makes you a message of that size. It will run as many of these as you wish, all named emulated0, emulated1 and so on.
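
As a sketch of driving it from a client with the standard RPC CLI (the agent name emulated0 comes from the DDL above; the identity below is hypothetical):

# ask every emulated agent to generate a 100 byte message
mco rpc emulated0 generate size=100

# or limit the request to a single emulated identity (name format is ${name}-${instance})
mco rpc emulated0 generate size=100 -I emulator-1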

It has an mcollective agent that goes with it; the idea is that you create a pool of machines all running your normal mcollective plus this agent. Using that agent you then build up a new, separate mcollective network comprising the emulators, federation and NATS.

Here are some example commands – you’ll see these again later when we talk about scenarios:

We download the dependencies onto all our nodes:

$ mco playbook run setup-prereqs.yaml --emulator_url=https://example.net/rip/choria-emulator-0.0.1 --gnatsd_url=https://example.net/rip/gnatsd --choria_url=https://example.net/rip/choria

We start NATS on our first node:

$ mco playbook run start-nats.yaml --monitor 8300 --port 4300 -I test1.example.net

We start the emulator with 1500 instances per node all pointing to our above NATS:

$ mco playbook run start-emulator.yaml --agents 10 --collectives 10 --instances 750 --monitor 8080 --servers 192.168.1.1:4300

You’ll then set up a client config for the built network and can interact with it using the normal mco commands and the test suite I’ll show later. Similarly there are playbooks to stop all the various parts etc. The playbooks just interact with the mcollective agent, so you could use mco rpc directly too.

I found I can easily run 700 to 1000 instances on basic VMs – they need about 1.5GB RAM – so it’s fairly light. Using 400 nodes I managed to build a 300 000 node Choria network and could easily interact with it.

Finally I made an EC2 environment where you can stand up a Puppet Master, Choria, the emulator and everything you need, and do load tests on your own dime. I was able to do many runs with 50 000 emulated nodes on EC2 and the whole lot cost me less than $20.

The code for this emulator is very much a work in progress as is the Go code for the Choria protocol and networking but the emulator is here if you want to take a peek.

Job applications and GitHub profile oddities

I sift through a surprising number, to me at least, of curricula vitae / resumes each month and one pattern I’ve started to notice is the ‘fork only’ GitHub profile.

There’s been a lot written over the last few years about using your GitHub profile as an integral part of your job application. Some in favour, some very much not. While each side has valid points when recruiting I like to have all the information I can to hand, so if you include a link to your profile I will probably have a rummage around. When it comes to what I’m looking for there are a lot of different things to consider. Which languages do you use? Is the usage idiomatic? Do you have docs or tests? How do you respond to people in issues and pull requests? Which projects do you have an interest in? Have you solved any of the same problems we have?

Recently however I’ve started seeing a small but growing percentage of people that have an essentially fork-only profile, often forks of the bigger, trendier projects – Docker, Kubernetes, Terraform for example – with no contributed code. In the most blatant case there were a few amended CONTRIBUTORS files with the applicant’s name and email but no actual changes to the code base.

Although you shouldn’t place undue weight on an applicant’s GitHub profile in most cases, and in the Government we deliberately don’t consider it in any phase past the initial CV screen, it can be quite illuminating. In the past it provided an insight into people’s attitudes, aptitudes and areas of interest; now it can also serve as a warning sign that someone may be more of a system gamer than a system administrator.

What to consider when speccing a Choria network

In my previous post I talked about the need to load test Choria given that I now aim for much larger workloads. This post goes into a few of the things you need to consider when sizing the optimal network size.

Given that we now have the flexibility to build 50 000 node networks quite easily with Choria the question is should we, and if yes then what is the right size. As we can now federate multiple Collectives together into one where each member Collective is a standalone network we have the opportunity to optimise for the operability of the network rather than be forced to just build it as big as we can.

What do I mean when I say the operability of the network? Quite a lot of things:

  • What is your target response time on an unbatched mco rpc rpcutil ping command?
  • What is your target discovery time? You should use a discovery data source, but broadcast discovery is still useful, so how long do you want it to take?
  • If you are using a discovery source, how long do you want to wait for publishes to happen?
  • How many agents will you run? Each agent makes multiple subscriptions on the middleware and consumes resources there
  • How many sub collectives do you want? Each sub collective multiplies the number of subscriptions
  • How many federated networks will you run?
  • When you restart the entire NATS, how long do you want to wait for the whole network to reconnect?
  • How many NATS servers do you need? One can run 50 000 nodes, but you might want a cluster for HA. Clustering introduces overhead in the middleware
  • If you are federating a globally distributed network, what impact does the latency across the federation have, and what is acceptable?

So you can see that to a large extent the answer here is related to your needs and not only to the needs of benchmarking Choria. I am working on a set of tools to allow anyone to run tests locally or on an EC2 network. The main work horse is a Choria emulator that runs 1 000 or more Choria instances on a single node, so you can use a 50 node EC2 network to simulate a 50 000 node one.

Middleware Scaling Concerns


Generally for middleware brokers there are a few things that impact their scalability:

  • Number of TCP Connections – generally a thread/process is made for each
  • TLS or Plain text – huge overhead in TLS typically and it can put a lot of strain on single systems
  • Number of message targets – queues, topics, etc. Different types of target have different overheads. Often a thread/process for each.
  • Number of subscribers to each target
  • Cluster overhead
  • Persistence overheads like storage and ACKs etc

You can see it’s quite a large number of variables that go into this; anywhere that requires a thread or process per item is somewhere you should get worried, or at least be in a position to measure.

NATS uses one Go routine for each connection and no additional ones per subscription etc, so it’s quite lightweight, but there are no hard and fast rules. Best to observe how it grows with your needs – something I’ll include in my test suite.

How Choria uses NATS


It helps then to understand how Choria will use NATS and what connections and targets it makes.

A single Choria node will:

  • Maintain a single TCP+TLS connection to NATS
  • Subscribe to 1 queue unique to the node for every Subcollective it belongs to
  • For every agent – puppet, package, service, etc – subscribe to a broadcast topic for that agent. Once in every Subcollective. Choria comes default with 7 agents.

So if you have a node with 10 agents in 5 Subcollectives:

  • 50 broadcast subjects for agents
  • 5 queue subjects
  • 1 TCP+TLS connection

So 100 nodes will have 5 500 subscriptions, 550 NATS subjects and 100 TCP+TLS connections.
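
As a rough sketch, the same arithmetic is easy to script for your own numbers (the values below are the ones from this example):

# back-of-the-envelope NATS load for a Choria deployment
nodes=100; agents=10; collectives=5
echo "subscriptions: $(( nodes * (agents * collectives + collectives) ))"  # 5500
echo "subjects:      $(( agents * collectives + nodes * collectives ))"    # 550
echo "connections:   $nodes"                                               # 100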

Ruby based Federation brokers will maintain 1 subscription to a queue subject on the Federation side and the same on the Collective side. The upcoming Go based Federation Brokers will maintain 10 (configurable) connections to NATS on each side, each with these subscriptions.

Conclusion


This will give us a good input into designing a suite of tools to measure various things during the run time of a big test, check back later for details about such a tool.

Load testing Choria

Overview


Many of you probably know I am working on a project called Choria that modernizes MCollective and will eventually supersede MCollective (more on this later).

Given that Choria is heading down a path of being a rewrite in Go I am also taking the opportunity to look into much larger scale problems to meet some client needs.

In this and the following posts I’ll write about work I am doing to load test and validate Choria to 100s of thousands of nodes and what tooling I created to do that.

Middleware


Choria builds around the NATS middleware, a Go based middleware server that forgoes a lot of the persistence and other expensive features and instead focusses on being a fire and forget middleware network. It has an additional project should you need those features, so you can mix and match quite easily.

Turns out that’s exactly what typical MCollective needs as it never really used the persistence features and those just made the associated middleware quite heavy.

To give you an idea, in the old days the community would suggest every ~ 1000 nodes managed by MCollective required a single ActiveMQ instance. Want 5 500 MCollective nodes? That’ll be 6 machines – physical recommended – and 24 to 30 GB RAM in a cluster just to run the middleware. We’ve had reports of much larger RabbitMQ networks on 4 or 5 servers – 50 000 managed nodes or more, but those would be big machines and they had quite a lot of performance issues.

There was a time when 5 500 nodes was A LOT, but now it’s becoming a bit everyday, so I need to focus upward.

With NATS+Choria I am happily running 5 500 nodes on a single 2 CPU VM with 4GB RAM. In fact on a slightly bigger VM I am happily running 50 000 nodes on a single VM and NATS uses around 1GB to 1.5GB of RAM at peak.

Doing 100s of RPC requests in a row against 50 000 nodes the response time is pretty solid around 16 seconds for a RPC call to every node, it’s stable, never drops a message and the performance stays level in the absence of Java GC issues. This is fast but also quite slow – the Ruby client manages about 300 replies every 0.10 seconds due to the amount of protocol decoding etc that is needed.

This brings with it a whole new level of problem: just how far can we take the client code, how do you determine when a network is too big, and how do I know whether the client, broker and federation I am working on significantly improve things?

I’ve also significantly reworked the network protocol to support Federation, but the shipped code optimizes for code and config simplicity over, let’s say, support for 20 000 Federation Collectives. When we are talking about truly gigantic Choria networks I need to be able to test scenarios involving 10s of thousands of Federated Networks, all with 10s of thousands of nodes in them. So I need tooling that lets me do this.

Getting to running 50 000 nodes


Not everyone just happens to have a 50 000 node network lying about that they can play with, so I had to improvise a bit.

As part of the rewrite I am doing I am building a Go framework with the Choria protocol, config parsing and network handling all built in Go. Unlike the Ruby code I can instantiate multiple of these in memory and run them in Go routines.

This means I could write an emulator that can start a number of faked Choria daemons all in one process. They each have their own middleware connection, run a varying number of agents with a varying number of sub collectives and generally behave like a normal MCollective machine. On my MacBook I can run 1 500 Choria instances quite easily.

So with fewer than 60 machines I can emulate 50 000 MCollective nodes on a 3 node NATS cluster and have plenty of spare capacity. This is well within budget to run on AWS, and it’s not uncommon these days to have that many dev machines around.

In the following posts I’ll cover bits about the emulator, what I look for when determining optimal network sizes and how to use the emulator to test and validate performance of different network topologies.

Monitoring SSL Certificate Expiry in GCP and Kubernetes

SSL cert monitoring diagram

Problem

At my current job, we use Google Cloud Platform. Each team has a set of GCP Projects; each project can have multiple clusters. The majority of services that our teams write expose some kind of HTTP API or web interface - so what does this mean? All HTTP endpoints we expose are encrypted with SSL[1], so we have a lot of SSL certificates in a lot of different places.

Each of our GCP projects is built using our CI/CD tooling. All GCP resources and all of our Kubernetes application manifests are defined in git. We have a standard set of stacks that we deploy to each cluster using our templating. One of the stacks is Prometheus, Influxdb, and Grafana. In this article, I’ll explain how we leverage (part of) this stack to automatically monitor SSL certificates in use by our load balancers across all of our GCP projects.

Certificate Renewal

To enable teams to expose services with minimal effort, we rely on deploying a Kubernetes LetsEncrypt controller to each of our clusters. The LetsEncrypt controller automatically provisions certificates for Kubernetes resources that require them, as indicated by annotations on the resources, e.g:

apiVersion: v1
kind: Service
metadata:
  name: app0
  labels:
    app: app0
  annotations:
    acme/certificate: app0.prod.gcp0.example.com
    acme/secretName: app0-certificate
spec:
  type: ClusterIP
  ports:
    - port: 3000
      targetPort: 3000
  selector:
    app: app0

This certificate can now be consumed by an NGiNX ingress controller, like so:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: app0
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
    - secretName: app0-certificate
      hosts:
        - app0.prod.gcp0.example.com

  rules:
    - host: app0.prod.gcp0.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: app0
              servicePort: 3000

Switching the ingress.class annotation to have the value of gce will mean Google Compute Engine will handle this configuration. A copy of the secret (the SSL certificate) will be made in GCP as a Compute SSL Certificate resource, which the GCP load balancer can then use to serve HTTPS.
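
To confirm the copied Certificate resource exists you can ask gcloud directly – a sketch; the certificate name below is hypothetical, since GCE derives the real one from the ingress:

# list all SSL certificate resources in the current project
gcloud compute ssl-certificates list

# show when a specific certificate expires (hypothetical name)
gcloud compute ssl-certificates describe k8s-ssl-app0 --format='value(expireTime)'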

Of course, this isn’t the only method for deploying SSL certificates for services in GCP and/or Kubernetes. In our case, we also have many legacy certificates that are manually renewed by humans, stored encrypted in our repositories, and deployed as secrets to Kubernetes or SSL Certificate resources to Google Compute Engine.

The GCE ingress controller makes a copy of the secret as a Compute SSL Certificate. This means that certificates used in the default Kubernetes load balancers are stored in two separate locations: the Kubernetes cluster, as a secret, and in GCE, as a Certificate resource.

Regardless of how the certificates end up in either GCE or Kubernetes, we can monitor them with Prometheus.

Whether manually renewed or managed by LetsEncrypt, our certificates end up in up to two places:

  • The Kubernetes Secret store
  • As a GCP compute SSL Certificate

Note that the NGiNX ingress controller works by mounting the Kubernetes Secret into the controller as a file.

The following commands will show certificates for each respective location:

  • Kubernetes Secrets (kubectl get secret)
  • GCP compute ssl-certificates (gcloud compute ssl-certificates list)

Exposing Certificate Expiry

In order to ensure that our certificates are being renewed properly, we want to check the certificates that are being served up by the load balancers. To check the certificates we need to do the following:

  1. Fetch a list of FQDNs to check from the appropriate API (GCP or GKE/Kubernetes)
  2. Connect to each FQDN and retrieve the certificate
  3. Check the Valid To field for the certificate to ensure it isn’t in the past
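
For a single endpoint, the last two steps can be sanity-checked by hand with openssl (a quick sketch; the hostname is the example one used in the manifests above):

# fetch the certificate currently served for one FQDN and print its expiry date
fqdn=app0.prod.gcp0.example.com
echo | openssl s_client -servername "$fqdn" -connect "$fqdn:443" 2>/dev/null \
  | openssl x509 -noout -enddate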

To do the first two parts of this process we’ll use a couple of programs that I’ve written that scrape the GCP and K8S APIs and expose the expiry times for every certificate in each:

Kubernetes manifest for prometheus-gke-letsencrypt-certs:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-gke-letsencrypt-certs
  namespace: system-monitoring
  labels:
    k8s-app: prometheus-gke-letsencrypt-certs
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-gke-letsencrypt-certs
  template:
    metadata:
      labels:
        k8s-app: prometheus-gke-letsencrypt-certs
      annotations:
        prometheus_io_port: '9292'
        prometheus_io_scrape_metricz: 'true'
    spec:
      containers:
      - name: prometheus-gke-letsencrypt-certs
        image: roobert/prometheus-gke-letsencrypt-certs:v0.0.4
        ports:
          - containerPort: 9292

Kubernetes manifest for prometheus-gcp-ssl-certs:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-gcp-ssl-certs
  namespace: system-monitoring
  labels:
    k8s-app: prometheus-gcp-ssl-certs
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-gcp-ssl-certs
  template:
    metadata:
      labels:
        k8s-app: prometheus-gcp-ssl-certs
      annotations:
        prometheus_io_port: '9292'
        prometheus_io_scrape_metricz: 'true'
    spec:
      containers:
      - name: prometheus-gcp-ssl-certs
        image: roobert/prometheus-gcp-ssl-certs:v0.0.4
        ports:
          - containerPort: 9292

These exporters each connect to a different API and then expose a list of CNs with their Valid To values as Unix timestamps. Using these values we can calculate how long is left until each certificate expires ($valid_to - time()).

Once these exporters have been deployed, and if, like ours, Prometheus has been configured to look for the prometheus_io_* annotations, then Prometheus should start scraping these exporters and the metrics should be visible in the Prometheus UI. Search for gke_letsencrypt_cert_expiration or gcp_ssl_cert_expiration, here’s one example:

Prometheus Query - SSL
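
Before wiring up dashboards it can be worth scraping an exporter by hand to confirm it’s exposing data – a sketch, assuming cluster access and the port from the manifests above:

# forward the exporter's port locally (deployment name and port from the manifest above)
kubectl -n system-monitoring port-forward deploy/prometheus-gke-letsencrypt-certs 9292 &

# list the certificate expiry metrics it exposes
curl -s http://localhost:9292/metrics | grep -i cert_exp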

Visibility

Now that certificate metrics are being updated, the first useful thing we can do is make them visible.

Each of our projects has a Grafana instance automatically deployed to it and preloaded with some useful dashboards, one of which queries Prometheus for data about the SSL certs. When a certificate has less than seven days until it runs out, it turns orange; when it’s expired it will turn red.

Grafana SSL cert expiry dashboard

The JSON for the above dashboard can be found in this gist: https://gist.github.com/roobert/e114b4420f2be3988d61876f47cc35ae

Alerting

Next, let’s setup some Alert Manager alerts so we can surface issues rather than having to check for them ourselves:

ALERT GKELetsEncryptCertExpiry
  IF gke_letsencrypt_cert_expiry - time() < 86400 AND gke_letsencrypt_cert_expiry - time() > 0
  LABELS {
    severity="warning"
  }
  ANNOTATIONS {
    SUMMARY = ": SSL cert expiry",
    DESCRIPTION = ": GKE LetsEncrypt cert expires in less than 1 day"
  }

ALERT GKELetsEncryptCertExpired
  IF gke_letsencrypt_cert_expiry - time() <= 0
  LABELS {
    severity="critical"
  }
  ANNOTATIONS {
    SUMMARY = ": SSL cert expired",
    DESCRIPTION = ": GKE LetsEncrypt cert has expired"
  }

ALERT GCPSSLCertExpiry
  IF gcp_ssl_cert_expiry - time() < 86400 AND gcp_ssl_cert_expiry - time() > 0
  LABELS {
    severity="warning"
  }
  ANNOTATIONS {
    SUMMARY = ": SSL cert expiry",
    DESCRIPTION = ": GCP SSL cert expires in less than 1 day"
  }

ALERT GCPSSLCertExpired
  IF gcp_ssl_cert_expiry - time() <= 0
  LABELS {
    severity="critical"
  }
  ANNOTATIONS {
    SUMMARY = ": SSL cert expired",
    DESCRIPTION = ": GCP SSL cert has expired"
  }

Caution: Due to the nature of LetsEncrypt certificate renewals only happening on the last day that they are valid, the window of opportunity for receiving an alert is extremely slim.

Conclusion

In this article, I’ve outlined our basic SSL monitoring strategy and included the code for two Prometheus exporters which can expose the metrics necessary to configure your own graphs and alerts. I hope this has been helpful.




[1] Technically TLS but commonly referred to as SSL

Decomissioning my Drupal blog

If you are looking at this blog post right now... my live Drupal site has finally been decommissioned.. or not .. well, these pages are served statically, but the content is still generated by an ancient, aging Drupal 6 which is hiding somewhere in a container that I only start when I need it.

Given my current low blog volume .. and the lack of time to actually migrate all the content to something like Jekyll or Webby, I took the middle road and pulled the internet facing Drupal offline. My main concern was that I want to keep a number of articles that people frequently point to in the exact same location as before. So that was my main requirement, but with no more public facing Drupal I no longer have to worry about the fact that it really needed updating, or about potential issues on Wednesday evenings, etc.

My first couple of experiments were with wget / curl, but then I bumped into Sending a Drupal site into retirement, which pointed me to httrack, a tool that was new to me ..

As documented there,
httrack http://krisbuytaert.be/blog -O . -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0
creates a usable tree, but the root page ends up in blog/blog which is not really handy.
So the quick hack for that is to go into the blog/blog subdir and regexp the hell out of all the files generated there, redirecting them one level below :)
for file in `ls`; do cat $file | sed -e "s/..\//\/blog\//g" > ../$file ; done

httrack however has one annoying default: it puts metadata in the footer of every page it mirrors – where it comes from and when it was generated. That’s very useful for some use cases, but not for mine, as it means that every time I regenerate the site it actually generates slightly different content rather than identical pages. Luckily I found the -%F "" param to keep that footer string empty.
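
Putting those pieces together, a regeneration run ends up looking roughly like this (the same options as above plus the empty footer string):

# mirror the blog without httrack's "Mirrored by ..." footer
httrack http://krisbuytaert.be/blog -O . -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0 -%F ""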

And that is what you are looking at right now ...

There are still a bunch of articles I have in draft .. so maybe now that I don't have to worry about the Drupal part of things I might blog more frequently again, or not..

Removing an orphaned resource from terraform state

If you manually delete a resource that is being managed by terraform, it is not removed from the state file and becomes "orphaned".

You may see errors like this when running terraform:

1 error(s) occurred:
* aws_iam_role.s3_readonly (destroy): 1 error(s) occurred:
* aws_iam_role.s3_readonly (deposed #0): 1 error(s) occurred:
* aws_iam_role.s3_readonly (deposed #0): Error listing Profiles for IAM Role (s3_readonly) when trying to delete: NoSuchEntity: The role with name s3_readonly cannot be found.

This prevents terraform from running, even if you don't care about the missing resource, such as when you're trying to delete everything, i.e. running terraform destroy.

Fortunately, terraform has a command for exactly this situation, to remove a resource from the state file: terraform state rm <name of resource>

In the example above, the command would be terraform state rm aws_iam_role.s3_readonly
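
If you're not sure of the exact resource address, terraform state list will show everything in the state file; a short sketch of the whole workflow:

# find the address of the orphaned resource in the state file
terraform state list | grep s3_readonly

# remove it from state (this does not touch any real infrastructure)
terraform state rm aws_iam_role.s3_readonly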

Kubernetes Manifest Templating with ERB and Hiera

Problem

At my current job each team has a dev(n)-stage(n)-production(n) type deployment workflow. Application deployments are kept in git repositories and deployed by our continuous delivery tooling.

It is unusual for there to be major differences between applications deployed to each of these different contexts. Usually it is just a matter of tuning resource limits or, when testing, deploying a different version of the deployment.

The project matrix looks like this:

project matrix



GCP projects must have globally unique names, so ours are prefixed with “bw-”.

The directory structure is composed of Names, Deployments, and Components:

  • Name is the GCP Project name
  • A Deployment is a logical collection of software
  • A Component is a logical collection of Kubernetes manifests

For example, a monitoring deployment composed of influxdb, grafana, and prometheus might look like:

monitoring/prometheus/<manifests>
monitoring/influxdb/<manifests>
monitoring/grafana/<manifests>

The monitoring stack can be deployed to each context by simply copying the monitoring deployment to the relevant location in our directory tree:

bw-dev-teamA0/monitoring/
bw-stage-teamA0/monitoring/
bw-prod-teamA0/monitoring/
bw-dev-teamB0/monitoring/
bw-stage-teamB0/monitoring/
bw-prod-teamB0/monitoring/

In order to apply resource limits for the stage and prod environments where teamB processes more events than teamA:

bw-dev-teamA0/monitoring/prometheus/    #
bw-dev-teamA0/monitoring/influxdb/      # unchanged
bw-dev-teamA0/monitoring/grafana/       #

bw-stage-teamA0/monitoring/prometheus/  # cpu: 1, mem: 256Mi
bw-stage-teamA0/monitoring/influxdb/    # cpu: 1, mem: 256Mi
bw-stage-teamA0/monitoring/grafana/     # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/prometheus/   # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/influxdb/     # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/grafana/      # cpu: 1, mem: 256Mi

bw-dev-teamB0/monitoring/prometheus/    #
bw-dev-teamB0/monitoring/influxdb/      # unchanged
bw-dev-teamB0/monitoring/grafana/       #

bw-stage-teamB0/monitoring/prometheus/  # cpu: 1, mem: 256Mi
bw-stage-teamB0/monitoring/influxdb/    # cpu: 1, mem: 256Mi
bw-stage-teamB0/monitoring/grafana/     # cpu: 1, mem: 256Mi

bw-prod-teamB0/monitoring/prometheus/   # cpu: 2, mem: 512Mi
bw-prod-teamB0/monitoring/influxdb/     # cpu: 2, mem: 512Mi
bw-prod-teamB0/monitoring/grafana/      # cpu: 2, mem: 512Mi

To also test a newer version of influxdb in teamA’s dev environment:

bw-dev-teamA0/monitoring/prometheus/    #
bw-dev-teamA0/monitoring/influxdb/      # version: 1.4
bw-dev-teamA0/monitoring/grafana/       #

bw-stage-teamA0/monitoring/prometheus/  # cpu: 1, mem: 256Mi
bw-stage-teamA0/monitoring/influxdb/    # cpu: 1, mem: 256Mi
bw-stage-teamA0/monitoring/grafana/     # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/prometheus/   # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/influxdb/     # cpu: 1, mem: 256Mi
bw-prod-teamA0/monitoring/grafana/      # cpu: 1, mem: 256Mi

bw-dev-teamB0/monitoring/prometheus/    #
bw-dev-teamB0/monitoring/influxdb/      # unchanged
bw-dev-teamB0/monitoring/grafana/       #

bw-stage-teamB0/monitoring/prometheus/  # cpu: 1, mem: 256Mi
bw-stage-teamB0/monitoring/influxdb/    # cpu: 1, mem: 256Mi
bw-stage-teamB0/monitoring/grafana/     # cpu: 1, mem: 256Mi

bw-prod-teamB0/monitoring/prometheus/   # cpu: 2, mem: 512Mi
bw-prod-teamB0/monitoring/influxdb/     # cpu: 2, mem: 512Mi
bw-prod-teamB0/monitoring/grafana/      # cpu: 2, mem: 512Mi

The point of this example is to show how quickly maintenance can become a problem when dealing with many deployments across multiple teams/environments.

For instance, this example shows that five unique sets of manifests would need to be maintained for this single deployment.

Solution

Requirements

  • Deploy different versions of a deployment to different contexts (versioning)
  • Tune deployments using logic and variables based on deployment context (templating)

Versioning

Let’s say we want to have the following:

bw-dev-teamA0/monitoring/prometheus/    #
bw-dev-teamA0/monitoring/influxdb/      # version: 1.4
bw-dev-teamA0/monitoring/grafana/       #

bw-stage-teamA0/monitoring/prometheus/  #
bw-stage-teamA0/monitoring/influxdb/    # version: 1.3
bw-stage-teamA0/monitoring/grafana/     #

bw-prod-teamA0/monitoring/prometheus/   #
bw-prod-teamA0/monitoring/influxdb/     # version: 1.3
bw-prod-teamA0/monitoring/grafana/      #

bw-dev-teamB0/monitoring/prometheus/    #
bw-dev-teamB0/monitoring/influxdb/      # version: 1.3
bw-dev-teamB0/monitoring/grafana/       #

bw-stage-teamB0/monitoring/prometheus/  #
bw-stage-teamB0/monitoring/influxdb/    # version: 1.3
bw-stage-teamB0/monitoring/grafana/     #

bw-prod-teamB0/monitoring/prometheus/   #
bw-prod-teamB0/monitoring/influxdb/     # version: 1.3
bw-prod-teamB0/monitoring/grafana/      #

This can be achieved by creating directories for each version of the deployment:

/manifests/monitoring/0.1.0/           # contains influxdb version 1.3
/manifests/monitoring/0.2.0/           # contains influxdb version 1.4
/manifests/monitoring/latest -> 0.2.0  # symlink to latest version (used by dev environments)

And then by quite simply symlinking the deployment to the version to deploy:

bw-dev-teamA0/monitoring/   -> /manifests/monitoring/latest  # deployment version 0.2.0
bw-stage-teamA0/monitoring/ -> /manifests/monitoring/0.1.0
bw-prod-teamA0/monitoring/  -> /manifests/monitoring/0.1.0

bw-dev-teamB0/monitoring/   -> /manifests/monitoring/0.1.0
bw-stage-teamB0/monitoring/ -> /manifests/monitoring/0.1.0
bw-prod-teamB0/monitoring/  -> /manifests/monitoring/0.1.0
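
In practice the wiring is just symlinks, something like the following sketch (paths taken from the layout above):

# dev tracks the latest manifests, stage and prod are pinned to a released version
ln -sfn /manifests/monitoring/latest bw-dev-teamA0/monitoring
ln -sfn /manifests/monitoring/0.1.0  bw-stage-teamA0/monitoring
ln -sfn /manifests/monitoring/0.1.0  bw-prod-teamA0/monitoring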

Although this solves the versioning problem, this doesn’t help with customizing the deployments, which is where templating comes in.

ERB and Hiera

erb-hiera

Understanding ERB and Hiera is beyond the scope of this article but this diagram should give some clue as to how they work.

Templating

erb-hiera is a generic templating tool, here’s an example of what a config to deploy various versions of a deployment to different contexts looks like:

- scope:
    environment: dev
    project: bw-dev-teamA0
  dir:
    input: /manifests/monitoring/latest/manifest
    output: /output/bw-dev-teamA0/cluster0/monitoring/

- scope:
    environment: stage
    project: bw-stage-teamA0
  dir:
    input: /manifests/monitoring/0.1.0/manifest
    output: /output/bw-stage-teamA0/cluster0/monitoring/

- scope:
    environment: prod
    project: bw-prod-teamA0
  dir:
    input: /manifests/monitoring/0.1.0/manifest
    output: /output/bw-prod-teamA0/cluster0/monitoring/

- scope:
    environment: dev
    project: bw-dev-teamB0
  dir:
    input: /manifests/monitoring/0.1.0/manifest
    output: /output/bw-dev-teamB0/cluster0/monitoring/

- scope:
    environment: stage
    project: bw-stage-teamB0
  dir:
    input: /manifests/monitoring/0.1.0/manifest
    output: /output/bw-stage-teamB0/cluster0/monitoring/

- scope:
    environment: prod
    project: bw-prod-teamB0
  dir:
    input: /manifests/monitoring/0.1.0/manifest
    output: /output/bw-prod-teamB0/cluster0/monitoring/

Note that instead of having a complex and difficult to manage directory structure of symlinks, the input directory is defined in each block – in this example the input directories are versioned, as discussed in the Versioning section.

Example hiera config:

:backends:
  - yaml
:yaml:
  :datadir: "hiera"
:hierarchy:
  - "project/%{project}/deployment/%{deployment}"
  - "deployment/%{deployment}/environment/%{environment}"
  - "common"

Now it is possible to configure some default resource limits for each environment. It is assumed stage and prod require roughly the same amount of resources by default:

deployment/monitoring/environment/stage.yaml:

limits::cpu: 1
limits::mem: 256Mi

deployment/monitoring/environment/prod.yaml:

limits::cpu: 1
limits::mem: 256Mi

Then override team B’s production environment to increase the resource limits, since it needs more resources than the other environments, in project/bw-prod-teamB0/deployment/monitoring.yaml:

limits::cpu: 2
limits::mem: 512Mi
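
To sanity-check which value wins for a given scope, the standalone hiera command line tool can be pointed at the same config and data – a sketch, assuming the hiera gem is installed and the config above is saved as hiera.yaml next to the hiera/ datadir:

# should return 2, from the project-level override above
hiera -c hiera.yaml limits::cpu \
  project=bw-prod-teamB0 deployment=monitoring environment=prod

# should return 1, from the stage environment defaults
hiera -c hiera.yaml limits::cpu \
  project=bw-stage-teamA0 deployment=monitoring environment=stage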

One more change is required in order for this configuration to work. It is necessary to wrap the limits config in a condition so that no limits are applied to the dev environment:

<%- if hiera("environment") =~ /stage|prod/ -%>
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - default:
      cpu: <%= hiera("limits::cpu") %>
      memory: <%= hiera("limits::mem") %>
...
<% else %>
# no limits set for this environment
<% end %>

The result is that with a simple erb-hiera config, hiera config, hiera lookup tree, and versioned manifests, the desired configuration is reached. There is less code duplication, and more flexibility in manifest creation.

Why Not Helm?

Helm can be used in various different ways; it can do as much or as little as required. It can act in a similar way to erb-hiera by being used simply to generate manifests from templates, or act as a fully fledged release manager where it deploys a pod into a Kubernetes cluster to track release state for the deployed Helm charts.

So why erb-hiera? Because it is simple, and our teams are used to the combination of ERB templating language and Hiera due to their familiarity with Puppet. We can use the same tool across multiple code bases which manage our infrastructure and applications.

If you like Hiera but prefer Go templates, perhaps developing a Hiera plugin for Helm would be a good option?

erb-hiera can be used to manage all Kubernetes manifests but it is also entirely possible to use helm in parallel. At the moment we have a combination of native kubernetes manifests, helm charts, and template generated documents from erb-hiera.

Conclusion

erb-hiera is a simple tool which does just one thing: document generation from templates. This article has shown one possible use case where using a templating tool can be combined with versioning to provide powerful and flexible Kubernetes manifest management.
