Category Archives: devops

A Puppet 4 Hiera Based Classifier

When I first wrote Hiera I included a simple little hack called hiera_include() that would do a Array lookup and include everything it found. I only included it even because include at the time did not take Array arguments. In time this has become quite widely used and many people do their node classification using just this and the built in hierarchical nature of Hiera.

I’ve always wanted to do better though, like maybe write an actual ENC that uses Hiera data keys on the provided certname? Seemed like the only real win would be to be able to set the node environment from Hiera, I guess this might be valuable enough on it’s own.

Anyway, I think the ENC interface is really pretty bad and should be replaced by something better. So I’ve had the idea of a Hiera based classifier in my mind for years.

Some time ago Ben Ford made a interesting little hack project that used a set of rules to classify nodes and this stuck to my mind as being quite a interesting approach. I guess it’s a bit like the new PE node classifier.

Anyway, so I took this as a starting point and started working on a Hiera based classifier for Puppet 4 – and by that I mean the very very latest Puppet 4, it uses a bunch of the things I blogged about recently and the end result is that the module is almost entirely built using the native Puppet 4 DSL.

Simple list-of-classes based Classification


So first lets take a look at how this replaces/improves on the old hiera_include().

Not really much to be done I am afraid, it’s an array with some entries in it. It now uses the Knockout Prefix features of Puppet Lookup that I blogged about before to allow you to exclude classes from nodes:

So we want to include the sysadmins and sensu classes on all nodes, stick this in your common tier:

# common.yaml
classifier::extra_classes:
 - sysadmins
 - sensu

Then you have some nodes that need some more classes:

# clients/acme.yaml
classifier::extra_classes:
 - acme_sysadmins

At this point it’s basically same old same old, but lets see if we had some node that needed Nagios and not Sensu:

# nodes/example.net.yaml
classifier::extra_classes:
 - --sensu
 - nagios

Here we use the knockout prefix of to remove the sensu class and add the nagios one instead. That’s already a big win from old hiera_include() but to be fair this is just as a result of the new Lookup features.

It really gets interesting later when you throw in some rules.

Rule Based Classification


The classifier is built around a set of Classifications and these are made up of one or many rules per Classification which if they match on a host means a classification applies to the node. And the classifications can include classes and create data.

Here’s a sample rule where I want to do some extra special handling of RedHat like machines. But I want to handle VMs different from Physical machines.

# common.yaml
classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
    classes:
      - centos::vm
 
  RedHat:
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
    data:
      redhat_os: true
    classes:
      - centos::common

This shows 2 Classifications one called “RedHat VMs” and one just “RedHat”, you can see the VMs one contains 2 rules and it sets match: all so they both have to match.

End result here is that all RedHat machines get centos::common and RedHat VMs also get centos::vm. Additionally 2 pieces of data will be created, a bit redundant in this example but you get the idea.

Using the Classifier


So using the classifier in the basic sense is just like hiera_include():

node default {
  include classifier
}

This will process all the rules and include the resulting classes. It will also expose a bunch of information via this class, the most interesting is $classifier::data which is a Hash of all the data that the rules emit. But you can also access the the included classes via $classifier::classes and even the whole post processed classification structure in $classifier::classification. Some others are mentioned in the README.

You can do very impressive Hiera based overrides, here’s an example of adjusting a rule for a particular node:

# clients/acme.yaml
classifier::rules:
  RedHat VMs:
    classes:
      - some::other
    data:
      extra_data: true

This has the result that for this particular client additional data will be produced and additional classes will be included – but only on their RedHat VMs. You can even use the knockout feature here to really adjust the data and classes.

The classes get included automatically for you and if you set classifier::debug you’ll get a bunch of insight into how classification happens.

Hiera Inception


So at this point things are pretty neat, but I wanted to both see how the new Data Provider API look and also see if I can expose my classifier back to Hiera.

Imagine I am making all these classifications but with what I shown above it’s quite limited because it’s just creating data for the $classifier::data hash. What you really want is to create Hiera data and be able to influence Automatic Parameter Lookup.

So a rule like:

# clients/acme.yaml
classifier::rules:
  RedHat:
    data:
      centos::common::selinux: permissive

Here I am taking the earlier RedHat rule and setting centos::common::selinux: permissive, now you want this to be Data that will be used by the Automatic Parameter Lookup system to set the selinux parameter of the centos::common class.

You can configure your Environment with this hiera.yaml

# environments/production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"
 
  - name: "classification data"
    backend: "classifier"
 
  # ... and the rest

Here I allow node specific YAML files to override the classifier and then have a new Data Provider called classifier that expose the classification back to Hiera. Doing it this way is super important, the priority the classifier have on a site is not a single one size fits all choice, doing it this way means the site admins can decide where in their life classification site so it best fits their workflows.

So this is where the inception reference comes in, you extract data from Hiera, process it using the Puppet DSL and expose it back to Hiera. At first thought this is a bit insane but it works and it’s really nice. Basically this lets you completely redesign hiera from something that is Hierarchical in nature and turn it into a rule based system – or a hybrid.

And you can even test it from the CLI:

% puppet lookup --compile --explain centos::common::selinux
Merge strategy first
  Data Binding "hiera"
    No such key: "centos::common::selinux"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "environments/production/hiera.yaml"
    Merge strategy first
      Data Provider "%{trusted.certname}"
        Path "environments/production/hieradata/dev2.devco.net.yaml"
          Original path: "%{trusted.certname}"
          No such key: "centos::common::selinux"
      Data Provider "classification data"
        Found key: "centos::common::selinux" value: "permissive"
      Merged result: "permissive"
  Merged result: "permissive"

I hope to expose here which rule provided this data like the other lookup explanations do.

Clearly this feature is a bit crazy, so consider this a exploration of what’s possible rather than a strong endorsement of this kind of thing :)

Implementation


Implementing this has been pretty interesting, I got to use a lot of the new Puppet 4 features. Like I mentioned all the data processing, iteration and deriving of classes and data is done using the native Puppet DSL, take a look at the functions directory for example.

It also makes use of the new Type system and Type Aliases all over the place to create a strong schema for the incoming data that gets validated at all levels of the process. See the types directory.

The new Modules in Data is used to set lookup strategies so that there are no manual calling of lookup(), see the module data.

Writing a Data Provider ie. a Hiera Backend for the new lookup system is pretty nice, I think the APIs around there is still maturing so definitely bleeding edge stuff. You can see the bindings and data provider in the lib directory.

As such this module only really has a hope of working on Puppet 4.4.0 at least, and I expect to use new features as they come along.

Conclusion


There’s a bunch more going on, check the module README. It’s been quite interesting to be able to really completely rethink how Hiera data is created and what a modern take on classification can achieve.

With this approach if you’re really not too keen on the hierarchy you can totally just use this as a rules based Hiera instead, that’s pretty interesting! I wonder what other strategies for creating data could be prototyped like this?

I realise this is very similar to the PE node classifier but with some additional benefits in being exposed to Hiera via the Data Provider, being something you can commit to git and being adjustable and overridable using the new Hiera features I think it will appeal to a different kind of user. But yeah, it’s quite similar. Credit to Ben Ford for his original Ruby based implementation of this idea which I took and iterated on. Regardless the ‘like a iTunes smart list’ node classifier isn’t exactly a new idea and have been discussed for literally years :)

You can get the module on the forge as ripienaar/classifier and I’d greatly welcome feedback and ideas.

Puppet 4 Type Aliases

Back when I first took a look at Puppet 4 features I explored the new Data Types and said:

Additionally I cannot see myself using a Struct like above in the argument list – to which Henrik says they are looking to add a typedef thing to the language so you can give complex Struc’s a more convenient name and use that. This will help that a lot.

And since Puppet 4.4.0 this has now become a reality. So a quick post to look at that.

The Problem


I’ve been writing a Hiera based node classifier both to scratch and itch and to have something fairly complex to explore the new features in Puppet 4.

The classifier takes a set of classification rules and produce classifications – classes to include and parameters – from there. Here’s a sample classification:

classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
      centos::vm::someprop: someval
    classes:
      - centos::vm

This is a classification rule that has 2 rules to match against machines running RedHat like operating systems and that are virtual. In that case if both these are true it will:

  • Include the class centos::vm
  • Create some data redhat_vm => true and centos::vm::someprop => someval

You can have an arbitrary amount of classifications made up of a arbitrary amount of rules. This data lives in hiera so you can have all sorts of merging, overriding and knock out fun with it.

The amazing thing is since Puppet 4.4.0 there is now no Ruby code involved in doing what I said above, all the parsing, looping, evaluating or rules and building of data structures is all done using functions written in the pure Puppet DSL.

There’s some Ruby there in the form of a custom backend for the new lookup based hiera system – but this is experimental, optional and a bit crazy.

Anyway, so here’s the problem, before Puppet 4.4.0 my main class had this in:

class classifier (
  Hash[String,
    Struct[{
      match    => Enum["all", "any"],
      rules    => Array[
        Struct[{
          fact     => String,
          operator => Enum["==", "=~", ">", " =>", "<", "<="],
          value    => Data,
          invert   => Optional[Boolean]
        }]
      ],
      data     => Optional[Hash[Pattern[/A[a-z0-9_][a-zA-Z0-9_]*Z/], Data]],
      classes  => Array[Pattern[/A([a-z][a-z0-9_]*)?(::[a-z][a-z0-9_]*)*Z/]]
    }]
  ] $rules = {}
) {
....
}

This describes the full valid rule as a Puppet Type. It’s pretty horrible. Worse I have a number of functions and classes all that receives the full classification or parts of it and I’d have to duplicate all this all over.

The Solution


So as of yesterday I can now make this a lot better:

class classifier (
  Classifier::Classifications  $rules = {},
) {
....
}

to do this I made a few files in the module:

# classifier/types/matches.pp
type Classifier::Matches = Enum["all", "any"]
# classifier/types/classname.pp
type Classifier::Classname = Pattern[/A([a-z][a-z0-9_]*)?(::[a-z][a-z0-9_]*)*Z/]

and a few more, eventually ending up in:

# classifier/types/classification.pp
type Classifier::Classification = Struct[{
  match    => Classifier::Matches,
  rules    => Array[Classifier::Rule],
  data     => Classifier::Data,
  classes  => Array[Classifier::Classname]
}]

Which you can see solves the problem quite nicely. Now in classes and functions where I need lets say just a Rule all I do is use Classifier::Rule instead of all the crazy.

This makes the native Puppet Data Types perfectly usable for me, well worth adopting these.

The Puppet 4 Lookup Function

Puppet 4 has a new lookup subsystem exposed to the user in a few places:

  • The lookup() function
  • Automatic parameter lookups
  • Configuring the automatic parameter lookups via Data in Modules

I’ve not been able to figure out everything the docs have been trying to say about this function but it turns out they were copied from the deep_merge gem and it actually has better examples in some cases. So I thought a post exploring it and it’s various forms is in order

It’s pivotal to the use of data in Puppet so while you probably don’t need to fully grasp all of it’s intricacies as in this post a passing knowledge is valuable as is knowing how to find good help for it. I do think there’s some opportunity for improving the UX of this function though.

As usual the challenge when faced with all these options isn’t in how to use them all but in which options to use when that won’t result in a giant unmaintainable mess down the line. I think this function is definitely on the wrong side of the line in this regard. It’s massive and unwieldy in that it is exposing internals of Puppet in a 1:1 manner to the user.

So I would not recommend writing code that calls this function directly unless in extraordinary circumstances. With the Data in Modules and Automatic Parameter Lookup features you can achieve this, see the last section of the post for that.

First though you need to know the behaviours and terminology of the lookup() function in order to get to a point where you can use the other methods, so lets dive in.

Lookup Patterns

Basic usage


The function comes in a few forms past the most obvious lookup(“thing”):

lookup("some::thing", String, "first", "default value")

Here we’re looking up the key some::thing and it has to be a String from the data store. It will do a first style lookup which is your basic traditional Hiera first-match-wins and there’s a default. Apparently there is no simple case lookup(“some::thing”, “default”) which seems like it would be the most common use. You can come kind of close though with (more on this below):

lookup({"name" => "some::thing", "default_value" => "default"})

Anyway, you’re not really going to be using the lookup function directly much so this is probably fine

The thing to note here are the lookup strategies, there are a few and you will always have to know them:

first First match found is returned, just like in traditional hiera() default behaviour
unique This is an array merge like old hiera_array().
hash This is hiera_hash() without deep merging enabled.
deep This is hiera_hash() with deep merging enabled. You would not guess this from the description in the docs.

So this is your basic replacement for the old hiera(), hiera_hash() and hiera_array() and as you can see from the last 2 the merge strategy isn’t set globally like in old Hiera, this is a big improvement.

I will not go into a full exploration of what Tiers mean, the old Hiera docs are pretty good for that. Effectively a merge strategy describe what Hiera does when it finds interesting data in many different levels of data or in different data sources.

Complex Strategies for Setting Defaults


From here it gets a bit crazy, but there are some really great things you can do with some of these so lets look at them.

First I’ll look at the task of setting defaults. Hiera had quite basic features in this space which was enough to get going but lookup has some nice additions.

First the above lookup can also be written like this:

lookup({"name" => "some::thing", "value_type" => String, "default_value" => "default", "merge" => "first"})
lookup({"name" => "some::thing", "default_value" => "default"}) # though accepts any data type

So this is quite nice because now you can decide the order of arguments and which to include.

There’s a more powerful way to set defaults though:

function some_module::params() {
  {
    "some_module::thing" => "default",
    "some_module::other_thing" => false
  }
}
 
lookup({"name" => "some_module::thing", "default_values_hash" => some_module::params()})

Which at first does not seem a huge improvement, but if you’re thinking about strategies to replace something like params.pp you could come up with some interesting patterns using this method. For example you can have a module function like here and an environment one (it supports environment level native functions) and combine them like environment_params() + some_module::params() to come up with layered sets of defaults, in effect this would be a micro hiera on it’s own programmed in pure Puppet DSL.

And finally you can use a lambda to set the default:

lookup("some::thing") |$key| { "Could not find a value for key '${key}', please configure it in your hiera data" }

Here we return a custom string instead that tells the user what is going on rather than blow up badly and we can of course include any helpful information like fact values and such to help them find the right place in your possibly complex data store.

Sticking to the Lambda I saw Henrik mention this on IRC yesterday:

$result = with(lookup("some::thing")) |$value| { if $value =~ Array { $value } else { [$value] } }

This does a lookup and ensures that the result is always an array, like the Ruby code Array(thing). These 2 Lambda approaches can’t really be done without calling lookup() specifically, so probably a bit niche.

I won’t go into all the details just now about Data in Modules and Merge Strategies but to see how these things tie together you should know you can set these option hashes via your data layer, see the linked to blog post for some details about this. The last section of this post shows a end to end working setup with Data in Modules and Merge Strategies in data.

Merge Strategies


The merge strategies in Hiera is where things really gets interesting and this function has even more than before. Some that I honestly can’t imagine any use for but I tend to lean on the less is more side of things wrt Puppet code.

We’ve seen the basic merge strategies above:

lookup("some::thing", String, "first", "default value")
lookup({"name" => "some::thing", "value_type" => String, "default_value" => "default", "merge" => "first"})

Here the strategy is first. But when the strategy is deep this can also be a hash with more merging options.

The most interesting for me is the knockout_prefix one. A common question when using Hiera for node classification is how to exclude a class from a certain node. This was kind of doable at least in Puppet 4 by using Arrays like:

include(hiera_array("classes", []) - hiera_array("exclude_classes", []))

Which will lookup classes and exclude_classes and subtract them from each other. This is a hack, lets look at a better option:

Given data like this:

# common.yaml
classification:
  classes:
    - sensu
    - sysadmin
# node1.example.net.yaml
classification:
  classes:
    - --sensu
    - nagios
    - webserver

What we’re trying to say is that the node1.example.net is not monitored by Sensu but by Nagios instead, the following lookup achieves this and includes the resulting classes:

$classification = lookup({"name" => "classifiation", 
        "merge" => {
          "strategy" => "deep", 
          "knockout_prefix" => "--",
          "sort_merge_arrays" => true
        }
})
 
$classification["classes"].include

Additionally I sorted the merged arrays. The tells it to remove data that matches the prefix. You can remove just some array member like here or entire keys from a resulting hash.

There’s another option where if some array member was a hash and you wanted to merge these hashes in the result sets you can set merge_hash_arrays. At that point you should probably rather rethink your data though tbh.

And the last one which I cannot figure out any use for and was quite baffled at is about turning Strings into Arrays. Henrik says they did not add this one for a reason other than it’s available on the deep_merge gem.

Lets change the data for our node to look like this:

# node1.example.net.yaml
classification:
  classes:
    - --sensu,nagios
    - webserver

While leaving the common data as is. If you set “unpack_arrays” => “,” in the merge options it will take every string found, split it by “,” which would turn this into a array of [“–sensu”, “nagios”] and then merge it up and then perform any knockouts so you get the same outcome ie. [“nagios”, “sysadmin”, “webserver”].

You should probably rethink your data instead if you find this useful :) That said though this –sensu,nagios does look like a search and replace, so perhaps in the context of a classifier utility it’s not all bad.

CLI tool


Like in the old hiera there’s a CLI tool for this function, unlike the old hiera one it does not suck.

To recreate the above lookup on the cli you’d do (though only once PUP-6050 is fixed):

% puppet lookup --hiera_config hiera.yaml --merge deep --knock-out-prefix "--" --unpack-arrays "," --sort-merge-arrays classification
---
classes:
- sysadmin
- nagios
- webserver

This is fine, but it’s a lot nicer than that. If you add the option –explain you get this:

Merge strategy deep
  Options: {
    "sort_merge_arrays" => true,
    "merge_hash_arrays" => false,
    "knockout_prefix" => "--",
    "unpack_arrays" => ","
  }
  Data Binding "hiera"
    Found key: "classification" value: {
      "classes" => [
        "sysadmin",
        "nagios",
        "webserver"
      ]
    }
  Data Provider "EnvironmentDataProvider"
    No such key: "classification"
  Merged result: {
    "classes" => [
      "sysadmin",
      "nagios",
      "webserver"
    ]
  }

A bit lacking in the case of old school hiera data since old Hiera does not emit the right kinds of detail for it to show where it gets your data from. It’s handy though since you can see the merge options hash and what data providers are queries. See below for the full potential.

Bringing it all together

When I started this fairly epic post I said I do not recommend people use lookup() directly, so lets take a look at pulling this all together.

I’ll make a simple classifier class like above in a module. Note the classes variable would above be done with the huge lookup() but not here. We do not want to use the lookup() function instead use Automatic Parameter Lookup:

class classifier($classes) {
  $classes.include
}

I’ll set it up for data in modules and add to it the lookup options:

# production/modules/classifier/data/common.yaml
lookup_options:
  classifier::classes:
    merge:
      strategy: deep
      knockout_prefix: "--"
      unpack_arrays: ","
      sort_merge_arrays: true

Note this is basically a lookup() call but attached to a specific key – classifier::classes. This way as we add more classification data we can have different strategies and such, doing it here means it works across all types of Hiera data old and new.

Now the data, I am using the environment data provider here – so no classic hiera at all:

First we configure our production environment to have it’s own instance of Hiera and it’s own hiera.yaml – take note, this is huge. Per environment hiera and hierarchies now works!

# production/environment.conf
environment_data_provider = hiera
# production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"
  - name: "common"
    backend: "yaml"

Here’s our production environment data:

# production/hieradata/common.yaml
classifier::classes:
  - sensu
  - sysadmins
# production/hieradata/dev1.devco.net.yaml
classifier::classes:
  - nagios
  - --sensu
  - webserver

At this point it all works a charm, our node knocks out Sensu and brings in Nagios. This is a major wishlist item that old hiera_include() did not have!

Note this is just Array data that’s being knocked out and not Hash data here, while the deep strategy is supposed to work with Hashes only, so I am a bit surprised it works but I’ll take it as it makes this classifier better.

% puppet lookup --environmentpath environments classifier::classes
---
- sysadmins
- nagios
- webserver

And if we added –explain you can finally get the massive benefit of finally learning how Hiera finds your data:

% puppet lookup --environmentpath environments --explain classifier::classes
Merge strategy deep
  Options: {
    "knockout_prefix" => "--",
    "sort_merge_arrays" => true,
    "unpack_arrays" => ","
  }
  Data Binding "hiera"
    No such key: "classifier::classes"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "/home/rip/temp/lookup/environments/production/hiera.yaml"
    Merge strategy deep
      Options: {
        "knockout_prefix" => "--",
        "sort_merge_arrays" => true,
        "unpack_arrays" => ","
      }
      Data Provider "%{trusted.certname}"
        Path "/home/rip/temp/lookup/environments/production/hieradata/dev1.devco.net.yaml"
          Original path: "%{trusted.certname}"
          Found key: "classifier::classes" value: [
            "nagios",
            "--sensu",
            "webserver"
          ]
      Data Provider "common"
        Path "/home/rip/temp/lookup/environments/production/hieradata/common.yaml"
          Original path: "common"
          Found key: "classifier::classes" value: [
            "sensu",
            "sysadmins"
          ]
      Merged result: [
        "sysadmins",
        "nagios",
        "webserver"
      ]
  Module "classifier" using Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "/home/rip/temp/lookup/environments/production/modules/classifier/hiera.yaml"
    Merge strategy deep
      Options: {
        "knockout_prefix" => "--",
        "sort_merge_arrays" => true,
        "unpack_arrays" => ","
      }
      Data Provider "%{trusted.certname}"
        Path "/home/rip/temp/lookup/environments/production/modules/classifier/data/dev1.devco.net.yaml"
          Original path: "%{trusted.certname}"
          Path not found
      Data Provider "common"
        Path "/home/rip/temp/lookup/environments/production/modules/classifier/data/common.yaml"
          Original path: "common"
          No such key: "classifier::classes"
  Merged result: [
    "sysadmins",
    "nagios",
    "webserver"
  ]

Every data file and every config file is shown and the full merge logic in all it’s glory is included. Huge win over previous hiera.

The result is a bit dense but if you follow along you can see it all works quite nicely and it’s super helpful for debugging cases where hiera just don’t work.

It’s a bit awkward – here I am doing it on the node the data is for, but for other nodes you would need their facts. As I understand it, it basically compiles the catalog and profiles the lookups during that process, so it needs facts as usual.

Conclusion

So that’s a rather epic exploration of the lookup() function which eventually ended us up with – do not use the lookup() function :)

You can see how this is a big step forward and in the end by using environment and module data – and no site data – I am not using old Hiera at all anymore as far as I know. This is purely the new lookup subsystem and it’s really powerful.

  • Environments and Modules can have data and independent hierarchies
  • The lookup subsystem is fully exposed in lookup() but the bulk of the features are accessible via lookup_options and so the Automatic Parameter Lookups
  • It has a really good CLI command which once a few bugs are sorted can bring amazing visibility to where your data comes from and what data is assigned to a node. Even without those bugs fixed though if you use lookup_options as in the last option it’s totally usable today

params.pp in Puppet 4

I do not like the params.pp pattern. Puppet 4 has brought native Data in Modules that’s pretty awesome and to a large extend it removes the traditional need for params.pp.

Thing is, we kind of do still need some parts of params.pp. To understand this we have to consider what the areas of concern params.pp has in Puppet world:

  • Holds data, often in large if or case statements that ultimately resemble hiera data
  • Derives new data using logic based on situation specific data like facts
  • Validates data is valid. This was kind of all over the place and not in params.pp since it’s not parameterised generally. But it’s closely related.

Points 1 and 3 are roughly sorted out by Puppet 4 types and data in modules, but what about the 2nd point and to some extend more complex data validation that falls outside of the type system?

Before I start looking at how to derive data though I’ll take a look at the new function API in Puppet 4.

Native Functions


Puppet has always allowed us to write functions but they needed to be in Ruby and nothing else. This isn’t really great. The message is kind of:

Puppet has a DSL for managing systems, we think it’s awesome and can do everything you need. But in order to use it you have to learn 2 programming languages with different models.

And I always felt the same about the general suggestion to write ENCs etc, luckily not something we hear much these days.

And they had a few major issues:

  • They do not work right in environments, just like custom providers and types do not. This is a showstopper bug as environments have become indispensable in modern Puppet use.
  • They are not namespaced. This is a showstopper for putting them on the forge.

The Puppet 4 functions API fix this, you can write functions in the native DSL and they work fine in environments. The Puppet 4 DSL with it’s loops and blocks and so forth have matured enough that it can do a lot of the things I’d need to do for deriving data from other data.

They live in your module in the functions directory, they’re namespaced and environment safe:

function mymod::myfunc(Fixnum $input) {
  $input * 2
}

And you’d use this like any other function: $x = mymod::myfunc(10). Simple stuff, whatever is the last value is returned like in Ruby.

Derived Data in Puppet 4


So we’re finally where I can show my preferred method for deriving data in Puppet 4, and that’s to use a native function.

As an example we’ll stick with Apache but this time a wrapper for the main class. From the previous blog post you’ll remember (or if not please read that post) that we wrapped the puppetlabs-apache module to create our own vhost define. Here I’ll show a wrapper for the main apache class.

class site::apache(
  Boolean $passenger = false,
  Hash $apache = {}
  Hash $module_options = {}
) {
  if $passenger {
    $_passenger_defaults = {
      "passenger_max_pool_size" => site::apache::passenger_pool_size(),
      # ...
    }
 
    class{"apache::mod::passenger":
      * => $_passenger_defaults + site::fetch($module_options, "passenger", {})
    }
  }
 
  $defaults = {
    "default_vhost" => false,
    # ....
  }
 
  class{"apache":
    * => $defaults + $apache
  }
}

Here I have a wrapper that does the basic Apache configuration with some overridable defaults via $apache and I have a way to configure Passenger again with overridable defaults via $module_options[“passenger”].

The Passenger part uses 2 functions: site::apache::passenger_poolsize and site::fetch. These are name spaced to the site module and are functions that you can see below:

First the site:apache::passenger_poolsize that follows typical community guidelines for the pool size based on core count, it’s also aware if the machine is virtual or physical. This is a good example of derived data that would be impossible to do using just Hiera – and so simply does not have a place there.

function site::apache::passenger_poolsize {
  if $is_virtual {
    $multiplier = 1.5
  } else {
    $multiplier = 2
  }
 
  floor($facts["processors"]["count"] * $multiplier)
}

And this is site::fetch that’s like Ruby’s Hash#fetch. stdlib will soon have dig() that does something similar.

function site::fetch(
  Hash $data,
  String $key,
  $default
) {
  if $data[$key] {
    $data[$key]
  } else {
    $default
  }
}

Why functions and not inlining the logic?

This seems like a bit more work than just sticking the site::apache::passenger_poolsize logic into the class that’s calling it so why bother? The first is obviously that it’s reusable so if you have anywhere else you might need this logic you could use it. But the second is about isolation.

I am not a big fan of writing Puppet rspec tests since I tend to shy away from Puppet logic in modules. But if I have to put logic in modules I’d like to isolate the logic so I can easily test it in isolation. I have no idea if rspec-puppet supports these functions yet, but if it did having this logic in as small a package as possible for testing is absolutely the right thing to do.

Further today the function is quite limited, but I can see I might want to expand it later to consider total memory as well as core count. When that day comes, I only have to edit this function and nothing else. The potential fallout from logic errors and so forth is neatly contained and importantly I can be fairly sure that this function is used for 1 thing only and changing it’s internals is something I can safely do – the things calling it really should not care for it’s internals.

Early on here I touched on complex validation of data as a possibly thing these functions could solve. The example here does not really do this, but imagine that for my site I never want to set the passenger_poolsize above some threshold that might relate to the memory on the machine. Given that this poolsize is user overridable I’d write a function like site::apache::validate_poolsize that takes care of this and fails when needed.

These validations could become very complex and situation specific (ie. based on facts) so this is more than we can expect from a Type system. Writing validations as native functions is easy and fits in neatly with the DSL.

Conclusion

These functions are great, to me they are everything defined types should have been and more. I think they move Puppet as a whole a huge leap forward in that you can achieve more complex things using just the Puppet DSL and they combine very nicely with the recent epp native Puppet based templates.

They fix massive show stopper bugs of environment compatibility and makes sharing modules like this on a forge a lot safer.

Using them in this manner here Puppet 4 can close the loop on all the functionality that params.pp had:

  • Pure data that is hierarchical in nature can live in modules.
  • Input validation can be done using the data type system.
  • Derived data can be done in isolation and in a reusable manner using native functions

When combined in this manner params.pp can be removed completely without any loss of functionality. Every one of these above points improve significantly on the old pattern.

I could not find docs for the new functions on the Puppet Labs site, hopefully we’ll see some soon.

I have a short wishlist for these functions:

  • I want to be able to specify their return type from functions, I think this is critical.
  • I want a return() function like in other languages. I know you can generally do without but sometimes that can lead to some pretty awkward code.
  • More docs

Bonus: The end of defined types?

These functions can create resources just like any other manifest can. This is a big difference from old Ruby functions who had to do all kinds of nasty things, possibly via create_resources. But since they can create resources they might be a viable replacement for defined types.

There are a few issues with this idea: The immediate missing part is that you cannot export a function. Additionally as they are outside of the resource system you couldn’t do overrides and do any relations on them. You can’t say install a package before a vhost made by a function.

The first I don’t really personally care for since I do not and will never use exported resources. The 2nd is perhaps a more important issue, from a ordering perspective the MOAR ordering in Puppet 4 helps but for doing notifies and such it might not be that hot.

It’s a interesting thought experiment though, I think with a bit of work defined types can be deprecated, people want to think of defined types as functions but they aren’t and this is a hurdle in learning Puppet for newcomers, with some work I think functions can eventually replace defined types. That’s a good goal to work toward.

Managing AWS CloudFront Security Group with AWS Lambda

One of our security groups on Amazon Web Services (AWS) allows access to an Elastic Load Balancer (ELB) from one of our Amazon CloudFront distributions. Traffic from CloudFront can originate from a number of a different source IP addresess that Amazon publishes. However, there is no pre-built security group to allow inbound traffic from CloudFront.

I constructed an AWS Lambda function to periodically update our security group so that we can ensure all CloudFront IP addresses are permitted to access our ELB.

AWS Lambda

AWS Lambda allows you to execute functions in a few different languages (Python, Java, and Node.js) in response to events. One of these events can be the triggering of a regular schedule. In this case, I created a scheduled event with an Amazon CloudWatch rule to execute a lambda function on an hourly basis.

CloudWatch Schedule to Lambda Function

The Idea

The core of my code involves calls to authorize_ingress and revoke_ingress using the boto3 library for AWS. AWS Lambda makes the boto3 library available for Python functions.


print("the following new ip addresses will be added:")
print(authorize_dict['ipranges'])
print("the following new ip addresses will be removed:")
print(revoke_dict['ipranges'])
security_group.authorize_ingress(ippermissions=[authorize_dict])
security_group.revoke_ingress(ippermissions=[revoke_dict])

Amazon publishes the IP address ranges of its various services online.


response = urllib2.urlopen('https://ip-ranges.amazonaws.com/ip-ranges.json')
jsondata = json.loads(response.read())
newipranges = [ x['ipprefix'] for x in jsondata['prefixes'] if x['service'] == 'cloudfront' ]
print(newip_ranges)

I can easily compare the allowed ingress address ranges in an existing security group with those retrieved from the published ranges. The authorized_ingress and revoke_ingress functions then allow me to make modifications to the security group to keep it up-to-date, and permit traffic from CloudFront to access my ELB.


for ip in new_ip_ranges:
    if ip not in current_ip_ranges:
        authorize_dict['ipranges'].append({u'cidrip': ip})
for ip in current_ip_ranges:
    if ip not in new_ip_ranges:
        revoke_dict['ipranges'].append({u'cidrip': ip})

The AWS Lambda Function

The full lambda function is written as a standard lambda_handler for AWS. In this case, the event and context are ignored, and the code is just executed on a regular schedule.

Lambda Function

Notice that the existing security group is directly referenced as sg-3xxexx5x.


from __future__ import print_function
import json, urllib2, boto3
def lambda_handler(event, context):
    response = urllib2.urlopen('https://ip-ranges.amazonaws.com/ip-ranges.json')
    json_data = json.loads(response.read())
    new_ip_ranges = [ x['ip_prefix'] for x in json_data['prefixes'] if x['service'] == 'cloudfront' ]
    print(new_ip_ranges)
    ec2 = boto3.resource('ec2')
    security_group = ec2.securitygroup('sg-3xxexx5x')
    current_ip_ranges = [ x['cidrip'] for x in security_group.ip_permissions[0]['ipranges'] ]
    print(current_ip_ranges)
    params_dict = {
        u'prefixlistids': [],
        u'fromport': 0,
        u'ipranges': [],
        u'toport': 65535,
        u'ipprotocol': 'tcp',
        u'useridgrouppairs': []
    }
    authorize_dict = params_dict.copy()
    for ip in new_ip_ranges:
        if ip not in current_ip_ranges:
            authorize_dict['ipranges'].append({u'cidrip': ip})
    revoke_dict = params_dict.copy()
    for ip in current_ip_ranges:
        if ip not in new_ip_ranges:
            revoke_dict['ipranges'].append({u'cidrip': ip})
    print("the following new ip addresses will be added:")
    print(authorize_dict['ipranges'])
    print("the following new ip addresses will be removed:")
    print(revoke_dict['ipranges'])
    security_group.authorize_ingress(ippermissions=[authorize_dict])
    security_group.revoke_ingress(ippermissions=[revoke_dict])
    return {'authorized': authorize_dict, 'revoked': revoke_dict}

The Security Policy

The above lamdba function presumes permissions to be able to edit the referenced security group. These permissions can be configured with an AWS Identity and Access Management (IAM) policy, applied to the role which the lamdba function executes as.

Lambda function role

Notice that the security group resource, sg-3xxexx5x, is specifically scoped to the us-west-2 AWS region.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeNetworkAcls"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupIngress"
            ],
            "Resource": "arn:aws:ec2:us-west-2:*:security-group/sg-3xxexx5x"
        }
    ]
}

Making It All Work

In order to get everything hooked up correctly, an appropriate security group needs to exist. The identifier for the group needs to be referenced in both the Lambda script, and the policy used by the role that the lambda script executes as. The IAM policy uses the Amazon Resource Name (ARN) instead of the security group identifier. The AWS Lambda function presumes that Amazon will publish changes to the CloudFront IP address range in a timely manner, and that running the function once per hour will be sufficient to grant ingress permissions on the security group. If the CloudFront ranges change frequently, or traffic is particularly crucial, the frequency of the lambda function run should be increased.

The post Managing AWS CloudFront Security Group with AWS Lambda appeared first on Atomic Spin.

The Resource Wrapper Pattern in Puppet 4

One tends to need to wrap resources quite often in Puppet and prior to Puppet 4 this was extremely annoying and resulted in a high maintenance burden, but in Puppet 4 this has significantly improved so I thought I’ll write a quick post about that.

Why wrap resources?


The example I’ll show here is going to wrap the apache::vhost resource from the PuppetLabs Apache module. This resource has 139 possible attributes, it’s a beast.

While it has some helpers to create doc roots and logdirs I want to add some higher level support:

  • Copy standardly named SSL certs out – you should use something like hiera-eyaml for keys of course
  • Set sane defaults for a few of the properties like ServerAdmin but keep them overridable by callers
  • Create some directories and default locations for docroots and logroots
  • Perhaps down the line create standard monitoring or backup policies

In Puppet 3 your only option to retain full features was to do something like:

def my::vhost (
  $copy_ssl_source = undef,
  $docroot,
  $manage_docroot  = true,
  $virtual_docroot = false,
  # all the rest of the 139 properties
) {
  if $copy_ssl_source {
    file{"/etc/httpd/ssl/${name}.crt":
       source => "${copy_ssl_source}/${name}.crt"
    }
 
    # and the same for the chain, key etc
  }
 
  apache::vhost{$name:
     docroot => $docroot,
     manage_docroot => $manage_docroot,
     virtual_docroot => $virtual_docroot,
     # and reproduce the rest of the 139 proeprties again
  }
}

As you can see this is a nightmare, it might be workable on a small module but the amount of work required to manage it on this large module is insane. You’ll forever have to play catch up with upstream and hope they never get the same property as one you’ve added in your wrapper.

In this module you’ll have to list 139 properties twice, once in the parameters and once when creating the apache::vhost resource.

I never found this useable at all and it was a big reason why I never used the forge much as you end up needing this pattern a lot.

Puppet 4

Since Puppet 4 there are now a few new features that make this completely comfortable and easy and allow your wrapper to focus on the features they add and nothing more.

The outcome of using the wrapper will be this:

my::vhost{"example.net":
  copy_ssl_source => "puppet:///modules/${module_name}/ssl/example_net",
  vhost => {
    "access_log_file" => "secure_access.log",
    "serveralias" => ["www.example.net", "other.com"],
    # and any other of the 139 properties
  }
}

So basically my decorated properties are the top level and there’s a hash that directly matches the wrapped resource.

The wrapper will look like this:

define my::vhost (
  Optional[String] $copy_ssl_source = undef,
  Hash             $vhost           = {}
) {
  if $copy_ssl_source {
    ["crt", "key", "chain"].each |$item| {
      file{"/etc/httpd/ssl/${name}.${item}":
         source => "${copy_ssl_source}.${item}",
         # .....
      }
    }
 
    $ssl_options = {
      "access_log_file" => "ssl_access.log"
    }
  } else {
    $ssl_options = {}
  }
 
  $defaults = {
    "docroot" => "/srv/www/${name}_docroot",
    "serveradmin" => "webmaster@example.net",
    "logroot" => "/var/log/httpd/${name}",
    "logroot_ensure" => "directory"
    "access_log_file" => "access.log",
    "error_log_file" => "error.log"
  }
 
  apache::vhost{$name:
    * => $defaults + $ssl_options + $vhost
  }
}

Here you can see the new wrapper, there are a few things to note here:

  • It’s pretty much focussed on just what the wrapper is supposed to achieve with almost no details of the wrapped resources. Primarily this relates to copying out SSL certs but it also sets some sane defaults like ServerAdmin as per my site policies.
  • Any of the defaults the wrapper sets can be overridden from the caller, here we set a custom access_log_file and serveralias in the caller, they would override the ones from the defaults
  • Hashes are immutable but here is an example of setting a set of custom options depending on other internal state. The $ssl_options variable will override the $defaults while still remaining overridable by the site user
  • There is almost no ongoing maintenance required just because the Apache module gets an update – unless it changes behaviour on one of the properties we’re defaulting. Everything else is not a concern of the wrapper.
  • I do plain hash merges here to create the parameters but you could use something like the deep_merge() function to really handle complex data, but for me less is more.

Conclusion


I find this really nice, I know there’s been some community interest in basically adding inheritance to defined types but honestly I do not see the use. This has a number of advantages over inheritance – for example I’d have no chance ever of shadowing a inherited property for example which would be quite a surprise.

The handling of defaults and merging in other set of defaults as here with the $ssl_options is super handy and without adding a lot of extra stuff would become quite awkward with an inheritance scheme. Especially given the immutable nature of Puppet variables.

Puppet 4 data lookup strategies

I recently wrote about the new Data in Modules support in Puppet 4, there’s another new feature that goes hand in hand with this to finally rid us of functions like hiera_hash() and such.

Up to now we’ve had to do something ugly like this to handle merged class parameters:

class users($local = hiera_hash("users::local", {}) {
 ...
}

This is functional but quite ugly and ties your module to having hiera. While these days it’s a reasonably safe assumption but with the ability to specify different environment data sources this will not always be the case. For example there’s a new kid on the block called Jerakia that lives in this world so having Hiera specific calls in modules is going to be a limiting strategy.

A much safer abstraction is to be able to rely on the automatic parameter lookup feature – but it had no way to know about the fact that this item should be a hash merge and so the functions were used as above.

Worse things like merge strategies were set globally, a module could not say a certain key should be deep merged and others just shallow merged etc, and if a module required a specific way it had no control over this.

A solution for this problem landed in recent Puppet 4 via a special merged hash called lookup_options. This is documented quite lightly in the official docs so I thought I’ll put up a example here.

lookup() function


To understand how this work you first have to understand the lookup() function, it’s documented here. But this is basically the replacement for the various hiera() functions and have a matching puppet lookup CLI tool.

If you wanted to do a hiera_hash() lookup that is doing the old deeper hash merge you’d do something like:

$local = lookup("users::local", Hash, {"strategy" => "deep", "merge_hash_arrays" => true})

This would merge just this key rather than say setting the merge strategy to deeper globally in hiera and it’s something the module author can control. The Hash above describes the data type the result should match and support all the various complex composite type definitions so you can really in detail describe the desired result data – almost like a Schema.

lookup_options hiera key


We saw above how to instruct the lookup() function to do a hiera_hash() but wouldn’t it be great if we could somehow tell Puppet that a specific key should always be merged in this way? That way a simple lookup(“users::local”) would do the merge and crucially so would the automatic parameter lookups – even across backends and data providers.

We just want:

class users(Hash $local = {}) {
 ...
}

For this to make sense the users module must be able to indicate this in the data layer. And since we now have data in modules there’s a obvious place to put this.

If you set up the users module here to use the hiera data service for data in modules as per my previous blog post you can now specify the merge strategy in your data:

# users/data/common.yaml
lookup_options:
  users::local:
    strategy: deep
    merge_hash_arrays: true

Note how this match exactly the following lookup():

$local = lookup("users::local", Hash, {"strategy" => "deep", "merge_hash_arrays" => true})

The data type validation is done on the class parameters where it will also validate specifically specified data and the strategies for processing the data is in the module data level.

The way this works is that puppet will do a lookup_options lookup from the data source that is merged together – so you could set this at site level as well – but there is a check to ensure a module can only set keys for itself so it can not change behaviours of other modules.

This is a huge win and really shows what can be done with the Data in Modules features and something that’s been impossible before. This really brings the automatic parameter lookup feature a huge way forward and combines for me to be one of the most compelling features of Puppet 4.

I am not sure who proposed this behaviour, the history is a bit muddled but if someone can tweet me links to mailing list threads or something I’ll link them here for those who want to discover the background and reasoning that went into it.

Wishlist

The lookup function and the options are a great move forward however I find the UX of the various lookup options and merge strategies etc quite bad. It’s really hard for me to go from reading the documentation to knowing what a certain option will do with my data – in fact I still have no idea what some of these do the only way to discover it seems to be just spending time playing with it which I haven’t had, it would be great for new users to get some more clarity there.

Some doc updates that provide a translation from old Hiera terms to new strategies would be great and maybe some examples of what these actually do.

Native Puppet 4 Data in Modules

Back in August 2012 I requested an enhancement to the general data landscape of Puppet and a natural progression on the design of Hiera to enable it to be used in modules that are shared outside of your own environments. I called this Data in Modules. There was lots of community interest in this but not much movement, eventually I made a working POC that I released in December 2013.

The basic idea around the feature is that we want to be able to use Hiera to model internal data found in modules as well as site specific data and that these 2 sets of data coexist and compliment each other. Full details of this can be found in my post titled Better Puppet Modules Using Hiera Data and some more background can be found in The problem with params.pp. These posts are a bit old now and some things have moved on but they’re good background reading.

It’s taken a while but as part of the Puppet 4 rework effort the data ingesting mechanisms have also been rewritten in finally in Puppet 4.3.0 native data in modules have arrived. The original Jira for this is 4474. It’s really pretty close to what I had in mind in my proposals and my POC and I am really happy with this. Along the way a new function called lookup() have been introduced to replace the old collection of hiera(), hiera_array() and hiera_hash().

The official docs for this feature can be found at the Puppet Labs Docs site. Here I’ll more or less just take my previous NTP example and show how you could use the new Data in Modules to simplify it as per the above mentioned posts.

This is the very basic Puppet class we’ll be working with here:

class ntp (
  String $config,
  String $keys_file
) {
 ...
}

In the past these variables would have needed to interact with the params.pp file like $config = $ntp::params::config, but now it’s just a simple class. At this point it’ll not yet use any data in the module, to do that you have to activate it in the metadata.json:

# ntp/metadata.json
{
  ...
  "data_provider": "hiera"
}

At this point Puppet knows you want to use the hiera data in the module. But key to the feature and really the whole reason it exists is because a module needs to be able to specify it’s own hierarchy. Imagine you want to set $keys_file here, you’ll have to be sure the hierarchy in question includes the OS Family and you must have control over that data. In the past with the hierarchy being controlled completely by the site hiera.yaml this was not possible at all and the outcome was that if you wanted to share a module outside of your environment you have to go the params.pp route as that was the only portable solution.

So now your modules can have their own hiera.yaml. It’s slightly different from the past but should be familiar to past hiera users, it goes in your module so this would be ntp/hiera.yaml:

---
version: 4
datadir: data
hierarchy:
  - name: "OS family"
    backend: yaml
    path: "os/%{facts.os.family}
 
  - name: "common"
    backend: yaml

This is the new format for the hiera configuration, it’s more flexible and a future version of hiera will have some changing semantics that’s quite nice over the original design I came up with so you have to use that new format here.

Here you can see the module has it’s own OS Family tier as well as a common tier. Lets see the ntp/data/common.yaml:

---
ntp::config: "/etc/ntp.conf"
ntp::keys_file: "/etc/ntp.keys"

These are sane defaults to use for any non specifically supported operating systems.

Below are examples for AIX and Debian:

# data/os/AIX.yaml
---
ntp::config: "/etc/ntpd.conf"
# data/os/Debian.yaml
---
ntp::keys_file: "/etc/ntp/keys"

At this point the need for params.pp is gone – at least in this simplistic example – and this data along with the environment specific or site specific data cohabit really nicely. If you specified any of these data items in your site Hiera data your site data will override the module. The advantages of this might not be immediately obvious. I have a very long list of advantages over params.pp in my Better Puppet Modules Using Hiera Data post, be sure to read that for background.

There’s an alternative approach where you write a Puppet function that returns a hash of data and the data system will fetch the keys from there. This is really powerful and might end up being a interesting solution to something along the lines of a module specific custom hiera backend – but a lighter weight version of that. I might write that up later, this post is already a bit long.

The remaining problem is to do with data that needs to be merged as traditionally Hiera and Puppet has no idea you want this to happen when you do a basic lookup – hence these annoying hiera_hash() functions etc – , there’s a solution for this and I’ll post a blog post about that next week once the next Puppet 4 release is out and a bug I found that makes it unusable is fixed in that version.

This feature is a great addition to Puppet and I am really glad to finally see this land. My hacky modules in data code was used quite extensively with 72 000 downloads from the forge but I was never really happy with it and was desperate to see this land natively. This is a big step forward and I hope it sees wide adoption in the community.

Iterating in Puppet

Iteration in Puppet has been a long standing pain point, Puppet 4 address this by adding blocks, loops etc. Here I capture the various approaches to working with some complex data in Puppet before and after Puppet 4

To demonstrate this I’ll take some data from a previous blog post and see how to deal with it, here’s the data that will be in $domains in the examples blow:

{
    "x.net": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    },
    "x.co.uk": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    },
}

First we’re going to need some defined type that can create an individual domain, we’ll call that mail::domain but I won’t show the code here, as that’s not really important.

Puppet 3 + stdlib

The first approach I’ll show your basic Puppet 3 approach. The basic idea here is to get a list of domains and use the array iteration Puppet has always had on name.

The trick here is to get the domain names using the keys() function and then pass all the data into every instance of the define – the instance fetch it’s data from the data passed into the define.

$domain_names = keys($domains)
 
mail::domains{$domain_names:
  domains => $domains
}
 
define mail::domains($domains) {
  $domain = $domains[$name]
 
  mail::domain{$name:
    nexthop => $domain["nexthop"]
    .
    .
  }
}

Puppet 3 + create_resources

A hacky riff on eval() was added to Puppet during 3 to make it a bit easier to deal with data from Hiera or similar, it takes some data in a standard format and create instances of a defined type:

create_resources("mail::domain", $domains, {"spamthreshold" => 1500, "enable_antispam" => 1})

This replaces all the code above plus adds some default handling in the case that the data is not uniform. Some people love it, some hate it, I think it’s a bit too magical so prefer to avoid it.

Puppet 4 – each loop

This is the approach you’d probably want to use in Puppet 4 it uses a simple each loop over the data:

$domains.each |$name, $domain| { 
  mail::domain{$name:
    nexthop => $domain["nexthop"]
    .
    .
  }
}

It’s quite readable and obvious what’s happening here, it’s more typing than the create_resources example but I think this is the preferred way due to clarity etc

Below this we get into the more academic solutions to the problem, mainly showing off some Puppet 4 features.

Puppet 4 – wildcard shortcut

If listing every key is tedious like above and if you know your hashes map 1:1 to the defined type parameters you can short circuit things a bit, this is quite close to the create_resources convenience:

each($domains) |$name, $domain| { 
  mail::domain{$name:
    * => $domain
  }
}

The splat operator takes all the data in the hash and maps it right onto properties of the define type, quite handy

Puppet 4 – wildcard and defaults

Your data might not all be complete so you’d want to get some defaults merged in, this is something create resources also supports so this is how you’d do it without create_resources:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}
 
$domains.each |$name, $domain| { 
  mail::domain{$name:
    * => $defaults + $domain  # + now merge hashes 
  }
}

Puppet 4 – wildcard and resource defaults

An alternative to the above that’s a bit more verbose but might be more readable can be seen below:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}
 
$domains.each |$name, $domain| { 
  mail::domain{
    default:
      * => $defaults;
 
    $name:
      * => $domain
  }
}

Conclusion

That’s about it, there are many more iteration tricks in Puppet 4 but this shows you how to achieve what you did with create_resources in the past and a couple of possible approaches to solving that problem.

Not sure which I’d recommend, but I suspect the choice comes down to personal style and situation.

Translating Webhooks with AWS API Gateway and Lambda

Webhooks are great, so many services now support them but I found actually doing anything with them a pain as there are no standards for what goes in them and any 3rd party service you wish to integrate with has to support the particular hooks you are producing.

For instance I want to use SignalFX for my metrics and events but they have very few integrations. A translator could take an incoming hook and turn it into a SignalFX event and pass it onward.

For a long time I’ve wanted to build a translator but never got around to doing it because I did not feel like self hosting it and write a whole bunch of supporting infrastructure. With the release of AWS API Gateway this has become quite easy and really convenient as there are no infrastructure or instances to manage.

I’ll show a bit of a walk through on how I built a translator that sends events to Signal FX. Note I do not do any kind of queueing or retrying on the gateway at present so it’s lossy and best efforts.

AWS Lambda runs stateless functions on demand. At launch it only supported ingesting their own Events but the recently launched API Gateway lets you front it using a REST API of your own design and this made it a lot easier.

For the rest of this post I assume you’re over the basic hurdles of signing up for AWS and are already familiar with the basics, so some stuff will be skipped but it’s not really that complex to get going.

The Code


To get going you need some JS code to handle the translation, here’s a naive method to convert a GitHub push notification into a SignalFX event:

This will be the meat of the of the processing and it includes a bit of code to create a request using the https module which includes the SignalFX authentication header.

Note this creates dimensions to the event that is being sent, I guess you can think of them like some kind of key=val tags for the event. In the Signal FX UI I can select events like this:

And any other added dimension can be used too, events shows up as little diamonds on graphs, so if I am graphing a service using these dimensions I can pick out events that relate to the branches and repositories that influence the data.

This is called as below:

There’s some stuff not shown here for brevity, it’s all in GitHub.

Setting up the Lambda functions

We have to create a Lambda function, for now I’ll use the console but you can use terraform for this it helps quite a lot.

As this repo is made up of a few files your only option is to zip it up. You’ll have to clone it and make your own config.js based on the sample prior to creating the zip file.

Once you have it just create a Lambda function which I’ll call gitHubToSFX and choose your zip file as source. While setting it up you have to supply a handler. This is how Lambda finds your function to call.

In my case I specify index.handleGitHubPushNotifications – uses the handleGitHubPushNotifications function found in index.js.

It ends up looking like this:

Once created you can test it right there if you have a sample GitHub commit message.

The REST End Point

Now we need to create somewhere for GitHub to send the POST request to. Gateway works with resources and methods. A resource is something like /github-hook and a method is POST.

I’ve created the resource and method, and told it to call the Lambda function here:

You have to deploy your API – just hit the big Deploy API button and follow the steps, you can create stages like development, staging, production and deploy API’s through such a life cycle. I just went straight to prod.

Once deployed it gives you a URL like https://12344xnb.execute-api.eu-west-1.amazonaws.com/prod and your GitHub hook would be configured to hit https://12344xnb.execute-api.eu-west-1.amazonaws.com/prod/github-hook .

Conclusion


That’s about it, once you’ve configured GitHub you’ll start seeing events flow through.

Both Lambda and API Gateway can write logs to Cloud Watch and from the JS side you can see do something like console.log(“hello”) and this will show up in the Cloud Watch logs to help with debugging.

I hope to start gathering a lot of translations like these and am still learning Node, so not really sure yet how to make packages or classes but so far this seems really easy to use.

Cost wise it’s really cheap. You’d pay $3.50 per million API calls received on the Gateway and $0.09/GB for the transfer costs, but given the nature of these events this will be negligible. Lambda is free for the first 1 million requests and you’ll pay some tiny amount for the time used. They are both eligible for the free tier too in case you’re new to AWS.

The costs and lack of infrastructure makes this a very attractive option for building this kind of service.