Category Archives: System Administration

Command-line cookbook dependency solving with knife exec

Note: This article was originally published in 2011. In response to demand, I've updated it for 2014! Enjoy! SNS

Imagine you have a fairly complicated infrastructure with a large number of nodes and roles. Suppose you have a requirement to take one of the nodes and rebuild it in an entirely new network, perhaps even for a completely different organization. This should be easy, right? We have our infrastructure in the form of code. However, our current infrastructure has hundreds of uploaded cookbooks - how do we know the minimum ones to download and move over? We need to find out from a node exactly what cookbooks are needed for that node to be built.

The obvious place to start is with the node itself:

$ knife node show controller
Node Name:   controller
Environment: _default
FQDN:        controller
IP:          182.13.194.41
Run List:    role[base], recipe[apt::cacher], role[pxe_server]
Roles:       pxe_server, base
Recipes:     apt::cacher, pxe_dust::server, dhcp, dhcp::config
Platform:    ubuntu 10.04

OK, this tells us we need the apt, pxe_dust and dhcp cookbooks. But what about them - do they have any dependencies? How could we find out? Well, dependencies are specified in two places - in the cookbook metadata, and in the individual recipes. Here's a primitive way to illustrate this:

bash-3.2$ for c in apt pxe_dust dhcp
> do
> grep -iER 'include_recipe|^depends' $c/* | cut -d '"' -f 2 | sort | uniq
> done
apt::cacher-client
apache2
pxe_dust::server
tftp
tftp::server
utils

As I said - primitive. However, the problem doesn't end here. In order to be sure, we now need to repeat this for each dependency, recursively. And of course it would be nice to present the results more attractively. Thinking about it, it would be rather useful to know what cookbook versions are in use too. This is definitely not a job for a shell one-liner - is there a better way?

As it happens, there is. Think about it - the Chef server already needs to solve these dependencies to know what cookbooks to push to API clients. Can we access this logic? Of course we can - clients carry out all their interactions with the Chef server via the API. This means we can let the server solve the dependencies and query it via the API ourselves.

Chef provides two powerful ways to access the API without having to write a RESTful client. The first, Shef, is an interactive REPL based on IRB, which when launched gives access to the Chef server. This isn't trivial to use. The second, much simpler way is the knife exec subcommand. This allows you to write Ruby scripts or simple one-liners that are executed in the context of a fully configured Chef API Client using the knife configuration file.

Now, since I wrote this article, back in summer 2011, the API has changed, which means that my original method no longer works. Additionally, we are now served by at least two local dependency solvers, in the form of Berkshelf (whose dependency solver, 'solve', is now available as an individual Gem) and Librarian-chef. In this updated version, I'll show how to use the new Chef server API to perform the same function. Berkshelf and Librarian solve a slightly different problem, in that in this instance we're trying to solve dependencies for a node, so for the purposes of this article I'll consider them out of scope.

For historical purposes, here's the original solution:

knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'

The /nodes/NODE_NAME/cookbooks endpoint returns the cookbook attributes, definitions, libraries and recipes that are required for this node. The response is a hash of cookbook names to Chef::CookbookVersion objects. We simply iterate over each one, and pretty-print the cookbook name and the version.

Let's give it a try:

$ knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'
{"apt"=>"1.1.1"}
{"tftp"=>"0.1.0"}
{"apache2"=>"0.99.3"}
{"dhcp"=>"0.1.0"}
{"utils"=>"0.9.5"}
{"pxe_dust"=>"1.1.0"}

The current way to solve dependencies using the Chef server API resides under the environments endpoint. This makes sense if you think of environments as a way to define and constrain version numbers for a given set of nodes. It also means that constructing the API call and handling the results is slightly more than can easily be comprehended in a one-liner, which gives us the opportunity to demonstrate the use of knife exec with a script on the filesystem.

First let's create the script:

USAGE = "knife exec script.rb NODE_NAME"

def usage_and_exit
  STDERR.puts USAGE
  exit 1
end

# knife exec passes the full command line to the script as ARGV,
# so the node name we want is the third element
node_name = ARGV[2]

usage_and_exit unless node_name

# fetch the node object, then expand its run list (resolving roles)
node = api.get("nodes/#{node_name}")
run_list_expansion = node.expand!("server")

# ask the Chef server to solve cookbook versions for this run list in
# the node's current environment
cookbook_solution = api.post("environments/#{node.chef_environment}/cookbook_versions",
                             :run_list => run_list_expansion.recipes)

cookbook_solution.each do |name, cb|
  puts name + " => " + cb.version
end

exit

The way knife exec scripts work is to pass the arguments following knife to Ruby as the ARGV special variable, which is an array of each space-separated argument. This allows us to produce a slightly more general solution, to which we can pass the name of the node we want to solve for. The usage handling is obvious - we print the usage to stderr and exit if the command is called without a node name.

The meat of the script is the API call. First we get the node object (from ARGV[2], i.e. the node name we passed to the script) from the Chef server. Next we expand the run list - this means check for and expand any run lists in roles. Then we call the API to provide us with cookbook versions for the specified node in the environment in which the node currently resides, passing in the recipes from the expanded run list. Finally we iterate over the cookbooks we get back, and print the name and version. Note that this script could easily be modified to solve for a different environment, which would be handy if we wanted to confirm what versions we'd get were we to move the node to a different environment - there's a sketch of that variation after the output below. Let's give it a whirl:

$ knife exec src/knife-cookbook-solve/solve.rb asl-dev-1
chef_handler => 1.1.4
minitest-handler => 0.1.3
base => 0.0.2
hosts => 0.0.1
yum => 2.3.0
tmux => 1.1.1
ssh => 0.0.6
fail2ban => 1.2.2
users => 2.0.6
security => 0.1.0
sudo => 2.0.4
atalanta-users => 0.0.2
community_users => 1.5.1
sudoersd => 0.0.2
build-essential => 1.4.2
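
As a quick illustration of that last point, here is a minimal sketch of the same script with an optional environment argument. The extra argument handling is my own addition rather than part of the original script; the API calls themselves are unchanged:

USAGE = "knife exec solve.rb NODE_NAME [ENVIRONMENT]"

def usage_and_exit
  STDERR.puts USAGE
  exit 1
end

node_name   = ARGV[2]
environment = ARGV[3]   # optional: solve as if the node lived in this environment

usage_and_exit unless node_name

node = api.get("nodes/#{node_name}")
environment ||= node.chef_environment
run_list_expansion = node.expand!("server")

cookbook_solution = api.post("environments/#{environment}/cookbook_versions",
                             :run_list => run_list_expansion.recipes)

cookbook_solution.each do |name, cb|
  puts name + " => " + cb.version
end

exit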

To conclude as the original article did... Nifty! :)

Puppet Extension Points – part 2

After the first part in this series of articles on Puppet extension points, I’m proud to deliver a new episode focusing on Types and Providers.

Note that there’s a really good chapter on the same topic in James Turnbull and Jeff McCune’s Pro Puppet (which I highly recommend if you’re a serious puppeteer). Also note that you can attend Puppetlabs Developer Training, which covers this topic.

Of Types and Providers

One of the great strengths of Puppet is how various heterogeneous aspects of a given POSIX system (or not, like the Network Device system I contributed) are abstracted into simple elements: types.

Types are the foundation bricks of Puppet: you use them every day to model how your systems are formed. Among the core types, you’ll find user, group, file, …

In Puppet, manifests define resources which are instances of their type. There can be only one resource of a given name (what we call the namevar, name or title) for a given catalog (which usually maps to a given host).

A type models what facets of a physical entity (like a host user) are managed by Puppet. These model facets are called “properties” in Puppet lingo.

Essentially a type is a name, some properties to be managed and some parameters. Parameters are values that will help or direct Puppet to manage the resource (for instance the managehome parameter of the user type is not part of a given user on the host, but explains to Puppet that this user’s home directory is to be managed).

Let’s follow the life of a resource during a puppet run.

  1. During compilation, the puppet parser will instantiate Puppet::Parser::Resource instances, which are Puppet::Resource objects. Those contain the various property and parameter values defined in the manifest.

  2. Those resources are then inserted into the catalog (an instance of Puppet::Resource::Catalog)

  3. The catalog is then sent to the agent (usually in json format)

  4. The agent converts the catalog individual resources into RAL resources by virtue of Puppet::Resource#to_ral. We’re now dealing with instances of the real puppet type class. RAL means Resource Abstraction Layer.

  5. The agent then applies the catalog. This process creates the relationship graph so that we can manage resources in an order obeying require/before metaparameters. During catalog application, every RAL resource is evaluated. This process tells a given type to do what is necessary so that every managed property of the real underlying resource matches what was specified in the manifest. The software system that does this is the provider.

So to summarize, a type defines to Puppet what properties it can manage, and an accompanying provider is the process to manage them. Those two elements form the Puppet RAL.

There can be more than one provider per type, depending on the host or platform. For instance every user has a login name on all kinds of systems, but the way to create a new user can be completely different on Windows or Unix. In this case we can have a provider for Windows, one for OSX, one for Linux… Puppet knows how to select the best provider based on the facts (the same way you can confine facts to some operating systems, you can confine providers to some operating systems).

Looking Types into the eyes

I’ve written a combination of types/providers for this article. It allows you to manage DNS zones and DNS Resource Records for DNS hosting providers (like AWS Route 53 or Zerigo). To simplify development I based the system on Fog DNS providers (you need to have the Fog gem installed to use those types on the agent). The full code of this system is available in my puppet-dns github repository.

This work defines two new Puppet types:

  • dnszone: manage a given DNS zone (ie a domain)
  • dnsrr: manage an individual DNS RR (like an A, AAAA, … record). It takes a name, a value and a type.

Example manifests showing how to use both types can be found in the puppet-dns repository mentioned above.

Let’s focus on the dnszone type, which is the simpler one of this module:
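
Here is a rough sketch of what the dnszone type looks like, reconstructed from the line-by-line walkthrough below; the exact code (and the line numbers the bullets refer to) lives in the puppet-dns repository:

Puppet::Type.newtype(:dnszone) do
  @doc = "Manage the creation/destruction of a DNS zone through Fog."

  ensurable

  newparam(:name) do
    desc "The zone name, ie the domain to manage."
    isnamevar
  end

  newparam(:email) do
    desc "The e-mail address of the zone contact."
    newvalues(/@/)
  end

  newparam(:yaml_fog_file) do
    desc "Path to the yaml file containing the Fog DNS options and credentials."
    defaultto "/etc/puppet/fog.yaml"
  end

  # if the credentials file is managed by puppet, manage it first
  autorequire(:file) do
    self[:yaml_fog_file]
  end
end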

Note that the dnszone type assumes there is a /etc/puppet/fog.yaml file that contains Fog DNS options and credentials as a hash encoded in yaml. Refer to the aforementioned github repository for more information and use cases.

Exactly like parser functions, types are defined in ruby, and Puppet can autoload them. Thus types should obey the Puppet type ruby namespace; that’s the reason we have to put types in puppet/type/. Once again this is ruby metaprogramming (in all its glory), used to create a specific internal DSL that helps describe types to Puppet with simple directives (the alternative would have been to define a data structure, which would have been much less practical).

Let’s dive into the dnszone type.

  • Line 1: we’re calling the Puppet::Type#newtype method, passing first the type name as a ruby symbol (which should be unique among types), and second a block (from line 1 to the end). The newtype method is imported in Puppet::Type but is in fact defined in Puppet::Metatype::Manager. Newtype’s job is to create a new singleton class whose parent is Puppet::Type (or a descendant if needed). Then the given block will be evaluated in class context (this means that the block is executed with self being the just-created class). This singleton class is called Puppet::Type::Dnszone in our case (but you see the pattern).

  • Line 2: we’re assigning a string to the Puppet::Type class variable @doc. This will be used to extract type documentation.

  • Line 4: this straight word, ensurable, is a class method in Puppet::Type. So when our type block is evaluated, this method will be called. This method installs a new special property, Ensure. This is a shortcut to automatically manage creation/deletion/existence of the managed resource. It automatically adds support for ensure => (present|absent) to your type. The provider still has to manage ensurability, though.

  • Line 6: here we’re calling Puppet::Type#newparam. This tells our type that we’re going to have a parameter called “name”. Every resource in Puppet must have a unique key; this key is usually called the name or the title. We’re giving a block to this newparam method. The job of newparam is to create a new class descending from Puppet::Parameter, and to evaluate the given block in the context of this class (which means that in this block self is a singleton class of Puppet::Parameter). Puppet::Parameter defines a bunch of utility class methods (which become the apparent directives of our parameter DSL); among those we can find isnamevar, which we’ve used for the name parameter. This tells the Puppet type system that the name parameter is the holder of the unique key of this type. The desc method allows us to give some documentation about the parameter.

  • Line 12: we’re now defining the email parameter. And we’re using the newvalues class method of Puppet::Parameter. This method defines what possible values can be set for this parameter. We’re passing a regex that allows any string containing an ‘@’, which is certainly the worst regex to validate an e-mail address :) Puppet will raise an error if we don’t give a valid value to this parameter.

  • Line 17: and again a new parameter. This parameter is used to control Fog behavior (ie give it your credentials and the fog provider to use). Here we’re using defaultto, which means that if we don’t pass a value then the defaultto value will be used.

  • Line 22: there is a possibility for a given resource to auto-require another resource, the same way a file resource can automatically add a ‘require’ on its parent directory. In our case, we’re autorequiring the yaml_fog_file, so that if it is managed by puppet, it will be evaluated before our dnszone resource (otherwise our fog provider might not have its credentials available).

Let’s now see another type which uses some other type DSL directives:
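
Again, a rough sketch of what the dnsrr type can look like, reconstructed from the walkthrough that follows; the bullets’ line numbers refer to the real file in the puppet-dns repository, and the exact record types and validation rules shown here are my own guesses:

Puppet::Type.newtype(:dnsrr) do
  @doc = "Manage an individual DNS Resource Record through Fog."

  ensurable

  newparam(:name) do
    desc "The fully qualified name of the record."
    isnamevar
  end

  newproperty(:type) do
    desc "The record type."
    newvalues(:A, :AAAA, :CNAME, :MX, :NS, :TXT)
  end

  newproperty(:value) do
    desc "The value of the record."
    isrequired
    validate do |value|
      raise ArgumentError, "A RR value can't be empty" if value.nil? || value.empty?
    end
  end

  # global validation, run once every parameter/property has a value
  validate do
    raise ArgumentError, "value is required when ensure is present" if self[:ensure] == :present && self[:value].nil?
  end

  newparam(:zone) do
    desc "The zone this record belongs to."
    defaultto do
      # derive the zone from the record FQDN, e.g. www.example.com -> example.com
      resource[:name].split('.', 2).last
    end
  end
end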

We’ll pass over the bits we already covered with the first type, and concentrate on new things:

  • Line 12: our dnszone type contained only parameters. Now, for the first time, we define a property. A property is exactly like a parameter but is fully managed by Puppet (see the chapter below). A property is an instance of a Puppet::Property class, which itself inherits from Puppet::Parameter, which means all the methods we’ve covered in our first example for parameters are available for properties. This type property is interesting because it defines discrete values. If you try to set something outside of this list of possible values, Puppet will raise an error. Values can be either ruby symbols or strings.

  • Line 17: a new property is defined here. With the isrequired method we tell Puppet that it is indeed necessary to have a value. And the validate method stores the given validation block so that when Puppet sets the desired value of this property, the block is executed. In our case we’ll report an error if the given value is empty.

  • Line 24: here we define a global validation system. This will be called once all properties have been assigned a value. This block executes in the instance context of the type, which means that we can access all instance variables and methods of Puppet::Type (in particular the [] method that allows access to parameter/property values). This allows us to perform validation across the boundaries of a given parameter/property.

  • Line 25: finally, we declare a new parameter that references a dnszone. Note that we use a dynamic defaultto (with a block), so that we can look up the given resource name and derive our zone from the FQDN. This highlights an important feature of the type system: the order of declaration of the various blocks is important. Puppet will always respect the declaration order of the various properties when evaluating their values. That means a given property can access the value of another property defined earlier.

I left managing RR TTLs as an exercise for the astute reader :) Also note we didn’t cover all the directives the type DSL offers us. Notably, we didn’t see value munging (which allows transforming a string representation coming from the manifest into an internal (to the type) format). For instance that can be used to transform a string IP address into the ruby IPAddr type for later use. I highly recommend browsing the default types in the Puppet source distribution and checking the various directives used there. You can also read the Puppet::Parameter, Puppet::Property and Puppet::Type source code to see the ones we didn’t cover.

Life and death of Properties

So, we saw that a Puppet::Parameter is just a holder for the value coming from the manifest. A Puppet::Property is a parameter that, along with the desired value (the one coming from the manifest), contains the current value (the one coming from the managed resource on the host). The first one is called the “should”, and the latter is called the “value”. Those are, incidentally, methods of the Puppet::Property object, and they return those respective values. A property implements the following aspects:

  • it can retrieve a value from the managed resource. This is the operation of asking the real host resource to fetch its value. This is usually performed by delegation to the provider.

  • it can report its should which is the value given in the manifest

  • it can be insync?. This returns true if the retrieved value is equal to the “should” value.

  • and finally it might sync, which means doing what is necessary so that “insync?” becomes true. If there is a provider for the given type, it will be called to take care of the change.

When Puppet manages a resource, it does it with the help of a Puppet::Transaction. The given transaction orders the various properties that are not insync? to sync. Of course this is a bit more complex than that, because this is done while respecting resource ordering (the one given by the require/before metaparameters), but also propagating change events (so that services can be restarted and so on), allowing resources to spawn child resources, etc… It’s perfectly possible to write a type without a provider, as long as all the properties used implement their respective retrieve and sync methods. Some of the core types do this.

Providers

We’ve seen that properties usually delegate to the provider for managing the underlying real resource. In our example, we’ll have two providers, one for each defined type. There are two kinds of providers:

  • prefetch/flush
  • per-property

Per-property providers need to implement a getter and a setter for every property of the accompanying type. When the transaction manipulates a given property, its provider getter is called, and later on the setter will be called if the property is not insync?. It is the responsibility of those setters to flush those values to the physical managed resource. For some providers it is highly impractical or inefficient to flush on every property value change. To solve this issue, a given provider can be a prefetch/flush one. A prefetch/flush provider implements only two methods:

  • prefetch, which, given a list of resources, will in one call return a set of provider instances filled with the values fetched from the real resource.
  • flush, which will be called after all values have been set, so that they can be persisted to the real resource.

The two providers I’ve written for this article are prefetch/flush ones, because it was impractical to call Fog for every property.

Anatomy of the dnszone provider

We’ll focus only on this provider, and I’ll leave the analysis of the second one as an exercise for the reader. Providers, being ruby extensions too, must live in the correct path respecting their ruby namespaces. For our dnszone fog provider, that’s the puppet/provider/dnszone/fog.rb file. Unlike what I did for the types, I’ll split the provider code into parts so that I can explain them with some context. You can still browse the whole code.
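
Here is a sketch of how the provider is declared, again reconstructed from the explanation below rather than copied from the repository (the require path and the feature name are my assumptions):

require 'puppet/provider/fog'

Puppet::Type.type(:dnszone).provide(:fog, :parent => Puppet::Provider::Fog) do
  desc "Manage DNS zones through the Fog library."

  # only suitable if the fog library is available on the agent
  confine :feature => :fog
end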

This is how we tell Puppet that we have a new provider for a given type. If we decipher this, we’re fetching the dnszone type (which returns the singleton class of our dnszone type), and calling the class method “provide” on it, passing a name, some options and a big block. In our case, the provider is called “fog”, and our parent is Puppet::Provider::Fog (which defines common methods for both of our fog providers, and is itself a descendant of Puppet::Provider). As for types, we have a desc class method in Puppet::Provider to store some documentation strings. We also have the confine method. This method helps Puppet choose the correct provider for a given type, ie decide its suitability. The confining system is managed by Puppet::Provider::Confiner. You can use:

  • a fact or puppet settings value, as in: confine :operatingsystem => :windows
  • a file existence: confine :exists => "/etc/passwd"
  • a Puppet “feature”, like we did for testing the fog library presence
  • an arbitrary boolean expression confine :true => 2 == 2

A provider can also be made the default for a given fact value. This makes sure the correct provider is used for a given type; for instance, the apt provider is the default on debian/ubuntu platforms (as sketched below).

And to finish, a provider might need to call executables on the platform (and in fact most of them do). The Puppet::Provider class defines a shortcut to declare and use those executables easily:
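
A minimal, hypothetical illustration of both mechanisms (defaultfor and commands) - this is not part of the puppet-dns module, just a sketch of the pattern using the apt example from above:

Puppet::Type.type(:package).provide(:apt_sketch) do
  desc "Hypothetical provider sketch showing defaultfor and commands."

  # prefer this provider on Debian-family platforms
  defaultfor :operatingsystem => [:debian, :ubuntu]

  # declares an apt_get method that runs the binary, and confines the
  # provider to hosts where the executable exists
  commands :apt_get => "/usr/bin/apt-get"

  def install
    apt_get("-y", "install", resource[:name])
  end
end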

Let’s continue our exploration of our dnszone provider.

mk_resource_methods is a handy system that creates a bunch of setters/getters for every parameter/property for us. Those store their values in the @property_hash hash.
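
Here is a sketch of the prefetch side of the pattern, matching the description that follows. The fog_zones helper is an assumption standing in for whatever Puppet::Provider::Fog exposes to reach the Fog DNS zone collection; the real implementation is in the repository:

def self.prefetch(resources)
  # ask the DNS hosting provider (through Fog) for every zone it knows about;
  # fog_zones is an assumed helper inherited from Puppet::Provider::Fog
  zones = fog_zones

  resources.each do |name, resource|
    zone = zones.find { |z| z.domain == name }
    if zone
      # the zone exists remotely: build a provider filled with its values
      resource.provider = new(:ensure => :present, :name => zone.domain)
    else
      # nothing matched: all we know is that the zone is currently absent
      resource.provider = new(:ensure => :absent)
    end
  end
end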

The prefetch method calls fog to fetch all the DNS zones, and then we match those against the ones managed by Puppet (from the resources hash).

For each match we instantiate a provider filled with the values coming from the underlying physical resource (in our case fog). For those that don’t match, we create a provider whose only existing property is that ensure is absent.
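
And a sketch of the corresponding flush, following the three cases listed below. The real provider reuses what prefetch found, while this sketch simply looks the zone up again; find_zone, destroy_zone, create_zone and update_zone are hypothetical helpers standing in for the actual Fog calls:

def flush
  if @property_hash[:ensure] == :absent
    # case 1: the desired state is absent, tell fog to destroy the zone
    destroy_zone(resource[:name])
  elsif (zone = find_zone(resource[:name])).nil?
    # case 2: desired present, but no such zone exists yet: create it
    create_zone(resource)
  else
    # case 3: the zone already exists, refresh it with the desired values
    update_zone(zone, resource)
  end
end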

Flush does the reverse of prefetch. Its role is to make sure the real underlying resource conforms to what Puppet wants it to be.

There are 3 possibilities:

  • the desired state is absent. We thus tell fog to destroy the given zone.
  • the desired state is present, but during prefetch we didn’t find the zone, we’re going to tell fog to create it.
  • the desired state is present, and we could find it during prefetch, in which case we’re just refreshing the fog zone.

To my knowledge the instances class method is used only for ralsh (puppet resource). The problem is that our provider can’t know how to access fog until it has a dnszone (which creates a chicken and egg problem :)

And finally we need to manage the Ensure property which requires our provider to implement: create, destroy and exists?.

In a prefetch/flush provider there’s no need to do more than controlling the ensure value.

Things to note:

  • a provider instance can access its resource with the resource accessor
  • a provider can access the current catalog through its resource.catalog accessor. This allows, as I did in the dnsrr/fog.rb provider, retrieving a given resource (in this case the dnszone a given dnsrr depends on, in order to find how to access a given zone through fog).

Conclusion

We’ve just scratched the surface of the type/provider system (if you read everything you might disagree, though).

For instance we didn’t review the parsed file provider, which is a beast in itself (the Pro Puppet book has a section about it if you want to learn how it works; the Puppet core host type is also a parsed file provider if you need a reference).

Anyway, make sure to read the Puppet core code if you want to know more :) Feel free to ask questions about Puppet on the puppet-dev mailing list or on the #puppet-dev irc channel on freenode, where you’ll find me under the masterzen nick.

And finally, expect a little bit of time before the next episode, which will certainly cover the Indirector and how to add new termini (but I first need to find an example, so suggestions are welcome).

Puppet Extension Points – part 1

It’s been a long time since my last blog post, almost a year. Not that I stopped hacking on Puppet or other things (even though I’m not as productive as I had been in the past), it’s just that so many things happened last year (Memoir’44 release, architecture work at Days of Wonder) that I lost the motivation to maintain this blog.

But that’s over. I plan to start a series of Puppet internals articles. The first one (yes, this one) is devoted to Puppet Extension Points.

For a long time now, Puppet has contained a system to dynamically load ruby fragments to provide new functionality both for the client and the master. Among the available extension points you’ll find:

  • manifest functions
  • custom facts
  • types and providers
  • faces

Moreover, Puppet contains a synchronization mechanism that allows you to ship your extensions in your manifest modules, and those will be replicated automatically to the clients. This system is called pluginsync.

This first article will dive into the ruby meta-programming used to create (some of) the extension DSLs (not to be confused with the Puppet DSL, which is the language used in the manifests). We’ll talk a lot about DSLs and ruby meta-programming. If you want to know more on those two topics, I’ll urge you to read those books:

Anatomy of a simple extension

Let’s start with the simplest form of extension: Parser Functions.

Functions are extensions of the Puppet Parser, the entity that reads and analyzes the puppet DSL (ie the manifests). This language contains a structure which is called “function”. You already use them a lot, for instance “include” or “template” are functions.

When the parser analyzes a given manifest, it detects the use of functions, and later on during the compilation phase the function code is executed and the result may be injected back into the compilation.

Here is a simple function:
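
Here is a rough sketch of what such a basename function looks like; the line numbers mentioned in the bullets below refer to the original listing, which this sketch only approximates:

module Puppet::Parser::Functions
  newfunction(:basename, :type => :rvalue, :doc => <<-EOS
    Returns the last component of a path, stripping any leading directories,
    like the basename(1) shell command.
    EOS
  ) do |args|
    path = args[0]
    File.basename(path)
  end
end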

The given function uses the puppet functions DSL to load the extension code into Puppet core code. This function is simple and does what its basename shell equivalent does: strip the leading path from a given filename. For this function to work you need to drop it in the lib/puppet/parser/functions directory of your module. Why is that? It’s because, after all, extensions are written in ruby and integrate into the Puppet ruby namespace. Functions in puppet live in the Puppet::Parser::Functions class, which itself belongs to the Puppet scope.

The Puppet::Parser::Functions class in Puppet core has the task of loading all functions defined in any puppet/parser/functions directories it can find in the whole ruby load path. When Puppet uses a module, the module’s lib directory is automatically added to the ruby load path. Later on, when parsing manifests and a function call is detected, Puppet::Parser::Functions will try to load all the ruby files in all the puppet/parser/functions directories available in the ruby load path. This last task is done by the Puppet autoloader (available in Puppet::Util::Autoload). Let’s see how the above code is formed:

  • Line 1: this is the ruby way to say that this file belongs to the puppet function namespace, so that Puppet::Parser::Functions will be able to load it. In reality, we’re opening the ruby class Puppet::Parser::Functions, and all that follows will apply to this specific puppet class.
  • Line 2: this is where ruby meta-programming is used. Translated to standard ruby, we’re just calling the “newfunction” method. Since we’re in the Puppet::Parser::Functions class, we are in fact just calling the class method Puppet::Parser::Functions#newfunction.
  • We pass it 4 arguments:
    • the function name, encoded as a symbol. Function names should be unique in a given environment.
    • the function type: either your function is an rvalue (meaning a right-value, an entity that lies on the right side of an assignment operation, so in plain English: a function that returns a value), or it is not (in which case the function is just a side-effect function not returning any value).
    • a documentation string (here we used a ruby heredoc) which might be extracted later.
    • and finally we’re passing a ruby code block (from the do on line 5 to the inner end on line 10). This code block won’t be executed when puppet loads the function.
  • Lines 5 to 10: the body of the method. When ruby loads the function file on behalf of Puppet, it will happily pass the code block to newfunction. This last one will store the code block for later use, and make it available in the Puppet scope class under the name function_basename (that’s one of the cool things about ruby: you can arbitrarily create new methods on classes, objects or even instances).

So let’s see what happens when puppet parses and executes a manifest that calls our basename function.

The first thing that happens when compiling manifests is that the Puppet lexer triggers. It will read the manifest content and split it into a stream of tokens that the parser knows about.
The parser, given this input, will reduce it to what we call an Abstract Syntax Tree. That’s an in-memory data structure (usually a tree), derived from the language grammar and the stream of tokens, that represents the operations to be executed. In our case, our function call will schematically be parsed into a function-call node holding the function name and its arguments.

In turn, when puppet compiles the manifest (ie executes the above AST), this will be equivalent to this ruby operation:
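
Assuming the example manifest called basename on a literal path, the call is roughly equivalent to this (the exact argument is of course a placeholder):

scope.function_basename(["/path/to/some/file.txt"])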

Remember how Puppet::Parser::Functions#newfunction created the function_basename method? At that time I didn’t really tell you the exact truth. In fact newfunction creates the function in an environment-specific object instance (so that functions can’t leak from one Puppet environment to another, which was one of the problems of 0.25.x). And any given Puppet scope (scopes are instances of Puppet::Parser::Scope) will, when constructed, mix in this environment object, and thus bring to life our shiny function as if it were defined in the scope ruby code itself.

Pluginsync

Let’s talk briefly about the way your module extensions are propagated to the clients. So far we’ve seen that functions live in the master, but some other extension types (like facts or types) essentially live in the client. Since it would be cumbersome for an admin to replicate all the given extensions to all the clients manually, Puppet offers pluginsync, a way to distribute this ruby code to the clients. It’s part of every puppet agent run, before asking the master for a catalog. The interesting thing (and this happens in a lot of places in Puppet, which always amazes me) is that this pluginsync process uses Puppet itself to perform the synchronization. Puppet is good at synchronizing remotely and recursively a set of files living on the master. So pluginsync just creates a small catalog containing a recursive File resource whose source is the plugins fileserver mount on the master, and whose destination is the current agent’s puppet lib directory (which is part of the ruby load path). Then this catalog is evaluated and the Puppet File resource mechanism does its magic and creates all the files locally, or synchronizes them if they differ. Finally, the agent loads all the ruby files it synchronized, registering the various extensions they contain, before asking for its host catalog.

Wants some facts?

The other extension point that you certainly already encountered is adding custom facts. A fact is simply a key, value tuple (both are strings). But we also usually call a fact the method that dynamically produces this tuple. Let’s see what it does internally. We’ll use the following example custom fact:
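
Here is a stand-in example with the typical shape of such a fact (a Unix-only command wrapped in setcode); it is not the exact fact from the original post, whose line numbers the following paragraphs refer to:

Facter.add("logged_in_users") do
  setcode do
    # count distinct users with an active session; this relies on the Unix
    # who(1) command, hence the Windows limitation discussed below
    %x{who}.lines.map { |line| line.split.first }.uniq.length.to_s
  end
end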

It’s no secret that Puppet uses Facter a lot. When a puppet agent wants a catalog, the first thing it does is ask Facter for a set of facts pertaining to the current machine. Then those facts are sent to the master when the agent asks for a catalog. The master injects those facts as variables in the root scope when compiling the manifests.

So, facts are executed in the agent. Those are pluginsync’ed as explained above, then loaded into the running process.

When that happens, the add method of the Facter class is called. The block defined between lines 2 and 6 is then executed in the Facter::Util::Resolution context. So the Facter::Util::Resolution#setcode method will be called and the block between lines 3 and 5 will be stored for later use.

This Facter::Util::Resolution instance holding our fact code will in turn be stored in the facts collection under the name of the fact (see line 2).

Why is it done this way? Because not all facts can run on every host. For instance, our fact above does not work on the Windows platform, so we should use facter’s way of confining our facts to the architectures on which we know they’ll work.

Thus Facter defines a set of methods like “confine” that can be called during the call to Facter#add (just add those outside of the setcode block). Those methods modify how the facts collection will be executed later on. It wouldn’t have been possible to confine our facts if we had stored the whole Facter#add block and called it directly at fact resolution time, hence the use of this two-step system.
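
Confining the sketch above to platforms where it can work would look something like this (the list of kernel values is only an illustration):

Facter.add("logged_in_users") do
  # skip this fact entirely on platforms where who(1) is not available
  confine :kernel => %w{Linux SunOS FreeBSD OpenBSD Darwin}

  setcode do
    %x{who}.lines.map { |line| line.split.first }.uniq.length.to_s
  end
end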

Conclusion

And that’s all, folks, for the moment. The next episode will explain the inner workings of types and providers. I also plan an episode about other Puppet internals, like the parser, catalog evaluation, and/or the indirector system.

Tell me (though comments here or through my twitter handle @_masterzen_) if you’re interested in this kind of Puppet stuff, or if there are any specific topics you’d like me to cover :)

Masterzen’s Blog 2011-10-29 10:17:23

It’s been a long time since my last blog post, almost a year. Not that I stopped hacking on Puppet or other things (even though I’m not as productive as I had been in the past), it’s just that so many things happened last year (Memoir’44 release, architecture work at Days of Wonder) that I lost the motivation of maintaining this blog.

But that’s over, I plan to start a series of Puppet internals articles. The first one (yes this one) is devoted to Puppet Extension Points.

Since a long time, Puppet contains a system to dynamically load ruby fragments to provide new functionalities both for the client and the master. Among the available extension points you’ll find:

  • manifests functions
  • custom facts
  • types and providers
  • faces

Moreover, Puppet contains a synchronization mechanism that allows you to ship your extensions into your manifests modules and those will be replicated automatically to the clients. This system is called pluginsync.

This first article will first dive into the ruby meta-programming used to create (some of) the extension DSL (not to be confused with the Puppet DSL which is the language used in the manifests). We’ll talk a lot about DSL and ruby meta programming. If you want to know more on those two topics, I’ll urge you to read those books:

Anatomy of a simple extension

Let’s start with the simplest form of extension: Parser Functions.

Functions are extensions of the Puppet Parser, the entity that reads and analyzes the puppet DSL (ie the manifests). This language contains a structure which is called “function”. You already use them a lot, for instance “include” or “template” are functions.

When the parser analyzes a given manifest, it detects the use of functions, and later on during the compilation phase the function code is executed and the result may be injected back into the compilation.

Here is a simple function:

The given function uses the puppet functions DSL to load the extension code into Puppet core code. This function is simple and does what its basename shell equivalent does: stripping leading paths in a given filename. For this function to work you need to drop it in the lib/puppet/parser/functions directory of your module. Why is that? It’s because after all, extensions are written in ruby and integrate into the Puppet ruby namespace. Functions in puppet live in the Puppet::Parser::Functions class, which itself belongs to the Puppet scope.

The Puppet::Parser::Functions class in Puppet core has the task of loading all functions defined in any puppet/parser/functions directories it will be able to find in the whole ruby load path. When Puppet uses a module, the modules’ lib directory is automatically added to the ruby load path. Later on, when parsing manifests and a function call is detected, the Puppet::Parser::Functions will try to load all the ruby files in all the puppet/parser/functions directory available in the ruby load path. This last task is done by the Puppet autoloader (available into Puppet::Util::Autoload). Let’s see how the above code is formed:

  • Line 1: this is ruby way to say that this file belongs to the puppet function namespace, so that Puppet::Parser::Functions will be able to load it. In real, we’re opening the ruby class Puppet::Parser::Functions, and all that will follow will apply to this specific puppet class.

  • Line 2: this is where ruby meta-programming is used. Translated to standard ruby, we’re just calling the “newfunction” method. Since we’re in the Puppet::Parser::Functions class, we in fact are just calling the class method Puppet::Parser::Functions#newfunction.

We pass to it 4 arguments:

  • the function name, encoded as a symbol. Functions name should be unique in a given environment
  • the function type: either your function is a rvalue (meaning a right-value, an entity that lies on the right side of an assignment operation, so in real English: a function that returns a value), or is not (in which case the function is just a side-effect function not returning any values).
  • a documentation string (here we used a ruby heredoc) which might be extracted later.
  • and finally we’re passing a ruby code block (from the do on line 5, to the inner end on line 10). This code block won’t be executed when puppet loads the functions.

  • Line 5 to 10. The body of the methods. When ruby loads the function file on behalf of Puppet, it will happily pass the code block to newfunction. This last one will store the code block for later use, and make it available in the Puppet scope class under the name function_basename (that’s one of the cool thing about ruby, you can arbitrarily create new methods on classes, objects or even instances).

So let’s see what happens when puppet parses and executes the following manifest:
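
The listing below is only an illustration (any manifest calling our function would do, and the path is arbitrary):

$config = basename('/etc/puppet/puppet.conf')
notice($config)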

The first thing that happens when compiling manifests is that the Puppet lexer runs. It reads the manifest content and splits it into tokens that the parser knows about - names, variables, parentheses, string literals and so on.

The parser, given this input, reduces it to what we call an Abstract Syntax Tree: an in-memory data structure (usually a tree) representing the operations to execute, derived from the language grammar and the stream of tokens. In our case, the basename call is schematically parsed into an AST function-call node carrying the function name and its arguments.

In turn, when Puppet compiles the manifest (i.e. executes the above AST), this is equivalent to the following Ruby operation:
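
Schematically, and using the illustrative manifest above (the argument array holds the function’s arguments):

# the compiler invokes the function_basename method that newfunction created on the scope
scope.function_basename(['/etc/puppet/puppet.conf'])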

Remember how Puppet::Parser::Functions.newfunction created function_basename? At that point I didn’t really tell you the exact truth. In fact, newfunction creates the function in an environment-specific object instance (so that functions can’t leak from one Puppet environment to another, which was one of the problems of 0.25.x). Any given Puppet scope - scopes are instances of Puppet::Parser::Scope - mixes in this environment object when it is constructed, and thus brings our shiny function to life as if it were defined in the scope’s Ruby code itself.

Pluginsync

Let’s talk briefly about the way your modules’ extensions are propagated to the clients. So far we’ve seen that functions live on the master, but some other extension types (like facts or types) essentially live on the client. Since it would be cumbersome for an admin to replicate all the given extensions to all the clients manually, Puppet offers pluginsync, a way to distribute this Ruby code to the clients. It’s part of every puppet agent run, before the agent asks the master for a catalog. The interesting thing (and this happens in a lot of places in Puppet, which always amazes me) is that this pluginsync process uses Puppet itself to perform the synchronization. Puppet is good at recursively synchronizing a set of files living on the master, so pluginsync just creates a small catalog containing a recursive File resource whose source is the plugins fileserver mount on the master and whose destination is the agent’s puppet lib directory (which is part of the Ruby load path). This catalog is then evaluated, and the Puppet File resource mechanism does its magic, creating all the files locally or synchronizing them if they differ. Finally, the agent loads all the Ruby files it synchronized, registering the various extensions they contain, before asking for its host catalog.
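
To make that concrete, here is a rough sketch of the idea in Ruby - not the literal Puppet source, just the shape of what the agent synthesizes:

require 'puppet'

# a recursive File resource whose source is the master's "plugins" fileserver
# mount and whose destination is the agent's lib directory
plugins = Puppet::Type.type(:file).new(
  :path    => Puppet[:libdir],
  :source  => 'puppet:///plugins',
  :recurse => true,
  :purge   => true,
  :backup  => false
)

# wrap it in a tiny catalog and apply it, letting the File resource machinery
# create the local files or synchronize them if they differ
catalog = Puppet::Resource::Catalog.new('pluginsync')
catalog.add_resource(plugins)
catalog.apply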

Wants some facts?

The other extension point that you have certainly already encountered is adding custom facts. A fact is simply a (key, value) tuple (both are strings), but we also usually call “a fact” the code that dynamically produces this tuple. Let’s see how it works internally. We’ll use the following example custom fact:
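
Something along these lines (the fact name and the uname invocation are illustrative):

# a custom fact resolved by shelling out to uname
Facter.add("hardware_platform") do
  setcode do
    Facter::Util::Resolution.exec('/bin/uname --hardware-platform')
  end
end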

It’s no secret that Puppet uses Facter a lot. When a puppet agent wants a catalog, the first thing it does is ask Facter for a set of facts pertaining to the current machine. Those facts are then sent to the master when the agent asks for a catalog. The master injects those facts as variables in the root scope when compiling the manifests.
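
On the master side, the (illustrative) custom fact above can then be used in a manifest like any other variable:

notice("running on ${::hardware_platform}")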

So, facts are executed in the agent. Those are pluginsync’ed as explained above, then loaded into the running process.

When that happens, the add method of the Facter class is called. The block defined between lines 2 and 6 is then executed in the Facter::Util::Resolution context, so the Facter::Util::Resolution#setcode method is called and the block between lines 3 and 5 is stored for later use.

This Facter::Util::Resolution instance holding our fact code will be in turn stored in the facts collection under the name of the fact (see line 2).

Why is it done this way? Because not all facts can run on every host. For instance, our fact above does not work on the Windows platform, so we should use Facter’s way of confining facts to the platforms on which we know they’ll work. Thus Facter defines a set of methods like “confine” that can be called during the call to Facter#add (just add them outside of the setcode block). Those methods modify how the facts collection will be executed later on. It wouldn’t have been possible to confine our fact if we had stored the whole Facter#add block and called it directly at fact resolution time, hence this two-step system.
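
As a sketch, confining the fact above to Linux hosts would look like this:

Facter.add("hardware_platform") do
  # only try to resolve this fact on Linux hosts
  confine :kernel => "Linux"
  setcode do
    Facter::Util::Resolution.exec('/bin/uname --hardware-platform')
  end
end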

Conclusion

And that’s all, folks, for the moment. The next episode will explain the inner workings of types and providers. I also plan an episode about other Puppet internals, like the parser, catalog evaluation, and/or the indirector system.

Tell me (through comments here or via my Twitter handle @masterzen) if you’re interested in this kind of Puppet stuff, or if there are any specific topics you’d like me to cover :)

DevOps Gurgaon Meetup : Some High octane gyan

With the clock striking 6 P.M. on the 12th of July, a few geeks from MakeMyTrip were all set to brainstorm with some fellow geeks from the DevOps community, for the first … Continue reading


DevOps Gurgaon Meetup

Time: Tuesday, July 12 2011, 6:00pm – 9:00pm
Location: MakeMyTrip Office, 103 Udyog Vihar Phase I, Gurgaon – 122016, Haryana, India
Come to the DevOps Gurgaon Meetup to: hang around with … Continue reading


Command-line cookbook dependency solving with knife exec

Imagine you have a fairly complicated infrastructure with a large number of nodes and roles. Suppose you have a requirement to take one of the nodes and rebuild it in an entirely new network, perhaps even for a completely different organization. This should be easy, right? We have our infrastructure in the form of code. However, our current infrastructure has hundreds of uploaded cookbooks - how do we know the minimum ones to download and move over? We need to find out from a node exactly what cookbooks are needed for that node to be built.

The obvious place to start is with the node itself:

$ knife node show controller
Node Name:   controller
Environment: _default
FQDN:        controller
IP:          182.13.194.41
Run List:    role[base], recipe[apt::cacher], role[pxe_server]
Roles:       pxe_server, base
Recipes      apt::cacher, pxe_dust::server, dhcp, dhcp::config
Platform:    ubuntu 10.04

OK, this tells us we need the apt, pxe_dust and dhcp cookbooks. But what about them - do they have any dependencies? How could we find out? Well, dependencies are specified in two places - in the cookbook metadata, and in the individual recipes. Here’s a primitive way to illustrate this:

bash-3.2$ for c in apt pxe_dust dhcp
> do
> grep -iER 'include_recipe|^depends' $c/* | cut -d '"' -f 2 | sort | uniq
> done
apt::cacher-client
apache2
pxe_dust::server
tftp
tftp::server
utils

As I said - primitive. However the problem doesn’t end here. In order to be sure, we now need to repeat this for each dependency, recursively. And of course it would be nice to present them more attractively. Thinking about it, it would be rather useful to know what cookbook versions are in use too. This is definitely not a job for a shell one liner - is there a better way?

As it happens, there is. Think about it - the Chef server already needs to solve these dependencies to know what cookbooks to push to API clients. Can we access this logic? Of course we can - clients carry out all their interactions with the Chef server via the API. This means we can let the server solve the dependencies and query it via the API ourselves.

Chef provides two powerful ways to access the API without having to write a RESTful client. The first, Shef, is an interactive REPL based on IRB, which when launched gives access to the Chef server. This isn’t trivial to use. The second, much simpler way is the knife exec subcommand. This allows you to write Ruby scripts or simple one-liners that are executed in the context of a fully configured Chef API Client using the knife configuration file.

knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'

The /nodes/NODE_NAME/cookbooks endpoint returns the cookbook attributes, definitions, libraries and recipes that are required for this node. The response is a hash of cookbook name and Chef::CookbookVersion object. We simply iterate over each one, and pretty print the cookbook name and the version.

Let’s give it a try:

$ knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'
{"apt"=>"1.1.1"}
{"tftp"=>"0.1.0"}
{"apache2"=>"0.99.3"}
{"dhcp"=>"0.1.0"}
{"utils"=>"0.9.5"}
{"pxe_dust"=>"1.1.0"}

Nifty! :)

Building a Devops team

This is a guest post by Brian Henerey, from Sony Computer Entertainment Europe.

Background

I’ve had 3 roles at Sony since joining in August 2008. Nearly a year ago I took over the management of the original engineering team I joined. This was a failing team by any definition, but I was excited about the opportunity to reshape it. I knew the remaining team was deeply unhappy and likely to quit at any moment, so I had a few immediate goals:

  • Hire!
  • Keep people from quitting.
  • Hire!

Side story: I stumbled on one important objective I didn’t list, however: keep customers happy. It doesn’t matter how awesome you think your team can be if no one wants to work with you based on past experiences. I didn’t appreciate how much a demotivated employee could jeopardise customer relationships simply by not caring. It has taken me months to restore trust with one customer. I’ve heard a story about a manager regularly offering employees £500 to quit. That probably has some practical problems, but it’s a tempting way to cull the unmotivated.

I come from a long background of small and medium-sized enterprises. It has been a challenge adapting to a large corporation, but I don’t think there’s much unique to Sony about the anti-Devops patterns I’ve encountered. I know several people in small companies who say they’ve been practicing Devops since before there was such a word, and I completely agree. The troubles of silos, bureaucracy, organizational boundaries, politics, etc. seem pretty common in larger businesses, though. I can’t speak to how to create a Devops culture across a large organisation from the top down, but I’ve been working really hard to create one from the inside.

The beginning

A year ago I’d never heard of the term Devops. If you’re in the same boat, it is easy to find a great deal to read about what Devops is:

And what it is not:

However, I suspect some people will have trouble finding the read-worthy gems amongst all the chatter. Here’s a good place to get started: getting started with devops. The gigantic list of Devops-related bookmarks compiled by Patrick Debois shows why you may not want to try and read everything: devops bookmarks.

If you’re in the know already and Devops resonates with you, and you want to build a team around the concept, here’s how I went about it.

Networking

The term Devops didn’t really take shape for me until I started to talk about it with others. Fortunately, London has a really active Devops community, so I’ve had ample opportunity. The tireless Gareth Rushgrove organises many events, and The Guardian is a frequent host. I’ve been to sessions discussing Continuous Integration, Deployments, Google App Engine, Load Balancers, Chef, CloudFoundry, etc. I’ve found people to be incredibly open about technology, processes, culture, and the difficulties and successes they’ve had.

While Devops is of course about more than technology and tools, I personally have found Devops to be an excellent banner under which to have really interesting conversations. Having a forum which brings people from diverse backgrounds together has helped me shape my own internal understanding of what Devops should be about.

I felt a bit of an imposter going to the initial London Devops meetups because I was so keen on recruiting. However, the quality of the discussions has been so good I eagerly anticipate each upcoming meetup even though I’m no longer hiring. I’ve also discovered that half the attendees are also hiring. It’s a Devopsee’s market.

Result!: I met and subsequently hired Stephen Nelson-Smith from Atalanta Systems. (He’s @Lordcope on Twitter, and the author of agilesysadmin.net.)

Working definition of Devops

If you’re going to hire people with Devops in mind, it’s good to have a working definition. I like the pillars of Devops (CAMS) put forth by John Willis: what devops means to me

  • Culture
  • Automation
  • Measurement
  • Sharing

SMAC might have been a better acronym, but I’ll go with CAMS.

A Devops job spec

I don’t think Devops is a role, though I’ve seen job postings for such a thing. I only mentioned that I was looking for someone ‘Devops-savvy’, and later changed it to ‘Devops-minded’ or something similar. The job posting has expired and I’d have to dig it out, but R.I. Pienaar described it on Twitter as the ‘perfect devops job posting’. I’m pretty keen on revising a job spec until the requirements are only things I actually require and can measure against. That said, how to write a job spec is way outside the scope of this post. To summarize, I was looking for:

  • problem solving skills
  • ‘can do’ attitude
  • good team fit (really hard to quantify)
  • a broad set of skills (LAMP, Java, C++, Ruby, Python, Oracle, Scaling/Capacity, High-Availability, etc, etc)

My team works on a ton of different technology stacks, and the landscape is constantly changing. It’s a techie’s dream job, but the interpersonal skills are the most important.

Recruiters

I strongly believe in giving recruiters a fair bit of my time. I’ve seen many people be rude to recruiters, ignore them, etc, and then wonder why they don’t get good candidates through. I’m quite keen on engaging the recruiters, explaining the role I’m trying to fill thoroughly, and having the occasional coffee or beer with them. Feedback is of course vital to candidates, and I try to give it honestly and quickly, letting the recruiter worry about sugar coating things.

CV selection

This is tough. I regularly get CV blindness, where everyone starts to look the same - and generally ill-suited. I try to remember there are human beings on the other end and force myself to have concrete reasons why I’m rejecting someone. Talking to a recruiter about this helps me be concrete.

First interview - remote technical test

This is where things get interesting! I don’t know if this is unique to London, but I’ve had a LOT of candidates from other countries apply to join this team. For candidates with a good CV whose English language skills the recruiter vouches for, I developed a great screening test which can be conducted remotely. This saves a trip to London plus a hotel, and I can end it promptly if things aren’t going well. Here’s how it works:

  • I email the candidate/recruiter a url to an ec2 instance that I spin up on the day about 20 minutes before the interview.
  • The instance is running a web server which serves the instructions for the test. These only state that the candidate will need a terminal such as Putty if they’re on Windows.
  • At the arranged time I phone the candidate. I explain that there will be two tests. The first is a sys admin task which will be time bound to 20 minutes. The second is a programming task which they can use the remainder of the time to complete. The call will end after 1 hour.
  • I explain the rules: They are to perform all of their work on the ec2 instance. They have a test account/password, and sudo root access. They can use any resources they want to solve the problems. Google, man pages, libraries are not only fair game, but fully expected.
  • I explain what I want from them: They need to talk to me, tell me what they are thinking, and walk me through the problem solving process. I’m far more interested in that dialogue than whether they solve either problem I give them.
  • I also add that we’re using Screen, and I can see everything they type.
  • I swap in the index.html containing the complete instructions, make a note of the time, and let them begin.

The problems

1) It’s really quite simple: install WordPress and configure it to work properly. The catch is that we install MySQL first, break it, and then watch as candidates wonder what the heck is going on. For an experienced sysadmin this is child’s play, but I tended to interview people with stronger development backgrounds who were less familiar with installing applications. I could tell almost immediately how well someone knew their way around a Linux system. It was interesting to see what kinds of assumptions people made about the system itself (I never mentioned the OS that was running; several just assumed Ubuntu). Some people read instructions, some don’t. I give people the mysqladmin password, but some people search for how to reset a lost password because they didn’t read what I gave them. I had one guy spend 10 minutes trying to ssh to http://ec2……. I gave him a pass on nerves, but he continued to suck and I ended it soon thereafter. He blamed the language barrier (Eastern European), and said if only I had been more clear to him. If I can’t communicate with him, I think that’s a pretty big problem, and it doesn’t really matter whose fault it is.

2) We provide sanitized production Tomcat logs for a real application we support and ask the candidate to write a log-parsing script in a language of their choice. We want the output of the script to show method calls, call counts, frequencies, and average and 90th-percentile latencies. Our preference is Ruby, but they can do it however they’d like. I had one candidate choose to implement this in Bash, writing some serious regex-fu that I had no idea how it worked. He got stuck, however, and I couldn’t help but ask why, as he claimed to be a Ruby developer, he didn’t do it in Ruby, which was my stated preference. He started over in Ruby and did okay. Depending on how much time was spent on problem 1, this part of the interview can be really boring for me. I stay on the phone in case they have questions and I ask them to explain their approach before they begin coding, but then I just start checking email and so on. After the full 60 minutes are up, I explain to the candidate that they can continue working on the coding task as long as they need and to send me an email when they’ve finished. I do get off the phone, however, stating that we’ll give them feedback as soon as we’ve reviewed the code they submit, and explain the next steps.
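
To give a flavour of what we’re after - this is my own minimal sketch, not the reference solution, and it assumes a simplified log format where each line ends with a method name followed by its duration in milliseconds:

#!/usr/bin/env ruby
# Summarize per-method call counts, share of total calls, average latency
# and (approximate) 90th-percentile latency from log lines whose last two
# fields are "<method_name> <duration_ms>".

stats = Hash.new { |hash, key| hash[key] = [] }
total = 0

ARGF.each_line do |line|
  fields = line.split
  next if fields.size < 2
  method, ms = fields[-2], fields[-1]
  next unless ms =~ /\A\d+(\.\d+)?\z/
  stats[method] << ms.to_f
  total += 1
end

puts '%-40s %8s %8s %10s %10s' % %w[method calls freq avg_ms p90_ms]
stats.sort_by { |_, times| -times.size }.each do |method, times|
  sorted = times.sort
  avg    = times.reduce(:+) / times.size
  p90    = sorted[(0.9 * (sorted.size - 1)).round]
  freq   = times.size.to_f / total
  puts '%-40s %8d %8.3f %10.1f %10.1f' % [method, times.size, freq, avg, p90]
end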

Results

I put several candidates through this process. When we first created this test, I’d have a couple of members of my team on the call as well, but we found this too time-consuming and a bit intimidating to certain candidates. Timeboxing problem 1 was a HUGE improvement, and once Stephen Nelson-Smith was on board I had someone better than me at evaluating the Ruby code. We all felt this test process was extremely revealing of candidates’ skillsets, and I highly recommend it.

One of my favourite candidates conducted this interview on a laptop in the shared wifi area of a crowded and noisy London hostel. In the background were screaming people and overbearing Christmas music. He was able to tune out the distractions and nailed both problems with ease, and got major bonus points for doing so.

Round 2 - Face to face interview

Round 2 actually has a few parts:

  • Coffee/lunch/dinner informal chat up to 1 hour in length. I explain what I’m looking for; they can talk about themselves; we can find out if we have a good match.
  • Hypothetical whiteboard problem-solving exercise: you receive a call saying a customer goes to http://yoursite.com and gets a blank page. What do you do next? We can improvise a bit here on what the actual problem is, but we’re hoping to learn two things: how does this person approach problem solving? And what level of architectural complexity have they been exposed to?
  • 2 hours of pair programming with a member of my team. This is usually a real bit of work that needs doing. It could be writing a Chef cookbook, or a Cucumber test, etc. We want to learn what it’s like to work closely with this person. My team pair programs often. Do we want to pair with this person day in, day out?

Round 3 - my boss + any member of my team who hasn’t met the candidate yet.

  • This is generally very open, though my boss has her own techniques for evaluating people.

It’s very important to me that everyone on my team has a voice. I was quite keen on one candidate, but when one of my team members voiced vague concerns about the person’s team fit, we all stopped and took it on board. We rejected the candidate in the end, because once the first doubts were out in the open, other people’s concerns started to be raised as well. I recognised that I was a bit too keen to hire someone to fill a pressing need, and I am glad how things worked out.

A GREAT candidate/hire

One of my favourite hires not only knows C, Java, and Linux, but also wrote a sample Ruby application because he knew we were looking for Ruby skills within the team. His app worked out the shortest path between tube stations, though only in terms of number of stops, not time travelled. This initiative told me a lot about him, and it’s been 100% the same since he joined the team. Eager to learn and try new things. Any problem or task put in front of him is ‘easy’. My only trouble is he tends to consider a problem solved once he’s worked out in his head how he will solve it. This is a bit of a joke really - I accused him the other day of declaring checkmate on a task because he was so confident it would be completed in his next seven steps.
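
For the curious, shortest path by number of stops is just a breadth-first search over the station graph - something like this toy sketch (the tiny map below is obviously made up):

require 'set'

# toy adjacency list: station => neighbouring stations
TUBE = {
  'Holborn'              => ['Tottenham Court Road', 'Chancery Lane'],
  'Tottenham Court Road' => ['Holborn'],
  'Chancery Lane'        => ['Holborn', "St. Paul's"],
  "St. Paul's"           => ['Chancery Lane', 'Bank'],
  'Bank'                 => ["St. Paul's"]
}

# breadth-first search: fewest stops, ignoring travel time
def fewest_stops(from, to)
  queue   = [[from]]
  visited = Set.new([from])
  until queue.empty?
    path = queue.shift
    return path if path.last == to
    TUBE.fetch(path.last, []).each do |station|
      next if visited.include?(station)
      visited << station
      queue << path + [station]
    end
  end
  nil
end

puts fewest_stops('Holborn', 'Bank').join(' -> ')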

Beyond hiring

Now what? Well, hiring the right people is HUGE. We celebrated each hire, as opposed to the typical ‘leaving drinks’ when people move on. How I manage the team will be a future blog post (I hope), but I’ll add one quick comment. Hiring people according to the vision I had means that I am held accountable as well. Whenever I find myself explaining that the reason for a decision I’m making is ‘politics’, I know I have to change.

About the author


Brian Henerey heads up Operations Engineering in the Online Technology Group at Sony Computer Entertainment Europe. His passions include Devops, Tool-chains, Web Operations, Continuous Delivery and Lean thinking. He’s currently building automated infrastructure pipelines with Ruby, Chef, and AWS, enabling self-service, just-in-time development and test environments for Sony’s Worldwide Studios.
