
Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

27 February 2013 ~ Comments Off

Mount NFS 4 shares from OSX

This is one of those things that goes to show: it's easy if you know how.

I've got a ZFS-based file server (currently using SmartOS) which uses NFSv4 shares. OS X can connect to NFS shares using "Connect To Server" from the Finder, using a syntax like this:

nfs://nas.example.com/share_name

I've previously tried to use this on my MacBook Pro but have never managed to get it to work in a stable fashion.

Then, this evening, I stumbled across the solution:

nfs://vers=4,nas.example.com/share_name

That's all there is to it – I now have stable NFSv4 connections from my Mac!

17 February 2013 ~ Comments Off

Last Check-in Time for Nodes

This one liner uses the knife exec sub-command to iterate over all the node objects on the Chef Server, and print out their ohai_time attribute in a human readable format.

knife exec -E 'nodes.all {|n| puts "#{n.name} #{Time.at(n[:ohai_time])}"}'

Let’s break this up a little.

knife exec -E

The exec plugin for knife executes a script or the given string of Ruby code in the same context as chef-shell (or shef in Chef 10 and earlier) if you start it up in its “main” context. Since it is knife, it will also use your .chef/knife.rb settings, so it knows about your user, key and Chef Server.

nodes.all

The chef-shell main context has helper methods to access the corresponding endpoints in the Chef Server API. Clearly we’re working with “nodes” here, and the #all method returns all the node objects from the Chef Server. This differs from search, where there’s a commit delay between the time when data is saved to the server and the time it is indexed by Solr. The delay is usually a few seconds, but depending on various factors like the hardware you’re using, how many nodes are converging, etc, it can take longer.

Anyway, we can pass a block to nodes.all and do something with each node object. The example above is a one-liner, so let’s make it more readable.

nodes.all do |n|
  puts "#{n.name} #{Time.at(n[:ohai_time])}"
end

We’re simply going to use n as the block variable for each node object, and we’ll print a string about the node. The #{} sequences in the string passed to puts are Ruby string interpolation; that is, everything inside the braces is evaluated as a Ruby expression. First, the Chef::Node object has a method, #name, that returns the node’s name. This is usually the FQDN, but depending on your configuration (node_name in /etc/chef/client.rb or using the -N option for chef-client), it could be something else. Then, we’re going to use the node’s ohai_time attribute. Every time Chef runs, it gathers data about the node with Ohai and generates the ohai_time attribute, which is the Unix epoch timestamp of when Ohai ran. Because Chef saves the node data at the end of the run, this tells us approximately the last time the node ran Chef. In this particular string, we’re converting the Unix epoch, like 1358962351.444405, to a human readable timestamp like 2013-01-23 10:32:31 -0700.
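The conversion can be tried outside of Chef in plain Ruby. This is only a minimal sketch: format_ohai_time is a hypothetical helper and the node name is made up. It prints in UTC so the output doesn’t depend on the local time zone, which is why it differs from the -0700 example above while describing the same instant.

```ruby
# Hypothetical helper (not part of Chef): format an ohai_time value,
# i.e. Unix epoch seconds, as a human readable UTC timestamp.
def format_ohai_time(epoch)
  Time.at(epoch).utc.strftime('%Y-%m-%d %H:%M:%S %z')
end

name = 'web01.example.com'     # made-up node name
ohai_time = 1358962351.444405  # sample epoch value from the text

# Same string interpolation as in the knife exec block.
puts "#{name} #{format_ohai_time(ohai_time)}"
# prints: web01.example.com 2013-01-23 17:32:31 +0000
```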

Of course, you can get similar data from the Chef Server by using knife status:

knife status

The ohai_time attribute will be displayed as a relative time, e.g., “585 hours ago.” It will include some more data about the nodes, like IP addresses. This uses Chef’s search feature, so you can also pass in a query:

knife status "role:webserver"

The knife exec example is simple, but you can get a lot more data about the nodes than what knife status reports.
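As one illustration of doing more with the data, here is a rough sketch in plain Ruby that lists nodes that haven’t checked in for more than a given number of hours. The node names, timestamps and the helpers hours_since and stale_nodes are all made up for the example, and this is not how knife status is implemented; inside knife exec you would build the hash from nodes.all rather than hard-coding it.

```ruby
# Hypothetical helper: whole hours between an ohai_time value and "now".
def hours_since(ohai_time, now)
  ((now.to_f - ohai_time) / 3600).floor
end

# Hypothetical helper: [name, hours] pairs for nodes older than max_hours,
# with the stalest node first.
def stale_nodes(checkins, now, max_hours)
  checkins.map { |name, ohai_time| [name, hours_since(ohai_time, now)] }
          .select { |_name, hours| hours > max_hours }
          .sort_by { |_name, hours| -hours }
end

# Made-up sample data: node name => ohai_time (Unix epoch seconds).
now = Time.at(1_358_962_351)
checkins = {
  'web01.example.com' => now.to_f - (2 * 3600),
  'db01.example.com'  => now.to_f - (585 * 3600),
  'app01.example.com' => now.to_f - (30 * 3600)
}

stale_nodes(checkins, now, 24).each do |name, hours|
  puts "#{name} last checked in #{hours} hours ago"
end
```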

In either case, ohai_time isn’t 100% accurate, since it is generated at the beginning of the run, and depending on what you’re doing with Chef on your systems, it can take a long time before the node data is saved. However, it’s close enough for many use cases.

If more detailed or completely accurate information about the Chef run is required for your purposes, you should use a report handler, which does have more data about the run available, including whether the run was successful or not.

16 February 2013 ~ Comments Off

FOSDEM 2013

Well, that's another FOSDEM over with. In general this year seemed the same as the last couple of years but slightly bigger than usual (although it seems that way every year). The (newish) K building was in full swing with dozens of project stalls and dev rooms. The usual suspects (the virtualisation / cloud, configuration management and MySQL rooms) had nearly as many people trying to get in as they did sitting down.

I think some of the main dev rooms have reached the level of popularity that forces you to either arrive early, get a seat and not move for the rest of the day or accept a very high level of probability that you won't get to see the talks you want. I know a few of us had trouble cherry picking sessions across tracks - which obviously means we have excellent taste in topics. I wonder if having the same talks on both days would make it easier to move around as a visitor - you'd attempt to catch it the first time and if that fails, come back tomorrow. I realise however that this puts even more of a burden on speakers that graciously give their own time in both the preparation and performing of their talks. It does seem that scaling the rooms is the problem of the day once again.

I'd like to say a big thank you to all the organisers, speakers and other attendees for making it another enjoyable couple of days. See you next year.


13 February 2013 ~ Comments Off

Love, MonitoringLove

Last year we were pretty negative about monitoring. We shouted out that MonitoringSucked ... A year has passed and a lot has changed ... most importantly our newfound love for monitoring, thanks to an inspirational Ignite talk by Ulf Mansson at devopsdays Rome.

Right after FOSDEM about 20 people showed up at the #monitoringlove hack sessions hosted at the Inuits.eu offices to work on open source monitoring projects and exchange ideas. Some were completely new people, some already had a lot of experience.

Amongst the projects that were worked on: Maciej was packaging Graphite for Debian, other people were fixing bugs in Puppet, and I spent some time with a Vagrant box to deploy Sensu using Puppet. The last time I played with Sensu was on the flight back from PuppetConf. I gave up the fight with RabbitMQ and SSL because I had no internet connection, and now Ulf just pointed out that I could disable SSL altogether, which resulted in having a POC up and running in no time.

Patrick was hacking on the Chef counterpart of the vagrant-puppet Sensu setup, a part of #monigusto. Ulf Mansson was getting Dashing to display on a Raspberry Pi ... pretty cool stuff. And Jelle Smet was working on Pyseps, a Python-based Simple Event Processing Server framework that consumes JSON docs from RabbitMQ and forwards them in real time to other queues using MongoDB query syntax.

One of the more interesting discussions was around the topic of alerting: modeling business rules and input from a lot of different sources in order to send the right alerts to the right people.

We explored different ideas like using BPM tools such as Activiti or rules engines like Ruby Rools. There are some SaaS providers that try to solve this need, like PagerDuty and friends, but obviously there is still a lot of work to be done to create a viable alerting system based on different input sources.

The monitoring problem is not solved yet .. and it will stay around for a couple of years .. but with the advent of events such as Monitorama it's clear that an event like our #monitoringlove hack sessions is needed .. and is probably here to stay for a couple of years.

11 February 2013 ~ Comments Off

Puppet Camp – Ghent 2013

It's been a while since I've attended a Puppet Camp but considering the quality of the last one (organised by Patrick Debois) and the fact it was being held in the lovely city of Ghent again I thought it'd be a wise investment to scrape together the time off.

The quality of the talks seemed quite high and considering the number of newer users present the content level was well pitched. A couple of deeper talks for the more experienced members would have been nice but we mostly made our own in the open sessions. Facter, writing MCollective plugins, off-line and bulk catalogue compilation and the murky corners of our production puppets all came under discussion - in some cases quite fruitfully.

The wireless was a point of annoyance and amusement (depending on the person and the time of day). We had 20 users for an audience of ten times that - the attitudes covered the gamut from "I only need to check my mail once a day" to "I have my own tethering" and all the way to "This is my brute force script I run in a loop". You can tell when most of us lost our access based on the twitter hash tag.

I was a little surprised at the number of Puppet Camps there will be this year - 27 was the number mentioned. I think a lot of the more experienced members of the community value the camps and confs as a chance to catch up with each other and the PuppetLabs people and I'd hate to see us sticking to our own local camps and losing the cross pollination of ideas, plans and pains.

You can also view the Puppet Camp slides for a number of the sessions.


10 February 2013 ~ Comments Off

Resilience and Reliability on AWS – book review

With a title like Resilience and Reliability on AWS I had quite high expectations for this slim book. Unfortunately, they were not met.

The first four chapters provide brief introductions to AWS and some of its more popular services. While these were fine, I'd point people looking for this level of information at the Amazon Webservice Advent 2012 instead. Following this are a handful of more cookbook-like chapters that each present a small amount of theory and advice about how to run a given application on AWS, interspersed with multiple pages of Python code. The chapters don't go into enough detail to bring much value to their subjects and the code detracts from the narrative without bringing much technical insight. I was particularly irked at the commented out sections - if you're going to publish a lot of code in a small book then at least be conscious that each line should bring something to the table.

It feels like this book should have been a series of blog posts rather than a printed book. Very disappointing and not recommended. Programming Amazon EC2 by the same authors is much better.


10 February 2013 ~ Comments Off

Install Chef 11 Server on CentOS 6

A few months ago, I posted briefly on how to install Chef 10 server on CentOS. This post revisits the process for Chef 11.

These steps were performed on a default CentOS 6.3 server install.

First, navigate to the Chef install page to get the package download URL. Use the form on the “Chef Server” tab to select the appropriate drop-down items for your system.

Install the package from the given URL.

rpm -Uvh https://opscode-omnitruck-release.s3.amazonaws.com/el/6/x86_64/chef-server-11.0.4-1.el6.x86_64.rpm

The package just puts the bits on disk (in /opt/chef-server). The next step is to configure the Chef Server and start it.

% chef-server-ctl reconfigure

This runs the embedded chef-solo with the included cookbooks, and sets up everything required - Erchef, RabbitMQ, PostgreSQL, etc.

Next, run the Opscode Pedant test suite. This will verify that everything is working.

% chef-server-ctl test

Copy the default admin user’s key and the validator key to the local workstation where you have Chef client installed, and create a new user for yourself with knife. You’ll need Chef client version 11.2.0. The key files on the Chef Server are readable only by root.

scp root@chef-server:/etc/chef-server/admin.pem .
scp root@chef-server:/etc/chef-server/chef-validator.pem .

Use knife configure -i to create an initial ~/.chef/knife.rb and new administrative API user for yourself. Use the FQDN of your newly installed Chef Server, with HTTPS. The validation key needs to be copied over from the Chef Server from /etc/chef-server/chef-validator.pem to ~/.chef to use it for automatically bootstrapping nodes with knife bootstrap.

% knife configure -i

The .chef/knife.rb file should look something like this:

log_level                :info
log_location             STDOUT
node_name                'jtimberman'
client_key               '/home/jtimberman/.chef/jtimberman.pem'
validation_client_name   'chef-validator'
validation_key           '/home/jtimberman/.chef/chef-validator.pem'
chef_server_url          'https://chef-server.example.com'
syntax_check_cache_path  '/home/jtimberman/.chef/syntax_check_cache'

Your Chef Server is now ready to use. Test connectivity as your user with knife:

% knife client list
chef-validator
chef-webui
% knife user list
admin
jtimberman

In previous versions of Open Source Chef Server, users were API clients. In Chef 11, users are separate entities on the Server.

The chef-server-ctl command is used on the Chef Server system for management. It has built-in help (-h) that will display the various sub-commands.

06 February 2013 ~ Comments Off

Chef and Net::SSH Dependency Broken

2nd UPDATE: CHEF-3835 was opened by a member of the community; Chef versions 11.2.0 and 10.20.0 have been released by Opscode to resolve the issue.

UPDATE: Opscode is working on getting a new release of the Chef gem with updated version constraints.

What Happened?

Earlier today (February 6, 2013), new versions of the various net-ssh RubyGems were published. This includes:

  • net-ssh 2.6.4
  • net-ssh-multi 1.1.1
  • net-ssh-gateway 1.1.1

Chef’s dependencies have a pessimistic version constraint (~>) on net-ssh 2.2.2.

What’s the Problem?

So what is the problem?

It appears to lie with net-ssh-gateway. The version of net-ssh-gateway went from 1.1.0 (released in April 2011), to 1.1.1. It depends on net-ssh. In net-ssh-gateway 1.1.0, the net-ssh version constraint was >= 1.99.1, which is fine with Chef’s constraint against ~> 2.2.2. However, in net-ssh-gateway 1.1.1, the net-ssh version constraint was changed to >= 2.6.4, which is obviously a conflict with Chef’s constraint.
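The conflict can be reproduced with the version classes that ship with RubyGems. This is just an illustrative sketch: the constraints are the ones quoted above, while the compatible? helper and the list of candidate version numbers are made up for the example (2.2.9 and 2.3.0 are only there to show where ~> 2.2.2 stops matching).

```ruby
require 'rubygems'

chef_constraint = Gem::Requirement.new('~> 2.2.2')   # Chef's constraint on net-ssh
gateway_old     = Gem::Requirement.new('>= 1.99.1')  # net-ssh-gateway 1.1.0
gateway_new     = Gem::Requirement.new('>= 2.6.4')   # net-ssh-gateway 1.1.1

# Hypothetical helper: true if the version satisfies every requirement given.
def compatible?(version, *requirements)
  v = Gem::Version.new(version)
  requirements.all? { |r| r.satisfied_by?(v) }
end

# Illustrative candidate versions, not a list of actual net-ssh releases.
%w[2.2.2 2.2.9 2.3.0 2.6.4].each do |version|
  before = compatible?(version, chef_constraint, gateway_old)
  after  = compatible?(version, chef_constraint, gateway_new)
  puts "net-ssh #{version}: with gateway 1.1.0 #{before}, with gateway 1.1.1 #{after}"
end
```

Since ~> 2.2.2 means ">= 2.2.2 and < 2.3.0", no version can satisfy it together with >= 2.6.4, so the second column is false for every candidate.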

What’s the Solution?

So, how can we fix it?

One solution is to use the Opscode Omnibus Package for Chef. This isn’t a solution for everyone, of course, but it does include and contain all the dependencies. It also doesn’t help if one wishes to install another gem that depends on Chef under the “Omnibus” Ruby environment along with Chef, because the conflict will be found. For example, installing the minitest-chef-handler gem for running minitest-chef tests fails:

vagrant@ubuntu-12-04:~$ /opt/chef/embedded/bin/gem install minitest-chef-handler
ERROR: While executing gem ... (Gem::DependencyError)
    Unable to resolve dependencies: net-ssh-gateway requires net-ssh (>= 2.6.4)

Another solution is to relax / modify the constraint in Chef. This may be okay, but as of right now we don’t know if this will affect anything in the way that Chef uses net-ssh. We have tickets related to net-ssh version constraints in Chef:

  • http://tickets.opscode.com/browse/CHEF-2977
  • http://tickets.opscode.com/browse/CHEF-3156

05 February 2013 ~ Comments Off

check_graphite

During my Puppet Camp Gent talk last week, I explained how to get alerts based on trends from Graphite. A number of people asked me how to do that.

First, let's quickly explain why you might want to do that.
Sometimes you don't care about the current value of a metric. As an example, take a queueing system: there is no problem if messages are added to the queue, not even if there are a lot of messages on the queue, but there might be a problem if the number of messages on the queue stays too high over a certain period.

In this example I'm monitoring the queue length of a HornetQ setup, which is exposed by JMX.
On the server running HornetQ I have an exported resource that tells the JMXTrans server to send the MessageCount to Graphite
(you could also do this using collectd plugins).

  @@jmxtrans::graphite {"MessageCountMonitor-${::fqdn}":
    jmxhost      => hiera('hornetqserver'),
    jmxport      => "5446",
    objtype      => 'org.hornetq:type=Queue,*',
    attributes   => '"MessageCount","MessagesAdded","ConsrCount"',
    resultalias  => "hornetq",
    typenames    => "name",
    graphitehost => hiera('graphite'),
    graphiteport => "2003",
  }

This gives me a computable URL from which I can get the Graphite view.

The next step then is to configure a Nagios check that verifies this data. For that I need to use the check_graphite plugin from Datacratic, which can work with an NRPE config like:

  ### File managed with puppet ###
  ### Served by: '<%= scope.lookupvar('::servername') %>'
  ### Module: '<%= scope.to_hash['module_name'] %>'
  ### Template source: '<%= template_source %>'

  command[check_hornetq]=/usr/lib64/nagios/plugins/check_graphite -u "http://<%= graphitehost%>/render?target=servers.<%= scope.lookupvar('::fqdn').gsub(/\./,'_')%>_5446.hornetq.docstore_private_trigger_notification.MessageCount&from=-30minutes&rawData=true" -w 2000 -c 20000

I define this check on the host where HornetQ is running as it then will map to that host on Icinga/Nagios rather than throw a host error on an unrelated host.
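To make the moving parts a bit more concrete, here is a rough Ruby sketch of what such a check has to do with Graphite's rawData output, which looks like "target,start,end,step|v1,v2,...". It is not the Datacratic plugin's code: the helper names, the sample host and the sample values are made up, and I'm assuming the decision is made on the average of the datapoints in the window, which may not be exactly how check_graphite evaluates them.

```ruby
# Same substitution as in the template: dots in the FQDN become underscores.
def metric_host(fqdn)
  fqdn.gsub(/\./, '_')
end

# Parse one rawData line into the target name and the numeric values.
# Missing datapoints are reported by Graphite as "None" and are skipped.
def parse_raw(line)
  header, values = line.strip.split('|', 2)
  *target_parts, _start, _finish, _step = header.split(',')
  numbers = values.to_s.split(',').reject { |v| v == 'None' }.map(&:to_f)
  [target_parts.join(','), numbers]
end

# Assumption: compare the average over the window against the thresholds.
def check_status(values, warn, crit)
  return 'UNKNOWN' if values.empty?
  average = values.sum / values.size
  return 'CRITICAL' if average > crit
  return 'WARNING' if average > warn
  'OK'
end

# Made-up host and response line for illustration.
host = metric_host('hq01.example.com')
raw = "servers.#{host}_5446.hornetq.docstore_private_trigger_notification.MessageCount," \
      '1360000000,1360001800,600|1500.0,None,2500.0,3500.0'

target, values = parse_raw(raw)
puts "#{check_status(values, 2000, 20_000)} - #{target}"
```

Averaging over the 30 minute window is what turns this into a trend check: a short spike in MessageCount is smoothed out, while a queue that stays high keeps the average above the threshold.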

03 February 2013 ~ Comments Off

Managing Puppet Using MCollective

I recently gave a talk titled “Managing Puppet Using MCollective” at the Puppet Camp in Ghent.

The talk introduces a complete rewrite of the MCollective plugin used to manage Puppet. The plugin can be found on our GitHub repo as usual. Significantly, this is one of a new breed of plugins that we ship as native OS packages and practice continuous delivery on.

The packages can be found on apt.puppetlabs.com and yum.puppetlabs.com and are simply called mcollective-puppet-agent and mcollective-puppet-client.

This set of plugins showcases a bunch of recent MCollective features including:

  • Data Plugins
  • Aggregation Functions
  • Custom Validators
  • Configurable enabling and disabling of the Agent
  • Direct Addressing and pluggable discovery to significantly improve the efficiency of the runall method
  • Utility classes shared amongst different types of plugin
  • Extensive testing using rspec and our mcollective specific rspec plugins

It’s a bit of a beast, coming in at a couple of thousand lines, but this was mostly because we had to invent a rather sizeable wrapper to expose a nice API around Puppet 2.7 and 3.x for things like running Puppet and obtaining its status.

The slides from the talk can be seen below. Hopefully a video will be up soon; otherwise I’ll turn it into a screencast.