Aggregating Nagios Checks With MCollective

A very typical scenario I come across on many sites is the requirement to monitor something like Puppet across 100s or 1000s of machines.

The typical approaches are to add a central check on your Puppet master, or to check every node using NRPE or NSCA. For this example it is easy to check on the master and get a single check, but that isn't always achievable.

Think, for example, about monitoring mail queues on all your machines to make sure things like root's mail aren't getting stuck. In those cases you are forced to do per-node checks, which inevitably result in huge notification storms if your mail server is down and not accepting mail from the many nodes.

MCollective has had a plugin that can run NRPE commands for a long time; I've now added a Nagios plugin that uses this agent to combine the results from many hosts.

Sticking with the Puppet example, here are my needs:

  • I want to know if any Puppet machine, anywhere, isn't completing runs successfully.
  • I want to be able to run puppetd --disable and not get alerts for those machines.
  • I do not want to change any configuration when I add new machines; it should just work.
  • I want the ability to monitor subsets of machines from different probes.

This is a pretty painful set of requirements for Nagios to achieve on its own, but it is easy with the help of MCollective.

Ultimately, I just want this:

OK: 42 WARNING: 0 CRITICAL: 0 UNKNOWN: 0

This means 42 machines, only those currently enabled, are all running happily.

The NRPE Check

We put the NRPE logic on every node. A simple check command in /etc/nagios/nrpe.d/check_puppet_run.cfg:

command[check_puppet_run]=/usr/lib/nagios/plugins/check_file_age -f /var/lib/puppet/state/state.yaml -w 5400 -c 7200

In my case I just want to know that successful runs are happening. If I wanted to know that the code is actually compiling correctly, I'd monitor the age and size of the local cache.
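To make the thresholds concrete: 5400 seconds is 90 minutes and 7200 seconds is two hours, so assuming Puppet's default 30-minute run interval, the warning fires after roughly three missed runs. The Ruby sketch below mirrors the decision the NRPE check makes; it is not the check_file_age source, and treating a missing file as CRITICAL is an assumption of the sketch.

```ruby
# Sketch of the file-age decision behind check_puppet_run.
# Returns [nagios_exit_code, message]; 0 = OK, 1 = WARNING, 2 = CRITICAL.
# Defaults match the -w 5400 -c 7200 thresholds used above.
def file_age_status(path, warn_s = 5400, crit_s = 7200, now = Time.now)
  # ASSUMPTION: a missing state file is treated as CRITICAL.
  return [2, "CRITICAL: #{path} not found"] unless File.exist?(path)

  # Puppet rewrites state.yaml on every run, so its mtime marks the last run.
  age = (now - File.mtime(path)).to_i
  if age > crit_s
    [2, "CRITICAL: #{path} is #{age} seconds old"]
  elsif age > warn_s
    [1, "WARNING: #{path} is #{age} seconds old"]
  else
    [0, "OK: #{path} is #{age} seconds old"]
  end
end
```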

Determining if Puppet is enabled or not

Currently this is a bit hacky; I've filed tickets with Puppet Labs to improve it. To determine whether Puppet is disabled, check whether the lock file exists and is 0 bytes. If it's not zero bytes, either a puppetd is currently doing a run (there will be a pid in it) or a puppetd crashed and left a stale pid that prevents other runs.

To automate this and integrate it into MCollective I've made a fact called puppet_enabled. We'll use it in MCollective discovery to monitor only machines that are enabled. Get it onto all your nodes, perhaps using Plugins in Modules.
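The fact itself isn't reproduced here, so the following is only a minimal sketch of the lock-file rules above written as a Facter fact. The lock file path and the method name puppet_enabled_from_lock are assumptions; check the puppetdlockfile setting on your own nodes.

```ruby
# ASSUMPTION: puppetd's lock file lives here; adjust for your installation.
PUPPETD_LOCK = "/var/lib/puppet/state/puppetdlock"

# Returns "1" if Puppet is enabled and "0" if it was disabled with
# `puppetd --disable` (lock file present and empty).
def puppet_enabled_from_lock(lockfile = PUPPETD_LOCK)
  return "1" unless File.exist?(lockfile)
  return "0" if File.zero?(lockfile)
  # A non-empty lock holds a pid: a run in progress or a stale pid from a
  # crashed puppetd. Reporting "1" keeps such nodes in the monitored set,
  # so a stuck node still shows up as a failing check.
  "1"
end

# Register the fact only when this file is loaded by Facter.
if defined?(Facter)
  Facter.add(:puppet_enabled) do
    setcode { puppet_enabled_from_lock }
  end
end
```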

The MCollective Agent

Deploy the MCollective NRPE agent to all your nodes. Once you've got it right, you can test it easily using something like this:

% mc-nrpe -W puppet_enabled=1 check_puppet_run
 
 * [ ============================================================> ] 47 / 47
 
Finished processing 47 / 47 hosts in 395.51 ms
              OK: 47
         WARNING: 0
        CRITICAL: 0
         UNKNOWN: 0

Note we’re restricting the run to only enabled hosts.

Integrating into Nagios

The last step is to add this to Nagios. I create SSL certs and a specific client configuration for Nagios and put these in its home directory.

The check-mc-nrpe plugin works best with Nagios 3, as it returns additional lines of output showing which machines are in which state, so alerts include the details hidden behind the aggregation. It also outputs performance data for the total node count, each status, and how long the check took.
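The aggregation itself is simple to reason about. Here is a hypothetical Ruby sketch that folds per-host results into one Nagios result: a summary line, perfdata after the `|`, and one extra line per problem host. The input shape, the perfdata labels and the severity order (CRITICAL over WARNING over UNKNOWN) are assumptions, not what check-mc-nrpe actually emits.

```ruby
# Hypothetical aggregation sketch. Input: { "hostname" => nagios_exit_code }.
STATUS_NAMES = { 0 => "OK", 1 => "WARNING", 2 => "CRITICAL", 3 => "UNKNOWN" }

# Returns [exit_code, plugin_output].
def aggregate(results, elapsed_s)
  counts = Hash.new(0)
  results.each_value { |code| counts[code] += 1 }

  # Worst status wins: CRITICAL, then WARNING, then UNKNOWN, else OK.
  exitcode = [2, 1, 3].find { |code| counts[code] > 0 } || 0

  summary = STATUS_NAMES.map { |code, name| "#{name}: #{counts[code]}" }.join(" ")
  perf = STATUS_NAMES.map { |code, name| "#{name.downcase}=#{counts[code]}" }
  perfdata = (["total=#{results.size}"] + perf).join(" ") + format(" checktime=%.2fs", elapsed_s)

  # Extra lines become Nagios 3 long output: name every non-OK host.
  problems = results.reject { |_, code| code == 0 }
  details = problems.map { |host, code| "#{STATUS_NAMES[code]}: #{host}" }

  [exitcode, (["#{summary}|#{perfdata}"] + details).join("\n")]
end
```

For example, `aggregate({ "web1" => 0, "db1" => 2 }, 0.4)` returns exit code 2 with `CRITICAL: db1` on the second line, so the alert names the failing host.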

The nagios command would be something like this:

define command{
        command_name                    check_mc_nrpe
        command_line                    /usr/sbin/check-mc-nrpe  --config /var/log/nagios/.mcollective/client.cfg  -W $ARG1$ $ARG2$
}

And finally we need to make a service:

define service{
        host_name                       monitor1
        service_description             mc_puppet-run
        use                             generic-service
        check_command                   check_mc_nrpe!puppet_enabled=1!check_puppet_run
        notification_period             awakehours
        contact_groups                  sysadmin
}

Here are a few other command examples I use:

All machines with my Puppet class “pki”, check the age of certs:

check_command   check_mc_nrpe!pki!check_pki

All machines with my Puppet class “bacula::node”, make sure the FD is running:

check_command   check_mc_nrpe!bacula::node!check_fd

…and that they were backed up:

check_command   check_mc_nrpe!bacula::node!check_bacula_main

Using this I removed hundreds of checks from my monitoring platform, saving resources and making sure I can do my critical monitoring tasks better.

Depending on the quality of your monitoring system you might even get a graph showing the details hidden behind the aggregation:

The above is a graph showing a series of servers where the backup ran later than usual. I received only 2 alerts; I would have had more than 30 before aggregation.

Restrictions for Probes

The last remaining requirement was to be able to run checks from different probes and restrict what each one checks. My collective is one big one spread all over the world, which means discovery is sometimes a bit slow.

So I have many Nagios servers doing local checks. Using MCollective discovery I can now easily restrict checks. For example, if I only wanted to check machines in the USA and I had a fact called country, I would only have to change the command line in the service declaration:

check_command   check_mc_nrpe!puppet_enabled=1 country=us!check_puppet_run

MCollective discovery will then restrict monitoring to machines in the US.
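A rough model of how such a combined filter string narrows the node set, assuming that tokens containing `=` are fact filters (with values wrapped in `/.../` treated as regular expressions, as in the `country=/uk|us/` mc-pgrep example further down), that other tokens are class names, and that every filter must match. This is an illustration, not MCollective's discovery code.

```ruby
# Split "puppet_enabled=1 country=us pki" into fact filters and class names.
def parse_filter(str)
  facts = {}
  classes = []
  str.split.each do |token|
    if token.include?("=")
      name, value = token.split("=", 2)
      facts[name] = value
    else
      classes << token
    end
  end
  [facts, classes]
end

# Exact comparison, or a regex match when the wanted value looks like /.../.
def value_matches?(actual, wanted)
  return false if actual.nil?
  regex = wanted.match(%r{\A/(.*)/\z})
  if regex
    !!(actual.to_s =~ Regexp.new(regex[1]))
  else
    actual.to_s == wanted
  end
end

# A node is selected only if all fact filters and all classes match.
def node_matches?(filter, node_facts, node_classes)
  facts, classes = parse_filter(filter)
  facts.all? { |name, value| value_matches?(node_facts[name], value) } &&
    classes.all? { |klass| node_classes.include?(klass) }
end
```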

What to monitor this way

As this style of monitoring relies on discovery, you need to think carefully about what you monitor this way. It's entirely conceivable that a node under high CPU load won't respond to discovery in time, and so won't get monitored!

For example, you would not want to monitor things like load averages or really critical services this way. But we all have a lot of peripheral things, like zombie process counts, where aggregation makes a lot of sense; in those cases, by all means consider this approach.

Tutorial: Writing MCollective Agents

I’ve recorded a screencast that walks you through developing a SimpleRPC agent, giving it a DDL, and writing a simple client to communicate with it.

The tutorial creates a small echo agent that takes input and returns it unmodified. It validates that you are sending a string and includes a sample of dealing with intermittent failure.

Once you’ve watched this, or even while watching, you can use the following links as reference material: Writing Agents, Data Definition Language and Writing Clients.

You can view it directly on blip.tv, where the quality will hopefully be better.

I used a few Vim snippets during the demo to generate the boilerplate for the agent and DDL. You’ll find these in the ext/vim directory of the tarball for the upcoming 0.4.7 release; they are already on GitHub too.

Recent MCollective releases and roadmap

I’ve had two successive Marionette Collective releases recently. I was hoping to have one big one, but I was waiting for the Stomp maintainers to do a release and it was taking a while.

Both are major feature releases covering major feature sets; see below for a breakdown.

We’re nearing feature completeness for the SimpleRPC layer, as I am adding a number of features of interest to enterprise and large users, especially around security and web UIs.

Once we’re at the end of this cycle I’ll do a 1.0.0 release and then from there move onto the next major feature cycle. The next cycle will focus on queuing long running tasks, background scheduling, future scheduling of tasks and a lot of related work. I posted some detail about these plans to the list recently.

Over the next few days or weeks I’ll do a number of screencasts exploring some of these new features in depth. For now, here is the list of what’s new:

Security

Connectivity

We can now use the Stomp Ruby gem version 1.1.6, which brings a lot of enhancements:

  • Connection pools for failover between multiple ActiveMQ servers
  • Lots of connection pool tunables, such as retry frequencies
  • SSL/TLS between nodes and ActiveMQ
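To give a feel for how the retry tunables interact, here is a rough Ruby model of round-robin failover with exponential back-off. The option names are modelled on the Stomp gem's connection-hash options, but the schedule is an illustration rather than the gem's exact algorithm, and the default values are placeholders.

```ruby
# Rough model of pooled failover with exponential back-off.
# Returns [[host, seconds_to_wait_before_this_attempt], ...].
def reconnect_schedule(hosts, attempts, opts = {})
  delay = opts.fetch(:initial_reconnect_delay, 0.01)
  max_delay = opts.fetch(:max_reconnect_delay, 30.0)
  multiplier = opts.fetch(:back_off_multiplier, 2)
  backoff = opts.fetch(:use_exponential_back_off, true)

  (0...attempts).map do |i|
    host = hosts[i % hosts.size]  # round-robin through the pool
    step = [host, delay]
    # Grow the wait after each failure, but never beyond max_delay.
    delay = [delay * multiplier, max_delay].min if backoff
    step
  end
end
```

With two brokers, an initial delay of 1 second and a cap of 3 seconds, the model alternates mq1, mq2, mq1, mq2 while waiting 1, 2, 3 and 3 seconds.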

Writing Web and Dynamic UIs

  • A DDL that describes agents, inputs and outputs:
    • Creates auto-generated documentation
    • Can be used to auto-generate user interfaces
    • The client library will only make requests that validate against the DDL
    • In future, input validation will move into the DDL and will be done automatically for you
  • Web UIs can bypass discovery or do their own, and use the DDL to auto-generate user interfaces
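As an illustration of validating requests against a description, here is a hypothetical, heavily simplified DDL expressed as plain Ruby data, plus a validator a client could run before sending anything. The attribute names (:type, :validation, :maxlength, :optional) are modelled on DDL input attributes, but this is neither the MCollective DDL syntax nor its validation code.

```ruby
# Hypothetical stand-in for a DDL: each action maps input names to specs.
ECHO_DDL = {
  "echo" => {
    :msg => { :type => :string, :validation => /\A.+\z/, :maxlength => 100, :optional => false }
  }
}

# Raises ArgumentError unless args satisfy the action's input specs.
def validate_request(ddl, action, args)
  inputs = ddl.fetch(action) { raise ArgumentError, "unknown action #{action}" }

  inputs.each do |key, spec|
    value = args[key]
    if value.nil?
      next if spec[:optional]
      raise ArgumentError, "#{action}: missing required input #{key}"
    end
    next unless spec[:type] == :string

    raise ArgumentError, "#{key} must be a String" unless value.is_a?(String)
    if spec[:maxlength] && value.length > spec[:maxlength]
      raise ArgumentError, "#{key} is longer than #{spec[:maxlength]} characters"
    end
    if spec[:validation] && value !~ spec[:validation]
      raise ArgumentError, "#{key} does not match #{spec[:validation].inspect}"
    end
  end
  true
end
```

The same data could drive a UI: each input spec carries enough information to render and check a form field.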

Usability

  • Fire-and-forget requests, for when you just want something done and do not care about the results. These requests are very quick as they do not do any discovery.
  • Agents can now be reloaded without restarting the daemon
  • A new mc-inventory tool that can be used to view facts, agents and classes for a node
  • Many UI enhancements to the CLI tools

MCollective pgrep

The Unix pgrep utility is great: it lets you grep through your process list and find interesting things. I wanted to do something similar for my entire server group, so I built something quick on top of MCollective.

I am using the Ruby sys-proctable gem to do the hard work. It returns a massive amount of information about each process, and I have written a simple agent on top of it.

The agent supports grepping the process tree, but it also supports kill and pgrep+kill, though I have not yet implemented more than the basic grep on the command line. Frankly, the grep+kill combination scares me and I might remove it. One simple grep slip-up and you will kill all processes on all your machines :) Sometimes too much power is too much and should just be avoided.

At the moment mc-pgrep outputs a fixed format, but I intend to make that configurable on the command line. Here’s a sample:

% mc-pgrep -C /dev_server/ ruby
 
 * [ ============================================================> ] 4 / 4
 
dev1.my.com
       root   9833  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
       root  21608  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
 
dev2.my.com
       root  14568  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  31595  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev3.my.com
       root   1620  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  14093  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev4.my.com
       root   3231  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  20557  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
   ---- process list stats ----
        Matched hosts: 4
    Matched processes: 8
        Resident Size: 37.264KB
         Virtual Size: 629.578MB

You can also limit it to only find zombies with the -z option.
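The matching itself is straightforward. The following sketch assumes each process has already been turned into a Hash with :pid, :user, :cmdline and :state keys; on a real node those would come from Sys::ProcTable.ps, whose struct fields vary by platform. Treating state "Z" as a zombie reflects Linux and is an assumption about how the -z option works.

```ruby
# Sketch of per-node pgrep filtering over simplified process records.
def pgrep(processes, pattern, zombies_only: false)
  # The pattern is a regex, so "." matches every process.
  regex = Regexp.new(pattern)
  processes.select do |p|
    # On Linux a zombie process has state "Z".
    next false if zombies_only && p[:state] != "Z"
    p[:cmdline] =~ regex
  end
end
```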

This has been quite interesting for me. If I set the pgrep pattern to “.” (the pattern is a regex), every machine sends back a Sys::ProcTable hash for all of its processes, a 50 to 70 KB payload per server. So far I’ve seen no problem getting this much traffic through ActiveMQ + MCollective and processing it all in a very short time:

% time mc-pgrep -F "country=/uk|us/" .
 
   ---- process list stats ----
        Matched hosts: 20
    Matched processes: 1958
        Resident Size: 1.777MB
         Virtual Size: 60.072GB
 
mc-pgrep -F "country=/uk|us/" .  0.19s user 0.06s system 7% cpu 3.420 total

Those 3.4 seconds include a 2 second discovery overhead, with the client machine in Germany and the filter matching UK and US machines all the way to the West Coast. My biggest delay here is the network, not MCollective or ActiveMQ.

The code can be found on my GitHub account. It is still a bit of a work in progress; wiki pages will follow once I am happy with it.

As an aside, I am slowly migrating at least my code to GitHub, if not the wiki and ticketing. So far my plugins have moved, and MCollective will move soon too.

Monitoring, another way

eth0

For a while, some friends had been telling me about collectd: why I should look at it, why munin is so painful, and so on. If you’ve been reading my posts, you know I have tweaked my $WORK munin install a little to make it faster and lighter. I finally took the time to explore collectd, and I regret not having done so sooner. It has so many advantages that I decided to deploy it in parallel with munin (because I can’t afford to be blind on metrics). collectd comes without a UI: it “only” collects data, but that’s not a problem. There are various web interfaces, and after looking at a bunch of them I fell in love with Lindsay Holmwood’s Visage.

This piece of software is definitely cool: all graphs are rendered live in your browser as SVG. Yes! Real-time graphs with zooming, and no need for crappy Flash s***. It is based on Sinatra, Haml and some JS libraries (I won’t talk about those; my JS-fu is deeper than the Mariana Trench). But it lacked some features: it’s fine when you have a few hosts, but when the host list starts getting loooong, the interface needs some improvements. So I forked it on GitHub and implemented (some parts of) what I needed. My GitHub fork has host grouping and per-host profiles. Check it out and enjoy Visage!

Now working on sets of graphs :)

PS : <3 Guigui2

Puppet Concat 20100507

I’ve had quite a lot of contributions to my Puppet Concat module and after some testing by various people I’m ready to do a new release.

Thanks to Paul Elliot, Chad Netzer and David Schmitt for patches and assistance.

For background on what this is about, please see my earlier post: Building files from fragments with Puppet

You can download the release here. Please pay special attention to the upgrade instructions below.

Changes in this release

  • Several robustness improvements to the helper shell script.
  • Removed all hard-coded paths in the helper script to improve portability.
  • We now use file{} to copy the combined file to its location. This means you can change the ownership of a file just by changing the owner/group in concat{}.
  • You can specify ensure => "/some/other/file" in concat::fragment to include the contents of another file in the fragment, even a file not managed by Puppet.
  • The code is now hosted on GitHub and we’ll accept patches there.
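Conceptually, the module builds the final file by concatenating fragments in order. This Ruby sketch models that, including a fragment whose contents come from another file, as with ensure => "/some/other/file". The Hash keys and the order-then-name sort are assumptions; the module does the real work in its helper shell script.

```ruby
# Sketch of assembling one file from ordered fragments.
# Each fragment: { :order, :name, and either :content or :source }.
def assemble(fragments)
  sorted = fragments.sort_by { |f| [f[:order], f[:name]] }
  parts = sorted.map do |f|
    # A fragment pointing at another file contributes that file's contents.
    f[:source] ? File.read(f[:source]) : f[:content]
  end
  parts.join
end
```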

Upgrading

When upgrading to this version you need to take particular care. All the fragments are now owned by root, the shell script runs as root and we use file{} to copy the resulting file out.

This means you’ll see the diff of not just the fragments but also the final file when running puppetd --test. Unfortunately, it also means that the first time you run Puppet with the new code, it will fire off all the notifies you have on your concat{} resources. You’ll also see a lot of changes to resources in the fragments directory on the first run. This is normal and expected behavior.

So if, say, you’re using concat to create my.cnf and notify the service to restart automatically, simply upgrading this module will result in MySQL restarting. This is a one-off notify that happens only the first time; from then on it will behave as normal. So when upgrading, I’d suggest disabling those notifies until the upgrade is running everywhere, and then putting them back.

MCollective & Xen: naughty things

eth0

I already blogged about my experiments with mcollective & Xen, but I had something a little bigger in mind. A friend had sent me a video showing some neat VMware features (mainly DRS), with VMs migrating between hypervisors automatically.

So I wrote a “proof of concept” of what you can do with an awesome tool like mcollective. The setup of this fun game is the following:

  • 1 box used as an iSCSI target that serves volumes to the world
  • 2 Xen hypervisors (lenny packages) using the open-iscsi initiator to connect to the target. VMs are stored in LVM, nothing fancy

The 3 boxes are connected on a 100Mb network, and the hypervisors have an additional gigabit network card with a crossover cable linking them (yes, this is a lab setup). You can find a live migration howto here.

For the mcollective part I used my Xen agent (slightly modified from the previous post to support migration), which is based on my xen gem. The client is the largest part of the work, but it’s still fewer than 200 lines of code. It can (and will) be improved, because all the configuration is hardcoded. It would also benefit from a little DSL to handle more “logic” than “if load is greater than foo”, but as I said before, it’s a proof of concept.

Let’s see it in action:

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    873.5
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  78838.0
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.3
test3                                       20   256     1     r-----     11.9

test3 is an “artificially” loaded VM, as is the machine “hypervisor3” (to trigger the migration).

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Let’s see our hypervisors:

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    878.9
test3                                       25   256     1     -b----      1.1
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  79079.3
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.4

A few words about the configuration options:

  • interval: the polling interval in seconds. This should not be too low: give the machine some time and avoid letting load peaks distort the logic.
  • load_threshold: the load above which you consider the machine too busy and it is time to move some stuff away (tempered by max_over, see below)
  • daemonize: not used yet
  • max_over: the maximum time (in minutes) the load may stay above the threshold. When it is reached, it really is time to migrate. Don’t set it too low; use at least 2*interval, or sampling will not be effective
  • debug: well…
  • max_vm_per_host: the maximum number of VMs a host can handle. A host that has already hit this limit will not be a candidate for receiving a VM
  • max_load_candidate: the same thing as above, but for the load
  • host_mapping: a simple CSV file to handle non-DNS destinations (typically, my crossover cable addresses have no DNS entries)
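Putting these options together, the balancing decision can be modelled roughly as follows. This is a reconstruction from the option descriptions and the log above, not the mc-xen-balancer source; the input shapes, the BalancerConfig struct and the choice of the least-loaded candidate are assumptions.

```ruby
# Hypothetical model of the balancer's decision.
# hosts: { "name" => { load: Float, vms: { "vm" => cpu_seconds_this_interval } } }
BalancerConfig = Struct.new(:interval, :load_threshold, :max_over,
                            :max_vm_per_host, :max_load_candidate)

# over_counts persists between calls: consecutive polls above load_threshold.
# Returns [vm, from_host, to_host] when a migration should happen, else nil.
def plan_migration(hosts, over_counts, cfg)
  hosts.each do |name, h|
    over_counts[name] = h[:load] > cfg.load_threshold ? over_counts.fetch(name, 0) + 1 : 0
  end

  hosts.each do |name, h|
    # max_over is in minutes, interval in seconds.
    next if over_counts[name] * cfg.interval < cfg.max_over * 60
    next if h[:vms].empty?

    # Move the guest that ate the most CPU time during the last interval.
    vm = h[:vms].max_by { |_, cpu| cpu }.first

    # Step 1 (max VMs) and step 2 (max load) filter the candidate hosts.
    others = hosts.reject { |other, _| other == name }
    candidates = others.select do |_, o|
      o[:vms].size < cfg.max_vm_per_host && o[:load] < cfg.max_load_candidate
    end
    next if candidates.empty?

    target = candidates.min_by { |_, o| o[:load] }.first
    over_counts[name] = 0  # start counting again after a migration
    return [vm, name, target]
  end
  nil
end
```

Called once per interval with fresh measurements, nothing happens until a host has stayed over the threshold for max_over minutes, which matches the shape of the log above.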

What is left to do:

  • Add some barriers against migration madness: let the load go down after a migration, and avoid migrating the same VM back and forth endlessly
  • Add a DSL to insert some more logic
  • Write a real client, not a big fat loop

Enjoy the tool !

Files:

Authorization plugins for MCollective SimpleRPC

Until now, The Marionette Collective has relied on your middleware to provide all authorization and authentication for requests. You’re able to restrict certain middleware users from certain agents, but nothing more fine-grained.

In many cases you want much finer-grained control over who can do what. Some examples:

  • A certain user can only request service restarts on machines with a fact customer=acme
  • A user can do any service restart but only on machines that have a certain configuration management class
  • You want to prevent all users except root from stopping services, while others can still start and restart them

This kind of thing is required for large infrastructures with lots of admins, each working on their own group of machines, while perhaps a central NOC needs to be able to work on all of them. You need fine-grained control over who can do what, and we did not have this until now. It would also be needed if you wanted to give clients control over their own servers but not others.

Version 0.4.5 will support this kind of scheme for SimpleRPC agents. We won’t provide an authorization plugin out of the box with the core distribution, but I’ve made one that will be available as a plugin.

So how would you write an auth plugin? First, a typical agent would look like this:

module MCollective
    module Agent
         class Service<RPC::Agent
             authorized_by :action_policy
 
             # ....
         end
    end
end

The new authorized_by keyword tells MCollective to use the class MCollective::Util::ActionPolicy to do any authorization on this agent.

The ActionPolicy class can be pretty simple: if it raises any kind of exception, the action will be denied.

module MCollective
    module Util
         class ActionPolicy
              def self.authorize(request)
                  unless request.caller == "uid=500"
                      raise("You are not allowed access to #{request.agent}::#{request.action}")
                  end
              end
         end
    end
end

This simple check will deny all requests from anyone but Unix user id 500.

It’s pretty simple to come up with your own schemes. I wrote one that lets you write policy files like the one below for the service agent:

policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver

This will allow user 500 to do everything with the service agent. User 502 can get the status of any service on any node. User 600 will be able to do any action on machines with the fact customer=acme that also have the configuration management class acme::devserver. Everything else will be denied.

You can specify multiple facts and multiple classes as a simple space-separated list. The entire plugin implementing such policy controls was only 120 heavily commented lines of code.
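For illustration, here is a minimal sketch of evaluating such a policy file. It assumes the five columns are tab-separated (so the fact and class columns can hold space-separated lists), that the first matching rule wins, and that fact filters compare exact values; it is not the plugin described above.

```ruby
PolicyRule = Struct.new(:policy, :caller_id, :actions, :facts, :classes)

# Returns [default_policy, rules].
def parse_policy(text)
  default = "deny"
  rules = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    if line =~ /\Apolicy\s+default\s+(allow|deny)\z/
      default = $1
      next
    end
    fields = line.split("\t").map(&:strip).reject(&:empty?)
    raise ArgumentError, "bad policy line: #{line}" unless fields.size == 5
    rules << PolicyRule.new(*fields)
  end
  [default, rules]
end

def rule_matches?(rule, caller_id, action, facts, classes)
  return false unless rule.caller_id == "*" || rule.caller_id == caller_id
  return false unless rule.actions == "*" || rule.actions.split.include?(action)
  unless rule.facts == "*"
    rule.facts.split.each do |pair|
      name, value = pair.split("=", 2)
      return false unless facts[name] == value
    end
  end
  rule.classes == "*" || rule.classes.split.all? { |c| classes.include?(c) }
end

# True when the request is allowed; the first matching rule wins.
def allowed?(policy_text, caller_id, action, facts, classes)
  default, rules = parse_policy(policy_text)
  rule = rules.find { |r| rule_matches?(r, caller_id, action, facts, classes) }
  (rule ? rule.policy : default) == "allow"
end
```

An ActionPolicy.authorize method could then simply raise whenever allowed? returns false.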

I think this is an elegant and easy-to-use layer that provides a lot of functionality. In future we might pass more information about the caller to the nodes. There are some limitations, specifically that the caller information is essentially user-provided, so you need to keep that in mind.

As mentioned this will be in MCollective 0.4.5.

Meet the marionette

eth0

Another cool project I’ve been keeping an eye on for some weeks is “The Marionette Collective”, aka mcollective. The project is led and developed by R.I. Pienaar, who is also one of the most active people in the Puppet world.

MCollective is a framework for distributed system administration. It relies on a messaging framework and has many strengths: it is flexible, fast, and easy to understand.

Some time ago, I wrote a tool called “whosyourdaddy” to help me (and my goldfish-sized memory) find which Xen dom0 a given Xen domU was living on. It worked fine, except that it was not dynamic: if a VM was migrated from one dom0 to another, I had to update the CMDB. That is not really reliable (if an update fails, the CMDB is no longer accurate), and I didn’t want to embed this constraint in the Xen logic. So I decided to try writing my own mcollective agent, and here it is! It is built on top of a (very) small Ruby module for Xen and has its own client.

You can find which dom0 a domU resides on:

master1:~# ./mc-xen -a find --domu test
hypervisor2              : Absent
hypervisor1              : Absent
master1:~# ./mc-xen -a find --domu domu2
hypervisor2              : Present
hypervisor1              : Absent

Or list your domUs:

master1:~# ./mc-xen -a list
hypervisor2              
 domu2

hypervisor1              
 no domU running
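Under the hood, each hypervisor only needs to know which guests it is running. As a sketch, assuming the guest names are read from `xm list` output like the listings shown elsewhere on this page (the agent's xen gem may gather them differently), the find and list logic reduces to:

```ruby
# Guest names from `xm list` output, excluding dom0 itself.
def domus_from_xm_list(output)
  # The first line is the column header.
  rows = output.lines.drop(1)
  names = rows.map { |row| row.split.first }.compact
  # Domain-0 is the hypervisor, not a guest.
  names.reject { |name| name == "Domain-0" }
end

# What a hypervisor would answer for a "find" request.
def find_domu(output, domu)
  domus_from_xm_list(output).include?(domu) ? "Present" : "Absent"
end
```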

Download the agent & the client

MCollective Release 0.4.4

I just released version 0.4.4 of The Marionette Collective. This release is primarily a bug fix release addressing issues with log files and general code cleanups.

The biggest change in this release is better control of the daemon: you can ask it to reload one agent or all agents, among a few other things. Read all about it on the wiki.

Please see the Release Notes, Changelog and Download List for full details.

For background information about the MCollective project please see the project website.