Category Archives: graphite

Jenkins, Puppet, Graphite, Logstash and YOU

This is a repost of an article I wrote for the Acquia Blog some time ago.

As mentioned before, devops can be summarized by talking about culture, automation, monitoring metrics and sharing. Although devops is not about tooling, there are a number of open source tools out there that will be able to help you achieve your goals. Some of those tools will also enable better communication between your development and operations teams.

When we talk about Continuous Integration and Continuous Deployment we need a number of tools to help us there. We need to be able to build reproducible artifacts which we can test. And we need a reproducible infrastructure which we can manage in a fast and sane way. To do that we need a Continuous Integration framework like Jenkins.

Formerly known as Hudson, Jenkins has been around for a while. The open source project was initially very popular in the Java community but has since gained popularity in many other environments. Jenkins allows you to create reproducible build and test scenarios and report on them. It provides a uniform and managed way to build, test, release and trigger the deployment of new artifacts, for both traditional software and infrastructure-as-code projects. Jenkins has a vibrant community that builds new plugins for the tool in different kinds of languages. People use it to build their deployment pipelines: automatically checking out new versions of the source code, then syntax testing and style testing it. If needed, the pipeline can compile the software, trigger unit tests and upload a tested artifact into a repository so it is ready to be deployed on the next platform level.

Jenkins can then trigger an automated deployment of the tested software on its new target platform. Whether that is development, testing, user acceptance or production is just a parameter. Deployment should not be something we try first in production; it should be done the same way on all platforms. The deltas between these platforms should be managed using a configuration management tool such as Puppet, Chef or friends.

In a way this means that Infrastructure as code is a testing dependency, as you also want to be able to deploy a platform to exactly the same state as it was before you ran your tests, so that you can compare the test results of your test runs and make sure they are correct. This means you need to be able to control the starting point of your test and tools like Puppet and Chef can help you here. Which tool you use is the least important part of the discussion, as the important part is that you adopt one of the tools and start treating your infrastructure the same way as you treat your code base: as a tested, stable, reproducible piece of software that you can deploy over and over in a predictable fashion.

Configuration management tools such as Puppet, Chef and CFEngine are just one part of the ecosystem. Integration with orchestration and monitoring tools is needed as well, because you want feedback on how your platform behaves after changes have been introduced. Lots of people measure the impact of a new deploy, which brings us to the M part of CAMS.

There, Graphite is one of the most popular tools to store metrics. Plenty of other tools in the same area have tried to go where Graphite is going, but in terms of flexibility, scalability and ease of use, not many tools allow developers and operations people to build dashboards for any metric they can think of in a matter of seconds.

Just sending a keyword, a timestamp and a value to the Graphite platform provides you with a large choice of actions that can be done with that metric. You can graph it, transform it, or even set an alert on it. Graphite removes much of the complexity found in similar tools and offers an easy to use API, so developers can integrate their own self-service metrics into dashboards to be used by everyone.
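As a minimal sketch of what "a keyword, a timestamp and a value" looks like on the wire, the Python snippet below formats one line in Graphite's plaintext format (name, value, timestamp) and sends it over TCP. The host name and the metric name are placeholders, and port 2003 is assumed to be the plaintext listener of your carbon instance.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one plaintext line: 'path value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="graphite.example.com", port=2003, timeout=5.0):
    """Open a TCP connection to carbon and send a single metric line.

    host is a placeholder; port 2003 is assumed to be the plaintext listener.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name, for illustration only.
    line = format_metric("deploys.webshop.production", 1)
    print(line, end="")
    # send_metric(line)  # enable once host and port point at a real carbon instance
```

Anything that can open a socket can publish a metric this way, which is a large part of why developers find it easy to add their own.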

One last tool that deserves our attention is Logstash. Initially it was just a tool to aggregate, index and search the log files of our platform, which are often a hugely overlooked source of relevant information about how our applications behave. Logstash and its Kibana+ElasticSearch ecosystem are now quickly evolving into a real time analytics platform, implementing the Collect, Ship+Transform, Store and Display pattern we see emerging a lot in the #monitoringlove community. Logstash now allows us to turn boring old logfiles that people only started searching upon failure into valuable information that product owners and business managers use to learn about the behavior of their users.

Together with the Graphite-based dashboards we mentioned above, these tools help people start sharing their information and communicate better. When thinking about these tools, think about what you are doing, what goals you are trying to reach and where you need to improve. Because after all, devops is not solving a technical problem, it's trying to solve a business problem and bringing better value to the end user at a more sustainable pace. And in that way the biggest tool we need to use is YOU, as the person who enables communication.

Sensu and Graphite, Part 2

In a previous post I described two methods for routing metrics generated by Sensu clients to Graphite:

  • use a pipe handler to send metrics via TCP to graphite
  • use Graphite's AMQP (rabbitmq) support.

Method #1 was simply described for completeness. It is not scalable and shouldn't be used except for very small workloads. Pipe handlers involve a fork() by sensu-server for every metric received.

At the time I recommended method #2 which was more efficient – Sensu would simply copy the metric from its own results queue to another queue that Graphite would be listening on, since both Sensu and Graphite can talk to RabbitMQ.

However, Graphite's AMQP support is fairly lacking, in my opinion. It does not seem to be getting much attention on the regular Graphite support forums and the code around AMQP has not changed much. The docs section describing its configuration remains an empty TODO.

The main reason I don't like the AMQP approach anymore is that it does not work well with Graphite clusters. I prefer to build a Graphite cluster where each node is identically configured. Each node would connect to an AMQP queue, pop a metric off of the queue in a load-balanced fashion, then let carbon-relay's routing rules figure out where to send the metric. It does not work this way. Instead, each graphite node pulls every metric posted to the queue, duplicating effort on each node in the cluster. This is wasteful and limits the capacity of the cluster needlessly.
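To make the cost concrete, here is a toy Python model (not Graphite or RabbitMQ code) comparing the behaviour I expected, where the nodes share one queue and each metric is consumed once, with the behaviour described above, where every node receives every metric. The node and metric counts are made up.

```python
def work_per_node_shared_queue(num_metrics, num_nodes):
    """Metrics are dealt round-robin: each metric is handled by exactly one node."""
    counts = [0] * num_nodes
    for i in range(num_metrics):
        counts[i % num_nodes] += 1
    return counts


def work_per_node_fanout(num_metrics, num_nodes):
    """Every node receives its own copy of every metric."""
    return [num_metrics] * num_nodes


if __name__ == "__main__":
    metrics, nodes = 9000, 3  # illustrative numbers only
    print("shared queue:", work_per_node_shared_queue(metrics, nodes))
    print("fan-out:     ", work_per_node_fanout(metrics, nodes))
```

In the shared-queue model adding a node lowers the load on every other node; in the fan-out model it does not, so the cluster can never ingest more than a single node can.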

Newer, better ways

My new preferred method for sending metrics to Graphite is to use TCP with a load-balancer in front of Graphite's carbon-relay instances in the case of a multi-node cluster.

This was not really possible when the initial blog post was written, but since that time Sensu has added support for extension handlers in addition to the original pipe handlers. Extensions are Ruby code that is loaded and run inside the sensu-server process. They are much more efficient than fork()'ing to handle each event.

There are two extension handlers available for sending metrics to Graphite:

  • Sensu-server TCP handler: Ships with sensu-server. Very simple, takes the event['output'] string and sends it untouched over a TCP socket to a destination.
  • @grepory's WizardVan: More features, supports OpenTSDB and Graphite, buffering support, re-connect, backoff, etc.

Here is a quick example of configuring each of these extension handlers.

Sensu-server TCP Handler

Configuring the TCP handler that ships with Sensu is easy and is documented in the handlers section of the Sensu docs.

The TCP handler is very basic and will simply copy the output of the check directly over the socket. This works out fine for most Sensu metric checks, since the de facto standard for most of them is to output graphite's line-oriented format.

Example tcp handler:

  {
    "handlers": {
      "graphite_line_tcp": {
        "type": "tcp",
        "socket": {
          "host": "metrics.dom.tld",
          "port": 2003
        }
      }
    }
  }

Add the graphite_line_tcp handler to your metric checks:

  {
    "checks": {
      "vmstat_metrics": {
        "type": "metric",
        "handlers": ["graphite_line_tcp"],
        "command": "/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::",
        "interval": 60,
        "subscribers": [ "webservers" ]
      }
    }
  }

WizardVan (aka, sensu-metrics-relay) Extension

A more advanced TCP extension handler is available from @grepory and goes by the code-name WizardVan or sensu-metrics-relay (same thing, but I was confused for a moment).

WizardVan does not ship with Sensu, but installation instructions are available on its GitHub page. In the future it may become easier to install if it ships as a rubygem.

WizardVan also takes advantage of another newer Sensu feature known as mutators which provide the ability for WizardVan to send metrics to either Graphite or OpenTSDB or both.

By default, WizardVan assumes that metrics are in Graphite format, so configuring it for use with Graphite is straightforward.

Here is a general example for configuring WizardVan. See the docs for more options.

  {
    "handlers": {
      "relay": {
        "graphite": {
          "host": "graphite.dom.tld",
          "port": 2003
        },
        "opentsdb": {
          "host": "tsdb.dom.tld",
          "port": 4424
        }
      }
    }
  }

For further information on configuring WizardVan see the README.

NOTE: Unless you have a very high (hundreds/sec) rate of metrics you may need to lower WizardVan's MAX_QUEUE_SIZE to something less than 16KB (try 128). Hopefully soon this will be configurable instead of hardcoded.


During my Puppetcamp Gent talk last week, I explained how to get alerts based on trends from graphite. A number of people asked me how to do that.

First let's quickly explain why you might want to do that.
Sometimes you don't care about the current value of a metric. As an example, take a queueing system: there is no problem if messages are added to the queue, not even if there are a lot of messages on the queue. There might, however, be a problem if the number of messages on the queue stays too high over a certain period.

In this example I'm monitoring the queue length of a HornetQ setup, which is exposed via JMX.
On the server running HornetQ I have an exported resource that tells the JMXTrans server to send the MessageCount to graphite
(you could also do this using collectd plugins):

  @@jmxtrans::graphite { "MessageCountMonitor-${::fqdn}":
    jmxhost      => hiera('hornetqserver'),
    jmxport      => "5446",
    objtype      => 'org.hornetq:type=Queue,*',
    attributes   => '"MessageCount","MessagesAdded","ConsrCount"',
    resultalias  => "hornetq",
    typenames    => "name",
    graphitehost => hiera('graphite'),
    graphiteport => "2003",
  }

This gives me a computable url on which I can get the graphite view

The next step is then to configure a Nagios check that verifies this data. For that I use the check_graphite plugin from Datacratic.

Which can work with an nrpe config like

  ### File managed with puppet ###
  ### Served by: '<%= scope.lookupvar('::servername') %>'
  ### Module: '<%= scope.to_hash['module_name'] %>'
  ### Template source: '<%= template_source %>'

  command[check_hornetq]=/usr/lib64/nagios/plugins/check_graphite -u "http://<%= graphitehost%>/render?target=servers.<%= scope.lookupvar('::fqdn').gsub(/\./,'_')%>_5446.hornetq.docstore_private_trigger_notification.MessageCount&from=-30minutes&rawData=true" -w 2000 -c 20000

I define this check on the host where HornetQ is running as it then will map to that host on Icinga/Nagios rather than throw a host error on an unrelated host.
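The reason for looking at a window of data rather than the latest value can be sketched in a few lines of Python. This is not the code of the check_graphite plugin, whose exact evaluation logic I am not reproducing here; it is only an illustration of the idea, assuming the rawData output has the form "target,start,end,step|v1,v2,..." and using a made-up sample series. The thresholds mirror the -w 2000 and -c 20000 values used above.

```python
def parse_raw(raw_line):
    """Parse one line of rawData output: 'target,start,end,step|v1,v2,...'."""
    header, _, series = raw_line.strip().partition("|")
    # The target itself may contain commas, so split the header from the right.
    target, _start, _end, _step = header.rsplit(",", 3)
    values = [float(v) for v in series.split(",") if v not in ("", "None")]
    return target, values


def trend_status(values, warn, crit):
    """Return a Nagios-style exit code based on the lowest value in the window.

    Using the minimum means a short spike is ignored; the check only fires
    when the series stayed above the threshold for the entire window.
    """
    if not values:
        return 3  # UNKNOWN: no datapoints in the window
    floor = min(values)
    if floor > crit:
        return 2  # CRITICAL
    if floor > warn:
        return 1  # WARNING
    return 0  # OK


if __name__ == "__main__":
    # Made-up sample in the rawData format; a real check would fetch it over HTTP.
    sample = "servers.host01.hornetq.MessageCount,1328153400,1328155200,600|2500.0,3100.0,None\n"
    target, values = parse_raw(sample)
    print(target, values, "exit code:", trend_status(values, warn=2000, crit=20000))
```

A queue that briefly peaks and drains again returns OK here, while one that never drops below the warning level during the whole 30 minutes raises an alert.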

Our #monitoringsucks rpm repository is available

Not only our Rubygems Builds have changed, but also my internal #monitoringsucks repository.

You might have noticed a variety of vagrant- projects on my GitHub account,
many of them part of the #monitoringsucks effort. All of those Vagrant projects are basically my test setups to play with those new tools.

They contain a bunch of puppet modules that install and configure these tools. (Note that they mostly consist of
git submodules pointing to other puppet module repositories.)

Given that I also like to have my software cleanly installed from a package, some of these tools had to be packaged, or I had to create a personal / internal repository that makes available upstream packages that were hiding on the internet.

I've forked this repository off the internal Inuits repository so you all can also benefit from these efforts.
(You gotta love pulp :))

That means you can now install all of the above mentioned #monitoringsucks tools from our public repo:

  yumrepo { 'monitoringsucks':
    baseurl  => '',
    descr    => 'MonitoringSuck at Inuits',
    gpgcheck => '0',
  }

Patches to both the Vagrant projects and the puppet modules are welcome ...

FlossUK and Puppetcamp Edinburgh

I've just finished presenting my talk on how I currently work on Puppet modules at Puppetcamp here in Edinburgh, where I've been for the week speaking at both FlossUK 2012 and Puppetcamp.

Earlier this week I opened FlossUK 2012 with my talk on 7 tools for your devops stack.

Sensu and Graphite

Updated December 7, 2013: I no longer recommend using the approach described in this post. Please read Sensu and Graphite, Part 2 instead.

Updated October 16, 2012: Removed "passive": "true" from the graphite amqp handler definition in Sensu. It was too brittle: Sensu would fail to start unless graphite had started first and created the exchange on the RabbitMQ server. By matching the "durable": "true" setting that graphite expects, we can start the services in either order.

Updated October 16, 2012: Updated graphite amqp handler definition to use the new “mutator”‘s in Sensu 0.9.7. See the for details on backwards-incompatible changes as Sensu moves towards a 1.0.0 release.

It's been pretty exciting to see the number of folks getting involved with Sensu lately, judging by the increased activity on the #sensu channel on Freenode. One of the most common questions is how to integrate Sensu and Graphite. In this article I'll cover two approaches for pushing metrics from Sensu to Graphite.

Remember: think of Sensu as the "monitoring router". While we are going to show how to push metrics to Graphite, it is just as easy to push metrics to any other system – Librato, Cube, OpenTSDB, etc. In fact, it would not be difficult at all to push metrics to multiple graphing backends in a fanout manner.

Install vmstat-metrics plugin

For this example, we're going to use the vmstat-metrics plugin which can be found in the sensu-community-plugins repository on github. The plugins in this repo are meant to be cherry-picked, so let's just grab the one we're interested in:

cd /etc/sensu/plugins/
sudo wget
sudo chmod +x vmstat-metrics.rb

Most, but not all, of the plugins in the sensu-community-plugins repository use helper classes from the sensu-plugin gem, so we'll need to install that gem on the nodes that will be running the vmstat-metrics plugin.

sudo gem install sensu-plugin --no-rdoc --no-ri

Open up vmstat-metrics.rb and we see that it inherits a lot of plumbing from the Sensu::Plugin::Metric::CLI::Graphite class. Keep this in mind when writing your own metrics-gathering plugins, it can save you some time.

Let's run the plugin manually so we can see the output.

./vmstat-metrics.rb
[...]   0   1328153991
stats.swap.out  0   1328153991
[...]   122160  1328153991
stats.memory.swap_used  8   1328153991
[...]   48556   1328153991
stats.memory.inactive   73704   1328153991
stats.cpu.waiting   1   1328153991
stats.cpu.idle  95  1328153991
stats.cpu.system    4   1328153991

We will want to customize the path before sending this data to Graphite, including adding a hostname to differentiate it from other hosts. This can be done with the --scheme switch:

./vmstat-metrics.rb --scheme stats.`hostname -s`
[...]   0   1328155423
stats.host01.swap.out   0   1328155423
[...]   122512  1328155423
stats.host01.memory.swap_used   8   1328155423
[...]   43856   1328155423
stats.host01.memory.inactive    75120   1328155423
stats.host01.cpu.waiting    1   1328155423
stats.host01.cpu.idle   95  1328155423
stats.host01.cpu.system 4   1328155423

If you've used Graphite, this should look familiar. This data fits Graphite's "name value timestamp" format and we can feed it directly into Graphite via a couple of different methods.
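If you want to sanity-check plugin output before pointing it at Graphite, a small parser is enough. The following is a sketch in Python rather than part of Sensu or Graphite: it assumes whitespace-separated fields and rejects any line that does not have exactly three of them.

```python
def parse_line(line):
    """Split one 'name value timestamp' line into (name, float value, int timestamp)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    name, value, timestamp = fields
    return name, float(value), int(timestamp)


if __name__ == "__main__":
    output = (
        "stats.host01.cpu.waiting    1   1328155423\n"
        "stats.host01.cpu.idle   95  1328155423\n"
        "stats.host01.cpu.system 4   1328155423\n"
    )
    for raw in output.splitlines():
        print(parse_line(raw))
```

A line with a missing metric name or with two metrics run together fails the field count and raises a ValueError instead of silently producing a bogus series.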

Create a check .json

Let's push out a check definition to the nodes running sensu-client and sensu-server. Don't forget to install the plugin as well as the sensu-plugin gem.

File: /etc/sensu/conf.d/metrics_vmstat.json:

  {
    "checks": {
      "vmstat_metrics": {
        "type": "metric",
        "handlers": ["graphite"],
        "command": "/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::",
        "interval": 60,
        "subscribers": [ "webservers" ]
      }
    }
  }

There are two new items in this check definition that we haven't covered yet:

  • The first is a new attribute: "type": "metric". This is critical. It tells the sensu-server to send the output of every invocation of this check to the specified handlers. Normally, only checks that return a non-zero exit status indicating a failed check will be passed onto handlers. For metrics, however, we always want to send the output to the handlers.

  • The second is the use of a "custom variable" in the command: :::name:::. Sensu will replace this at execution time with the name attribute from the client section of the Sensu config. There's a lot of other stuff we can do with custom variables which will be covered in future blogs.
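To illustrate what the custom variable substitution amounts to, here is a rough Python sketch. It is not Sensu's implementation (Sensu is written in Ruby), and it only handles simple top-level attributes; the client dictionary and the substitute function are hypothetical names used for illustration.

```python
import re

# Matches tokens of the form :::key::: where key is a simple word.
TOKEN = re.compile(r":::(\w+):::")


def substitute(command, client):
    """Replace each :::key::: token with the matching value from the client dict."""
    def lookup(match):
        key = match.group(1)
        if key not in client:
            raise KeyError(f"no client attribute for token {key!r}")
        return str(client[key])

    return TOKEN.sub(lookup, command)


if __name__ == "__main__":
    client = {"name": "host01"}  # stands in for the client section of the config
    command = "/etc/sensu/plugins/vmstat-metrics.rb --scheme stats.:::name:::"
    print(substitute(command, client))
```

With a client named host01 the command ends in --scheme stats.host01, which is how each node's metrics end up under their own path in Graphite.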

Next, we need to create a handler to do something with this data. Since we're using Graphite in this example, we'll explore two methods for getting this data into graphite: direct TCP and AMQP.

Method 1 – Direct TCP (via netcat) handler

The first method will be to simply send this data to Graphite via TCP socket using netcat. This is a very simple approach and should work on most boxes (with netcat installed) and doesn't require configuring Graphite to use AMQP.

Fetch graphite_tcp.rb from the sensu-community-plugins repo and copy to /etc/sensu/handlers.

Next create /etc/sensu/conf.d/graphite_tcp.json config file:

  "graphite": {

Create /etc/sensu/conf.d/handler_graphite.json:

  {
    "handlers": {
      "graphite": {
        "type": "pipe",
        "command": "/etc/sensu/handlers/graphite_tcp.rb"
      }
    }
  }

Restart sensu-server and you should start seeing vmstat metrics show up in Graphite.

There is a downside to this approach, however, and that is scalability. For each metric that is received, sensu-server will fork and execute this handler. With many nodes, many checks generating metrics, and a low interval, this could quickly add up to dozens or hundreds of short-lived processes being forked on the sensu-server.
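A quick back-of-the-envelope calculation shows how fast this adds up. The Python sketch below assumes one fork per metric check result and uses made-up fleet sizes; plug in your own numbers.

```python
def forks_per_second(nodes, metric_checks_per_node, interval_seconds):
    """Average handler forks per second, assuming one fork per check result."""
    return nodes * metric_checks_per_node / interval_seconds


if __name__ == "__main__":
    # Illustrative numbers only.
    print(forks_per_second(nodes=10, metric_checks_per_node=2, interval_seconds=60))
    print(forks_per_second(nodes=200, metric_checks_per_node=5, interval_seconds=10))
```

Ten nodes with two checks at a 60 second interval is well under one fork per second, but 200 nodes with five checks at a 10 second interval already means around 100 Ruby processes started every second on the sensu-server.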

Method 2 – integrated AMQP handler

Remember that I keep calling Sensu "the monitoring router"? Because Sensu and Graphite both speak the AMQP messaging protocol, we can configure sensu-server to take the check output directly off of a rabbitmq queue and copy it to a new queue in a format for Graphite. This approach should be very fast and very scalable.

Configure Graphite's carbon-cache for AMQP:


AMQP_PORT = 5672
AMQP_VHOST = /sensu
AMQP_USER = sensu

An important part of this config is AMQP_METRIC_NAME_IN_BODY = True which means Graphite will determine the metric name from the contents of the message rather than via the routing key. See here for more info.

Next, we configure a handler on the sensu-server that will send metrics to the queue Graphite is listening on:

Edit /etc/sensu/conf.d/handler_graphite.json:

  {
    "handlers": {
      "graphite": {
        "type": "amqp",
        "exchange": {
          "type": "topic",
          "name": "metrics",
          "durable": "true"
        },
        "mutator": "only_check_output"
      }
    }
  }

A few things to note here:

  • "type": "amqp" : We're telling sensu-server that this handler will re-route the check output to an AMQP exchange.
  • "durable": "true" : Graphite (carbon-cache) creates the exchange as a durable exchange. We need to match this setting otherwise we will get an error from Rabbit and the connection will fail.
  • "mutator": "only_check_output" : This instructs sensu-server to only send the raw output (from stdout) returned by the check to the AMQP exchange. By default, sensu-server would send the entire JSON doc with other metadata, but Graphite wouldn't know what to do with that data.
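Conceptually, the only_check_output mutator does something like the following Python sketch. This is not Sensu's own code; it assumes the event is a JSON document with the check's stdout stored under check.output, and the sample event is a trimmed, made-up example.

```python
import json


def only_check_output(event_json):
    """Return just the check's raw output from a JSON-encoded event."""
    event = json.loads(event_json)
    return event["check"]["output"]


if __name__ == "__main__":
    # Trimmed, made-up event for illustration; real events carry more metadata.
    event = json.dumps({
        "client": {"name": "host01"},
        "check": {
            "name": "vmstat_metrics",
            "output": "stats.host01.cpu.idle\t95\t1328155423\n",
        },
    })
    print(only_check_output(event), end="")
```

Everything except the metric lines is dropped, so what lands on the exchange is already in the format Graphite expects.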

Debug tips:

For debugging the Graphite side of things, it can be helpful to enable AMQP_VERBOSE = True in carbon.conf. With this enabled, the following will be visible in listener.log when a metric is successfully retrieved from rabbitmq by Graphite:

02/02/2012 10:11:11 :: Message received: Method(name=deliver, id=60) ('graphite_consumer', 8, False, 'metrics', '') content = <Content instance: body='\t0\t1328195471\ntest.sensu-server.swap.out\t0\t1328195471\\t126856\t1328195471\ntest.sensu-server.memory.swap_used\t8\t1328195471\\t52528\t1328195471\ntest.sensu-server.memory.inactive\t57952\t1328195471\ntest.sensu-server.cpu.waiting\t1\t1328195471\ntest.sensu-server.cpu.idle\t95\t1328195471\ntest.sensu-server.cpu.system\t4\t1328195471\ntest.sensu-server.cpu.user\t0\t1328195471\ntest.sensu-server.system.interrupts_per_second\t89\t1328195471\ntest.sensu-server.system.context_switches_per_second\t476\t1328195471\\t8\t1328195471\\t17\t1328195471\ntest.sensu-server.procs.waiting\t4\t1328195471\ntest.sensu-server.procs.uninterruptible\t0\t1328195471\n', children=[], properties={'priority': 0, 'content type': 'application/octet-stream', 'delivery mode': 1}>
02/02/2012 10:11:11 :: Metric posted: 0 1328195471
02/02/2012 10:11:11 :: Metric posted: test.sensu-server.swap.out 0 1328195471
02/02/2012 10:11:11 :: Metric posted: 126856 1328195471

Graphite, JMXTrans, Ganglia, Logster, Collectd, say what ?

Given that @patrickdebois is working on improving data collection I thought it would be a good idea to describe the setup I currently have hacked together.

(Something which can be used as a starting point to improve stuff, and I have to write documentation anyhow)

I currently have 3 sources and one target, which will eventually expand to at least another target and most probably more sources too.

The first source is typical system data, which I collect using collectd; however, I'm using collectd-carbon to send the data to Graphite.

I'm parsing the Apache and Tomcat logfiles with logster, currently sending them only to Graphite, but logster has an option to send them to Ganglia too.

And I'm using JMXTrans to collect JMX data from Java apps that have this data exposed and send it to Graphite. (JMXTrans also comes with a Ganglia target option.)

Rather than going in depth over the config, it's probably easier to point to a Vagrant box I built which brings up a machine that does pretty much all of this on localhost.

Obviously it's still a work in progress and lots of classes will need to be parametrized and cleaned up. But it's a working setup, and not just on my machine.

#monitoringsucks and we’ll fix it !

If you are hacking on monitoring solutions and want to talk to your peers who are solving the same problem,
block the Monday and Tuesday after FOSDEM in your calendar!

That's right, on February 6 and 7 a bunch of people interested in fixing the problem will be meeting, discussing and hacking stuff together in Antwerp.

In short a #monitoringsucks hackathon

Inuits is opening up their offices for everybody who wants to join the effort. Please let us (@KrisBuytaert and @patrickdebois) know if you want to join us in Antwerp.

Obviously if you can't make it to Antwerp you can join the effort on ##monitoringsucks on Freenode or on Twitter.

The location will be Duboistraat 50, Antwerp.
It is about a 10 minute walk from Antwerp Central train station.
Depending on traffic, Antwerp is about half an hour north of Brussels, and there are hotels within walking distance of the venue.

Plenty of parking space is available on the other side of the park.

GDash – Graphite Dashboard

I love graphite, I think it’s amazing, I specifically love that it’s essentially Stats as a Service for your network since you can get hold of the raw data to integrate into other tools.

I’ve started pushing more and more things to it on my network like all my Munin data as per my previous blog post.

What’s missing though is a very simple to manage dashboard. Work is ongoing by the Graphite team on this and there’s been a new release this week that refines their own dashboard even more.

I wanted a specific kind of dashboard though:

  • The graph descriptions should be files that you can version control
  • Graphs should have meta data that’s visible to people looking at the graphs for context. The image below shows a popup that is activated by hovering over a graph.
  • Easy bookmarkable URLs
  • Works in common browsers and resolutions
  • Allow graphs to be added/removed/edited on the fly without any heavy restarts required using something like Puppet/Chef – graphs are just text files in a directory
  • Dashboards and graphs should be separate files that can be shared and reused

I wrote such a dashboard with the very boring name – GDash – that you can find in my GitHub. It only needs Sinatra and uses the excellent Twitter bootstrap framework for the visual side of things.


The project is set up to be hosted on any Rack server like Passenger, but it will also just work on Heroku. If you host it on Heroku it will create URLs to your private graphite install. To get it going on Heroku just follow their QuickStart Guide. Their free tier should be enough for a decent sized dashboard. Deploying the app to Heroku once you are signed up and set up locally takes just 2 commands.

You should only need to edit the file to optionally enable authentication and to point it at your Graphite and give it a name. After that you can add graphs, the example one that creates the above image is in the sample directory.

More detail about the graph DSL used to describe graphs can be found on GitHub. I know the docs for the DSL need to be improved and will do so soon.

I have a few plans for the future:

  • As I am looking to replace Munin I will add a host view that will show common data per host. It will show all the data there and you can give it display hints using the same DSL
  • Add a display mode suitable for big monitors – wider layout, no menu bar
  • Some more configuration options for example to set defaults that apply to all graphs
  • Add a way to use dygraphs to display Graphite data

Ideas, feedback and contributions welcome!