
Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

22 March 2013 ~ Comments Off

Sharing an SSH key, securely

Update: This isn't actually that much better than letting them access the private key, since nothing is stopping the user from running their own SSH agent, which can be run under strace. A better solution is in the works. Thanks to Timo Juhani Lindfors and Bob Proulx for pointing this out.

At work, we have a shared SSH key between the different people manning the support queue. So far, this has just been a file in a directory where everybody could read it and people would sudo to the support user and then run SSH.

This has bugged me a fair bit, since there was nothing stopping a person from making a copy of the key onto their laptop, except policy.

Thanks to a tip, I got around to implementing this and figured writing up how to do it would be useful.

First, you need a directory readable by root only; I use /var/local/support-ssh here. The other bits you need are a small sudo snippet and a profile.d script.

My sudo snippet looks like:

Defaults!/usr/bin/ssh-add env_keep += "SSH_AUTH_SOCK"
%support ALL=(root)  NOPASSWD: /usr/bin/ssh-add /var/local/support-ssh/id_rsa

Everybody in group support can run ssh-add as root.

The profile.d goes in /etc/profile.d/support.sh and looks like:

if [ -n "$(groups | grep -E "(^| )support( |$)")" ]; then
    export SSH_AUTH_ENV="$HOME/.ssh/agent-env"
    if [ -f "$SSH_AUTH_ENV" ]; then
        . "$SSH_AUTH_ENV"
    fi
    ssh-add -l >/dev/null 2>&1
    if [ $? = 2 ]; then
        mkdir -p "$HOME/.ssh"
        rm -f "$SSH_AUTH_ENV"
        ssh-agent > "$SSH_AUTH_ENV"
        . "$SSH_AUTH_ENV"
    fi
    sudo ssh-add /var/local/support-ssh/id_rsa
fi

The key is unavailable to the user in question because ssh-agent is sgid, so it runs with group ssh and the process is only debuggable by root. Two things are still missing: there's no way to have the agent prompt before using a key, and I would like the agent to die, or at least unload its keys, when the last session for a user is closed, but that doesn't seem trivial to do.

21 March 2013 ~ Comments Off

Pipelines: a modern approach to modelling monitoring

Over the last few years I have been experimenting with different approaches for scaling systems that monitor large numbers of heterogeneous hosts, specifically in hosting environments.

This post outlines a pipeline approach for modelling and manipulating monitoring data.


Monitoring can be represented as a pipeline which data flows through, and is eventually turned into a notification for a human.

This approach has several benefits:

  • Failures are compartmentalised
  • Compartments can be scaled independently from one another
  • Clear interfaces are required between compartments, enabling composability

Each stage of the pipeline is handled by a different compartment of monitoring infrastructure that analyses and manipulates the data before deciding whether to pass it onto the next compartment.

These components are the bare minimum required for a monitoring pipeline:

  • Data collection infrastructure is generally a collection of agents on target systems, or standalone tools that extract metrics from opaque systems (preferably via an API).

  • Data storage infrastructure provides a place to push collected metrics. These metrics are almost always numerical. They are then queried and fetched for graphing, monitoring checks, and reporting - thus enabling "We alert on what we draw".

  • Check execution infrastructure runs the monitoring checks configured for each host, which query the data storage infrastructure. Checks that query textual data often poll the target system directly, which can affect latency.

  • Notification infrastructure processes check results from the check execution infrastructure to send notifications to engineers or stakeholders. Ideally the notification infrastructure can also feed back actions from engineers to acknowledge, escalate, or resolve alerts.

At a high level, this is how data flows between the compartments:

basic pipeline
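
To make the flow concrete, here is a minimal sketch of the four basic compartments chained together. It is an illustration only, not how any particular monitoring tool works: the host names, the load threshold, and every function name are made up, and each compartment is reduced to a lambda that takes the previous compartment's output.

```ruby
# Each compartment is a callable that takes the previous compartment's output
# and returns data for the next one. All names and values are hypothetical.

# Data collection: pretend agents reported these metrics.
collect = lambda do |_|
  [
    { host: "web1", metric: "load", value: 0.4 },
    { host: "web2", metric: "load", value: 7.9 },
  ]
end

# Data storage: keep metrics in memory and pass them on for checking.
STORE = []
store = lambda do |metrics|
  STORE.concat(metrics)
  metrics
end

# Check execution: compare stored values against an assumed threshold.
check = lambda do |metrics|
  metrics.map do |m|
    m.merge(state: m[:value] > 5.0 ? :critical : :ok)
  end
end

# Notification: only non-OK results become messages for a human.
notify = lambda do |results|
  results.reject { |r| r[:state] == :ok }
         .map { |r| "#{r[:state].to_s.upcase}: #{r[:metric]} on #{r[:host]} is #{r[:value]}" }
end

PIPELINE = [collect, store, check, notify].freeze

# Feed the output of each stage into the next one.
def run_pipeline(stages)
  stages.reduce(nil) { |data, stage| stage.call(data) }
end

puts run_pipeline(PIPELINE)
```

Because every stage only depends on the shape of the data it receives, any one of them could be swapped for a different implementation without touching the others.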

When using Nagios, the check + notification infrastructure are generally collapsed into one compartment (with the exception of NRPE).

Many monitoring pipelines start out with the data collection + storage infrastructure decoupled from the check infrastructure. Monitoring checks query the same targets that are being graphed, but:

  • Because the check intervals don't necessarily match up to the data collection intervals, it can be hard to correlate monitoring alerts to features on the graphs.
  • The more systems poll the target system, the more the observer effect is amplified.

There are two other compartments that are becoming increasingly common:

  • Event processing infrastructure. Sitting between the check execution and notification infrastructure, this compartment processes events generated from the check infrastructure, identifies trends and emergent behaviours, and forwards the alerts to the notification infrastructure. It may also make decisions on who to send alerts to.

  • Management infrastructure, provides command + control facilities across all the compartments, as well as being the natural place for graphing and dashboards of metrics in the data storage infrastructure to live. If the target audience is non-technical or strongly segmented (e.g. many customers on a shared monitoring infrastructure), it can also provide an abstracted pretty public face to all the compartments.

This is how event processing + management fit into the pipeline:

event processing + management added to the pipeline

The management infrastructure can likely be broken up into different compartments as well, but for now it serves as a placeholder.

Let's explore the benefits of this pipeline design.

Failures are compartmentalised

Ideally, failures and scalability bottlenecks are compartmentalised.

Where there are cascading failures that can't be contained, safeguards can be implemented in the surrounding compartments to dampen the effects [1].

For example, if the data storage infrastructure stops returning data, this causes the check infrastructure to return false negatives. Or false positives. Or false UNKNOWNs. Bad times.

We can contain the effects in the event processing infrastructure by detecting a mass failure and only sending out a small number of targeted notifications, rather than sending out alerts for each individual failing check.
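
As a rough sketch of that idea, the event processing compartment could look at a whole batch of check results and collapse them into one notification when too many fail at once. The 50% threshold, the result format, and the function name are assumptions made for this example; a real implementation would need something far more careful.

```ruby
# Hypothetical threshold: if more than half of the checks in a batch fail,
# assume a shared upstream problem (such as data storage being unavailable).
MASS_FAILURE_RATIO = 0.5

# results: array of hashes like { check: "web1/load", state: :ok }.
# Returns the list of notifications to send.
def notifications_for(results)
  failing = results.reject { |r| r[:state] == :ok }
  return [] if failing.empty?

  if failing.size.to_f / results.size > MASS_FAILURE_RATIO
    # One targeted notification instead of one alert per failing check.
    ["Mass failure: #{failing.size} of #{results.size} checks are failing, suspect a shared dependency"]
  else
    failing.map { |r| "#{r[:state].to_s.upcase}: #{r[:check]}" }
  end
end

isolated = [
  { check: "web1/load", state: :ok },
  { check: "web2/load", state: :critical },
  { check: "db1/disk", state: :ok },
  { check: "db2/disk", state: :ok },
]
# Simulate the data store going away: every check returns UNKNOWN.
outage = isolated.map { |r| r.merge(state: :unknown) }

puts notifications_for(isolated)
puts notifications_for(outage)
```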

This problem is tricky, interesting, and fodder for further blog posts. :-)

Compartments can be scaled independently

Monolithic monitoring architectures are a pain to scale. Viewing a monolithic architecture through the prism of the pipeline model, all of the compartments are squeezed onto a single machine. Quite often there isn't a data collection or storage layer either.

a monolithic monitoring system

Monolithic architectures often use the same moving parts under the hood, but they tend to be very closely entwined. Each tool has very distinct performance characteristics, but because they all run on a single machine and are poorly separated, the only way to improve performance is by throwing expensive hardware at the problem.

If you've ever worked with a monolithic monitoring system, you will likely be experiencing painful flashbacks right about now.

To generalise the workload of the different compartments:

  • Check execution, notifications, and event processing tend to be very CPU intensive + network latency sensitive
  • Data storage is IO intensive + disk space expensive

Making sure each compartment is humming along nicely is super important when providing a consistent and reliable monitoring service.

Splitting the compartments onto separate infrastructure enables us to:

  • Optimise the performance of each component individually, either through using hardware that's more appropriate for the workloads (SSDs, multi-CPU physical machines), or tuning the software stack at the kernel and user space level.
  • Expose data through well defined APIs, which leads into the next point:

Clear interfaces are required between compartments

I like to think of this as "the Duplo approach" - compartments with well defined interfaces you can plug together to compose your pipeline.

a Duplo brick

Clear interfaces abstract the tools used in each compartment of the pipeline, which is essential for chaining tools in a composable way.

Clear interfaces help us:

  • Replace underperforming tools that have reached their scalability limits
  • Test new tools in parallel with the old tools by verifying their inputs + outputs
  • Better identify input that could be considered erroneous, and react appropriately

Concepts like Design by Contract, Service Oriented Architecture, or Defensive Programming then have direct applicability to the design of individual components and the pipeline overall.
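
One way to picture a clear interface is a small contract that the receiving compartment enforces on everything it is handed. The sketch below is only an illustration in the spirit of defensive programming: the field names and the list of valid states are assumptions, not part of any existing tool.

```ruby
# A hypothetical contract for check results crossing a compartment boundary.
VALID_STATES = [:ok, :warning, :critical, :unknown].freeze

def valid_check_result?(result)
  result.is_a?(Hash) &&
    result[:host].is_a?(String) && !result[:host].empty? &&
    result[:check].is_a?(String) && !result[:check].empty? &&
    VALID_STATES.include?(result[:state])
end

# Split incoming data into results that honour the contract and ones that do
# not, so erroneous input can be logged or quarantined instead of passed on.
def partition_results(results)
  results.partition { |r| valid_check_result?(r) }
end

accepted, rejected = partition_results([
  { host: "web1", check: "load", state: :ok },
  { host: "", check: "load", state: :critical },
  { host: "db1", check: "disk", state: :exploded },
  "not even a hash",
])

puts "accepted: #{accepted.size}, rejected: #{rejected.size}"
```

The same check doubles as a test harness when trialling a new tool in parallel with an old one: feed both the same input and compare what each accepts and emits.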


It's not all rainbows and unicorns. There are some downsides to the pipeline approach.

Greater Cost

There will almost certainly be a bigger initial investment in building a monitoring system with the pipeline approach.

You'll be using more components, thus more servers, thus the cost is greater. While the cost of scaling out may be greater up-front, you limit the need to scale up later on.

You can counteract some of these effects by starting small and dividing up compartments over time as part of a piecemeal strategy, but this takes time + persistence.

I can tell you from personal project management experience of rolling out this pipeline design that it's hard work keeping a model of the complexity in your head while also keeping it well documented.

More Complexity

The pipeline makes it easier to eliminate scalability bottlenecks at the expense of more moving parts. The more moving parts, the greater the likelihood of failure.

Operationally it will be more difficult to troubleshoot when failures occur, and this becomes worse as you increase the safeguards and fault tolerance within your compartments.

This is the cost of scalability, and there is no easy fix.

Conclusion

The pipeline model maps nicely to existing monitoring infrastructures, but also to larger distributed monitoring systems.

It provides scalability, fault tolerance, and composability at the cost of a larger upfront investment.


1: This is a vast simplification of a very complex topic. Thinking of failure as an energy to be contained by barriers was a popular perspective in accident prevention circles from the 1960s to the 1980s, but the concept doesn't necessarily apply to complex systems.

19 March 2013 ~ Comments Off

Anatomy of a Test Kitchen 1.0 Cookbook (Part 1)

DISCLAIMER Test Kitchen 1.0 is still in alpha at the time of this post.

Update Remove Gemfile and Vagrantfile

Let’s take a look at the anatomy of a cookbook set up with test-kitchen 1.0-alpha.

Note It is outside the scope of this post to discuss how to write minitest-chef tests or “test cookbook” recipes. Use the cookbook described below as an example to get ideas for writing your own.

This is the full directory tree of Opscode’s “bluepill” cookbook:

├── .kitchen.yml
├── Berksfile
├── CHANGELOG.md
├── CONTRIBUTING
├── LICENSE
├── README.md
├── TESTING.md
├── attributes
│   └── default.rb
├── metadata.rb
├── providers
│   └── service.rb
├── recipes
│   ├── default.rb
│   └── rsyslog.rb
├── resources
│   └── service.rb
├── templates
│   └── default
│       ├── bluepill_init.fedora.erb
│       ├── bluepill_init.freebsd.erb
│       ├── bluepill_init.rhel.erb
│       └── bluepill_rsyslog.conf.erb
└── test
    └── cookbooks
        └── bluepill_test
            ├── README.md
            ├── attributes
            │   └── default.rb
            ├── files
            │   └── default
            │       └── tests
            │           └── minitest
            │               ├── default_test.rb
            │               └── support
            │                   └── helpers.rb
            ├── metadata.rb
            ├── recipes
            │   └── default.rb
            └── templates
                └── default
                    └── test_app.pill.erb

I’ll assume the reader is familiar with basic components of cookbooks like “recipes,” “templates,” and the top-level documentation files, so let’s trim this down to just the areas of concern for Test Kitchen.

├── .kitchen.yml
├── Berksfile
└── test
    └── cookbooks
        └── bluepill_test
            ├── attributes
            │   └── default.rb
            ├── files
            │   └── default
            │       └── tests
            │           └── minitest
            │               ├── default_test.rb
            │               └── support
            │                   └── helpers.rb
            ├── recipes
            │   └── default.rb
            └── templates
                └── default
                    └── test_app.pill.erb

Note that this cookbook has a “test” cookbook. I’ll get to that in a minute.

First of all, we have the .kitchen.yml. This is the project definition that describes what is required to run Test Kitchen itself. This particular file tells Test Kitchen to bring up nodes of the platforms we’re testing with Vagrant, and defines the boxes with their box names and URLs to download. You can view the full .kitchen.yml in the Git repo. For now, I’m going to focus on the suite stanza in the .kitchen.yml. This defines how Chef will run when Test Kitchen brings up the Vagrant machine.

- name: default
  run_list:
  - recipe[minitest-handler]
  - recipe[bluepill_test]
  attributes: {bluepill: { bin: "/opt/chef/embedded/bin/bluepill" } }

Each platform has a recipe it will run with, in this case apt and yum. Then the suite’s run list is appended, so for example, the final run list of the Ubuntu 12.04 node will be:

["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"]

We have apt so the apt cache on the node is updated before Chef does anything else. This is pretty typical so we put it in the default run list of each Ubuntu box.
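
The way the final run list is put together can be sketched in a few lines of Ruby. This is not Test Kitchen's own code, just an illustration of the "platform run list first, suite run list appended" rule described above; the constant and method names are made up, and the centos-6.3 platform name is an assumption standing in for whichever platform uses the yum recipe.

```ruby
# Hypothetical data mirroring the platform and suite entries in .kitchen.yml.
PLATFORM_RUN_LISTS = {
  "ubuntu-12.04" => ["recipe[apt]"],
  "centos-6.3"   => ["recipe[yum]"], # assumed platform name
}.freeze

SUITE_RUN_LIST = ["recipe[minitest-handler]", "recipe[bluepill_test]"].freeze

# The platform's run list comes first, then the suite's run list is appended.
def final_run_list(platform)
  PLATFORM_RUN_LISTS.fetch(platform) + SUITE_RUN_LIST
end

p final_run_list("ubuntu-12.04")
```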

The minitest-handler recipe existing in the run list means that the Minitest Chef Handler will be run at the end of the Chef run. In this case, it will use the tests from the test cookbook, bluepill_test.

The bluepill cookbook itself does not depend on any of these cookbooks. So how does Test Kitchen know where to get them? Enter the next file in the list above, Berksfile. This informs Berkshelf which cookbooks to download. The relevant excerpt from the Berksfile is:

cookbook "apt"
cookbook "yum"
cookbook "minitest-handler"
cookbook "bluepill_test", :path => "./test/cookbooks/bluepill_test"

Based on the Berksfile, it will download apt, yum, and minitest-handler from the Chef Community site. It will also use the bluepill_test included in the bluepill cookbook. This is transparent to the user, as I’ll cover in a moment.

Test Kitchen’s Vagrant driver plugin handles all the configuration of Vagrant itself based on the entries in the .kitchen.yml. To get the Berkshelf integration in the Vagrant boxes, we need to install the vagrant-berkshelf plugin in Vagrant. Then, we automatically get Berkshelf’s Vagrant integration, meaning all the cookbooks defined in the Berksfile are going to be available on the box we bring up.

Remember the test cookbook mentioned above? It’s the next component. The default suite in .kitchen.yml puts bluepill_test in the run list. This particular recipe will include the bluepill default recipe, then it sets up a test service using the bluepill_service LWRP. This means that when the nodes brought up by Test Kitchen via Vagrant converge, they’ll have bluepill installed and set up, plus a running service whose final behavior we can test. Since Chef will exit with a non-zero return code if it encounters an exception, we know that a successful run means everything is configured as defined in the recipes, and we can run tests against the node.

The tests we’ll run are written with the Minitest Chef Handler. These are defined in the test cookbook’s files/default/tests/minitest directory. The minitest-handler cookbook (also in the default suite run list) will execute the default_test tests.

In the next post, we’ll look at how to run Test Kitchen, and what all the output means.

19 March 2013 ~ Comments Off

Anatomy of a Test Kitchen 1.0 Cookbook (Part 2)

DISCLAIMER Test Kitchen 1.0 is still in alpha at the time of this post.

Update We’re no longer required to use bundler, and in fact recommend installing the required RubyGems in your global Ruby environment (#3 below).

Update The log output from the various kitchen commands is not updated with the latest and greatest. Play along at home, it’ll be okay :-).

This is a continuation of part 1.

To run the tests, we need a few things on our machine:

  1. VirtualBox and Vagrant (1.1+)
  2. A compiler toolchain with XML/XSLT development headers (for building Gem dependencies)
  3. A sane, working Ruby environment (Ruby 1.9.3 or greater)
  4. Git

It is outside the scope of this post to cover how to get all those installed.

Once those are installed:

% vagrant plugin install vagrant-berkshelf
% gem install berkshelf
% gem install test-kitchen --pre
% gem install kitchen-vagrant

Test Kitchen combines the suite (default) with the platform names (e.g., ubuntu-12.04). To run all the suites on all platforms, simply do:

% kitchen test

This will take a while, especially if you don’t already have the Vagrant boxes on your system, as it will download each one. To make this faster, we’ll just run Ubuntu 12.04:

% kitchen test default.*1204

Test Kitchen 1.0 can take a regular expression for the instances to test. This will match the instance default-ubuntu-1204. I could also just say 12, as that will match the single entry in my kitchen list (above).
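
To see how the matching plays out, here is a small sketch of combining suites with platforms and filtering the result with a regular expression. It is not taken from Test Kitchen's source: the naming rule (suite, a dash, then the platform name with dots removed) is an assumption inferred from the instance name in the log output, and the extra platform names are invented for the example.

```ruby
SUITES    = ["default"].freeze
PLATFORMS = ["ubuntu-12.04", "ubuntu-10.04", "centos-6.3"].freeze # hypothetical list

# Assumed naming rule: "<suite>-<platform>" with the dots stripped out.
def instance_names(suites, platforms)
  suites.product(platforms).map { |suite, platform| "#{suite}-#{platform}".delete(".") }
end

# Keep only the instances whose name matches the given regular expression.
def matching_instances(names, pattern)
  names.grep(Regexp.new(pattern))
end

names = instance_names(SUITES, PLATFORMS)
p names
p matching_instances(names, "default.*1204")
p matching_instances(names, "12")
```

With this made-up platform list, both patterns select only default-ubuntu-1204, which is why a short pattern like 12 is enough when nothing else contains it.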

It will take a few minutes to run Test Kitchen. Those familiar with Chef know that if it encounters an unhandled exception, it exits with a non-zero return code. This is important, because we know at the end of a successful run, Chef did the right thing, assuming our recipe is the right thing :-).

To recap the previous post, we have a run list like this:

["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"]

Let’s break down the output of our successful run. I’ll show the output first, and explain it after:

Starting Kitchen
Cleaning up any prior instances of <default-ubuntu-1204>
Destroying <default-ubuntu-1204>
Finished destroying <default-ubuntu-1204> (0m0.00s).
Testing <default-ubuntu-1204>
Creating <default-ubuntu-1204>

This is basic setup to ensure that “The Kitchen” is clean beforehand and we don’t have existing state interfering with the run.

[vagrant command] BEGIN (vagrant up default-ubuntu-1204 --no-provision)
[default-ubuntu-1204] Importing base box 'canonical-ubuntu-12.04'...
[default-ubuntu-1204] Matching MAC address for NAT networking...
[default-ubuntu-1204] Clearing any previously set forwarded ports...
[default-ubuntu-1204] Forwarding ports...
[default-ubuntu-1204] -- 22 => 2222 (adapter 1)

This will look familiar to Vagrant users: we’re just getting some basic setup from Vagrant initializing the box defined in the .kitchen.yml (passed to the Vagrantfile by the kitchen-vagrant plugin). This step does a vagrant up --no-provision.

[Berkshelf] installing cookbooks...
[Berkshelf] Using bluepill (2.2.2) at path: '/Users/jtimberman/Development/opscode/cookbooks/bluepill'
[Berkshelf] Using apt (1.8.4)
[Berkshelf] Using yum (2.0.0)
[Berkshelf] Using minitest-handler (0.1.2)
[Berkshelf] Using bluepill_test (0.0.1) at path: './test/cookbooks/bluepill_test'
[Berkshelf] Using rsyslog (1.5.0)
[Berkshelf] Using chef_handler (1.1.0)

Remember from the previous post that we’re using Berkshelf? This is the integration with Vagrant that ensures that the cookbooks are available. Four of them, apt, yum, minitest-handler, and bluepill_test, are defined in the Berksfile. Next, rsyslog is a dependency of the bluepill cookbook (for rsyslog integration), and last, chef_handler is a dependency of minitest-handler. Berkshelf extracts the dependencies from the cookbook metadata of each cookbook defined in the Berksfile.
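
The dependency walk can be pictured with a short sketch. This is not how Berkshelf is implemented; it is a plain breadth-first traversal over a hand-written table, written only to show why rsyslog and chef_handler appear in the output without being listed in the Berksfile. The rsyslog and chef_handler dependencies come from this post; the assumption that bluepill_test depends on bluepill, and all of the names in the code, are mine.

```ruby
# Hypothetical dependency table, standing in for each cookbook's metadata.rb.
DEPENDENCIES = {
  "bluepill"         => ["rsyslog"],
  "apt"              => [],
  "yum"              => [],
  "minitest-handler" => ["chef_handler"],
  "bluepill_test"    => ["bluepill"], # assumed
  "rsyslog"          => [],
  "chef_handler"     => [],
}.freeze

# Walk the dependency graph breadth-first, listing each cookbook once.
def resolve(roots, dependencies)
  resolved = []
  queue = roots.dup
  until queue.empty?
    name = queue.shift
    next if resolved.include?(name)

    resolved << name
    queue.concat(dependencies.fetch(name, []))
  end
  resolved
end

p resolve(%w[bluepill apt yum minitest-handler bluepill_test], DEPENDENCIES)
```

Starting from the cookbooks named up front, the transitive dependencies are appended at the end, which happens to match the ordering in the log above.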

[default-ubuntu-1204] Creating shared folders metadata...
[default-ubuntu-1204] Clearing any previously set network interfaces...
[default-ubuntu-1204] Running any VM customizations...
[default-ubuntu-1204] Booting VM...
[default-ubuntu-1204] Waiting for VM to boot. This can take a few minutes.
[default-ubuntu-1204] VM booted and ready for use!
[default-ubuntu-1204] Setting host name...
[default-ubuntu-1204] Mounting shared folders...
[default-ubuntu-1204] -- v-root: /vagrant
[default-ubuntu-1204] -- v-csc-1: /tmp/vagrant-chef-1/chef-solo-1/cookbooks
[vagrant command] END (0m48.76s)
Vagrant instance <default-ubuntu-1204> created.
Finished creating <default-ubuntu-1204> (0m53.12s).

Again, this is familiar output to Vagrant users, where Vagrant is making the cookbooks available to the instance.

Converging <default-ubuntu-1204>
[vagrant command] BEGIN (vagrant ssh default-ubuntu-1204 --command 'should_update_chef() {\n...')
Installing Chef Omnibus (11.4.0)
Downloading Chef 11.4.0 for ubuntu...
Installing Chef 11.4.0
Selecting previously unselected package chef.
(Reading database ... 60513 files and directories currently installed.)
Unpacking chef (from .../chef_11.4.0_amd64.deb) ...
Setting up chef (11.4.0-1.ubuntu.11.04) …
Thank you for installing Chef!
[vagrant command] END (0m34.85s)
[vagrant command] BEGIN (vagrant provision default-ubuntu-1204)
[Berkshelf] installing cookbooks...
[Berkshelf] Using bluepill (2.2.2) at path: '/Users/jtimberman/Development/opscode/cookbooks/bluepill'
[Berkshelf] Using apt (1.8.4)
[Berkshelf] Using yum (2.0.0)
[Berkshelf] Using minitest-handler (0.1.2)
[Berkshelf] Using bluepill_test (0.0.1) at path: './test/cookbooks/bluepill_test'
[Berkshelf] Using rsyslog (1.5.0)
[Berkshelf] Using chef_handler (1.1.0)

This part is interesting, in that we’re going to install the Full Stack Chef (Omnibus) package. This means that whatever the underlying base box has installed, we get the right version of Chef. The version is defined in the .kitchen.yml, and the installation is done through vagrant ssh (second line). Then, Test Kitchen does vagrant provision. The provisioning step is where Berkshelf happens, so we do see this happen again (perhaps a bug?).

[default-ubuntu-1204] Running provisioner: Vagrant::Provisioners::ChefSolo...
[default-ubuntu-1204] Generating chef JSON and uploading...
[default-ubuntu-1204] Running chef-solo...
INFO: *** Chef 11.4.0 ***
INFO: Setting the run_list to ["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"] from JSON
INFO: Run List is [recipe[apt], recipe[minitest-handler], recipe[bluepill_test]]
INFO: Run List expands to [apt, minitest-handler, bluepill_test]
INFO: Starting Chef Run for default-ubuntu-1204.vagrantup.com

This is the start of the actual Chef run, using Chef Solo by Vagrant’s provisioner. Note that we have our suite’s run list. I’m going to skip a lot of the Chef output because it isn’t required. Note that a few resources in the minitest-handler will report as failed, but they can be ignored because it means that those tests were simply not implemented.

INFO: Processing directory[/var/chef/minitest/bluepill_test] action create (minitest-handler::default line 50)
INFO: directory[/var/chef/minitest/bluepill_test] created directory /var/chef/minitest/bluepill_test
INFO: Processing cookbook_file[tests-bluepill_test-default] action create (minitest-handler::default line 53)
INFO: cookbook_file[tests-bluepill_test-default] created file /var/chef/minitest/bluepill_test/default_test.rb
INFO: Processing remote_directory[tests-support-bluepill_test-default] action create (minitest-handler::default line 60)
INFO: remote_directory[tests-support-bluepill_test-default] created directory /var/chef/minitest/bluepill_test/support
INFO: Processing cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] action create (dynamically defined)
INFO: cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] mode changed to 644
INFO: cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] created file /var/chef/minitest/bluepill_test/support/helpers.rb

These are the relevant parts of the minitest-handler recipe, where it has copied the tests from the bluepill_test cookbook into place.

INFO: Processing gem_package[i18n] action install (bluepill::default line 20)
INFO: Processing gem_package[bluepill] action install (bluepill::default line 24)
INFO: Processing directory[/etc/bluepill] action create (bluepill::default line 34)
INFO: directory[/etc/bluepill] created directory /etc/bluepill
INFO: directory[/etc/bluepill] owner changed to 0
INFO: directory[/etc/bluepill] group changed to 0
INFO: Processing directory[/var/run/bluepill] action create (bluepill::default line 34)
INFO: directory[/var/run/bluepill] created directory /var/run/bluepill
INFO: directory[/var/run/bluepill] owner changed to 0
INFO: directory[/var/run/bluepill] group changed to 0
INFO: Processing directory[/var/lib/bluepill] action create (bluepill::default line 34)
INFO: directory[/var/lib/bluepill] created directory /var/lib/bluepill
INFO: directory[/var/lib/bluepill] owner changed to 0
INFO: directory[/var/lib/bluepill] group changed to 0
INFO: Processing file[/var/log/bluepill.log] action create_if_missing (bluepill::default line 41)
INFO: entered create
INFO: file[/var/log/bluepill.log] owner changed to 0
INFO: file[/var/log/bluepill.log] group changed to 0
INFO: file[/var/log/bluepill.log] mode changed to 755
INFO: file[/var/log/bluepill.log] created file /var/log/bluepill.log

Recall from the previous post that the bluepill_test recipe includes the bluepill recipe. This is the basic setup of bluepill.

INFO: Processing package[nc] action install (bluepill_test::default line 4)
INFO: Processing template[/etc/bluepill/test_app.pill] action create (bluepill_test::default line 16)
INFO: template[/etc/bluepill/test_app.pill] updated content
INFO: Processing bluepill_service[test_app] action enable (bluepill_test::default line 18)
INFO: Processing bluepill_service[test_app] action load (bluepill_test::default line 18)
INFO: Processing bluepill_service[test_app] action start (bluepill_test::default line 18)
INFO: Processing link[/etc/init.d/test_app] action create (/tmp/vagrant-chef-1/chef-solo-1/cookbooks/bluepill/providers/service.rb line 30)
INFO: link[/etc/init.d/test_app] created
INFO: Chef Run complete in 81.099185824 seconds

And this is the rest of the bluepill_test recipe. It sets up a test service that will basically be a netcat process listening on a port. Let’s take a moment here and discuss what we have.

First, we have successfully converged the default recipe in the bluepill cookbook via its inclusion in bluepill_test. This is awesome, because we know the recipe works exactly as we defined it, since Chef resources are declarative, and Chef exits if there’s a problem.

Second, we have successfully set up a service managed by bluepill itself using the LWRP included in the bluepill cookbook, bluepill_service. This means we know that the underlying provider configured all the resources correctly.

At this point, we could say “Ship it!” and release the cookbook, knowing it will do what we require. However, this may be disingenuous because we don’t know if the behavior of the system after all this runs is actually correct. Therefore we look to the next segment of output from Chef, from minitest:

INFO: Running report handlers
Run options: -v --seed 38794
# Running tests:
recipe::bluepill_test::default#test_0001_the_default_log_file_must_exist_cook_1295_ =
0.00 s = .
recipe::bluepill_test::default::create a bluepill configuration file#test_0001_anonymous =
0.00 s = .
recipe::bluepill_test::default::create a bluepill configuration file#test_0002_must_be_valid_ruby =
0.06 s = .
recipe::bluepill_test::default::runs the application as a service#test_0001_anonymous =
0.72 s = .
recipe::bluepill_test::default::runs the application as a service#test_0002_anonymous =
0.71 s = .
recipe::bluepill_test::default::spawn a netcat tcp client repeatedly#test_0001_should_receive_a_tcp_connection_from_netcat =
2.24 s = .
Finished tests in 3.746002s, 1.6017 tests/s, 1.8687 assertions/s.
6 tests, 7 assertions, 0 failures, 0 errors, 0 skips

This is performed by the minitest-handler, which runs the tests copied from the bluepill_test cookbook before. It’s outside the scope of this post to describe how to write minitest-chef tests, but we can talk about the output.

We have 6 separate tests that perform 7 assertions, and they all passed. The tests are asserting:

  1. The log file is created, and by the full name of the test, this is to check for a regression from COOK-1295.
  2. The .pill config file for the service must exist and be valid Ruby.
  3. The bluepill service must actually be enabled and running, thereby testing that those actions in the LWRP work.
  4. The running service, which listens on a TCP port, must be up and available, thereby testing that bluepill started the service correctly.

[vagrant command] END (1m29.24s)
Finished converging <default-ubuntu-1204> (2m15.45s).
Setting up <default-ubuntu-1204>
Finished setting up <default-ubuntu-1204> (0m0.00s).
Verifying <default-ubuntu-1204>
Finished verifying <default-ubuntu-1204> (0m0.00s).
Destroying <default-ubuntu-1204>
[vagrant command] BEGIN (vagrant destroy default-ubuntu-1204 -f)
[default-ubuntu-1204] Forcing shutdown of VM...
[Berkshelf] cleaning Vagrant's shelf
[default-ubuntu-1204] Destroying VM and associated drives...
[vagrant command] END (0m3.68s)
Vagrant instance <default-ubuntu-1204> destroyed.
Finished destroying <default-ubuntu-1204> (0m4.04s).
Finished testing <default-ubuntu-1204> (3m12.62s).
Kitchen is finished. (3m12.62s)

This output shows Test Kitchen cleaning up after itself. We destroy the Vagrant instance on a successful convergence and test run in Chef, because further investigation is not required. If the test failed for some reason, Test Kitchen leaves it running so you can log into the machine and poke around to find out what went wrong. Then simply correct the required part of the cookbook (recipes, tests, etc) and rerun Test Kitchen. For example:

% bundle exec kitchen login 1204
vagrant@ubuntu-1204$ … run some commands
vagrant@ubuntu-1204$ ^D
% bundle exec kitchen converge 1204

My goal with these posts is to get some information out for folks to consider when examining Test Kitchen 1.0 alpha for their own projects. There’s a lot more to Test Kitchen, such as managing non-cookbook projects, or even using other kinds of tests. We’ll have more documentation and guides as we get the 1.0 release out.

Enjoy!

19 March 2013 ~ Comments Off

Initial loadays speakers announced

Loadays is coming up soon: 6 and 7 April.
Loadays is the Linux and Open Administration conference of the Low Lands, held in Antwerp, Belgium.

We've just published the initial batch of speakers:

  • Being a Sysadmin at a company full of Sysadmins (Cody Herriges)
  • OpenNebula Fundamentals (Jaime Melis)
  • Integrate UEFI into REAR (Gratien D'haese)
  • Puppet v3 and Hiera (Garrett Honeycutt)
  • Integrating Linux into an Active Directory domain (Gábor Nyers)
  • Normalised instance provisioning for dev, on-premise and public clouds (Karanbir Singh)
  • C.R.E.A.M : Cache Rules Everything Around Me (Thijs Feryn)
  • OpenLDAP's Lightning Memory-Mapped DB (Howard Chu)
  • What's new in syslog-ng (Peter Czanik)
  • Introduction to Ansible (Jan Piet Mens)
  • Network Block Device: network-based block storage for Linux systems (Wouter Verhelst)

Tutorials

  • Building your Enterprise Cloud with OpenNebula (Jaime Melis)
  • Puppet tutorial (Garrett Honeycutt)
  • Automated Everything - Setting up an openQRM Cloud (Matthias Rechenburg)
  • Tutorial about provisioning and management using Ansible (Dag Wieers - Jeroen Hoekx)

There's more to come !

15 March 2013 ~ Comments Off

Red Green Performance Testing with The Grinder

redgreen

No, not that Red Green!

Even a thoroughly-tested application can wreak havoc if it hasn’t been tested in the context of a production-like system under production-like conditions.

Tools like Puppet and Chef make it easy to produce a production-like environment for testing, but what about the production-like conditions?

One aspect of these conditions can be approximated with load testing tools like JMeter or The Grinder. I recently used The Grinder to troubleshoot a performance problem with a small web application. Here’s a walkthrough of my process.


Getting Started with the Grinder

Like JMeter, The Grinder is a Java-based load testing framework. It can coordinate the execution of a test plan by distributed worker processes for anything with a Java API. I used it to send requests to a web application with several distinct APIs and components.

The three main components of The Grinder are the Console, Agents, and Workers.

Setup

  1. Make sure you have a Java runtime.
  2. Download The Grinder from SourceForge.
  3. Unzip the archive.

You should see something like this:

~   $ tree -L 2
~   .
~   ├── CHANGES
~   ├── LICENSE-HTTPClient
~   ├── README
~   ├── contrib
~   │   └── mq
~   ├── etc
~   │   ├── httpToClojureScript.xsl
~   │   ├── httpToJythonScript.xsl
~   │   ├── httpToJythonScriptOldInstrumentation.xsl
~   │   ├── httpToXML.xsl
~   │   └── tcpproxy-http.xsd
~   ├── examples
~   │   ├── amazon.py
~   │   ├── console.py
~   │   ├── ...
~   │   └── xml-rpc.py
~   └── lib
~       ├── LICENSE-ASM
~       ├── LICENSE-Jetty
~       ├── ...
~       ├── License-ring-json-params
~       ├── asm-3.2.jar
~       ├── cheshire-4.0.0.jar
~       ├── ...
~       └── xmlbeans-2.5.0.jar

Starting the Grinder

You’ll need to create a grinder.properties file. This file is used to configure several properties including the number of worker processes and threads. Here’s a simple one:

# Please refer to
# http://net.grinder.sourceforge.net/g3/properties.html for further
# documentation.
 
# The file name of the script to run.
#
# Relative paths are evaluated from the directory containing the
# properties file. The default is "grinder.py".
grinder.script = grinder.clj
 
# The number of worker processes each agent should start. The default
# is 1.
grinder.processes = 1
 
# The number of worker threads each worker process should start. The
# default is 1.
grinder.threads = 5
 
# The number of runs each worker process will perform. When using the
# console this is usually set to 0, meaning "run until the console
# sends a stop or reset signal". The default is 1.
grinder.runs = 1
 
### Logging ###
 
# The directory in which worker process logs should be created. If not
# specified, the agent's working directory is used.
grinder.logDirectory = log
 
# The number of archived logs from previous runs that should be kept.
# The default is 1.
grinder.numberOfOldLogs = 2
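
Each agent starts grinder.processes worker processes, and each of those starts grinder.threads threads, so the number of worker threads per agent is the product of the two. Here is a minimal shell sketch for checking that number before a run. The helper names prop and threads_per_agent are hypothetical (they are not part of The Grinder), and the parsing assumes plain "key = value" lines like the ones above:

```shell
#!/bin/bash
# Hypothetical helpers (not part of The Grinder) for sanity-checking a
# properties file. Assumes plain "key = value" lines.

# Print the value of KEY from FILE, or DEFAULT if the key is not set.
prop() {
    local file=$1 key=$2 default=$3 value
    value=$(awk -F= -v k="$key" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); found = v }
        END { print found }' "$file")
    echo "${value:-$default}"
}

# Worker threads started by one agent: processes x threads.
# Both default to 1, as noted in the comments of the properties file.
threads_per_agent() {
    local file=$1 processes threads
    processes=$(prop "$file" grinder.processes 1)
    threads=$(prop "$file" grinder.threads 1)
    echo $(( processes * threads ))
}
```

For the file above, `threads_per_agent grinder.properties` prints 5 (1 process × 5 threads).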

I also created a couple bash scripts to set CLASSPATH and launch different grinder processes.

Set up environment variables (grinder_env.sh):

#!/bin/bash
GRINDERPATH=$HOME/path/to/grinder-3.11
GRINDERPROPERTIES=$GRINDERPATH/grinder.properties
CLASSPATH=$GRINDERPATH/lib/grinder.jar:$CLASSPATH
PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH PATH GRINDERPROPERTIES

Launch the console (startConsole.sh):

#!/bin/bash
source ./grinder_env.sh
java -classpath $CLASSPATH net.grinder.Console

Launch the agent (startAgent.sh):

#!/bin/bash
source ./grinder_env.sh
java -classpath $CLASSPATH net.grinder.Grinder grinder.properties

Launch the proxy (startProxy.sh):

#!/bin/bash
source ./grinder_env.sh
java -classpath $CLASSPATH net.grinder.TCPProxy -console -http clojure > proxy_session.clj

Recording a Session with TCPProxy

Starting the Proxy

Run the startProxy.sh shell script from above. You should see something like this:

TCPProxy_Console

Using the Proxy

Once you’ve started the proxy, configure your browser to send all requests through the proxy. I chose to use Firefox for this because it allowed me to set the proxy at the browser level rather than send all of my HTTP traffic through the proxy.

Firefox_Advanced

Firefox_Proxy

Make sure to disable any extra plugins that might make extra requests not related to the subject under test. Visit pages which will model a typical user’s usage. When you’re done, stop the proxy.
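
If you would rather script the recording than click through pages by hand, something like the following rough sketch could drive requests through the proxy with curl. This is not what I did above, and it rests on assumptions: the helper name replay_urls is hypothetical, and PROXY assumes TCPProxy is listening on localhost:8001, so adjust it to match your setup. Unlike a browser, curl will not fetch images, scripts, or other sub-resources, so list those URLs explicitly if you want them recorded.

```shell
#!/bin/bash
# Hypothetical helper: fetch each URL read from stdin through the
# recording proxy. PROXY is an assumption; point it at wherever
# TCPProxy is actually listening.
PROXY=${PROXY:-http://localhost:8001}

replay_urls() {
    local url
    # The "|| [ -n ... ]" also handles a last line without a newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue   # skip blank lines
        curl --silent --output /dev/null --proxy "$PROXY" "$url"
    done
}
```

Usage: `replay_urls < urls.txt`, where urls.txt holds one URL per line.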

A Clojure Test Script

Here is an example of some of the Clojure code generated by The Grinder.

;; The Grinder 3.11
;; HTTP script recorded by TCPProxy at Mar 14, 2013 1:59:26 AM
 
(ns user
  (:import (net.grinder.script Test Grinder)
           (net.grinder.plugin.http HTTPPluginControl HTTPRequest)
           (HTTPClient NVPair Codecs)))
 
(def grinder (Grinder/grinder))
(def connectionDefaults (HTTPPluginControl/getConnectionDefaults))
(def httpUtilities (HTTPPluginControl/getHTTPUtilities))
 
; To use a proxy server, uncomment the next line and set the host and port.
; (.setProxyServer connectionDefaults "localhost" 8001)
 
; Worker thread state is stored in a map using a dynamic var.
(def ^:dynamic *tokens*)
(defn set-token [k v] (set! *tokens* (assoc *tokens* k v)))
(defn token [k] (*tokens* k))
 
(defn nvpairs [c] (into-array NVPair
  (map (fn [[k v]] (NVPair. k v)) (partition 2 c))))
 
(defn httprequest [url & [headers]]
  (doto (HTTPRequest.) (.setUrl url) (.setHeaders (nvpairs headers))))
 
(defn basic-authorization [u p]
  (str "Basic " (Codecs/base64Encode  (str u ":" p))))
 
(defn to-bytes [s]
  (letfn [(to-byte[x] (byte (if (> x 0x7f) (- x 0x100) x)))]
    (byte-array (map to-byte s))))
 
(defmacro defrequest [name test & args]
  `(do
     (def ~name (httprequest ~@args))
     (.record ~test ~name (HTTPRequest/getHttpMethodFilter))))
 
(defmacro defpage [name description test & rest]
  `(do
     (defn ~name ~description ~@rest)
     (.record ~test ~name)))
 
; Offline debug
; (use '[clojure.string :only (join)])
; (defmacro .GET [& k] `(.. grinder (getLogger) (debug (str "GET " (join ", " `(~~@k))))))
; (defmacro .POST [& k] `(.. grinder (getLogger) (debug (str "POST " (join ", " `(~~@k))))))
 
 
(.setDefaultHeaders connectionDefaults (nvpairs [
  "Accept-Encoding", "gzip, deflate"
  "Accept-Language", "en-US,en;q=0.5"
  "User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:19.0) Gecko/20100101 Firefox/19.0"]))
 
(def headers0 [
  "Accept", "image/png,image/*;q=0.8,*/*;q=0.5"
  "Referer", "http://www.example.com/"])
 
(def headers1 [
  "Accept", "*/*"
  "Referer", "http://www.example.com/"])
 
(def url0 "http://www.example.com:80")
(def url2 "http://ssl.static.example.com:80")
 
(defrequest request101 (Test. 101 "GET /") url0)
 
(defrequest request201 (Test. 201 "POST /") url1)
 
(defrequest request301 (Test. 301 "GET chrome-48.png") url0 headers0)
 
(defrequest request302 (Test. 302 "GET logo4w.png") url0 headers0)
 
(defrequest request401 (Test. 401 "GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ") url0 headers1)
 
(defrequest request501 (Test. 501 "GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ") url0 headers1)
 
(defrequest request502 (Test. 502 "GET tia.png") url0 headers0)
 
(defrequest request503 (Test. 503 "GET b84c02c3b64bf7ed.js") url0 headers1)
 
(defrequest request601 (Test. 601 "GET csi") url0 headers0)
 
(defrequest request602 (Test. 602 "GET nav_logo117.png") url0 headers0)
 
(defrequest request701 (Test. 701 "GET sem_87e2600bd08d93bebd4d641cad5ffb62.js") url2 headers1)
 
 
; A function for each recorded page.
(defpage page1 "GET / (request 101)." (Test. 100 "Page 1") []
  (.GET request101 "/" nil
    (nvpairs [
      "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"]))
)
 
(defpage page3 "GET chrome-48.png (requests 301-302)." (Test. 300 "Page 3") []
  (.GET request301 "/images/icons/product/chrome-48.png")
 
  (.GET request302 "/images/srpr/logo4w.png")
)
 
(defpage page4 "GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ (request 401)." (Test. 400 "Page 4") []
  (set-token :token_rt "j")
  (set-token :token_ver "Za8TToM0_vY.en_US.")
  (set-token :token_am "BA")
  (set-token :token_d "1")
  (set-token :token_sv "1")
  (set-token :token_rs "AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ")
  (.GET request401
    (str "/xjs/_/js/s/c,sb,cr,cdos,vm,tbui,mb,hov,wobnm,cfm,abd,klc,kat,aut,bihu,kp,lu,m,rtis,tnv,amcl,erh,hv,lc,ob,rsn,sf,sfa,shb,tbpr,hsm,j,p,pcc,csi/rt=" (token :token_rt)
      "/ver=" (token :token_ver)
      "/am=" (token :token_am)
      "/d=" (token :token_d)
      "/sv=" (token :token_sv)
      "/rs=" (token :token_rs)))
)
 
(defpage page5 "GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ (requests 501-503)." (Test. 500 "Page 5") []
  (set-token :token_d "0")
  (.GET request501
    (str "/xjs/_/js/s/sy9,gf,ifl/rt=" (token :token_rt)
      "/ver=" (token :token_ver)
      "/am=" (token :token_am)
      "/d=" (token :token_d)
      "/sv=" (token :token_sv)
      "/rs=" (token :token_rs)))
 
  (.GET request502 "/textinputassistant/tia.png")
 
  (set-token :token_bav "on.2,or.r_qf.")
  (.GET request503
    (str "/extern_chrome/b84c02c3b64bf7ed.js"
      "?bav=" (token :token_bav)))
)
 
(defpage page6 "GET csi (requests 601-602)." (Test. 600 "Page 6") []
  (set-token :token_v "3")
  (set-token :token_s "webhp")
  (set-token :token_action "")
  (set-token :token_e "17259,18168,39523,4000116,4001569,4001947,4001959,4001975,4002206,4002562,4002734,4002855,4003053,4003178,4003386,4003575,4003638,4003917,4004181,4004213,4004235,4004257,4004334,4004356,4004363,4004364,4004388,4004479,4004488,4004490,4004653,4004754,4004758,4004904")
  (set-token :token_ei "EtE-UcqbHKKEygGIg4CoBw")
  (set-token :token_imc "2")
  (set-token :token_imn "2")
  (set-token :token_imp "2")
  (set-token :token_atyp "csi")
  (set-token :token_adh "")
  (set-token :token_rt "xjsls.504,prt.538,xjses.3445,xjsee.3656,xjs.3659,ol.3969,iml.1089,wsrt.1342,cst.0,dnst.0,rqst.1425,rspt.175")
  (.GET request601
    (str "/csi"
      "?v=" (token :token_v)
      "&s=" (token :token_s)
      "&action=" (token :token_action)
      "&e=" (token :token_e)
      "&ei=" (token :token_ei)
      "&imc=" (token :token_imc)
      "&imn=" (token :token_imn)
      "&imp=" (token :token_imp)
      "&atyp=" (token :token_atyp)
      "&adh=" (token :token_adh)
      "&rt=" (token :token_rt)))
 
  (.GET request602 "/images/nav_logo117.png")
)
 
(defpage page7 "GET sem_87e2600bd08d93bebd4d641cad5ffb62.js (request 701)." (Test. 700 "Page 7") []
  (.GET request701 "/gb/js/sem_87e2600bd08d93bebd4d641cad5ffb62.js")
)
 
 
(defn run
  "Called for every run performed by the worker thread." []
 
  (page1)      ; GET / (request 101)
 
  (.sleep grinder 246)
  (page3)      ; GET chrome-48.png (requests 301-302)
 
  (.sleep grinder 32)
  (page4)      ; GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ (request 401)
 
  (.sleep grinder 2899)
  (page5)      ; GET rs=AItRSTPdVT73a8ca8dITXjGUdziGAyC2IQ (requests 501-503)
 
  (.sleep grinder 249)
  (page6)      ; GET csi (requests 601-602)
 
  (.sleep grinder 358)
  (page7)      ; GET sem_87e2600bd08d93bebd4d641cad5ffb62.js (request 701)
)
 
(defn runner-factory
  "Create a run function. Called for each worker thread." []
  (binding [*tokens* {}] (bound-fn* run)))

After recording your session, you can modify this script to eliminate any requests you want to exclude from your test.
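
Before pruning, it helps to see what was actually recorded. Here is a rough shell sketch that lists the test number, request name, and description of every defrequest in a recorded script. The function name list_requests is hypothetical, and the pattern assumes the one-line defrequest form shown above:

```shell
#!/bin/bash
# Hypothetical helper: summarise the requests in a recorded script.
# Matches lines such as:
#   (defrequest request101 (Test. 101 "GET /") url0)
# and prints: <test number> <request name> <description>
list_requests() {
    sed -n 's/^(defrequest \([^ ]*\) (Test\. \([0-9]*\) "\([^"]*\)").*/\2 \1 \3/p' "$1"
}
```

Running `list_requests proxy_session.clj` against the script above would start with the line `101 request101 GET /`.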

Running the Test

First, start the Console:

./startConsole.sh

Then, start the Agent:

./startAgent.sh

From the Console, you can star the grinder.properties file to mark it for use:
The_Grinder_Console_star

And edit your grinder.properties to point to your test script:
The_Grinder_Console_testscript

Depending on what you’ve done, you may need to reset the Agent(s) at this point (I usually want to reset the Console, too):
The_Grinder_Console_resetprocesses

Then you can distribute files to the Agent(s) – this includes the test script specified in our grinder.properties file:
The_Grinder_Console_distributefiles

Make it Fail

Turn up the number of threads and/or worker processes until the load replicates the failure case. As Red Green says, “If it ain’t broke, you’re not trying!”

Make sure to consider the whole system at this point, because it’s easy to fool yourself into thinking you’ve crushed the server under heavy load when really you’ve only sapped the resources of your agents or local network.

In my case, I was trying to model the load of approximately 30 roughly concurrent requests for the same set of resources.
Due to interactions between several system components and a broken caching mechanism, this was causing the app to become unresponsive for several minutes.
My test script was able to model this failure quite well.
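
One way to turn the load up between runs without hand-editing the file is a small helper that rewrites a single property. This is only a sketch: set_prop is a hypothetical name, and it assumes plain "key = value" lines like those in the grinder.properties example earlier.

```shell
#!/bin/bash
# Hypothetical helper: set "key = value" in a properties file, replacing
# an existing uncommented entry or appending one if the key is missing.
set_prop() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    if awk -F= -v k="$key" -v v="$value" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { print k " = " v; seen = 1; next }
        { print }
        END { if (!seen) print k " = " v }' "$file" > "$tmp"; then
        # Write back in place so the file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# For example, roughly 30 concurrent worker threads from a single agent:
#   set_prop grinder.properties grinder.processes 1
#   set_prop grinder.properties grinder.threads 30
```

After changing the file, distribute it to the Agent(s) again from the Console as described above.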

Go Green

Using The Grinder, I was able to model this failure well enough to test several configuration changes as well as a replacement caching mechanism. When the system was able to withstand the load of the test (the test passed), I was confident that the changes were likely to work in production.

Summary

By first creating a failing test for the scenario of a complete system under load, I gained confidence that configuration changes I deployed to production would solve the problem. This was a relatively rudimentary example. What tools and techniques does your team use to test system integration at this level?
 

The post Red Green Performance Testing with The Grinder appeared first on Atomic Spin.

14 March 2013 ~ Comments Off

Rebooting Flapjack

This is the first time I've actually blogged about Flapjack.

The past

In 2008 I started talking with Matt Moor about building a "next generation monitoring system" that would be simple to set up & operate, and provide obvious paths to scale.

In 2009 I started hacking on Flapjack while backpacking, and by mid 2009 I had a working prototype running basic monitoring checks.

The fundamental idea was simple: decouple the check execution from the alerting and notification, and use message queues to distribute the check execution across lots of machines.

It seems simple and obvious now, but at the time nobody was really talking about doing this, so Flapjack gathered a reasonable amount of attention relatively quickly after I started talking about it at conferences.

2010 rolled around and, due to some fairly significant life changes, I was unable to maintain a good development pace or hold the attention gained by talking at conferences. Pretty much all of my open source projects suffered, and in the space of 12 months:

There were plenty of other interesting projects like Sensu that were achieving similar goals excellently, so while winding up Flapjack was a source of bitter personal disappointment, it was offset by seeing other people doing awesome work in the monitoring space.

The present

Mid last year, an interesting problem arose at work:

In a modern "monitoring system", how do you:

  • Notify a dynamic group of people on a variety of media based on monitoring events? Bulletproof has thousands of people that may need to be notified by our monitoring system, depending on what monitoring checks are failing. While the thresholds on each monitoring check are universal, each of these people can have different notification settings based on time of day or week, the type of service affected, or the severity of the failure.

  • Dampen or roll up common events so on-call isn't bombarded during outages? When one system deep in the stack fails, it has significant flow-on effects to everything else that depends on it. This generally manifests as thousands (or tens of thousands, in extremely bad cases) of alerts being sent to on-call in a very short period of time (<60 seconds). Obviously this is bad, and we simply want to detect cases like these, and wake up people involved in the incident response process.

  • Do the above in an API driven way? We need to solve both problems in a way that works in a multitenant environment with strong segregation between customers, and integrates with an existing monitoring & customer self-service stack.

Thus, Flapjack was rebooted with a significantly altered focus:

  • Event processing
  • Correlation & rollup
  • API driven configuration

We've been actively working on the reboot since July last year, and have been sending alerts from Flapjack to customers since January.

We're developing Flapjack as a fully Open Source composable platform on which you can adapt and build to your organisation's needs by hooking it into your existing check execution infrastructure (we ship a Nagios event processor), and self service and provisioning automation tools.

Because we care deeply about people integrating Flapjack into their existing environments, we have invested a lot of time and energy into writing quality documentation that covers working with the API, debugging production issues, and the data structures used behind the scenes. That's all on top of the usage documentation, of course.

Flapjack is built on Redis, and funnily enough R.I. Pienaar did a post earlier this year that investigates using Redis to solve the same problem in an extremely similar way. R.I.'s post provides a good primer on some of the thinking behind Flapjack, so I recommend giving it a read.

The future

Fundamentally, Flapjack is trying to plug a notification hole in the monitoring ecosystem that I don't believe is being adequately addressed by other tools, but the key to doing this is to play nicely with other tools and build a composable pipeline.

The above is merely a glimpse of Flapjack that leaves quite a few questions unanswered (e.g. "Why aren't you using $x feature of $y check execution engine to do roll-up?", "Do Flapjack and Riemann play nicely with one another?"), so stay tuned for more:

more waffles

12 March 2013 ~ Comments Off

Devopsdays Paris For Real!

Paris is more and more becoming the DevOps place to be. We (apparently) successfully rebooted the Paris DevOps Meetups, with already two events so far, and two more already in the pipeline (stay tuned for the announcements).

We’re now announcing and pushing hard for the Paris edition of devopsdays. Admittedly we’re rather in the shadow of devopsdays London at the moment; I take this as an extension of our long-time friendly fight with the Britons :)

The Paris edition of DevOpsDays is being held 18 - 19 April 2013, and we want you to be a part of it! This conference brings together speakers and attendees from around the world, with a focus on DevOps culture, techniques, and best practices.

devopsdays Paris

The format is simple: talks in the morning, and open/hack spaces in the afternoon. We’ve done 17 very successful events on 5 continents, and we’re looking forward to another great edition here!

Perhaps you’re curious, “what exactly is DevOps?”

If you already follow this blog or my Twitter account, you probably know what lies behind this term. The term is, of course, a portmanteau of Development and Operations, and is perhaps best thought of as a cultural movement within the IT world. It stresses communication, collaboration and integration between software developers and IT professionals. DevOps is a response to, and evolution of, the interdependence of software development and IT operations.

The conference itself will be held at the MAS. Tickets can be purchased for one or both days, and include full access to the talks and spaces, as well as a catered lunch.

What’s more, we’re currently offering 25% off of the ticket price - just use the code WELOVEDEVOPS when you register. This is a limited-time offer (until the end of this week), so don’t delay!

And, of course, the Call for Proposals is still open until 20 March 2013.

Finally, we invite you to peruse the list of proposals, and to comment and vote for your favorite ones!

So if you had to choose one devopsdays this year, choose ours and come

  • exchange with lots of talented French people
  • taste French food and wine
  • learn how we do devops in 3 hours of work per day (just kidding of course)
  • smell the fragrance of Paris in spring (there won’t be any more snow, I promise)
  • visit the City of Lights

Looking forward to seeing you in April!

07 March 2013 ~ Comments Off

Bootstrapping A Meetup

04 March 2013 ~ Comments Off

Upcoming speaking engagements and travel

My next 2 months are going to be jam packed with conferences and travel!

  • Devopsdays NZ, March 8 2013. I will be giving a talk that analyses AA261 through a DevOps lens, looking at the collaborative maintenance and operation of the MD-83 in the crash.
  • Monitorama, March 28-29 2013. I'm looking forward to slowing down and listening at Monitorama, which has a tremendous line up of speakers. I'll be keen to hear what others think of the work we've been doing on Flapjack the last 6 months.
  • Mountain West Ruby Conf 2013, April 3-5 2013. MWRC has added an extra day of DevOps content to the conference this year, and I'll be joining an esteemed speaker lineup to talk about what both dev and ops can learn from AF447 when responding to rapidly evolving failure scenarios.
  • I'll be staying in the Netherlands for a little under a week between conferences, visiting family and friends. Hopefully I can visit a meetup or two.
  • Open Source Data Center Conference 2013, April 17-18 2013. This will be my first time in Nuremberg, and I'm really looking forward to saying I have attended both OSDCs. I'll be talking about Ript, a DSL for describing firewall rules, and a tool for incrementally applying them.
  • Puppet Camp Nuremberg 2013, April 19 2013. Straight after OSDC I'll be talking about how we are using Puppet at Bulletproof Networks in multi-tenant, isolated environments.