Xen Live Migration with MCollective

I retweeted this on twitter, but it’s just too good to not show. Over at rottenbytes.com Nicolas is showing some proof of concept code he wrote with MCollective that monitors the load on his dom0 machines and initiate live migrations of virtual machines to less loaded servers.

This is the kind of crazy functionality I wanted to enable with MCollective and it makes me very glad to see this kind of thing. The server side and client code combined is only 230 lines – very very impressive.

This is a part of what VMWare DRS does Nico has some ideas to add other sexy features as well as this was just a proof of concept. The logic for what to base migrations on will be driven by a small DSL for example.

I asked him how long it took to knock this together: time taken to get acquainted with MCollective combined with time to write the agent and client was only 2 days, that’s very impressive. He already knew Ruby well though :) And has a Ruby gem to integrate with Xen.

I’m copying the output from his code below, but absolutely head over to his blog to check it out he has the source up there too:

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Mcollective & Xen : naughty things

eth0I already blogged about my experiments with mcollective & xen but I had something a little bigger in my mind. A friend had sent me a video showing some vmware neat features (DRS mainly) with VMs migrating through hypervisors automatically.

So I wrote a “proof of concept” of what you can do with an awesome tool like mcollective. The setup of this funny game is the following :

  • 1 box used a iSCSI target that serves volumes to the world
  • 2 xen hypervisors (lenny packages) using open-iscsi iSCSI initiator to connect to the target. VMs are stored in LVM, nothing fancy

The 3 boxens are connected on a 100Mb network and the hypervisors have an additionnal gigabit network card with a crossover cable to link them (yes, this is a lab setup). You can find a live migration howto here.

For the mcollective part I used my Xen agent (slightly modified from the previous post to support migration), which is based on my xen gem. The client is the largest part of the work but it’s still less than 200 lines of code. It can (and will) be improved because all the config is hardcoded. It would also deserve a little DSL to be able to handle more “logic” than “if load is superior to foo” but as I said before, it’s a proof of concept.

Let’s see it in action :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    873.5
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  78838.0
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.3
test3                                       20   256     1     r-----     11.9

test3 is a VM that is “artificially” loaded, as is the machine “hypervisor3″ (to trigger migration)

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Let’s see our hypervisors :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    878.9
test3                                       25   256     1     -b----      1.1
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  79079.3
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.4

A little word about configuration options :

  • interval : the poll time in seconds.  this should not be too low, let the machine some time and avoid load peeks to distort the logic.
  • load_threshold : where you consider the machine load is too high and that it is time to move some stuff away (tampered with max_over, see below)
  • daemonize : not used yet
  • max_over : maximum time (in minutes) where load should be superior to the limit. When reached, it’s time, really. Don’t set it too low and at least 2*interval or sampling will not be efficient
  • debug : well….
  • max_vm_per_host : the maximum VMs a host can handle. If a host already hit this limit it will not be candidate for receiving a VM
  • max_load_candidate : same thing as above, but for the load
  • host_mapping : a simple CSV file to handle non-DNS destinations (typically my crossover cable address have no DNS entries)

What is left to do :

  • Add some barriers to avoid migration madness to let load go down after a migration or to avoid migrating a VM permanently
  • Add a DSL to insert some more logic
  • Write a real client, not a big fat loop

Enjoy the tool !

Files :

Authorization plugins for MCollective SimpleRPC

Till now The Marionette Collective has relied on your middleware to provide all authorization and authentication for requests. You’re able to restrict certain middleware users from certain agents, but nothing more fine grained.

In many cases you want to provide much finer grain control over who can do what, some cases could be:

  • A certain user can only request service restarts on machines with a fact customer=acme
  • A user can do any service restart but only on machines that has a certain configuration management class
  • You want to deny all users except root from being able to stop services, others can still restart and start them

This kind of thing is required for large infrastructures with lots of admins all working in their own group of machines but perhaps a central NOC need to be able to work on all the machines, you need fine grain control over who can do what and we did not have this will now. It would also be needed if you wanted to give clients control over their own servers but not others.

Version 0.4.5 will have support for this kind of scheme for SimpleRPC agents. We wont provide a authorization plugin out of the box with the core distribution but I’ve made one which will be available as a plugin.

So how would you write an auth plugin, first a typical agent would be:

module MCollective
    module Agent
         class Service<RPC::Agent
             authorized_by :action_policy
 
             # ....
         end
    end
end

The new authorized_by keyword tells MCollective to use the class MCollective::Util::ActionPolicy to do any authorization on this agent.

The ActionPolicy class can be pretty simple, if it raises any kind of exception the action will be denied.

module MCollective
    module Util
         class ActionPolicy
              def self.authorize(request)
                  unless request.caller == "uid=500"
                      raise("You are not allow access to #{request.agent}::#{request.action}")
                  end
              end
         end
    end
end

This simple check will deny all requests from anyone but Unix user id 500.

It’s pretty simple to come up with your own schemes, I wrote one that allows you to make policy files like the one below for the service agent:

policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver

This will allow user 500 to do everything with the service agent. User 502 can get the status of any service on any node. User 600 will be able to do any actions on machines with the fact customer=acme that also has the configuration management class acme::devserver on them. Everything else will be denied.

You can do multiple facts and multiple classes in a simple space separated list. The entire plugin to implement such policy controls was only 120 – heavy commented – lines of code.

I think this is a elegant and easy to use layer that provides a lot of functionality. We might in future pass more information about the caller to the nodes. There’s some limitations, specifically about the source of the caller information being essentially user provided so you need to keep that mind.

As mentioned this will be in MCollective 0.4.5.

Meet the marionette

eth0Another cool project I keep an eye on for some weeks is “the marionette collective“, aka mcollective. This project is leaded & develloped by R.I. Pienaar, one of the most active people in the puppet world too.

Mcollective is an framework for distributed sysadmin. It relies on a messaging framework and has many features included : flexibility, speed, easy to understand.

Some time ago, I had wrote a tool called “whosyourdaddy” to help me (and my memory as big as a goldfish one) to find on which Xen dom0 a Xen domU was living. It worked fine, expect the fact that is was not dynamic : if a VM was migrated  from a dom0 to another, I had to update the CMDB. Not really reliable (if an update fails the CMDB is no more accurate) and I didn’t want to have to embed this constraint in the Xen logic. So I decided to try out to write my own mcollective agent and here it is ! It is built on top of a (very) small ruby module for xen and has it own client.

You can find on which dom0 a domU resides :

master1:~# ./mc-xen -a find --domu test
hypervisor2              : Absent
hypervisor1              : Absent
master1:~# ./mc-xen -a find --domu domu2
hypervisor2              : Present
hypervisor1              : Absent

Or list your domUs :

master1:~# ./mc-xen -a list
hypervisor2              
 domu2

hypervisor1              
 no domU running

Download the agent & the client

MCollective Release 0.4.4

I just released version 0.4.4 of The Marionette Collective. This release is primarily a bug fix release addressing issues with log files and general code cleanups.

The biggest change in this release is that controlling the daemon has become better, you can ask it to reload an agent or all agents and a few other bits. Read all about it on the wiki..

Please see the Release Notes, Changelog and Download List for full details.

For background information about the MCollective project please see the project website.

London DevOps Meetup

Since the inaugral DevopsDays conference in Gent, Belgium, a few of us London-based devops have been gathering once a month to eat, chat, swap stories and generally encourage one another. Since we’ve sustained this for four months, we concluded that we have a kernel of people who would turn out for a monthly meeting, and decided that this month we’d throw it open to the world!

If you don’t know what this whole Devops thing is, I suggest you take a look at my article ‘What is this Devops thing anyway?’, and also James Turnbull’s (excellent) ‘What Devops means to me’. In short, we’re trying to build a community, a movement even, of cross disciplinary people - people who can program, and manage systems, people who understand that in the software industry, coders, testers, analysts, and sysadmins are all on the same team. If this sounds like you - if you want to make a difference, and would like to meet similarly motivated people, you’d be very welcome.

There’s no particular technological bias - we generally like Puppet or Chef, we largely run our infrastructures on Linux, and there’s a tendency to program in Ruby or Python, but I think this is just representative of the places we work and the people we know. So, if you’re interested in bridging the gap between devs and ops, and if you’re interested in how to build infrastructures in an agile way, we’d love to see you.

The intended format is straightforward - every other month we’ll have a social, drinks-based event, in a hostelry of choice, and interleave this with a talk-based event, kindly sponsored by The Guardian at their awesome new premises in Kings Cross.

Tomorrow’s event is at The Priory Bar in Clerkenwelld, starting at about 1800.

The Priory Bar

Don’t worry if you can’t make it that early - it’ll be totally ok to turn up at any time - just depending on what you have on, and how far you have to travel. There’s no fixed agenda, just a chance to meet some cool people, chat and drink.

We’ll be following this up with a technical meeting on the 2nd of March, at the London ThoughtWorks office in Holborn. We’ve not yet decided what we’ll do there, so if you have any ideas, let us know.

Please let others know about the event - get tweeting and blogging - the hashtag is #ldndevops. In the meantime, take a look at our London Devops planet, and if you like irc, pop in and see us on freenode in the ##infra-talk channel.

See you on Wednesday!

London DevOps Meetup

Since the inaugral DevopsDays conference in Gent, Belgium, a few of us London-based devops have been gathering once a month to eat, chat, swap stories and generally encourage one another. Since we’ve sustained this for four months, we concluded that we have a kernel of people who would turn out for a monthly meeting, and decided that this month we’d throw it open to the world!

If you don’t know what this whole Devops thing is, I suggest you take a look at my article ‘What is this Devops thing anyway?’, and also James Turnbull’s (excellent) ‘What Devops means to me’. In short, we’re trying to build a community, a movement even, of cross disciplinary people - people who can program, and manage systems, people who understand that in the software industry, coders, testers, analysts, and sysadmins are all on the same team. If this sounds like you - if you want to make a difference, and would like to meet similarly motivated people, you’d be very welcome.

There’s no particular technological bias - we generally like Puppet or Chef, we largely run our infrastructures on Linux, and there’s a tendency to program in Ruby or Python, but I think this is just representative of the places we work and the people we know. So, if you’re interested in bridging the gap between devs and ops, and if you’re interested in how to build infrastructures in an agile way, we’d love to see you.

The intended format is straightforward - every other month we’ll have a social, drinks-based event, in a hostelry of choice, and interleave this with a talk-based event, kindly sponsored by The Guardian at their awesome new premises in Kings Cross.

Tomorrow’s event is at The Priory Bar in Clerkenwelld, starting at about 1800.

The Priory Bar

Don’t worry if you can’t make it that early - it’ll be totally ok to turn up at any time - just depending on what you have on, and how far you have to travel. There’s no fixed agenda, just a chance to meet some cool people, chat and drink.

We’ll be following this up with a technical meeting on the 2nd of March, at the London ThoughtWorks office in Holborn. We’ve not yet decided what we’ll do there, so if you have any ideas, let us know.

Please let others know about the event - get tweeting and blogging - the hashtag is #ldndevops. In the meantime, take a look at our London Devops planet, and if you like irc, pop in and see us on freenode in the ##infra-talk channel.

See you on Wednesday!