Category Archives: virtualization

Docker vs Reality , 0 – 1

(aka the opinionated summary of the #devopsdays London November OpenSpace on , Containers and the new flood of Image Sprawl)

There's a bunch of people out there that think I don't like docker, they are wrong.

I just never understood the hype about it since I didn't see, (and still don't) see it being used at large and people seem to understand that as being against it.

So let me put a couple of things straight :

There's absolutely nothing wrong with using a container based approach when deploying your infrastructure. If you remember my talks about the rise of Open Source Virtualization some years ago you've noticed that I've always mentioned OpenVZ and friends as good alternatives if you wanted to have a lot of isolated platforms on one machine. LXC and friends have grown .. they are even more usable these days. Years ago people bought bare metal and ran Hypervisors on it to isolate resources. These days people rent VM's and also want the same functionality so the use of the combination of Virtualization and Container based technologies is a very good match there.

There's also nothing wrong with using Infrastructure as Code tools to build an reproducable image you are going to deploy will provide you with a disposable image which allows you to quickly launch a reproducable and versionned platform for your application if that application is supposed to be shortlived. The tooling around today is not yet there to have these images long lived as you still need to manage the config inside the containers as your application will evolve, it will change, your environment will change (think even about changing to a different loghost..) , but when you don't have to keep state you can dispose the image and redeploy a new reproducable one.

In the embedded world, this kind of approach with multiple banks has been a round for a while , one image running, a second bank as a fallback, and when you upgrade the passive bank you can swap the roles and still have roll back.

There's is also nothing wrong on combining these to approaches and using tools such as Docker and Packer.

But there is lot wrong with building images that then start living their own life, tools like Veewee etc saw the light to create an easy way to make sure the JeOS image (Just Enough Operating System) we created was reproducable, not to ship around virtual appliances.

But, lets be realistic, the number of applications that are suitable for this kind of environment is small. Most applications these days are still very statefull, and when your application contains state you need to manage that
that state, you can't just dispose an image which has state. Specially in an Enterprise environment stateless, immutable applications are really the exception rather than the rule.

When your application maps with stateless and short lived, or a some people like to call it Immutable please do so.. but if it doesn't please remember that we started using configuration management tools like CFengine, Puppet and Chef to prevent Image Sprawl and Config Drift
There's proprietary businesses out there building tools to detect config drift and extort organisations to solve problems that shouldn't have existed in the first place.

Luckily the majority of smart people I've spoken to over the past couple of weeks pretty much confirmed this ...
Like one of the larger devops minded appliation hosting outsourcers in emea, I asked them how much % of their customer base they could all "Immutable" , exactly 0% was the answer.

Image Based Container solutions are definitely not a one size fits all solution, and we have along way to go before we get there if at all ..

Till then I like not to diffuse my attention to too many different types of deploying platforms, just not to make stuff more complex than it already is...as complexity is the enemy of reliability

Ganeti Tutorial PDF guide

As I mentioned in my previous blog post, trying out Ganeti can be cumbersome and I went out and created a platform for testing it out using Vagrant. Now I have a PDF guide that you can use to walk through some of the basics steps of using Ganeti along with even testing a fail-over scenario. Its an updated version of a guide I wrote for OSCON last year. Give it a try and let me know what you think!

Ganeti Tutorial PDF guide

As I mentioned in my previous blog post, trying out Ganeti can be cumbersome and I went out and created a platform for testing it out using Vagrant. Now I have a PDF guide that you can use to walk through some of the basics steps of using Ganeti along with even testing a fail-over scenario. Its an updated version of a guide I wrote for OSCON last year. Give it a try and let me know what you think!

My PHPNE Talk on Vagrant

I’m not a public speaker. In fact, i’m normally found either sitting at the back behind a sound desk or running round fixing technical problems.

However, Anthony Sterling approached me the week before the May 2012 PHP North East meetup and asked if i’d do a talk on Vagrant.

In order to give something back to the group, and knowing it was a product I really like that i’d be able to come up with plenty of content, I agreed.

I titled the talk Virtualized Development Environments with Vagrant so I could go into the background and why you would want to use such a tool, as well as how to use it. I also gave a brief introduction to Intrastructure as Code and Configuration Management with Puppet and Chef.

The full contents were:

  • Introduction to:
    • Development Environments
    • Virtualization
  • Why virtualize your development environment?
  • Introduction to Vagrant
  • Using Vagrant
  • Vagrant Plugins
  • Automated Provisioning (Configuration Management)

Watching it back, my public speaking and presentation skills do need quite a bit of work. However, given that it was a first attempt at public speaking, which is not natural to me, hopefully it was a good attempt. The feedback was good, plenty of people had positive things to say, questions and further discussion in the pub afterwards so that nice. I now also have some points to improve on.

The talk was video’d and available on Vimeo:

And the slides available on Slideshare or SpeakerDeck:

Trying out Ganeti with Vagrant

Ganeti is a very powerful tool but often times people have to look for spare hardware to try it out easily. I also wanted to have a way to easily test new features of Ganeti Web Manager (GWM) and Ganeti Instance Image without requiring additional hardware. While I do have the convenience of having access to hardware at the OSU Open Source Lab to do my testing, I’d rather not depend on that always. Sometimes I like trying new and crazier things and I’d rather not break a test cluster all the time. So I decided to see if I could use Vagrant as a tool to create a Ganeti test environment on my own workstation and laptop.

This all started last year while I was preparing for my OSCON tutorial on Ganeti and was manually creating VirtualBox VMs to deploy Ganeti nodes for the tutorial. It worked well but soon after I gave the tutorial I discovered Vagrant and decided to adapt my OSCON tutorial with Vagrant. Its a bit like the movie Inception of course, but I was able to successfully get Ganeti working with Ubuntu and KVM (technically just qemu) and mostly functional VMs inside of the nodes. I was also able to quickly create a three-node cluster to test failover with GWM and many facets of the webapp.

The vagrant setup I have has two parts:

  1. Ganeti Tutorial Puppet Module
  2. Ganeti Vagrant configs

The puppet module I wrote is very basic and isn’t really intended for production use. I plan to re-factor it in the coming months into a completely modular production ready set of modules. The node boxes are currently running Ubuntu 11.10 (I’ve been having some minor issues getting 12.04 to work), and the internal VMs you can deploy are based on the CirrOS Tiny OS. I also created several branches in the vagrant-ganeti repo for testing various versions of Ganeti which has helped the GWM team implement better support for 2.5 in the upcoming release.

To get started using Ganeti with Vagrant, you can do the following:

git clone git://github.com/ramereth/vagrant-ganeti.git
git submodule update --init
gem install vagrant
vagrant up node1
vagrant ssh node1
gnt-cluster verify

Moving forward I plan to implement the following:

  • Update tutorial documentation
  • Support for Xen and LXC
  • Support for CentOS and Debian as the node OS

Please check out the README for more instructions on how to use the Vagrant+Ganeti setup. If you have any feature requests please don’t hesitate to create an issue on the github repo.

Trying out Ganeti with Vagrant

Ganeti is a very powerful tool but often times people have to look for spare hardware to try it out easily. I also wanted to have a way to easily test new features of Ganeti Web Manager (GWM) and Ganeti Instance Image without requiring additional hardware. While I do have the convenience of having access to hardware at the OSU Open Source Lab to do my testing, I'd rather not depend on that always. Sometimes I like trying new and crazier things and I'd rather not break a test cluster all the time. So I decided to see if I could use Vagrant as a tool to create a Ganeti test environment on my own workstation and laptop.

This all started last year while I was preparing for my OSCON tutorial on Ganeti and was manually creating VirtualBox VMs to deploy Ganeti nodes for the tutorial. It worked well but soon after I gave the tutorial I discovered Vagrant and decided to adapt my OSCON tutorial with Vagrant. Its a bit like the movie Inception of course, but I was able to successfully get Ganeti working with Ubuntu and KVM (technically just qemu) and mostly functional VMs inside of the nodes. I was also able to quickly create a three-node cluster to test failover with GWM and many facets of the webapp.

The vagrant setup I have has two parts:

  1. Ganeti Tutorial Puppet Module
  2. Ganeti Vagrant configs

The puppet module I wrote is very basic and isn't really intended for production use. I plan to re-factor it in the coming months into a completely modular production ready set of modules. The node boxes are currently running Ubuntu 11.10 (I've been having some minor issues getting 12.04 to work), and the internal VMs you can deploy are based on the CirrOS Tiny OS. I also created several branches in the vagrant-ganeti repo for testing various versions of Ganeti which has helped the GWM team implement better support for 2.5 in the upcoming release.

To get started using Ganeti with Vagrant, you can do the following:

git clone git://github.com/ramereth/vagrant-ganeti.git
git submodule update --init
gem install vagrant
vagrant up node1
vagrant ssh node1
gnt-cluster verify

Moving forward I plan to implement the following:

  • Update tutorial documentation
  • Support for Xen and LXC
  • Support for CentOS and Debian as the node OS

Please check out the README for more instructions on how to use the Vagrant+Ganeti setup. If you have any feature requests please don't hesitate to create an issue on the github repo.

Rebalancing Ganeti Clusters

One of the best features of Ganeti is its ability to grow linearly by adding new servers easily. We recently purchased a new server to expand our ever growing production cluster and needed to rebalance cluster. Adding and expanding the cluster consisted of the following steps:

  1. Installing the base OS on the new node
  2. Adding the node to your configuration management of choice and/or installing ganeti
  3. Add the node to the cluster with gnt-node add
  4. Check Ganeti using the verification action
  5. Use htools to rebalance the cluster

For simplicity sake I’ll cover the last three steps.

Adding the node

Assuming you’re using a secondary network, this is how you would add your node:

gnt-node add -s <secondary ip> newnode

Now lets check and make sure ganeti is happy:

gnt-cluster verify

If all is well, continue on otherwise try and resolve any issue that ganeti is complaining about.

Using htools

Make sure you install ganeti-htools on all your nodes before continuing. It requires haskell so just be aware of that requirement. Lets see what htools wants to do first:

hbal -m ganeti.example.org
Loaded 5 nodes, 73 instances
Group size 5 nodes, 73 instances
Selected node group: default
Initial check done: 0 bad nodes, 0 bad instances.
Initial score: 41.00076094
Trying to minimize the CV...
 1. openmrs.osuosl.org             g1.osuosl.bak:g2.osuosl.bak => g5.osuosl.bak:g1.osuosl.bak 38.85990831 a=r:g5.osuosl.bak f
 2. stagingvm.drupal.org           g3.osuosl.bak:g1.osuosl.bak => g5.osuosl.bak:g3.osuosl.bak 36.69303985 a=r:g5.osuosl.bak f
 3. scratchvm.drupal.org           g2.osuosl.bak:g4.osuosl.bak => g5.osuosl.bak:g2.osuosl.bak 34.61266967 a=r:g5.osuosl.bak f

<snip>

 28. crisiscommons1.osuosl.org      g3.osuosl.bak:g1.osuosl.bak => g3.osuosl.bak:g5.osuosl.bak 4.93089388 a=r:g5.osuosl.bak
 29. crisiscommons-web.osuosl.org   g2.osuosl.bak:g1.osuosl.bak => g1.osuosl.bak:g5.osuosl.bak 4.57788814 a=f r:g5.osuosl.bak
 30. aqsis2.osuosl.org              g1.osuosl.bak:g3.osuosl.bak => g1.osuosl.bak:g5.osuosl.bak 4.57312216 a=r:g5.osuosl.bak
Cluster score improved from 41.00076094 to 4.57312216
Solution length=30

I’ve shortened the actual output for the sake of this blog post. Htools automatically calculates which virtual machines to move and how using the least amount of operations. In most these moves, the VMs may simply be migrated, migrated & secondary storage replaced, or migrated, secondary storage replaced, migrated. In our environment we needed to move 30 VMs around out of the total 70 VMs that are hosted on the cluster.

Now lets see what commands we actually would need to run:

hbal -C -m ganeti.example.org
Commands to run to reach the above solution:

 echo jobset 1, 1 jobs
 echo job 1/1
 gnt-instance replace-disks -n g5.osuosl.bak openmrs.osuosl.org
 gnt-instance migrate -f openmrs.osuosl.org

 echo jobset 2, 1 jobs
 echo job 2/1
 gnt-instance replace-disks -n g5.osuosl.bak stagingvm.drupal.org
 gnt-instance migrate -f stagingvm.drupal.org

 echo jobset 3, 1 jobs
 echo job 3/1
 gnt-instance replace-disks -n g5.osuosl.bak scratchvm.drupal.org
 gnt-instance migrate -f scratchvm.drupal.org

<snip>

 echo jobset 28, 1 jobs
 echo job 28/1
 gnt-instance replace-disks -n g5.osuosl.bak crisiscommons1.osuosl.org

 echo jobset 29, 1 jobs
 echo job 29/1
 gnt-instance migrate -f crisiscommons-web.osuosl.org
 gnt-instance replace-disks -n g5.osuosl.bak crisiscommons-web.osuosl.org

 echo jobset 30, 1 jobs
 echo job 30/1
 gnt-instance replace-disks -n g5.osuosl.bak aqsis2.osuosl.org

Here you can see the commands it wants  you to execute. Now you can either put these all in a script and run them, split them up, or just run them one by one. In our case I ran them one by one just to be sure we didn’t run into any issues. I had a couple of VMs not migration properly but those were exactly fixed. I split this up into a three day migration running ten jobs a day.

The length of time that it takes to move each VM depends on the following factors:

  1. How fast your secondary network is
  2. How busy the nodes are
  3. How fast your disks are

Most of our VMs ranged in size from 10G to 40G in size and on average took around 10-15 minutes to complete each move. Addtionally, make sure you read the man page for hbal to see all the various features and options you can tweak. For example, you could tell hbal to just run all the commands for you which might be handy for automated rebalancing.

Conclusion

Overall the rebalancing of our cluster went without a hitch outside of a few minor issues. Ganeti made it really easy to expand our cluster with minimal to zero downtime for our hosted projects.

Rebalancing Ganeti Clusters

One of the best features of Ganeti is its ability to grow linearly by adding new servers easily. We recently purchased a new server to expand our ever growing production cluster and needed to rebalance cluster. Adding and expanding the cluster consisted of the following steps:

  1. Installing the base OS on the new node
  2. Adding the node to your configuration management of choice and/or installing ganeti
  3. Add the node to the cluster with gnt-node add
  4. Check Ganeti using the verification action
  5. Use htools to rebalance the cluster

For simplicity sake I'll cover the last three steps.

Adding the node

Assuming you're using a secondary network, this is how you would add your node:

gnt-node add -s <secondary ip> newnode

Now lets check and make sure ganeti is happy:

gnt-cluster verify

If all is well, continue on otherwise try and resolve any issue that ganeti is complaining about.

Using htools

Make sure you install ganeti-htools on all your nodes before continuing. It requires haskell so just be aware of that requirement. Lets see what htools wants to do first:

$ hbal -m ganeti.example.org
Loaded 5 nodes, 73 instances
Group size 5 nodes, 73 instances
Selected node group: default
Initial check done: 0 bad nodes, 0 bad instances.
Initial score: 41.00076094
Trying to minimize the CV...
1. openmrs.osuosl.org g1.osuosl.bak:g2.osuosl.bak g5.osuosl.bak:g1.osuosl.bak 38.85990831 a=r:g5.osuosl.bak f
2. stagingvm.drupal.org g3.osuosl.bak:g1.osuosl.bak g5.osuosl.bak:g3.osuosl.bak 36.69303985 a=r:g5.osuosl.bak f
3. scratchvm.drupal.org g2.osuosl.bak:g4.osuosl.bak g5.osuosl.bak:g2.osuosl.bak 34.61266967 a=r:g5.osuosl.bak f

<snip>

28. crisiscommons1.osuosl.org g3.osuosl.bak:g1.osuosl.bak g3.osuosl.bak:g5.osuosl.bak 4.93089388 a=r:g5.osuosl.bak
29. crisiscommons-web.osuosl.org g2.osuosl.bak:g1.osuosl.bak g1.osuosl.bak:g5.osuosl.bak 4.57788814 a=f r:g5.osuosl.bak
30. aqsis2.osuosl.org g1.osuosl.bak:g3.osuosl.bak g1.osuosl.bak:g5.osuosl.bak 4.57312216 a=r:g5.osuosl.bak
Cluster score improved from 41.00076094 to 4.57312216
Solution length=30

I've shortened the actual output for the sake of this blog post. Htools automatically calculates which virtual machines to move and how using the least amount of operations. In most these moves, the VMs may simply be migrated, migrated & secondary storage replaced, or migrated, secondary storage replaced, migrated. In our environment we needed to move 30 VMs around out of the total 70 VMs that are hosted on the cluster.

Now lets see what commands we actually would need to run:

$ hbal -C -m ganeti.example.org

Commands to run to reach the above solution:

echo jobset 1, 1 jobs
echo job 1/1
gnt-instance replace-disks -n g5.osuosl.bak openmrs.osuosl.org
gnt-instance migrate -f openmrs.osuosl.org
echo jobset 2, 1 jobs
echo job 2/1
gnt-instance replace-disks -n g5.osuosl.bak stagingvm.drupal.org
gnt-instance migrate -f stagingvm.drupal.org
echo jobset 3, 1 jobs
echo job 3/1
gnt-instance replace-disks -n g5.osuosl.bak scratchvm.drupal.org
gnt-instance migrate -f scratchvm.drupal.org

<snip\>

echo jobset 28, 1 jobs
echo job 28/1
gnt-instance replace-disks -n g5.osuosl.bak crisiscommons1.osuosl.org
echo jobset 29, 1 jobs
echo job 29/1
gnt-instance migrate -f crisiscommons-web.osuosl.org
gnt-instance replace-disks -n g5.osuosl.bak crisiscommons-web.osuosl.org
echo jobset 30, 1 jobs
echo job 30/1
gnt-instance replace-disks -n g5.osuosl.bak aqsis2.osuosl.org

Here you can see the commands it wants you to execute. Now you can either put these all in a script and run them, split them up, or just run them one by one. In our case I ran them one by one just to be sure we didn't run into any issues. I had a couple of VMs not migration properly but those were exactly fixed. I split this up into a three day migration running ten jobs a day.

The length of time that it takes to move each VM depends on the following factors:

  1. How fast your secondary network is
  2. How busy the nodes are
  3. How fast your disks are

Most of our VMs ranged in size from 10G to 40G in size and on average took around 10-15 minutes to complete each move. Addtionally, make sure you read the man page for hbal to see all the various features and options you can tweak. For example, you could tell hbal to just run all the commands for you which might be handy for automated rebalancing.

Conclusion

Overall the rebalancing of our cluster went without a hitch outside of a few minor issues. Ganeti made it really easy to expand our cluster with minimal to zero downtime for our hosted projects.

Networking with Ganeti

I’ve been asked quite a bit about how I do our network setup with Ganeti. I admit that it did take me a bit to figure out a sane way to do it in Gentoo. Unfortunately (at least in baselayout-1.x) bringing up VLANs with bridge interfaces in Gentoo is rather a pain. What I’m about to describe is basically a hack and there’s probably a better way to do this. I hope it gets improved in baselayout-2.x but I haven’t had a chance to take a look. Please feel free to add comments on what you feel will work better.

The key problem I ran into was dealing with starting up the vlan interfaces first, then starting up the bridged interfaces in the correct order. Here’s a peek at the network config on one of our Ganeti hosts on Gentoo:

# bring up bridge interfaces manually after eth0 is up
postup() {
    local vlans="42 113"
    if [ "${IFACE}" = "eth0" ] ; then
        for vlan in $vlans ; do
            /etc/init.d/net.br${vlan} start
            if [ "${vlan}" = "113" ] ; then
                # make sure the bridges get going first
                sleep 10
            fi
        done
    fi
}
# bring down bridge interfaces first
predown() {
    local vlans="42 113"
    if [ "${IFACE}" = "eth0" ] ; then
        for vlan in $vlans ; do
            /etc/init.d/net.br${vlan} stop
        done
    fi
}

# Setup trunked VLANs
vlans_eth0="42 113"
config_eth0=( "null" )
vconfig_eth0=( "set_name_type VLAN_PLUS_VID_NO_PAD" )
config_vlan42=( "null" )
config_vlan113=( "null" )

# Bring up primary IP on eth0 via the bridged interface
bridge_br42="vlan42"
config_br42=( "10.18.0.150 netmask 255.255.254.0" )
routes_br42=( "default gw 10.18.0.1" )

# Setup bridged VLAN interfaces
bridge_br113="vlan113"
config_br113=( "null" )

# Backend drbd network
config_eth1=( "192.168.19.136 netmask 255.255.255.0" )

The latter portion of the config its fairly normal. I setup eth0 to null, set the VLAN’s to null, then I add settings to the bridge interfaces. In our case we have the IP for the node itself on br42. The rest of the VLAN’s are just set to null. Finally we setup the backend secondary IP.

The first part of the config is the “fun stuff”. In order for this to work you need to only add net.eth0 and net.eth1 to the default enabled level. The post_up() function will start the bridge interfaces after eth0 has started and iterates through the list of vlans/bridges. Since I’m using the bridge interface as the primary host connection, I added a simple sleep at the end to let it see the traffic first.

That’s it! A fun hack that seems to work. I would love to hear feedback on this :)

Networking with Ganeti

I've been asked quite a bit about how I do our network setup with Ganeti. I admit that it did take me a bit to figure out a sane way to do it in Gentoo. Unfortunately (at least in baselayout-1.x) bringing up VLANs with bridge interfaces in Gentoo is rather a pain. What I'm about to describe is basically a hack and there's probably a better way to do this. I hope it gets improved in baselayout-2.x but I haven't had a chance to take a look. Please feel free to add comments on what you feel will work better.

The key problem I ran into was dealing with starting up the vlan interfaces first, then starting up the bridged interfaces in the correct order. Here's a peek at the network config on one of our Ganeti hosts on Gentoo:

# bring up bridge interfaces manually after eth0 is up
postup() {
  local vlans="42 113"
  if [ "${IFACE}" = "eth0" ] ; then
    for vlan in $vlans ; do
      /etc/init.d/net.br${vlan} start
      if [ "${vlan}" = "113" ] ; then
        # make sure the bridges get going first
        sleep 10
      fi
    done
  fi
}

# bring down bridge interfaces first
predown() {
  local vlans="42 113"
  if [ "${IFACE}" = "eth0" ] ; then
    for vlan in $vlans ; do
      /etc/init.d/net.br${vlan} stop
    done
  fi
}

# Setup trunked VLANs
vlans_eth0="42 113"
config_eth0=( "null" )
vconfig_eth0=( "set_name_type VLAN_PLUS_VID_NO_PAD" )
config_vlan42=( "null" )
config_vlan113=( "null" )

# Bring up primary IP on eth0 via the bridged interface
bridge_br42="vlan42"
config_br42=( "10.18.0.150 netmask 255.255.254.0" )
routes_br42=( "default gw 10.18.0.1" )

# Setup bridged VLAN interfaces
bridge_br113="vlan113"
config_br113=( "null" )

# Backend drbd network
config_eth1=( "192.168.19.136 netmask 255.255.255.0" )

The latter portion of the config its fairly normal. I setup eth0 to null, set the VLAN's to null, then I add settings to the bridge interfaces. In our case we have the IP for the node itself on br42. The rest of the VLAN's are just set to null. Finally we setup the backend secondary IP.

The first part of the config is the "fun stuff". In order for this to work you need to only add net.eth0 and net.eth1 to the default enabled level. The post_up() function will start the bridge interfaces after eth0 has started and iterates through the list of vlans/bridges. Since I'm using the bridge interface as the primary host connection, I added a simple sleep at the end to let it see the traffic first.

That's it! A fun hack that seems to work. I would love to hear feedback on this :)