Category Archives: kvm

Converting KVM to VirtualBox

I have had most of my test environment, aka puppetmasters, test MySQL setups etc., running in KVM for the past couple of years (yes, I'm still using a lot of Xen in production environments, but we've also been using KVM for a while already .. it's a good mix). VirtualBox has always been the lesser-loved virtualization platform. However, while playing more and more with Vagrant Up I realized I needed to convert some boxen (e.g. my PuppetMaster) to VirtualBox, and Google was really no help (most people seem to go the other way, or want to use some proprietary tools).

So I remembered VBoxManage, and apparently I had already blogged about it myself ..
I just hate it when I search for stuff and Google points right back at me.

So I converted my puppetmaster's disks:

VBoxManage convertdd Emtpy-clone.img PuppetMasterroot.vdi
VBoxManage convertdd puppet-var.img PuppetMastervar.vdi

Now when booting the VM in VirtualBox, the kernel obviously panicked: my KVM disks are recognised as /dev/hda, VirtualBox defaults to /dev/sda, and LVM really doesn't like its disks turning up under different names.
No command-line fu to help me here; instead I used the VirtualBox GUI to move the disks to the IDE controller rather than the SATA controller.

Now all I need to do is wait for some smart guy to comment that you probably could use VBoxManage storagectl to achieve the same goal :)
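For the record, it would probably look something like this (untested, and assuming the VM is named PuppetMaster):

VBoxManage storagectl PuppetMaster --name "IDE Controller" --add ide
VBoxManage storageattach PuppetMaster --storagectl "IDE Controller" \
  --port 0 --device 0 --type hdd --medium PuppetMasterroot.vdi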

And wait till Vagrant Up starts supporting KVM, so I can move back :)

Ganeti at the OSUOSL

One of the many large projects I’m working on at the OSUOSL has been migrating all of our virtualization over to Ganeti and KVM. Needless to say it’s kept me from updating my blog, but I hope to make up for it. I thought I would give a rundown of how we use Ganeti at the OSUOSL and where we plan to go from here.

So far we have 10 clusters, ranging in size from single nodes up to four-node clusters. Each node runs Gentoo and is managed with our cfengine setup. There are approximately 120 virtual machines deployed across all the clusters, with the majority (~70) in our production cluster of four nodes. Each node in the production cluster runs 17 to 18 KVM instances.

Project Ganeti Clusters

Several hosted projects, including OSGeo, phpBB, and ECF, have their own clusters which we fully manage at the node level. It works well for them: they don’t have to worry about maintaining the virtualization cluster, while keeping the flexibility of deploying dedicated VMs on their own hardware. I’ve been recommending this direction for current projects and for new projects we get requests for. So far it seems to be working well for both the OSUOSL and the projects we host.

Image Deployment

For deployment we use ganeti-instance-image, which is something I wrote to help make deployments faster and more flexible. It uses various types of images (tarball, filesystem dump, qemu-img) to unpack a pre-made system and deploy it with networking, grub, and serial console fully functional. Creating the images is currently a manual process, but I have it semi-automated using kickstart and preseed config files so systems can be built quickly and predictably. The amazing part is deploying a fully functional VM in under one minute using ganeti-instance-image.
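To give a feel for what that looks like, a sketch of such a deployment command is below (node names, OS variant, and disk size are made up for illustration):

gnt-instance add -t drbd -n node1.example.org:node2.example.org \
  -o image+default -s 10G instance1.example.org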

Web-based Management

An upcoming tool that the OSUOSL is working on is a web-based frontend for managing Ganeti clusters called Ganeti Web Manager. It’s written using the Django framework and talks to Ganeti via its RAPI protocol. Our lead developer Peter Krenesky and many of our students have been hard at work on this project over the last month and a half.
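The RAPI is plain HTTPS + JSON, so you can poke at it with curl; for example (cluster name made up, 5080 is the default RAPI port):

curl -k https://ganeti.example.org:5080/2/instances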

Some of the goals of this project include:

  • Permission system for users and how they access the cluster(s)
  • Easy VM deployment and management
  • Console access
  • Empower VM users

We’re very close to making our first release of ganeti-webmgr which should include a basic set of features. We still have a lot to work on and I look forward to seeing how it evolves.

Building Virtual Appliances

Johan from Sizing Servers asked me if I could talk about my experiences building (virtual) appliances at their Advanced Virtualization and Hybrid Cloud seminar. Of course I said yes ..

Slides are below ... Enjoy ..

Installing Ganeti on Gentoo

Installing Ganeti is a relatively simple process on Gentoo. This post will go over the basics of getting it running. It’s based primarily on a wiki page at the OSUOSL, so check that out for more detailed instructions. I also recommend you read the upstream Ganeti docs before installing it on your own; they cover a lot more topics in detail, and this post is intended just as a diff from that documentation.

I should note that I have only installed Ganeti with KVM and have not tested it with Xen on Gentoo; I’d appreciate feedback if you have installed and used Xen with Ganeti on Gentoo. I’m also the current maintainer for Ganeti and its related packages in Gentoo.

The first step is to install a base Gentoo system using the standard profile. You can use a hardened profile, but be aware that ganeti-htools requires Haskell, which seems to have issues on hardened.

Configuring DNS

Ganeti requires the following names to resolve before you can set it up.

  • A master name for the cluster; its IP must be available (ganeti.example.org)
  • A name for each node or Dom0 (node1.example.org)
  • A name for each instance or virtual machine (instance1.example.org)
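For a test setup, /etc/hosts entries are enough to satisfy this, along the lines of the sketch below (all addresses made up); real DNS records are the better choice in production:

10.1.0.10     ganeti.example.org    ganeti
10.1.0.11     node1.example.org     node1
10.1.0.21     instance1.example.org instance1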

Kernel

DRBD is optional in Ganeti, so you can skip this step if you’re not planning on using it. DRBD was recently merged into the mainline kernel in 2.6.33; however, Gentoo’s DRBD packages do not currently reflect that. I hope to get that changed soon, but for now you have two options.

  1. Install gentoo-sources, drbd, and drbd-kernel
  2. Install gentoo-sources & enable drbd, install drbd without deps

For simplicity, I’ll describe option #2 below. Check out the wiki page for option #1.

DRBD requires that the following kernel option be enabled. Make sure you’ve rebooted into a kernel with this option before you continue.

Device Drivers --->
    <*> Connector - unified userspace <-> kernelspace linker

We recommend that you keyword both sys-cluster/drbd and sys-cluster/drbd-kernel so that you pull in the latest 8.3.x version.

echo "sys-cluster/drbd" >> /etc/portage/package.keywords
echo "sys-cluster/drbd-kernel" >> /etc/portage/package.keywords

Install DRBD.

emerge drbd

Ganeti uses DRBD in a unique way and requires the module to be loaded with specific settings. Add the autoload settings and load the module.

echo "drbd minor_count=255 usermode_helper=/bin/true" >> /etc/modules.autoload.d/kernel-2.6
modprobe drbd

If you forget this step, you will get an error similar to the one mentioned in this email thread.
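A quick sanity check that the module actually loaded; cat /proc/drbd should print the module version if everything is in place:

lsmod | grep drbd
cat /proc/drbd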

Install Ganeti

Set the appropriate USE flags. In this case we will be using kvm with drbd.

echo "app-emulation/ganeti kvm drbd" >> /etc/portage/package.use

Install Ganeti (you might need to keyword other dependencies)

emerge ganeti

Configure Networking

There are currently two methods for setting up networking: bridged or routed. I picked the bridged method, mainly because I’m familiar with the setup and it seemed the simplest.

Ideally you should have a public network that will be used for communicating with the nodes and instances from the outside, and a backend private network that will be used by ganeti for DRBD, migrations, etc. Assuming your public IP (which node1.example.org should resolve to) is 10.1.0.11 and your backend IP is 192.168.1.11, you should edit /etc/conf.d/net to look something like this:

bridge_br0="eth0"
config_eth0=( "null" )

config_br0=( "10.1.0.11 netmask 255.255.254.0" )
routes_br0=( "default gw 10.1.0.1" )

# make sure eth0 is up before configuring br0
depend_br0() {
        need net.eth0
}

config_eth1=( "192.168.1.11 netmask 255.255.255.0" )
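Then create the usual baselayout symlinks for the new interfaces and add them to the default runlevel (standard Gentoo procedure; adjust the interface names to match your hardware):

ln -s net.lo /etc/init.d/net.eth0
ln -s net.lo /etc/init.d/net.br0
ln -s net.lo /etc/init.d/net.eth1
rc-update add net.br0 default
rc-update add net.eth1 default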

You can have a more complicated networking setup using VLAN tagging and bridging but I’ll go over that in another blog post.

Set the Hostname

Ganeti is picky about hostnames, and requires that the output of hostname be fully qualified. So make sure /etc/conf.d/hostname uses the FQDN and looks like this:

HOSTNAME="node1.example.org"

NOT like this:

HOSTNAME="node1"

Configure LVM

It is recommended that you edit this line in /etc/lvm/lvm.conf:

filter = [ "r|/dev/nbd.*|", "a/.*/", "r|/dev/drbd[0-9]+|" ]

The important part is the

r|/dev/drbd[0-9]+|

entry, which will prevent LVM from scanning drbd devices.

Now, go ahead and create an LVM volume group with the disks you plan to use for instance storage. The default name that Ganeti prefers is xenvg but we recommend you choose something more useful for your infrastructure (we use ganeti).

pvcreate /dev/sda3
vgcreate ganeti /dev/sda3

Initialize the Cluster

Now we can initialize the cluster on the first node. The command below will do the following:

  • Set br0 as the primary interface for Ganeti communication
  • Set 192.168.1.11 as the DRBD IP for the node
  • Enable KVM
  • Set the default bridged interface for instances to br0
  • Set the default KVM settings to 2 vcpus & 512M RAM
  • Set the default kernel path to /boot/guest/vmlinuz-x86_64
  • Set the master DNS name to ganeti.example.org

gnt-cluster init --master-netdev=br0 \
  -g ganeti \
  -s 192.168.1.11 \
  --enabled-hypervisors=kvm \
  -N link=br0 \
  -B vcpus=2,memory=512M \
  -H kvm:kernel_path=/boot/guest/vmlinuz-x86_64 \
  ganeti.example.org

Now you have a ganeti cluster! Let’s verify everything is set up correctly:

$ gnt-cluster verify
Sun May 16 22:43:00 2010 * Verifying global settings
Sun May 16 22:43:00 2010 * Gathering data (1 nodes)
Sun May 16 22:43:02 2010 * Verifying node status
Sun May 16 22:43:02 2010 * Verifying instance status
Sun May 16 22:43:02 2010 * Verifying orphan volumes
Sun May 16 22:43:02 2010 * Verifying remaining instances
Sun May 16 22:43:02 2010 * Verifying N+1 Memory redundancy
Sun May 16 22:43:02 2010 * Other Notes
Sun May 16 22:43:02 2010 * Hooks Results

Yay!

SSH Keys

Ganeti uses ssh to run some, but not all, tasks. During initialization, it generates a new ssh key for the root user and installs it in /root/.ssh/authorized_keys. In our case we manage that file with cfengine, so to work around that we copy the key to /root/.ssh/authorized_keys2, which ssh will automatically pick up.
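In other words, something like:

cp /root/.ssh/authorized_keys /root/.ssh/authorized_keys2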

Adding another node

To add an additional node, duplicate the setup steps above, skipping the cluster initialization. Instead run the following command:

gnt-node add -s <node drbd_ip> <node hostname>
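For example, a second node following the addressing scheme above (hostname and IP invented for illustration):

gnt-node add -s 192.168.1.12 node2.example.org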

Next steps…

The next step is actually deploying new virtual machines with Ganeti. I wrote a new instance creation script called ganeti-instance-image which uses disk images for deployment. I’m currently working on a new project website with detailed documentation, and a blog post about it as well. We’re able to deploy new virtual machines (such as Ubuntu, CentOS, or Gentoo) in under 30 seconds using this method!

Power Outage: A true test for Ganeti

Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis, and our UPS & generator functioned properly in the machine room. However, an unfortunate sequence of issues caused some of our machines to go down hard, including all four of our ganeti nodes hosting 62 virtual machines. If this had happened with our old Xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back to a normal state by manually restarting each VM.

But when I checked the ganeti cluster shortly after the outage, I noticed that all four nodes rebooted without any issues and the master node was already rebooting virtual machines automatically and fixing all of the DRBD block devices. Ganeti has a nice app called ganeti-watcher which is run every five minutes via cron. It has two primary functions currently (taken from ganeti-watcher(8)):

  1. Keep running all instances as marked (i.e. if they were running, restart them)
  2. Repair DRBD links by reactivating the block devices of instances which have secondaries on nodes that have rebooted.
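The cron entry driving this is nothing fancy; it amounts to something like the following (the exact binary path and crontab location vary by distribution):

*/5 * * * * root /usr/sbin/ganeti-watcher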

The watcher app took around 30 minutes to bring all 62 VMs back online. The load on most of the nodes didn’t go over 4 during the recovery, which is quite impressive considering how much I/O it does while VMs are booting; normally the nodes have loads between 0.3 and 0.5. Only 3 VMs didn’t boot cleanly, due to incorrect fstab entries or incorrect kernel path settings in ganeti, which were easy to fix. I was surprised we didn’t have more issues like that.

While ganeti is bringing instances back online, you can tail watcher.log (generally at /var/log/ganeti/watcher.log), which will show output similar to this:

2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
2010-05-20 04:07:16,311:  pid=10202 INFO Restarting driverdev.osuosl.org (Attempt #1)
2010-05-20 04:07:18,346:  pid=10202 INFO Restarting pcc.osuosl.org (Attempt #1)

And once it’s finished, it will show output like this:

2010-05-20 04:35:04,066:  pid=22741 INFO Restart of busybox.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of driverdev.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of pcc.osuosl.org succeeded

It was great watching this system recover everything automatically, quickly and with few issues. Needless to say, outages are a bad thing, and it’s our fault that our cluster went down like this, but it was great seeing the system work nearly flawlessly. We’ll soon fix the power situation for our cluster so this shouldn’t happen again.

Take that ESX ;-)

Creating a scalable virtualization cluster with Ganeti

Creating a virtualization cluster that is scalable, cheap, and easy to manage usually doesn’t happen in the same sentence. Generally it involves a complex set of tools tied together, expensive storage, and a design that is difficult to scale. While I think the suite of tools built around libvirt is great and headed in the right direction, it’s still not quite the right tool for the job in some situations. There are also commercial solutions such as VMware and XenServer that are great, but both cost money (especially if you want cluster features). If you’re looking for a completely open source solution, then you may have found it.

Enter Ganeti, an open source virtualization management platform created by Google engineers. I had never heard of it until one of the students who works for me at the OSUOSL mentioned it while interning at Google. The goal of Ganeti is to create a virtualization cluster that is stable, easy to use, and doesn’t require expensive hardware.

So what makes it so awesome?

  • A master node controls all instances (virtual machines)
  • Built-in support for DRBD backed storage on all instances
  • Automated instance (virtual machine) deployment
  • Simple management tools all written in easy to read python
  • Responsive and helpful developer community
  • Works with both Xen and KVM

DRBD

The key feature that got me interested was the built-in DRBD support, which enables us to have a “poor man’s” SAN using local server storage. DRBD is essentially RAID1 over the network between two servers: it duplicates data between two block devices and keeps them in sync. Until recently DRBD had to be built as an external kernel module, but it was added to the mainline kernel in 2.6.33. Ganeti integrates DRBD seamlessly and requires little knowledge of the specific details of setting it up.

Centralized Instance Management

Before Ganeti, we had to look up which node an instance was located on, and it was difficult to see the cluster’s state as a whole. During a crisis we would lose valuable time trying to locate a virtual machine, especially if it had been moved because of a hardware failure. Ganeti designates one node as the master, which controls the other nodes via remote ssh commands and a RESTful API. You can switch which node is the master with one simple command, and also recover the master role if that node goes offline. All ganeti commands must be run on the master node.
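For example, checking which node currently holds the master role and failing it over are both one-liners (master-failover is run on the node being promoted):

gnt-cluster getmaster
gnt-cluster master-failover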

Ganeti currently uses command-line interactions for all management tasks. However, it would not be difficult to create a web frontend to manage it; the OSUOSL actually has a working prototype of a Django-based web frontend that we’ll eventually release once it’s out of alpha testing.

Automated Deployment

Ganeti uses a set of bash scripts to create an instance on the fly. Each set of scripts is considered an OS definition, and a debootstrap-based one is included by default. Since we use several different distributions, I decided to write my own OS definition using filesystem dumps instead of direct OS install scripts. This reduced deployment time considerably, to the point where we can deploy a new virtual machine in 30 seconds (not counting DRBD sync time). You can optionally use scripts to set up grub, serial, and networking during the deployment.

Developer Community

The developer community surrounding Ganeti is still quite small, but they are very helpful and responsive. I’ve sent in several feature requests and bug reports on their tracker and usually have a response within 24 hours, sometimes even a committed patch within 48 hours. The end users on the mailing lists are quite helpful and usually respond quickly as well. Nothing is more important to me in a project than the health and responsiveness of its community.

OSUOSL use of Ganeti

We recently migrated all of our virtual machines from Xen to Ganeti with KVM. We went from 14 blade servers and 3 disk nodes to four 1U servers with faster processors, disks, and RAM. We instantly noticed a 2 to 3 times performance boost in I/O and CPU. Part of the boost was the change in backend storage; another part was KVM.

We currently host around 60 virtual machines total (~15 per node) and can host up to 90 VMs with our current hardware configuration. Adding an additional node is a simple task and takes only minutes once the software is installed. A new server doesn’t need to have exactly the same specs, though I would recommend using at least similar types and speeds of disks and CPUs.

Summary

Ganeti is still young but has matured very quickly over the last year or so. It may not be the best solution for everyone but it seems to fit quite well at the OSUOSL. I’ll be writing several posts that cover the basics of installing and using Ganeti. Additionally I’ll cover some of the specific steps we took to deploy our cluster.
