Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

22 April 2010 ~ Comments Off

MAC Addresses of embedded NICs on Dell servers through DRAC

I use cobbler to provision our new Dell servers, which is great but it needs the MAC addresses of the servers to identify each machine.

Previously, I have been doing this manually:

  1. log in to the DRAC web interface
  2. launch the java console
  3. reboot the server
  4. go into the BIOS
  5. navigate to Embedded Devices
  6. manually record the MAC addresses

This takes quite a while, and is prone to error.

I recently had another 42 servers to deploy, so I looked for a way to automate this process. I found one!

I got the inkling that this should be possible because I noticed that the System | Properties | System Details page in the DRAC web interface lists the MAC addresses in the Main System Chassis section:

Embedded NIC MAC Addresses
    NIC1  Ethernet  a4:ba:db:11:38:2d
          iSCSI     00:00:00:00:00:00
    NIC2  Ethernet  a4:ba:db:11:38:2e
          iSCSI     00:00:00:00:00:00
 
So, I checked in the DRAC6 documentation for a suitable command, and found it: racadm racdump

This dumps a whole load of information about the DRAC and the attached system. However, pass the output through a simple grep and … bingo!

$ racadm -r $DRAC -u root -p calvin racdump | egrep '^MAC Address|^NIC. Ethernet'
MAC Address             = a4:ba:db:11:38:2f
NIC1 Ethernet           = a4:ba:db:11:38:2d
NIC2 Ethernet           = a4:ba:db:11:38:2e
NIC3 Ethernet           = N/A
NIC4 Ethernet           = N/A
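
To feed these addresses to a provisioning tool, the filtered output can be turned into a NIC-to-MAC map. The following is only a sketch in Ruby: the method name and the sample constant are made up for illustration, and it assumes the racdump lines look exactly like the output above.

```ruby
# Hypothetical helper: parse filtered racdump output into { "NIC1" => mac }.
MAC_RE = /\A(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\z/i

def embedded_nic_macs(racdump_output)
  macs = {}
  racdump_output.each_line do |line|
    # Lines of interest look like: "NIC1 Ethernet           = a4:ba:db:11:38:2d"
    match = line.match(/\A(NIC\d+) Ethernet\s*=\s*(\S+)/)
    next unless match
    nic, value = match[1], match[2]
    macs[nic] = value.downcase if value.match?(MAC_RE) # skips "N/A" entries
  end
  macs
end

# Sample copied from the output shown above.
SAMPLE_RACDUMP = <<~OUT
  MAC Address             = a4:ba:db:11:38:2f
  NIC1 Ethernet           = a4:ba:db:11:38:2d
  NIC2 Ethernet           = a4:ba:db:11:38:2e
  NIC3 Ethernet           = N/A
  NIC4 Ethernet           = N/A
OUT

embedded_nic_macs(SAMPLE_RACDUMP).each { |nic, mac| puts "#{nic} #{mac}" }
```

The DRAC's own "MAC Address" line is deliberately ignored, since only the embedded NICs matter for identifying the machine.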

21 April 2010 ~ Comments Off

Tracking web clients in real time

Most recently I have been working on identifying abusers of our service (spammers, crawlers, etc.) more quickly. We already have a process that rotates web logs on all web servers hourly and then processes them, extracting per-IP access info. On occasion abusers get quite aggressive and set off some of our alarms by causing an excessive number of log errors. The trouble is that because logs are processed on the hour, there is a window of up to an hour in which we may spend extra time tracking down the cause of those errors. I figured it would help if the IP tracker was real-time. Luckily we have already been using a package called Ganglia Logtailer

http://bitbucket.org/maplebed/ganglia-logtailer/

which processes our web logs every minute and publishes metrics such as the number of HTTP 200/300/400/500 hits and the average and 90th percentile response times. All I had to do was send the IP data to a storage engine of my choice. Initially I thought I could use MySQL, but I decided against it for the following reasons

  1. Currently we can get up to 2500 hits/sec, so processing them every minute would result in roughly 150k inserts (2500 × 60), which MySQL may have trouble processing in such a short amount of time.
  2. I don't need this data after a couple of hours.

I looked at Redis, which has some interesting features around sets, but I decided to use memcached since we were already using it, and if I ever wanted a more persistent storage engine I could replace it with memcachedb or Tokyo Cabinet with no changes to the code.

Implementation

Implementation consists of two pieces

1. A modified Ganglia Logtailer class that inserts data into memcached. You can find a VarnishMemcacheLogtailer class on the Bitbucket logtailer site which implements this. All you have to do is modify the location of the memcached server (set to localhost). The current implementation aggregates data per hour, i.e. all the numbers are hourly numbers. It would be trivial to do it for 10-minute or 1-minute periods.

2. A client application that displays data from memcached. I wrote a PHP interface that shows the top 20 IPs from the web servers. It can be downloaded from here

http://bitbucket.org/vvuksan/realtime-iptracker
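
The two pieces boil down to "increment a per-IP counter for the current hour" and "read the counters back and sort". Here is a rough Ruby sketch of that idea, not the actual VarnishMemcacheLogtailer or PHP code. CounterStore is an in-memory stand-in for memcached so the example runs without a server, and the key layout ("hits:<hour>:<ip>") is an assumption made for illustration. A plain key/value cache is not built for listing its keys, so the sketch carries the list of seen IPs separately.

```ruby
# In-memory stand-in for memcached (assumption: the real store offers an
# increment-style operation on a key).
class CounterStore
  def initialize
    @data = Hash.new(0)
  end

  def incr(key, by = 1)
    @data[key] += by
  end

  def get(key)
    @data.fetch(key, 0)
  end
end

# Hypothetical key layout: one counter per IP per hour.
def hour_bucket(time)
  time.getutc.strftime("%Y%m%d%H")
end

# Counts one hit per entry in ips and returns the distinct IPs seen.
def record_hits(store, ips, time)
  bucket = hour_bucket(time)
  ips.each { |ip| store.incr("hits:#{bucket}:#{ip}") }
  ips.uniq
end

# Returns [ip, count] pairs for the busiest IPs in the hour containing time.
def top_ips(store, ips, time, limit = 20)
  bucket = hour_bucket(time)
  ips.map { |ip| [ip, store.get("hits:#{bucket}:#{ip}")] }
     .sort_by { |ip, count| [-count, ip] }
     .first(limit)
end

store = CounterStore.new
now = Time.utc(2010, 4, 21, 14, 5)
seen = record_hits(store, %w[10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1], now)
top_ips(store, seen, now, 2).each { |ip, count| puts "#{ip} #{count}" }
```

Switching to 10-minute or 1-minute periods would only mean changing how hour_bucket rounds the timestamp.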

Tracker looks something like this

Update: I do realize Splunk would be great for this kind of a purpose. Trouble is that for the amount of logs we create we'd have to get a really large Splunk license and those are quite expensive.

19 April 2010 ~ Comments Off

Am I one of the 10% of programmers who can write a binary search?

I think I might be! After reading an article claiming that only 10% of programmers can write a binary search I decided to give it a shot. After about 20 minutes, I came up with this:

def bsearch(array, search, start_ix=0, end_ix=array.length)
    mid_ix = start_ix + ((end_ix-start_ix)/2).ceil
    if array[mid_ix] == search then
        puts "%d found at position %d" % [ search, mid_ix]
    elsif start_ix == mid_ix then
        puts "%d not found" % [ search ]
    elsif array[mid_ix] < search
        bsearch(array, search, mid_ix, end_ix)
    elsif array[mid_ix] > search then
        bsearch(array, search, start_ix, mid_ix)
    end
end
 
bsearch([10, 20, 30, 40, 50, 60, 70, 80, 90], ARGV[0].to_i)
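
For comparison, here is an iterative variant that returns the index instead of printing it, which makes it easier to test. This is a sketch of the textbook algorithm, not a replacement for the version above; the name binary_search is arbitrary. In real code Ruby's built-in Array#bsearch would be the usual choice.

```ruby
# Iterative binary search over a sorted array.
# Returns the index of target, or nil when it is absent.
def binary_search(sorted, target)
  low = 0
  high = sorted.length - 1
  while low <= high
    mid = low + (high - low) / 2 # integer division, no rounding needed
    case sorted[mid] <=> target
    when 0 then return mid
    when -1 then low = mid + 1   # target is in the upper half
    else high = mid - 1          # target is in the lower half
    end
  end
  nil
end

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
p binary_search(data, 70) # index of 70
p binary_search(data, 35) # nil, since 35 is not in the array
```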

It’s probably not my best work, and I’m pretty doubtful that I could have done it in the same amount of time under the stress of a job interview. But it works, and that’s good enough for me, considering that I haven’t been a full time programmer in quite some time and haven’t written anything like this since college. ;-)

16 April 2010 ~ Comments Off

Oh Yeah, I Moved To Colorado

Whoa dudes, long time no blog!

So as most of you probably already know, my girlfriend and I relocated from Connecticut to Colorado about four months ago. No, we didn’t have new jobs or even any friends out here. We were both just looking for a change, and since neither of us really knew the “right way” to make a move like this, we basically just quit our jobs, packed up our cars, and drove across the country. I know, that’s a pretty brave (and possibly stupid) thing to do in the face of a major recession, but hey, we were looking for an adventure! So how did things turn out for us? Well I’m glad you asked, because it was probably the best decision either of us ever made. ;-)

First of all, the technology scene out here simply dwarfs anything we had back home. I’d say the Denver/Boulder area is about as close as you can get to Silicon Valley without actually living there, and that’s perfectly fine by me. Although I actually considered moving to California for a long time, I know how crowded it is there, and after several years of hour-long commutes on the east coast, traffic was something I was looking to avoid as much as possible. So after living here for less than a month, I was offered the position of Chief Infrastructure Engineer at SocialMedia.com. Now I’m happy to say my commute is only about 15-20 minutes, and the entire ride looks like this! Trust me, the scenery out here looks even more amazing in real life.

But the commute isn’t the only nice thing about my new job. ;-) My offer letter mentioned an opportunity to “push the very boundary of what’s possible” and “get a chance to work side by side with some of the brightest minds around.” This has proven to be absolutely true. Today I’m doing all kinds of fun stuff with EC2 and Chef in order to automate some pretty interesting high-performance distributed systems. But after being the “go to guy” for so long, these last few months have also been a bit of a humbling experience. Guys like John De Goes, Spencer Tipping, Luke Palmer, Kris Nuttycombe, and Christian Parker have managed to make me feel stupid on a fairly regular basis, but I consider that a good thing. It’s a nice feeling to be surrounded by such brilliant people.

In other news, as positive feedback about my RHCE Exam Experience post continues to pile up in my inbox, I’ve been wondering if I should look into getting the now infamous “RHCE Cheat-Sheet” published as some sort of pocket guide. Would any publisher actually be interested in this? Would anyone out there actually buy it? And perhaps most importantly, would Red Hat try to sue me over it? ;-)

14 April 2010 ~ Comments Off

London DevOps meet 28/04/2010

I have finalized speakers for the next London DevOps get-together. I sent the mail below to the list. Looking forward to seeing everyone there!

Hello,

I am glad to announce speakers for our first meet hosted by The Guardian.

We will meet at their shiny new offices in Kings Cross, starting at 7pm. Those who went to Scale Camp will know the venue.

We have two talks of roughly 30 minutes each lined up:

We will have some time for a few lightning talks if there’s any interest before retiring to a nearby pub. If anyone has Pub suggestions please send them along.

Map and details can be found as usual at http://londondevops.org/meetings/

Thanks again to The Guardian for the venue. If anyone out there wants to sponsor some sodas or something for the venue, please get in contact.

I will try to set up some RSVP system. If you mention the meet on Twitter, please use the #ldndevops hashtag!

14 April 2010 ~ Comments Off

Xen Live Migration with MCollective

I retweeted this on Twitter, but it’s just too good not to show. Over at rottenbytes.com Nicolas is showing some proof of concept code he wrote with MCollective that monitors the load on his dom0 machines and initiates live migrations of virtual machines to less loaded servers.

This is the kind of crazy functionality I wanted to enable with MCollective and it makes me very glad to see this kind of thing. The server side and client code combined is only 230 lines – very very impressive.

This is a part of what VMware DRS does. Nico has some ideas for adding other sexy features as well, since this was just a proof of concept. The logic for what to base migrations on will be driven by a small DSL, for example.

I asked him how long it took to knock this together: the time taken to get acquainted with MCollective, combined with the time to write the agent and client, was only 2 days. That’s very impressive. He already knew Ruby well though :) and has a Ruby gem to integrate with Xen.

I’m copying the output from his code below, but absolutely head over to his blog to check it out. He has the source up there too:

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds
 
[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

14 April 2010 ~ Comments Off

Mcollective & Xen : naughty things

I already blogged about my experiments with mcollective & xen, but I had something a little bigger in mind. A friend had sent me a video showing some neat VMware features (DRS mainly) with VMs migrating between hypervisors automatically.

So I wrote a “proof of concept” of what you can do with an awesome tool like mcollective. The setup of this funny game is the following :

  • 1 box used as an iSCSI target that serves volumes to the world
  • 2 xen hypervisors (lenny packages) using open-iscsi iSCSI initiator to connect to the target. VMs are stored in LVM, nothing fancy

The 3 boxes are connected on a 100Mb network and the hypervisors have an additional gigabit network card with a crossover cable to link them (yes, this is a lab setup). You can find a live migration howto here.

For the mcollective part I used my Xen agent (slightly modified from the previous post to support migration), which is based on my xen gem. The client is the largest part of the work but it’s still less than 200 lines of code. It can (and will) be improved because all the config is hardcoded. It would also deserve a little DSL to be able to handle more “logic” than “if load is greater than foo” but as I said before, it’s a proof of concept.

Let’s see it in action :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    873.5
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  78838.0
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.3
test3                                       20   256     1     r-----     11.9

test3 is a VM that is “artificially” loaded, as is the machine “hypervisor3” (to trigger migration)

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Let’s see our hypervisors :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    878.9
test3                                       25   256     1     -b----      1.1
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  79079.3
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.4

A little word about configuration options :

  • interval : the poll time in seconds. This should not be too low: give the machine some time, and avoid letting load peaks distort the logic.
  • load_threshold : the load above which you consider the machine too busy and it is time to move some stuff away (tempered by max_over, see below)
  • daemonize : not used yet
  • max_over : maximum time (in minutes) the load may stay above the limit. When reached, it’s time, really. Don’t set it too low: use at least 2*interval, or sampling will not be efficient
  • debug : well….
  • max_vm_per_host : the maximum number of VMs a host can handle. If a host has already hit this limit it will not be a candidate for receiving a VM
  • max_load_candidate : same thing as above, but for the load
  • host_mapping : a simple CSV file to handle non-DNS destinations (typically my crossover cable addresses have no DNS entries)
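
To make the interplay of these options concrete, here is a rough Ruby sketch of the decision logic they suggest. It is not the actual client code: the class and method names and the config values are invented for illustration, and preferring the least loaded candidate is an assumption.

```ruby
# Hypothetical values; the option names mirror the list above.
CONFIG = {
  interval: 30,            # seconds between polls
  load_threshold: 1.0,     # a sample above this counts as overloaded
  max_over: 1.5,           # minutes the load may stay above the threshold
  max_vm_per_host: 4,
  max_load_candidate: 0.5
}.freeze

# Number of consecutive overloaded samples that add up to max_over minutes.
def samples_needed(config)
  (config[:max_over] * 60.0 / config[:interval]).ceil
end

# Tracks consecutive overloaded samples per host.
class OverloadTracker
  def initialize(config)
    @config = config
    @over = Hash.new(0)
  end

  # Records one load sample; returns true once the host has stayed above
  # the threshold long enough to justify a migration.
  def record(host, load)
    if load > @config[:load_threshold]
      @over[host] += 1
    else
      @over[host] = 0 # any calm sample resets the counter
    end
    @over[host] >= samples_needed(@config)
  end
end

# hosts: { "name" => { load: Float, vms: Integer } }
# Step 1 filters on max VMs, step 2 on max load; the least loaded host wins.
def pick_target(hosts, source, config)
  candidates = hosts.reject { |name, _| name == source }
                    .select { |_, h| h[:vms] < config[:max_vm_per_host] }
                    .select { |_, h| h[:load] <= config[:max_load_candidate] }
  name, _stats = candidates.min_by { |_, h| h[:load] }
  name
end

tracker = OverloadTracker.new(CONFIG)
[1.11, 1.33, 1.33].each do |load|
  next unless tracker.record("hypervisor3", load)
  hosts = {
    "hypervisor2" => { load: 0.16, vms: 0 },
    "hypervisor3" => { load: load, vms: 3 }
  }
  puts "migrate a VM from hypervisor3 to #{pick_target(hosts, 'hypervisor3', CONFIG)}"
end
```

With these made-up values, three consecutive 30 second samples above the threshold are needed before a migration is considered, which is why a single load peak does not trigger anything.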

What is left to do :

  • Add some barriers to avoid migration madness: let the load go down after a migration, and avoid migrating a VM constantly
  • Add a DSL to insert some more logic
  • Write a real client, not a big fat loop

Enjoy the tool !

11 April 2010 ~ Comments Off

The silliest poem on earth

Love your God, because he is the MAN in your deepest despair.

Love your parents, because without them you wouldn’t be here.

Love your self, for others to love you.

Love your home, because without your love it will just be a house.

Love your friends, because you trust them completely.

Love your family, because they will support you in your time of need.

Love your country, because we get our culture from our country.

Love your world, because it is your home.

Love your culture, because it is a gift from your ancestors.

11 April 2010 ~ Comments Off

Authorization plugins for MCollective SimpleRPC

Till now The Marionette Collective has relied on your middleware to provide all authorization and authentication for requests. You’re able to restrict certain middleware users from certain agents, but nothing more fine grained.

In many cases you want to provide much finer-grained control over who can do what. Some examples:

  • A certain user can only request service restarts on machines with a fact customer=acme
  • A user can do any service restart but only on machines that have a certain configuration management class
  • You want to deny all users except root from being able to stop services, others can still restart and start them

This kind of thing is required for large infrastructures with lots of admins all working on their own group of machines, where perhaps a central NOC needs to be able to work on all the machines. You need fine-grained control over who can do what, and we did not have this till now. It would also be needed if you wanted to give clients control over their own servers but not others.

Version 0.4.5 will have support for this kind of scheme for SimpleRPC agents. We won’t provide an authorization plugin out of the box with the core distribution but I’ve made one which will be available as a plugin.

So how would you write an auth plugin? First, a typical agent would be:

module MCollective
    module Agent
         class Service<RPC::Agent
             authorized_by :action_policy
 
             # ....
         end
    end
end

The new authorized_by keyword tells MCollective to use the class MCollective::Util::ActionPolicy to do any authorization on this agent.

The ActionPolicy class can be pretty simple. If it raises any kind of exception the action will be denied.

module MCollective
    module Util
         class ActionPolicy
              def self.authorize(request)
                  unless request.caller == "uid=500"
                      raise("You are not allowed access to #{request.agent}::#{request.action}")
                  end
              end
         end
    end
end

This simple check will deny all requests from anyone but Unix user id 500.

It’s pretty simple to come up with your own schemes. I wrote one that allows you to make policy files like the one below for the service agent:

policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver

This will allow user 500 to do everything with the service agent. User 502 can get the status of any service on any node. User 600 will be able to do any actions on machines with the fact customer=acme that also have the configuration management class acme::devserver on them. Everything else will be denied.

You can do multiple facts and multiple classes in a simple space separated list. The entire plugin to implement such policy controls was only 120 (heavily commented) lines of code.
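
To give a feel for how such a policy file could be evaluated, here is a small standalone Ruby sketch. It is not the actual plugin and does not use any MCollective API: the Rule struct and the method names are invented, "first matching rule wins" is an assumption, and for simplicity it supports only one fact and one class per rule (space separated lists would need a different column delimiter than the whitespace split used here).

```ruby
# One parsed policy line (hypothetical structure).
Rule = Struct.new(:verdict, :caller_id, :action, :fact, :klass)

# Returns [default_verdict, rules] for policy text in the format shown above.
def parse_policy(text)
  default = "deny"
  rules = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    if (m = line.match(/\Apolicy\s+default\s+(allow|deny)\z/))
      default = m[1]
    else
      fields = line.split(/\s+/)
      rules << Rule.new(*fields) if fields.size == 5
    end
  end
  [default, rules]
end

# Walks the rules in order; the first rule whose columns all match decides.
def authorized?(policy, caller_id:, action:, facts: {}, classes: [])
  default, rules = policy
  fact_strings = facts.map { |key, value| "#{key}=#{value}" }
  rules.each do |rule|
    next unless rule.caller_id == "*" || rule.caller_id == caller_id
    next unless rule.action == "*" || rule.action == action
    next unless rule.fact == "*" || fact_strings.include?(rule.fact)
    next unless rule.klass == "*" || classes.include?(rule.klass)
    return rule.verdict == "allow"
  end
  default == "allow"
end

POLICY_TEXT = <<~POLICY
  policy default deny
  allow   uid=500 *                    *                *
  allow   uid=502 status               *                *
  allow   uid=600 *                    customer=acme    acme::devserver
POLICY

policy = parse_policy(POLICY_TEXT)
puts authorized?(policy, caller_id: "uid=502", action: "status")
puts authorized?(policy, caller_id: "uid=502", action: "restart")
puts authorized?(policy, caller_id: "uid=600", action: "restart",
                 facts: { "customer" => "acme" }, classes: ["acme::devserver"])
```

An authorize method along the lines of the ActionPolicy class above could then raise whenever such a check returns false.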

I think this is an elegant and easy to use layer that provides a lot of functionality. We might in future pass more information about the caller to the nodes. There are some limitations, specifically that the source of the caller information is essentially user provided, so you need to keep that in mind.

As mentioned this will be in MCollective 0.4.5.

09 April 2010 ~ Comments Off

Devops homebrew

There has been quite a bit of discussion about Devops and what it means. @blueben has suggested we start a Devops patterns cookbook so people can learn what worked or didn't work. This is the description of the environment we implemented at a previous job. Some of these things may or may not work for you. I will try to keep it short.

Environment background

7 distinct applications/products that had to be deployed and tested, e.g. base/core application, messaging platform, reporting app, etc. All applications were Java-based, running on either Tomcat or JBoss.

Application design for deployment

These are some of the key points

  1. The application should have sane default configuration options. Any option should be overridable by an external file. In most cases you only need to override database credentials (host, username, password). The goal is to be able to use the same binary across multiple environments.
  2. The application should expose key internal metrics. For instance, we asked for a web page of simple key/value pairs, e.g. JMSenqueue=OK. This is important because there are lots of things that can break inside the application which external monitoring may miss, such as a JMS message that can't be enqueued.
  3. Keep release notes actions to a minimum. Release notes are often not followed or only partially followed, so make sure point 1 is followed and/or try to automate everything else.
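
Points 1 and 2 can be illustrated with a few lines of code. The applications in question were Java, so the Ruby below is only a language-neutral sketch of the idea; the option names, the override file format (key=value lines) and the metric names other than JMSenqueue are assumptions.

```ruby
# Built-in defaults shipped inside the binary (hypothetical option names).
DEFAULTS = {
  "db.host" => "localhost",
  "db.username" => "app",
  "db.password" => "",
  "jms.queue" => "events"
}.freeze

# Parses "key=value" lines, ignoring blank lines and "#" comments.
def parse_overrides(text)
  text.each_line.with_object({}) do |line, out|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2)
    out[key.strip] = value.to_s.strip
  end
end

# Defaults first, then whatever the external file overrides.
def effective_config(defaults, override_text)
  defaults.merge(parse_overrides(override_text))
end

# Renders internal health checks as a plain key/value page.
def metrics_page(metrics)
  metrics.map { |key, value| "#{key}=#{value}" }.join("\n")
end

overrides = <<~CONF
  # QA staging database
  db.host=qa-db01
  db.password=s3cret
CONF

config = effective_config(DEFAULTS, overrides)
puts config["db.host"]   # overridden by the external file
puts config["jms.queue"] # default kept
puts metrics_page({ "JMSenqueue" => "OK", "JDBCconnections" => 12 })
```

Because only the override file differs between environments, the same archive can be deployed to QA, staging and production unchanged.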

Continuous Integration

We used CruiseControl for Continuous Integration. It was used solely to make sure that someone didn't break the build.

Creating releases

Developers are in charge of building and packaging releases. This is primarily because QA or Ops will not know what to do if a build fails (this is Java, remember). Each release has to be clearly labeled with the version and tagged in the repository. For example, Location 1.1.5 will be packaged as location-1.1.5.tar.gz. Archives should contain only WAR (Tomcat) or EAR (JBoss) files and DB patch files. Releases are to be deposited into an appropriate file share, e.g. /share/releases/location.

Deployment

In order to eliminate most manual deployment steps and support all the different applications we decided to write our own deployment tool. First we started off with a data model which roughly broke down to

  1. Applications – can use different app server containers, e.g. Tomcat/JBoss, and may/will have configuration files that can be either key/value pairs or templates. For every application we also specified a start and stop script (hot deploy was not an option due to bad experiences with our code).
  2. Domains/Customers – we wanted a single dashboard that would allow us to deploy to multiple environments, e.g. QA staging (current release), QA development (next scheduled release), Dev playbox, etc. Each of these domains had its own set of applications it could deploy, with its own configuration options.

First we wrote a command line tool that was capable of doing something like this

$ deployer --version 1.2.5 --server web10 --domain joedev --app base --action deploy

What this would do is

  1. Find and unpack the proper app server container e.g. jboss-4.2.3.tar.gz
  2. Overlay WAR/EAR files for the named version e.g. base-1.2.5.tar.gz
  3. Build configuration files and scripts
  4. Stop the server on the remote box (if it's running)
  5. Rsync the contents of the packaged release
  6. Make sure Apache AJP proxy is configured to proxy traffic and do Apache reload
  7. Start up the server
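
The steps above can be sketched as a "dry run" plan builder. This is not the original deployer, just a Ruby illustration that returns the ordered steps as text without executing anything. The method name is invented, and the container default simply reuses the jboss-4.2.3 example from the list.

```ruby
# Builds the ordered list of steps for one deployment (nothing is executed).
def deploy_plan(app:, version:, server:, domain:, container: "jboss-4.2.3")
  [
    "unpack #{container}.tar.gz",
    "overlay #{app}-#{version}.tar.gz",
    "build config files and scripts for domain #{domain}",
    "stop app server on #{server} if running",
    "rsync packaged release to #{server}",
    "configure Apache AJP proxy and reload Apache on #{server}",
    "start app server on #{server}"
  ]
end

# Because the plan is plain data, batching over many servers is a simple loop.
%w[web10 web11].each do |server|
  puts "== #{server}"
  deploy_plan(app: "base", version: "1.2.5", server: server, domain: "joedev")
    .each_with_index { |step, i| puts "#{i + 1}. #{step}" }
end
```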

One of the main reasons we started off with a command line tool is that we could easily write batch scripts to upgrade a whole set of machines. This was born out of the pain of having to upgrade 200 instances via a web GUI at another job.

Once deployer was working we wrote a web GUI that interfaced with it. You could do things like View running config (what config options are actually on the appserver), Stop, Restart, Deploy (particular version), Reconfig (apply config changes) and Undeploy. We also added the ability to change or add configuration options to the application specific override files. A picture is worth a thousand words. This is a tiny snippet of approximately how it looked for one domain

This was a big win since QA or developers no longer needed to have someone from ops deploy software.

DB patching

Another big win was "automated" DB patching. Every application would have a table called Patch with a list of DB patches that were already applied. We also agreed that every app would have a dbpatches directory in the app archive which would contain a list of patches named with the version and the order in which they should be applied e.g.

  • 2.54.01-addUserColumn.sql
  • 2.54.02-dropUidColumn.sql

During deployment the startup script would compare the contents of the Patch table with the list of dbpatches and apply any missing ones. If a patch script failed, an e-mail would be sent to the QA or dev in charge of the particular domain.
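
The comparison step can be sketched in a few lines of Ruby. This is an illustration rather than the original startup script: the method names are invented, the third patch name in the usage example is made up, and the block passed to apply_pending stands in for whatever actually runs the SQL.

```ruby
# Sort key for names like "2.54.01-addUserColumn.sql": the dotted numeric
# prefix is compared number by number, so 2.54.10 sorts after 2.54.02.
def patch_order(name)
  name.split("-", 2).first.split(".").map(&:to_i)
end

# applied:   patch names already recorded in the Patch table
# available: file names found in the release's dbpatches directory
def pending_patches(applied, available)
  (available - applied).sort_by { |name| patch_order(name) }
end

# Applies pending patches in order and stops at the first failure.
# Returns [patches_applied_now, failed_patch_or_nil]. The block should
# return true when the patch succeeded.
def apply_pending(applied, available)
  done = []
  pending_patches(applied, available).each do |patch|
    return [done, patch] unless yield(patch)
    done << patch
  end
  [done, nil]
end

applied = ["2.54.01-addUserColumn.sql"]
available = [
  "2.54.10-addIndex.sql", # hypothetical later patch
  "2.54.01-addUserColumn.sql",
  "2.54.02-dropUidColumn.sql"
]

done, failed = apply_pending(applied, available) do |patch|
  puts "applying #{patch}"
  true
end
puts "failed: #{failed}" if failed
```

A non-nil second return value is the point at which the e-mail to the QA or dev in charge would be sent.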

A slightly modified process was used in production to try to reduce downtime, e.g. things like adding a column could be done at any time. The automated process was largely there to make QA's job easier.

QA and testing

When a release was ready QA would deploy the release themselves. If there was a deployment problem they would attempt to troubleshoot it themselves and then contact the appropriate person. Most of the time it was an app problem, e.g. a particular library didn't get committed. This was a huge win since we avoided a lot of "waterfall" problems by allowing QA to serve themselves.

Production

The production environment was strictly controlled. Only ops and a couple of key engineers had access to it. The reason was that we tried to keep the environment as stable as possible, so ad hoc changes were frowned upon. If you needed to make a change you would either have to commit a change into the configuration management system (puppet) or use the deployment tool.

Production deployment

The day before the release QA would open up a ticket listing all the applications and versions that needed to be deployed. On the morning of the deployment (that was our low time) someone from ops, someone from development and the whole QA team would deploy the app and resolve any observed issues.

Monitoring

Regular metrics such as CPU utilization, load etc. were collected. In addition we kept track of internal metrics and set up adequate alerts. This is an ongoing process since over time you discover what your key metrics are and what their thresholds are, e.g. number of threads, number of JDBC connections etc.

Things that didn't work so well or were challenging

  1. One of the toughest parts was getting developers' attention to add "goodies" for ops. Specifically, exposing application internals was often put off until eventually we would have an outage, and the lack of the metric resulted in an extended outage.
  2. The deployment tool took a couple of tries to get right. Even as it was, there were a couple of things I would have done differently, e.g. not relying on a relational database for the data model, since it made it difficult to create diffs (you had to dump the whole DB). I'd likely go with JSON so that diffs could be easily reviewed and committed.
  3. Other issues I can't recall right now :-)

Wrapup

This is the shortest description I could write. There are a number of things I glossed over and omitted so that this is not too long. I may write about those on another occasion. Perhaps the key takeaway should be that Ops should focus on developing tools that either automate things or allow its customers (QA, dev, technical support, etc.) to serve themselves.