Pigz – Shortening backup times with parallel gzip

While searching for a completely different piece of software I stumbled onto the pigz application, a parallel implementation of gzip for modern multi-processor, multi-core machines. As some of our backups have a gzip step to save space I decided to see if pigz could speed them up.

Using remarkably unscientific means (I just wanted to know if it was worth further investigation) I ran a couple of sample compression runs. The machine is a quad core Dell server, the files are three copies of the same 899M SQL dump, and the machine is lightly loaded (what load there is comes mostly from disk IO).


#######################################
# Timings for two normal gzip runs
dwilson@pigztester:~/pgzip/pigz-2.1.6$ time gzip 1 2 3

real    2m43.429s
user    2m39.446s
sys     0m3.988s

real    2m43.403s
user    2m39.582s
sys     0m3.808s

#######################################
# Timings for three pigz runs

dwilson@pigztester:~/pgzip/pigz-2.1.6$ time ./pigz 1 2 3

real    0m46.504s
user    2m56.015s
sys     0m4.116s

real    0m46.976s
user    2m55.983s
sys     0m4.292s

real    0m47.402s
user    2m55.695s
sys     0m4.256s

Quite an impressive speed up considering all I did was run a slightly different command. The post compression sizes are pretty much the same (258M when compressed by gzip and 257M with pigz) and you can gunzip a pigz'd file, and get back a file with the same md5sum.

# before compression
-rw-r--r-- 1 dwilson dwilson 899M 2010-04-06 22:12 1

# post gzip compress
-rw-r--r-- 1 dwilson dwilson 258M 2010-04-06 22:12 1.gz

# post pigz compress
-rw-r--r-- 1 dwilson dwilson 257M 2010-04-06 22:12 1.gzs
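
To put a rough number on that speed up, here's a minimal sketch that turns the first run of each tool into ratios. The `ratio` helper is a name made up for this example; the figures are the real and user times from the timings above, converted to seconds.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two durations given in seconds,
# rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# First run of each tool, converted to seconds (2m43.429s = 163.429s, etc.)
ratio 163.429 46.504    # wall-clock time, gzip relative to pigz
ratio 176.015 159.446   # user CPU time, pigz relative to gzip
```

That works out at roughly 3.5 times faster in wall-clock terms for about 10% more user CPU time, which is close to what you'd hope for from four cores.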

I'll need to do some more testing, and compare the system's performance to a normal run while the compression is happening, before I trust it in production. But the speed ups look appealing and, as it's Mark Adler code, it looks like it might be an easy win in some of our scripts.
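
As a starting point for that testing, here's a rough sketch of what a cautious backup step might look like. Everything in it is an assumption rather than something from our scripts: the `compress_and_verify` name, the thread cap of 2 and the niceness of 10 are illustrative only. It uses pigz's `-p` option to limit the number of compression threads, falls back to plain gzip if pigz isn't installed, and then checks that stock gzip can unpack the result to something byte-for-byte identical to the original (a stricter version of the md5sum comparison).

```shell
#!/bin/sh
# Hypothetical helper: compress FILE to FILE.gz, keep the original,
# then confirm the compressed copy round-trips.
# Assumption: 2 threads and niceness 10 are example values, not tuned ones.
compress_and_verify() {
    src=$1
    threads=${2:-2}
    out="$src.gz"

    if command -v pigz >/dev/null 2>&1; then
        # -p caps the number of compression threads; -c writes to stdout
        nice -n 10 pigz -p "$threads" -c "$src" > "$out" || return 1
    else
        # Fall back to plain gzip so the step still works without pigz
        nice -n 10 gzip -c "$src" > "$out" || return 1
    fi

    # Decompress with stock gzip and compare against the original
    gzip -dc "$out" | cmp -s - "$src"
}

# Example: compress_and_verify /path/to/dump.sql 2 || echo "verify failed" >&2
```

Capping the threads and lowering the priority trades some of the speed up for a lighter footprint on the box, which is the part that needs measuring before this goes anywhere near production.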


HTTP Server Headers via Cucumber

One of my little side projects is moving an old website, configured in little steps over a long period of time, from Apache 1.3 to a much more sensible Apache 2.2 server. I've been thinking about how to get the most out of the testing I need to do for the move, so today I decided to do some yak shaving: write some simple regression tests, play with Cucumber Nagios and RSpec matchers, and write a little Ruby.

It's not exactly polished but after half an hour (mostly spent wrangling with has_key / have_key) I ended up with the following simplified example for testing HTTP headers:


Feature: http://www.unixdaemon.net/ response headers
 
  Scenario: Server header should be production quality
    When I fetch http://www.unixdaemon.net/
    Then the "Server" header should be "Apache"
 
  Scenario: Response header should contain an Etag
    When I fetch http://www.unixdaemon.net/
    Then the response should contain the "Etag" header
 
  Scenario: The Content-Type header should contain text/html
    When I fetch http://www.unixdaemon.net/
    Then the "Content-Type" header should contain "text/html"
 
  Scenario: The Content-Type header should not contain text/xml
    When I fetch http://www.unixdaemon.net/
    Then the "Content-Type" header should not contain "text/xml"

You can also find the cucumber-nagios steps for testing HTTP headers online. It's only a first step towards the full web server move safety net but it's a useful one that'll stay in my toolkit.
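
For a quick check without Cucumber, the same four assertions can be roughed out in shell. This is only a sketch and not the step definitions themselves: `header_value` is a helper invented for this example, and it works on a raw header dump such as the output of `curl -sI`.

```shell
#!/bin/sh
# Hypothetical helper: print the value of header NAME from a raw header
# dump on stdin; exits non-zero if the header is missing.
header_value() {
    awk -v name="$1" '
        BEGIN { name = tolower(name) }   # header names are case-insensitive
        {
            sub(/\r$/, "")               # HTTP header lines end in CRLF
            i = index($0, ":")
            if (i > 0 && tolower(substr($0, 1, i - 1)) == name) {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)
                print value
                found = 1
                exit
            }
        }
        END { exit !found }
    '
}

# Example usage (needs network access, so left commented out):
#   headers=$(curl -sI http://www.unixdaemon.net/)
#   [ "$(printf '%s\n' "$headers" | header_value Server)" = "Apache" ]
#   printf '%s\n' "$headers" | header_value Etag >/dev/null
#   case "$(printf '%s\n' "$headers" | header_value Content-Type)" in
#       *text/html*) echo "Content-Type contains text/html" ;;
#   esac
```

It has none of the readable reporting the feature file gives you, but it's handy for poking at a single header while writing the scenarios.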


HTML & CSS – The Good Parts – Short Review

I'm guessing that if you're reading this then you've seen my very basic website at some point. I learned some HTML and CSS back when Netscape 4 and HTML 3.2 roamed the earth, and while some of my very front-end-gifted co-workers have brought bits of my knowledge up to date I still don't understand how to properly lay out a CSS-only multi-column page without cheating.

I'm not sure if it's because I had vague expectations of what this book would cover or if I'm just not the target market for HTML & CSS The Good Parts, but I've read the thing from cover to cover and nothing really stands out to me. All the right words are spoken (content vs style separation is good, etc.) but none of it feels new to me. The material is not explained in any new way that really gets the message across where other methods have failed, and I very nearly gave up on the book half a dozen times. It's not a bad or horribly written book but it's also not one I could pick three best bits out of.

Make sure you have a skim through before you buy. Score 3/10


Ada Lovelace Day – 2010

So today is Ada Lovelace Day and we're supposed to "celebrate the achievements of women in technology and science." I don't know many women in science but I do know a few in technology, and one in particular seems to go from one back-breaking task to another with a politeness and grace I wish I could muster.

So for my 2010 Lovelace Day (and because she'll need all the happy thoughts she can get now she's president of the Perl Foundation) I'm naming Karen Pauley, a long-standing member of the Perl community who's been involved in getting things done for more years than many people realise. Listing all her achievements would take a LOT of screen space (and annoy the hell out of her) but, to name three, her TPF work, YAPC::EU organisation and involvement in more related FOSS communities than you can shake a stick at are no small matter.

Speaking as someone who's seen her speak over half-a-dozen times, it's easy to see that Karen has a gift when it comes to presenting. Whether it's about technology, business or community it's rare to hear her speak and not come out feeling both smarter and entertained, a combination we'd all love to be able to pull off.

I've been lucky enough to chat with Karen outside of conferences and I've always come away from our email conversations with a smile and often with an idea or two. It's hard not to when you're speaking with someone who's both intelligent and a remarkable communicator. Karen is an exceptional person who we're lucky to have in the Perl world, and I'm very fortunate to be able to call her a friend.


Giving Cloud Computing An Edge – LOSUG March 2010

Of the user groups I go to, LOSUG seems to be the one with the least crossover of attendees. It seems to be a three part mix - Sun engineers going along to meet co-workers and get an external view of what's happening in different parts of the project, Unix people with dozens of years of experience who want something technical and interesting that matters on the server, and people that don't listen to the speaker and then ask questions that, quite frankly, they should be embarrassed over. It's hard to stress how much I've always enjoyed the talks at LOSUG but some of the questions are just... insane.

Right, now I've got that off my chest - and I'll probably get lynched for it in the future - back to the March presentation by Alasdair Lumsden. I'm not going into detail about it as you can read the Giving Cloud Computing An Edge slides yourself now. It was an interesting talk and provided a nice counterbalance to similar talks I've heard in the past about Xen and UML hosting.

What made this LOSUG different to all the others though is that things are changing. Sun's always been very supportive of LOSUG (and always willing to put their hands in their pockets for food, drink and speakers) and now that Sun is owned by Oracle the group will be less driven by the core organisers. You can find more details (and less of me putting words in people's mouths) at The Future of LOSUG, but I wanted to take this chance both to encourage people to come along and show Oracle that the group's important and to say thank you to Joy Marshall, James MacFarlane and Stuart Smith, who have month in and month out organised an excellent event with speakers you couldn't see anywhere else.


Network Ninja – Short Review

I'd never even heard of this book until Bob used its name in the same sentence as the excellent "Cisco Routers for the Desperate". However, while that book is about hands-on, practical Cisco advice, Network Ninja is all about the theory - from IP addressing to routing protocols.

While no one's ever going to confuse 200 easy to read pages with the Stevens books, this slender volume is an excellent refresher for the experienced admin who doesn't do too much to the network on a day-to-day basis, or for the less experienced admin who wants to know some of the why instead of just the command lines.

An enjoyable and opinionated book that covers a lot of ground in a low page count. Only let down by some bad editing - 7/10


LibguestFS GLLUG Talk

Over the years there have been a handful of GLLUG members who have given so many interesting talks that I'll always turn up to watch them - and Richard Jones is definitely on that short list.

The website does an excellent job of explaining: "libguestfs is a library for accessing and modifying virtual machine (VM) disk images. Amongst the things this is good for: making batch configuration changes to guests, viewing and editing files inside guests (virt-cat, virt-edit), getting disk used/free statistics (virt-df), migrating between virtualization systems (virt-p2v), performing partial backups, performing partial guest clones, cloning VMs and changing registry/UUID/hostname info, and much else besides." but it doesn't quite convey how cool it is to spin up access into a Windows machine in a handful of seconds and then dump out the registry key you're looking for - all from a Linux command line.

Oh, and even if you didn't turn up (tsk tsk) you can read all about the libguestfs gllug talk here.


The Book of Xen – Short Review

Although I've been a big fan of virtualization for many years I've mostly been a VMware man. UML was good for the time but VMware Workstation and GSX always seemed to be better solutions - and they had the benefit of dealing with Windows. At $WORK we looked at using Xen for our new development environment but it never felt very finished; little things like needing to compile your own dhcp client in order to get PXE booting working always felt very wrong.

But now that we're looking to move away from VMware Server for certain parts of our infrastructure, everything's back on the table, so I went looking for a guide through the lands of Xen in the modern world - and I think I found an excellent one in The Book of Xen.

The book takes you through all the aspects of using Xen that you'd expect, from installing it and configuring the guests (DomU in Xen terminology) to making the most out of the networking options and local storage possibilities. Where it goes that extra mile is in sections like 'Beyond Linux', which guides you through using NetBSD and Solaris with Xen, 'Profiling and benchmarking under Xen', and 'Lessons from the trenches', in which the authors (who run a Xen hosting service) tell you about their real-world aches and pains.

Apart from the chapter on the commercial Citrix XenServer, whose inclusion I can understand but which isn't useful to me, there was something interesting in every chapter. After working through the book I have a good understanding of what needs attention in a Xen hosting setup and what might be weaknesses. All I need now is a similar book for KVM so I can avoid doing all my own research!

An excellent guide to Xen that brings a lot of useful material into one place - 7/10


London DevOps – March 2010

This month was the first of the London DevOps tech talks. Organised by R I Pienaar and masterfully shepherded on the evening by Chris Read, about thirty sysadmins (and some developers, project managers and scrum masters) met for a series of impromptu discussions, beer and pizza.

While there was no formal schedule for the evening Chris led the group in a fishbowl, seeding some ideas and then watching the conversations bloom. We went through some tool chain issues, trending, log analysis, how Splunk is the best thing since sliced bread with bacon in it and how CentOS does some very interesting things with the data they collect. It was the first fishbowl I'd ever attended and it was actually a lot of fun, especially when people suggested RDF and SPARQL for a common data store.

A short break was taken when the pizza arrived and a number of interesting conversations broke out. How little admin time Apache Solr seems to need (and how odd it is to use rsync and shell scripts to sync out changes), how Redis and CouchDB are making certain problem domains easier to deal with, and how the BBC has so many cool people hidden away were among those I ambled into.

ThoughtWorks kindly donated beer, pizza and most importantly the venue - and for that we should say thank you. Getting a decent venue is always difficult for a new group. Although it's early days the group feels like it's got potential. The conversations were interesting, and while we don't all agree on where we should be heading and what we need next, the atmosphere was friendly and open. Hopefully these meets will last longer than SAGE-WISE did; with all the developer focused events in London it's nice to get to one that's a little closer to what I do.


BSD Magazine – A decent read

While looking for an OpenBSD baseball cap on the BSD stalls at FOSDEM I was given a couple of issues of the BSD Magazine to flick through - and it's a lot better than I'd hoped.

As most of the UK Linux magazines have become very desktop focused it's nice to see some actual low-level code - packaging for OpenBSD, writing sound drivers for your NetBSD NSLU2, custom Jabber components and basic GDB were all in the two issues I skimmed. While it's not the dearly departed Sysadmin Magazine, and it could do with an editor or two (much as I could), it is a decent read and I'm considering a subscription.
