Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

01 June 2010 ~ Comments Off

PuppetCamp Europe 2010

Last week was pretty heavy on conferences for me. On Wednesday I gave my Building Virtual Appliances talk at the Sizing Servers event on Advanced Virtualization and Hybrid Cloud Computing, but the most important part of the week was the first edition of PuppetCamp Europe.

When the first ideas about PuppetCamp Europe came up, I asked Luke when and where it would be held. He replied that I should know, as I was supposed to organise it... I thanked him for the honour, he went on to ask Patrick, and Patrick accepted... I hope I helped him out enough :) I even handed out personal invitations to some of the most famous configuration management people on this planet, and Inuits sponsored the event too.

Luke started with the opening talk, covering the past and future of Puppet, version numbers (2.6 does sound familiar and stable, doesn't it?) and forge.puppetlabs.com.
During @puppetmasterd's talk, @kartar played Bugmaster, which was great and almost realtime.

The real fun started with the Open Spaces... After everybody had introduced themselves (a mix of usual suspects, first-timers and oldskoolers from IRC #puppet who finally got faces), different sessions were proposed, ranging from Puppet 101, Alternative Puppet Architectures, Puppet HA and MultiMaster Puppet to Dating for PuppetMasters.

Over the two days of open space sessions, different ideas came up on, for example, how to scale Puppet. Several people are letting their Puppet clients run from cron in batches, but probably the weirdest idea I heard was to run Puppet on JRuby in order to speed it up.
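
To make the cron-in-batches idea a bit more concrete, here is a minimal sketch of one way it could be done; it is not something that was shown at the camp. The helper names are made up for illustration, and the puppetd command line is only an assumed example of what a site might run from cron.

```ruby
require "digest"

# Hypothetical helper: derive a stable minute-of-the-hour from the hostname,
# so that clients started from cron are spread over the hour instead of all
# hitting the puppetmaster at the same moment.
def cron_minute_for(hostname, slots = 60)
  Digest::MD5.hexdigest(hostname).to_i(16) % slots
end

# Hypothetical helper: build the crontab line for one host.
def cron_line_for(hostname, command)
  "#{cron_minute_for(hostname)} * * * * #{command}"
end

# Assumed example command; adjust to whatever your clients actually run.
puts cron_line_for("web01.example.com", "/usr/sbin/puppetd --onetime --no-daemonize")
```

The same hostname always lands in the same slot, so the schedule stays predictable without anyone maintaining a list of batches by hand.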

There was lots of talk on certificates and how to solve the pains with them, e.g. in an HA setup, where you need to create an authority chain. There was also talk about having a --trust-my-network feature that would disable certificates. Luke was open to accepting such a patch, or a patch that would make the whole certificate setup more pluggable. That would for sure be a feature a lot of people would want to use.

The Thursday evening conference dinner was "Stoofvlees met Frieten" for most of us, but for me it was a London Devops Curry in Gent, with @unixdaemon, @ripienaar and some others ;)

But with lots of interesting chatter, free beer and free ice cream, there's for sure going to be another similar event in Europe next year.

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/1007

01 June 2010 ~ Comments Off

Call For Abstracts : NLUUG Fall Conference on Security

For all the security experts: the NLUUG has published its Call For Abstracts for its Fall conference. As you might have guessed, the topic is Security; we welcome all abstracts tackling security in a broad sense.

Possible topics include:

* cloud security
* online privacy
* rfid hacking
* secure programming
* program analysis tools
* web services security
* web browser security
* embedded hardware hacking
* incident response and forensics
* malware and rootkits
* responsible disclosure
* legal response
* fighting spam
* patch policies
* identity management
* central point of administration
* DNSSEC
* VPN based WANs
* etc.

The NLUUG fall conference is scheduled on 11 November 2010 in De Reehorst in Ede, the Netherlands.

Hint.. maybe a talk on secdevops would be welcomed too :)

Disclaimer: I'm on the program committee

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/1006

01 June 2010 ~ Comments Off

xdotool 2.20100601 release

Thanks to some early testing and feedback from the previous xdotool release, I've put together a new release that fixes a few minor problems.

Download: xdotool-2.20100601.2912.tar.gz

As usual, if you find problems or have feature requests, please file bugs or send an email to the list.

Changelist since previous announcement:

2.20100601.*:
  - Add --sync and --clearmodifiers support to mousemove_relative
  - Fix bug in mousemove_relative --polar (Reported by Paul S via mailing list)
  - Change polar coordinates to be 'north'-oriented (0 is up, 90 is right...)
    (Requested by Paul S via mailing list)
  - Changed xdotool search flags. '--title' now means '--name' to match the
    window name (shown in the window manager title bar for the window).
    Related: http://code.google.com/p/semicomplete/issues/detail?id=33
    --title still works, but you will get a warning about deprecation.
  - Walked through all commands and tried to make sure the manpage reflects
    reality and has more detail where needed.

2.20100525.*:
  - Skip certain tests when the requirements aren't met (ie; no such window manager, etc)
    Reported by Daniel Kahn Gillmor.

27 May 2010 ~ Comments Off

Puppet and policy – violator or enforcer?

A common challenge for an organisation running Puppet is how to balance the desire for a fully automated and standardised environment with the risk that automated Puppet runs may introduce bugs or revert hot fixes. This concern was apparent at PuppetCamp this morning, when Rafael Brito from the New York Stock Exchange gave an informative presentation about his experience of using Puppet to build machines for their live platform. What particularly struck me was that although his team puts a lot of effort into creating a standard environment, the current culture is that operations teams on the ground can and should make live changes to boxes, and that these changes may not ever make it back into Puppet.

I asked Rafael how frequently Puppet runs on the live machines, to ensure the state of each machine is kept the same, and according to standards. He told me ‘once a quarter’. I think it’s fair to say that in a context such as this, Puppet is not really being used as a config management tool - it’s being used as part of the build process to produce a standard image, which is then being managed in the traditional way.

I fully understand the motivation behind this approach. This is a very high profile application, and there’s a worry that mistakes in the Puppet manifest could accidentally be rolled out to the live site and cause a massive problem. Their situation is also complicated by having a large, multi-tiered operations team, across several countries, many of whom don’t know how to use Puppet. The approach they have settled on is to allow engineers to make changes to the live site, but to be aware that these machines will effectively be refreshed every quarter, and so there’s a risk that these changes may be lost. This places the burden of maintaining the standard on the team writing and maintaining the Puppet manifests, who have to ensure that changes made by the operations team are folded in.

The trouble with this approach is that it means that the de facto standard is always the current state of the machines, as modified by the operations team. If the Puppet run undoes some fixes applied by the operations team, Puppet is placed in the position of standards violator - that’s not a great place to be.

One issue we often come across with clients who have started to use Puppet occurs when a change rolled out by Puppet breaks the system. In this situation Puppet advocates are in a weak negotiating position - we can argue that the changes should have been made in Puppet, but when the site is down, and money is being lost, somehow that argument doesn’t win much support. The fact is that when a mistake is made, Puppet gets blamed - it broke the site. Sadly this can even result in pressure to stop using this unstable, unreliable tool.

I’d like to turn this on its head. We all agree that we need a standard or set of standards to which the live site must adhere. Let’s make Puppet the enforcer of this standard, and never the violator. This standard can be designed, tested, approved and signed off. This is the standard - we don’t diverge from it. Now we can set up a mechanism for testing the site against the standard, so we know if the standard has ever been broken.

A great way to do this is simply to run Puppet in noop mode, so it doesn’t make the changes, but simply reports what changes it would make if it were to run in live mode. If our standard is being adhered to, Puppet should usually report that it wouldn’t make a change. If Puppet reports that it would make a change, this should only ever be because that change has been approved by, for example, a change advisory board. This mechanism, therefore, will alert us as to whether the machine is out of sync with the standard, what changed, and how Puppet intends to revert the system to the agreed standard. Running this process with reasonable frequency will give us a pretty granular report of when changes were made, and could even be tied into system logs to identify the most likely source of the change. The output of the process could be parsed and monitored, alerts raised to senior stakeholders, and email reports sent out, detailing the change that has occurred.
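
As a rough illustration of parsing and monitoring that output, here is a minimal Ruby sketch. It assumes that every pending change in a noop run is logged on a line ending in "(noop)"; the sample log, the helper names and the exit-status convention are made up for the example, and the exact line format will differ between Puppet versions.

```ruby
# Made-up sample of noop output, only for illustration.
SAMPLE_LOG = <<~LOG
  notice: //ntp/File[/etc/ntp.conf]/content: is {md5}aaa, should be {md5}bbb (noop)
  notice: Finished catalog run in 1.42 seconds
LOG

# Collect the changes Puppet says it *would* make.
# Assumption: such lines end in "(noop)" and look like
# "notice: <resource path>: <message> (noop)".
def pending_changes(output)
  output.each_line.map(&:strip).select { |line| line.end_with?("(noop)") }.map do |line|
    resource, message = line.sub(/\Anotice:\s*/i, "").split(": ", 2)
    { resource: resource, message: message.to_s.sub(/\s*\(noop\)\z/, "") }
  end
end

# Hypothetical convention: non-zero status when drift was found, so that a
# monitoring system can raise an alert.
def drift_status(changes)
  changes.empty? ? 0 : 2
end

changes = pending_changes(SAMPLE_LOG)
changes.each { |c| puts "DRIFT #{c[:resource]}: #{c[:message]}" }
puts "status: #{drift_status(changes)}"
```

In practice the input would come from the log of a scheduled noop run rather than a constant, and the status would be handed to whatever monitoring or mail system is already in place.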

This way we get to play the role of enforcer - we can say: Hey look - this change has happened - we can change it back again, and we should, but we need to find who made the change, why they made it, make it in Puppet, then back it out and apply it properly. We then need to identify and educate the policy breakers, and find out what happened.

This approach, I think, walks the line between the kind of careful conservatism that a production site needs, and the desire to make use of the power of Puppet to guarantee a consistent environment.

Of course this approach will also catch the other risk - the risk that someone has committed a change to Puppet which may get rolled out to live machines when not wanted. Again, there needs to be a policy to protect against this. Puppet changes in a live environment of this nature should not be made unless tested. This means that Puppet changes should be made in a testing branch, confirmed against a test environment, and only merged into the production repository when the testing has been completed to everyone’s satisfaction, and, in some environments, only rolled out following the appropriate change control mechanism. An hourly noop run, monitored, would immediately alert us if someone had managed to get a change into the live Puppet manifest without following the correct procedure.

Of course not running the puppet daemon automatically brings with it a different set of management challenges - such as ensuring all machines are up to date, and how to minimise the time taken to bring the machines into sync. My answer to this is to orchestrate your puppet clients from a central location, rather than to run your puppet clients in daemon mode. I’ll cover this in a future article.

27 May 2010 ~ Comments Off

Devops homebrew part deux

This is the second part to the devops homebrew post.

I forgot a couple of things in my first post, so here are a few more observations.

Change is an ongoing process

All the changes I talked about in the first post took a long time. It took more than a year to get issues assessed, discussed, designed, implemented and tested, so don't expect quick progress. It's like open heart surgery, where you don't have time to stop everything and start from scratch.

No hardcoded paths

Perhaps this one should be obvious, but it is really important to make the app relocatable, i.e. the app should assume all the files it needs are within its container. This means that every file reference should be relative to the base container directory, e.g. all the WARs and configuration files should be placed in /run/base and the startup script would pass that as a variable, i.e. -DBASEDIR=/run/base. The application should then use BASEDIR instead of /run/base.
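
The same idea, sketched in Ruby rather than as a Java system property. BASEDIR is the name used above; reading it from the environment, the fallback to the current directory, the app_path helper and the file names are all assumptions for illustration.

```ruby
# Sketch: resolve every file the app needs relative to one base directory.
# Assumption: the startup script exports BASEDIR; fall back to the current
# directory so the app still starts when it is not set.
BASE_DIR = ENV.fetch("BASEDIR", Dir.pwd)

# Hypothetical helper: build an absolute path underneath the base directory.
def app_path(*parts, base: BASE_DIR)
  File.expand_path(File.join(*parts), base)
end

# Illustrative file names only.
puts app_path("conf", "app.properties")
puts app_path("webapps", "app.war")
```

Because nothing outside app_path knows where the base directory is, moving the application is a matter of starting it with a different BASEDIR.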

Tools, tools, tools

One of the critical operations responsibilities is providing and building tools for use by other groups such as technical support, development, QA etc. This goes beyond using configuration management and deployment tools; it also means building tools that enable other groups to do their jobs more effectively. For instance, at one job we used to interface with hundreds of external LDAP/IMAP sources for authentication/authorization purposes. This was fraught with problems, since the operators of these services would often e.g. misconfigure firewalls (not whitelist the right IP), have expired or self-signed SSL certificates, use wrong LDAP base DNs etc. This would chew up a lot of professional services, dev and ops time, since looking at the application logs often gave incomplete answers. It could also take a couple of iterations to fix the problem, chewing up even more time. We ended up building a simple web page that enabled professional services to quickly validate the service, i.e. does DNS resolve, can I open a TCP connection to the target port, is the SSL certificate expired etc. This greatly reduced workload and time to resolution. In another job, technical support would often need production settings but, for compliance reasons, couldn't have unfettered access to the systems. For them we built a web app that allowed a read-only view of the needed settings. I'm sure you can think of other cases where a little automation can yield huge efficiencies.
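
The checks behind such a validation page can be quite small. Below is a minimal sketch using only the Ruby standard library; it is not the tool we built, and the function names, the three-second timeout and the example host are assumptions for illustration.

```ruby
require "resolv"
require "socket"
require "openssl"

# Does the hostname resolve to at least one address?
def dns_resolves?(host)
  !Resolv.getaddresses(host).empty?
end

# Can we open a TCP connection to host:port within the timeout?
def tcp_open?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Is the certificate presented by host:port past its notAfter date?
# Verification is switched off on purpose, so that expired or self-signed
# certificates can still be fetched and inspected.
def cert_expired?(host, port, now = Time.now)
  tcp = Socket.tcp(host, port, connect_timeout: 3)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = host
  ssl.connect
  ssl.peer_cert.not_after < now
ensure
  ssl&.close
  tcp&.close
end

# Combine the cheap checks into one report for a given service.
def check_service(host, port)
  { dns: dns_resolves?(host), tcp: tcp_open?(host, port) }
end

# Example call (hypothetical host):
#   p check_service("ldap.example.com", 636)
#   p cert_expired?("ldap.example.com", 636)
```

Wrapping these in a web page mostly means rendering the resulting hash, which is what lets people outside ops run the checks themselves.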

Use underpowered QA environments

This may be controversial, since lots of people are of the opinion that QA should be as close as possible to an exact replica of production. This is true if you are doing performance tests; however, if you have an underpowered environment, some issues are likely to crop up that otherwise wouldn't. It is very hard to simulate production load, so having underpowered environments gives you valuable data points. For example, our primary QA environment ran on a couple of virtualized servers with modest disk space allocations, i.e. 10 GB. On more than one occasion we caught serious code deficiencies when the growing query log (turned on in QA) triggered low disk space alerts. If we had had bigger disks we might have missed these. This doesn't preclude having a separate environment just for running performance tests; just use the underpowered environment for everything else.

Dev vs ops

There is often conflict between dev and ops due to stereotypes, poor communication and, very often, misaligned business goals. For instance, I have very often seen and experienced conflict with devs when they were under intense pressure to deliver a feature on a tight deadline. This often happens in startups that cater to large businesses, universities or government organizations, where a large sales deal is contingent on a particular feature being implemented. It leads to poor implementation, poor QA, production issues etc., which, coupled with poor division of labor, causes frustration and resentment. Being woken up numerous times in the middle of the night due to a production issue quickly wears people out. Therefore it is important to strike a balance between ops and dev goals and overall business goals.

One possible approach is to get together and discuss the following issues:

  • Ops, dev and QA should jointly assess new product functionality and how it affects each of these groups. Very often product management and sales and marketing will discuss new features only with dev who may not appreciate the difficulty of certain ops decisions.
  • Division of responsibility - discuss whose responsibility it is to fix things when they break. There is a spectrum here, from ops doing first-level troubleshooting and then handing off to developers, to developers running and deploying in production with ops in a supportive role, running the services and tools that enable the application.
  • Off hours coverage - this is probably the most contentious one, since no one likes being woken up at night; however, developers should be on the hook for "pager duty". It doesn't have to be regularly, but at least once in a while. That is really the only way for them to walk in ops' shoes. For some organizations this may be a non-issue since their stuff never breaks in off hours ;-) .
  • Ops should involve devs in running production by educating them about the monitoring and performance gathering systems, so that they can see the effect of their coding first hand. For instance, you can implement "monitoring duty", where each week someone different from either the dev or the ops team is tasked with reviewing performance metrics, looking for things that are out of whack.
  • Discuss how you can make each other's lives easier. There are always areas where you can complement each other's skills and create something that helps everyone.
  • Most important: don't forget that a dose of humility goes a long way :-) .

27 May 2010 ~ Comments Off

Building Virtual Appliances

Johan from Sizing Servers asked me if I could talk about my experiences building (virtual) appliances at their Advanced Virtualization and Hybrid Cloud seminar. Of course I said yes.

Slides are below ... Enjoy ..

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/1005

26 May 2010 ~ Comments Off

A Chef Definition for Managing Iptables Rules

I wrote a Chef definition for managing Iptables rules a while back, but until now it has only existed publicly as a gist on GitHub. I figured I’d post it here as an example of how to write your own Chef definitions.

The first thing I did was create an iptables cookbook with a default recipe that simply ensures that the iptables package is installed:

package "iptables" do
  package_name "iptables"
end

Next, I used the definitions documentation on the Opscode wiki to write the actual definition. What I’m basically doing here is accepting a few Iptables-specific parameters (table, chain, options) and dynamically building the execute resources to run the appropriate Iptables commands:

# Adds (default) or deletes a single iptables rule. Every rule is tagged
# with a comment so later Chef runs can tell whether it is already present.
define :iptables_rule, :action => :create, :table => "filter", :chain => nil, :options => nil do

  include_recipe "iptables"

  # All three parameters are required; :table defaults to "filter".
  if params[:table].nil? or params[:table].empty?
    raise ArgumentError, "Missing required argument: table"
  end
  if params[:chain].nil? or params[:chain].empty?
    raise ArgumentError, "Missing required argument: chain"
  end
  if params[:options].nil? or params[:options].empty?
    raise ArgumentError, "Missing required argument: options"
  end

  iptables_bin = "/sbin/iptables"
  # The comment identifies the rule in the output of `iptables -S`.
  rule_id = "Chef Rule: #{params[:name]}"
  comment = "-m comment --comment \"#{rule_id}\""

  if params[:action] == :delete
    # Only delete the rule if it is currently present.
    execute "delete_iptables_rule-#{params[:name]}" do
      command "#{iptables_bin} --table #{params[:table]} -D #{params[:chain]} #{params[:options]} #{comment}"
      only_if "#{iptables_bin} --table #{params[:table]} -S #{params[:chain]} | /bin/grep \"#{rule_id}\""
    end
  else
    # Only append the rule if it is not there yet, so runs stay idempotent.
    execute "create_iptables_rule-#{params[:name]}" do
      command "#{iptables_bin} --table #{params[:table]} -A #{params[:chain]} #{params[:options]} #{comment}"
      not_if "#{iptables_bin} --table #{params[:table]} -S #{params[:chain]} | /bin/grep \"#{rule_id}\""
    end
  end
end

So now when I want to manage an Iptables rule with Chef, I can just do something like this:

iptables_rule "jetty_port_80" do
  table "nat"
  chain "PREROUTING"
  options "--proto tcp --dport 80 --jump REDIRECT --to-port 8080"
end
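
Removing the rule again should work along the same lines. The following is an untested sketch that relies on the :action parameter of the definition above and assumes the remaining parameters are repeated exactly as they were when the rule was created:

```ruby
# Sketch: remove the rule created above. The table, chain and options must
# match the original rule, because they are passed straight to `iptables -D`.
iptables_rule "jetty_port_80" do
  action :delete
  table "nat"
  chain "PREROUTING"
  options "--proto tcp --dport 80 --jump REDIRECT --to-port 8080"
end
```

Since the definition looks for the "Chef Rule: jetty_port_80" comment before deleting, running this against a host where the rule is already gone should simply do nothing.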

25 May 2010 ~ Comments Off

Hardening Apache – Short Review

I've had Hardening Apache sitting on my shelves for over five years (Sep 2004 or so, Amazon tells me). While I can remember dipping into it for the Apache chroot chapter, it never seemed to progress to the top of the pile, and now that I'm cleaning out a lot of my old books I decided to finally give it a chance.

The book is very well written and covers a good range of subjects, from building Apache from source to adding extra security modules and checking its running state. Those are all good points, and if I'd read the book when it came out I'd have given it a very decent score. Unfortunately, I waited to read it.

This is a book that hasn't aged well. The version numbers of Apache mentioned, the last update times of the modules (and how many of them have fallen into the pit of being unmaintained) and the general style of the shell scripts all come across as very dated and prevent me from recommending this book.

Well written but ravaged by time - where's the second edition?
