
Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

03 January 2012

RESTful way to manage your databases

I have a need in my development environment to easily create/drop MySQL databases and users. Initially I was gonna implement a simple hacky HTTP GET method but was dissuaded by Ben Black from doing so. He suggested I write a proper RESTful interface. Without further ado, I present to you dbrestadmin:

https://github.com/vvuksan/dbrestadmin

It is my first foray into writing RESTful services, so things may be rough around the edges. However, it allows you to do the following:

  • manage multiple database servers
  • create/drop databases
  • list databases
  • create/drop users
  • list users
  • give user grants
  • view grants given to the user
  • view database privileges on a particular database given to a user

For example, to create a database called testdb on database server ID 0, use this cURL command:

curl -X POST http://myhost/dbrestadmin/v1/databases/0/dbs/testdb

Create a user test2 with password test:

curl -X POST "http://localhost:8000/dbrestadmin/v1/databases/0/users/test2@localhost" -d "password=test"

Give the test2 user all privileges on testdb:

curl -X POST "http://localhost:8000/dbrestadmin/v1/databases/0/users/test2@localhost/grants" -d "grants=all privileges&database=testdb"
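The same three calls can also be made from Ruby with the standard library's Net::HTTP. This is a minimal sketch that only builds the requests, so they can be inspected without a running server; the base URL reuses the host, port and /v1/ path from the curl examples, and dbrestadmin_post is a hypothetical helper name, not part of dbrestadmin.

```ruby
require "net/http"
require "uri"

# Assumed base URL, taken from the curl examples (database server ID 0).
BASE = "http://localhost:8000/dbrestadmin/v1/databases/0"

# Build (but do not send) a form-encoded POST, like `curl -X POST ... -d ...`.
def dbrestadmin_post(path, form = nil)
  req = Net::HTTP::Post.new(URI("#{BASE}/#{path}"))
  req.set_form_data(form) if form # URL-encodes the fields into the body
  req
end

requests = [
  dbrestadmin_post("dbs/testdb"),
  dbrestadmin_post("users/test2@localhost", "password" => "test"),
  dbrestadmin_post("users/test2@localhost/grants",
                   "grants" => "all privileges", "database" => "testdb")
]

requests.each { |r| puts "POST #{r.path} #{r.body}" }

# To actually send one (requires a running dbrestadmin instance):
#   uri = URI(BASE)
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(requests[0]) }
```

One difference from curl: `-d` sends the string as-is, while set_form_data URL-encodes it (the space in "all privileges" becomes "+"); a form parser on the server side decodes both to the same values.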

There is more. You can see all of the methods here:

https://github.com/vvuksan/dbrestadmin/blob/master/API.md

Improvements and constructive criticism are welcome.

03 January 2012

Chef Report Handler for Growl

A few weeks ago, I listened to the Changelog Podcast episode featuring Chris Forsythe, lead of the Growl project. I actually didn’t use Growl for a long time, because I really disliked notifications of any kind, as they are distracting. However, I do appreciate the project, and supporting them by purchasing Growl on the App Store seemed totally reasonable.

Of course, buying the app meant it was installed on the spot. I had always resisted it in the past because it was bundled with so many apps, and I don’t want unsolicited software to show up on my computer. Now, it seemed a bit more natural, and I made a few configuration tweaks (I should put those into Chef…). I’m actually happy to use it now, especially since it has a nice network-accessible API.

Growlnotify

Shortly after I started using Growl on my work system, I looked into ways to fire off notifications from the command line after long-running processes finished. In particular, I wanted a notification when knife ec2 server create was done. I found the growlnotify program, which is available via Homebrew. This is quite a nifty tool, and I put it to use immediately, doing things like this:

knife ec2 server create \
  -f m1.small -I $lucid_small -x ubuntu \
  -r 'role[base],role[webserver]' && \
growlnotify -m "Finished launching 1 instance" || \
growlnotify -m "failed to launch instance"

I could kick off the server creation, switch focus to another workspace, and then know via Growl whether the instance was created.

Chef Handler

As many know, I manage my workstations with Chef. Chef has a pretty cool exception and report handler API that has a lot of flexibility. I thought it would be fun to throw together a simple report handler that would send a growl notification after a Chef run. In this case, it will report the elapsed time of the run if it was successful, or report an exception if it failed.

Using the handler is pretty straightforward. Install the chef-handler-growl gem, then configure chef-client (or chef-solo) in its configuration file (client.rb or solo.rb):

require "chef/handler/growl"
report_handlers << Chef::Handler::Growl.new
exception_handlers << Chef::Handler::Growl.new

Then run Chef, and see something like this:

[Screenshot: Chef Handler Growl notification]

The handler is available as a RubyGem. You can also view the source. I created issues on the GitHub project for the two items on the roadmap, too.
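For readers curious how such a handler decides what to say, here is a minimal, Chef-free sketch of the message logic described above: elapsed time on success, the exception on failure. It is not taken from chef-handler-growl. It assumes a run-status object responding to success?, elapsed_time and formatted_exception (the methods Chef exposes to handlers through run_status); RunStatusStub and growl_message are hypothetical names, and actually delivering the notification is left out.

```ruby
# Stand-in for Chef's run_status (assumed interface: success?,
# elapsed_time, formatted_exception). Not part of chef-handler-growl.
RunStatusStub = Struct.new(:ok, :elapsed_time, :formatted_exception) do
  def success?
    ok
  end
end

# Build the notification text: elapsed time on success, the exception on failure.
def growl_message(run_status)
  if run_status.success?
    format("Chef run finished in %.2f seconds", run_status.elapsed_time)
  else
    "Chef run failed: #{run_status.formatted_exception}"
  end
end

puts growl_message(RunStatusStub.new(true, 12.5, nil))
puts growl_message(RunStatusStub.new(false, 3.0, "RuntimeError: boom"))
```

In a real handler this logic would sit in the report method of a Chef::Handler subclass, with the resulting text handed to Growl (for example through growlnotify -m).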

02 January 2012

Switching to DNSimple

Reminder: this blog reflects my opinions and thoughts, and not those of my employer, Opscode, Inc.

Like any good sysadmin, I have my own domain for email and other purposes. I actually have had a couple, but this post is about my current one. I originally set it up through Google Apps a couple of years ago, including registering the new domain with Google Apps’ default registrar, GoDaddy. For the most part, it was pretty simple and painless to set up, including private registration via Domains by Proxy. Yay!

However, as I automated more components of my home network with Chef, I found the lack of API driven DNS rather frustrating. At last count, I had 15 distinct networked devices, counting all the computers, consoles, mobile devices, etc. This does not count the virtual machines that I manage as a part of my daily job in doing Chef cookbook development and testing, which should all have their own DNS entries, since I’ll access them over the network, and remembering IPs is ridiculous.

Internal DNS Server

I use DJB’s tinydns as my DNS server, and it is automated with the Opscode Chef Cookbook. The first incarnation of this setup was a single monolithic template file containing all the entries in my local network zone, delivered by the djbdns::tinydns-internal recipe, like so:

template "#{node[:djbdns][:tinydns_internal_dir]}/root/data" do
  source "tinydns-internal-data.erb"
  mode 0644
  notifies :run, resources("execute[build-tinydns-internal-data]")
end

This was great, and simple to manage for this single purpose setup. At some point, I wrote cookbooks for unbound and powerdns, as I was evaluating whether one or the other might be easier to modularize. In the process, I created a data bag of all my DNS entries that I could step through in templates, so I could use the same data without caring which software was going to consume it. In the end, I extended the djbdns cookbook with a lightweight resource and provider, and added usage to the djbdns::tinydns-internal recipe like this:

dns = data_bag_item("djbdns", node[:djbdns][:domain].gsub(/\./, "_"))
#
file "#{node[:djbdns][:tinydns_internal_dir]}/root/data" do
  action :create
end
#
%w{ ns host alias }.each do |type|
  dns[type].each do |record|
    record.each do |fqdn,ip|
      #
      djbdns_rr "#{fqdn}.#{dns['domain']}" do
        cwd "#{node[:djbdns][:tinydns_internal_dir]}/root"
        ip ip
        type type
        action :add
        notifies :run, "execute[build-tinydns-internal-data]"
      end
      #
    end
  end
end

The data bag itself looks something like this:

{
  "id": "int_example_com",
  "domain": "example.com",
  "ns": [
    { "ns": "127.0.0.1" }
  ],
  "alias": [
    { "gw":         "10.10.20.1" },
    { "smb":        "10.10.20.20" },
    { "files":      "10.10.20.20" },
    { "apt":        "10.10.20.120" },
    { "yum":        "10.10.20.120" }
  ],
  "host": [
    { "tavern":          "10.10.20.1" },
    { "cask":            "10.10.20.20" },
    { "cider":           "10.10.20.101" },
    { "merlot":          "10.10.20.103" },
    { "bourbon":         "10.10.20.104" },
    { "iphone":          "10.10.20.105" },
    { "doppelbock":      "10.10.20.106" },
    { "ipad":            "10.10.20.107" },
    { "wii":             "10.10.20.108" },
    { "xbox":            "10.10.20.109" },
    { "htpc":            "10.10.20.110" },
    { "virt1test":       "10.10.20.120" }
  ]
}

I am pretty pleased with this approach in that this data can be used no matter what DNS resolver software I choose. While this is great for the more static part of my network, as I mentioned I do have some dynamic usage where I create new virtual machines, and I really want them to register themselves in DNS automatically.
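To illustrate the resolver-agnostic idea, here is a minimal Ruby sketch that flattens a trimmed copy of the data bag into [type, fqdn, ip] records and renders them as tinydns data lines. It is not code from the djbdns cookbook: records, tinydns_line and the mapping of host entries to "=" lines (A plus PTR) and alias entries to "+" lines (A only) are my own assumptions for illustration, and the ns entry is ignored.

```ruby
require "json"

# Trimmed copy of the data bag above.
DATA_BAG = JSON.parse(<<-JSON)
  {
    "id": "int_example_com",
    "domain": "example.com",
    "ns": [ { "ns": "127.0.0.1" } ],
    "alias": [ { "gw": "10.10.20.1" }, { "files": "10.10.20.20" } ],
    "host": [ { "tavern": "10.10.20.1" }, { "cask": "10.10.20.20" } ]
  }
JSON

# Flatten the per-type lists of one-key hashes into [type, fqdn, ip] triples.
def records(bag, types = %w[host alias])
  types.flat_map do |type|
    bag.fetch(type, []).flat_map do |entry|
      entry.map { |name, ip| [type, "#{name}.#{bag['domain']}", ip] }
    end
  end
end

# Assumed mapping: "=" creates an A record plus PTR, "+" an A record only.
PREFIX = { "host" => "=", "alias" => "+" }

def tinydns_line(type, fqdn, ip)
  "#{PREFIX.fetch(type)}#{fqdn}:#{ip}"
end

records(DATA_BAG).each { |rec| puts tinydns_line(*rec) }
```

A template for unbound or PowerDNS could reuse records unchanged and only swap the rendering function, which is the point of keeping the data in a data bag.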

Enter DNSimple

A few months ago I decided to switch over to DNSimple. The service was compelling over the alternatives for a few reasons:

  • Low cost ($3/mo) for my size of account.
  • Very simple interface
  • API for managing records (!)
  • Reputation for great service

Darrin Eden wrote a cookbook for automatically creating records through the API, too, so half my work for automating with Chef was already done!

However, for various reasons I procrastinated on the switchover. After all, my existing solution worked OK for my purposes. Then, after seeing GoDaddy not only show up on the SOPA supporters list but also serve as a contributing author of the legislation(*), I decided that was the last straw and busted a move to finish the switch.

Honestly, from the DNSimple side, it couldn’t have been a better experience. They have one-click services for managing DNS records for a variety of common services - including Google Apps! It took some time and hassle to move my domain out of GoDaddy, since their interface is rather clunky, and I had to unprotect things through Domains by Proxy to make the move, but after a couple hours everything was fine. DNSimple has some tips for migrating, no matter who your current registrar is.

Now for the truly fun part!

Automated DNS with Chef

Using the dnsimple cookbook is very straightforward. You create an “A” record like this:

dnsimple_record "cask.example.com" do
  name "cask"
  domain "example.com"
  content "10.10.20.20"
  type "A"
  action :create
  username node[:dnsimple][:username]
  password node[:dnsimple][:password]
end

Yes, that is a private network IP, and yes it is going to be registered in public DNS. It really doesn’t matter that much in my (and others’) opinion. Especially given that zomg, I just created a DNS entry with Chef!

By default, the cookbook does assume, and use, node attributes for storing the username and password. This can be set by a role, but it means that all nodes will have the data. For my use, I decided to put these values in a data bag, and because they are sensitive, I used an encrypted data bag. I also wanted to reuse the data bag from my earlier DNS example, so I wrote a recipe like this:

dnsimple = encrypted_data_bag_item("secrets", "dnsimple")
dns = data_bag_item("dns", "int_example_com")
#
%w{ host alias }.each do |type|
  dns[type].each do |record|
    record_type = type =~ /^host$/ ? "A" : "CNAME"
    record.each do |hostname,ip|
      #
      dnsimple_record "#{hostname}.#{dns['domain']}" do
        name hostname
        content ip
        type record_type
        action :create
        username dnsimple['username']
        password dnsimple['password']
        domain dnsimple['domain']
      end
      #
    end
  end
#
end

I put that recipe in my “dnsserver” role, ran Chef, and boom, all my DNS entries are updated on systems I don’t have to manage, and all around the world.

% host cask.int.example.com 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

cask.int.example.com has address 10.10.20.20

What a wonderful redundant distributed key value store :-).

Note that the encrypted_data_bag_item method used in the recipe is in a cookbook library. I wrote about that in an earlier blog post. It is pretty simple:

class Chef
  class Recipe
    #
    def encrypted_data_bag_item(bag, item, secret_file =
        Chef::EncryptedDataBagItem::DEFAULT_SECRET_FILE)
      DataBag.validate_name!(bag.to_s)
      DataBagItem.validate_id!(item)
      secret = EncryptedDataBagItem.load_secret(secret_file)
      EncryptedDataBagItem.load(bag, item, secret)
    rescue Exception
      Log.error("Failed to load data bag item: #{bag.inspect} #{item.inspect}")
      raise
    end
    #
  end
end

Drawbacks

While this conveniently reuses the data I already had for my DNS entries, the dnsimple_record provider takes about a second per entry on my internet connection to check whether the record already exists. This makes the DNS server’s Chef client run take over a minute, where it used to take less than 12 seconds. Many of the entries in the DNS data bag are on systems that can add their own records, and they will do so once I refactor things a bit.
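One way to cut that cost, sketched here as an idea rather than something the dnsimple cookbook does, is to fetch the domain's existing records once and work out locally which ones are missing, so only those need an API call. The record hashes and the desired_records and missing_records helpers are hypothetical, and fetching the existing list from DNSimple is not shown; the A/CNAME mapping mirrors the recipe above.

```ruby
require "set"

# Desired records, built like the recipe: "host" entries become A records
# and "alias" entries become CNAMEs.
def desired_records(bag)
  %w[host alias].flat_map do |type|
    record_type = type == "host" ? "A" : "CNAME"
    bag.fetch(type, []).flat_map do |entry|
      entry.map { |name, content| { "name" => name, "type" => record_type, "content" => content } }
    end
  end
end

# Return the desired records not already present, comparing name, type and
# content. One local Set lookup per record replaces one API round trip each.
def missing_records(desired, existing)
  have = Set.new(existing.map { |r| r.values_at("name", "type", "content") })
  desired.reject { |r| have.include?(r.values_at("name", "type", "content")) }
end

bag = {
  "host"  => [{ "cask" => "10.10.20.20" }, { "cider" => "10.10.20.101" }],
  "alias" => [{ "files" => "10.10.20.20" }]
}
# Hypothetical result of a single "list records" call.
existing = [{ "name" => "cask", "type" => "A", "content" => "10.10.20.20", "ttl" => 3600 }]

missing_records(desired_records(bag), existing).each do |r|
  puts "would create #{r['type']} #{r['name']} -> #{r['content']}"
end
```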

Other Uses

A number of the entries in the DNS data bag are CNAMEs for services that run on my home LAN server. I have internal services like Netatalk/Time Machine, Samba, or Munin. I also have external services like OpenVPN, SSH and Teamspeak. I’ll add DNSimple records for each of these so the recipes can automatically register new DNS entries, eliminating a step in bringing up a new service.

Open Source is Awesome

Chef is open source, of course, as is the dnsimple cookbook that I’m using. While working with the cookbook as described above over the last couple of days, I made some improvements and sent a pull request, which has been merged and released. Thanks Darrin!

If you use Chef and are looking for an API driven way to manage DNS entries for your systems, I strongly recommend DNSimple as a provider, and the dnsimple cookbook to tie it all together.

(*) This isn’t a political-blog-soap-box, but this really was the final motivator.

01 January 2012

Goodbye, 2011.

This year's been pretty good, but the last two months were pretty lame.

In the last six weeks, I found out Caramel has lymphoma, got unemployed, and had emergency surgery to remove my appendix on Christmas Day. The unemployment caused me to lose an in-progress mortgage refinance.

I'll pick up the mortgage thing once I remedy the employment problem, but I'm staying quite happily unemployed until after my kid is born - should be any day now!

Most of my career-growing moves were outside of work: at meetups, in open source efforts, or in networking with folks on IRC or twitter. Lots of awesome folks out there, so go introduce yourself. Don't be a dick. :)

I didn't write much on this site, but that was mainly due to an increase in my activities on IRC and twitter. Most of what I published this year was code rather than writing about said code. I'd like to fix that, though.

This year's successes were topped by two major new projects, fpm and logstash. I also released some major improvements to xdotool and other tools.

The current implementation of logstash isn't very old, but prototypes, hacks, and other incarnations of pretty much the same thing date back to at least 2005 and probably earlier. This project has been a long-time-coming, and Pete Fritchman and I have been talking about logstash for years, so it's nice to finally have some code shipped and a community building around it.

FPM had a crazy positive response. I wrote it as a hack, and it's used all over the place now. Bonus that people are contributing patches and other improvements as well.

Sysadvent was another excellent success, the end of which marked the 4th year and 100th article posted to the project. It is awesome seeing such community involvement from so many different authors.

This year also cemented my move to git from svn. Why? Github, mostly, and not really the features of git itself. Sharing code and patches is so much easier on github than it is with other services.

I went to CarolinaCon and OSCON to talk about logstash. I also went to DevOps Days Mountain View and gave a lightning talk on logstash.

My OSCON talk was overflowing with people standing at the back of the room, etc; it went awesomely. I've also been able to do lunchtime logstash presentations at places like Square and others. I also gave talks at BayLISA meetings. It was a good year for getting out of the house and talking about code.

I tried to get a count of how much code I'd written this year, but I had lots of web-based projects that included third-party stuff like jquery, and I'm too lazy to pick through the results and trim that stuff out. I'm up to about 70 different projects on github now, some useful; some not; all fun!

Looking forward to 2012 :)

31 December 2011

What is devops?

I'm parsing the responses of the Deploying Drupal survey I started a couple of months ago (more on that later).

One of the questions in the survey is "What is devops?" Apparently, when you ask a zillion people (OK, OK, just a large bunch of Tweeps...), you get a large number of different answers, ranging from totally wrong to spot on.

So let's go over them and see what we can learn from them.

The most wrong definition one can give is probably:

  • A buzzword

I think we've long passed the buzzword phase, especially since devops is not new: it's a new term we put to an existing practice, a term that gives a lot of people who were already doing devops a common word to discuss it with. Also, lots of people still seem to think that devops is a specific role, a job description that points to a specific group of people doing a certain job. It's not. Yes, you'll see a lot of organisations looking for devops people and giving them a devops job title, but it's kinda hard to be the only one doing devops in an organisation.

I described one of my current roles as Devops Kickstarter; it pretty much describes what I'm doing, and it does contain devops :)

But devops also isn't:

  • The connection between operations and development.
  • people that keep it running
  • crazy little fellows who find beauty in black/white letters( aka code) rather than a view like that of Taj in a full moon light.
  • the combination of developer and operations into one overall functionality
  • The perfect mixture between a developer and a system engineer. Someone who can optimize and simplify certain flows that are required by developers and system engineers, but sometimes are just outside of the scope for both of them.
  • Proxy between developer and management
  • The people in charge of the build/release cycle and planning.
  • A creature, made from 8-bit cells, with the knowledge of a seasoned developer, the skillset of a trained systems engineer and the perseverence of a true hacker.
  • The people filling the gap between the developer world and the sysadmin world. They understand dev. issues and system issues as well. They use tools from both world to solve them.

Or

  • Developers looking at the operations of the company and how we can save the company time and money

And it's definitely not:

  • Someone who mixes both a sysop and dev duties
  • developers who know how to deploy and manage sites, including content and configuration.
  • I believe there's a thin line line between Ops and Devs where we need to do parts of each others jobs (or at least try) to reach our common goal..
  • A developer that creates and maintains environments tools to help other developers be more successful in building and releasing new products
  • Developers who also do IT operations, or visa versa.
  • Software developers that support development teams and assist with infrastructure systems

So no, developers taking on systems roles next to their own and going for NoOps isn't feasible at all. You really want collaboration: you want people with different skillsets who (try to) understand each other and (try to) work together towards a common goal.

Devops is also not just infrastructure as code:

  • Writing software to manage operations
  • system administrators with a development culture.
  • Bring code management to operations, automating system admin tasks.
  • The melding of the art of Systems Administration and the skill of development with a focus on automation. A side effect of devops is the tearing down of the virtual wall that has existed between SA's and developers.
  • Infrastructure as code.
  • Applying some of the development worlds techniques (eg source control, builds, testing etc) to the operations world.
  • Code for infrastructure

Sure, infrastructure as code is a big part of the Automation in CAMS (Culture, Automation, Measurement, Sharing), but just because you are doing Puppet/Chef doesn't mean you are doing devops.
Devops is also not just continuous delivery:

  • A way to let operations deploy sites in regular intervals to enable developers to interact on the systems earlier and make deployments easier.
  • Devops is the process of how you go from development to release.

Obviously lots of people doing devops also try to achieve continuous delivery, but just like infrastructure as code, devops is not limited to that :)

But I guess the truth is somewhere in the definitions below...

  • That sweet spot between "operating system" or platform stack and the application layer. It is wanting sys admins who are willing to go beyond the normal package installers, and developers who know how to make their platform hum with their application.
  • Breaking the wall between dev and ops in the same way agile breaks the wall between business and dev e.g. coming to terms with changing requirements, iterative cycles
  • Not being an arsehole!
  • Sysadmin best-practise, using configuration as code, and facilitating communication between sysadmins and developers, with each understanding and participating in the activities of the other.
  • Devops is both the process of developers and system operators working closer together, as well as people who know (or who have worked in) both development and system operations.
  • Culture collaboration, tool-chains
  • Removing barriers to communication and efficiency through shared vocabulary, ideals, and business objectives to to deliver value.
  • A set of principles and good practices to improve the interactions between Operations and Development.
  • Collaboration between developers and sysadmins to work towards more reliable platforms
  • Building a bridge between development and operations
  • The systematic process of building, deploying, managing, and using an application or group of applications such as a drupal site.
  • Devops is collaboration and Integration between Software Development and System Administration.
  • Devops is an emerging set of principles, methods and practices for communication, collaboration and integration between software development (application/software engineering) and IT operations (systems administration/infrastructure) professionals.[1] It has developed in response to the emerging understanding of the interdependence and importance of both the development and operations disciplines in meeting an organization's goal of rapidly producing software products and services.
  • bringing together technology (development) & content (management) closer together
  • Making developers and admins understand each other.
  • Communication between developers and systems folk.
  • a cultural movement to improve agility between dev and ops
  • The cultural extension of agile to bring operations into development teams.
  • Tight collaboration of developers, operations team (sys admins) and QA-team.

But I can only conclude that there is a huge amount of evangelisation that still needs to be done. Lots of people still don't understand what devops is, or have a totally different view of it.

A number of technology conferences have taken up devops as part of their program, inviting experienced people from outside their focus field to talk about how they improve the quality of life!

There is still a large number of devops-related problems to solve, so that's what I'll be doing in 2012.

29 December 2011

Installing Vagrant on Ubuntu Natty

(Warning: some Ubuntu ranting ahead)

  apt-get install virtualbox-ose
  apt-get install rubygems
  gem install vagrant

That's what I assumed it would take me to install vagrant on a spare Ubuntu (Natty) laptop.

Well, it's not. After that, I was greeted with some weirdness:

  $ vagrant
  vagrant: command not found...

Yet gem list --local showed the vagrant gem installed.

  $ ruby
  ruby: command not found

I looked twice, checked again, and indeed it seems you can install rubygems on Natty with no ruby installed. #dazedandconfused

So, unlike other distros, Ubuntu doesn't add the RubyGems binary path to its default PATH.
After adding that to my .bashrc, things started working better.
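A quick way to check for this is to ask RubyGems where it installs executables (Gem.bindir) and compare that with PATH. This is a minimal sketch; gem_bindir_on_path? is a hypothetical helper name, and the directory it reports depends on how Ruby and RubyGems were packaged on your system.

```ruby
require "rubygems"

# True if RubyGems' executable directory is one of the entries in PATH.
def gem_bindir_on_path?(bindir = Gem.bindir, path = ENV.fetch("PATH", ""))
  entries = path.split(File::PATH_SEPARATOR).map { |dir| File.expand_path(dir) }
  entries.include?(File.expand_path(bindir))
end

unless gem_bindir_on_path?
  # Print a line to append to ~/.bashrc so gem executables such as vagrant are found.
  puts %(export PATH="#{Gem.bindir}:$PATH")
end
```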

The active reader has noticed that by now half of the Twittersphere was pointing me to the solution already implemented above, and the other half was telling me not to install rubygems using apt-get, or to use rvm for all my rubygem troubles.

Apart from the point that you shouldn't need tools like rvm to fix things that are fundamentally broken, the fact is that Joe Average Java developer doesn't want to be bothered with RubyGems hell; he just wants to do apt-get install vagrant and get on with his real work, and that's exactly what I'd expect from Linux for human beings.

I'd expect any junior guy to be able to go to vagrantup.com, read the 4 commands on the main page and be up and running, because that's how it works on my Bleeding Edge Enterprise Development Distro, the one I would usually not advise those people (and my mother) to use.

27 December 2011

Puppet Internals: the parser

As more or less promised in my series of posts about Puppet Extension Points, here is the first post about Puppet Internals.

The idea is to produce a series of blog posts, each one about a Puppet sub-system.

Before starting, I want to present the various sub-blocks that form Puppet, or Puppet: the Big Picture:

[Figure: Puppet, the Big Picture]

I hope to cover each of those sub-blocks in various posts, but today we’ll focus on the Puppet Parser.

The Puppet Parser

The Puppet Parser’s responsibility is to transform the textual manifests into a data structure that the compiler can use to produce the catalog. This data structure is called an AST (Abstract Syntax Tree).

The Puppet Parser is the combination of various different sub-systems:

  • the lexer
  • the racc-based parser
  • the AST model

The Lexer

The purpose of the lexer is to read manifests character by character and to produce a stream of tokens. A token is just a symbol (combined with data) that represents a valid part of the Puppet language.

For instance, the lexer is able to find things such as (but not limited to):

  • reserved keywords (like case, class, define…)
  • quoted strings
  • identifiers
  • variables
  • various operators (like left parenthesis or right curly braces…)
  • regexes

Let’s take an example and follow what comes out of the lexer when scanning this manifest:

$variable = "this is a string"

And here is the stream of tokens that is the outcome of the lexer:

:VARIABLE(VARIABLE) {:line=>1, :value=>"variable"}
:EQUALS(EQUALS) {:line=>1, :value=>"="}
:STRING(STRING) {:line=>1, :value=>"this is a string"}

As you can see, a puppet token is the combination of a symbol and a hash.

Let’s see how we achieved this result. First, you must know that the Puppet lexer is a regex-based system. Each token is defined as a regex (or a stock string). When reading a character, the lexer ‘just’ checks whether one of the strings or regexes matches. If there is a match, the lexer emits the corresponding token.

Let’s take our example manifest (the variable assignment above), and see what happens in the lexer:

  1. read $ character
  2. no regex match, let’s read some more characters
  3. read ‘variable’, still no match, our current buffer contains $variable
  4. read ‘ ’, oh we have a match against the DOLLAR_VARIABLE token regex
  5. this token is special: it is defined with a Ruby block. When one of those tokens is read and matched, the block is executed.
  6. the block just emits the VARIABLE("variable") token

The lexer’s scanner doesn’t try the regexes and strings in an arbitrary order; it tries them in a particular order. In short, it tries to maximize the length of the matched string; in a word, the lexer is greedy. This helps remove ambiguity.
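To make the idea concrete, here is a toy regex-based lexer in Ruby built on the standard StringScanner. It is not Puppet's lexer: the token table is a tiny hypothetical subset, and instead of feeding characters one at a time it tries every pattern at the current position and keeps the longest match (the greedy behaviour described above). Whitespace and comments are consumed without producing tokens, and each token carries a hash with its line and value.

```ruby
require "strscan"

# Hypothetical token table (a tiny subset, not Puppet's real one):
# [token name, regex, block extracting the token value from the MatchData].
LEX_TOKENS = [
  [:VARIABLE, /\$(\w+)/,     lambda { |m| m[1] }],
  [:EQUALS,   /=/,           lambda { |m| m[0] }],
  [:STRING,   /"([^"\\]*)"/, lambda { |m| m[1] }],
  [:NAME,     /[a-z]\w*/,    lambda { |m| m[0] }]
]
# Whitespace and comments are consumed without producing tokens.
LEX_SKIP = /\s+|#[^\n]*/

def lex(source)
  scanner = StringScanner.new(source)
  line = 1
  tokens = []
  until scanner.eos?
    if (skipped = scanner.scan(LEX_SKIP))
      line += skipped.count("\n")
      next
    end
    # Try every pattern at the current position; keep the longest match.
    best = nil
    LEX_TOKENS.each do |name, regex, extract|
      text = scanner.check(regex)
      if text && (best.nil? || text.size > best[1].size)
        best = [name, text, regex, extract]
      end
    end
    raise "unexpected input at line #{line}: #{scanner.peek(10).inspect}" unless best
    name, text, regex, extract = best
    scanner.pos += text.bytesize # pos counts bytes, not characters
    tokens << [name, { :line => line, :value => extract.call(regex.match(text)) }]
  end
  tokens
end

lex(%($variable = "this is a string")).each { |token| p token }
```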

As seen in the token stream above, the lexer associates with each token a hash containing the line number where it was found. This allows parse error messages to point to the correct line. It also helps puppetdoc associate the right comment with the right language structure.

The lexer also supports lexing contexts. Some tokens are valid only in specific contexts; this is especially true when lexing quoted strings for variable interpolation.

Not everything the lexer scans produces a token for the parser. For instance, comments are scanned (and stored in a stack for puppetdoc use), but they don’t produce a token for the parser: they’re skipped.

Finally, the lexer also maintains a stack of the class names it has entered. This allows it to find the correct fully qualified name of inner classes, as seen in the following example:

Fully qualified class names by keeping a stack of names during lexing
class outer {
  class middle {
    class inner {
      # we're in outer::middle::inner
    }
  }
}
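The name-stack idea can be sketched in a few lines of Ruby. This toy version is not Puppet's implementation: qualified_class_names is a hypothetical helper that only recognises class declarations and braces, pushing a class name on `class NAME {`, a placeholder on any other `{`, and popping on `}`.

```ruby
require "strscan"

# Return the fully qualified name of every class declared in source.
# Only "class NAME {", "{" and "}" are recognised; everything else is skipped.
def qualified_class_names(source)
  scanner = StringScanner.new(source)
  stack = [] # one entry per open brace: a class name, or nil for other blocks
  names = []
  until scanner.eos?
    if scanner.scan(/class\s+([\w:]+)\s*\{/)
      stack.push(scanner[1])
      names << stack.compact.join("::")
    elsif scanner.scan(/\{/)
      stack.push(nil)
    elsif scanner.scan(/\}/)
      stack.pop
    else
      scanner.getch # irrelevant for this sketch
    end
  end
  names
end

p qualified_class_names("class outer { class middle { class inner { } } }")
```

The nil placeholders matter: without them, the closing brace of an if block would pop the enclosing class name.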

If you want more information about the lexer, check the Puppet::Parser::Lexer class.

The parser

The parser is based on racc. Racc is a Ruby port of the good old Yacc. Racc, like Yacc, is what we call an LALR parser generator.

The ‘cc’ in Racc stands for ‘compiler-compiler’: the parser is generated from what we call a grammar (for LALR parsers, a context-free grammar). The generated parser is table-driven and consumes tokens one by one. These kinds of parsers are sometimes called shift/reduce parsers.

This grammar is written in a language that is a machine readable version of a Backus-Naur Form or “BNF”.

There are different subclasses of context-free grammars. Racc handles LALR(1) grammars, a large subset of the LR(1) grammars, which means it must be possible to parse any portion of an input string with just a single token of lookahead. Parsers for these grammars are deterministic: they only need a fixed number of lookahead tokens (in our case 1) plus what they have already parsed to decide which rule to apply next.

Roughly it does the following:

  1. read a token
  2. shift (this means pushing the token onto the stack), goto 1. until we can reduce
  3. reduce the tokens read with a grammar rule (this involves looking ahead)
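The loop above can be sketched as a toy shift/reduce parser in Ruby. It is deliberately naive and is not what Racc generates: there are no parse tables, and instead of consulting a lookahead token it reduces as soon as the top of the stack matches a rule, which only works for this tiny hypothetical grammar modelled on the assignment rules of the Puppet grammar. The val array passed to each action plays the same role as in Racc.

```ruby
# A toy shift/reduce loop, NOT Racc's generated parser: no parse tables and
# no real lookahead. It reduces greedily whenever the top of the stack
# matches a rule's right-hand side, which suffices for this tiny grammar.
ToyRule = Struct.new(:lhs, :rhs, :action)

TOY_RULES = [
  ToyRule.new(:quotedtext, [:STRING], lambda { |val| [:String, val[0][:value]] }),
  ToyRule.new(:rvalue, [:quotedtext], lambda { |val| val[0] }),
  ToyRule.new(:expression, [:rvalue], lambda { |val| val[0] }),
  ToyRule.new(:assignment, [:VARIABLE, :EQUALS, :expression],
              lambda { |val| [:VarDef, val[0][:value], val[2]] }),
  ToyRule.new(:statement, [:assignment], lambda { |val| val[0] })
]

def parse(tokens, trace = [])
  stack = [] # each entry is [grammar symbol, semantic value]
  tokens.each do |symbol, value|
    stack.push([symbol, value]) # steps 1-2: read a token and shift it
    trace << "shift #{symbol}"
    loop do # step 3: reduce while some rule matches the top of the stack
      rule = TOY_RULES.find do |r|
        stack.size >= r.rhs.size && stack.last(r.rhs.size).map(&:first) == r.rhs
      end
      break unless rule
      val = stack.pop(rule.rhs.size).map(&:last) # like Racc's val array
      stack.push([rule.lhs, rule.action.call(val)])
      trace << "reduce #{rule.lhs}"
    end
  end
  unless stack.size == 1 && stack[0][0] == :statement
    raise "syntax error near #{stack.map(&:first).inspect}"
  end
  stack[0][1]
end

tokens = [
  [:VARIABLE, { :line => 1, :value => "variable" }],
  [:EQUALS,   { :line => 1, :value => "=" }],
  [:STRING,   { :line => 1, :value => "this is a string" }]
]
trace = []
p parse(tokens, trace)
puts trace
```

With a richer grammar this greedy strategy breaks down (it would reduce too early inside longer expressions), which is exactly what the lookahead token and Racc's generated tables are there to prevent.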

We’ll have a deeper look in subsequent chapters. Meanwhile, if you want to learn everything about LALR parsers, or parsers in general, I highly recommend the Dragon Book.

The Puppet Grammar

The Puppet Grammar can be found in lib/puppet/parser/grammar.ra in the sources. It is a typical racc/yacc grammar that

  • defines the known tokens (these match the lexed token names)
  • defines the precedence of operators
  • defines various recursive rules that form the definition of the Puppet language

Let’s have a look at part of the Puppet Grammar to better understand how it works:

Excerpt of the Puppet Grammar
statement_or_declaration:    resource
  | collection
  | assignment
  | casestatement
  | ifstatement_begin
...
assignment:     VARIABLE EQUALS expression {
  variable = ast AST::Name, :value => val[0][:value], :line => val[0][:line]
  result = ast AST::VarDef, :name => variable, :value => val[2], :line => val[0][:line]
}
...
expression:   rvalue
  | hash
  ...

rvalue:       quotedtext
  | name
  ...

quotedtext: STRING  { result = ast AST::String, :value => val[0][:value], :line => val[0][:line] }

So the excerpt above shows five rules:

  • statement_or_declaration, an alternation of sub-rules
  • assignment, a sequence of tokens and a sub-rule, with a Ruby code block that will be executed when this rule is reduced
  • expression and rvalue, alternations of sub-rules without a code block
  • quotedtext, which matches a single STRING token and has a Ruby code block

To understand what that means, we could translate those rules as:

  1. A statement or declaration can be a resource, a collection, or an assignment (among others)
  2. An assignment is when the parser finds a VARIABLE token followed by an EQUALS token and an expression
  3. An expression can be an rvalue or a hash (all defined later on in the grammar file)
  4. An rvalue can be, among other things, a quotedtext
  5. And finally, a quotedtext can be a STRING (among other things)

You can generate the Puppet parser yourself with racc; it’s as simple as:

  1. installing racc (available as a gem)
  2. running make -C lib/puppet/parser

This rebuilds the lib/puppet/parser/parser.rb file.

You can generate a debug parser that prints everything it does by passing the -g command-line switch to racc (check lib/puppet/parser/makefile) and defining @@yydebug = true in the parser class.

The parser itself is controlled by the Puppet::Parser::Parser class, which lives in lib/puppet/parser/parser_support.rb. This class requires the generated parser (both share the same Ruby class). That means the Ruby blocks in the grammar are executed in the context of a Puppet::Parser::Parser instance; in other words, the grammar can call methods defined in parser_support.rb. That’s why the example above calls the ast method, which simply creates an instance of the given class and associates some context with it.

Let’s go back to the reduce operation. When reducing, the parser pops the reduced tokens from the stack and pushes the result onto it. That result is either whatever ends up in the result variable of the rule’s Ruby block, or the result of reducing the referenced rule (for a non-terminal rule).

Inside the Ruby block of a terminal rule, the tokens and rule results being reduced are available in the val array. Going back to the assignment rule above, val[0] is the VARIABLE token and val[2] is the result of reducing the expression rule.
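To make the ast helper and the val array concrete, here is a minimal plain-Ruby sketch with no racc involved. ToyParser, reduce_assignment and the AST::Node classes are hypothetical stand-ins, not Puppet’s real code; the class is opened twice only to mimic how parser_support.rb and the generated parser share one class, so the action body can call ast directly.

```ruby
# Hypothetical, much-simplified AST classes.
module AST
  class Node
    attr_reader :attrs

    def initialize(attrs)
      @attrs = attrs
    end
  end

  class Name < Node; end
  class VarDef < Node; end
end

# "parser_support.rb" side: helpers available to grammar actions.
class ToyParser
  def initialize(file)
    @file = file
  end

  # Build a node and attach some context (here only the file name).
  def ast(klass, hash = {})
    klass.new(hash.merge(:file => @file))
  end
end

# "Generated parser" side: same class reopened, so actions run on the
# same instance. val is [VARIABLE token, EQUALS token, expression result].
class ToyParser
  def reduce_assignment(val)
    variable = ast AST::Name, :value => val[0][:value], :line => val[0][:line]
    ast AST::VarDef, :name => variable, :value => val[2], :line => val[0][:line]
  end
end
```

For example, ToyParser.new("site.pp").reduce_assignment([{ :value => "x", :line => 3 }, { :value => "=" }, some_expression_node]) returns an AST::VarDef whose :name is an AST::Name for "x"; the file name is only a placeholder for the extra context the real helper attaches.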

The AST

The AST is the in-memory model of the parsed manifests. It forms a tree of instances of AST classes. There is an AST class (each inheriting from the AST base class) for every element of the language: for instance, there’s one for Puppet classes, one for if, one for case, and so on. You’ll find them all in the lib/puppet/parser/ast/ directory.

There are two kinds of AST classes:

  • leaves: which represent some kind of values (like an identifier or a string)
  • branches: which contain other AST instances (like if, case or class). This is what forms the tree.

All AST classes implement the evaluate method which we’ll cover in the compiler article.

For instance when parsing an if/else statement like this:

An If/Else statement
if $var {
  notice("var is true")
} else {
  notice("var is false")
}

Once parsed, the whole if/else will be an instance of Puppet::Parser::AST::IfStatement (found in lib/puppet/parser/ast/ifstatement.rb).

This class defines three instance variables:

  1. @test
  2. @statements
  3. @else

The grammar rule for ifstatement is (I simplified it for the purpose of the article):

Simplified Grammar rule for If/else
ifstatement:  IF expression LBRACE statements RBRACE else {
  # val[0] is IF, val[1] the expression, val[3] the statements, val[5] the else part
  args = {
    :test => val[1],
    :statements => val[3],
    :else => val[5]
  }
  result = ast AST::IfStatement, args
}

Notice how AST::IfStatement is initialized with the args hash containing the results of the test, statements and else rules. Those results are themselves AST instances, and they end up in the IfStatement instance variables we talked about earlier.

This is what forms a tree. If you look at the AST::IfStatement#evaluate implementation, you’ll see that, depending on the result of evaluating @test, it evaluates either @statements or @else.

Calling the evaluate method of the root of this tree triggers a chain of evaluate calls on its children, as in the IfStatement example. This process will be explained in detail in the compiler article, but that’s essentially how the Puppet compiler works.
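The recursive evaluation described above can be sketched with a few hypothetical classes. They are simplified stand-ins, not Puppet’s AST: the scope is a plain Hash, truthiness is Ruby’s rather than Puppet’s, and Notice merely records its message instead of calling the real notice function.

```ruby
# Leaf: looks a variable up in the scope.
class Variable
  def initialize(name)
    @name = name
  end

  def evaluate(scope)
    scope.fetch(@name)
  end
end

# Leaf standing in for a notice() call: records its message.
class Notice
  def initialize(message)
    @message = message
  end

  def evaluate(scope)
    scope[:output] << @message
  end
end

# Branch: holds other nodes and decides which child to evaluate.
class IfStatement
  def initialize(test:, statements:, else_branch: nil)
    @test = test
    @statements = statements
    @else = else_branch
  end

  def evaluate(scope)
    if @test.evaluate(scope)
      @statements.evaluate(scope)
    elsif @else
      @else.evaluate(scope)
    end
  end
end

# The if/else example from above, built by hand instead of by the parser.
tree = IfStatement.new(
  test: Variable.new(:var),
  statements: Notice.new("var is true"),
  else_branch: Notice.new("var is false")
)
scope = { var: false, output: [] }
tree.evaluate(scope)
p scope[:output] # => ["var is false"]
```

Only the root’s evaluate is called; it decides which child subtree to evaluate, which is the chaining behaviour the compiler relies on.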

An Example Step by Step

Let’s see an end-to-end example of parsing a simple manifest:

A Simple Example
class test {
  file {
    "/tmp/a": content => "test!"
  }
}

This will produce the following stream of tokens:

Token Stream of the Simple Example
:CLASS(CLASS) {:line=>1, :value=>"class"}
:NAME(NAME) {:line=>1, :value=>"test"}
:LBRACE(LBRACE) {:line=>1, :value=>"{"}
:NAME(NAME) {:line=>2, :value=>"file"}
:LBRACE(LBRACE) {:line=>2, :value=>"{"}
:STRING(STRING) {:line=>3, :value=>"/tmp/a"}
:COLON(COLON) {:line=>3, :value=>":"}
:NAME(NAME) {:line=>3, :value=>"content"}
:FARROW(FARROW) {:line=>3, :value=>"=>"}
:STRING(STRING) {:line=>3, :value=>"test!"}
:RBRACE(RBRACE) {:line=>4, :value=>"}"}
:RBRACE(RBRACE) {:line=>5, :value=>"}"}
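For reference, a token stream shaped like the one above can be produced by a very small regex lexer. This is a hypothetical sketch built on Ruby’s StringScanner, not Puppet’s lexer (covered in the previous article): the token patterns are simplified assumptions with no escapes, variables or comments.

```ruby
require "strscan"

# Simplified token patterns, tried in order at the current position.
TOKEN_PATTERNS = [
  [:FARROW, /=>/],
  [:LBRACE, /\{/],
  [:RBRACE, /\}/],
  [:COLON,  /:/],
  [:STRING, /"[^"]*"/],
  [:NAME,   /[a-z]\w*/]
].freeze

# Bare words that are keywords rather than plain names.
KEYWORDS = { "class" => :CLASS }.freeze

def lex(source)
  scanner = StringScanner.new(source)
  line = 1
  tokens = []
  until scanner.eos?
    # Skip whitespace, counting newlines to track the line number.
    if (space = scanner.scan(/\s+/))
      line += space.count("\n")
      next
    end
    # scan only matches at the current position and advances on success.
    rule = TOKEN_PATTERNS.find { |_type, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input on line #{line}: #{scanner.peek(10).inspect}" unless rule

    type = rule.first
    text = scanner.matched
    value = type == :STRING ? text[1..-2] : text # drop the quotes
    type = KEYWORDS.fetch(value, type) if type == :NAME
    tokens << [type, { line: line, value: value }]
  end
  tokens
end
```

Calling lex on the simple example yields the same twelve token types in the same order, each paired with a line/value hash like the ones listed above.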

Now let’s dive into the parser events (I simplified the output because the Puppet grammar is a bit more complex than necessary for this article). The following list shows every parser action and what the parser stack looks like after each one. I elided some of the stacks when they aren’t strictly needed to understand what happened.

  1. receive: CLASS (our parser got the first token from the lexer)
  2. shift CLASS (there’s nothing else to do for the moment)

    the result of the shift is that we now have one token in the parser stack

    stack: [ CLASS ]

  3. receive: NAME("test") (we get one more token)

  4. shift NAME (still no rules can match so we shift it)

    stack: [ CLASS NAME("test") ]

  5. reduce NAME –> classname (oh and now we can reduce a rule)

    notice how the stack now contains a classname instead of a NAME

    stack: [ CLASS (classname "test") ]

  6. receive: LBRACE

  7. shift LBRACE

    stack: [ CLASS (classname "test") LBRACE ]

  8. receive: NAME("file")

  9. shift NAME

    stack: [ CLASS (classname "test") LBRACE NAME("file") ]

  10. receive: LBRACE

  11. reduce NAME –> classname

    stack: [ CLASS (classname "test") LBRACE (classname "file") ]

  12. shift: LBRACE

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE ]

  13. receive STRING("/tmp/a")

  14. shift STRING

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE STRING("/tmp/a") ]

  15. reduce STRING –> quotedtext

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (quotedtext AST::String("/tmp/a")) ]

  16. receive COLON

  17. reduce quotedtext –> resourcename

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) ]

  18. shift COLON

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON ]

  19. receive: NAME("content")

  20. shift NAME

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") ]

  21. receive: FARROW

  22. shift FARROW

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW ]

  23. receive: STRING("test!")

  24. shift: STRING
  25. reduce STRING –> quotedtext
  26. receive: RBRACE
  27. reduce quotedtext –> rvalue

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW (rvalue AST::String("test!"))]

  28. reduce rvalue –> expression

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON NAME("content") FARROW (expression AST::String("test!"))]

  29. reduce NAME FARROW expression –> param (we’ve now a resource parameter)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourcename AST::String("/tmp/a")) COLON (param AST::ResourceParam("content"=>"test!")) ]

  30. reduce param –> params (multiple parameters can form a params)

  31. reduce resourcename COLON params –> resourceinst (name: parameters form a resource)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourceinst (AST::ResourceInstance(...)))]

  32. reduce resourceinst –> resourceinstances (more than one resourceinst can form resourceinstances)

    stack: [ CLASS (classname "test") LBRACE (classname "file") LBRACE (resourceinstances [(resourceinst (AST::ResourceInstance(...)))] )]

  33. shift RBRACE

  34. reduce classname LBRACE resourceinstances RBRACE –> resource (we’ve discovered a resource)

    stack: [ CLASS (classname "test") LBRACE (resource AST::Resource(...)) ]

  35. receive: RBRACE

  36. reduce resource –> statement_or_declaration (a resource is one statement)
  37. reduce statement_or_declaration –> statements_and_declarations
  38. shift RBRACE

    stack: [ CLASS (classname "test") LBRACE (resource AST::Resource(...)) RBRACE ]

  39. reduce CLASS classname LBRACE statements_and_declarations RBRACE –> hostclass (we’ve discovered a puppet class)

    stack: [ (hostclass AST::Hostclass(...)) ]

  40. reduce hostclass –> statement_or_declaration

  41. reduce statement_or_declaration –> statements_and_declarations
  42. receive: end of file
  43. reduce statements_and_declarations –> program
  44. shift end of file

    stack: [ (program (AST::ASTArray [AST::Hostclass(...)])) ]

And the parsing is now over. What is returned is this program, which is in fact an instance of an AST::ASTArray.

If we now analyze the produced AST, we find:

  • AST::ASTArray - array of AST instances, this is our program
    • AST::Hostclass - an instance of a class
      • AST::Resource - contains an array of resource instances
        • AST::ResourceInstance
          • AST::ResourceParam - contains the “content” parameter
            • AST::String("content")
            • AST::String("test!")

What’s important to understand is that the AST depends only on the manifests. Thus the Puppet master needs to reparse manifests only when they change.
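Because the AST depends only on the manifest contents, a parse cache keyed on the file’s modification time is enough to avoid reparsing. The sketch below is a hypothetical illustration of that idea, not Puppet’s actual caching code; the parse step is supplied as a block.

```ruby
# Reparse a manifest only when its mtime changes (hypothetical sketch).
class ManifestCache
  attr_reader :parse_count

  def initialize(&parser)
    @parser = parser
    @entries = {}      # path => [mtime, ast]
    @parse_count = 0
  end

  def ast_for(path)
    mtime = File.mtime(path)
    cached = @entries[path]
    return cached[1] if cached && cached[0] == mtime

    @parse_count += 1
    ast = @parser.call(File.read(path))
    @entries[path] = [mtime, ast]
    ast
  end
end
```

Usage would look like cache = ManifestCache.new { |source| parse(source) } followed by cache.ast_for("site.pp"), where parse stands for whatever builds the AST. Timestamps can be coarse on some filesystems, so a content digest is a sturdier key if edits can happen within the same second.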

What’s next?

The next episode will pick up where the parser leaves off: the compilation. The Puppet compiler takes the AST, injects the facts into it, and produces what we call a catalog; that’s exactly what we’ll learn in the next article (sorry, no ETA yet).

Do not hesitate to comment or ask questions on this article with the comment system below :)

And happy new year all!

25 December 2011 ~ Comments Off

Protobuf, Maven, M2E and Eclipse are on a boat

At Days of Wonder we develop several Java projects (for instance our online game servers). Those are built with Maven, and most if not all are using Google Protocol Buffers for data interchange.

Development happens mostly in Eclipse, and until a couple of months ago it used m2eclipse. Since the release of m2e (the m2eclipse successor), our builds no longer work as-is in Eclipse.

The reason is that we run the maven-protoc-plugin (the David Trott fork, which is more or less the only one still seeing development). This Maven plugin runs the protoc Protocol Buffers compiler during the generate-sources phase of the Maven lifecycle. Under m2eclipse, this phase happened outside Eclipse and the builds ran fine.

Unfortunately m2e can’t handle this on its own: it requires a connector. Connectors are Eclipse plugins that tie a Maven plugin to an m2e build lifecycle phase, so that when m2e needs to execute this phase of the build, it can do so through the connector.

Until now, there wasn’t any lifecycle connector for the maven-protoc-plugin. Our development team couldn’t go on without one in the long term, so I took a stab at building it.

In fact it was way simpler than what I first thought. I used the m2e Extension Development Guide as a bootstrap (and especially the EGit extension).

The result of these few hours of development is now open source and available in the m2e-protoc-connector GitHub repository.

Installation

I didn’t release an Eclipse p2 update repository (mainly because I don’t really know how to do that), so you’ll have to build the project by yourself (but it’s easy).

  1. Clone the repository:

git clone git://github.com/masterzen/m2e-protoc-connector.git

  2. Build it with Maven 3, from inside the cloned m2e-protoc-connector directory:

mvn package

Once built, you’ll find the feature packaged in com.daysofwonder.tools.m2e-protoc-connector.feature/target/com.daysofwonder.tools.m2e-protoc-connector.feature-1.0.0.20111130-1035-site.zip.

To install in Eclipse Indigo:

  1. Open the Install New Software window from the Help menu.
  2. Click the Add button.
  3. Click the Archive button and point it to the com.daysofwonder.tools.m2e-protoc-connector.feature/target/com.daysofwonder.tools.m2e-protoc-connector.feature-1.0.0.20111130-1035-site.zip file.
  4. Accept the license terms and restart Eclipse.

Usage

Using it requires nothing specific, as long as your pom.xml roughly conforms to what we use:

<plugin>
    <groupId>com.google.protobuf.tools</groupId>
    <artifactId>maven-protoc-plugin</artifactId>
    <executions>
        <execution>
            <id>generate proto sources</id>
            <goals>
                <goal>compile</goal>
            </goals>
            <phase>generate-sources</phase>
            <configuration>
                <protoSourceRoot>${basedir}/src/main/proto/</protoSourceRoot>
                <includes>
                    <param>**/*.proto</param>
                </includes>
            </configuration>
        </execution>
    </executions>
</plugin>
...
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.4.1</version>
</dependency>
...
<pluginRepositories>
    <pluginRepository>
        <id>dtrott-public</id>
        <name>David Trott's Public Repository</name>
        <url>http://maven.davidtrott.com/repository</url>
    </pluginRepository>
</pluginRepositories>

If you find any problem, do not hesitate to open an issue on the github repository.

25 December 2011 ~ Comments Off

redis-snmp: redis performance monitoring through SNMP

Just as I created mysql-snmp, a small Net-SNMP subagent that exports MySQL performance data through SNMP, I’m proud to announce the first release of redis-snmp to monitor Redis servers. It is also inspired by the Cacti MySQL Templates (which also cover Redis).

I originally created this Net-SNMP perl subagent to monitor some Redis performance metrics with OpenNMS.

The where

You’ll find the sources (which can be used to build a Debian package) in the redis-snmp GitHub repository.

The what

Here are the kinds of graphs and metrics you can export from a Redis server:

Redis Connections

Redis Commands

Redis Memory

The how

Like mysql-snmp, redis-snmp must run on a host that can connect to the monitored Redis server (the same host makes sense). You also need the following dependencies:

  • Net-SNMP >= 5.4.2.1 (older versions have a 64-bit varbind issue)
  • perl (tested under perl 5.10 from debian squeeze)

Once running, you should be able to ask your snmpd about redis values:

$ snmpbulkwalk -m'REDIS-SERVER-MIB' -v 2c  -c public redis-server.domain.com .1.3.6.1.4.1.20267.400
REDIS-SERVER-MIB::redisConnectedClients.0 = Gauge32: 1
REDIS-SERVER-MIB::redisConnectedSlaves.0 = Gauge32: 0
REDIS-SERVER-MIB::redisUsedMemory.0 = Counter64: 154007648
REDIS-SERVER-MIB::redisChangesSinceLastSave.0 = Gauge32: 542
REDIS-SERVER-MIB::redisTotalConnections.0 = Counter64: 6794739
REDIS-SERVER-MIB::redisCommandsProcessed.0 = Counter64: 37574019

Of course you must adjust the hostname and community. SNMP v2c (or better) is mandatory since we’re reporting 64-bit values.

Note that OIDs are translated to names only if the REDIS-SERVER-MIB is installed on the host where you run the command above.
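The real subagent is written in Perl; the Ruby sketch below only illustrates the idea of turning Redis statistics into SNMP values. It assumes, since the post doesn’t say, that the metrics come from the Redis INFO command, and that the INFO field names are the Redis 2.x ones (later versions renamed changes_since_last_save to rdb_changes_since_last_save). The OIDs match the OpenNMS group in the next section, and the types follow the snmpbulkwalk output above.

```ruby
# Hypothetical mapping: INFO field => [OID suffix, SNMP type].
BASE_OID = ".1.3.6.1.4.1.20267.400.1".freeze

METRICS = {
  "connected_clients"          => [1, "Gauge32"],
  "connected_slaves"           => [2, "Gauge32"],
  "used_memory"                => [3, "Counter64"],
  "changes_since_last_save"    => [4, "Gauge32"],
  "total_connections_received" => [5, "Counter64"],
  "total_commands_processed"   => [6, "Counter64"]
}.freeze

# Parse "key:value" lines from INFO, skipping "# Section" headers and blanks.
def parse_info(text)
  text.each_line.with_object({}) do |line, fields|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split(":", 2)
    fields[key] = value
  end
end

# Build [oid, type, value] triples (instance .0), skipping missing fields.
def to_varbinds(info_text)
  fields = parse_info(info_text)
  METRICS.filter_map do |field, (index, type)|
    next unless fields.key?(field)

    ["#{BASE_OID}.#{index}.0", type, Integer(fields[field])]
  end
end
```

Skipping fields the server doesn’t report, rather than raising, keeps the sketch usable against Redis versions whose INFO output differs.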

OpenNMS integration

Integrating with OpenNMS is as simple as adding the following group to your datacollection-config.xml file:

<!-- REDIS-SERVER MIB -->
<group name="redis" ifType="ignore">
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.1" instance="0" alias="redisConnectedClnts" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.2" instance="0" alias="redisConnectedSlavs" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.3" instance="0" alias="redisUsedMemory" type="Gauge64" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.4" instance="0" alias="redisChangsSncLstSv" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.5" instance="0" alias="redisTotalConnectns" type="Counter64" />
    <mibObj oid=".1.3.6.1.4.1.20267.400.1.6" instance="0" alias="redisCommandsPrcssd" type="Counter64" />
</group>

And the following graph definitions to your snmp-graph.properties file:

report.redis.redisconnections.name=Redis Connections
report.redis.redisconnections.columns=redisConnectedClnts,redisConnectedSlavs,redisTotalConnectns
report.redis.redisconnections.type=nodeSnmp
report.redis.redisconnections.width=565
report.redis.redisconnections.height=200
report.redis.redisconnections.command=--title "Redis Connections" \
 --width 565 \
 --height 200 \
 DEF:redisConnectedClnts={rrd1}:redisConnectedClnts:AVERAGE \
 DEF:redisConnectedSlavs={rrd2}:redisConnectedSlavs:AVERAGE \
 DEF:redisTotalConnectns={rrd3}:redisTotalConnectns:AVERAGE \
 LINE1:redisConnectedClnts#9B2B1B:"REDIS Connected Clients         " \
 GPRINT:redisConnectedClnts:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisConnectedClnts:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisConnectedClnts:MAX:"Max \\: %8.2lf %s\\n" \
 LINE1:redisConnectedSlavs#4A170F:"REDIS Connected Slaves          " \
 GPRINT:redisConnectedSlavs:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisConnectedSlavs:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisConnectedSlavs:MAX:"Max \\: %8.2lf %s\\n" \
 LINE1:redisTotalConnectns#38524B:"REDIS Total Connections Received" \
 GPRINT:redisTotalConnectns:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisTotalConnectns:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisTotalConnectns:MAX:"Max \\: %8.2lf %s\\n"

report.redis.redismemory.name=Redis Memory
report.redis.redismemory.columns=redisUsedMemory
report.redis.redismemory.type=nodeSnmp
report.redis.redismemory.width=565
report.redis.redismemory.height=200
report.redis.redismemory.command=--title "Redis Memory" \
  --width 565 \
  --height 200 \
  DEF:redisUsedMemory={rrd1}:redisUsedMemory:AVERAGE \
  AREA:redisUsedMemory#3B7AD9:"REDIS Used Memory" \
  GPRINT:redisUsedMemory:AVERAGE:"Avg \\: %8.2lf %s" \
  GPRINT:redisUsedMemory:MIN:"Min \\: %8.2lf %s" \
  GPRINT:redisUsedMemory:MAX:"Max \\: %8.2lf %s\\n"

report.redis.rediscommands.name=Redis Commands
report.redis.rediscommands.columns=redisCommandsPrcssd
report.redis.rediscommands.type=nodeSnmp
report.redis.rediscommands.width=565
report.redis.rediscommands.height=200
report.redis.rediscommands.command=--title "Redis Commands" \
 --width 565 \
 --height 200 \
 DEF:redisCommandsPrcssd={rrd1}:redisCommandsPrcssd:AVERAGE \
 AREA:redisCommandsPrcssd#FF7200:"REDIS Total Commands Processed" \
 GPRINT:redisCommandsPrcssd:AVERAGE:"Avg \\: %8.2lf %s" \
 GPRINT:redisCommandsPrcssd:MIN:"Min \\: %8.2lf %s" \
 GPRINT:redisCommandsPrcssd:MAX:"Max \\: %8.2lf %s\\n"

report.redis.redisunsavedchanges.name=Redis Unsaved Changes
report.redis.redisunsavedchanges.columns=redisChangsSncLstSv
report.redis.redisunsavedchanges.type=nodeSnmp
report.redis.redisunsavedchanges.width=565
report.redis.redisunsavedchanges.height=200
report.redis.redisunsavedchanges.command=--title "Redis Unsaved Changes" \
  --width 565 \
  --height 200 \
  DEF:redisChangsSncLstSv={rrd1}:redisChangsSncLstSv:AVERAGE \
  AREA:redisChangsSncLstSv#A88558:"REDIS Changes Since Last Save" \
  GPRINT:redisChangsSncLstSv:AVERAGE:"Avg \\: %8.2lf %s" \
  GPRINT:redisChangsSncLstSv:MIN:"Min \\: %8.2lf %s" \
  GPRINT:redisChangsSncLstSv:MAX:"Max \\: %8.2lf %s\\n"

Do not forget to register the new graphs in the report list at the top of snmp-graph.properties file.

Restart OpenNMS, and it should start graphing your redis performance metrics. You’ll find those files in the opennms directory of the source distribution.

Enjoy :)

19 December 2011 ~ Comments Off

How I like my Java

This is a repost of my article originally published on Jordan Sissel's awesome SysAdvent.

After years of working in Java-based environments, there are a number of things I like to implement together with the teams I'm working with. The application doesn't matter much, whether it's plain Java, Tomcat, JBoss, etc.: these deployment strategies will help your ops and dev teams build more manageable services.

Packaging

The first step is to have native operating system packages as the build artifacts rolling out of your continuous integration server. No .ear, .war or .jar files: I want RPMs or debs. With tools like fpm or the Maven RPM plugin this should not be an extra hassle, and the advantages you get from doing this are priceless.

What advantages? Most native package systems support dependency resolution, file verification, and upgrades (or downgrades). These are things you would have to implement yourself or cobble together from multiple tools. As a bonus, your fellow sysadmins are likely already comfortable with the native package tool used on your systems, so why not do it?

Proxied, not running as root

Shaken, not stirred

Just like any other daemon, for security reasons, I prefer to run Tomcat or JBoss as its own user rather than as root. In most cases, however, only root can bind to ports below 1024, so you need to put a proxy in front. This is a convenient requirement, because a proxy (such as Apache) can terminate SSL connections, improve logging (access logs, etc.), and make it possible to run multiple Java application server instances on the same infrastructure.

Service Management

Lots of Java application servers ship a semi-functional shell script to start the service. Often these services don't daemonize cleanly, which is why I prefer to use the Java Service Wrapper from Tanuki to manage most Java-based services. With a small config file, you get a clean way to stop and start Java as a service, and even the possibility to add some more monitoring to it.

However, there are some problems the Java Service Wrapper leaves unsolved. For example, after launching the service, the wrapper can return a successful exit code while your service is not ready yet: the application server might be up, but your applications are still starting. If you monitor these applications (e.g. for high availability), you only want to treat them as 'active' once the application is ready, so your wrapper script shouldn't return "OK" before the application has been deployed and is ready. Otherwise, you end up with false positives, or nodes that fail over before the application has even started. It's pretty easy to create a ping-pong service-flapping scenario on a cluster this way.
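One way to close that gap is a small readiness gate that the start script runs after launching the service: keep probing until the application reports healthy, and only then exit successfully. The helper below is a hypothetical sketch, not part of the Tanuki wrapper; the probe is passed in as a block (for example, fetching the health page described under Monitoring).

```ruby
# Probe until the block returns true or `timeout` seconds elapse.
# Returns true if the application became ready, false on timeout.
def wait_until_ready(timeout:, interval: 1.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end
end
```

A start script could then finish with exit(wait_until_ready(timeout: 300) { app_ready? } ? 0 : 1), where app_ready? is whatever hypothetical check you use. Using a monotonic clock keeps the deadline correct even if the system time is adjusted while waiting.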

One application per host

I prefer to deploy one application per host, even though you can easily deploy multiple applications within a single Java VM. With one per host, management becomes much easier. Given the availability and popularity of good virtualization, the overhead of launching multiple Linux VMs for different applications is so low that the benefits outweigh the disadvantages.

Configuration

What about configuration of the application? Where should remote API URLs, database settings, and other tunables go? A good approach is to create a standard location for all your applications, like /etc/$vendor/app/, where you place the appropriate configuration files. Volatile application configuration must live outside the artifact that comes out of the build (.ear, .jar, .war, .rpm). The content of these files should be managed by a configuration management tool such as Puppet, Chef, or CFEngine. Developers should be given basic training so they can provide the systems team with the appropriate configuration templates.

Logs

Logs are pretty important too, and very easy to neglect. There are plenty of tools for logging from a Java application (Log4j, Logback, etc.). Use them, and make sure they are configured to log to syslog: that way logs can be collected centrally and parsed much more easily than if they were spread all over the filesystem.

Monitoring

You also want some way to monitor your application beyond just checking whether it is running: it is usually insufficient to simply check that a TCP server is listening. A nice solution is to have a simple plain-text page listing critical services and whether they are OK or not (true/false), for example:

  1. someService: true
  2. otherService: false

This benefits humans as well as machines. Tools like mon, heartbeat or load balancers can just grep for "false" in the page: if it contains false, they report a failure and fail over. This page should live at a standard location for all your applications, following a pattern like http://host/servicename/health.html, for example "http://10.0.129.10:8080/mrs-controller/health.html". The page should be accessible as soon as the app is deployed.
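The consumer side of that convention fits in a few lines. This is a hypothetical checker, not one of the tools named above: it parses the "name: true|false" lines and treats the service as healthy only if the request succeeds, at least one service is listed, and none reports false. The last two conditions are a deliberate tightening of a plain grep, which would accept an empty or error page.

```ruby
require "net/http"
require "uri"

# Parse "name: true|false" lines into a Hash of name => boolean.
def parse_health(body)
  body.each_line.with_object({}) do |line, status|
    name, value = line.strip.split(":", 2)
    next if value.nil?

    status[name.strip] = (value.strip == "true")
  end
end

# Healthy only if something was reported and nothing is false.
def healthy?(body)
  status = parse_health(body)
  !status.empty? && status.values.all?
end

# Fetch the health page and check it; non-2xx responses count as failures.
def service_healthy?(url)
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) && healthy?(response.body)
end
```

A monitoring script could call service_healthy?("http://10.0.129.10:8080/mrs-controller/health.html") and map the boolean to its own exit code.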

This true/false health report should not be a static HTML file; it should be a dynamically generated page. Keeping it plain text also means you can use curl, wget, or any command-line tool or browser to check the status of your service.

The 'health.html' page should report honestly about health, executing any code necessary to compute 'health' before yielding a result. For example, if your app is a simple calculator, it should verify health by doing tests internally like adding up some numbers before sharing 'myCalculator:true' in the health report.
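On the producing side, "report honestly" means each line comes from actually exercising code. Here is a minimal sketch in that spirit; the calculator, the check names and the simulated database failure are all hypothetical, and a check that raises is reported as false instead of breaking the page.

```ruby
# Run one check; any exception counts as a failure.
def passing?(check)
  check.call == true
rescue StandardError
  false
end

# Render one "name: true|false" line per check, as plain text.
def health_report(checks)
  checks.map { |name, check| "#{name}: #{passing?(check)}\n" }.join
end

# Hypothetical calculator under test.
def add(a, b)
  a + b
end

checks = {
  "myCalculator" => -> { add(2, 2) == 4 },
  "database"     => -> { raise "connection refused" } # simulated outage
}
print health_report(checks)
# myCalculator: true
# database: false
```

Requiring each check to return exactly true means a check that accidentally returns some other truthy value is reported as failing, which errs on the side of failing over rather than hiding a problem.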


This kind of approach could also be used to provide you with metrics you can't learn from the JVM, such as number of concurrent users or other valid application metadata for measurement and trending purposes.

Conclusion

If you can't convince your developers, maybe more data can help: check out Martin Jackson's presentation on Java deployments, Automated Java Deployments with RPM.

With good strategies for packaging, deployment, logging, and monitoring, you are in a good position to have an easily manageable, reproducible, and scalable environment. You'll give your developers the opportunity to focus on writing the application, and they can use the same setup on their local development boxes (e.g. by using Vagrant) as you use in production.