Category Archives: devops

Jenkins, Puppet, Graphite, Logstash and YOU

This is a repost of an article I wrote for the Acquia Blog some time ago.

As mentioned before, devops can be summarized by talking about culture, automation, monitoring metrics and sharing. Although devops is not about tooling, there are a number of open source tools out there that will be able to help you achieve your goals. Some of those tools will also enable better communication between your development and operations teams.

When we talk about Continuous Integration and Continuous Deployment we need a number of tools to help us there. We need to be able to build reproducible artifacts which we can test. And we need a reproducible infrastructure which we can manage in a fast and sane way. To do that we need a Continuous Integration framework like Jenkins.

Formerly known as Hudson, Jenkins has been around for a while. The open source project was initially very popular in the Java community but has since gained popularity in other environments. Jenkins allows you to create reproducible build and test scenarios and to report on them. It provides a uniform and managed way to build, test, release and trigger the deployment of new artifacts, for both traditional software and infrastructure-as-code projects. Jenkins has a vibrant community that builds new plugins for the tool in different kinds of languages. People use it to build their deployment pipelines, automatically check out new versions of the source code, and syntax test and style test it. If needed, users can compile the software, trigger unit tests, and upload a tested artifact into a repository so it is ready to be deployed to the next platform level.

Jenkins can then trigger an automated deployment of the tested software onto its target platform. Whether that is development, testing, user acceptance or production is just a parameter. Deployment should not be something we try for the first time in production; it should be done the same way on all platforms. The deltas between these platforms should be managed using a configuration management tool such as Puppet, Chef or friends.

In a way this means that infrastructure as code is a testing dependency: you also want to be able to deploy a platform to exactly the same state it was in before you ran your tests, so that you can compare the results of different test runs and make sure they are correct. This means you need to be able to control the starting point of your tests, and tools like Puppet and Chef can help you here. Which tool you use is the least important part of the discussion; the important part is that you adopt one of them and start treating your infrastructure the same way you treat your code base: as a tested, stable, reproducible piece of software that you can deploy over and over in a predictable fashion.

Configuration management tools such as Puppet, Chef and CFEngine are just one part of the ecosystem, and integration with orchestration and monitoring tools is needed because you want feedback on how your platform behaves after changes have been introduced. Lots of people measure the impact of a new deploy, and with that we move to the M part of CAMS.

Graphite is one of the most popular tools for storing metrics. Plenty of other tools in the same space have tried to go where Graphite is going, but when it comes to flexibility, scalability and ease of use, few of them allow developers and operations people to build dashboards for any metric they can think of in a matter of seconds.

Just sending a keyword, a timestamp and a value to the Graphite platform gives you a large choice of actions for that metric. You can graph it, transform it, or even set an alert on it. Graphite strips away the complexity of similar tools and offers an easy-to-use API, so developers can integrate their own self-service metrics into dashboards to be used by everyone.
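For example, Carbon (Graphite's collector) listens by default on TCP port 2003 for its plaintext protocol, which takes a metric name, a value and a Unix timestamp. A minimal sketch, assuming a hypothetical Graphite host called graphite.example.com:

echo "shop.checkout.duration_ms 153 $(date +%s)" | nc graphite.example.com 2003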

One last tool that deserves our attention is Logstash. Log files are often a hugely missed source of relevant information about how our applications behave, and Logstash started out as just a tool to aggregate, index and search the log files of our platform. Logstash and its Kibana + ElasticSearch ecosystem are now quickly evolving into a real-time analytics platform, implementing the Collect, Ship+Transform, Store and Display pattern we see emerging a lot in the #monitoringlove community. Logstash allows us to turn boring old log files, which people used to search only upon failure, into valuable information that product owners and business managers can use to learn about the behavior of their users.
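As a rough illustration of that pattern, here is a minimal Logstash configuration sketch: it collects a log file, transforms the lines with grok and stores them in ElasticSearch, where Kibana can display them. The log path is hypothetical and the output relies on Logstash defaults:

input {
    file { path => "/var/log/nginx/access.log" }
}
filter {
    grok { match => ["message", "%{COMBINEDAPACHELOG}"] }
}
output {
    elasticsearch { }
}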

Together with the Graphite-based dashboards we mentioned above, these tools help people start sharing their information and communicate better. When thinking about these tools, think about what you are doing, what goals you are trying to reach and where you need to improve. Because after all, devops is not solving a technical problem, it's trying to solve a business problem and bringing better value to the end user at a more sustainable pace. And in that way the biggest tool we need to use is YOU, as the person who enables communication.

Why Does DevOps Matter?

This is a repost of an article I wrote for the Acquia Blog some time ago.

People often ask, why does DevOps matter?

The honest answer to that question is...because having the development and operations team work together is the only way IT is successful.

Over the past few decades I've worked in different environments, including small web start-ups, big pharmaceutical companies, hardware engineering shops, large software companies and banks. All were trying different approaches to deliver quality software to their end users and customers, but most of them were failing badly.

Operations people were being pulled in at the last minute. A marketing campaign needed to go live at 5 p.m. because that's when the first radio commercial was scheduled to be broadcast. At 11 a.m., the operations people still didn't know the campaign existed.

It was always the other person’s fault. Waterfall projects and large PID documents were the solution to all the problems. But people learned; they figured out that we can't expect humans to predict how long it would take to implement something they have never done before. Unfortunately, even today, only a small set of people understand the value of being agile and that we cannot break a project down to its granular details without factoring in the “unpredictable.” The key element here is the “uncertainty” of the many project pieces.

So on came the agile movement and software development became much smoother.
People agreed on time boxing a reasonable set of work that would result in delivering useful functionality in frequent batches. Yet, on the day of deployment, all hell breaks loose because someone forgot to loop in the Ops team.

This is where my personal experience differs from a lot of others, because I was part of a development team building a product where the developers sat right next to the system administration team. Within sprints, our DevOps team was building both system features and application features; making the application highly available was a story on the board next to an actual end-user feature.

In the old days, a new feature that was scheduled for Friday couldn't be brought online for a couple of days because it couldn't be deployed to production. In the new setup, deploying to production was a no brainer as we had already tested the automated deployment to the acceptance platform.

This brings us to the first benefit: actually being able to go live.

The next problem came on a Wednesday evening. A major security issue had popped up in Drupal and an upgrade needed to be performed, but nobody dared to perform it because they were afraid of breaking the site. Some people had made changes without putting their config back into the code base, and thus the site didn't get updated. This is the typical state of the majority of websites: people build something, deploy it and never look back, until disaster strikes and it hits the evening news.

Teams then learn that not only do they need to implement features and put their config changes in code, but also do continuous integration testing on their sites.

From doing continuous integration, they go to continuous delivery and continuous deployment, where an upgrade isn't a risk anymore but a normal event that happens automatically when all the tests are green. By implementing tests and infrastructure as code, they achieve two goals: they build confidence that the code works, and they drive down the number of defects in the code base, so the number of times people need to dig back into old code to fix issues also comes down.

Delivering better software in a much more regular way means security issues get fixed faster, but it also brings new features to market faster. By faster, we often mean a change from releasing software twice a year, to a release each sprint, to a release whenever a commit has passed a number of test criteria.

Because they started to involve other stakeholders, the value of their application grew as they had faster feedback and better usage statistics. The faster feedback meant that they weren't spending as much time on features nobody used, but focusing their efforts on things that mattered.

Having other stakeholders like the systems and security teams involved with early metrics, and taking the non-functional requirements into the backlog planning, meant that the stability of the platform grew. Rather than people spending hours and nights fixing production problems, potential issues are now tackled upfront because of the communication between devs and ops. Also, scale and high availability are built into the application upfront, rather than afterwards, when it is too late.

So, in the end it comes down to the most important part: devops creates more happiness. It creates happier customers, developers, operations teams, managers and investors, and for a lot of people it improves not only application quality but also their quality of life.

The Rise of the DevOps movement

This is a repost of an article I wrote for the Acquia Blog some time ago.

DevOps, DevOps, DevOps … the whole world is talking about DevOps, but what is DevOps?

Since Munich in 2012, DrupalCon has had a dedicated devops track. After talking to a lot of people in Prague last month, I realized that the concept of DevOps is still very unclear to a lot of developers. To a large part of the development community, DevOps still means folks working on 'the infrastructure part' of the development life cycle, and for some it simply means deploying Drupal and being concerned purely with keeping the site alive.

Obviously that's not what DevOps is about, so let's take a step back and find out how it all started.

Like all good things, Drupal included, DevOps is a Belgian thing!

Back in 2009 DevopsDays Europe was created because a group of people kept meeting at different conferences throughout the world and didn't have a common devops conference to go to. These individuals would talk about software delivery, deployment, build, scale, clustering, management, failure, monitoring and all the important things one needs to think about when running a modern web operation. These folks included Patrick Debois, Julian Simpson, Gildas Le Nadan, Jezz Humble, Chris Read, Matt Rechenburg, John Willis, Lindsay Holmswood and me, Kris Buytaert.

O'Reilly created a conference called "Velocity", and that sounded interesting to a bunch of us Europeans, but on our side of the ocean we had to resort to the existing open source, Unix and agile conferences. We didn't really have a common meeting ground yet. At CloudCamp Antwerp, in the Antwerp Zoo, I started talking to Patrick Debois about ways to fill this gap.

Many different events and activities, like John Allspaw and Paul Hammond's talk at Velocity and multiple Twitter discussions, influenced Patrick to create a DevOps-specific event in Gent, which became the very first 'DevopsDays'. DevopsDays Gent was not your traditional conference; it was a mix of a couple of formal presentations in the morning and open spaces in the afternoon. And those open spaces were where people got the most value: the opportunity to talk to people with the same complex problems, with actual experience in solving them, with stories about both success and failure. How do you deal with that oldskool system admin who doesn't understand what configuration management can bring him? How do you do Kanban for operations while the developers are working in 2-week sprints? What tools do you use to monitor a highly volatile and expanding infrastructure?

From that very first DevopsDays in Gent, several people spread out to organize other events: John Willis and Damon Edwards started organizing DevopsDays Mountain View, and the European edition started touring Europe. It wasn't until this year that different local communities started organizing their own local DevopsDays, e.g. in Atlanta, Portland, Austin, Berlin, Paris, Amsterdam, London, Barcelona and many more.

From this group of events a community has grown of people that care about bridging the gap between development and operations, a community of people that cares about delivering holistic business value to their organization.

As a community, we have realized that there needs to be more communication between the different stakeholders in an IT project life cycle - business owners, developers, operations, network engineers, security engineers - everybody needs to be involved as soon as possible in order to help each other and talk through potential pitfalls ages before the application goes live. And when it goes live, the communication needs to stay alive too. We need to talk about maintaining the application, scaling it and keeping it secure. Just think about how many Drupal sites are out there vulnerable to attackers because the required security updates have never been applied. Why does this happen? It could be because many developers don't touch the site anymore: they are afraid of breaking it.

And this is where automation helps: if we can do automatic deployments and upgrades of a site, because it is automatically tested whenever developers push their code, upgrading won't be that difficult a task. When people only update once every 6 months, it's a painful and difficult process, but when it's automated and done regularly, it makes life so much easier.

This ultimately comes down to the idea that the involvement of developers doesn't end at their last commit. Collaboration is key: it allows every developer to play a role in keeping the site up and running, for more happy users. After all, software with no users has no value. The involvement of the developers in the ongoing operations of their software shouldn't end before the last end user stops using their application.

In order to keep users happy we need feedback and metrics, starting from the very first phases of development all the way up to production. That means we need to monitor both our application and our infrastructure and gather metrics on every aspect we can; with that feedback we can learn about potential problems, but also about successes.

Finally, all of this can be summarized in an acronym coined by John Willis and Damon Edwards: CAMS. CAMS says devops is about Culture, Automation, Measurement and Sharing. Getting the discussion going on how to do all of that, more specifically in a Drupal environment, is the sharing part.

Girls at Devopsdays (see what I did there?)

Ben Hughes (@benjammingh) of Etsy, visiting from SF to speak on security, tells me he is horrified by the pitifully small number of women at the conference. He says he thought New York was bad 'til he saw London. I saw, maybe, six female persons here, among several hundred male ones. Personally I don't notice this much - and if I don't experience any direct discrimination or signs of overt sexism I tend to see people as people, regardless of gender. I don't see me as part of a minority, so maybe I forget to see other women as such. Talking with Ben has made me think this isn't too good.

As someone who's done a fair bit of hiring, and never had a single female applicant, I shrug and say "what can I do? They aren't applying". But after talking with Ben I'm thinking that's a lame cop-out too. Something needs to be done, and just because it isn't my fault as an evil misogynist hirer doesn't mean I shouldn't be trying to make improvements. I think it's a slow burn. It probably requires a culture shift, with girls moving nearer to the model of the classic teenage bedroom programmer, and boys moving further away from it, until they all meet in the middle in some kind of more social, sharing, cooperative way of being. Hackspaces for teenagers? With quotas? It probably also requires more, and more interesting, tech in schools; programming for all. Meanwhile, I'm going to start with my small daughters (and sons), and to set them a good example I'm going to upskill myself. I'm going to get around to learning Ruby, as I've been meaning to for years.

Oh but wait... did I say there was no overt sexism here at Devopsdays? So as I'm writing this there's an ignite talk going on. It's a humorous piece using Game of Thrones as a metaphor for IT company life. It starts ok - Joffrey, Ned, Robb and Tyrion get mentions as corporate characters, it rambles a bit but it gets some laughs, including from me. But then suddenly this happens...

The presenter shows a picture of Shae, saying "...and if you do it right you get a girl." Oh, you GET a "girl" do you, Mister Presenter? That well known ownable commodity, the female human. Oh and apparently you get one who's under eighteen... that's a little, er, inappropriate, isn't it?

And then, as if that wasn't enough, the presenter shows a photo of Danaerys Targaryen, saying "And here's a gratuitous picture of Danaerys". Danaerys is gratuitous apparently! Well who knew! Not anyone who's read or seen Game of Thrones. Oh and wait, it gets better - the guy now tells us, "if you do it really well you get two girls."

So, firstly in the mind of this man, women are present in fiction and in IT only as prizes for successful men to win. Secondly women can reasonably be described as girls. Thirdly, he assumes that his audience is exclusively male (I was sitting in the front row - he's got no excuse for that one).

So I spoke to the guy afterwards, and he didn't see a problem with it at all, happily defending the piece, though he did finish by thanking me for the feedback. I guess the proof will be in whether he gives the talk again in a modified form.

Such a lost opportunity though - GoT is packed with powerful and interesting women, good, evil and morally ambiguous. He could so easily have picked half and half. It's like he read it or watched it without even seeing the non-male protagonists.

Hmm, I just used the word guy... We say "guys" a lot round here, and I mostly assume it's meant in a gender-neutral, inclusive way. That's the way I take it, and use it. But now I'm wondering how personally contextualised our understanding of terms like "guys" is. Atalanta management once upset a female colleague by sending out a group email beginning, "Chaps...". She felt excluded, whereas we assumed everyone used the word as we do, in a completely neutral way. But both terms of address are originally male-specific, so it should be no surprise that some people will understand them to be either deliberately or carelessly excluding.

I don't know... This is the kind of small detail that tends to get written off as irrelevant, over-reacting, being "touchy", political correctness gone mad, and so on. But in the absence of really overt sexism - leaving aside Game of Thrones guy's show - maybe we do need to look to the small details to explain why tech still seems to be an unattractive environment for most women.

I nearly proposed an open space on this subject, but a few things held me back. Firstly, which I'm not proud of, I thought "what if nobody cares enough to come along? A) I'd feel a little silly, and b) I'd be so disappointed in my peers for not seeing it as important". I should've been braver, taken the risk.

More importantly though, I think it's really important that this isn't seen as a "women's issue", to be debated and dealt with by women, but rather as a people's issue, something that affects us all negatively, reflects badly on us; something we should all be responsible for fixing. And for me, the only female person in the room at the time, to propose this talk felt like ghettoising it as a women-only concern. I may be wrong. Maybe lots of people would've seized the chance to work towards some solutions. I wish Ben had been around.

So, I didn't propose a talk. But what I did do was put on my organisers' t-shirt and silly hat, so at least it was visible and clear that there are female persons involved in creating a tech conference in London. Even if there weren't enough attending.

About the Author

Helena Nelson-Smith is CEO of Atalanta Systems, she's also, by some people's definition, a "girl".

Heroku log drains into Logstash

The first and obvious option for shipping logs from a heroku app to Logstash is the heroku input plugin. However, this requires installing the Heroku gem and deploying the login + password of a Heroku user to your Logstash server(s). At this time it seems that any user given permissions to an app on Heroku has full control. Not good when you just want to fetch logs. Heroku has added more granular permissions via OAuth but the Heroku gem does not support OAuth tokens yet.

Fortunately there's another option using Heroku's log drain. Setting up a log drain from Heroku to Logstash is almost as simple as the Heroku input plugin, but has the major advantage of not requiring any new users or passwords to be deployed on the Logstash server.

Hooking up your Heroku-deployed apps to your Logstash/Kibana/Elasticsearch infrastructure is straightforward using the syslog drains provided by Heroku. Here is a recipe for putting the pieces together:

Install Heroku toolbelt

In order to configure a log drain on Heroku you need to install the Heroku toolbelt (or use the API directly). At this time I don't think there's a way to configure log drains from the web UI.

The Heroku toolbelt can also be used to tail an app's logs on the command line, which is great for debugging, since the logs output by this command are identical to the logs that should be sent to Logstash if everything is configured correctly:

List apps:

$ heroku apps
myapp1  email@dom.tld
myapp2  email@dom.tld

Tail app's logs:

$ heroku logs --app myapp1 -t
2014-01-31T14:26:11.801629+00:00 app[web.1]: 2014-01-31T14:26:11.801Z - debug: FETCHING TICKET: 15846
2014-01-31T14:26:26.753977+00:00 app[web.1]: 2014-01-31T14:26:26.752Z - debug: FETCHING TICKET: 15851
2014-01-31T14:26:27.457415+00:00 heroku[router]: at=info method=GET path=/ping? host=myapp1 request_id=6cb9b9eb-388c-4364-9278-a81179067f21 fwd="50.57.38.1" dyno=web.2 connect=7ms service=6ms status=200 bytes=2

Configure log drain

Use the Heroku toolbelt to configure a log drain:

$ heroku drains:add --app myapp1 syslog://logstash.dom.tld:1514

Heroku's Logplex system will now send all logs generated from myapp1 to logstash.dom.tld:1514 via TCP in syslog RFC-5424 format.

Configure Logstash

Next, configure a Logstash input to receive these logs:

input {
    tcp {
        port => "1514"
        tags => ["input_heroku_syslog"]
    }
}

Heroku uses the syslog format as defined in RFC 5424. Logstash ships with grok rules that parse out most syslog formats, including RFC 5424, but I found that they were not quite perfect for Heroku logs.

For more details on Heroku log drains: https://devcenter.heroku.com/articles/logging

Here are examples of raw log lines sent by Heroku. These are exactly what Logstash will receive and are suitable for testing with the grok debugger.

The first is an example of a log message sent by a Heroku component, in this case the Heroku router:

231 <158>1 2014-01-08T01:05:27.967180+00:00 d.9bc44987-ff40-40ac-a248-ff4ec4d71d7c heroku router - - at=info method=POST path=/ host=myapp1.herokuapp.com fwd="204.236.185.9" dyno=web.1 connect=2ms service=81ms status=200 bytes=2

Next is an example of a log line generated from the application's stdout:

158 <13>1 2014-01-08T17:49:14.822585+00:00 d.9bc44987-ff40-40ac-a248-ff4ec4d71d7c app web.1 - - 2014-01-08T17:49:14.822Z - info: FETCHING TICKET: 15103

Here is the grok pattern we use to parse Heroku's syslog RFC 5424-(ish) log messages:

filter {
 
  if "input_heroku_syslog" in [tags] {
    grok {
      match => ["message", "%{SYSLOG5424PRI}%{NONNEGINT:syslog5424_ver} +(?:%{TIMESTAMP_ISO8601:timestamp}|-) +(?:%{HOSTNAME:heroku
_drain_id}|-) +(?:%{WORD:heroku_source}|-) +(?:%{DATA:heroku_dyno}|-) +(?:%{WORD:syslog5424_msgid}|-) +(?:%{SYSLOG5424SD:syslog5424
_sd}|-|) +%{GREEDYDATA:heroku_message}"]
    }
    mutate { rename => ["heroku_message", "message"] }
    kv { source => "message" }
    syslog_pri { syslog_pri_field_name => "syslog5424_pri" }
  }
}

A few notes about this filter config:

  • The log message will be stored in the message field.
  • Key/value pairs matching key=value in the message will be parsed and added to the Logstash event. Many of the internal Heroku components' logs include useful key/value pairs.
  • Special fields heroku_drain_id, heroku_source, and heroku_dyno will be extracted from each event.

Masterzen’s Blog 2014-01-11 16:45:00

It all started a handful of months ago, when it appeared that we'd need to build some of our native software on Windows. Before that time, all our desktop software at Days of Wonder was mostly cross-platform Java code that could be cross-compiled on Linux. Unfortunately, we now badly needed a Windows build machine.

In this blog post, I'll tell you the whole story, from my zero knowledge of Windows administration to an almost fully automated construction of a Windows build machine image.

Jenkins

But, first let’s digress a bit to explain in which context we operate our builds.

Our CI system is built around Jenkins, with a specific twist. We run the Jenkins master on our own infrastructure and our build slaves on AWS EC2. The reason behind this choice is out of the scope of this article (but you can still ask me, I’ll happily answer).

So, we're using the Jenkins EC2 plugin, and a Jenkins S3 plugin revamped by yours truly. We produce somewhat large binary artifacts when building our client software, and the bandwidth between EC2 and our master is not that great (and is expensive), so using the aforementioned patch I contributed, we host all our artifacts in S3, fully managed by our out-of-AWS Jenkins master.

The problem I faced when starting to explore the intricate world of Windows in relation to Jenkins slaves is that we wanted to keep the Linux model we had: on-demand slaves spawned by the master when scheduling a build. Unfortunately, the current state of the Jenkins EC2 plugin only supports Linux slaves.

Enter WinRM and WinRS

The EC2 plugin for Linux slaves works like this:

  1. it starts the slave
  2. using an internal scp implementation it copies ‘slave.jar’ which implements the client Jenkins remoting protocol
  3. using an internal ssh implementation, it executes java -jar slave.jar. The stdin and stdout of the slave.jar process are then connected to the Jenkins master through an ssh tunnel.
  4. now, Jenkins does its job (basically sending more jars, classes)
  5. at this stage the slave is considered up

I needed to replicate this behavior. In the Windows world, ssh is essentially nonexistent. You can find some native implementations (like FreeSSHd or some commercial ones), but those options weren't easy to integrate, or simply didn't work.

In the Windows world, remote process execution is achieved through Windows Remote Management, WinRM for short. WinRM is an implementation of the WS-Management (WSMAN) specifications. It gives access to Windows Management Instrumentation and hardware counters (à la SNMP or IPMI in the Unix world).

One component of WinRM is WinRS: Windows Remote Shell. This is the part that allows running remote commands. Recent Windows versions (at least since Server 2003) ship with WinRM installed (but not started by default).
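For instance, once the WinRM service is running on a host, the winrs client that ships with Windows can execute a remote command; the host name and credentials below are placeholders:

winrs -r:http://buildslave.example.com:5985 -u:Administrator -p:VerySecret ipconfig /all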

WinRM is an HTTP/SOAP based protocol. By default, the payload is encrypted if the protocol is used in a Domain Controller environment (in this case, it uses Kerberos), which will not be our case on EC2.

Digging further, I found two client implementations, one of them being Overthere.

I started integrating Overthere into the ec2-plugin but encountered several incompatibilities, most notably that Overthere depended on more recent versions of some libraries than Jenkins itself.

I finally decided to create my own WinRM client implementation and released Windows support for the EC2 plugin. This hasn’t been merged upstream, and should still be considered experimental.

We've been using this version of the plugin for about a couple of months and it works, but to be honest WinRM doesn't seem to be as stable as ssh would be. There are times the slave is unable to start correctly because WinRM abruptly stops working (especially shortly after the machine boots).

WinRM, the bootstrap

So all is great: we know how to execute commands remotely from Jenkins. But that's not enough for our sysadmin needs. In particular, we need to be able to create a Windows AMI that contains all the software required to build our own applications.

Since I'm a long-time Puppet user (which you certainly noticed if you have read this blog in the past), using Puppet to configure our Windows build slaves was the only possibility. So we need to run Puppet on a Windows base AMI, then create an AMI from there that will be used for our build slaves. And if we can make this process repeatable and automatic, that'd be wonderful.

In the Linux world, this task is usually devoted to tools like Packer or Veewee (which, BTW, supports provisioning Windows machines). Unfortunately Packer, which is written in Go, doesn't yet support Windows, and Veewee doesn't support EC2.

That’s the reason I ported the small implementation I wrote for the Jenkins EC2 plugin to a WinRM Go library. This was the perfect pet project to learn a new language :)

Windows Base AMI

So, starting with all those tools, we're ready to start our project. But there's a caveat: WinRM is not enabled by default on Windows. So before automating anything, we need to create a Windows base AMI that has the necessary tools to allow automating the installation of our build tools later.

Windows boot on EC2

There’s a service running on the AWS Windows AMI called EC2config that does the following at the first boot:

  1. set a random password for the ‘Administrator’ account
  2. generate and install the host certificate used for Remote Desktop Connection.
  3. execute the specified user data (and cloud-init if installed)

On first and subsequent boot, it also does:

  1. it might set the computer host name to match the private DNS name
  2. it configures the key management server (KMS), checks the Windows activation status, and activates Windows as necessary
  3. it formats and mounts any Amazon EBS volumes and instance store volumes, and maps volume names to drive letters
  4. it performs some other administrative tasks

One thing that is problematic with Windows on EC2 is that the Administrator password is defined randomly at the first boot. That means that to do anything further on the machine (usually administering it over Remote Desktop) you first need to retrieve that password from AWS (with the command line you can do: aws ec2 get-password-data).

Next, we might also want to set a custom password instead of this dynamic one. We might also want to enable WinRM and install several utilities that will help us later.

To do that we can inject specific user data at the first boot of the Windows base AMI. That user data can contain one or more cmd.exe or PowerShell scripts that will be executed at boot.

I created this Windows bootstrap Gist (actually I forked and edited the part I needed) to prepare the slave.

First bootstrap

First, we’ll create a Windows security group allowing incoming WinRM, SMB and RDP:

aws ec2 create-security-group --group-name "Windows" --description "Remote access to Windows instances"
# WinRM
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 5985 --cidr <YOURIP>/32
# Incoming SMB/TCP 
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 445 --cidr <YOURIP>/32
# RDP
aws ec2 authorize-security-group-ingress --group-name "Windows" --protocol tcp --port 3389 --cidr <YOURIP>/32

Now, let’s start our base image with the following user-data (let’s put it into userdata.txt):

<powershell>
Set-ExecutionPolicy Unrestricted
icm $executioncontext.InvokeCommand.NewScriptBlock((New-Object Net.WebClient).DownloadString('https://gist.github.com/masterzen/6714787/raw')) -ArgumentList "VerySecret"
</powershell>

This powershell script will download the Windows bootstrap Gist and execute it, passing the desired administrator password.

Next we launch the instance:

aws ec2 run-instances --image-id ami-4524002c --instance-type m1.small --security-groups Windows --key-name <YOURKEY> --user-data "$(cat userdata.txt)"

Unlike what is written in the ec2config documentation, the user-data must not be encoded in Base64.

Note, the first boot can be quite long :)

After that we can connect through WinRM with the “VerySecret” password. To check we’ll use the WinRM Go tool I wrote and talked about above:

./winrm -hostname <publicip> -username Administrator -password VerySecret "ipconfig /all"

We should see the output of the ipconfig command.

Note: in the following winrm commands, I've omitted the various credentials to increase legibility (a future version of the tool will allow reading a config file; meanwhile we can create an alias).

A few caveats:

  • BITS doesn't work in the user-data PowerShell because it requires a user to be logged in, which is not possible during boot; that's the reason downloading is done through System.Net.WebClient
  • WinRM enforces some resource limits; you might have to increase the allowed shell resources for running some hungry commands: winrm set winrm/config/winrs @{MaxMemoryPerShellMB="1024"}. Unfortunately, this is completely broken in Windows Server 2008 unless you install this Microsoft hotfix. The linked bootstrap code doesn't install this hotfix because I'm not sure I can redistribute the file; that's an exercise left to the reader :)
  • the WinRM traffic is neither encrypted nor protected (if you use my tool), so use at your own risk. It's possible to set up WinRM over HTTPS, but it's a bit more involved. The current version of my WinRM tool doesn't support HTTPS yet (but it's very easy to add).

Baking our base image

Now that we have our base system with WinRM and Puppet installed by the bootstrap code, we need to create a derived AMI that will become our base image later when we create our different Windows machines.

aws ec2 create-image --instance-id <ourid> --name 'windows-2008-base'

For a real world example we might have defragmented and blanked the free space of the root volume before creating the image (on Windows you can use sdelete for this task).

Note that we don’t run the Ec2config sysprep prior to creating the image, which means the first boot of any instances created from this image won’t run the whole boot sequence and our Administrator password will not be reset to a random password.

Where does Puppet fit?

Now that we have this base image, we can start deriving it to create other images, but this time using Puppet instead of a powershell script. Puppet has been installed on the base image, by virtue of the powershell bootstrap we used as user-data.

First, let’s get rid of the current instance and run a fresh one coming from the new AMI we just created:

aws ec2 run-instances --image-id <newami> --instance-type m1.small --security-groups Windows --key-name <YOURKEY>

Anatomy of running Puppet

We’re going to run Puppet in masterless mode for this project. So we need to upload our set of manifests and modules to the target host.

One way to do this is to connect to the host with SMB over TCP (which our base image supports):

sudo mkdir -p /mnt/win
sudo mount -t cifs -o user="Administrator%VerySecret",uid="$USER",forceuid "//<instance-ip>/C\$/Users/Administrator/AppData/Local/Temp" /mnt/win

Note how we’re using an Administrative Share (the C$ above). On Windows the Administrator user has access to the local drives through Administrative Shares without having to share them as for normal users.

The user-data script we ran in the base image opens the windows firewall to allow inbound SMB over TCP (port 445).

We can then just zip our manifests/modules, send the file over there, and unzip remotely:

zip -q -r /mnt/win/puppet-windows.zip manifests/jenkins-steam.pp modules -x .git
./winrm "7z x -y -oC:\\Users\\Administrator\\AppData\\Local\\Temp\\ C:\\Users\\Administrator\\AppData\\Local\\Temp\\puppet-windows.zip | FIND /V \"ing  \""

And finally, let’s run Puppet there:

./winrm "\"C:\\Program Files (x86)\\Puppet Labs\\Puppet\\bin\\puppet.bat\" apply --debug --modulepath C:\\Users\\Administrator\\AppData\\Local\\Temp\\modules C:\\Users\\Administrator\\AppData\\Local\\Temp\\manifests\\site.pp"

And voilà: shortly we'll have a running, configured instance. Now we can create a new image from it and use it as our Windows build slave in the ec2 plugin configuration.

Puppet on Windows

Puppet on Windows is not like your regular Puppet on Unix. Let’s focus on what works or not when running Puppet on Windows.

Core resources known to work

The obvious ones known to work (a combined example follows this list):

  • File: besides symbolic links, which are supported only on Puppet >3.4 and Windows 2008+, there are a few things to take care of when using files:

    • NTFS is case-insensitive (but not the file resource namevar)
    • Managing permissions: octal unix permissions are mapped to Windows permissions, but the translation is imperfect. Puppet doesn’t manage Windows ACL (for more information check Managing File Permissions on Windows)
  • User: Puppet can create/delete/modify local users. The Security Identifier (SID) can’t be set. User names are case-insensitive on Windows. To my knowledge you can’t manage domain users.

  • Group: Puppet can create/delete/modify local groups. Puppet can’t manage domain groups.

  • Package: Puppet can install MSI or exe installers present on a local path (you need to specify the source). For a more comprehensive package system, check below the paragraph about Chocolatey.

  • Service: Puppet can start/stop/enable/disable services. You need to specify the short service name, not the human-readable display name.

  • Exec: Puppet can run executables (any .exe, .com or .bat). But unlike on Unix, there is no shell, so you might need to wrap the commands with cmd /c. Check the Powershell exec provider module for a more comprehensive Exec system on Windows.

  • Host: works the same as for Unix systems.

  • Cron: there’s no cron system on Windows. Instead you must use the Scheduled_task type.
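Here is the combined example mentioned above: a small, hypothetical manifest exercising several of these resource types. The package name and source, service name, paths and schedule are illustrative only, not taken from a real setup:

package { 'Git version 1.8.5.2-preview20131230':
  ensure => installed,
  source => 'C:/Users/Administrator/AppData/Local/Temp/Git-1.8.5.2-preview20131230.exe',
}

service { 'wuauserv':
  # short service name, not the "Windows Update" display name
  ensure => running,
  enable => true,
}

exec { 'flush-dns':
  # there is no shell on Windows, so wrap the command with cmd /c
  command => 'C:\\Windows\\System32\\cmd.exe /c ipconfig /flushdns',
}

scheduled_task { 'nightly-temp-cleanup':
  # replaces what cron would do on a Unix system
  ensure    => present,
  command   => 'C:\\Windows\\System32\\cmd.exe',
  arguments => '/c del /q C:\\Temp\\*.tmp',
  trigger   => { schedule => 'daily', start_time => '03:00' },
}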

Do not expect your average unix module to work out-of-the-box

Of course that's expected, mostly because of the packages being used. Most Forge modules, for instance, target Unix systems. Some Forge modules are Windows-only, but they tend to cover specific Windows aspects (like the registry, Powershell, etc.); still, make sure to check those out, as they are invaluable in your module portfolio.

My Path is not your Path!

You certainly know that Windows paths are not like Unix paths. They use \ where Unix uses /.

The problem is that in most languages (including the Puppet DSL) '\' is considered an escape character when used in double-quoted string literals, so it must be doubled: \\.

Puppet single-quoted strings don’t understand all of the escape sequences double-quoted strings know (it only parses \' and \\), so it is safe to use a lone \ as long as it is not the last character of the string.

Why is that?

Let's take this path C:\Users\Administrator\, when enclosed in a single-quoted string 'C:\Users\Administrator\' you will notice that the last 2 characters are \' which forms an escape sequence, and thus for Puppet the string is not terminated correctly by a single quote. The safe way to write a single-quoted path like the above is to double the final slash: 'C:\Users\Administrator\\', which looks a bit strange. My suggestion is to double all \ in all kinds of strings for simplicity.

Finally, when writing a UNC path in a string literal you need to use four backslashes: \\\\host\\path.

Back to the slash/anti-slash problem, there's a simple rule: if the path is directly interpreted by Puppet, then you can safely use /. If the path is destined for a Windows command (like in an Exec), use \.

Here's a list of the possible types of paths for Puppet resources (a short example follows the list):

  • Puppet URL: this is an url, so /
  • template paths: this is a path for the master, so /
  • File path: it is preferred to use / for coherence
  • Exec command: it is preferred to use /, but beware that most Windows executables require \ paths (especially cmd.exe)
  • Package source: it is preferred to use /
  • Scheduled task command: use \ as this will be used directly by Windows.
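Here is the short example mentioned above, applying the slash rule; file and module names are hypothetical:

file { 'C:/Temp/build.properties':
  # path interpreted by Puppet itself: forward slashes are fine
  ensure => file,
  source => 'puppet:///modules/buildtools/build.properties',
}

exec { 'list-temp':
  # path handed to cmd.exe: use (doubled) backslashes
  command => 'C:\\Windows\\System32\\cmd.exe /c dir C:\\Temp',
}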

Windows facts to help detection of windows

To identify a Windows client in a Puppet manifest you can use the kernel, operatingsystem and osfamily facts, which all resolve to windows.
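For example, a hypothetical cross-platform module can branch on that fact:

if $::osfamily == 'windows' {
  include buildtools::windows
} else {
  include buildtools::unix
}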

Other facts, like hostname, fqdn, domain or memory*, processorcount, architecture, hardwaremodel and so on, work like their Unix counterparts.

Networking facts also work, but with the Windows interface name (i.e. Local_Area_Connection), so for instance the local IP address of a server will be in ipaddress_local_area_connection. The ipaddress fact also works, but on my Windows EC2 server it returns a link-local IPv6 address instead of the IPv4 Local Area Connection address (though that might be because it's running on EC2).

Do yourself a favor and use Chocolatey

We've seen that the Puppet Package type has a Windows provider that knows how to install MSI and/or exe installers when provided with a local source. Unfortunately, this model is very far from what Apt or Yum can do on Linux servers, with access to multiple software repositories and on-demand download and installation (on the same subject, we're still missing something like that for OS X).

Fortunately, in the Windows world there's Chocolatey. Chocolatey is a package manager (based on NuGet) and a public repository of software (there's no easy way to have a private repository yet). If you read the bootstrap code I used earlier, you've seen that it installs Chocolatey.

Chocolatey is quite straightforward to install (beware that it doesn’t work for Windows Server Core, because it is missing the shell Zip extension, which is the reason the bootstrap code installs Chocolatey manually).

Once installed, the chocolatey command allows installing/removing software that might come in several flavors: either command-line packages or install packages. The first only provides access through the command line, whereas the second does a full installation of the software.

So for instance to install Git on a Windows machine, it’s as simple as:

chocolatey install git.install

To make things much more enjoyable for Puppet users, there's a Chocolatey package provider module on the Forge that allows the following:

package {
  "cmake":
    ensure => installed,
    provider => "chocolatey"
}

Unfortunately, at this stage it's not possible to easily host your own Chocolatey repository. But it is possible to host your own Chocolatey packages and use the source parameter. In the following example we assume that I packaged cmake version 2.8.12 (which I did, by the way), and hosted this package on my own web server:

# download_file uses powershell to emulate wget
# check here: http://forge.puppetlabs.com/opentable/download_file
download_file { "cmake":
  url                   => "http://chocolatey.domain.com/packages/cmake.2.8.12.nupkg",
  destination_directory => "C:\\Users\\Administrator\\AppData\\Local\\Temp\\",
}
->
package {
  "cmake":
    ensure => installed,
    source => "C:\\Users\\Administrator\\AppData\\Local\\Temp\\"
}

You can also decide that chocolatey will be the default provider by adding this to your site.pp:

Package {
  provider => "chocolatey"
}

Finally, read how to create Chocolatey packages if you wish to create your own.

Line endings and character encodings

There's one final thing that the Windows Puppet user must take care of: line endings and character encodings. If you use Puppet File resources to install files on a Windows node, you must be aware that file content is transferred verbatim from the master (whether you use content or source).

That means that if the file uses Unix LF line endings, the file content on your Windows machine will use the same. If you need Windows line endings, make sure your file on the master (or the content in the manifest) uses Windows \r\n line endings.
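As a small sketch (the file path is illustrative), inline content with explicit Windows line endings looks like this; in a double-quoted Puppet string, \r\n is interpreted as CR LF:

file { 'C:/Temp/README.txt':
  ensure  => file,
  content => "first line\r\nsecond line\r\n",
}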

That also means that your text files might not use a Windows character set. This is less problematic nowadays than it used to be, thanks to the ubiquity of UTF-8, but be aware that the default character set on Western Windows systems is CP-1252, not UTF-8 or ISO-8859-15. cmd.exe scripts not encoded in CP-1252 might not work as intended if they use characters outside the ASCII range.

Conclusion

I hope this article will help you tackle the hard task of provisioning Windows VMs and running Puppet on Windows. It is the result of several hours of hard work finding the right tools and building up Windows knowledge.

During this journey, I started learning a new language (Go), remembered how I dislike Windows (and its administration), contributed to several open-source projects, discovered a whole lot on Puppet on Windows, and finally learnt a lot on WinRM/WinRS.

Stay tuned on this channel for more articles (when I have the time) about Puppet, programming and/or system administration :)

FOSDEM 2014 is coming

and with that almost a full week of side events.
For those who don't know FOSDEM (where have you been hiding for the past 13 years?), FOSDEM is the annual Free and Open Source Developers' European Meeting. If you are into open source, you just can't miss this event, where thousands of like-minded people will meet.

And if 2 days of FOSDEM madness aren't enough, people organise events around it.

Last year I organised PuppetCamp in Gent in the days before FOSDEM, and a MonitoringLove hackfest in our office the 2 days after FOSDEM. This year another marathon is planned.

On Friday (31/1/2014) the CentOS community is hosting a Dojo in Brussels at the IBM Forum. (Free, but registration required by the venue)

After the success of PuppetCamp in Gent last year we decided to open up the discussion and get more Infrastructure as Code people involved in CfgMgmtCamp.eu.

The keynotes for CfgMgmtCamp will include the leaders of the 3 most popular tools around: Mark Burgess, Luke Kanies and Adam Jacob will all present at the event, which will take place in Gent right after FOSDEM. We expect people from all the major communities including, but not limited to, Ansible, Salt, Chef, Puppet, CFEngine, Rudder, Foreman and Juju. (Free but registration required for catering)

And because 3 events in one week aren't enough, the Red Hat community is hosting their Infrastructure.next conference after CfgMgmtCamp at the same venue. (Free but registration required for catering)

cya in Belgium next year..

The problem with params.pp

My recent post about using Hiera data in modules has had a great level of discussion already, several thousand blog views, comments, tweets and private messages on IRC. Thanks for the support and encouragement – it’s clear this is a very important topic.

I want to expand on yesterday's post by giving some background information on the underlying motivations that caused me to write this feature, and why having it as a Forge module is highly undesirable but is currently the only option.

At the heart of this discussion is the params.pp pattern and the general problems with it. To recap, the basic idea is to embed all your default data in a file called params.pp, typically in huge case statements, and then reference this data as defaults. Some examples of this are the puppetlabs-ntp module, the Beginners Guide to Modules, and the example I had in the previous post, which I'll reproduce below:

# ntp/manifests/init.pp
class ntp (
     # allow for overrides using resource syntax or data bindings
     $config = $ntp::params::config,
     $keys_file = $ntp::params::keys_file
   ) inherits ntp::params {
 
   # validate values supplied
   validate_absolute_path($config)
   validate_absolute_path($keys_file)
 
   # optionally derive new data from supplied data
 
   # use data
   file{$config:
      ....
   }
}

# ntp/manifests/params.pp
class ntp::params {
   # set OS specific values
   case $::osfamily {
      'AIX': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp.keys'
      }
 
      'Debian': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      'RedHat': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      default: {
         fail("The ${module_name} module is not supported on an ${::osfamily} based system.")
      }
   }
}

As Puppet stands today, this is pretty much the best we can hope for. It achieves a lot of useful things:

  • The data that provides OS support is contained and separate
  • You can override it using resource style syntax or Puppet 3 data bindings
  • The data provided by any means is validated
  • New data can be derived by combining supplied or default data

You can now stick this module on the Forge and users can use it; it supports many operating systems and pretty much works on any Puppet version going back quite a way. These are all good things.

The list above also demonstrates the main purposes of having data in a module: different OS/environment support, allowing users to supply their own data, validation, and transmogrifying the data. The params.pp pattern achieves all of this.

So what’s the problem then?

The problem is: the data is in the code. In the pre-extlookup and Hiera days we put our site data in case statements, inheritance trees, node data or any number of different solutions. These all solved the basic problem – our site got configured and our boxes got built – just like the params.pp pattern solves the basic problem. But we wanted more: we wanted our data separate from our code. Not only did it seem natural, because almost every other known programming language supports and embraces this, but as Puppet users we wanted a number of things:

  • Less logic, syntax, punctuation and “programming” and more just files that look a whole lot like configuration
  • Better layering than inheritance and other tools at our disposal allowed. We want to structure our configuration like we do our DCs and environments and other components – these form a natural series of layered hierarchies.
  • We do not want to change code when we want to use it, we want to configure that code to behave according to our site needs. In a CM world data is configuration.
  • If we're in an environment that does not let us open source our work or contribute to open source repositories, we do not want to be forced to fork and modify open source code just to use it in our environments. We want to configure the code. Compliance needs should not force us to solve every problem in house.
  • We want to plug into existing data sources like LDAP or be able to create self service portals for our users to supply this configuration data. But we do not want to change our manifests to achieve this.
  • We do not want to be experts at using source control systems. We use them, we love them and agree they are needed. But like everything less is more. Simple is better. A small simple workflow we can manage at 2am is better than a complex one.
  • We want systems we can reason about. A system that takes configuration in the form of data trumps one that needs programming to change its behaviour
  • Above all we want a system that's designed with our use cases in mind. Our user experience needs are different from programmers'. Our data needs are different and hugely complex. Our CM system must both guide us in its design and be compatible with our existing approaches. We do not want to have to write our own external node data sources simply because our language does not provide solid solutions to this common problem.

I created Hiera with these items in mind after years of talking to probably 1000+ users and iterating on extlookup in order to keep pace with the Puppet language gaining support for modern constructs like hashes. True, it's not a perfect solution to all these points – transparency of data origin, to name but one – but there are approaches that make small improvements towards them, and it does solve a high percentage of the above problems.

Over time Hiera has gained a tremendous following – it's now the de facto standard for solving the problem of site configuration data, largely because it's pragmatic, simple and designed to suit the task at hand. In recognition of this I donated the code to Puppet Labs and, to their credit, they integrated it as a default prerequisite and created the data binding system. The elephant in the room is our modules though.

We want to share our modules with other users. To do this we need to support many operating systems. To do this we need to create a lot of data in the modules. We can’t use Hiera to do this in a portable fashion because the module system needs improvement. So we’re stuck in the proverbial dark ages by embedding our data in code and gaining none of the advantages Hiera brings to site data.
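To see what we are missing, a site that uses Hiera typically describes its layering in a hiera.yaml along these lines. This is a hypothetical example; the tiers and paths differ from site to site, which is precisely why a module author cannot rely on any particular one being present:

---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::environment}"
  - "osfamily/%{::osfamily}"
  - common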

Now we have a few options open to us. We can just suck it up and keep writing params.pp files, gaining none of the above advantages that Hiera brings. This is not great, and the puppetlabs-ntp module example I cited shows why. We can come up with ever more elaborate ways to wrap, extend and override the data provided in a params.pp, or entertain far-out ideas like having the data binding system query the params.pp data directly. In other words, we can pander to the status quo and assume we cannot improve the system, so that we keep iterating on an inherently bad idea. The alternative is to improve Puppet.

Every time the question of params.pp comes up, the answer seems to be how to improve the way we embed data in the code. This is absolutely the wrong answer. The answer should be: how do we improve Puppet so that we do not have to embed data in code? We know people want this; the popularity and wide adoption of Hiera has shown that they do. The core advantages of Hiera might not be well understood by all, but the user base does understand and treasure the gains they get from using it.

Our task is to support the community in the investment they made in Hiera. We should not be rewriting it in a non backwards compatible way throwing away past learnings simply because we do not want to understand how we got here. We should be iterating with small additions and rounding out this feature as one solid ever present data system that every user of Puppet can rely on being present on every Puppet system.

Hiera adoption has reached critical mass, it’s now the solution to the problem. This is a great and historical moment for the Puppet Community, to rewrite it or throw it away or propose orthogonal solutions to this problem space is to do a great disservice to the community and the Puppet product as a whole.

Towards this, I created a Hiera backend that goes some way to resolving this in a way that's a natural progression of the design of Hiera. It improves the core features provided by Puppet in a way that will allow better patterns than the current params.pp one to be created, which will in the long run greatly improve the module writing and sharing experience. This is what my previous blog post introduced: a way forward from the current params.pp situation.

Now, by rights, a solution to this problem belongs in Puppet core. A Puppet Forge dependent module just to get this ability – especially one not maintained by Puppet Labs, and especially one that monkey patches its way into the system – is not desirable at all. This is why the code was a PR first. The only alternatives are to wait in the dark – numerous queries by many members of the community to the Puppet product owner have yielded only vague statements of intent or outcome – or to take it upon ourselves to improve the system.

So I hope the community will support me in using this module and will work with me to come up with better patterns to replace the params.pp ones – iterating on and improving the system as a whole rather than just accepting the status quo and not moving forward.

Better Puppet Modules Using Hiera Data

When writing Puppet modules there tends to be a ton of configuration data – generally things like different paths for different operating systems. Today the general pattern to manage this data is a module::params class with a bunch of logic in it.

Here’s a simplistic example below – for an example of the full horror of this pattern see the puppetlabs-ntp module.

# ntp/manifests/init.pp
class ntp (
     $config = $ntp::params::config,
     $keys_file = $ntp::params::keys_file
   ) inherits ntp::params {
 
   file{$config:
      ....
   }
}

# ntp/manifests/params.pp
class ntp::params {
   case $::osfamily {
      'AIX': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp.keys'
      }
 
      'Debian': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      'RedHat': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      default: {
         fail("The ${module_name} module is not supported on an ${::osfamily} based system.")
      }
   }
}

This is the exact reason Hiera exists – to remove this kind of spaghetti code and move it into data. Instinctively now, whenever anyone sees code like this, they think it should be refactored and the data moved into Hiera.

But there's a problem. This works for your own modules in your own repos: you'd just use the Puppet 3 automatic parameter bindings and override the values in the ntp class – not ideal, but many people do it. If, however, you want to write a module for the Forge, there's a hitch, because the module author has no idea what kind of hierarchy exists where the module is used, or whether the site even uses Hiera, and today the module author can't ship data with his module. So the only sensible thing to do is to embed a bunch of data in your code – the exact thing Hiera is supposed to avoid.
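To make the "override the values in the ntp class" option mentioned above concrete, here is a minimal sketch of plain Puppet 3 data bindings at work. The hierarchy tiers and the /etc/puppet/hieradata path are my own illustrative assumptions, not something shipped by the module:

# /etc/puppet/hiera.yaml – site-wide Hiera configuration (example paths)
---
:backends:
  - yaml
:hierarchy:
  - "%{::clientcert}"
  - common
:yaml:
  :datadir: /etc/puppet/hieradata

# /etc/puppet/hieradata/common.yaml – Puppet 3 automatic parameter
# bindings will pick these up for class ntp
---
ntp::config: /etc/ntp.conf
ntp::keys_file: /etc/ntp/keys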

I proposed a solution to this problem that would allow module authors to embed data in their modules as well as control the hierarchy that would be used when accessing this data. Unfortunately, a year on, we're still nowhere and the community – and the Forge – is suffering as a result.

The proposed solution would be an always-on Hiera backend that, as a last resort, would look for data inside the module. Critically, the module author controls the hierarchy when it gets to the point of accessing data in the module. Consider the ntp::params class above: it is a code version of a Hiera hierarchy keyed on the $::osfamily fact. But if we just allowed the module to supply data inside the module, the module author would have to hope that everyone has this tier in their hierarchy – not realistic. My proposal therefore adds a module-specific hierarchy and data that gets consulted after the site hierarchy.
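To spell that out, the effective lookup order for a key like ntp::config would conceptually look something like the sketch below – this only illustrates the intent, and the site tiers shown are purely an assumption:

# 1. Site hierarchy (site-wide hiera.yaml) – consulted first, always wins
#      - "%{::clientcert}"
#      - common
# 2. Module hierarchy (ntp/data/hiera.yaml) – consulted last, as a fallback
#      - "%{::osfamily}"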

So let's look at how to rework this module around the proposed solution:

# ntp/manifests/init.pp
class ntp ($config, $keys_file)  {
   validate_absolute_path($config)
   validate_absolute_path($keys_file)
 
   file{$config:
      ....
   }
}

Next you configure Hiera to consult a hierarchy keyed on the $::osfamily fact; note the new data directory that goes inside the module:

# ntp/data/hiera.yaml
---
:hierarchy:
  - "%{::osfamily}"

And finally we create some data files, here’s just the one for RedHat:

# ntp/data/RedHat.yaml
---
ntp::config: /etc/ntp.conf
ntp::keys_file: /etc/ntp/keys

Users of the module could then add a new OS without contributing back to the module or forking it, simply by providing similar data in the site-specific hierarchy, leaving the downloaded module 100% untouched!
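For example – purely as an illustrative sketch, with Solaris paths that are my assumption rather than part of the module – a site running an OS family the module knows nothing about could add the keys to its own site data, provided the site hierarchy has an "%{::osfamily}" tier (or simply put them in common.yaml):

# /etc/puppet/hieradata/Solaris.yaml – site data, the module stays untouched
---
ntp::config: /etc/inet/ntp.conf
ntp::keys_file: /etc/inet/ntp.keys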

This is a very simple view of what this pattern allows, time will tell what the community makes of it. There are many advantages to this over the ntp::params pattern:

This helps the contributor to a public module:

  • Adding a new OS is easy, just drop in a new YAML file. This can be done with confidence as it will not break existing code, since it will only be read on machines of the new OS. No complex case statements or hundreds of braces to get right
  • On a busy module, when adding a new OS they do not have to worry about complex merge problems, working hard at rebasing or any git esoterica – they're just adding a file.
  • Syntactically it’s very easy, it’s just a YAML file. No complex case statements etc.
  • The contributor does not have to worry about breaking other operating systems he could not test on, like AIX here. The change is contained to machines running the new OS
  • In large environments this helps with change control as it's just data – no logic changes

This helps the maintainer of a module:

  • Module maintenance is easier when it comes to adding new operating systems as it's just a matter of simple, single files
  • Easier contribution reviews
  • Fewer merge commits, less git magic needed, cleaner commit history
  • The code is a lot easier to read and maintain. Fewer tests and validations are needed.

This helps the user of a module:

  • Well written modules now properly support supplying all data from Hiera
  • He has a single place to look for the overridable data
  • When using a module that does not support his OS he can deploy it into his site and just provide data instead of forking it

Today I am releasing my proposed code as a standalone module. It provides all the advantages above including the fact that it’s always on without any additional configuration needed.

It works exactly as above: you add a data directory with a hiera.yaml inside it. The only configuration considered in this hiera.yaml is the hierarchy.
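As a sketch of what a module author could do with that hierarchy – the extra tiers below are my own illustration, assuming the backend honours whatever tiers are listed in this file – a module could look for operating-system-specific data first, then fall back to the OS family and finally a common default:

# ntp/data/hiera.yaml
---
:hierarchy:
  - "%{::operatingsystem}"
  - "%{::osfamily}"
  - common

ntp/data/common.yaml would then hold defaults shared by all platforms, with files like RedHat.yaml or Fedora.yaml overriding them where needed.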

This module is new and does some horrible things to get itself activated automatically without any configuration. I've only tested it on Puppet 3.2.x, but I think it will work on 3.x as-is. I'd love to get feedback on this from users.

If you want to write a Forge module that uses this feature, simply add a dependency on the ripienaar/module_data module; as soon as someone installs this dependency along with your module the backend gets activated. Similarly, if you just want to use this feature in your own modules, just puppet module install ripienaar/module_data.
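For a Forge module, that dependency would typically be declared in the module's metadata – here sketched using the Modulefile format common at the time; the module name and version are placeholders:

# yourname-yourmodule/Modulefile
name    'yourname-yourmodule'
version '0.1.0'

dependency 'ripienaar/module_data'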

Note though that if you do, your module will only work on Puppet 3 or newer.

It's unfortunate that my Pull Request is now over a year old and did not get merged, and that no real progress is being made. I hope that if enough users adopt this solution we can force progress rather than sit by and watch nothing happen. Please send me your feedback and use this widely.

Masterzen’s Blog 2013-12-08 09:06:00

So I know this blog has been almost abandoned, but that's not because I don't have anything to say. On the contrary, I've never dealt with so many different tools, technologies and programming languages as in the recent past, and that alone could fuel a large number of posts. Unfortunately my spare time is close to nil, and my personal priorities are much more geared toward my private life than this blog :)

So why this new post?

Well, a bunch of very fine people led by @KrisBuytaert have decided to organize a conference about Configuration Management, alongside the Configuration Management DevRoom at FOSDEM.

This post is all about why you should attend and/or even submit a talk for those events.

FOSDEM

If you live in Europe and have never heard of FOSDEM, you're certainly not part of the Open Source movement, and now is the time to fix that!

The FOSDEM takes place every year in Brussels the first week-end of February at the Université Libre de Bruxelles. This year it’s the week-end of Feb 1st and 2nd.

This free conference (it's usually good practice to buy the T-shirt to support the organizers) has been the largest Open Source event in Europe for more than 10 years. It gathers about 5000 geeks from all over the world, attending more than 400 talks in dozens of tracks ranging from databases (MySQL, PostgreSQL, NoSQL…) to languages (Java, Go…) through software practices (testing…) and system administration (configuration management, cloud and IaaS), and a lot more subjects besides!

For 4 years now, there has been a Configuration Management DevRoom at the FOSDEM. I’m part of the people selecting the talks, and it looks like the program this year will be awesome!

And if that alone is not compelling enough for you to attend FOSDEM, here is a short list of reasons why you should:

  • It’s close to almost everywhere in Europe by train or plane
  • It’s the occasion to meet a lot of Open Source stars, talk with your peers, meet people from communities you are part of
  • Belgian beers and food (do not forget the pre-FOSDEM crowded beer event, usually at the Delirium Café)
  • Brussels is a very nice city (if you're tired of computer geeks, you can visit the city – there used to be a spouse tour for non-geek relatives, but I don't know if that still happens)

But wait, there's more. While you're in Brussels you're pretty close to anywhere in the country by train. And since you'll be there for the week-end anyway, you can extend your trip by two days and attend the conference I'll talk about in the next paragraph :)

Config Management Camp

So, last year before FOSDEM we had a large, free, two-day PuppetCamp in Ghent (one of the nicest Belgian cities, 30 minutes by train from Brussels). This was an awesome event.

This year, Kris et al. had the crazy idea of organizing a multi-tool conference. Last year's camp was a success, so why not bring all the communities gravitating around the Configuration Management space together in the same place at the same time and see what happens.

This new conference will happen on Monday and Tuesday February 3rd and 4th.

The current version of Config Management Camp Ghent 2014 will have a main general track and separate per-community tracks. We will be able to attend the following lectures:

So if you're a user of or a contributor to one of these tools, you'll be able to attend two days of interesting talks.

Even better, you might want to actually propose a talk for this conference yourself. Fear not: the Call for Presentations deadline is December 15th for most of the community tracks.

I've heard from insiders that there will be some configuration management stars attending this conference (I don't know at this stage if I can name them, but it appears most, if not all, of the above-mentioned tools' creators will be among us).

Even though the exact schedule of the lectures is not yet known, having those people on board will certainly spark some very interesting discussions!

The good news is that the free registration has been open since Friday evening.

Go register. Now!

Now for some practical details. You can go from Brussels to Ghent on a direct train departing from the Brussel-Zuid station (Bruxelles-Midi). The trip takes about 30 minutes and there are a lot of trains during the day, so it's even possible to stay in Brussels and commute every day – though I wouldn't advise that if you want to attend the Monday evening social event.

The conference takes place at the SchoonMeersen Campus of the University College Gent, which is about a 5-minute walk from the main Ghent station.

You can book a hotel/room/flat near the station (there are plenty of nice hotels there), but you would miss the very nice car-free city center with all its monuments, churches and canals (and bars and restaurants too). My recommendation (actually @ripienaar's) would be to stay near the city center.

Of course I’ll be there, ready for some very interesting hallway discussions and socialization :)

See you all there! Do not forget to register!

Update: I think the organizers are still looking for Config Management Camp potential sponsors.