Category Archives: nginx

Nginx and why you should be running it instead of, or at least in front of Apache

After nine years of development, Nginx hit a milestone this week with the release of version 1.0.0 (on 12th April 2011). Despite only now reaching a 1.0 release, it is already in widespread use, powering a lot of high-traffic websites and CDNs, and it is particularly popular with developers. With such a milestone release, I thought it a good opportunity to get motivated and do some posts on it here.

Nginx (pronounced “engine-x”) is a free, open-source, high-performance HTTP server (aka web server) and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It is built specifically to handle more than 10,000 requests per second using minimal server resources, which it achieves with a non-blocking, event-based processing model.

In this article, I’m going to look at the problems with Apache and explain why you would want to use Nginx. In a subsequent article, I’ll explain how to install and configure Nginx.

Apache, the most popular web server, powers around 60% of the world’s web sites. I’ve been using Apache for around 10 years but more recently have been using Nginx. Due to its widespread use, Apache is well tested, well understood and reliable. However, it does have some problems when dealing with high-traffic websites, and most of them stem from its blocking, process-based architecture.

The typical setup for serving PHP-based websites in a LAMP (Linux, Apache, MySQL and PHP) environment uses the prefork MPM and mod_php. In this setup the PHP binary (and any other active Apache module) is embedded directly into each Apache process. This gives very little overhead and means Apache can talk to PHP very quickly, but it also means each Apache process consumes between 20MB and 50MB of RAM. The problem is that once a process is dealing with a request, it cannot serve another one at the same time. To handle multiple simultaneous requests (and remember that even a single visitor to a web page generates multiple requests, because the page will almost certainly reference images, stylesheets and JavaScript files that all need to be downloaded before it can render), Apache spawns a new child process for each concurrent request. Because the PHP binary is always embedded (to keep the cost of spawning processes to a minimum), each of these processes takes the full 20MB-50MB of RAM even if it is only serving static files, so you can see how a server can quickly run out of memory.

To compound the problem, if a PHP script takes a while to execute (due to processing load or waiting on an external service like MySQL), or the client is on a slow or intermittent connection such as a mobile device, the Apache process is tied up until execution and transmission have completed, which could be a while. Under heavy traffic this often means Apache has hundreds of concurrent processes loaded, and it can easily hit the configured maximum number of processes or completely exhaust the available RAM (at which point the server starts swapping to disk, everything gets massively slower and the problem compounds itself). If a web page has, say, 10 additional assets (CSS, JavaScript and images), that’s 11 requests per user. If 100 users hit the page at the same time, that’s 1,100 requests and, at 50MB per process, up to around 55GB of RAM (in reality you would cap the number of Apache processes far lower than this, so requests would be queued and blocked until a process became free, and browsers only open a few simultaneous connections to a server at a time). Hopefully you are starting to see the problem.
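
As a rough illustration of that arithmetic (the numbers below are examples for a hypothetical server, not recommendations), the usual defence is to cap the prefork MPM so Apache can never outgrow the available RAM:

# Example prefork tuning: with ~2GB of RAM left over for Apache and
# ~40MB per mod_php process, 2000MB / 40MB gives roughly 50 workers.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50
    MaxRequestsPerChild 500
</IfModule>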

With Nginx’s event-based processing model, each request is handled as a set of events within a single worker process, and a worker can handle many requests concurrently. This means Nginx can cope with large numbers of simultaneous requests, slow executions and slow clients without spawning extra processes. If you look at the two graphs from WebFaction, you can see quite clearly that Nginx handles far more simultaneous requests while using significantly less RAM, and a fairly constant amount of it.

Nginx excels at serving static files, and it can do so very fast. What we can’t do is embed something like PHP into the binary, because PHP is not asynchronous and would block requests, rendering Nginx’s event-based approach useless. Instead we have either PHP over FastCGI or Apache+mod_php in the background handle all the PHP requests. That way Nginx serves all the static files (CSS, JavaScript, images, PDFs etc.) and deals with slow clients, but passes PHP requests to one of these backend processes, receives the response and handles delivering it to the client, leaving the backend free to handle other requests. Nginx doesn’t block while waiting for FastCGI or Apache; it just carries on handling events as they happen.
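
Here is a minimal sketch of that split (the document root and the 127.0.0.1:9000 FastCGI address are assumptions for illustration; if the backend were Apache+mod_php you would use proxy_pass instead of fastcgi_pass):

server {
    listen 80;
    root /var/www/example;
    index index.php index.html;

    # nginx serves static assets directly
    location ~* \.(css|js|png|jpe?g|gif|ico|pdf)$ {
        expires 30d;
    }

    # PHP requests are handed to a FastCGI backend such as PHP-FPM
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
    }
}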

The other advantage of this “reverse proxy” mode is that Nginx can act as a load balancer and distribute requests not just to one but to multiple backend servers over a network. Nginx can also act as a caching reverse proxy to reduce the number of dynamic requests the backend PHP servers have to process. Both of these allow even more simultaneous dynamic requests to be handled.
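
A hedged sketch of both ideas together (the backend addresses and cache location are made up; these directives live inside the http block):

upstream php_backends {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=dynamic:10m;

server {
    listen 80;
    location / {
        proxy_pass http://php_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # cache successful dynamic responses briefly
        proxy_cache dynamic;
        proxy_cache_valid 200 1m;
    }
}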

What this means is that if your application requires a specific Apache configuration or module then you can gain the advantages of Nginx handling simultaneous requests and serving static files but still use Apache to handle the requests you need it to.

If there is no requirement for Apache, Nginx also supports backend protocols like FastCGI, SCGI and uWSGI. PHP happens to support FastCGI, so we can have Nginx talk to PHP over FastCGI without needing the whole of Apache around.

In the past you either had to use a script called spawn-fcgi to spawn FastCGI processes, or manage them by hand and then use monitoring software to make sure they kept running. However, as of PHP 5.3.3, PHP-FPM (which distributions often package as php5-fpm) is part of the PHP core and handles all of this for you in a way similar to Apache: you can set the minimum and maximum number of processes and how many you would like to spawn and keep around waiting. The other advantage is that PHP-FPM is an entirely separate process from Nginx, so you can change configurations and restart each of them independently (and Nginx actually supports reloading its configuration and upgrading its binary on the fly, so it doesn’t even require a restart).
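
A minimal PHP-FPM pool sketch showing those knobs (the path and values are illustrative only; tune them to your RAM and traffic):

; /etc/php5/fpm/pool.d/www.conf (location varies by distribution)
[www]
listen = 127.0.0.1:9000
pm = dynamic
pm.max_children = 20      ; hard cap on PHP worker processes
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6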

In the next post in this series, I’ll explain how to install and configure Nginx for serving both static and dynamic content.

One of the disadvantages of Nginx is that it doesn’t support .htaccess files to dynamically modify the server configuration – all configuration must be stored in the Nginx config files and can not be changed at runtime. This is a positive for performance and security but makes it less suitable for running “shared hosting” platforms.

Migrating puppetmaster to unicorn+nginx from WEBrick

As a follow-up to my last post, I've been thinking about the best way to migrate a production server so that the new config can be tested without blatting what's currently running, and the old setup only gets blown away once the new one is proven. What I did was run my unicorn setup alongside a separate instance running the standard WEBrick.

On the puppetmaster (with the nginx+unicorn setup already running) I ran:

/usr/bin/ruby1.8 /usr/bin/puppet master --masterport=18141 --servertype=webrick --pidfile /var/run/puppet/thingy.pid --no-daemonize --verbose --debug

and on my test client:

puppetd --test --verbose --debug --detailed-exitcodes --masterport 18141

With a verified client this should run just fine, and you should see a load of output scroll past on your non-daemonized puppetmaster instance. It shows that we can (as the documentation implies) run multiple instances of puppetmasterd via different webservers on different ports, so it is possible to set up and test an entirely separate puppetmaster instance using nginx+unicorn while the production instance is running. Then it's just a case of changing the config and swapping the ports.

Puppet+Nginx+Unicorn

This page outlines how to set up puppet behind an nginx proxy with a unicorn ruby webserver replacing the default WEBrick. The purpose of doing this is firstly to speed up puppet's file access and script running, and secondly to proxy it so that nginx can handle all of the client requests with cached data.

Prerequisites

Running behind an nginx proxy requires puppet packages of version 2.6.1 or higher, including on the clients as far as I am aware (tested on 2.6.2). This is due to a bug in 2.6.0 where an extra / gets inserted into the file content path by the client, causing the file content to 404 but not the metadata, meaning that any files distributed by puppet exist but are empty.

Package versions

This guide was written using the following package versions on Debian Lenny, so YMMV:

  • rubygems1.8 1.3.4-1~bpo50+1 via backports <- REQUIRED minimum version for unicorn.
  • EDIT: rubygems 1.3.4-1~bpo50+1 via backports <- not actually required for this, but many things require the meta-package as well as the versioned package, so installing them together can save you some headaches.
  • librack-ruby 1.1.0-4~bpo50+1 via backports <- REQUIRED minimum version for unicorn.
  • nginx 0.7.65-2~bpo50+1 via backports <- if you can use a newer version you may be able to use a conditional nginx config for the handling of certificate signing instead of my config below.
  • puppet-common 2.6.2-4~bpo50+1 via backports
  • puppetmaster 2.6.2-4~bpo50+1 via backports

I found that the easiest thing to do was to set up a local repo before starting, as it saves you scping things around and getting into all sorts of confusing shenanigans.

Setting up the puppetmaster

  1. Install Debian
  2. apt-get install rubygems1.8 librack-ruby puppetmaster puppet-common nginx make ruby1.8-dev
  3. gem install unicorn

Configuration

unicorn

Copy /var/lib/gems/1.8/gems/rack-1.2.1/test/rackup/config.ru to /etc/puppet/config.ru (you will be running unicorn from /etc/puppet, and it will look for the rack config "config.ru" in this directory).
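
For reference, the Rack config that Puppet 2.6 itself ships (ext/rack/files/config.ru in the source tree) looks roughly like the following; whatever config.ru you end up with needs to do something along these lines to start a master under Rack:

# run puppetmasterd as a Rack application (e.g. under unicorn)
$0 = "master"
ARGV << "--rack"
require 'puppet/application/master'
run Puppet::Application[:master].run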

Create /etc/puppet/unicorn.conf as follows:

worker_processes 8
working_directory "/etc/puppet"
listen 'unix:/var/run/puppet/puppetmaster_unicorn.sock', :backlog => 512
timeout 120
pid "/var/run/puppet/puppetmaster_unicorn.pid"

preload_app true
if GC.respond_to?(:copy_on_write_friendly=)
  GC.copy_on_write_friendly = true
end

before_fork do |server, worker|
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

Unicorn listens on an internal unix socket which will be proxied by nginx. As unicorn manages its own worker processes via a unicorn master, only one socket needs to be used, unlike with mongrel.

Edit /etc/default/puppetmaster and change the following two options:

SERVERTYPE=unicorn
PORT=18140

This is so that a) puppetmaster uses the unicorn webserver and b) it doesn't try to run on the port we will be configuring nginx to listen on.

Many guides try to get you to install the ruby gem "god" at this point to manage the unicorn process, but for the sake of simplicity, clarity and sanity we will use an init script, /etc/init.d/unicorn-puppet (don't forget to chmod 755 it):

#!/bin/bash
# unicorn-puppet This init script enables the puppetmaster rackup application
# via unicorn.
#
# Authors: Richard Crowley
# Naresh V.
#
# Modified for Debian usage by Matt Carroll
#
#

lockfile=/var/lock/puppetmaster-unicorn
pidfile=/var/run/puppet/puppetmaster_unicorn.pid

# account to run the unicorn master as ("puppet" is an assumption here;
# the original script relied on $USER being set elsewhere)
USER=puppet

RETVAL=0
DAEMON=/var/lib/gems/1.8/bin/unicorn
DAEMON_OPTS="-D -c /etc/puppet/unicorn.conf"

start() {
    sudo -u $USER $DAEMON $DAEMON_OPTS
    RETVAL=$?
    [ $RETVAL -eq 0 ] && touch "$lockfile"
    echo
    return $RETVAL
}

stop() {
    sudo -u $USER kill `cat $pidfile`
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f "$lockfile"
    return $RETVAL
}

restart() {
    stop
    sleep 1
    start
    RETVAL=$?
    echo
    [ $RETVAL -ne 0 ] && rm -f "$lockfile"
    return $RETVAL
}

condrestart() {
    status
    RETVAL=$?
    [ $RETVAL -eq 0 ] && restart
}

status() {
    # crude check for running unicorn master/worker processes
    ps ax | egrep -q "unicorn (worker|master)"
    RETVAL=$?
    return $RETVAL
}

usage() {
    echo "Usage: $0 {start|stop|restart|status|condrestart}" >&2
    return 3
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    condrestart)
        condrestart
        ;;
    status)
        status
        ;;
    *)
        usage
        ;;
esac

exit $RETVAL

Also, don't forget to add this to your startup scripts (e.g. update-rc.d unicorn-puppet defaults on Debian)!

nginx

/etc/nginx/nginx.conf:

user www-data;
worker_processes 2;

events {
    worker_connections 1024;
}

http {
    default_type application/x-raw;
    large_client_header_buffers 16 8k;

    # site specific settings such as access_log, proxy_buffers
    # I use
    proxy_max_temp_file_size 0;
    proxy_buffers 128 4k;

    upstream muppet-upstream {
        server unix:/var/run/puppet/puppetmaster_unicorn.sock;
    }

    server {
        listen 8140;
        include /etc/nginx/conf.d/ssl.conf;
        ssl_verify_client on;
        root /usr/share/empty;
        location / {
            proxy_pass http://muppet-upstream;
            include /etc/nginx/conf.d/proxy_set_header.conf;
            proxy_set_header X-Client-Verify SUCCESS;
        }
    }

    server {
        listen 8141;
        include /etc/nginx/conf.d/ssl.conf;
        ssl_verify_client off;
        root /usr/share/empty;
        location / {
            proxy_pass http://muppet-upstream;
            include /etc/nginx/conf.d/proxy_set_header.conf;
            proxy_set_header X-Client-Verify FAILURE;
        }
    }
}

You may wish to change "muppet-upstream" to a more suitable name. There are two servers listening on two different ports because our version of nginx can't handle SSL certificate conditionals: the standard port 8140 is used for the regular service for signed clients, and 8141 is used for signing requests. Setting the proxy headers is important, as the nginx proxy is where the SSL layer terminates, so the information about certificate verification success or failure and the client identity must be passed on to the puppetmaster using HTTP headers. Most of these are configured in conf.d/proxy_set_header.conf:

proxy_redirect         off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
#proxy_set_header X-Client-Verify $ssl_client_verify;
proxy_set_header X-Client-DN $ssl_client_s_dn;
proxy_set_header X-SSL-Subject $ssl_client_s_dn;
proxy_set_header X-SSL-Issuer $ssl_client_i_dn;

SSL certification is configured in conf.d/ssl.conf:

ssl on;
ssl_certificate /var/lib/puppet/ssl/certs/muppetmaster.labs.pem;
ssl_certificate_key /var/lib/puppet/ssl/private_keys/muppetmaster.labs.pem;
ssl_ciphers ALL:-ADH:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP;
ssl_client_certificate /var/lib/puppet/ssl/ca/ca_crt.pem;

#ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem;

The headers that the puppetmaster should use must now be configured in /etc/puppet/puppet.conf by adding the section:

[master]
ssl_client_header = HTTP_X_CLIENT_DN
ssl_client_verify_header = HTTP_X_CLIENT_VERIFY

(This section was previously called [puppetmasterd] in 2.6.0 and earlier, so if the section is named that you will get a message telling you it assumes you meant [master].)

Testing and initial run

Firstly, ensure that all of your daemons (puppetmaster, unicorn and nginx) are started on the master. Then, on the client, stop the puppet daemon and run:

puppetd --test --verbose --debug --detailed-exitcodes --ca_port 8141 --waitforcert 10

This will give you verbose output for the signing request. If all goes smoothly, you can then sign the request on the master and restart the puppet daemon on the client, which will continue to run normally on the default port (8140) now that the SSL has been verified. Signing requests will 404 on this port.
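
Signing on the master is the usual certificate dance; with puppet 2.6 that is something like the following (the client hostname is of course an example):

puppet cert --list                      # show pending signing requests
puppet cert --sign client.example.com   # sign the client's certificate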

When debugging problems, running daemons in non-daemonized verbose debug mode is your friend.

More Puppet Offloading

Puppet really shines at configuration management, but there are some things it is not good at, for instance sourcing large files or managing deep file hierarchies.

Fortunately, most of these efficiency issues will be addressed in a subsequent major version (thanks to some of my patches and other refactorings).

Meanwhile it is interesting to work around those issues. Since most of us run our masters as part of a more complete stack rather than in isolation, we can leverage the power of that stack to address some of them.

In this article, I’ll present two techniques to help your overloaded masters serve ever more clients.

Offloading file sourcing

I already talked about offloading file sourcing in a previous blog post about puppet memory consumption. The idea is to prevent our puppetmasters from reading the whole content of files into memory at once in order to serve them. Most installations of puppetmasterd out there sit behind an HTTP reverse proxy of some sort (i.e. Apache or Nginx).

The idea is that file serving is an activity that a small static server is better placed to do than puppet itself (that might change when #3373 is fully addressed). Note: I produced an experimental patch, pending review, to stream puppet file sourcing on the client side, which this tip doesn’t address.

So I implemented this in Nginx (which is my favorite HTTP server of course, but it can be ported to any other webserver quite easily, which is an exercise left to the reader):
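
The original snippet is not reproduced here, but it is essentially the same trick as the full configuration shown in the “Puppet Memory Usage” post further down this page: a location that matches file_content requests and serves them straight off the filesystem. A minimal sketch (the paths, the “production” environment and the puppet-production upstream name are placeholders):

# serve file_content for the [files] mountpoint directly from disk
location /production/file_content/files/ {
    types { }
    default_type application/x-raw;
    alias /etc/puppet/files/;
}

# everything else still goes to the puppetmaster
location / {
    proxy_pass http://puppet-production;
}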

And if you use multiple module paths (for instance to separate common modules from other modules), it is still possible to use this trick with some help from the nginx try_files directive.

The try_files directive lets nginx try several physical paths (the first one that matches is served), and if none match you can fall back to the generic location that proxies to the master, which will certainly know what to do.
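
A hedged sketch of that fallback, assuming two module paths (/etc/puppet/modules and /etc/puppet/common-modules) and an upstream called puppet-production:

location ~ ^/production/file_content/([^/]+)/files/(.+)$ {
    types { }
    default_type application/x-raw;
    root /etc/puppet;
    # try each modulepath in turn, else hand the request to the master
    try_files /modules/$1/files/$2 /common-modules/$1/files/$2 @puppetmaster;
}

location @puppetmaster {
    proxy_pass http://puppet-production;
}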

Something that can be useful would be to create a small script to generate the nginx config from your fileserver.conf and puppet.conf. Since mine is pretty easy, I did it manually.

Optimize Catalog Compilation

The normal process for puppet is to contact the puppetmaster at some time interval asking for a catalog. The catalog is the by-product of compiling the parsed manifests with the node facts injected. This operation takes some time, depending on the manifest complexity and the server capacity or current load.

Most of the time a host asks for a catalog even though the manifests didn’t change at all. In my own infrastructure I rarely change my manifests once a kind of host has become stable (I might do a change every week at most when in production).

Since 0.25, puppet is fully RESTful; that means that to get a catalog, puppetd contacts the master over its SSL-protected link and asks for a URL of the form /{environment}/catalog/{nodename} (for instance /production/catalog/web01.example.com).

In return the puppetmaster responds with a JSON-encoded catalog. The actual compilation of a catalog for one of my largest hosts takes about 4s (excluding storeconfigs). During those 4 seconds one ruby thread inside the master is using the CPU. And this is done once every 30 minutes, even if the manifests don’t change.

What if we could compile only when something changes? This would really free our masters!

Since puppet uses HTTP, it is easy to add an HTTP cache in front of our master to cache the catalog the first time it is compiled and serve the cached copy on subsequent requests.

Although we could do this with any HTTP cache (e.g. Varnish), it is really easy to add with Nginx (which is already running in my own stack):
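
The original config is not reproduced here; the following is a sketch of what such catalog caching can look like (the zone name, cache path, TTL and the puppet-production upstream are assumptions):

# in the http block
proxy_cache_path /var/cache/nginx/puppet levels=1:2 keys_zone=puppetcache:10m;

# in the server block that proxies to the master
location ~ ^/production/catalog/ {
    proxy_pass http://puppet-production;
    proxy_cache puppetcache;
    proxy_cache_key "$uri";
    # cache compiled catalogs for 30 minutes, since puppet itself
    # sends no caching headers (see the note on proxy_cache_valid below)
    proxy_cache_valid 200 30m;
}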

Puppet currently doesn’t return any HTTP caching headers (i.e. Cache-Control or Expires), so we use nginx’s ability to cache despite that (see proxy_cache_valid). Of course I have a custom puppet branch that introduces a new parameter called --catalog_ttl which allows puppet to set those cache headers.

One thing to note is that the cache expiration won’t coincide with when you change your manifests, so we need some way to purge the cache when new manifests are deployed.

With Nginx this can be done in a couple of ways:
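
The original commands are not reproduced here. Two common options (both assumptions on my part, not necessarily what was used originally): the third-party ngx_cache_purge module, or simply emptying the cache directory from a deploy hook, which nginx then treats as a cache miss:

#!/bin/sh
# crude purge: wipe the proxy cache used for catalogs
rm -rf /var/cache/nginx/puppet/*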

It’s easy to add one of those methods to an svn hook or git post-receive hook so that deploying manifests also purges the cache.

Note: I think that ReductiveLabs has plans to add catalog compilation caching directly to Puppet (which would make sense). This method is the way to go until that feature gets added. I have no doubt that caching inside Puppet will be much better than outside caching, mainly because Puppet would be able to expire the cache when the manifests change.

There are a few caveats to note:

  • any host with a valid certificate can request another host’s cached catalog, unlike with the normal puppetmaster, which makes sure to serve catalogs only to the correct host. This can be a problem for some configurations
  • if your manifests rely on “dynamic” facts (like uptime or free memory), obviously you shouldn’t cache the catalog at all.
  • the above nginx configuration doesn’t include the facts as part of the cache key. That means the catalog won’t be regenerated when facts change and the cached catalog will always be served. If that’s an issue, you need to purge the cache when the host itself changes.

I should also mention that caching is certainly not a panacea for reducing the master’s load.

Some other people are using clever methods to smooth out master load. One notable example is the MCollective puppet scheduler that R.I. Pienaar has written. In essence it is a puppet run scheduler running on top of MCollective that triggers puppet runs (through MCollective) when the master’s load is appropriate. This allows for the best use of the host running the master.

If you also have some tricks or tips for running puppet, do not hesitate to contact me (I’m masterzen on freenode’s #puppet or @_masterzen_ on twitter).

When puppet meets nginx

At $WORK I started using Nginx a while ago, first as a front end to my mongrel instances for puppet. Recently I began to use it for one of its best-known features: reverse proxying (and caching too). Of course this work had to be puppetized!

This is a summary of what I’ve done:

  • Basic setup
  • Automatic setup of the status page, exploited by a munin plugin
  • An “include” directory, which can be specific to a host through the usual $fqdn source selection system (as can the nginx.conf file).
  • A “reverse proxy” specific class that uses a template embedding some ruby (see the previous post; a sketch of such a template follows the classes below). My cache dir is on tmpfs, to speed up the whole thing.

This setup is mostly inspired by this post. I use a local dnsmasq setup to resolve both internal and external requests. This way I can manage vhosts that are accessible from inside or outside our network. It’s incredibly flexible and allows you to get the most from your infrastructure.

The puppet class:

# @name : nginx
# @desc : base class for nginx
# @info : nil
class nginx
{
 package { "nginx":
 ensure => installed
 }
 
 service { "nginx":
 ensure => running
 }
 
 file { "nginx.conf":
 name => "/etc/nginx/nginx.conf",
 owner => root,
 group => root,
 source => [ "puppet://$fileserver/files/apps/nginx/$fqdn/nginx-rp-secure.conf", "puppet://$fileserver/files/apps/nginx/nginx-rp-secure.conf"],
 ensure => present,
 notify => Service["nginx"]
 }
 
 # status is installed on all nginx boxens
 file { "nginx-status":
 name => "/etc/nginx/sites-enabled/nginx-status",
 owner => root,
 group => root,
 source => [ "puppet://$fileserver/files/apps/nginx/nginx-status", "puppet://$fileserver/files/apps/nginx/$fqdn/nginx-status"],
 ensure => present,
 notify => Service["nginx"]
 }
 
 # include dir, get the freshness here
 file { "include_dir":
 name => "/etc/nginx/includes",
 owner => root,
 group => root,
 source => [ "puppet://$fileserver/files/apps/nginx/includes.$fqdn", "puppet://$fileserver/files/apps/nginx/includes"],
 ensure => directory,
 recurse => true,
 notify => Service["nginx"],
 ignore => ".svn*"
 }
 
 # files managed by hand, no matter if it breaks
 file { "sites-managed":
 name => "/etc/nginx/sites-managed",
 owner => root,
 group => root,
 ensure => directory
 }
}
 
# @name : nginx::reverseproxy
# @desc : nginx config for a reverse proxy
# @info : used in conjunction with a local dnsmasq
class nginx::reverseproxy
{
 include nginx
 include dnsmasq::reverseproxy
 
 # Vars used by the template below
 $mysqldatabase=extlookup("mysqldatabase")
 $mysqllogin=extlookup("mysqllogin")
 $mysqlpassword=extlookup("mysqlpassword")
 $mysqlserver=extlookup("mysqlserver")
 
 file { "nginx-cachedir":
 name => "/dev/shm/nginx-cache",
 owner => www-data,
 group => www-data,
 ensure => directory
 }
 
 file { "site_reverse-proxy":
 name => "/etc/nginx/sites-enabled/reverse-proxy",
 owner => root,
 group => root,
 content => template("nginx/$fqdn/reverse-proxy.erb"),
 ensure => present,
 notify => Service["nginx"],
 require => File["nginx-cachedir"]
 }
 
}
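
The reverse-proxy.erb template itself is not shown in the post. As a purely illustrative sketch (only the /dev/shm/nginx-cache path comes from the manifest above; the vhost, the "backend" upstream and the timings are invented, and the real template presumably builds its vhost list from the MySQL variables looked up above):

proxy_cache_path /dev/shm/nginx-cache levels=1:2 keys_zone=rpcache:64m;

server {
    listen 80;
    server_name <%= fqdn %>;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_cache rpcache;
        proxy_cache_valid 200 5m;
    }
}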

The munin plugins are automatically distributed with the box as well.

One of the generated graphs: [image: nginx_requests-day]

Puppet Memory Usage – not a fatality

As every reader of this blog certainly knows, I’m a big fan of Puppet, using it in production on Days of Wonder servers, to the point that I used to contribute bug fixes and new features regularly (not that I’ve stopped; it’s just that my spare time is a scarce resource nowadays).

Still, I think there are some issues in terms of scalability and resource consumption (CPU or memory) for which we can find some workarounds or even fixes. Those issues are not a symptom of bad programming or bad design. No, most of them come either from ruby itself or from some random library issues.

Let’s review the things I have been thinking about lately.

Memory consumption

This is by far one of the most commonly seen issues, both on the client side and the server side. I’ve mainly seen this problem on the client side, to the point that most people recommend running puppetd from cron instead of as a long-lived process.

Ruby allocator

It all boils down to the ruby allocator (at least in the MRI 1.8.x version). This is the part of the ruby interpreter that deals with memory allocation. Like in many dynamic languages, the allocator manages a memory pool called a heap. And like in some other languages (among them Java), this heap can never shrink and only grows when more memory is needed. This is done this way because it is simpler and much faster. Usually an application ends up using its nominal amount of memory and no more memory has to be allocated by the kernel to the process, which gives faster applications.

The problem is that if the application transiently needs a large amount of memory that will be thrown away a couple of milliseconds later, the process pays this penalty for the rest of its life, even though, say, 80% of the memory used by the process is free but never reclaimed by the OS.

And it’s even worse. When the ruby interpreter grows the heap, instead of allocating byte by byte (which would be really slow), it does so in chunks. The whole question is: what is the proper size of a chunk?

In the default implementation of MRI 1.8.x, a chunk is the size of the previous heap times 1.8. That means that at worst a ruby process might end up allocating 1.8 times more than what it really needs at a given time. (This is a gross simplification; read the code if you want to know more.)

Yes but what happens in Puppet?

So how does it apply to puppetd?

It’s easy: puppetd uses memory for two things (besides maintaining some core data to be able to run):

  1. the catalog (which contains all resources, along with all templates) as shipped by the puppetmaster (i.e. serialized) and live as ruby objects.
  2. the content of the sourced files (one at a time, so it’s the biggest transmitted file that sets the high-water mark for puppetd). Of course this is still better than in 0.24, where the content was transmitted encoded in XMLRPC, adding the penalty of escaping everything…

Hopefully, nobody distributes large files with Puppet :-) If you’re tempted to do so, see below…

But again there’s more, as Peter Meier (known as duritong in the community) discovered a couple of months ago: when puppetd gets its catalog (which by the way is transmitted as json nowadays), it also stores it as a local cache so it can still run if it can’t contact the master on a subsequent run. This is done by deserializing the catalog from json to live ruby objects, and then serializing the latter to YAML. Besides the obvious loss of time to do that on a large catalog, YAML is a real memory hog. Peter’s experience showed that about 200MB of the live memory his puppetd process was using came from this final serialization!

So I had the following idea: why not store the serialized version of the catalog (the json one), since we already have it in serialized form when we receive it from the master (it’s a little bit more complex than that, of course)? This way there is no need to serialize it again to YAML. This is what ticket #2892 is all about. Luke is committed to having this enhancement in Rowlf, so there’s good hope!

Some puppet solutions?

So what can we do to help puppet not consume that much memory?

In theory we could play on several factors:

  • Transmit smaller catalogs. For instance get rid of all those templates you love (ok that’s not a solution)
  • Stream the serialization/deserialization with something like Yajl-Ruby
  • Use another ruby interpreter with a better allocator (like for instance JRuby)
  • Use a different constant for resizing the heap (i.e. replace the 1.8 by 1.0 or less on line 410 of gc.c). This can be done easily when using the Rails machine GC patches or Ruby Enterprise Edition, in which case setting the environment variable RUBY_HEAP_SLOTS_GROWTH_FACTOR is enough (a minimal example follows this list). Check the documentation for more information.
  • Stream the sourced file on the server and the client (this way only a small buffer is used, and the total size of the file is never allocated). This one is hard.
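
A minimal sketch of that heap-growth knob, assuming puppetd runs under REE (or a GC-patched MRI) and that its init script sources /etc/default/puppet and passes the environment on:

# /etc/default/puppet
# grow the ruby heap linearly instead of by a factor of 1.8
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1.0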

Note that the same issues apply to the master too (especially for the file serving part). But it’s usually easier to run a different ruby interpreter (like REE) on the master than on all your clients.

Streaming HTTP requests is promising but unfortunately would require large changes to how Puppet deals with HTTP. Maybe it can be done only for file content requests… This is something I’ll definitely explore.

This file serving issue got me thinking about the following, which I have already discussed several times with Peter…

File serving offloading

One of the missions of the puppetmaster is to serve sourced files to its clients. We saw in the previous section that to do this the master has to read the file into memory. That’s one reason it is recommended to use a dedicated puppetmaster server to act as a pure fileserver.

But there’s a better way, provided you run puppet behind nginx or apache. Those two proxies are also static file servers: why not leverage what they do best to serve the sourced files and thus offload our puppetmaster?

This has some advantages:

  • it frees lots of resources on the puppetmaster, so that it can serve more catalogs per unit of time
  • the job will be done faster and with fewer resources. These static servers were created to spoon-feed our puppet clients…

In fact it was impossible in 0.24.x, but now that file content serving is RESTful it becomes trivial.

Of course offloading gives its best if your clients require lots of sourced files that change often, or if you provision lots of new hosts at the same time, because we’re offloading only content, not file metadata. File content is served only if the client doesn’t have the file or the file’s checksum on the client differs.

An example is better than a thousand words

Imagine we have a standard manifest layout with:

  • some globally sourced files under /etc/puppet/files and
  • some modules files under /etc/puppet/modules/<modulename>/files.

Here is what would be the nginx configuration for such scheme:

server {
    listen 8140;

    ssl                     on;
    ssl_session_timeout     5m;
    ssl_certificate         /var/lib/puppet/ssl/certs/master.pem;
    ssl_certificate_key     /var/lib/puppet/ssl/private_keys/master.pem;
    ssl_client_certificate  /var/lib/puppet/ssl/ca/ca_crt.pem;
    ssl_crl                 /var/lib/puppet/ssl/ca/ca_crl.pem;
    ssl_verify_client       optional;

    root                    /etc/puppet;

    # those locations are for the "production" environment
    # update according to your configuration

    # serve static file for the [files] mountpoint
    location /production/file_content/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        alias /etc/puppet/files/;
    }

    # serve modules files sections
    location ~ /production/file_content/[^/]+/files/ {
        # it is advisable to have some access rules here
        allow   172.16.0.0/16;
        deny    all;

        # make sure we serve everything
        # as raw
        types { }
        default_type application/x-raw;

        root /etc/puppet/modules;
        # rewrite /production/file_content/module/files/file.txt
        # to /module/file.text
        rewrite ^/production/file_content/([^/]+)/files/(.+)$  $1/$2 break;
    }

    # ask the puppetmaster for everything else
    location / {
        proxy_pass          http://puppet-production;
        proxy_redirect      off;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header    X-Client-Verify  $ssl_client_verify;
        proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
        proxy_buffer_size   16k;
        proxy_buffers       8 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
        proxy_read_timeout  65;
    }
}

EDIT: the above configuration was missing the only content-type that nginx can return for Puppet to be able to actually receive the file content (that is, raw).

I leave the apache configuration as an exercise for the reader.

It would also be possible to write some ruby/sh/whatever to generate the nginx configuration from the puppet fileserver.conf file.
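
As a rough sketch of such a generator in ruby (this is an assumption of how one might do it, not code from the post; it only understands the simple "[mount]" / "path /some/dir" form of fileserver.conf):

#!/usr/bin/env ruby
# Emit nginx "location" blocks from a puppet fileserver.conf
mounts = {}
current = nil
File.readlines("/etc/puppet/fileserver.conf").each do |line|
  line = line.sub(/#.*$/, '').strip
  next if line.empty?
  if line =~ /^\[(\w+)\]$/
    current = $1
  elsif current && line =~ /^path\s+(\S+)/
    mounts[current] = $1
  end
end

mounts.each do |name, path|
  puts <<EOF
location /production/file_content/#{name}/ {
    types { }
    default_type application/x-raw;
    alias #{path}/;
}
EOF
end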

And that’s all folks, stay tuned for more Puppet (or even different) content.
