On BeautifulSoup

I’m doing some fairly hardcore screen scraping in Python, so I decided to use BeautifulSoup. After all:

“Beautiful Soup won’t choke if you give it bad markup”

Oh yes it will:

  <a href="/""></a>
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 3, column 14

lxml parses this fine.

The other issue I’m seeing is the old document.write('<scr' + 'ipt>') trick. Even if it’s enclosed in a CDATA block, BeautifulSoup chokes on it.

lxml, again, parses it fine. And it has built-in CSS selector and XPath support.

In the network space no one can hear your puppet scream

I’ve been lazy about maintaining my servers recently and decided to start playing with Puppet reports. I started with something simple: a script that tells me on which machines my manifests are failing.

So here’s some quick and dirty code that goes through Puppet’s reportdir and points out neglected machines.

#!/usr/bin/env ruby
require 'puppet'
require 'find'
require 'yaml'
require 'optparse'
Puppet[:config] = "/etc/puppet/puppet.conf"
# Return the path of the newest report under path, or nil if there is none.
# Assumes report files are named after their timestamp, so they sort by date.
def most_recent_file(path)
	reports = []
	Find.find(path) { |file|
		if File.file? file
			reports << File.basename(file,".yaml")
		end
	}
	return nil if reports.empty?
	return path+"/"+reports.sort.last+".yaml"
end
# Check the newest report of every host directory below path.
def scan_dir(path, debug=false)
	Find.find(path) { |entry|
		if entry != path # don't scan the basedir
			if File.directory? entry
				report = most_recent_file(entry)
				scan_file(report, debug) if report
			end
		end
	}
end
# Print the hosts whose report contains at least one failed resource.
def scan_file(filename, debug=false)
	notify_on_field = [:failed]
	# debug
	if debug then  puts "scanning " + filename end
	File.open(filename) { |fp|
		YAML::load_documents(fp) { |report|
			report.metrics["resources"].values.each { |value|
				if (notify_on_field.include? value[0]) and (value[2] > 0) then
					puts "#{report.host} has #{value[2]} #{value[0]} resource(s)"
					if debug then
						puts "log message(s) :"
						report.logs.each { |log|
							puts log.message
						}
					end
				end
			}
		}
	}
end
options = {}
optparse = OptionParser.new { |opts|
	opts.banner = "Usage : report_check.rb [options]"
	opts.on("-d", "--debug", "runs in debug mode") do |debug|
		options[:debug] = debug
	end
	opts.on("-h", "--help", "Displays this help") do
		puts opts
		exit
	end
}
optparse.parse!
scan_dir(Puppet[:reportdir], options[:debug])
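
The two decisions the script makes (which report is the newest, and which metrics count as a failure) can be tried without Puppet installed. Here is a minimal, stdlib-only sketch. It assumes one directory per host containing timestamp-named YAML files, and metric values shaped like the [name, label, count] triples the script reads. The helper names newest_report, failing_metrics and demo, the host name and the sample data are made up for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return the path of the newest report in a host
# directory, or nil when the directory holds no .yaml file.
# Assumes file names sort chronologically (e.g. 201001051230.yaml).
def newest_report(host_dir)
  Dir.glob(File.join(host_dir, "*.yaml")).sort.last
end

# Hypothetical helper: keep the [name, label, count] metric triples whose
# name is in `watched` and whose count is above zero.
def failing_metrics(values, watched = [:failed])
  values.select { |name, _label, count| watched.include?(name) && count > 0 }
end

# Build a throwaway report tree and run both helpers against it.
def demo
  Dir.mktmpdir do |reportdir|
    host_dir = File.join(reportdir, "web01.example.com")
    FileUtils.mkdir_p(host_dir)
    %w[201001010000 201001051230 201001020815].each do |stamp|
      File.write(File.join(host_dir, "#{stamp}.yaml"), "--- {}\n")
    end
    newest = File.basename(newest_report(host_dir))

    sample = [[:failed, "Failed", 2], [:total, "Total", 40], [:skipped, "Skipped", 0]]
    [newest, failing_metrics(sample)]
  end
end

newest, failures = demo
puts "newest report: #{newest}"
failures.each { |name, _label, count| puts "#{count} #{name} resource(s)" }
```

Sorting the file names and taking the last one only works while the names sort chronologically; if that assumption does not hold, comparing File.mtime would be the safer choice.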

A pkgin provider for puppet

On my Solaris machines at $WORK I use iMil’s pkgin to install additional software. But until today, I had to do it by hand, on every machine… Not really what I like to do after a little more than a year of using Puppet. So I wrote a provider to manage packages with pkgin. It taught me a lot about Puppet internals, and I learned more about my favorite config management system.

Enough talking, here is the file : pkgin.rb

Example of use in a manifest :

class foo {
    package { "bla":
        ensure => installed,
        provider => pkgin
    }
}

Put your ruby in my ERB

Today I started installing a reverse proxy at $WORK. I chose to follow this way, and all my DNS data is stored in my CMDB. Once again, the solution came from #puppet ! You can embed some “pure” Ruby code in ERB templates. And, yes, you can query your database !

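<%
require 'dbi'
dbh = DBI.connect("DBI:Mysql:yourbase:mysql.mycorp.com", "you", "XXXX")
query = dbh.prepare("your fancy query")
query.execute
while row = query.fetch do
  todisplay = row.to_a.join(" ") # placeholder: build the output line from the row
%>
<%= todisplay %>
<% end %>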

I use this technique to generate the dnsmasq data file. Just use the subscribe metaparameter, so the service is refreshed whenever the file changes, and all is done !
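
The embedding trick can be tried outside Puppet with Ruby’s standard ERB library. The sketch below swaps the database for a hard-coded array so it runs anywhere. The ROWS data, the render_hosts helper and the hosts-style output format are assumptions for illustration, not the actual template.

```ruby
require 'erb'

# Stand-in for the rows a database query would return: [ip, hostname].
ROWS = [
  ["10.0.0.10", "www.mycorp.com"],
  ["10.0.0.11", "mail.mycorp.com"],
]

# Plain Ruby lives inside <% %>; <%= %> prints a value into the output.
# The "-" in "-%>" (enabled by trim_mode) drops the newline after the tag.
TEMPLATE = <<~'TPL'
  # generated file, do not edit
  <% rows.each do |ip, name| -%>
  <%= ip %> <%= name %>
  <% end -%>
TPL

# Hypothetical helper: render the template against a list of rows.
def render_hosts(rows)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(rows: rows)
end

puts render_hosts(ROWS)
```

Keeping the loop in the template and the data in a plain array makes it easy to check the generated file before pointing the template at a real database.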