On BeautifulSoup

I’m doing some fairly hardcore screen scraping in Python, so I decided to use BeautifulSoup. After all:

Beautiful Soup won’t choke if you give it bad markup

Oh yes it will:

<html>
 <body>
  <a href="/""></a>
 </body>
</html>

  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 3, column 14

lxml parses this fine.

The other issue I’m seeing is the old document.write('<scr' + 'ipt>') trick. Even if it’s enclosed in a CDATA block, BeautifulSoup chokes on it.

lxml, again, parses it fine. And it has built-in CSS selector and XPath support.

In the network space no one can hear your puppet scream

I’ve been lazy about maintaining my servers recently, so I decided to start playing with Puppet reports. I started with something simple that helps me find the machines on which my manifests are failing.

So here’s a quick and dirty script that goes through Puppet’s reportdir and points out neglected machines.

#!/usr/bin/env ruby
 
require 'puppet'
require 'find'
require 'yaml'
require 'optparse'
 
Puppet[:config] = "/etc/puppet/puppet.conf"
Puppet.parse_config
 
def most_recent_file(path)
	# Reports are assumed to be named after their timestamp, so the
	# last name in sorted order is the newest one.
	reports = []
	Find.find(path) { |file|
		reports << file if File.file?(file) and File.extname(file) == ".yaml"
	}
	return reports.sort.last # nil if the directory holds no report
end
 
 
def scan_dir(path, debug=false)
	Find.find(path) { |entry|
		next if entry == path # don't scan the basedir
		next unless File.directory? entry
		report = most_recent_file(entry)
		scan_file(report, debug) if report
	}
end
 
 
def scan_file(filename, debug=false)
	notify_on_field = [:failed]
 
	puts "scanning " + filename if debug
 
	# the block form closes the file once the report has been read
	File.open(filename, "r") { |fp|
		YAML::load_documents(fp) { |report|
			# each metric value is a [name, label, count] triple
			report.metrics["resources"].values.each { |value|
				if (notify_on_field.include? value[0]) and (value[2] > 0) then
					puts "#{report.host} has #{value[2]} #{value[0]} resource(s)"
					if debug then
						puts "log message(s) :"
						report.logs.each { |log|
							puts log.message
						}
					end
				end
			}
		}
	}
end
 
options = {}
 
optparse = OptionParser.new { |opts|
	opts.banner = "Usage: report_check.rb [-d] [-h]"
 
	# default for the flag that is actually read at the end of the script
	options[:debug]=false
	opts.on("-d", "--debug", "runs in debug mode") do
		options[:debug]=true
	end
 
	opts.on("-h", "--help", "Displays this help") do
		puts opts
		exit
	end
 
}
 
optparse.parse!
 
scan_dir(Puppet[:reportdir], options[:debug])
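
The "newest report per host" step can also be tried in isolation, without Puppet installed. What follows is only a sketch using the standard library: it assumes, like the script above, one directory per host and report names that sort chronologically. The helper name latest_reports and the host and file names in the demo are made up for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Map each host directory name to the path of its newest .yaml report.
# Host directories without any report are left out of the result.
def latest_reports(reportdir)
  latest = {}
  Dir.glob(File.join(reportdir, "*")).sort.each do |hostdir|
    next unless File.directory?(hostdir)
    # String comparison is enough because names are assumed to be timestamps.
    newest = Dir.glob(File.join(hostdir, "*.yaml")).max
    latest[File.basename(hostdir)] = newest if newest
  end
  latest
end

# Demo on a throwaway tree with hypothetical hosts and report names.
Dir.mktmpdir do |dir|
  { "web1.example.com" => %w[201001010000 201001020000],
    "db1.example.com"  => %w[200912310000] }.each do |host, names|
    FileUtils.mkdir_p(File.join(dir, host))
    names.each { |n| FileUtils.touch(File.join(dir, host, "#{n}.yaml")) }
  end
  latest_reports(dir).each do |host, file|
    puts "#{host}: #{File.basename(file)}"
  end
end
```

Returning a hash keeps the selection separate from the parsing, so the YAML scan can be run on each value afterwards.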

A pkgin provider for puppet

On my Solaris machines at $WORK I use iMil’s pkgin to install additional software. But until today, I had to do it by hand, on every machine… Not really what I like to do after a little more than a year of using Puppet. So I wrote a provider to manage packages with pkgin. Writing it taught me a lot about Puppet internals and about my favorite config management system.

Enough talking, here is the file: pkgin.rb

Example use in a manifest:

class foo {
    package { "bla":
        ensure => installed,
        provider => pkgin
    }
}

Put your ruby in my ERB

Today I started installing a reverse proxy at $WORK. I chose to do it this way, and all my DNS data is stored in my CMDB. Once again, the solution came from #puppet! You can embed “pure” Ruby code in ERB templates. And yes, you can query your database!

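<%
require 'dbi'  # the Ruby DBI library must be installed where the template is evaluated

dbh = DBI.connect("DBI:Mysql:yourbase:mysql.mycorp.com", "you", "XXXX")
query = dbh.prepare("your fancy query")
query.execute
while row = query.fetch do
  todisplay = some_funny_things()
%>
<%= todisplay %>
<%
end
# release the statement handle and close the connection
query.finish
dbh.disconnect
%>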

I use this technique to generate the dnsmasq data file. Have the dnsmasq service subscribe to that file and you’re done!
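
To experiment with the template logic without a database or a puppetmaster, the same idea can be run with Ruby’s bundled ERB library. This is a minimal sketch, not the template I actually use: the rows are made-up stand-ins for a query result, the hosts-style output format is an assumption, and render_hosts and HOSTS_TEMPLATE are hypothetical names.

```ruby
require 'erb'

# Hosts-style template: one "ip name" line per row.
# "-%>" swallows the newline after a control tag (needs trim_mode "-").
HOSTS_TEMPLATE = <<~TEMPLATE
  <% rows.each do |row| -%>
  <%= row["ip"] %> <%= row["name"] %>
  <% end -%>
TEMPLATE

# Render the template; `rows` is visible to it through the method's binding.
def render_hosts(rows)
  ERB.new(HOSTS_TEMPLATE, trim_mode: "-").result(binding)
end

# Made-up rows standing in for what the database query would return.
rows = [
  { "ip" => "192.0.2.10", "name" => "www.example.com" },
  { "ip" => "192.0.2.11", "name" => "mail.example.com" },
]

puts render_hosts(rows)
```

Swapping the rows array for the result of a real query is then the only change needed inside the loop.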