I hit a weird issue today, I have Apache configured as a reverse proxy using mod_proxy_balancer which is a welcome addition in Apache 2.2.x. This is forwarding selected requests to more Apache instances running mod_perl, although this could be any sort of application layer, pretty standard stuff.
With ProxyStatus enabled and pushing a reasonable amount of traffic through the proxy, I started to notice that the application layer Apache instances would consistently get marked in error state every so often, removing them from the pool of available servers until their cooldown period expired and the proxy enabled them again.
Investigating the logs showed up this error:
[Tue Sep 28 00:24:39 2010] [error] (113)No route to host: proxy: HTTP: attempt to connect to 192.0.2.1:80 (192.0.2.1) failed
[Tue Sep 28 00:24:39 2010] [error] ap_proxy_connect_backend disabling worker for (192.0.2.1)
A spot of the ol’ google-fu turned up descriptions of similar problems, some not even related to Apache. The problem looked related to iptables.
All the servers are running CentOS 5 and anyone who runs this is probably aware of the stock Red Hat iptables ruleset. With HTTP access enabled, it looks something similar to this:
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p 50 -j ACCEPT
-A RH-Firewall-1-INPUT -p 51 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 5353 -d 126.96.36.199 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
Almost all of it is boilerplate apart from line 12, which I added and is identical to the line above it granting access to SSH. Analysing the live hit counts against each of these rules showed a large number hitting that last catch-all rule on line 13 and indeed, this is what is causing the Apache errors.
Analysing the traffic with tcpdump/wireshark showed that the frontend Apache server is only getting as far as sending the initial SYN packet and it’s failing to match either the dedicated rule on line 12 for HTTP traffic, or even the rule on line 10 to match any related or previously established traffic, although I wouldn’t really expect it to match that.
Adding a rule before the last one to match and log any HTTP packets that are considered to be in the state INVALID showed that indeed for some strange reason, iptables is deciding that an initial SYN is somehow invalid.
More information about why it might be invalid can be coaxed from the kernel by issuing the following:
# echo 255 > /proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid
Although all this gave me was some extra text saying “invalid state” and the same output you get from the standard logging target:
ip_ct_tcp: invalid state IN= OUT= SRC=192.0.2.2 DST=192.0.2.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=21158 DF PROTO=TCP SPT=57351 DPT=80 SEQ=1402246598 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A291DC7600000000001030307)
From searching the netfilter bugzilla for any matching bug reports I found a knob that relaxes the connection tracking, enabled with the following:
# echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
This didn’t fix things, plus it’s recommended to only use this in extreme cases with a broken router or firewall. As all of these machines are on the same network segment with just a switch connecting them there shouldn’t be anything mangling the packets.
With those avenues exhausted the suggested workaround of adding rules similar to the following:
-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 80 --syn -j REJECT --reject-with tcp-reset
before your last rule doesn’t really work as the proxy still drops the servers from the pool as before, but just logs a different reason instead:
[Tue Sep 28 13:59:23 2010] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 192.0.2.1:80 (192.0.2.1) failed
[Tue Sep 28 13:59:23 2010] [error] ap_proxy_connect_backend disabling worker for (192.0.2.1)
This isn’t really acceptable to me anyway as it requires your proxy to retry in response to a server politely telling it to go away and it’s going to be a performance hit. The whole point of using mod_proxy_balancer for me was so it could legitimately remove servers that are dead or administratively stopped, how is it supposed to tell the difference between that and a dodgy firewall?
The only solution that worked and was deemed acceptable was to simply remove the state requirement on matching the HTTP traffic, like so:
-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 80 -j ACCEPT
This will match both new and invalid packets, however it’s left me with a pretty low opinion of iptables now. Give me pf any day.