I was building up an Elasticsearch cluster as a storage backend for Logstash. Using the default Zen discovery method, I found that some of the nodes in the cluster could not automatically find each other.
Zen uses multicast to locate other nodes and by using
tcpdump I could see that some nodes weren’t receiving the multicast traffic. This lead me straight to the network layer and I found the culprit; a Cisco Nexus 5000 switch.
Normally a switch often treats multicast traffic much like broadcast traffic, it floods the packet to every port on the same VLAN. However Cisco switches try to be clever and learn which ports are interested in receiving the traffic by listening for IGMP packets, (referred to as snooping), but IGMP is only sent if there’s a multicast router on the network, (note also I’m not trying to route multicast, all nodes are on the same VLAN).
The solution was to enable a feature on the Nexus known as an “IGMP querier”. What this does is mimic enough of a multicast router that nodes report which multicast groups they’re interested in receiving traffic for and the switch can then learn which ports to forward multicast traffic on.
On the Nexus 5000, I needed to add the following configuration:
vlan configuration 1234 ip igmp snooping querier 184.108.40.206
(If you have a Catalyst switch running IOS, the configuration should be very similar)
The VLAN should match whatever VLAN the nodes are attached to, and you can basically make up the IP address used here, the switch sends IGMP packets with it as the source, but it’s never used as the destination for packets, nodes use a specific multicast group instead.
As soon as I added this configuration my Elasticsearch cluster sprung into life.