While we wait for the moment when we can start using Ceph, we are implementing new storage for our hosting clusters. Our current Linux machines with LVM and XFS are no longer up to the task.
After some testing and discussion we chose Nexenta. What Nexenta is and how awesome ZFS is can be found elsewhere on the net, so I’m not going to discuss that here.
I wanted to publish our findings about the HA plugin and NFS.
In short, we have two headends connected to two SAS JBODs. The RSF-1 plugin makes sure the zpool is imported on only one headend at a time. If one headend fails, the plugin automatically fails the pool over to the other headend.
The plugin provides one HA IP that is shared between the headends; you probably get the point.
We’ve been doing some testing and noticed that when we mount NFS (v3) over TCP the failover takes a staggering 6 minutes! Well, the failover doesn’t take 6 minutes, but that’s the time it takes for the TCP connections to recover.
When mounting over UDP, service resumes in 50 seconds, so that’s a big difference!
Some testing showed that this is due to the following kernel settings:
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
This page explains what those two values actually control.
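If you want to check what an NFS client is currently running with before you start changing things, the values can be read straight from /proc. This is only a minimal sketch: the function name `read_tcp_retries` and its `base` parameter are made up for this post, and it assumes a Linux client where the settings live under /proc/sys/net/ipv4.

```python
from pathlib import Path


def read_tcp_retries(base="/proc/sys/net/ipv4"):
    """Return the current tcp_retries1 and tcp_retries2 values as integers.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory; on a Linux client the default should be right.
    """
    values = {}
    for name in ("tcp_retries1", "tcp_retries2"):
        # Each sysctl is exposed as a small text file holding one integer.
        values[name] = int(Path(base, name).read_text().strip())
    return values
```

Calling `read_tcp_retries()` on a client with the settings shown above should return a dict with 3 and 15 for the two keys.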
We’ve been experimenting with those values, and lowering tcp_retries1 to 1 gave us the same recovery times as with UDP, but sometimes the recovery would still take 6 minutes.
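To get a feel for how a retry count turns into waiting time, here is a rough model of TCP's exponential backoff. It assumes the retransmission timeout starts at the Linux minimum of 200 ms, doubles after every unanswered retransmit and is capped at 120 seconds. The function name `estimated_timeout` is made up for this post. Real connections start from an RTO based on the measured round-trip time, so treat the result as a theoretical figure. We did not verify that this model explains the 6 minutes we measured.

```python
def estimated_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Estimate how many seconds pass before TCP gives up on a connection.

    Assumptions (not measured on our setup): the first retransmission
    timeout is rto_min, each following one doubles, none exceeds rto_max,
    and the connection is dropped when the timeout after the last
    retransmit expires, which gives retries + 1 waits in total.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


# Hypothetical timeout for the default net.ipv4.tcp_retries2 = 15
default_timeout = estimated_timeout(15)
```

With the default of 15 this model comes out at roughly 924.6 seconds, which is the same hypothetical figure the kernel documentation quotes for tcp_retries2. Lower values shrink the window quickly, because the longest waits are at the end of the sequence.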
For now I advise using NFS over UDP (which gives better performance anyway), but if you need TCP for some reason, try tuning these values.