NFS-Ganesha with libcephfs on Ubuntu 14.04

This week I’m testing a lot with CephFS and one of the things I never tried was re-exporting CephFS using NFS-Ganesha and libcephfs.

NFS-Ganesha is an NFS server which runs in userspace. It supports multiple backends (FSALs), and libcephfs is one of them.

libcephfs is a userspace library which you can use to access CephFS. It is written in C/C++ and has Java and Python bindings. NFS-Ganesha, however, links to the native C++ bindings.

Running NFS-Ganesha on Ubuntu 14.04 is not plug and play; it involves compiling it manually, which I’ll explain below.

I tested this using:

  • Ubuntu 14.04.1
  • Ceph 0.89
  • NFS-Ganesha 2.1

Building NFS-Ganesha

It starts with installing a couple of packages:

apt-get install git-core cmake build-essential portmap libcephfs-dev bison \
flex libkrb5-dev libtirpc1

We then clone the Git repository:

cd /usr/src
git clone https://github.com/nfs-ganesha/nfs-ganesha.git
cd nfs-ganesha
git checkout -b V2.1-stable origin/V2.1-stable
git submodule update --init

Now that we have the sources we can build them:

mkdir build
cd build
cmake ../src
make
make install
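
The Ceph FSAL should be enabled automatically when cmake finds libcephfs. If it is not picked up, you can try enabling it explicitly; USE_FSAL_CEPH is a build option in the Ganesha CMake tree, but verify the exact name against your checkout:

cmake -DUSE_FSAL_CEPH=ON ../src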

NFS-Ganesha uses DBus and we have to copy a DBus profile:

cp ../src/scripts/ganeshactl/org.ganesha.nfsd.conf /etc/dbus-1/system.d/
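
Depending on your setup, DBus may need to re-read its configuration before Ganesha can register itself; on Ubuntu 14.04 something like this should do the trick:

service dbus reload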

Configuring the NFS export

Now we can create our NFS-Ganesha configuration:

nano /usr/local/etc/ganesha.conf

Add the following:

EXPORT
{
    Export_ID = 1;
    Path = "/";
    Pseudo = "/";
    Access_Type = RW;
    NFS_Protocols = "3";
    Squash = No_Root_Squash;
    Transport_Protocols = TCP;
    SecType = "none";

    FSAL {
        Name = CEPH;
    }
}

With this configuration we export the root (Path) “/” of our CephFS filesystem as “/” (Pseudo) on our NFS server.
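
As a variation, if you would rather not expose the whole filesystem, you could export just a subtree of CephFS. This is only a sketch; the /backups directory and the Export_ID are made up for this example, and the rest of the options stay the same:

EXPORT
{
    Export_ID = 2;
    Path = "/backups";
    Pseudo = "/backups";
    Access_Type = RW;
    NFS_Protocols = "3";
    Squash = No_Root_Squash;
    Transport_Protocols = TCP;
    SecType = "none";

    FSAL {
        Name = CEPH;
    }
}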

Configuring Ceph

To run NFS-Ganesha you have to make sure that CephFS is up and running and that the server you are going to run Ganesha on can access the Ceph cluster.

Make sure your ceph.conf and ceph.client.admin.keyring files are both present in /etc/ceph and run:

ceph -s

If that works, you can start NFS-Ganesha.
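
If you also want to confirm that the CephFS metadata server is up before starting Ganesha, a quick (purely optional) sanity check is:

ceph mds stat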

Starting NFS-Ganesha

Now that Ceph is working we can start the NFS server:

ganesha.nfsd -f /usr/local/etc/ganesha.conf -L /tmp/ganesha.log -N NIV_DEBUG -d

This makes the NFS server log at DEBUG level, which gives you a lot of insight into what’s happening. You probably want to disable this once everything works.
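
Once everything works you could, for example, switch to a less verbose level such as NIV_EVENT, keeping the same config and log paths used above:

ganesha.nfsd -f /usr/local/etc/ganesha.conf -L /tmp/ganesha.log -N NIV_EVENT -d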

Mounting NFS

On an NFS client we can now mount the NFS filesystem, which is actually our CephFS:

mkdir /mnt/cephfs-nfs
mount -o rw,noatime 1.2.3.4:/ /mnt/cephfs-nfs

Replace 1.2.3.4 with the hostname or IP address of the server running NFS-Ganesha.
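
If you want the mount to persist across reboots, an /etc/fstab entry along these lines should work; vers=3 matches the NFS_Protocols setting in the export, and 1.2.3.4 is again your Ganesha server:

1.2.3.4:/   /mnt/cephfs-nfs   nfs   rw,noatime,vers=3   0   0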

You should now have an NFS mount which shows you your CephFS filesystem! This way legacy clients can access the most awesome filesystem.

Failover with Nexenta, NFS and the RSF-1 plugin

The title might seem a bit cryptic, but this post is about a highly available Nexenta cluster with the RSF-1 plugin which we are deploying.

While we are waiting for the moment we can start using Ceph, we are implementing new storage for our hosting clusters. Our current Linux machines with LVM and XFS are not up to the task anymore.

After some testing and discussion we chose to use Nexenta. What Nexenta is and how awesome ZFS is can be found elsewhere on the net; I’m not going to discuss that here.

I wanted to publish our findings about the HA plugin and NFS.

In short, we have two headends connected to two SAS JBODs. The RSF-1 plugin makes sure the zpool is imported on only one headend at a time. If one headend fails, the plugin automatically fails the pool over to the other headend.

The plugin provides one HA IP which is shared between the headends; you probably get the point.

We’ve been doing some testing and noticed that when we mount NFS (v3) over TCP, the failover takes a staggering 6 minutes! Well, the failover itself doesn’t take 6 minutes, but that’s how long it takes for the TCP connections to recover.

When mounting over UDP, service resumes within 50 seconds, so that’s a big difference!

Some testing showed that this is due to the following kernel settings:

net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15

This page explains what those two values actually control: in short, tcp_retries1 is how often TCP retransmits before asking the network layer to re-check the route, and tcp_retries2 is how often it retransmits before giving up on the connection entirely.

We’ve been experimenting with those values, and lowering tcp_retries1 to 1 gave us the same recovery times as with UDP, but sometimes the recovery would still take 6 minutes.
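
If you want to experiment with this yourself, the value can be changed at runtime with sysctl and persisted in /etc/sysctl.conf. Treat this as a sketch and test the effect on your own workload first:

sysctl -w net.ipv4.tcp_retries1=1
echo "net.ipv4.tcp_retries1 = 1" >> /etc/sysctl.conf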

For now I advise using NFS over UDP (which gives better performance anyway), but if you need to use TCP for some reason, try fiddling with these values.
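
For reference, a UDP mount would look something like this; the HA IP and export path are just placeholders for your own environment:

mount -t nfs -o rw,vers=3,proto=udp 10.0.0.10:/volumes/data /mnt/data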