Safely backing up your Ceph monitors

So you might wonder: Why do I need to make a backup of my Ceph monitors? I have multiple monitors.

That’s true, but would you run into the very unfortunate situation where you loose all you monitors, you loose all your data. The monitors contain very important metadata (pgmap, osdmap, crushmap) to run your cluster. If you loose that metadata, you practially loose all your data.

Ceph’s monitors use Google’s LevelDB to store all their information. When looking at a monitors data directory you’ll see something like this:

[root@mon1:/var/lib/ceph/mon/ceph-alpha]$ ls -alR
.:
total 16
drwxr-xr-x 3 root root 4096 Sep 23  2013 .
drwxr-xr-x 3 root root 4096 Mar 24 11:04 ..
-rw-r--r-- 1 root root   55 Sep 23  2013 keyring
drwxr-xr-x 2 root root 4096 Mar 25 14:09 store.db

./store.db:
total 236172
drwxr-xr-x 2 root root    4096 Mar 25 14:09 .
drwxr-xr-x 3 root root    4096 Sep 23  2013 ..
-rw-r--r-- 1 root root 2116576 Mar  1 01:35 1400870.sst
-rw-r--r-- 1 root root 2111248 Mar  1 01:40 1400992.sst
...
...
-rw-r--r-- 1 root root 1149227 Mar 25 14:09 2026520.sst
-rw-r--r-- 1 root root      17 Mar 25 04:34 CURRENT
-rw-r--r-- 1 root root       0 Sep 23  2013 LOCK
-rw-r--r-- 1 root root 2196679 Mar 25 14:09 LOG
-rw-r--r-- 1 root root 3829307 Mar 25 04:33 LOG.old
-rw-r--r-- 1 root root  983040 Mar 25 14:09 MANIFEST-2016290
[root@mon1:/var/lib/ceph/mon/ceph-alpha]$

So it’s very tempting to simply run your favorite backup tool and back up this directory. Usually it’s less then 500MB, so it’s very simple to do so.

It’s however not a wise idea to do so, since you have to be sure the LevelDB database is in a consistent state before backing it up.

In a production cluster you will probably have a least three monitors, so stopping a monitor is not a big problem.

A simple backup solution would be:

service ceph stop mon
tar czf /var/backups/ceph-mon-backup_$(date +'%a').tar.gz /var/lib/ceph/mon
service ceph start mon

Put that in a Shell script and have CRON run it every 24 hours. Make sure not all three monitors create their backup at the same time, but this works just fine.

You now have a tarball which you can upload to any offsite location to make sure your monitors are safe.

Another solution would be to run the monitors on a ZFS on Linux filesystem and use ZFS’s snapshot functionalities. But you can’t be 100% sure that your LevelDB database is in a consistent state at that point.

The safest solution at this moment is to fully stop the monitor, create the backup and start the monitor again. Just make sure you don’t stop all monitors at the same time.