Safely backing up your Ceph monitors

So you might wonder: Why do I need to make a backup of my Ceph monitors? I have multiple monitors.

That’s true, but should you run into the very unfortunate situation where you lose all your monitors, you lose all your data as well. The monitors contain vital metadata (the pgmap, osdmap and crushmap) needed to run your cluster. If you lose that metadata, you practically lose all your data.

Ceph’s monitors use Google’s LevelDB to store all their information. When looking at a monitor’s data directory you’ll see something like this:

[root@mon1:/var/lib/ceph/mon/ceph-alpha]$ ls -alR
.:
total 16
drwxr-xr-x 3 root root 4096 Sep 23  2013 .
drwxr-xr-x 3 root root 4096 Mar 24 11:04 ..
-rw-r--r-- 1 root root   55 Sep 23  2013 keyring
drwxr-xr-x 2 root root 4096 Mar 25 14:09 store.db

./store.db:
total 236172
drwxr-xr-x 2 root root    4096 Mar 25 14:09 .
drwxr-xr-x 3 root root    4096 Sep 23  2013 ..
-rw-r--r-- 1 root root 2116576 Mar  1 01:35 1400870.sst
-rw-r--r-- 1 root root 2111248 Mar  1 01:40 1400992.sst
...
...
-rw-r--r-- 1 root root 1149227 Mar 25 14:09 2026520.sst
-rw-r--r-- 1 root root      17 Mar 25 04:34 CURRENT
-rw-r--r-- 1 root root       0 Sep 23  2013 LOCK
-rw-r--r-- 1 root root 2196679 Mar 25 14:09 LOG
-rw-r--r-- 1 root root 3829307 Mar 25 04:33 LOG.old
-rw-r--r-- 1 root root  983040 Mar 25 14:09 MANIFEST-2016290
[root@mon1:/var/lib/ceph/mon/ceph-alpha]$

So it’s very tempting to simply run your favorite backup tool and back up this directory. It’s usually less than 500MB, so that’s easy to do.

However, it’s not a wise idea to do so, since you have to be sure the LevelDB database is in a consistent state before backing it up.

In a production cluster you will probably have at least three monitors, so stopping a single monitor is not a big problem.
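
If you want to double-check before taking one down, the cluster will tell you whether all monitors are currently in quorum:

$ ceph health
$ ceph quorum_status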

A simple backup solution would be:

service ceph stop mon
tar czf /var/backups/ceph-mon-backup_$(date +'%a').tar.gz /var/lib/ceph/mon
service ceph start mon

Put that in a shell script and have cron run it every 24 hours. Make sure not all three monitors create their backup at the same time; other than that, this works just fine.
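
A minimal sketch of such a setup, assuming the script lives at /usr/local/bin/ceph-mon-backup.sh (the path and schedule are just examples):

#!/bin/sh
# /usr/local/bin/ceph-mon-backup.sh
# Stop the local monitor, tar up its data directory and start it again.
set -e
service ceph stop mon
tar czf /var/backups/ceph-mon-backup_$(date +'%a').tar.gz /var/lib/ceph/mon
service ceph start mon

And a cron entry in /etc/cron.d/ceph-mon-backup, with a different hour on every monitor (for example 01:00 on mon1, 02:00 on mon2, 03:00 on mon3):

0 1 * * * root /usr/local/bin/ceph-mon-backup.sh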

You now have a tarball which you can upload to any offsite location to make sure your monitors are safe.
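
For example, with a simple scp to a backup host (hostname and destination path are placeholders):

scp /var/backups/ceph-mon-backup_$(date +'%a').tar.gz backup@offsite.example.com:/backups/ceph-mon/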

Another solution would be to run the monitors on a ZFS on Linux filesystem and use ZFS’s snapshot functionality. But you can’t be 100% sure that your LevelDB database is in a consistent state at that point.
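
For reference, such a snapshot-based backup would look roughly like this, assuming the monitor data lives on a ZFS dataset named tank/ceph-mon (the dataset name is an assumption):

# Snapshot the dataset while the monitor keeps running, then dump the snapshot to a file.
zfs snapshot tank/ceph-mon@$(date +'%a')
zfs send tank/ceph-mon@$(date +'%a') > /var/backups/ceph-mon-backup_$(date +'%a').zfs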

The safest solution at this moment is to fully stop the monitor, create the backup and start the monitor again. Just make sure you don’t stop all monitors at the same time.

Changing the region of an RGW bucket

As of Ceph version 0.67 (Dumpling) the Ceph Object Gateway aka RADOS Gateway supports regions. This allows you to create a geo-replicated Amazon S3 compatible service.

While working on a setup we decided later in the process that we wanted regions, but we had already created about 50 buckets with data in them. We didn’t feel like re-creating all the buckets, so we wanted to change the region of the existing buckets.

A fresh Object Gateway has a region ‘default’ with one zone ‘default’. We created the region ‘ams02’ (Amsterdam) with one zone called ‘zone01’.

All buckets had the region ‘default’, which we wanted to change to ‘ams02’. No data migration is required since all the data stays on the same Ceph cluster.

This can be done with a couple of ‘radosgw-admin’ commands.

The bucket in these examples is ‘widodh’.

$ radosgw-admin metadata get bucket:widodh

This outputs JSON data:

{ "key": "bucket:widodh",
  "ver": { "tag": "_2qGuaDCBixHpx2lddTe0g-x",
      "ver": 1},
  "mtime": 1380653343,
  "data": { "bucket": { "name": "widodh",
          "pool": ".rgw.buckets",
          "index_pool": ".rgw.buckets.index",
          "marker": "default.20111.1",
          "bucket_id": "default.20111.1"},
      "owner": "widodh",
      "creation_time": 1380653343,
      "linked": "true",
      "has_bucket_info": "false"}}

With this information we can fetch the rest of the metadata:

$ radosgw-admin metadata get bucket.instance:widodh:default.20111.1

The id at the end is ‘bucket_id’ from the previous command.

This returns us:

{ "key": "bucket.instance:widodh:default.20111.1",
  "ver": { "tag": "_-HNwyMLAnRALV9tyPqdX5_V",
      "ver": 1},
  "mtime": 1380653343,
  "data": { "bucket_info": { "bucket": { "name": "widodh",
              "pool": ".rgw.buckets",
              "index_pool": ".rgw.buckets.index",
              "marker": "default.20111.1",
              "bucket_id": "default.20111.1"},
          "creation_time": 1380653343,
          "owner": "widodh",
          "flags": 0,
          "region": "default",
          "placement_rule": "default-placement",
          "has_instance_obj": "true"},
      "attrs": [
            { "key": "user.rgw.acl",
              "val": "AgKXAAAAAgIgAAAABgAAAHdpZG9kaBIAAABXaWRvIGRlbiBIb2xsYW5kZXIDA2sAAAABAQAAAAYAAAB3aWRvZGgPAAAAAQAAAAYAAAB3aWRvZGgDA0AAAAACAgQAAAAAAAAABgAAAHdpZG9kaAAAAAAAAAAAAgIEAAAADwAAABIAAABXaWRvIGRlbiBIb2xsYW5kZXIAAAAAAAAAAA=="},
            { "key": "user.rgw.idtag",
              "val": ""},
            { "key": "user.rgw.manifest",
              "val": ""}]}}

Save this output to a file and change the ‘region’ value to what you want; in this case I changed ‘default’ to ‘ams02’.
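
For example, with sed as a shortcut (this assumes the region string appears exactly once in the saved JSON):

$ radosgw-admin metadata get bucket.instance:widodh:default.20111.1 > bucket.json
$ sed -i 's/"region": "default"/"region": "ams02"/' bucket.json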

Afterwards you run:

$ radosgw-admin metadata put bucket.instance:widodh:default.20111.1 < bucket.json

Now I could change these configuration variables in the ceph.conf:

[client.radosgw.rgw1]
    host = rgw1
    ...
    ...
    rgw zone = zone01
    rgw region = ams02
    ...
    ...

We had to change the metadata of about 50 buckets and we didn’t feel like doing this manually, so I wrote this script:

#!/usr/bin/env python

import rados
import os
import json
import copy
import subprocess

ceph_id = "admin"
ceph_secret = "ADMIN SECRET"
ceph_monitor = "MONITOR ADDRESS"
ceph_rgw_pool = ".rgw"
ceph_rgw_region = "NEW RGW REGION"

def change_bucket_region(bucket, region):
	# Fetch the bucket entrypoint metadata to find the bucket_id.
	me = os.popen("radosgw-admin metadata get bucket:" + bucket)
	meta = json.loads(me.read())
	id = meta['data']['bucket']['bucket_id']
	# Fetch the bucket.instance metadata, which contains the region.
	mei = os.popen("radosgw-admin metadata get bucket.instance:" + bucket + ":" + id)
	imeta = json.loads(mei.read())
	current_region = imeta['data']['bucket_info']['region']
	if current_region != region:
		newmeta = copy.copy(imeta)
		newmeta['data']['bucket_info']['region'] = region
		stdin = json.dumps(newmeta)
		# Feed the modified JSON back through 'radosgw-admin metadata put'.
		process = subprocess.Popen(['radosgw-admin', 'metadata', 'put', "bucket.instance:" + bucket + ":" + id], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
		process.stdin.write(stdin)
		process.stdin.close()
		process.wait()


try:
	r = rados.Rados(rados_id=ceph_id)
	r.conf_set("mon_host", ceph_monitor)
	r.conf_set("key", ceph_secret)
	r.connect()

	io = r.open_ioctx(ceph_rgw_pool)

	i = io.list_objects()
	while True:
		try:
			o = i.next()
			b = str(o.key)
			# Skip RGW's internal objects, whose names start with a dot.
			if b[0] != ".":
				change_bucket_region(b, ceph_rgw_region)
		except StopIteration:
			break

	io.close()
	r.shutdown()
except Exception as e:
	print "Error" + str(e)

Use this script with caution since it will change the region of ALL buckets on your cluster to what you specify.

A quick note on running CloudStack with RBD on Ubuntu 12.04

When you want to use Ceph as Primary Storage in Apache CloudStack you need a recent version of libvirt with RBD storage pool support enabled.

If you want to use Ubuntu 12.04 LTS (Precise) you would need to manually compile libvirt since the default libvirt version doesn’t include RBD storage pool support.

But not anymore! Ubuntu has its Cloud Archive, which is aimed at OpenStack, but that doesn’t matter: we just want a newer version of libvirt with RBD storage pool support.

So, add the Cloud Archive repository and an APT source for Ceph and you can use RBD with CloudStack without compiling anything!

$ sudo apt-get install ubuntu-cloud-keyring
$ echo deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main | sudo tee /etc/apt/sources.list.d/cloud-archive.list
$ wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
$ echo deb http://eu.ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
$ sudo apt-get update
$ sudo apt-get install cloudstack-agent

Voila, you now have all the packages you need to run a CloudStack agent with RBD support.
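
If you want to verify that this libvirt build really has RBD storage pool support, a quick test is to define (not start) a pool of type ‘rbd’; the pool name, monitor hostname and secret UUID below are placeholders:

$ cat > rbd-pool.xml << 'EOF'
<pool type='rbd'>
  <name>cloudstack-test</name>
  <source>
    <name>rbd</name>
    <host name='mon1.example.com' port='6789'/>
    <auth username='admin' type='ceph'>
      <secret uuid='00000000-0000-0000-0000-000000000000'/>
    </auth>
  </source>
</pool>
EOF
$ virsh pool-define rbd-pool.xml

A libvirt built without RBD support will refuse the pool type, while this build should accept the definition.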