Ceph with a cluster and public network on IPv6

I’m a big fan of Ceph and IPv6, so I always try to deploy Ceph over IPv6 when possible. Ceph is the future, just like IPv6 is. Why implement legacy?

Recently I did a deployment of Ceph with a public and cluster network running over IPv6. It has a small catch, so let me explain the cluster and public network first.

Ceph cluster and public network

As shown in the Ceph documentation, there are two types of network:

  • Public network for clients and monitors
  • Cluster network for inter-OSD communication (replication and recovery)

If you want to run your Ceph cluster over IPv6 you have a couple of settings to make:

[global]
ms_bind_ipv6 = true
mon_host = [2a00:f10:XX:XX::XX]:6789, [2a00:f10:XX:XX::XY]:6789, [2a00:f10:XX:XX::YY]:6789

As you can see, you have to write the IPv6 addresses enclosed in [ and ].

When configuring the cluster and/or public network in ceph.conf, however, you should not use the brackets:

[global]
public_network = 2a00:f10:XX:XX:XX::/64
cluster_network = 2a00:f10:XX:XX:XY::/64

When that is set correctly it should all work fine and your Ceph cluster will be running over IPv6 with separate public and cluster networks!
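
If you want to verify this on a monitor host, you can check that the daemons are actually listening on IPv6. These commands are just one way to do it; the output will obviously differ per cluster:

# The monitor should be listening on its IPv6 address on port 6789
ss -ltn | grep 6789

# The monmap should also show the IPv6 addresses
ceph mon dump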

PowerDNS backend for a global RADOS Gateway namespace

At my hosting company PCextreme we are building a cloud offering based on Ceph and CloudStack. We call our cloud services Aurora.

Our cloud services are composed of two components: Compute and Objects.

For our Aurora Objects service we use the RADOS Gateway from Ceph and we are using the Federated Config to create multiple regions.

At this moment we have one region, o.auroraobjects.eu, but we soon want to expand to multiple regions.

One of the things we/I wanted is a global namespace for all our regions: o.auroraobjects.com.

By design the RADOS Gateway will return an HTTP redirect when you connect to the ‘wrong’ region for a specific bucket, but such a redirect means extra TCP packets going over the wire, causing additional and unneeded latency.

So I came up with the idea of using a custom PowerDNS backend to direct bucket traffic at the DNS level.

Imagine having a bucket ceph in the region ‘eu’ and the global namespace o.auroraobjects.com.

Using my custom backend the PowerDNS server will respond with a CNAME pointing the user towards the right hostname:

wido@wido-laptop:~$ host ceph.o.auroraobjects.com ns1.auroraobjects.com
Using domain server:
Name: ns1.auroraobjects.com
Address: 2a00:f10:121:400:48c:2ff:fe00:e6b#53
Aliases: 

ceph.o.auroraobjects.com is an alias for ceph.o.auroraobjects.eu.
wido@wido-laptop:~$

As you can see it responded with a CNAME pointing towards ceph.o.auroraobjects.eu.

This allows us to create multiple regions (eu, us, asia, etc.) but keep one global namespace to make it easy to consume for our end-users.

Users can create a bucket in the region they like, but they never have to worry about which hostname to use. We take care of that.

This PowerDNS backend is in the Ceph master branch and can be installed as a WSGI application behind Apache.
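
To give an idea of how this works: PowerDNS’ remote backend sends lookups as HTTP requests and expects a JSON answer. The sketch below is not the actual pdns-backend-rgw.py from the Ceph tree, just a minimal WSGI example of the concept; the hard-coded bucket-to-region mapping is made up for illustration, the real backend resolves a bucket’s region dynamically:

import json

# Made-up mapping for illustration; the real backend looks up
# a bucket's home region instead of using a static dict.
BUCKET_REGIONS = {'ceph': 'eu'}
GLOBAL_ZONE = 'o.auroraobjects.com'

def application(environ, start_response):
    # The PowerDNS http connector requests: GET <url>/lookup/<qname>/<qtype>
    parts = environ.get('PATH_INFO', '').strip('/').split('/')
    result = []
    if len(parts) >= 2 and parts[0] == 'lookup':
        qname = parts[1].rstrip('.')
        if qname.endswith('.' + GLOBAL_ZONE):
            bucket = qname[:-len(GLOBAL_ZONE) - 1]
            region = BUCKET_REGIONS.get(bucket)
            if region:
                # Answer with a CNAME pointing to the bucket's home region
                result.append({
                    'qtype': 'CNAME',
                    'qname': qname,
                    'content': '%s.o.auroraobjects.%s' % (bucket, region),
                    'ttl': 60,
                })
    body = json.dumps({'result': result if result else False})
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [body.encode('utf-8')]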

I’ve put a small txt file online to show you: fetching it through the global hostname (ceph.o.auroraobjects.com) and through the regional hostname (ceph.o.auroraobjects.eu) shows you the same object.

Deploying the backend for PowerDNS is fairly simple. I recommend you read the README, but here are a few config snippets.

Apache VirtualHost

<VirtualHost *:80>
	ServerAdmin webmaster@localhost

	DocumentRoot /var/www
	<Directory />
		Options FollowSymLinks
		AllowOverride None
	</Directory>

	<Directory /var/www/>
		Options Indexes FollowSymLinks MultiViews
		AllowOverride None
		Order allow,deny
		allow from all
	</Directory>

	ErrorLog ${APACHE_LOG_DIR}/error.log
	LogLevel warn
	CustomLog ${APACHE_LOG_DIR}/access.log combined

	WSGIScriptAlias / /var/www/pdns-backend-rgw.py
</VirtualHost>

PowerDNS configuration

local-address=0.0.0.0
local-ipv6=::

cache-ttl=60
default-ttl=60
query-cache-ttl=60

launch=remote
remote-connection-string=http:url=http://localhost/dns

Note: You have to compile PowerDNS manually with --with-modules=remote --enable-remotebackend-http
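
With everything running you can test the backend directly over HTTP. Assuming the http connector maps lookups to <url>/lookup/<qname>/<qtype> as in the sketch above, something like this should return the CNAME as JSON:

curl http://localhost/dns/lookup/ceph.o.auroraobjects.com/ANY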

Don’t forget to put an rgw-pdns.conf in /etc/ceph with the correct configuration.

This is still a work-in-progress on my side and I’ll probably make some commits in the coming months, but feedback is much appreciated!

Deploying Ceph over IPv6

I like to deploy Ceph clusters over IPv6. I actually think that’s the way forward. IPv4 is legacy just like iSCSI and NFS are.

Last week I was at a customer deploying a new Ceph cluster and they wanted to deploy with IPv6! Most deployments I did with IPv6 were done manually and not with ceph-deploy, but when trying to deploy with ceph-deploy over IPv6 I ran into some issues.

Before going into that I want to make something clear. With Ceph you choose either IPv4 OR IPv6. There is NO dual-stack support. So the whole cluster (including clients) communicates over IPv6 or over IPv4. Switching afterwards is not possible. So that’s why I urge people to deploy with IPv6 since you probably want to have your cluster running for a long time.

All package repos (including the Ceph ones) have IPv6 enabled, so in my opinion there is no good reason to prefer IPv4 for a Ceph deployment when IPv6 is available. I even think it’s easier in large deployments due to the Router Advertisements in IPv6.

Having said that, it’s time to go back to the ceph-deploy issue.

In ceph.conf you have to enclose the monitors’ IPv6 addresses in [ and ]. This is what ceph-deploy did wrong:

[global]
mon_host = 2a00:f10:X:X::X,2a00:f10:X:X::Y,2a00:f10:X:X::Z

While it should have been:

[global]
mon_host = [2a00:f10:X:X::X],[2a00:f10:X:X::Y],[2a00:f10:X:X::Z]
ms_bind_ipv6 = true

The ms_bind_ipv6 setting tells the Messenger inside Ceph to bind on IPv6. It’s important that you set it on all hosts in the Ceph cluster, otherwise things will go badly wrong: heartbeats and such will not work.
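
You can verify on each host that the setting was actually picked up, for example through the daemon’s admin socket (the monitor ID ‘mon1’ is just a placeholder):

ceph daemon mon.mon1 config show | grep ms_bind_ipv6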

I wrote a patch for ceph-deploy which fixes it. It writes the ‘mon_host’ setting correctly and also adds the ‘ms_bind_ipv6’ setting when IPv6 is used for the monitors.