Docker containers with IPv6 behind NAT

WARNING

In production IPv6 should always be used without NAT. Only use IPv6 and NAT for testing purposes. There is no valid reason to use IPv6 with NAT in any production environment.

IPv6 and NAT

IPv6 is designed to remove the need for NAT, and that is a very, very good thing. NAT breaks peer-to-peer connections, and restoring them is one of the great things about IPv6: every device on the internet gets its own public IP address again.

Docker and IPv6

Support for IPv6 in Docker has been there for a while now. It is disabled by default, however. The documentation describes how to enable it.

I wanted to enable IPv6 on my Docker setup on my laptop running Ubuntu, but as my laptop is a mobile device the IPv6 prefix I have changes when I move to a different location. IPv6 Prefix Delegation isn’t available at every IPv6-enabled location either, so I wanted to figure out if I could enable IPv6 in my Docker setup locally and use NAT to have my containers reach the internet over IPv6.

At home I have IPv6 via ZeelandNet and at the office we have a VDSL connection from XS4All. When I’m at a remote location I enable our OpenVPN tunnel, which has IPv6 enabled. This way I always have IPv6 available.

The Docker documentation shows that enabling IPv6 is very easy. I modified the systemd service file of docker and added a fixed IPv6 CIDR:

ExecStart=/usr/bin/dockerd --ipv6 --fixed-cidr-v6="fd00::/64" -H fd://

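fd00::/64 falls inside the Unique Local Address (ULA) range fc00::/7 defined in RFC 4193, which replaced the deprecated Site-Local range fec0::/10. ULA addresses are not routed on the public internet, so this subnet can be safely used for a local test setup.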
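Instead of editing the systemd unit, the same two options can also be set in Docker's daemon.json. Below is a minimal sketch that prints such a file; docker_ipv6_config is a hypothetical helper name of my own, and the prefix defaults to the fd00::/64 used in this post.

```shell
# Sketch: print the daemon.json equivalent of the --ipv6 / --fixed-cidr-v6 flags.
# docker_ipv6_config is a hypothetical helper, not something Docker ships.
docker_ipv6_config() {
    # Assumption: default to the prefix used in this post.
    prefix="${1:-fd00::/64}"
    cat <<EOF
{
  "ipv6": true,
  "fixed-cidr-v6": "${prefix}"
}
EOF
}
```

Something like docker_ipv6_config | sudo tee /etc/docker/daemon.json followed by a restart of the Docker service should apply it. Keep in mind that this overwrites an existing daemon.json, and that the options should not be set both as flags and in the file at the same time.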

I then added a NAT rule into ip6tables so that it would NAT for me:

sudo ip6tables -t nat -A POSTROUTING -s fd00::/64 -j MASQUERADE
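Running the command above twice appends the rule twice. A small sketch that only adds the rule when it is missing could look like this; ensure_masquerade and the IP6TABLES override variable are names I made up for illustration.

```shell
# Sketch: add the MASQUERADE rule only if it is not already present.
# IP6TABLES is an assumed override variable so the command can be swapped
# out (for example for a dry run); ensure_masquerade is a hypothetical helper.
IP6TABLES="${IP6TABLES:-ip6tables}"

ensure_masquerade() {
    subnet="$1"
    # -C checks whether a matching rule exists and exits non-zero if it does not.
    if ! $IP6TABLES -t nat -C POSTROUTING -s "$subnet" -j MASQUERADE 2>/dev/null; then
        $IP6TABLES -t nat -A POSTROUTING -s "$subnet" -j MASQUERADE
    fi
}
```

From a root shell this would be called as ensure_masquerade fd00::/64. The rule does not survive a reboot unless it is saved separately.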

Result

My Docker containers now get an IPv6 address, as can be seen below:

root@da80cf3d8532:~# ip -6 a
1: lo:  mtu 65536 state UNKNOWN qlen 1
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
15: eth0@if16:  mtu 1500 state UP 
    inet6 fd00::242:ac11:2/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:2/64 scope link 
       valid_lft forever preferred_lft forever
root@da80cf3d8532:~#

In this case the address is fd00::242:ac11:2, which was assigned by Docker.
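To check this from a script rather than by eye, the global-scope address can be filtered out of the ip -6 a output. This is a rough sketch; global_v6_addrs is a hypothetical helper that reads the output on stdin.

```shell
# Sketch: print the global-scope IPv6 address(es) found in `ip -6 a` output.
# global_v6_addrs is a hypothetical helper; it expects the output on stdin.
global_v6_addrs() {
    # Keep inet6 lines with "scope global" and strip the /prefix-length part.
    awk '$1 == "inet6" && /scope global/ { sub(/\/.*/, "", $2); print $2 }'
}
```

Inside the container this would be used as ip -6 a | global_v6_addrs, which skips the loopback and link-local entries.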

Since my laptop has IPv6 I can now ping pcextreme.nl from my Docker container.

root@da80cf3d8532:~# ping6 -c 3 pcextreme.nl -n
PING pcextreme.nl (2a00:f10:101:0:46e:c2ff:fe00:93): 56 data bytes
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=0 ttl=61 time=14.368 ms
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=1 ttl=61 time=16.132 ms
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=2 ttl=61 time=15.790 ms
--- pcextreme.nl ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 14.368/15.430/16.132/0.764 ms
root@da80cf3d8532:~#

Again, this should ONLY be used for testing purposes. For production, IPv6 Prefix Delegation is the way to go.

Testing Ceph BlueStore with the Kraken release

Ceph version Kraken (11.2.0) has been released and the Release Notes tell us that the new BlueStore backend for the OSDs is now available.

BlueStore

The current backend for the OSDs is the FileStore, which mainly uses the XFS filesystem to store its data. To overcome several limitations of XFS and POSIX in general, the BlueStore backend was developed.

It is intended to provide better performance (mainly for writes), data safety through checksumming, and compression.

Users are encouraged to test BlueStore starting with the Kraken release for non-production and non-critical data sets and report back to the community.

Deploying with BlueStore

To deploy OSDs with BlueStore you can use ceph-deploy with the --bluestore flag.

I created a simple test cluster with three machines: alpha, bravo and charlie.

Each machine will be running a ceph-mon and a ceph-osd process.

This is the sequence of ceph-deploy commands I used to deploy the cluster:

ceph-deploy new alpha bravo charlie
ceph-deploy mon create alpha bravo charlie

Now, edit the ceph.conf file in the current directory and add:

[osd]
enable_experimental_unrecoverable_data_corrupting_features = bluestore

With this setting we allow the use of BlueStore and we can now deploy our OSDs:

ceph-deploy --overwrite-conf osd create --bluestore alpha:sdb bravo:sdb charlie:sdb
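With more than a handful of hosts, typing the HOST:DISK pairs by hand gets tedious. The sketch below builds that argument list; osd_targets is a hypothetical helper of my own and assumes every host uses the same disk name.

```shell
# Sketch: build the HOST:DISK arguments for `ceph-deploy osd create`.
# osd_targets is a hypothetical helper: first argument is the disk,
# the remaining arguments are host names.
osd_targets() {
    disk="$1"
    shift
    targets=""
    for host in "$@"; do
        # Append "host:disk", separated by a space after the first entry.
        targets="${targets:+$targets }${host}:${disk}"
    done
    printf '%s\n' "$targets"
}
```

The OSD step would then become ceph-deploy --overwrite-conf osd create --bluestore $(osd_targets sdb alpha bravo charlie), with the substitution left unquoted on purpose so that each pair becomes its own argument.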

Running BlueStore

This tiny cluster now runs three OSDs with BlueStore:

root@alpha:~# ceph -s
    cluster c824e460-2f09-4994-8b2f-108aedc52d19
     health HEALTH_OK
     monmap e2: 3 mons at {alpha=[2001:db8::100]:6789/0,bravo=[2001:db8::101]:6789/0,charlie=[2001:db8::102]:6789/0}
            election epoch 14, quorum 0,1,2 alpha,bravo,charlie
        mgr active: charlie standbys: alpha, bravo
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v24: 64 pgs, 1 pools, 0 bytes data, 0 objects
            43356 kB used, 30374 MB / 30416 MB avail
                  64 active+clean
root@alpha:~#
root@alpha:~# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.02907 root default                                       
-2 0.00969     host alpha                                     
 0 0.00969         osd.0         up  1.00000          1.00000 
-3 0.00969     host bravo                                     
 1 0.00969         osd.1         up  1.00000          1.00000 
-4 0.00969     host charlie                                   
 2 0.00969         osd.2         up  1.00000          1.00000 
root@alpha:~#

On alpha I see that osd.0 only has a small partition for a bit of configuration and the rest is used by BlueStore.

root@alpha:~# df -h /var/lib/ceph/osd/ceph-0
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        97M  5.4M   92M   6% /var/lib/ceph/osd/ceph-0
root@alpha:~# lsblk 
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    8G  0 disk 
├─sda1   8:1    0  7.5G  0 part /
├─sda2   8:2    0    1K  0 part 
└─sda5   8:5    0  510M  0 part [SWAP]
sdb      8:16   0   10G  0 disk 
├─sdb1   8:17   0  100M  0 part /var/lib/ceph/osd/ceph-0
└─sdb2   8:18   0  9.9G  0 part 
sdc      8:32   0   10G  0 disk 
root@alpha:~# cat /var/lib/ceph/osd/ceph-0/type
bluestore
root@alpha:~#
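On a host with several OSDs the same check can be looped. This is a small sketch; list_osd_types is a hypothetical helper and the base directory defaults to the /var/lib/ceph/osd path shown above.

```shell
# Sketch: print the backend type of every OSD data directory on this host.
# list_osd_types is a hypothetical helper; the base directory is an
# optional argument and defaults to /var/lib/ceph/osd.
list_osd_types() {
    base="${1:-/var/lib/ceph/osd}"
    for dir in "$base"/ceph-*; do
        # Skip when the glob matched nothing or the type file is missing.
        [ -f "$dir/type" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/type")"
    done
}
```

On alpha this should print a single line for ceph-0 with the contents of its type file.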

The OSDs should work just like OSDs running FileStore, but they should perform better.

Running headless VirtualBox inside Nested KVM

For the Ceph training at 42on I use VirtualBox to build Virtual Machines. This is because they work under macOS, Windows and Linux.

For the internal Git at 42on we use Gitlab and I wanted to use Gitlab’s CI to build my Virtual Machines automatically.

As we don’t have any physical hardware at 42on (everything runs in the cloud) I wanted to see if I could run VirtualBox Headless inside a VM with Nested KVM enabled.

Nested KVM

The first thing I checked was if my KVM Virtual Machine actually supported Nested KVM. This can be verified with the kvm-ok command under Ubuntu:

root@glrun01:~# kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
root@glrun01:~#
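If kvm-ok is not installed, a rough equivalent is to look for the hardware virtualization flags in /proc/cpuinfo. The sketch below does only that; has_virt_flags is a hypothetical helper and it does not check for /dev/kvm the way kvm-ok does.

```shell
# Sketch: check whether the CPU flags expose hardware virtualization.
# has_virt_flags is a hypothetical helper; the file to inspect is an
# optional argument and defaults to /proc/cpuinfo.
has_virt_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # vmx = Intel VT-x, svm = AMD-V; -w matches whole words only.
    grep -Eqw 'vmx|svm' "$cpuinfo"
}
```

Inside the guest, has_virt_flags && echo "virtualization flags present" should succeed when the hypervisor passes the flags through. On the physical KVM host, nesting itself is controlled by the nested parameter of the kvm_intel or kvm_amd module.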

With that verified, I tried to install VirtualBox.

VirtualBox

Installing VirtualBox is straightforward. Just add the repository and install the packages. Don’t forget to reboot afterwards to make sure all kernel modules are properly installed and loaded.

apt-get install virtualbox

VirtualBox Extension Pack

The trick to get everything working properly is to install Oracle’s VirtualBox Extension Pack. It took me a while to figure out that I needed to install it manually, as it isn’t installed together with the VirtualBox packages.

You need to download the pack and install it using the VBoxManage command.

wget http://download.virtualbox.org/virtualbox/5.0.24/Oracle_VM_VirtualBox_Extension_Pack-5.0.24.vbox-extpack
vboxmanage extpack install Oracle_VM_VirtualBox_Extension_Pack-5.0.24.vbox-extpack
vboxmanage list extpacks
vboxmanage setproperty vrdeextpack "Oracle VM VirtualBox Extension Pack"
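The Extension Pack should match the installed VirtualBox version, so hardcoding 5.0.24 will break after an upgrade. A sketch that derives the URL from the version string is shown below; extpack_url is a hypothetical helper, and it assumes the URL layout from the wget command above also holds for other versions.

```shell
# Sketch: derive the Extension Pack URL that matches a VirtualBox version.
# extpack_url is a hypothetical helper; the URL layout is copied from the
# wget command above and is assumed to hold for other versions too.
extpack_url() {
    # Strip build suffixes such as "r123456" or "_Ubuntur123456"
    # (illustrative values) so only the x.y.z version remains.
    ver=$(printf '%s\n' "$1" | sed 's/[_r].*$//')
    printf 'http://download.virtualbox.org/virtualbox/%s/Oracle_VM_VirtualBox_Extension_Pack-%s.vbox-extpack\n' "$ver" "$ver"
}
```

Used as wget "$(extpack_url "$(vboxmanage --version)")", this should fetch the pack that belongs to whatever version apt installed.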

With that installed and configured I rebooted the machine again just to be sure.

It works!

With that it actually worked. The VirtualBox VMs can now be built inside a Nested KVM machine controlled by Gitlab’s CI 🙂