Do not use SMR disks with Ceph

Many new disks like the Seagate He8 disks are using a technique called Shingled Magnetic Recording to increase capacity.

As these disks offer a very low price per Gigabyte they seem interesting to use in a Ceph cluster.


Due to the nature of SMR these disks are very, very, very bad when it comes to Random Write performance. Random I/O is something that Ceph does a lot on the backing disks.

This results in disks spiking to 100% utilization very quickly causing all kinds of trouble with OSDS going down and committing suicide.

Do NOT use them

The solution is very simple. Do not use SMR disks in Ceph but stick to the traditional PMR disks in your Ceph cluster.

In the future we might see SMR support in the new BlueStore of Ceph, but at this moment no work has been done, so don’t expect anything soon.

Testing Ceph BlueStore with the Kraken release

Ceph version Kraken (11.2.0) has been released and the Release Notes tell us that the new BlueStore backend for the OSDs is now available.


The current backend for the OSDs is the FileStore which mainly uses the XFS filesystem to store it’s data. To overcome several limitations of XFS and POSIX in general the BlueStore backend was developed.

It will provide more performance (mainly writes), data safety due to checksumming and compression.

Users are encouraged to test BlueStore starting with the Kraken release for non-production and non-critical data sets and report back to the community.

Deploying with BlueStore

To deploy OSDs with BlueStore you can use the ceph-deploy by using the –bluestore flag.

I created a simple test cluster with three machines: alpha, bravo and charlie.

Each machine will be running a ceph-mon and ceph-osd proces.

This is the sequence of ceph-deploy commands I used to deploy the cluster

ceph-deploy new alpha bravo charlie
ceph-deploy mon create alpha bravo charlie

Now, edit the ceph.conf file in the current directory and add:

enable_experimental_unrecoverable_data_corrupting_features = bluestore

With this setting we allow the use of BlueStore and we can now deploy our OSDs:

ceph-deploy --overwrite-conf osd create --bluestore alpha:sdb bravo:sdb charlie:sdb

Running BlueStore

This tiny cluster how runs three OSDs with BlueStore:

root@alpha:~# ceph -s
    cluster c824e460-2f09-4994-8b2f-108aedc52d19
     health HEALTH_OK
     monmap e2: 3 mons at {alpha=[2001:db8::100]:6789/0,bravo=[2001:db8::101]:6789/0,charlie=[2001:db8::102]:6789/0}
            election epoch 14, quorum 0,1,2 alpha,bravo,charlie
        mgr active: charlie standbys: alpha, bravo
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v24: 64 pgs, 1 pools, 0 bytes data, 0 objects
            43356 kB used, 30374 MB / 30416 MB avail
                  64 active+clean
root@alpha:~# ceph osd tree
-1 0.02907 root default                                       
-2 0.00969     host alpha                                     
 0 0.00969         osd.0         up  1.00000          1.00000 
-3 0.00969     host bravo                                     
 1 0.00969         osd.1         up  1.00000          1.00000 
-4 0.00969     host charlie                                   
 2 0.00969         osd.2         up  1.00000          1.00000 

On alpha I see that osd.0 only has a small partition for a bit of configuration and the rest is used by BlueStore.

root@alpha:~# df -h /var/lib/ceph/osd/ceph-0
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        97M  5.4M   92M   6% /var/lib/ceph/osd/ceph-0
root@alpha:~# lsblk 
sda      8:0    0    8G  0 disk 
├─sda1   8:1    0  7.5G  0 part /
├─sda2   8:2    0    1K  0 part 
└─sda5   8:5    0  510M  0 part [SWAP]
sdb      8:16   0   10G  0 disk 
├─sdb1   8:17   0  100M  0 part /var/lib/ceph/osd/ceph-0
└─sdb2   8:18   0  9.9G  0 part 
sdc      8:32   0   10G  0 disk 
root@alpha:~# cat /var/lib/ceph/osd/ceph-0/type

The OSDs should work just like OSDs running FileStore, but they should perform better.

Running headless VirtualBox inside Nested KVM

For the Ceph training at 42on I use VirtualBox to build Virtual Machines. This is because they work under MacOS, Windows and Linux.

For the internal Git at 42on we use Gitlab and I wanted to use Gitlab’s CI to build my Virtual Machines automatically.

As we don’t have any physical hardware at 42on (everything runs in the cloud) I wanted to see if I could run VirtualBox Headless inside a VM with Nested KVM enabled.

Nested KVM

The first thing I checked was if my KVM Virtual Machine actually supported Nested KVM. This can be verified with the kvm-ok command under Ubuntu:

root@glrun01:~# kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used

Now that’s verified I tried to install VirtualBox.


Installing VirtualBox is straight forward. Just add the repository and install the packages. Don’t forget to reboot afterwards to make sure all kernel modules are loaded and properly installed.

apt-get install virtualbox

VirtualBox Extension Pack

The trick to get everything working properly is to install Oracle’s VirtualBox Extension Pack. It took me a while to figure out that I need to install it manually. It wasn’t done by default after install.

You need to download the pack and install it using the VBoxManage command.

vboxmanage extpack install Oracle_VM_VirtualBox_Extension_Pack-5.0.24.vbox-extpack
vboxmanage list extpacks
vboxmanage setproperty vrdeextpack "Oracle VM VirtualBox Extension Pack"

With that installed and configured I rebooted the machine again just to be sure.

It works!

With that it actually worked. The VirtualBox VMs can now be built inside a Nested KVM machine controlled by Gitlab’s CI 🙂

3 years of Model S ownership

September 26th 2013

On 26-09-2013 the day had finally arrived: Delivery of my Tesla Model S!

In the morning my Delivery Specialist send me this picture asking me if I was ready (with a smiley behind it 😉 ).

Tesla Model S 2013

It was 3 years since I ordered my Model S, so I couldn’t wait to pick it up! (In the back you see the Blue Model S of my colleague)


These are the options I chose for my Model S:

  • 85kWh non-performance (RWD)
  • Pearl White
  • All Glass Panoramic Roof
  • Base 19″ wheels
  • Black Nappa Leather Interior
  • Piano Black Décor
  • Tech Package
  • Sound Studio Package
  • Active Air Suspension
  • Lighting Package
  • Parking Sensors
  • Twin Chargers (22kW)

Price: EUR 97.890,00 (Including all taxes)

Afterwards I swapped the 19″ base wheels for the 19″ Cyclone Grey. These wheels are no longer available.

SuperCharger Germany

September 2016

Fast forward 3 years and 141.466km: I’m still super happy with my Model S. Best car ever, hands down.

The 100.000km mark was hit at November 14th 2015 as I was almost home. Literally, I was just 200m away from my house. So I could stop I take a good picture.

100.000km on Model S

Now, 3 years later it is well over 141.000km and will probably hit 150.000km somewhere in October.

Tesla Model S 141k km

Tesla Model S driveway

The roadtrips I did in these 3 years were all over Europe:

  • 3x: Octoberfest in München (DE): 2.000km
  • 3x: To Prague (CZ): 2.2000km
  • 3x: To Berlin (DE): 2.000km
  • 2x: Northern Norway above the Arctic circle: 6.000km
  • 2x: Summer roadtrip to Germany, Austria, Italy and Switzerland: 3.500km
  • 1x: To Swansea in Wales (UK): 1.500km

Each of these trips were with either friends or my girlfriend. Awesome trips, each of them. All powered by the ever expanding SuperCharger network.

Over the Air software updates

Due to the over the Air software updates the car only got better and better in these 3 years. A few things (but not all of them) which were added:

  • Trip Planner using SuperChargers
  • Spotify integration (awesome!)
  • New UI
  • Calender sync
  • Slightly improved efficieny

All for free and while my car was parked at home. My other cars never got better over time, they always got worse.


In three years I’ve driven my car to a lot of places in Europe (see below). From the cold in Norway to the heat of Italy.

Did I have some issues? Yes, but to be clear: I was never stranded! It did not malfunction in such a way that it was disabled.

So what did I experience?

  • Humming drivetrain. It was replaced 3 times under warranty
  • Main contactor failure in battery. Reboot of car worked and contactor was replaced.
  • Fogging rear lights
  • Slave charger failure. Causing reduced charging speed with AC charging
  • Window washer pump failure.

Again, none of these issues left me stranded along the road. They were also all fixed under warranty except for the window washer pump. That was EUR 100 in total.


In total my S went for service 3 times. I figured once every year would be enough. I paid two invoices of EUR 700,00 each. The other ones were discounted from the referral credit I have at Tesla.

Including the washer fluid pump my total expenses on service and maintenance were EUR 800,00. Not bad I would say!

Energy Consumption

About 70% of my charging is done at home, the rest at SuperChargers and other (public) chargers.

There is a kWh meter in front of my charging station at home and I’ve used about 20.000kWh. Judging from my 70% ‘charge at home’ assumption my total energy usage in 3 years was roughly 28.000kWh.

28.000kWh / 141.000km = 198Wh/km, which is about what I see in my general consumption in the car.


As I wrote above I undertook multiple Roadtrips in the three years, but the best trips I did were the trips to Wales, Norway and to Slovenia. I wrote blogs about two of them:

I didn’t write a blogpost about my roadtrip through Europe in June 2015, but you can see the route below (Prague, Austria, Slovenia, Austria, Germany, Netherlands).

I tried to draw the routes I’ve driven on a Google Maps overview.

Routes Tesla Model S


In the threee years I still think my trips to the Arctic are the highlights for me. However, there were more highlights, so I gathered a bunch of pictures I took and added them below in a random order.

Route 74 between Norway and Sweden:

Route 74 in Norway

Stuck on the arctic circle in Norway:

Model S stuck Arctic Circle

On the Lofoten Islands in Norway:

Car under snow from back

Model S next to house Lofoten

Octoberfest in München: (I’m the green blouse)

Octoberfest in München.

Charging at Fastned using 50kW CHAdeMO.

Charging at Fastned

In the Belgian Ardennes:

In the Belgian Ardennes

On a train in Austria going towards Bad Gastein:

On train in Austria

At the Slovenian <> Italian border:

Slovenian Border


After owning a Audi A3 2.0TDI (2007), Toyota Auris Hybrid (2011) and BMW M5 E39 (1999) I can saw that Model S is the best car I’ve ever owned. I love driving it and still enjoy every KM. (Except when stuck in traffic….). Even without Autopilot it is still an amazing car!

People still come up to me to ask things about the car and are really interested.

As I said. Best car ever. Period. I will never by a car which burns fossil fuel again.

My deposit for Model 3 was made at the day they opened. Waiting!

Chown Ceph OSD data directory using GNU Parallel

Starting with Ceph version Jewel (10.2.X) all daemons (MON and OSD) will run under the privileged user ceph. Prior to Jewel daemons were running under root which is a potential security issue.

This means data has to change ownership before a daemon running the Jewel code can run.

Chown data

As the Release Notes state you will have to chown all your data to ceph:ceph in /var/lib/ceph.

chown -R ceph:ceph /var/lib/ceph

On a system with multiple OSDs this might take a lot of time, using GNU Parallel you can save yourself a lot of time.

Static UID

The ceph User and Group have been assigned static UID and GIDs in the major distributions:

  • Fedora/CentOS/RHEL: 167:167
  • Debian/Ubuntu: 64045/64045

Chown in parallel

Using these commands you can chown the data in /var/lib/ceph much faster.

WARNING: Make sure the OSDs are stopped on the system before you continue!

Now you can run these commands (Ubuntu in this case):

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -type d|parallel chown -R 64045:64045
chown 64045:64045 /var/lib/ceph
chown 64045:64045 /var/lib/ceph/*
chown 64045:64045 /var/lib/ceph/bootstrap-*/*

The first command will take the longest. I tested it on a system with 24 OSDs all containing about 800GB of data. That took roughly 20 minutes.

Calculating SLAAC IPv6 Address in Java


With IPv6 a host on a network can use StateLess Address AutoConfiguration (SLAAC) to configure it’s network.

Routers will send out Router Advertisements telling the network which subnet is used in the network.

Based on their MAC address (modified EUI-64) a host will then obtain a IPv6 it will use.


For the Apache CloudStack project I had to write Java code which would take a subnet and a MAC address as an argument and would generate a IPv6 SLAAC address from it.

Combining subnet 2001:db8:100::/64 with MAC address 06:7a:88:00:00:8b yields IPv6 address 2001:db8:100:0:47a:88ff:fe00:8b.

 * Java code using Java-ipv6 from Google Code to convert
 * a given IPv6 subnet and a MAC address into a IPv6 address
 * calculated using SLAAC.
 * Author: Wido den Hollander 
import com.googlecode.ipv6.IPv6Address;
import com.googlecode.ipv6.IPv6Network;

public class IPv6EUI64 {
    public static IPv6Address EUI64Address(final IPv6Network cidr, final String macAddress) {
        if (cidr.getNetmask().asPrefixLength() > 64) {
            throw new IllegalArgumentException("IPv6 subnet " + cidr.toString() + " is not 64 bits or larger in size");

        String mac[] = macAddress.toLowerCase().split(":");

        return IPv6Address.fromString(cidr.getFirst().toString() +
                Integer.toHexString(Integer.parseInt(mac[0], 16) ^ 2) +
                mac[1] + ":" + mac[2] + "ff:fe" + mac[3] +":" + mac[4] + mac[5]);

    public static void main(String[] argv) {
        IPv6Network cidr = IPv6Network.fromString("2001:db8:100::/64");
        String mac = "06:7a:88:00:00:8b";
        IPv6Address eui64addr = EUI64Address(cidr, mac);

        /* This will print 2001:db8:100:0:47a:88ff:fe00:8b */

The code can also be found on my Github Gist page.

Calculating DS record from DNSKEY with Python 3

While working on DNSSEC for PCextreme’s Aurora DNS I had to convert a DNSKEY to a DS-record which could be set in the parent zone for proper delegation.

The foundation for Aurora DNS is PowerDNS together with Python 3.

The API for Aurora DNS has to return the DS-records so that a end-user can use these in the parent zone. I had the DNSKEY, but I didn’t have the DS-record so I had to calculate it using Python 3.

I eventually ended up with this Python code which you can find on my Github Gists page.

Generate a DNSSEC DS record based on the incoming DNSKEY record

The DNSKEY can be found using for example 'dig':

$ dig DNSKEY

The output can then be parsed with the following code to generate a DS record
for in the parent DNS zone

Author: Wido den Hollander 

Many thanks to this blogpost:

import struct
import base64
import hashlib

DNSKEY = '257 3 8 AwEAAckZ+lfb0j6aHBW5AanV5A0V0IfF99vAKFZd6+fJfEChpZtjnItWDnJLPa3/LAFec/tUhLZ4jgmzaoEuX3EQQgI1V4kp9SYf8HMlFPP014eO+AnjkYFGLE2uqHPx/Tu7/pO3EyKwTXi5fMadROKuo/mfat5AEIhGjteGGO93DhnOa6kcqj5RHYJBh5OZ/GoZfbeYHK6Muur1T16hHiI12rYGoqJ6ZW5+njYprG6qwp6TZXxJyE7wF1JdD+Zhbjhf0Md4zMEysP22wBLghBaX6eDIBh/7jU7dw1Ob+I42YWk+X4NSiU3sRYPaq1R13JEK4zVqQtL++UVtgRPEbfj5RQ8='

def _calc_keyid(flags, protocol, algorithm, dnskey):
    st = struct.pack('!HBB', int(flags), int(protocol), int(algorithm))
    st += base64.b64decode(dnskey)

    cnt = 0
    for idx in range(len(st)):
        s = struct.unpack('B', st[idx:idx+1])[0]
        if (idx % 2) == 0:
            cnt += s << 8
            cnt += s

    return ((cnt & 0xFFFF) + (cnt >> 16)) & 0xFFFF

def _calc_ds(domain, flags, protocol, algorithm, dnskey):
    if domain.endswith('.') is False:
        domain += '.'

    signature = bytes()
    for i in domain.split('.'):
        signature += struct.pack('B', len(i)) + i.encode()

    signature += struct.pack('!HBB', int(flags), int(protocol), int(algorithm))
    signature += base64.b64decode(dnskey)

    return {
        'sha1':    hashlib.sha1(signature).hexdigest().upper(),
        'sha256':  hashlib.sha256(signature).hexdigest().upper(),

def dnskey_to_ds(domain, dnskey):
    dnskeylist = dnskey.split(' ', 3)

    flags = dnskeylist[0]
    protocol = dnskeylist[1]
    algorithm = dnskeylist[2]
    key = dnskeylist[3].replace(' ', '')

    keyid = _calc_keyid(flags, protocol, algorithm, key)
    ds = _calc_ds(domain, flags, protocol, algorithm, key)

    ret = list()
    ret.append(str(keyid) + ' ' + str(algorithm) + ' ' + str(1) + ' '
               + ds['sha1'].lower())
    ret.append(str(keyid) + ' ' + str(algorithm) + ' ' + str(2) + ' '
               + ds['sha256'].lower())
    return ret

print(dnskey_to_ds(DOMAIN, DNSKEY))

VirtualBox images to experiment with IPv6

Around me I noticed that a lot of people don’t have hands-on experience with IPv6. The networks they work in do not support IPv6 nor does their ISP provide them with native IPv6 connectivity at home.

On my local systems I often use Virtual Box to set up (IPv6) testing environments. I thought I’d create some Virtual Machine images to get some hands-on experience with IPv6.

The images and README can be found on Github and are aimed to be easy to install and work with.


To run the images you need to have Virtual Box installed. You also should be able to use the Linux command line as the Virtual Machines are based on Ubuntu 16.04.

More information can be found in the repository on Github in the README file.


You can download the images here.

How to use

Please take a look at the README on Github. It tells you how to use them.

Happy testing!

Hitch TLS Proxy performance with 15k certificates

While testing with the Hitch TLS proxy in front of Varnish I stumbled upon a slow startup with a large amount of certificates.

In this case we (at PCextreme) want to run Hitch with around 50.000 certificates configured.

The webpage of Hitch says:

Safe for large installations: performant up to 15 000 listening sockets and 500 000 certificates.

10 minutes

I started testing on my local desktop with 15.000 certificates. My desktop is a Intel NUC with Ubuntu 14.04.

wido@wido-desktop:~/repos/hitch/src$ time sudo ./hitch -n 4 -u nobody -g nogroup --config=/opt/hitch/hitch.conf

real    9m40.088s
user    9m38.482s
sys 0m0.829s

A 10 minute startup time for Hitch is rather long. We started searching for the root-cause.


After some searching we discovered the OpenSSL version in Ubuntu 14.04 was the problem. Testing with Ubuntu 15.10 showed us different results.

root@VM-9d8e8cfd-e30f-4c40-8c4e-2e098b0f11a5:~# time hitch --daemon --pidfile=/run/ --user hitch --group hitch --config=/etc/hitch/hitch.conf

real    0m18.673s
user    0m6.780s
sys    0m2.000s

18 seconds is a lot better than 10 minutes!

Ubuntu 14.04 comes with OpenSSL 1.0.1f and Ubuntu 15.10 with 1.0.2d and that is where the difference seems to be.

100.000 certificates

After this we started testing with 100k certificates. It took 48 seconds to start with that amount of certificates configured.

For production we will use Ubuntu 16.04 which has similar results as Ubuntu 15.10.

So if you find Hitch slow when starting, check your OpenSSL version.

AnyIP: Bind a whole subnet to your Linux machine

IPv6 Prefix Delegation

In my previous post I wrote how you can use Docker with IPv6 and Prefix Delegation.

A IPv6 subnet routed to a Linux machine can be used with other things than Docker. That’s where the AnyIP feature of the kernel comes in.

Linux Kernel AnyIP

The AnyIP feature of the Linux kernel allows you to bind a complete IPv4 or IPv6 subnet to your system.

Instead of adding all addresses manually to the kernel you can tell it to bind a complete subnet.



ip -4 route add local dev lo

In this case the Linux kernel will now respond to ARP requests for any IPv4 address in the subnet.


ip -6 route add local 2001:db8:100::/64 dev lo

In this case the kernel will respond for Neigh Sollicitations on any IPv6 address in the 2001:db8:100::/64 subnet.

Example usage

Let’s assume that you have the IPv6 prefix 2001:db8:100::/60 routed to your Linux machine through IPv6 prefix delegation.

From that /60 subnet we take the first /64 subnet and attach it to lo.

ip -6 route add local 2001:db8:100::/64 dev lo

You can now ping any of the addresses in that subnet:

  • 2001:db8:100::1
  • 2001:db8:100::100
  • 2001:db8:100::200
  • 2001:db8:100::dead:b33f

If you would start a webserver which listens on port 80 you can use any of the IPv6 addresses in that subnet and the webserver will respond to it.

Use cases

It could be that you want to to mass-shared hosting on a system where you want to assign each hostname/domainname it’s own IPv6 address. Instead of attaching single IPs to a interface you can simply attach a complete subnet and point traffic to any of the IPs in that subnet.


On a virtual machine on PCextreme’s Aurora Compute I deployed a Instance with Prefix Delegation enabled.

After running ‘dhclient’ I got the subnet 2a00:f10:500:40::/60 assigned to my Instance.

It was then just one line to attach a /64 subnet:

ip -6 route add local 2a00:f10:500:40::/64 dev lo

Random address generator

I wrote a small piece of Python code to generate a random IPv6 address:

#!/usr/bin/env python3
Generate a random IPv6 address for a specified subnet

from random import seed, getrandbits
from ipaddress import IPv6Network, IPv6Address

subnet = '2a00:f10:500:40::/64'

network = IPv6Network(subnet)
address = IPv6Address(network.network_address + getrandbits(network.max_prefixlen - network.prefixlen))


Using a small loop in Bash I could now ping random addresses in that subnet:

while [ true ]; do ping6 -c 2 `./`; done

Some example output:

--- 2a00:f10:500:40:d142:1092:ea84:74b4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 10.252/11.680/13.108/1.428 ms
PING 2a00:f10:500:40:4e50:f264:6ea9:d184(2a00:f10:500:40:4e50:f264:6ea9:d184) 56 data bytes
64 bytes from 2a00:f10:500:40:4e50:f264:6ea9:d184: icmp_seq=1 ttl=56 time=10.0 ms
64 bytes from 2a00:f10:500:40:4e50:f264:6ea9:d184: icmp_seq=2 ttl=56 time=10.0 ms

--- 2a00:f10:500:40:4e50:f264:6ea9:d184 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 10.085/10.087/10.089/0.002 ms
PING 2a00:f10:500:40:d831:1f89:b06d:fe12(2a00:f10:500:40:d831:1f89:b06d:fe12) 56 data bytes
64 bytes from 2a00:f10:500:40:d831:1f89:b06d:fe12: icmp_seq=1 ttl=56 time=9.77 ms
64 bytes from 2a00:f10:500:40:d831:1f89:b06d:fe12: icmp_seq=2 ttl=56 time=10.1 ms

--- 2a00:f10:500:40:d831:1f89:b06d:fe12 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 9.777/9.958/10.140/0.207 ms
PING 2a00:f10:500:40:2c45:26ee:5b93:fa2(2a00:f10:500:40:2c45:26ee:5b93:fa2) 56 data bytes
64 bytes from 2a00:f10:500:40:2c45:26ee:5b93:fa2: icmp_seq=1 ttl=56 time=10.2 ms
64 bytes from 2a00:f10:500:40:2c45:26ee:5b93:fa2: icmp_seq=2 ttl=56 time=10.0 ms