Using L3 (BGP) routing for your Ceph storage

Many Ceph storage environments out there are deployed using a L2 underlay.

This means that the Ceph servers (MON, OSD, etc) are connected using LACP/Bonding to a pair of switches. On their ‘bond0’ device (example) they are assigned an IPv4/IPv6 address and this is used for connectivity between the Ceph nodes and the Ceph clients.

Although this works fine, I try to avoid L2 as much as possible in datacenter deployments. L2 scales up to a certain point, but it has it’s limitations. Modern Top-of-Rack (ToR) switches can easily route traffic and wire-speed. This used to be a limitation of switches in the past. When designing environments I prefer using a L3 approach.

This blogpost is there to show you the rough concept. It’s NOT a copy and paste tutorial. You will need to adapt it to your situation.

Network setup and BGP configuration

Using Juniper QFX5100 switches and Frrouting on the Ceph nodes I’ve established BGP sessions between the ToR and Ceph nodes according to the diagram below.

Each nodes has two independent BGP sessions with the Top-of-Rack in it’s rack. Via these BGP sessions they advertise their local IPv6 /128 address. Via the same sessions they receive a default ::/0 IPv6 route.

ceph01# sh bgp summary 

IPv6 Unicast Summary (VRF default):
BGP router identifier 1.2.3.4, local AS number 65101 vrf-id 0
BGP table version 10875
RIB entries 511, using 96 KiB of memory
Peers 2, using 1448 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V    AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
enp196s0f0np0   4 65002    487385    353917        0    0    0 3d18h17m            1        1 N/A
enp196s0f1np1   4 65002    558998    411452        0    0    0 01:38:55            1        1 N/A

Total number of neighbors 2
ceph01#

Here we see two BGP sessions active over both NICs of the Ceph node. We can also see that a default IPv6 route is received via BGP.

ceph01# sh ipv6 route ::/0
Routing entry for ::/0
  Known via "bgp", distance 20, metric 0
  Last update 01:42:00 ago
    fe80::e29:efff:fed7:4719, via enp196s0f0np0, weight 1
    fe80::7686:e2ff:fe7c:a19e, via enp196s0f1np1, weight 1

ceph01# 

The Frrouting configuration ( /etc/frr/frr.conf ) is fairly simple:

frr defaults traditional
hostname ceph01
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp196s0f0np0
 no ipv6 nd suppress-ra
exit
!
interface enp196s0f1np1
 no ipv6 nd suppress-ra
exit
!
interface lo
 ipv6 address 2001:db8:100:1::/128
exit
!
router bgp 65101
 bgp router-id 1.2.3.4
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor upstream peer-group
 neighbor upstream remote-as external
 neighbor enp196s0f0np0 interface peer-group upstream
 neighbor enp196s0f1np1 interface peer-group upstream
 !
 address-family ipv6 unicast
  redistribute connected
  neighbor upstream activate
 exit-address-family
exit
!
end

On the Juniper switches a configuration was defined for the BGP Unnumbered (RFC5549) configuration as well. This blogpost explains very well on how BGP Unnumbered works on JunOS, I am not going to repeat it. I will highlight a couple of pieces of configuration.

root@tor01# show interfaces xe-0/0/1
description ceph01;
unit 0 {
    mtu 9216;
    family inet6;
}

root@tor01# show protocols router-advertisement 
interface xe-0/0/1.0;
root@tor01# show | compare 
[edit]
+  policy-options {
+      as-list bgp_unnumbered_as_list members 65101-65199;
+  }
[edit protocols]
+   bgp {
+       group ceph {
+           family inet6 {
+               unicast;
+           }
+           multipath;
+           export default-v6;
+           import ceph-loopback;
+           dynamic-neighbor bgp_unnumbered {
+               peer-auto-discovery {
+                   family inet6 {
+                       ipv6-nd;
+                   }
+                   interface xe-0/0/1.0;
+                   interface xe-0/0/2.0;
+                   interface xe-0/0/3.0;
+               }
+           }
+           peer-as-list bgp_unnumbered_as_list;
+       }
+   }
[edit policy]
+ policy-statement default-v6 {
+    from {
+        route-filter ::/0 exact;
+   }
+   then accept;
+}
+ policy-statement ceph-loopback {
+    from {
+        route-filter 2001:db8:100::/64 upto /128;
+   }
+   then accept;
+}

This will set up the BGP sessions via the interfaces xe-0/0/1 until xe-0/0/3 using IPv6 Autodiscovery.

The Ceph nodes should now be able to ping the other nodes:

PING 2001:db8:100::2(2001:db8:100::2) 56 data bytes
64 bytes from 2001:db8:100::2: icmp_seq=1 ttl=63 time=0.058 ms
64 bytes from 2001:db8:100::2: icmp_seq=2 ttl=63 time=0.063 ms
64 bytes from 2001:db8:100::2: icmp_seq=3 ttl=63 time=0.071 ms

--- 2001:db8:100::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2037ms
rtt min/avg/max/mdev = 0.058/0.064/0.071/0.005 ms

Ceph configuration

From Ceph’s perspective there is not much to do. We just need to specify the IPv6 subnet Ceph is allowed to use and bind to.

[global]
	 mon_host = 2001:db8:100::1, 2001:db8:100::2, 2001:db8:100::3
	 ms_bind_ipv4 = false
	 ms_bind_ipv6 = true
	 public_network = 2001:db8:100::/64

This is all the configuration needed for Ceph 🙂

wdh@ceph01:~$ sudo ceph health
HEALTH_OK
wdh@infra-04-01-17:~$ sudo ceph mon dump
election_strategy: 1
0: [v2:[2001:db8:100::1]:3300/0,v1:[2001:db8:100::1]:6789/0] mon.ceph01
1: [v2:[2001:db8:100::2]:3300/0,v1:[2001:db8:100::2]:6789/0] mon.ceph02
2: [v2:[2001:db8:100::3]:3300/0,v1:[2001:db8:100::3]:6789/0] mon.ceph02
dumped monmap epoch 6
wdh@infra-04-01-17:~$ 

Lenovo X1 Carbon FCC Unlock of EM7455 under Ubuntu 22.04

I recently upgraded my Lenovo X1 Carbon (5th gen) from Ubuntu 20.04 to 22.04 and my 4G connection stopped working.

I travel a lot and one of the great features of the Lenovo X1 Carbon is the (optional) 4G/LTE modem which I use with an additional data-sim of my provider (Vodafone NL).

After not really wanting to fix it (time constraints) I put some time in it and searching around a bit I found the solution for my problem.

It has has to do with the Sierra Wireless EM7455 modem and an updated version of ModemManager in Ubuntu 22.04. The modem is locked by default and has to be unlocked by sending a ‘magic’ combination of commands.

The logs would show the following line:

<warn>  [modem0] couldn't enable interface: 'Invalid transition'

Links

Before jumping to the solution, let me first share a few links which I used to diagnose and resolve the problem:

The solution

For in-depth details of the root-cause of the problem I recommend reading the modemmanager’s website, but in the end you need to run these commands:

sudo ln -s /etc/ModemManager/fcc-unlock.d /usr/share/ModemManager/fcc-unlock.available.d/1199:9079
sudo apt -y install libqmi-utils
sudo systemctl restart ModemManager

Now keep in mind that this solution only works if you have a Sierra Wireless EM7455 modem. You can check this with the output of the lsusb command.

Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 002: ID 1050:0407 Yubico.com Yubikey 4/5 OTP+U2F+CCID
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 138a:0097 Validity Sensors, Inc. 
Bus 001 Device 004: ID 13d3:5682 IMC Networks SunplusIT Integrated Camera
Bus 001 Device 003: ID 8087:0a2b Intel Corp. Bluetooth wireless interface
Bus 001 Device 019: ID 1199:9079 Sierra Wireless, Inc. EM7455
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

After creating the right symlink to unlock my Modem I was able to connect to the Vodafone network again and no longer had to use my phone’s wireless hotspot!

IPv6

wido@wido-laptop:~$ sudo ip addr show dev wwan0
12: wwan0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether fa:73:bc:24:0f:24 brd ff:ff:ff:ff:ff:ff
    inet 100.71.255.77/30 brd 100.71.255.79 scope global noprefixroute wwan0
       valid_lft forever preferred_lft forever
wido@wido-laptop:~$ 

Unfortunately Vodafone still (Jan 2023) does not support IPv6 on their 4G network. It’s coming I’ve heard….

Allowing SSH login for user without a password

To start with: This is something you should NOT use in most cases. It’s only intended to be used in very specific situations.

In my situation I want to allow some remote systems to create a reverse SSH tunnel without a password nor a key. It’s for hobby purposes and through firewalling I make sure that only those systems are allowed to connect to my ‘SSH proxy’.

I started by creating a group and a few users with that as their primary group:

groupadd reversessh
useradd -G reversessh user1
useradd -G reversessh user2
useradd -G reversessh user3
passwd -d user1
passwd -d user2
passwd -d user3

I then modified my /etc/ssh/sshd_config that it only allows specific groups and allows users with an empty password:

PermitEmptyPasswords yes
AllowGroups root reversessh

I also needed to modify PAM to make sure it allows this login. Therefor you need to modify /etc/pam.d/common-auth that it contains:

auth    [success=1 default=ignore]      pam_unix.so nullok

After I restarted SSH to users user1 until user3 were able to log on without a password nor a key.

Is this very secure? No! But it does serve a purpose in some use-cases.

Renaming a network interface with systemd-networkd on Ubuntu 18.04

On a Ubuntu system where I’m creating a VXLAN Proof of Concept with CloudStack I wanted to rename the interface enp5s0 to cloudbr0.

I found many documentation on the internet on how to do this with *.link files, but I was missing the golden tip, which was you need to re-generate your initramfs.

/etc/systemd/network/50-cloudbr0.link

[Match]
MACAddress=00:25:90:4b:81:54

[Link]
Name=cloudbr0

After you create this file, re-generate your initramfs:

update-initramfs -c -k all

You can now use cloudbr0 in *.network files to use it like any other network interface.

In my case this is how my interfaces look like:

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
6: cloudbr0:  mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:25:90:4b:81:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 brd 192.168.0.255 scope global cloudbr0
       valid_lft forever preferred_lft forever
    inet6 2a00:f10:114:0:225:90ff:fe4b:8154/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2591993sec preferred_lft 604793sec
    inet6 fe80::225:90ff:fe4b:8154/64 scope link 
       valid_lft forever preferred_lft forever
8: cloudbr1:  mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 86:fa:b6:31:6e:c1 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.11/24 brd 172.16.0.255 scope global cloudbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::84fa:b6ff:fe31:6ec1/64 scope link 
       valid_lft forever preferred_lft forever
9: vxlan100:  mtu 1450 qdisc noqueue master cloudbr1 state UNKNOWN group default qlen 1000
    link/ether 56:df:29:8d:db:83 brd ff:ff:ff:ff:ff:ff

VXLAN with VyOS and Ubuntu 18.04

VXLAN

Virtual Extensible LAN uses encapsulation technique to encapsulate OSI layer 2 Ethernet frames within layer 4 UDP datagrams. More on this can be found on the link provided.

For a Ceph and CloudStack environment I needed to set up a Proof-of-Concept using VXLAN and some refurbished hardware. The main purpose of this PoC is to verify that VXLAN works with CloudStack, Ceph and Ubuntu 18.04

VyOS

VyOS is an open source network operating system based on Debian Linux. It supports VXLAN, so using this we were able to test VXLAN in this setup.

In production a other VXLAN capable router would be used, but for a PoC VyOS works just fine running on a regular server.

Configuration

The VyOS router is connected to ‘the internet’ with one NIC and the other NIC is connected to a switch.

Using static routes a IPv4 subnet (/24) and a IPv6 subnet (/48) are routed towards the VyOS router. These are then splitted and send to multiple VLANs.

As it took me a while to configure VXLAN under VyOS

I’m only posting that configuration.

interfaces {
    ethernet eth0 {
        address 31.25.96.130/30
        address 2a00:f10:100:1d::2/64
        duplex auto
        hw-id 00:25:90:80:ed:fe
        smp-affinity auto
        speed auto
    }
    ethernet eth5 {
        duplex auto
        hw-id a0:36:9f:0d:ab:be
        mtu 9000
        smp-affinity auto
        speed auto
        vif 300 {
            address 192.168.0.1/24
            description VXLAN
            mtu 9000
        }
    vxlan vxlan1000 {
        address 10.0.0.1/23
        address 2a00:f10:114:1000::1/64
        group 239.0.3.232
        ip {
            enable-arp-accept
            enable-arp-announce
        }
        ipv6 {
            dup-addr-detect-transmits 1
            router-advert {
                cur-hop-limit 64
                link-mtu 1500
                managed-flag false
                max-interval 600
                name-server 2a00:f10:ff04:153::53
                name-server 2a00:f10:ff04:253::53
                other-config-flag false
                prefix 2a00:f10:114:1000::/64 {
                    autonomous-flag true
                    on-link-flag true
                    valid-lifetime 2592000
                }
                reachable-time 0
                retrans-timer 0
                send-advert true
            }
        }
        link eth5.300
        mtu 1500
        vni 1000
    }
    vxlan vxlan2000 {
        address 109.72.91.1/26
        address 2a00:f10:114:2000::1/64
        group 239.0.7.208
        ipv6 {
            dup-addr-detect-transmits 1
            router-advert {
                cur-hop-limit 64
                link-mtu 1500
                managed-flag false
                max-interval 600
                name-server 2a00:f10:ff04:153::53
                name-server 2a00:f10:ff04:253::53
                other-config-flag false
                prefix 2a00:f10:114:2000::/64 {
                    autonomous-flag true
                    on-link-flag true
                    valid-lifetime 2592000
                }
                reachable-time 0
                retrans-timer 0
                send-advert true
            }
        }
        link eth5.300
        mtu 1500
        vni 2000
    }
}

VLAN 300 on eth5 is used to route VNI 1000 and 2000 in their own multicast groups.

The MTU of eth5 is set to 9000 so that the encapsulated traffic of VXLAN can still be 1500 bytes.

Ubuntu 18.04

To test if VXLAN was actually working on the Ubuntu 18.04 host I made a very simple script:

ip link add vxlan1000 type vxlan id 1000 dstport 4789 group 239.0.3.232 dev vlan300 ttl 5
ip link set up dev vxlan1000
ip addr add 10.0.0.11/23 dev vxlan1000
ip addr add 2a00:f10:114:1000::101/64 dev vxlan1000

That works! I can ping 10.0.0.11 and 2a00:f10:114:1000::1 from my Ubuntu 18.04 machine!

Testing with ConfigDrive and cloud-init

cloud-init

cloud-init is a very easy way to bootstrap/configure Virtual Machines running in a cloud environment. It can read it’s metadata from various data sources and configure for public SSH keys or create users for example.

Most large clouds support cloud-init to quickly deploy new Instances.

Config Drive

Config Drive is a data source which reads a local ‘CD-Rom’ device which contains the metadata for the Virtual Machine. This allows for auto configuration of Virtual Machines without them requiring network. My main use case for Config Drive is CloudStack which has support for Config Drive since version 4.11.

I wanted to test with Config Drive outside CloudStack to test some functionality.

meta_data.json

On my laptop I spun up a Ubuntu 18.04 Virtual Machine with cloud-init installed and I attached a ISO which I created.

/home/wido/Desktop/
             cloud-init/
               openstack/
                 latest/
                   meta_data.json

In meta_data.json I put:

{
  "hostname": "ubuntu-test",
  "name": "ubuntu-test",
  "uuid": "0109a241-6fd9-46b6-955a-cd52ad168ee7",
  "public_keys": [
      "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwPKBJDJdlvOKIfilr0VSkF9i3viwLtO8GyCpxL/8TxrGKnEg19LPLN3lwKWbbTqBZgRmbrR3bgQfM4ffPoTCxSPv44eZCF8jMPv8PxpC0yVaTcqW4Q7woD7pjdIuGVImrmEls0U8rS3uGQDx7LhFphkAh+blfUtobqzyHvqcbtVEh+drESn8AXrKd1MZfGg6OB8Xrfdr6d959uHBHFJ8pOxxppYbInxKREPb3XmZzmoNQUmqFRN/VNVTreRHAxDcPM8pEPuNmr3Vp+vDVvfpA58yr2rZ21ASB4LlNOOSEx7vnLd6uH9rsqAJOtr0ZEE29fU609i4rd6Zda2HTGQO+Q== wido@wido-laptop",
      "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC0HJ4gm7QCUZjeXh/GmCzUamABgtvOZ4GcDOA1mSGoXqGHYkG8qPa5OaURx5yqROmclTtgwkaHxjY4in6Bi7DGxlSyFQuEOKBwNCcZOrGFiRbazeh3c5jEXgoCtDynrqBjuivWTaxzl46WVglxGSy2LAaps9sAhDU+8Xfm9wBYYtjv3CDAHEe9rN3UxgGEO5K+yhoFzcI9vZl4Q9hvMZRRBBFq3vOFoLr9pyyxVUNiH9oQLubeiqDwnQqaGn48WMX9eO9KiH32eSnzlgJg8viFyonVRJq2dxJMEcrfolQhD9GR1BXNeVGDNJYuo9FX1/NeZZYJTUAwZ4jP008CfaBAnuKBfdZTkjNZlXxMQOY29VAetHI/urvh4NK17o4g4gcBpmLV+13saBIHTLXmA8ADkGZgWE6yIYEah04nBHr28vmZJBX4vpQYz9VWiZcJeEsGah/oaPkFMNQB7mXE+plz46YJ1JtozBpntir7A4cHJX3qZNz/JwOU7yAc4K3EP6fGirSvIDtQ/SJs9Gp51Wbv32E3/5ouemsGnh/ziOc8eIUJMTFWqpEG9qVi/tVJiryuvRcha6+0cZmnWnknBztzRO3Oh3JgPwECdNA/X0acXlzFlRbkQpAJx9+ADESauNaIvfQ3l+kZx3m/1eAAZpAI2ZrnvQMUa40XYksiwncM4Q== wido@wido-desktop"
  ]
}

Using mkisofs I was able to create a ISO and attached it to the VM:

mkisofs -iso-level 3 -V 'CONFIG-2' -o /var/lib/libvirt/images/cloud-init.iso /home/wido/Desktop/cloud-init

The ISO was attached to the Virtual Machine and after reboot I could log in with the user ubuntu over SSH using my pubkey!

Docker containers with IPv6 behind NAT

WARNING

In production IPv6 should always be used without NAT. Only use IPv6 and NAT for testing purposes. There is no valid reason to use IPv6 with NAT in any production environment.

IPv6 and NAT

IPv6 is designed to remove the need for NAT and that is a very, very good thing. NAT breaks Peer-to-Peer connections and that is exactly what is one of the great things of IPv6. Every device on the internet gets it’s own public IP-Address again.

Docker and IPv6

Support for IPv6 in Docker has been there for a while now. It is disabled by default however. The documentation describes on how to enable it.

I wanted to enable IPv6 on my Docker setup on my laptop running Ubuntu, but as my laptop is a mobile device the IPv6 prefix I have changes when I move to a different location. IPv6 Prefix Delegation isn’t available at every IPv6-enabled location either, so I wanted to figure out if I could enable IPv6 in my Docker setup locally and use NAT to have my containers reach the internet over IPv6.

At home I have IPv6 via ZeelandNet and at the office we have a VDSL connection from XS4All. When I’m on a remote location I enable our OpenVPN tunnel which has IPv6 enabled. This way I always have IPv6 available.

The Docker documentation shows that enabling IPv6 is very easy. I modified the systemd service file of docker and added a fixed IPv6 CIDR:

ExecStart=/usr/bin/dockerd --ipv6 --fixed-cidr-v6="fd00::/64" -H fd://

fd00::/64 is a Site-Local IPv6 subnet (deprecated) which can be safely used.

I then added a NAT rule into ip6tables so that it would NAT for me:

sudo ip6tables -t nat -A POSTROUTING -s fd00::/64 -j MASQUERADE

Result

My Docker containers now get a IPv6 Address as can be seen below:

root@da80cf3d8532:~# ip -6 a
1: lo:  mtu 65536 state UNKNOWN qlen 1
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
15: eth0@if16:  mtu 1500 state UP 
    inet6 fd00::242:ac11:2/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:2/64 scope link 
       valid_lft forever preferred_lft forever
root@da80cf3d8532:~#

In this case the address is fd00::242:ac11:2 which as assigned by Docker.

Since my laptop has IPv6 I can now ping pcextreme.nl from my Docker container.

root@da80cf3d8532:~# ping6 -c 3 pcextreme.nl -n
PING pcextreme.nl (2a00:f10:101:0:46e:c2ff:fe00:93): 56 data bytes
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=0 ttl=61 time=14.368 ms
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=1 ttl=61 time=16.132 ms
64 bytes from 2a00:f10:101:0:46e:c2ff:fe00:93: icmp_seq=2 ttl=61 time=15.790 ms
--- pcextreme.nl ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 14.368/15.430/16.132/0.764 ms
root@da80cf3d8532:~#

Again, this should ONLY be used for testing purposes. For production IPv6 Prefix Delegation is the route to go down.

Testing Ceph BlueStore with the Kraken release

Ceph version Kraken (11.2.0) has been released and the Release Notes tell us that the new BlueStore backend for the OSDs is now available.

BlueStore

The current backend for the OSDs is the FileStore which mainly uses the XFS filesystem to store it’s data. To overcome several limitations of XFS and POSIX in general the BlueStore backend was developed.

It will provide more performance (mainly writes), data safety due to checksumming and compression.

Users are encouraged to test BlueStore starting with the Kraken release for non-production and non-critical data sets and report back to the community.

Deploying with BlueStore

To deploy OSDs with BlueStore you can use the ceph-deploy by using the –bluestore flag.

I created a simple test cluster with three machines: alpha, bravo and charlie.

Each machine will be running a ceph-mon and ceph-osd proces.

This is the sequence of ceph-deploy commands I used to deploy the cluster

ceph-deploy new alpha bravo charlie
ceph-deploy mon create alpha bravo charlie

Now, edit the ceph.conf file in the current directory and add:

[osd]
enable_experimental_unrecoverable_data_corrupting_features = bluestore

With this setting we allow the use of BlueStore and we can now deploy our OSDs:

ceph-deploy --overwrite-conf osd create --bluestore alpha:sdb bravo:sdb charlie:sdb

Running BlueStore

This tiny cluster how runs three OSDs with BlueStore:

root@alpha:~# ceph -s
    cluster c824e460-2f09-4994-8b2f-108aedc52d19
     health HEALTH_OK
     monmap e2: 3 mons at {alpha=[2001:db8::100]:6789/0,bravo=[2001:db8::101]:6789/0,charlie=[2001:db8::102]:6789/0}
            election epoch 14, quorum 0,1,2 alpha,bravo,charlie
        mgr active: charlie standbys: alpha, bravo
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v24: 64 pgs, 1 pools, 0 bytes data, 0 objects
            43356 kB used, 30374 MB / 30416 MB avail
                  64 active+clean
root@alpha:~#
root@alpha:~# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.02907 root default                                       
-2 0.00969     host alpha                                     
 0 0.00969         osd.0         up  1.00000          1.00000 
-3 0.00969     host bravo                                     
 1 0.00969         osd.1         up  1.00000          1.00000 
-4 0.00969     host charlie                                   
 2 0.00969         osd.2         up  1.00000          1.00000 
root@alpha:~#

On alpha I see that osd.0 only has a small partition for a bit of configuration and the rest is used by BlueStore.

root@alpha:~# df -h /var/lib/ceph/osd/ceph-0
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        97M  5.4M   92M   6% /var/lib/ceph/osd/ceph-0
root@alpha:~# lsblk 
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    8G  0 disk 
├─sda1   8:1    0  7.5G  0 part /
├─sda2   8:2    0    1K  0 part 
└─sda5   8:5    0  510M  0 part [SWAP]
sdb      8:16   0   10G  0 disk 
├─sdb1   8:17   0  100M  0 part /var/lib/ceph/osd/ceph-0
└─sdb2   8:18   0  9.9G  0 part 
sdc      8:32   0   10G  0 disk 
root@alpha:~# cat /var/lib/ceph/osd/ceph-0/type
bluestore
root@alpha:~#

The OSDs should work just like OSDs running FileStore, but they should perform better.

Running headless VirtualBox inside Nested KVM

For the Ceph training at 42on I use VirtualBox to build Virtual Machines. This is because they work under MacOS, Windows and Linux.

For the internal Git at 42on we use Gitlab and I wanted to use Gitlab’s CI to build my Virtual Machines automatically.

As we don’t have any physical hardware at 42on (everything runs in the cloud) I wanted to see if I could run VirtualBox Headless inside a VM with Nested KVM enabled.

Nested KVM

The first thing I checked was if my KVM Virtual Machine actually supported Nested KVM. This can be verified with the kvm-ok command under Ubuntu:

root@glrun01:~# kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
root@glrun01:~#

Now that’s verified I tried to install VirtualBox.

VirtualBox

Installing VirtualBox is straight forward. Just add the repository and install the packages. Don’t forget to reboot afterwards to make sure all kernel modules are loaded and properly installed.

apt-get install virtualbox

VirtualBox Extension Pack

The trick to get everything working properly is to install Oracle’s VirtualBox Extension Pack. It took me a while to figure out that I need to install it manually. It wasn’t done by default after install.

You need to download the pack and install it using the VBoxManage command.

wget http://download.virtualbox.org/virtualbox/5.0.24/Oracle_VM_VirtualBox_Extension_Pack-5.0.24.vbox-extpack
vboxmanage extpack install Oracle_VM_VirtualBox_Extension_Pack-5.0.24.vbox-extpack
vboxmanage list extpacks
vboxmanage setproperty vrdeextpack "Oracle VM VirtualBox Extension Pack"

With that installed and configured I rebooted the machine again just to be sure.

It works!

With that it actually worked. The VirtualBox VMs can now be built inside a Nested KVM machine controlled by Gitlab’s CI 🙂

VirtualBox images to experiment with IPv6

Around me I noticed that a lot of people don’t have hands-on experience with IPv6. The networks they work in do not support IPv6 nor does their ISP provide them with native IPv6 connectivity at home.

On my local systems I often use Virtual Box to set up (IPv6) testing environments. I thought I’d create some Virtual Machine images to get some hands-on experience with IPv6.

The images and README can be found on Github and are aimed to be easy to install and work with.

Requirements

To run the images you need to have Virtual Box installed. You also should be able to use the Linux command line as the Virtual Machines are based on Ubuntu 16.04.

More information can be found in the repository on Github in the README file.

Download

You can download the images here.

How to use

Please take a look at the README on Github. It tells you how to use them.

Happy testing!