Apache CloudStack and MySQL 5.7

SQL Mode

Starting with MySQL 5.7 the default SQL mode is far more strict then it was before.

It now includes ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, and NO_ENGINE_SUBSTITUTION.

This can cause problems for applications which need other SQL modes. Apache CloudStack is one of these applications.

The best thing would be to modify the SQL queries executed by CloudStack, but that’s not that easy.

Changing the mode

Luckily the SQL mode can be changed in either the my.conf or as a session variable.

In the my.cnf one can add:

[mysqld]
sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'

Or modify the /etc/cloudstack/management/db.properties file to include this line:

db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&sessionVariables=sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'

You should now be able to run a CloudStack management server on MySQL 5.7!

Future

In the future CloudStack should only be using SQL queries which comply with the new more strict SQL mode. In the meantine a issue and Pull Request have been created to track this situation.

ISC Kea DHCPv6 server

DHCPv6

In most situations StateLess Address AutoConfiguration (SLAAC) works just fine when you work with simple clients in a IPv6 network. But in other cases you want to assign pre-defined addresses or prefixes to clients and there DHCPv6 comes in to play.

While working on the IPv6 implementation for Apache CloudStack I found Kea, a DHCPv6 server from ISC.

DHCPv6 DUID

With IPv4 you could easily identify a client based on the MAC-address it send the DHCP request from. With IPv6 there is a DUID. The “DHCP Unique Identifier”. This is generated by the client and then used by the DHCPv6 server. A few possibilities the clients can choose from:

  • DUID-LL: DUID Based on Link-layer Address
  • DUID-LLT: Link-layer Address Plus Time
  • DUID-EN: Assigned by Vendor Based on Enterprise Number

While DUID seems nice, it can’t be dictated by the DHCPv6 server. The client generates the DUID itself and sends it towards the server. Not something you prefer if your are not in control of the clients.

In a cloud you are in control over the MAC-address, so that is what you want to use where possible. It can’t be spoofed by the client.

ISC Kea

Kea is a DHCPv4/DHCPv6 server being developed by the Internet Systems Consortium. It is a extensible and flexible DHCP server. Facebook uses it in their datacenters.

My goal was very simple. Set up Kea and see if I can use it to hand out an address to a client.

Configuration

I download the tarball and tested it with this configuration between two simple KVM VMs on my desktop.

{
    "Dhcp6": {
        "renew-timer": 1000,
        "rebind-timer": 2000,
        "preferred-lifetime": 3000,
        "valid-lifetime": 4000,
        "lease-database": {
            "type": "memfile",
            "persist": true,
            "name": "/tmp/kea-leases6.csv",
            "lfc-interval": 1800
        },
        "interfaces-config": {
            "interfaces": [ "eth1/2001:db8::1" ]
        },
        "mac-sources": ["duid"],
        "subnet6": [
            {
                "subnet": "2001:db8::/64",
                "id": 1024,
                "interface": "eth1",
                "pools": [
                    { "pool": "2001:db8::100-2001:db8::ffff" }
                ],
                "pd-pools": [
                    {
                        "prefix": "2001:db8:fff::",
                        "prefix-len": 48,
                        "delegated-len": 60
                    }
                ],
                "reservations": [
                    {
                        "hw-address": "52:54:00:d6:c2:a9",
                        "ip-addresses": [ "2001:db8::5054:ff:fed6:c2a9" ]
                    }
                ]
            }
        ]
    }
}

Starting Kea with this configuration was rather simple:

Starting Kea

$ kea-dhcp6 -c /etc/kea.json -d

Logs

When it starts you see some interesting bits in the log:

DHCP6_CONFIG_NEW_SUBNET a new subnet has been added to configuration: 2001:db8::/64 with params t1=1000, t2=2000, preferred-lifetime=3000, valid-lifetime=4000, rapid-commit is disabled
DHCPSRV_CFGMGR_ADD_SUBNET6 adding subnet 2001:db8::/64
HOSTS_CFG_ADD_HOST add the host for reservations: hwaddr=52:54:00:d6:c2:a9 ipv6_subnet_id=1024 hostname=(empty) ipv4_reservation=(no) ipv6_reservation0=2001:db8::5054:ff:fed6:c2a9
HOSTS_CFG_GET_ONE_SUBNET_ID_HWADDR_DUID get one host with IPv6 reservation for subnet id 1024, HWADDR hwtype=1 52:54:00:d6:c2:a9, DUID (no-duid)
HOSTS_CFG_GET_ALL_HWADDR_DUID get all hosts with reservations for HWADDR hwtype=1 52:54:00:d6:c2:a9 and DUID (no-duid)
HOSTS_CFG_GET_ALL_IDENTIFIER get all hosts with reservations using identifier: hwaddr=52:54:00:d6:c2:a9
HOSTS_CFG_GET_ALL_IDENTIFIER_COUNT using identifier hwaddr=52:54:00:d6:c2:a9, found 0 host(s)
HOSTS_CFG_GET_ONE_SUBNET_ID_HWADDR_DUID_NULL host not found using subnet id 1024, HW address hwtype=1 52:54:00:d6:c2:a9 and DUID (no-duid)
HOSTS_CFG_GET_ONE_SUBNET_ID_ADDRESS6 get one host with reservation for subnet id 1024 and including IPv6 address 2001:db8::5054:ff:fed6:c2a9
HOSTS_CFG_GET_ALL_SUBNET_ID_ADDRESS6 get all hosts with reservations for subnet id 1024 and IPv6 address 2001:db8::5054:ff:fed6:c2a9
HOSTS_CFG_GET_ALL_SUBNET_ID_ADDRESS6_COUNT using subnet id 1024 and address 2001:db8::5054:ff:fed6:c2a9, found 0 host(s)
HOSTS_CFG_GET_ONE_SUBNET_ID_ADDRESS6_NULL host not found using subnet id 1024 and address 2001:db8::5054:ff:fed6:c2a9
DHCPSRV_MEMFILE_DB opening memory file lease database: lfc-interval=1800 name=/tmp/kea-leases6.csv persist=true type=memfile universe=6
DHCPSRV_MEMFILE_LEASE_FILE_LOAD loading leases from file /tmp/kea-leases6.csv

You can see it has one reservation based on the MAC-address of the client which it handed out after it booted:

ALLOC_ENGINE_V6_HR_ADDR_GRANTED reserved address 2001:db8::5054:ff:fed6:c2a9 was assigned to client duid=[00:01:00:01:1e:47:7e:66:52:54:00:d6:c2:a9], tid=0xe7899a

Ubuntu client

The client was a simple Ubuntu 14.04 client with this network configuration:

auto eth0
iface eth0 inet dhcp
iface eth0 inet6 dhcp

And indeed, it obtained the correct address:

root@ubuntu1404:~# ip addr show dev eth0
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d6:c2:a9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.100/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2001:db8::5054:ff:fed6:c2a9/64 scope global deprecated dynamic 
       valid_lft 62sec preferred_lft 0sec
    inet6 fe80::5054:ff:fed6:c2a9/64 scope link 
       valid_lft forever preferred_lft forever
root@ubuntu1404:~#

Lease database

Kea can store the leases in a CSV file or MySQL database if you want. In this test I used /tmp/kea-leases6.csv as a CSV file to store the leases in.

In production a MySQL database is probably easier to use, but for the test CSV worked just fine.

Rebuilding libvirt under CentOS 7.1 with RBD storage pool support

If you want to use CentOS 7.1 for your hypervisors with Apache CloudStack and Ceph’s RBD as Primary Storage you need to rebuild libvirt.

CloudStack requires libvirt to be built with RBD storage pool support. It uses libvirt to manage RBD volumes. By default libvirt under CentOS is not built with this support. (On Ubuntu it is btw).

Rebuilding from source

First we need to install a couple of packages:

$ yum install -y rpm-build gcc make ceph-devel

Now we need to download the sRPM:

$ wget http://vault.centos.org/centos/7.1.1503/os/Source/SPackages/libvirt-1.2.8-16.el7.src.rpm

Create a rpmbuild directory:

$ mkdir /root/rpmbuild

Now edit /root/.rpmmacros so that it contains:

%_topdir    /root/rpmbuild

Install the sRPM:

$ rpm -i libvirt-1.2.8-16.el7.src.rpm

Open the /root/rpmbuild/SPECS/libvirt.spec file and look for:

%else
    %define with_storage_rbd      0
%endif

Change this to:

%else
    %define with_storage_rbd      1
%endif

Now build the RPM:

$ cd /root/rpmbuild
$ rpmbuild -ba SPECS/libvirt.spec

After a couple of minutes you should have RPMs with RBD storage pool support enabled!

Enhanced RBD support for CloudStack 4.2

About 1 hour ago the new storage subsystem got merged into the master branch of CloudStack. That is wonderful news for all you out there who want to use features like snapshotting with RBD in CloudStack.

In pre-4.2 CloudStack a snapshot was the same as a backup. As soon as you created a snapshot it would also copy that snapshot to the secondary storage. This could not only lead to high network utilization when talking about 1TB RBD volumes, but it also caused problems with the underlying ‘qemu-img’ tool. To make a long story short: Snapshots with RBD just wouldn’t work in CloudStack 4.0 or 4.1 without resorting to dirty hacking. Which we didn’t.

The new storage subsystem separates the backup and snapshot process. Snapshots are handled by the primary storage and they can be copied to the ‘backup storage’ on request. This allows is to use the full snapshot potential of RBD.

I was waiting for the storage subsystem to be merged into the master branch before I could start working on this. About two weeks ago I already wrote a small function spec in CloudStack’s wiki to describe what has to be done.

A couple of choices still have to be made. Traditionally we could do everything through libvirt and ‘qemu-img’, but from what I can see now we’ll run into some trouble. We might have to go through the process of wrapping librbd into a Java library to get it all done, but I’m not completely positive about that. Some patches for libvirt(-java) could probably also do the job, but it would take a lot of time and work to get those upstream and into the repositories. The goal is to have this new RBD code work natively on a Ubuntu 13.04 system.

The expectation is that CloudStack 4.2 will be released mid-July this year, but if you are a daredevil you can always track the master branch and play around with that.

I’ll post updates on the cloudstack-dev list on a regular base about the progress, but you can also watch the master branch and search for commits with ‘RBD’ in the message.

Ceph distributed storage with CloudStack

As we are nearing the CloudStack 4.0 release I figured it was time I’d write something about the Ceph integration in CloudStack 4.0

In the beginning of this year we (my company) decided we wanted to use CloudStack for our cloud product, but we also wanted to use Ceph for the storage. CloudStack lacked the support for Ceph, so I decided I’d implement that.

Fast forward 4 months, a long flight to California, becoming a committer and PPMC member of CloudStack, various patches for libvirt(-java) and here we are, 25 September 2012!

RBD, the RADOS Block Device from Ceph enables you to stripe disks for (virtual) machines across your Ceph cluster. This not only gives high performance, it gives you virtually unlimited scalability (without downtime!) and redundancy. Something your NetApp, EMC or EqualLogic SAN can’t give you.

Although I’m a very big fan of Nexenta (use it a lot) it also has it’s limitations. A SAS environment won’t keep scaling for ever and SAS is expensive! Yes, ZFS is truly awesome, but you can’t compare it to the distributed powers Ceph has.

The current implementation of RBD in CloudStack is for Primary Storage only, but that’s mainly what you want, it has a couple of limitations though:

  • You still need either NFS or Local Storage for your System VMs
  • Snapshotting isn’t enabled (see below!)
  • It only works with KVM (Using RBD in Qemu)

If you are happy with that you’ll able to allocate hundreds of TB’s to your CloudStack cluster like it was nothing.

What do you need to use RBD for Primary Storage?

  • CloudStack 4.0 (RC2 is out now)
  • Hypervisors with Ubuntu 12.04.1
  • librbd and librados on your hypervisors
  • Libvirt 0.10.0 (Needs manual installation)
  • Qemu compiled with RBD enabled

There is no need for special configuration on your Hypervisor, that’s all controlled by the Management Server. I’d however recommend that you test the Ceph connectivity first:

rbd -m <monitor address> –user <cephx id> –key <cephx key> ls

If that works you can go ahead and add the RBD Primary Storage pool to your CloudStack cluster. It should be there when adding a new storage pool.

It behaves like any storage pool in CloudStack, except the fact that it is running on the next generation of storage 🙂

About the snapshots, this will be implemented in a later version, probably 4.2. It mainly has to do with the way how CloudStack currently handles snapshots. A major overhaul of the storage code is planned and as part of that I’ll implement snapshotting.

Testing is needed! So if you have the time, please test and report back!

You can find me on the Ceph and CloudStack IRC channels and mailinglists, feel free to contact me. Remember that I’m in GMT +2 (Netherlands).

Quassel IRC, never miss anything on IRC!

I was one of those guys who had irssi running inside a screen on a remote Linux box somewhere. It works just fine, but I always forgot to open the SSH session so I missed a lot of IRC conversations. Private messages were a problem as well, most of the times it was a couple of days later before I noticed somebody had actually sent me a PM…

It was time to change my IRC client, with the preference to always be online.

A short search lead me to the website of Quassel IRC, a distributed IRC server/client. Exactly what I was looking for! You just install the “core” on a remote Linux box and use the Linux, Windows, Mac OSX or Android client to participate on IRC.

The core has been running on a Ubuntu 10.04 machine for about one week now and it works like a charm. My IRC conversations are secured by SSL and I never miss a PM or when somebody tags me!

Integration of the client goes well on Ubuntu 12.04 with Unity, it integrates seamlessly with Unity and notifies me whenever I’m tagged or I receive a PM.

Looking for me on IRC? Find me on OFTC @ wido where I hang out in #ceph. Or find me on Freenode @ widodh in #cloudstack