The Ceph Trafficlight

At PCextreme we have a 700TB Ceph cluster which serves as the storage backend for our public cloud Aurora Compute, which runs Apache CloudStack.

Ceph health

One of the things we monitor on the Ceph cluster is its health. This can be OK, WARN or ERR. It goes without saying that you always want to see OK, but things do go wrong. Disks fail, machines die, kernel panics happen. Stuff goes wrong.

I thought it would be a cool idea to buy a real, used traffic light which I could install at the office. OK would be green, WARN would be orange/amber and ERR would be red.

Second-hand traffic light

Some searching on the internet brought me to trafficlightshop.com, which sells used (Dutch) traffic lights. I bought a Vialis 2230 (the largest one in the picture below).

Vialis trafficlight overview

For EUR 75,00 I got my hands on an original traffic light!

Controlling the lights

When I got the traffic light it was already equipped with LED lights which run on 230V. A 30cm cable (cut off) with 4 wires was sticking out of it:

  • Blue: Neutral
  • Green: Phase/Positive for Green
  • Yellow: Phase/Positive for Orange/Amber
  • Red: Phase/Positive for Red

It was easy. All I had to do was buy an add-on board for a Raspberry Pi so I could control the lights.

Solid State Relay

My search for an add-on board brought me to BitWizard.nl, which makes all kinds of add-on boards for the Raspberry Pi.

One of them is an SSR (Solid State Relay) board which has 4 outputs. Their wiki explained that it is very simple to control the relays using Python.
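To give an idea of how simple it is, here's a rough sketch of driving the relays over SPI with the spidev module. Note that the SPI address and register values below are just placeholders; the real values for the SSR board are documented on the BitWizard wiki.

# Rough sketch: switching the SSR outputs from Python over SPI.
# NOTE: SSR_ADDR and OUTPUT_REG are placeholders; the real values for the
# BitWizard SSR board are documented on their wiki.
import spidev

SSR_ADDR = 0x88    # hypothetical SPI address of the SSR board
OUTPUT_REG = 0x20  # hypothetical base register for the relay outputs

spi = spidev.SpiDev()
spi.open(0, 0)  # SPI bus 0, chip select 0

def set_relay(relay, on):
    """Switch a single relay (0-3) on or off."""
    spi.xfer([SSR_ADDR, OUTPUT_REG + relay, 1 if on else 0])

def set_light(color):
    """Turn on exactly one of the lights: 'green', 'orange' or 'red'."""
    for relay, name in enumerate(['green', 'orange', 'red']):
        set_relay(relay, name == color)

set_light('green')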

Solid State Relay board

A quick test at my desk at home resulted in a working setup.

Additional components

After writing the code which controls the light, it was time to buy a housing to install everything in.

At Conrad I found the things I needed: a housing, some connectors and some cabling. An overview of my order:

Conrad order

This was needed since I would install it at the office and it had to be safe. You don’t want somebody to get shocked by 230V. That’s kind of dangerous.

Bringing it together

It was time to start drilling and soldering! In my shed it looked like this:

My shed

And a few more pictures of the build. It took me about 3 hours to complete.

ssr-board-and-connector

drilling-holes

connectors-installed-1

connectors-installed-2

box-installed

box-installed-with-cables

At the office

The next day it was time to install it at the office! Some drilling and the result:

Health OK: Green

light-on-green

Health WARN: Amber/Orange

light-on-orange

Health ERR: Red

No picture! We can trigger a WARN state in Ceph without service interruption, but not an ERR state.

The code

The Python code I wrote is all on GitHub. It's a small script which polls our Ceph dashboard every second; if the status changes, it also changes the traffic light.
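In essence the loop looks something like the sketch below. The dashboard URL and the 'overall_status' field are assumptions here; the actual script on GitHub is tailored to our own dashboard and drives the SSR relays as shown earlier.

# Rough sketch of the polling loop. DASHBOARD_URL and the 'overall_status'
# field are assumptions; the real script talks to our own dashboard.
import json
import time
import urllib.request

DASHBOARD_URL = 'http://dashboard.example.com/api/health'  # placeholder URL

def set_light(color):
    # Placeholder: in the real setup this switches the SSR relays,
    # see the relay sketch above.
    print('Switching light to %s' % color)

def get_health():
    with urllib.request.urlopen(DASHBOARD_URL, timeout=5) as response:
        return json.loads(response.read().decode('utf-8'))['overall_status']

last_status = None
while True:
    status = get_health()
    if status != last_status:
        if status == 'HEALTH_OK':
            set_light('green')
        elif status == 'HEALTH_WARN':
            set_light('orange')
        else:
            set_light('red')
        last_status = status
    time.sleep(1)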

Ceph Monitors are laggy or clock might be skewed

This weekend I got to investigate a Ceph cluster whose Monitors were constantly calling new elections.

After some investigation I found that one of the three Monitors was eating 100% CPU on a single core and kept printing this in the logs:

mon.charlie@2(peon).paxos(paxos updating c 106399655..106400232) lease_expire from mon.0 [2a00:XXX:121:XXX::6789:1]:6789/0 is 2.380296 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)

Digging further I found that the LevelDB store in /var/lib/ceph/mon/X/store.db was 2.5GB in size.

Compact on Start

You can tell the monitor to compact the LevelDB database on start. Add the following to your ceph.conf:

[mon]
mon compact on start = true

Now restart the monitor and it will compact the LevelDB database.

After the restart the CPU usage dropped and the Monitors were happy again.

Protecting your Ceph pools against removal or property changes

One of the dangers of Ceph was that you could accidentally remove a multi-terabyte pool and lose all the data. Although the CLI tools asked you for confirmation, librados and all its bindings did not.

Imagine explaining that you just removed a 200TB pool from your storage system due to a typo in your Python code…

So I suggested that we come up with a mechanism to prevent pools from being deleted from a Ceph cluster, and Sage quickly came up with something!

Hammer v0.94

Ceph version 0.94 aka ‘Hammer’ came out a couple of weeks ago and it has some fancy features which prevent you from removing a pool by accident or on purpose.

Monitors denying pool removal

A new configuration setting for the monitors has been introduced:

mon_allow_pool_delete = false

If you add that to the [mon] section of your ceph.conf and restart your MONs, you will no longer be able to remove any pool from your Ceph cluster, neither via the CLI nor directly via librados. The Monitors will simply refuse it:

root@admin:~# ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
root@admin:~#
root@admin:~# rados rmpool rbd rbd --yes-i-really-really-mean-it
pool rbd does not exist
error 1: (1) Operation not permitted
root@admin:~#

This is a cluster-wide configuration setting and can only be changed by restarting your Monitors. That makes it a good way to prevent anybody from removing a pool by accident or on purpose.
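To confirm that librados-based code is refused as well, a quick test with the Python rados bindings could look like this (assuming python-rados is installed and the default ceph.conf and keyring are in place):

# Quick check that pool deletion through librados is refused as well.
# Assumes python-rados is installed and /etc/ceph/ceph.conf plus the
# admin keyring are in their default locations.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

try:
    cluster.delete_pool('rbd')
except rados.Error as e:
    # With mon_allow_pool_delete = false the Monitors refuse this with EPERM
    print('Pool deletion refused: %s' % e)
finally:
    cluster.shutdown()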

Pool flags

A different way to achieve this is by setting the new nodelete flag on a pool. Setting this flag prevents the pool from being removed.

Next to this flag a couple of other flags were introduced:

  • nodelete
  • nosizechange
  • nopgchange

The flags speak for themselves. If you set these flags those operations are no longer allowed:

root@admin:~# ceph osd pool set rbd nosizechange true
set pool 0 nosizechange to true
root@admin:~# ceph osd pool set rbd size 5
Error EPERM: pool size change is disabled; you must unset nosizechange flag for the pool first
root@admin:~#

I’m not allowed to change the size (aka the replication level) of the pool ‘rbd’ while that flag is set.

Applying all flags

To apply these flags quickly to all your pools, simply execute these three one-liners:

$ for pool in $(rados lspools); do ceph osd pool set $pool nosizechange true; done
$ for pool in $(rados lspools); do ceph osd pool set $pool nopgchange true; done
$ for pool in $(rados lspools); do ceph osd pool set $pool nodelete true; done

Your Ceph cluster just became a lot safer! No data loss or downtime due to fat fingers anymore 🙂