Quick overview of Ceph version running on OSDs

When checking a Ceph cluster it’s useful to know which versions you OSDs in the cluster are running.

There is a very simple on-line command to do this:

ceph osd metadata|jq '.[].ceph_version'|sort|uniq -c

Running this on a cluster which is currently being upgraded to Jewel to Luminous it shows:

     10 "ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)"
   1670 "ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)"
    426 "ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)"
     66 "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)"

So 66 OSDs are running Luminous and 2106 OSDs are running Jewel.

Starting with Luminous there is also this command:

ceph features

This shows us all daemon and client versions in the cluster:

{
    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 5
        }
    },
    "osd": {
        "group": {
            "features": "0x7fddff8ee84bffb",
            "release": "jewel",
            "num": 426
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 66
        }
    },
    "client": {
        "group": {
            "features": "0x7fddff8ee84bffb",
            "release": "jewel",
            "num": 357
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 7
        }
    }
}

Chown Ceph OSD data directory using GNU Parallel

Starting with Ceph version Jewel (10.2.X) all daemons (MON and OSD) will run under the privileged user ceph. Prior to Jewel daemons were running under root which is a potential security issue.

This means data has to change ownership before a daemon running the Jewel code can run.

Chown data

As the Release Notes state you will have to chown all your data to ceph:ceph in /var/lib/ceph.

chown -R ceph:ceph /var/lib/ceph

On a system with multiple OSDs this might take a lot of time, using GNU Parallel you can save yourself a lot of time.

Static UID

The ceph User and Group have been assigned static UID and GIDs in the major distributions:

  • Fedora/CentOS/RHEL: 167:167
  • Debian/Ubuntu: 64045/64045

Chown in parallel

Using these commands you can chown the data in /var/lib/ceph much faster.

WARNING: Make sure the OSDs are stopped on the system before you continue!

Now you can run these commands (Ubuntu in this case):

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -type d|parallel chown -R 64045:64045
chown 64045:64045 /var/lib/ceph
chown 64045:64045 /var/lib/ceph/*
chown 64045:64045 /var/lib/ceph/bootstrap-*/*

The first command will take the longest. I tested it on a system with 24 OSDs all containing about 800GB of data. That took roughly 20 minutes.