Setting noout flag per Ceph OSD

Prior to Ceph Luminous you could only set the noout flag cluster-wide, which meant that none of your OSDs would be marked as out.
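
For reference, the cluster-wide flag is set and cleared like this:

ceph osd set noout
ceph osd unset noout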

On large(r) clusters this isn’t always what you want, as you might be performing maintenance on part of the cluster while you still want other OSDs which go down to be marked as out.

You can, however, also set the noout flag on a per-OSD basis using this command:

ceph osd add-noout 0

This means that osd.0 will not be marked as out. You can verify this by looking at the OSDMap:

root@alpha:~# ceph osd dump|grep osd.0
osd.0 up in weight 1 up_from 5 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:[2001:db8::100]:6800/1618,v1:[2001:db8::100]:6801/1618,v2:0.0.0.0:6802/1618,v1:0.0.0.0:6803/1618] [v2:[2001:db8::100]:6804/1618,v1:[2001:db8::100]:6805/1618,v2:0.0.0.0:6806/1618,v1:0.0.0.0:6807/1618] exists,noout,up c0c3e5c3-918b-4f3e-a48c-ea8e7c014a3b
root@alpha:~#

Here you can see the noout flag has been set for osd.0.

Removing the flag for this OSD is the reverse process:

ceph osd rm-noout 0

Other flags you can set per OSD in the same way (see the example after the list):

  • nodown
  • noup
  • noin
  • noout
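
The same add-/rm- pattern from above should work for these flags as well, for example (osd.0 used purely as an example):

ceph osd add-nodown 0
ceph osd rm-nodown 0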

Use this to your advantage!

Comparing two Ceph CRUSH maps

Sometimes you want to test if changes you are about to make to a CRUSH map will cause data to move or not.

In this case I wanted to change a rule in CRUSH where it would use device classes, but I didn’t want any of the ~1PB of data in that cluster to move.

By swapping IDs I could prevent data from moving:

Before:

root default {
  id -50              # do not change unnecessarily
  id -53 class hdd    # do not change unnecessarily
  id -122 class ssd   # do not change unnecessarily

After:

root default {
  id -53              # do not change unnecessarily
  id -50 class hdd    # do not change unnecessarily
  id -122 class ssd   # do not change unnecessarily

Notice how I swapped the IDs. After this I updated the rule:

rule rgw {
  id 6
  type replicated
  min_size 1
  max_size 10
  step take ams02-objects class hdd
  step chooseleaf firstn 0 type host
  step emit
}
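
For completeness, this is the usual workflow to get an editable copy of the CRUSH map and compile it back after editing (the file names are just examples):

ceph osd getcrushmap -o crushmap
crushtool -d crushmap -o crushmap.txt
# edit crushmap.txt, then recompile it
crushtool -c crushmap.txt -o crushmap.new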

I then compiled the CRUSH map and ran crushtool to see if there were any differences:

root@mon01:~# crushtool -i crushmap --compare crushmap.new 
rule 0 had 0/10240 mismatched mappings (0)
rule 1 had 0/10240 mismatched mappings (0)
rule 2 had 0/10240 mismatched mappings (0)
rule 3 had 0/10240 mismatched mappings (0)
rule 4 had 0/10240 mismatched mappings (0)
rule 5 had 0/3072 mismatched mappings (0)
rule 6 had 0/10240 mismatched mappings (0)
maps appear equivalent
root@mon01:~#

No changes! So it was safe to inject this map:

root@mon01:~# ceph osd setcrushmap -i crushmap.new
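
If you want an extra safety net, keep a copy of the currently active map before injecting the new one, and verify afterwards (for example with ceph -s) that no recovery or backfill kicks off:

ceph osd getcrushmap -o crushmap.backup
ceph -s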

HAProxy in front of Ceph Manager dashboard

The Ceph Mgr dashboard plugin provides an easy-to-use dashboard which shows you how your Ceph cluster is performing.

In certain situations you can’t contact the Mgr daemons directly and have to place a proxy server between your computer and the Mgr daemons.

This can be done easily with HAProxy and the following configuration, which assumes that:

  • SSL has been disabled in the Dashboard plugin
  • Dashboard plugin listens on port 8080
  • Mgr is running on the hosts mon01, mon02 and mon03
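
The first two assumptions concern the dashboard module itself; depending on your Ceph release, something along these lines should take care of them (treat this as a sketch and check the dashboard documentation for your version):

ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 8080
# restart the dashboard module to apply the changes
ceph mgr module disable dashboard
ceph mgr module enable dashboard

With that in place, the HAProxy configuration looks like this: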
global
  log         127.0.0.1 local1
  log         127.0.0.1 local2 notice

  chroot      /var/lib/haproxy
  pidfile     /var/run/haproxy.pid
  maxconn     4000
  user        haproxy
  group       haproxy
  daemon

  stats socket /var/lib/haproxy/stats

defaults
  log                     global
  mode                    http
  retries                 3
  timeout http-request    10s
  timeout queue           1m
  timeout connect         10s
  timeout client          1m
  timeout server          1m
  timeout http-keep-alive 10s
  timeout check           10s
  maxconn                 3000
  option                  httplog
  no option               httpclose
  no option               http-server-close
  no option               forceclose

  stats enable
  stats hide-version
  stats refresh 30s
  stats show-node
  stats uri /haproxy?stats
  stats auth admin:haproxy

frontend https
  bind *:80
  default_backend ceph-dashboard

backend ceph-dashboard
  balance roundrobin
  option httpchk GET /
  http-check expect status 200
  server mon01 mon01:8080 check
  server mon02 mon02:8080 check
  server mon03 mon03:8080 check

You can now point your browser to the URL/IP of your HAProxy and use your Ceph dashboard.

In case a Mgr machine fails, the health checks of HAProxy will make sure it fails over to one of the other Mgr daemons.
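
To see which Mgr daemon is currently the active one, you can run:

ceph mgr stat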