This weekend I got to investigate a Ceph cluster whose monitors were constantly holding new elections.
After some investigation, I found that one of the three monitors was eating 100% CPU on a single core and kept printing this in the logs:
mon.charlie@2(peon).paxos(paxos updating c 106399655..106400232) lease_expire from mon.0 [2a00:XXX:121:XXX::6789:1]:6789/0 is 2.380296 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
Digging further, I found that the LevelDB store in /var/lib/ceph/mon/X/store.db was 2.5 GB in size.
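If you want to check the store size on your own monitors, something like the following should do. It is only a rough sketch: the function name is mine, and the X in the path is the same placeholder as above, so replace it with your monitor's directory name.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # lstat so symlinks are not followed
                total += os.lstat(full).st_size
            except OSError:
                # A running monitor may remove files while we walk the tree
                pass
    return total


if __name__ == "__main__":
    # Placeholder path: replace X with your monitor's directory name
    store = "/var/lib/ceph/mon/X/store.db"
    print("%s: %.2f GB" % (store, dir_size_bytes(store) / 1e9))
```

Running du -sh on the same directory gives you roughly the same answer; the script is just easier to wire into monitoring.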
Compact on Start
You can tell the monitor to compact the LevelDB database on start. Add the following to your ceph.conf:
[mon]
mon compact on start = true
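Before restarting, you may want to confirm the option really ended up in the [mon] section. The snippet below is a minimal sketch that parses the config text with Python's configparser, which is close to, but not identical to, Ceph's own parser, so treat the result as a sanity check only. The function name is my own invention.

```python
import configparser


def compact_on_start_enabled(conf_text):
    """Return True if the [mon] section enables 'mon compact on start'."""
    # strict=False tolerates duplicate keys; interpolation=None leaves '%' alone
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("mon"):
        return False
    for key, value in parser.items("mon"):
        # Ceph accepts both spaces and underscores in option names
        if key.replace("_", " ").strip() == "mon compact on start":
            return value.strip().lower() in ("true", "1", "yes", "on")
    return False


if __name__ == "__main__":
    sample = "[mon]\nmon compact on start = true\n"
    print(compact_on_start_enabled(sample))
```

To check a real file, read it first, for example with open("/etc/ceph/ceph.conf").read(), assuming that is where your ceph.conf lives.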
Now restart the monitor and it will compact the LevelDB database.
After the restart, CPU usage dropped and the monitors were happy again.