Somewhere along the line, maybe during the upgrade to Luminous, one of my larger Ceph clusters got borked up. Everything was running fine, but my two dedicated MDSes, which also act as MONs, weren’t running the MGR daemon. Easy enough to fix with ceph-deploy:
    $ ceph-deploy mgr create HOST
    .....
    [HOST][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring auth get-or-create mgr.HOST mon allow profile mgr osd allow * mds allow * -o /var/lib/ceph/mgr/ceph-HOST/keyring
    [HOST][ERROR ] Error EINVAL: key for mgr.HOST exists but cap mds does not match
    [HOST][ERROR ] exit code from command was: 22
    [ceph_deploy.mgr][ERROR ] could not create mgr
    [ceph_deploy][ERROR ] GenericError: Failed to create 1 MGRs
It took a little digging, but the solution wasn’t too difficult to find. First, we compare the auth caps of our troubled host with those of a working MGR.
    $ ceph auth get mgr.HOST
    exported keyring for mgr.HOST
    [mgr.HOST]
        key = [REDACTED]==
        caps mon = "allow profile mgr"
    $ ceph auth get mgr.OTHER_HOST_THAT_WORKS
    mgr.OTHER_HOST_THAT_WORKS
        key: [REDACTED]==
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *
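The mon cap, “allow profile mgr”, matches what the working MGR has, but the troubled host has no cap for mds at all (and none for osd either). Sure enough, that is where the auth get-or-create command bombed out: the key for mgr.HOST already exists, but its caps do not match the ones being requested. Manually setting the caps feels like a reasonable idea. Remember that setting the caps overwrites all previous caps; it is not additive.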
    $ ceph auth caps mgr.HOST mon 'allow profile mgr' mds 'allow *' osd 'allow *'
    updated caps for mgr.HOST
    $ ceph auth get mgr.HOST
    exported keyring for mgr.HOST
    [mgr.HOST]
        key = [REDACTED]==
        caps mds = "allow *"
        caps mon = "allow profile mgr"
        caps osd = "allow *"
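Since ceph auth caps replaces the whole set, it is easy to drop a cap by accident when you only meant to add one. Here is a rough sketch of a helper that turns the existing keyring-style cap lines into arguments you can paste back into the command. The function name caps_as_args is my own invention, and it assumes cap values contain no single or double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): read keyring-format text on stdin
# and print every cap as "name 'value'" on a single line, e.g.
#   caps mon = "allow profile mgr"   ->   mon 'allow profile mgr'
# Assumes cap values contain no quote characters.
caps_as_args() {
    sed -n 's/^[[:space:]]*caps \([a-z]*\) = "\(.*\)"$/\1 '\''\2'\''/p' |
        paste -sd ' ' -
}

# Example (assumes the ceph CLI and an admin keyring are available).
# This only prints the command so it can be reviewed before running it:
#   echo "ceph auth caps mgr.HOST $(ceph auth get mgr.HOST | caps_as_args) mds 'allow *' osd 'allow *'"
```

Printing the command instead of running it is deliberate: a mistake in the cap list would silently replace whatever caps were there before.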
Now ceph-deploy mgr create works as expected!
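If you have more hosts to check, something like the following might save a few rounds of trial and error. It is a minimal sketch, not anything ceph-deploy provides: the function names list_caps and missing_caps are made up for this post, and it only checks whether a cap name is present in keyring-format output, not whether its value matches.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph or ceph-deploy).

# Print the cap names (mon, osd, mds, ...) found in keyring-format text
# on stdin, i.e. lines of the form:  caps mds = "allow *"
list_caps() {
    awk '$1 == "caps" && $3 == "=" { print $2 }'
}

# Print each expected cap name that is absent from the keyring text on stdin.
# Only presence is checked; the cap values are not compared.
# Usage: missing_caps mon osd mds < keyring
missing_caps() {
    present=$(list_caps)
    for want in "$@"; do
        # Unquoted $present collapses newlines into single spaces.
        case " $(echo $present) " in
            *" $want "*) ;;
            *) echo "$want" ;;
        esac
    done
}

# Example (assumes the ceph CLI and an admin keyring are available):
#   ceph auth get mgr.HOST | missing_caps mon osd mds
```

For the troubled host above, which only had the mon cap, this would list osd and mds as missing.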