Last week I had occasion to test deploying ceph-csi on a k3s cluster, so that Kubernetes workloads could access block storage provided by an external Ceph cluster. I went with the upstream Ceph documentation, because assuming everything worked it’d then be really easy for me to say to others “just go do this”.
Everything did not work.
I’d gone through all the instructions, inserting my own Ceph cluster’s FSID and MON IP addresses in the right places, applied the YAML to deploy the provisioner and node plugins, and all the provisioner bits were running just fine, but the csi-rbdplugin pods were stuck in CrashLoopBackOff:
> kubectl get pods
NAME                                        READY   STATUS             RESTARTS          AGE
csi-rbdplugin-22zjr                         1/3     CrashLoopBackOff   107 (3m55s ago)   2d
csi-rbdplugin-pbtc2                         1/3     CrashLoopBackOff   104 (3m33s ago)   2d
csi-rbdplugin-provisioner-9dcfd56d7-c8s72   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-hcztz   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-w2ctc   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-r2rzr                         1/3     CrashLoopBackOff   106 (3m39s ago)   2d
The csi-rbdplugin pod consists of three containers – driver-registrar, csi-rbdplugin, liveness-prometheus – and csi-rbdplugin wasn’t able to load the rbd kernel module:
> kubectl logs csi-rbdplugin-22zjr --container csi-rbdplugin
I0726 10:25:12.862125 7628 cephcsi.go:199] Driver version: canary and Git version: d432421a88238a878a470d54cbf2c50f2e61cdda
I0726 10:25:12.862452 7628 cephcsi.go:231] Starting driver type: rbd with name: rbd.csi.ceph.com
I0726 10:25:12.865907 7628 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
E0726 10:25:12.872477 7628 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Key was rejected by service\n"
F0726 10:25:12.872702 7628 driver.go:150] an error (exit status 1) occurred while running modprobe args: [rbd]
Matching “modprobe: ERROR: could not insert ‘rbd’: Key was rejected by service” in the above was an error on each host’s console: “Loading of unsigned module is rejected”. These hosts all have secure boot enabled, so I figured it had to be something to do with that. So I logged into one of the hosts and ran modprobe rbd as root, but that worked just fine. No key errors, no unsigned module errors. And once I’d run modprobe rbd (and later modprobe nbd) on the host, the csi-rbdplugin container restarted and worked just fine.
So why wouldn’t modprobe work inside the container? /lib/modules from the host is mounted inside the container, the container has the right extra privileges… Clearly I needed to run a shell in the failing container to poke around inside when it was in CrashLoopBackOff state, but I realised I had no idea how to do that. I knew I could kubectl exec -it csi-rbdplugin-22zjr --container csi-rbdplugin -- /bin/bash but of course that only works if the container is actually running. My container wouldn’t even start because of that modprobe error.
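For context, the mount and privileges I’m referring to look roughly like this in the plugin pod’s spec – paraphrased as a sketch rather than copied verbatim from the upstream YAML:

spec:
  containers:
    - name: csi-rbdplugin
      securityContext:
        privileged: true
        allowPrivilegeEscalation: true
        capabilities:
          add: ["SYS_ADMIN"]
      volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
  volumes:
    - name: lib-modules
      hostPath:
        path: /lib/modules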
Having previously spent a reasonable amount of time with podman, which has podman run, I wondered if there were a kubectl run that would let me start a new container using the upstream cephcsi image, but running a shell instead of its default command. Happily, there is a kubectl run, so I tried it:
> kubectl run -it cephcsi --image=quay.io/cephcsi/cephcsi:canary --rm=true --command=true -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@cephcsi /]# modprobe rbd
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.14.21-150400.24.66-default
[root@cephcsi /]# ls /lib/modules/
[root@cephcsi /]#
Ohhh, right, of course, that doesn’t have the host’s /lib/modules mounted. podman run lets me add volume mounts using -v options, so surely kubectl run will let me do that too.
At this point in the story, the notes I wrote last week include an awful lot of swearing.
See, kubectl run doesn’t have a -v option to add mounts, but what it does have is an --overrides option to let you add a chunk of JSON to override the generated pod. So I went back to the relevant YAML and teased out the bits I needed to come up with this monstrosity:
> kubectl run -it cephcsi-test \
    --image=quay.io/cephcsi/cephcsi:canary --rm=true \
    --overrides='{
      "apiVersion": "v1",
      "spec": {
        "containers": [ {
          "name": "cephcsi",
          "command": ["/bin/bash"],
          "stdin": true, "tty": true,
          "image": "quay.io/cephcsi/cephcsi:canary",
          "volumeMounts": [ {
            "mountPath": "/lib/modules", "name": "lib-modules"
          } ],
          "securityContext": {
            "allowPrivilegeEscalation": true,
            "capabilities": { "add": [ "SYS_ADMIN" ] },
            "privileged": true
          }
        } ],
        "volumes": [ {
          "name": "lib-modules",
          "hostPath": { "path": "/lib/modules", "type": "" }
        } ]
      }
    }'
But at least I could get a shell and reproduce the problem:
> kubectl run -it cephcsi-test [honking great horrible chunk of JSON]
[root@cephcsi-test /]# ls /lib/modules/
5.14.21-150400.24.66-default
[root@cephcsi-test /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
A certain amount more screwing around looking at the source for modprobe and bits of the kernel confirmed that the kernel really didn’t think the module was signed for some reason (mod_verify_sig() was returning -ENODATA), but I knew these modules were fine, because I could load them on the host. Eventually I hit on this:
[root@cephcsi-test /]# ls /lib/modules/*/kernel/drivers/block/rbd*
/lib/modules/5.14.21-150400.24.66-default/kernel/drivers/block/rbd.ko.zst
Wait, what’s that .zst extension? It turns out we (SUSE) have been shipping zstd-compressed kernel modules since – as best as I can tell – some time in 2021. modprobe on my SLE Micro 5.3 host of course supports this:
# grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro for Rancher 5.3"
# modprobe --version
kmod version 29
+ZSTD +XZ +ZLIB +LIBCRYPTO -EXPERIMENTAL
modprobe in the CentOS Stream 8 upstream cephcsi container does not:
[root@cephcsi-test /]# grep PRETTY /etc/os-release
PRETTY_NAME="CentOS Stream 8"
[root@cephcsi-test /]# modprobe --version
kmod version 25
+XZ +ZLIB +OPENSSL -EXPERIMENTAL
Mystery solved, but I have to say the error messages presented were spectacularly misleading. I later tried with secure boot disabled, and got something marginally better – in that case modprobe failed with “modprobe: ERROR: could not insert ‘rbd’: Exec format error”, and dmesg on the host gave me “Invalid ELF header magic: != \x7fELF”. If I’d seen messaging like that in the first place I might have been quicker to twig to the compression thing.
Anyway, the point of this post wasn’t to rant about inscrutable kernel errors, it was to rant about how there’s no way anyone could be reasonably expected to figure out how to do that --overrides thing with the JSON to debug a container stuck in CrashLoopBackOff. Assuming I couldn’t possibly be the first person to need to debug containers in this state, I told my story to some colleagues, a couple of whom said (approximately) “Oh, I edit the pod YAML and change the container’s command to tail -f /dev/null or sleep 1d. Then it starts up just fine and I can kubectl exec into it and mess around”. Those things totally work, and I wish I’d thought to do that myself. The best answer I got though was to use kubectl debug to make a copy of the existing pod but with the command changed. I didn’t even know kubectl debug existed, which I guess is my reward for not reading the entire manual 😉
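For reference, here’s roughly what that “change the command” trick looks like in practice – a hedged sketch, assuming the node plugin pods come from a DaemonSet called csi-rbdplugin (which is what the pod names suggest), and remembering that the pod names change when the DaemonSet recreates them:

> kubectl edit daemonset csi-rbdplugin
(in the csi-rbdplugin container’s spec, give it an inert command, e.g.
 command: ["sleep", "1d"], clearing any existing args so they don’t get
 passed to sleep, then save and wait for the replacement pod to be Running)
> kubectl exec -it csi-rbdplugin-xxxxx --container csi-rbdplugin -- /bin/bash

Here csi-rbdplugin-xxxxx stands in for whatever the recreated pod ends up being called, and you need to remember to put the original command back when you’re done.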
So, finally, here’s the right way to do what I was trying to do:
> kubectl debug csi-rbdplugin-22zjr -it \
    --copy-to=csi-debug --container=csi-rbdplugin -- /bin/bash
[root@... /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
(...do whatever other messing around you need to do, then...)
[root@... /]# exit
Session ended, resume using 'kubectl attach csi-debug -c csi-rbdplugin -i -t' command when the pod is running
> kubectl delete pod csi-debug
pod "csi-debug" deleted
In the above kubectl debug invocation, csi-rbdplugin-22zjr is the existing pod that’s stuck in CrashLoopBackOff, csi-debug is the name of the new pod being created, and csi-rbdplugin is the container in that pod that has its command replaced with /bin/bash, so you can mess around inside it.