The wrong way to debug CrashLoopBackOff

Last week I had occasion to test deploying ceph-csi on a k3s cluster, so that Kubernetes workloads could access block storage provided by an external Ceph cluster. I went with the upstream Ceph documentation, because assuming everything worked it’d then be really easy for me to say to others “just go do this”.

Everything did not work.

I’d gone through all the instructions, inserting my own Ceph cluster’s FSID and MON IP addresses in the right places, applied the YAML to deploy the provisioner and node plugins, and all the provisioner bits were running just fine, but the csi-rbdplugin pods were stuck in CrashLoopBackOff:

> kubectl get pods
NAME                                        READY   STATUS             RESTARTS          AGE
csi-rbdplugin-22zjr                         1/3     CrashLoopBackOff   107 (3m55s ago)   2d
csi-rbdplugin-pbtc2                         1/3     CrashLoopBackOff   104 (3m33s ago)   2d
csi-rbdplugin-provisioner-9dcfd56d7-c8s72   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-hcztz   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-w2ctc   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-r2rzr                         1/3     CrashLoopBackOff   106 (3m39s ago)   2d

The csi-rbdplugin pod consists of three containers – driver-registrar, csi-rbdplugin, liveness-prometheus – and csi-rbdplugin wasn’t able to load the rbd kernel module:

> kubectl logs csi-rbdplugin-22zjr --container csi-rbdplugin
I0726 10:25:12.862125    7628 cephcsi.go:199] Driver version: canary and Git version: d432421a88238a878a470d54cbf2c50f2e61cdda
I0726 10:25:12.862452    7628 cephcsi.go:231] Starting driver type: rbd with name: rbd.csi.ceph.com
I0726 10:25:12.865907    7628 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
E0726 10:25:12.872477    7628 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Key was rejected by service\n"
F0726 10:25:12.872702    7628 driver.go:150] an error (exit status 1) occurred while running modprobe args: [rbd] 

Matching that “modprobe: ERROR: could not insert ‘rbd’: Key was rejected by service” was an error on each host’s console: “Loading of unsigned module is rejected”. These hosts all have secure boot enabled, so I figured it had to be something to do with that. But when I logged into one of the hosts and ran modprobe rbd as root, it worked just fine – no key errors, no unsigned module errors. And once I’d run modprobe rbd (and later modprobe nbd) on the host, the csi-rbdplugin container restarted and worked fine too.
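
The host-side check is nothing fancy – as root, something like this, and if the module loads cleanly then signing isn’t a problem on the host:

# modprobe rbd
# lsmod | grep -E 'rbd|libceph'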

So why wouldn’t modprobe work inside the container? /lib/modules from the host is mounted inside the container, the container has the right extra privileges… Clearly I needed to run a shell in the failing container to poke around inside when it was in CrashLoopBackOff state, but I realised I had no idea how to do that. I knew I could kubectl exec -it csi-rbdplugin-22zjr --container csi-rbdplugin -- /bin/bash but of course that only works if the container is actually running. My container wouldn’t even start because of that modprobe error.
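
A quick way to double-check those mounts from outside, rather than trusting the YAML, is a jsonpath query – this filter is just one way to slice it:

> kubectl get pod csi-rbdplugin-22zjr \
    -o jsonpath='{.spec.containers[?(@.name=="csi-rbdplugin")].volumeMounts}'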

Having previously spent a reasonable amount of time with podman, which has podman run, I wondered if there were a kubectl run that would let me start a new container using the upstream cephcsi image, but running a shell instead of its default command. Happily, there is a kubectl run, so I tried it:

> kubectl run -it cephcsi --image=quay.io/cephcsi/cephcsi:canary --rm=true --command=true -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@cephcsi /]# modprobe rbd
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.14.21-150400.24.66-default
[root@cephcsi /]# ls /lib/modules/
[root@cephcsi /]#  

Ohhh, right, of course, that doesn’t have the host’s /lib/modules mounted. podman run lets me add volume mounts using -v options, so surely kubectl run will let me do that too.
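
For comparison, the podman incantation I had in mind would be something like this (untested against this image, but that’s the shape of it):

> podman run -it --rm --privileged \
    -v /lib/modules:/lib/modules:ro \
    quay.io/cephcsi/cephcsi:canary /bin/bash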

At this point in the story, the notes I wrote last week include an awful lot of swearing.

See, kubectl run doesn’t have a -v option to add mounts, but what it does have is an --overrides option to let you add a chunk of JSON to override the generated pod. So I went back to the relevant YAML and teased out the bits I needed to come up with this monstrosity:

> kubectl run -it cephcsi-test \
  --image=quay.io/cephcsi/cephcsi:canary --rm=true \
  --overrides='{
    "apiVersion": "v1",
    "spec": {
      "containers": [ {
        "name": "cephcsi",
        "command": ["/bin/bash"],
        "stdin": true, "tty": true,
        "image": "quay.io/cephcsi/cephcsi:canary",
        "volumeMounts": [ {
          "mountPath": "/lib/modules", "name": "lib-modules" }],
        "securityContext": {
          "allowPrivilegeEscalation": true,
          "capabilities": { "add": [ "SYS_ADMIN" ] },
          "privileged": true }
      } ],
      "volumes": [ {
        "name": "lib-modules",
        "hostPath": { "path": "/lib/modules", "type": "" }
      } ]
    } }'

But at least I could get a shell and reproduce the problem:

> kubectl run -it cephcsi-test [honking great horrible chunk of JSON]
[root@cephcsi-test /]# ls /lib/modules/
5.14.21-150400.24.66-default
[root@cephcsi-test /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
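
Tracing the module-loading syscalls makes it clearer where the rejection happens – assuming you can get strace into the image, something like this should show the insert failing with EKEYREJECTED, which is the errno behind modprobe’s “Key was rejected by service”:

[root@cephcsi-test /]# strace -f -e trace=init_module,finit_module modprobe rbd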

A certain amount more screwing around looking at the source for modprobe and bits of the kernel confirmed that the kernel really didn’t think the module was signed for some reason (mod_verify_sig() was returning -ENODATA), but I knew these modules were fine, because I could load them on the host. Eventually I hit on this:

[root@cephcsi-test /]# ls /lib/modules/*/kernel/drivers/block/rbd*
/lib/modules/5.14.21-150400.24.66-default/kernel/drivers/block/rbd.ko.zst

Wait, what’s that .zst extension? It turns out we (SUSE) have been shipping zstd-compressed kernel modules since – as best I can tell – some time in 2021. modprobe on my SLE Micro 5.3 host of course supports this:

# grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro for Rancher 5.3"
# modprobe --version
kmod version 29
+ZSTD +XZ +ZLIB +LIBCRYPTO -EXPERIMENTAL

modprobe in the CentOS Stream 8 upstream cephcsi container does not:

[root@cephcsi-test /]# grep PRETTY /etc/os-release 
PRETTY_NAME="CentOS Stream 8"
[root@cephcsi-test /]# modprobe --version
kmod version 25
+XZ +ZLIB +OPENSSL -EXPERIMENTAL
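
Had I wanted a workaround inside the container rather than a proper fix, decompressing the module by hand ought to do it. An untested sketch – it assumes the zstd tool is present in the image, and insmod doesn’t resolve dependencies, so rbd’s prerequisites (libceph and friends) would need loading first:

[root@cephcsi-test /]# zstd -d /lib/modules/$(uname -r)/kernel/drivers/block/rbd.ko.zst -o /tmp/rbd.ko
[root@cephcsi-test /]# insmod /tmp/rbd.ko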

Mystery solved, but I have to say the error messages presented were spectacularly misleading. I later tried with secure boot disabled, and got something marginally better – in that case modprobe failed with “modprobe: ERROR: could not insert ‘rbd’: Exec format error”, and dmesg on the host gave me “Invalid ELF header magic: != \x7fELF”. If I’d seen messaging like that in the first place I might have been quicker to twig to the compression thing.

Anyway, the point of this post wasn’t to rant about inscrutable kernel errors, it was to rant about how there’s no way anyone could be reasonably expected to figure out how to do that --overrides thing with the JSON to debug a container stuck in CrashLoopBackOff. Assuming I couldn’t possibly be the first person to need to debug containers in this state, I told my story to some colleagues, a couple of whom said (approximately) “Oh, I edit the pod YAML and change the container’s command to tail -f /dev/null or sleep 1d. Then it starts up just fine and I can kubectl exec into it and mess around”. Those things totally work, and I wish I’d thought to do that myself. The best answer I got though was to use kubectl debug to make a copy of the existing pod but with the command changed. I didn’t even know kubectl debug existed, which I guess is my reward for not reading the entire manual 😉
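
For completeness, that command-override trick doesn’t even need hand-editing: a strategic merge patch on the DaemonSet does it. A sketch, assuming the DaemonSet is named csi-rbdplugin as in the upstream YAML – note that args needs clearing too, or the existing arguments would be passed to sleep, and remember to revert afterwards (kubectl rollout undo daemonset/csi-rbdplugin):

> kubectl patch daemonset csi-rbdplugin --type=strategic -p '
    {"spec": {"template": {"spec": {"containers": [ {
      "name": "csi-rbdplugin",
      "command": ["sleep", "1d"],
      "args": [] } ] } } } }'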

So, finally, here’s the right way to do what I was trying to do:

> kubectl debug csi-rbdplugin-22zjr -it \
    --copy-to=csi-debug --container=csi-rbdplugin -- /bin/bash
[root@... /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service

(...do whatever other messing around you need to do, then...)

[root@... /]# exit
Session ended, resume using 'kubectl attach csi-debug -c csi-rbdplugin -i -t' command when the pod is running
> kubectl delete pod csi-debug
pod "csi-debug" deleted 

In the above kubectl debug invocation, csi-rbdplugin-22zjr is the existing pod that’s stuck in CrashLoopBackOff, csi-debug is the name of the new pod being created, and csi-rbdplugin is the container in that pod that has its command replaced with /bin/bash, so you can mess around inside it.
