Troubleshooting
This page is a collection of some common problems and their solutions.
Not all pods are started - error when creating "/kadalu/templates/csi-driver-object.yaml"
The Kadalu operator spins up several pods, like csi-provisioner or csi-nodeplugin. In case you don't see them (except the operator pod), check the log of the operator pod.
$ kubectl get pods -n kadalu
NAME READY STATUS RESTARTS AGE
operator-68649f4bb6-zq7fp 1/1 Running 0 126m
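To see why the other pods are missing, fetch the operator log (the pod name below is taken from the listing above and will differ in your cluster):
$ kubectl logs -n kadalu operator-68649f4bb6-zq7fp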
...
Traceback (most recent call last):
  File "/kadalu/main.py", line 475, in <module>
    main()
  File "/kadalu/main.py", line 458, in main
    deploy_csi_pods(core_v1_client)
  File "/kadalu/main.py", line 394, in deploy_csi_pods
    execute(KUBECTL_CMD, CREATE_CMD, "-f", filename)
  File "/kadalu/kadalulib.py", line 60, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1] b'' b'Error from server (AlreadyExists): error when creating "/kadalu/templates/csi-driver-object.yaml": csidrivers.storage.k8s.io "kadalu" already exists'
If the log complains about `error when creating "/kadalu/templates/csi-driver-object.yaml"`, you can delete the stale CSIDriver object as follows:
$ kubectl delete CSIDriver kadalu
Note: Use the cleanup script to properly clean up kadalu.
Storage cannot be created - Failed to create file system fstype=xfs device=/dev/md3
If storage cannot be created, check the logs. In case of the following error
+ pid=0
+ cmd=/usr/bin/python3
+ script=/kadalu/server.py
+ trap 'kill ${!}; term_handler' SIGTERM
+ pid=6
+ true
+ /usr/bin/python3 /kadalu/server.py
+ wait 7
+ tail -f /dev/null
[2020-01-06 13:21:41,200] ERROR [glusterfsd - 107:create_and_mount_brick] - Failed to create file system fstype=xfs device=/dev/md3
you might check your disk config and ensure that there are no partitions and, especially, no partition table on the disk. The following commands may be handy to delete the partition table:
$ dd if=/dev/zero of=/dev/md3 bs=512 count=1
$ wipefs -a -t dos -f /dev/md3
Note: In the commands above, you may need to replace 'md3' with the proper device of your choice.
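To double-check that the wipe worked, listing the remaining signatures should return no output; 'md3' is again only an example device:
$ wipefs /dev/md3
$ lsblk -f /dev/md3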
Different Pods and where to look for logs
When everything is fine, the kadalu namespace has many pods running, including the storage pods. Let's look at which pod has the required information when you run into an error!
operator
This pod is the first pod to be started in the namespace, and it starts the other required pods. This is the pod which keeps a watch on the CRDs and starts the storage service too.
If you have any error in starting the storage pods, check the logs here.
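For example, assuming the Deployment created by the Kadalu manifests is named operator (as the operator pod name earlier on this page suggests):
$ kubectl logs -n kadalu deploy/operator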
csi-provisioner
This pod creates the PV, and assigns the size (quota) to the PV. If PV creation fails, this pod’s log is what we need to check.
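For example, using the provisioner pod and container names referenced elsewhere on this page:
$ kubectl logs -n kadalu kadalu-csi-provisioner-0 -c kadalu-provisioner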
csi-nodeplugin
If the PVC is created successfully but fails to move to the Bound state, then this is where the issue can be. This pod performs the mount of all the PVs.
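The nodeplugin runs as one pod per node, so check the pod scheduled on the node where the mount failed; the grep pattern and pod name below are only placeholders:
$ kubectl get pods -n kadalu -o wide | grep nodeplugin
$ kubectl logs -n kadalu <nodeplugin-pod-name> --all-containers=true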
server-*-N
These are the pods which have glusterfsd processes running, exporting the storage provided in the storage config. One may need to check the server logs too if PVC creation fails.
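For example, list the server pods and fetch the logs of the one exporting the affected storage pool (pod names depend on the storage config):
$ kubectl get pods -n kadalu | grep '^server-'
$ kubectl logs -n kadalu <server-pod-name> --all-containers=true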
All pods' logs using the CLI
If you have installed the kubectl_kadalu package, you can run the command below to get the logs of all pods running in the kadalu namespace. It is helpful when you are not sure where to look for errors.
$ kubectl kadalu logs
Quota of PVCs
kadalu uses the simple-quota feature of glusterfs, which is present only in the kadalu storage releases of glusterfs.
As this is a new feature of glusterfs, there is a possibility that a user hits a bug which stops its usage in production. Hence, we have provided an option to disable the quota limit check on PVCs of a particular storage pool. Use the steps below to get this working:
$ kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
# setfattr -n glusterfs.quota.disable-check -v "1" /mnt/${storage-pool-name}
This disable-check is a 'runtime'-only fix right now, so if the server pods are restarted, this command may need to be issued again. Similarly, to enable the check again, just pass the value as "0", as shown below.
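For example, to re-enable the quota check on the same storage pool:
$ kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
# setfattr -n glusterfs.quota.disable-check -v "0" /mnt/${storage-pool-name}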
Troubleshooting External GlusterFS with native directory quota
When PVCs can grow beyond their defined size while using external GlusterFS, and quota management is delegated to GlusterFS, you have to verify that:
- The namespace 'kadalu' is created prior to the 'glusterquota-ssh-secret' secret
- The SSH private key and user are defined in the 'glusterquota-ssh-secret' secret and kadalu-provisioner has the correct values:
# kubectl get secret glusterquota-ssh-secret -o json | jq '.data | map_values(@base64d)'
# kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'df -h | grep -P secret-volume; echo $SECRET_GLUSTERQUOTA_SSH_USERNAME'
- The value of 'ssh-privatekey' is not padded (base64 adds "=" padding when the encoded input length is not a multiple of three bytes)
- glusterquota-ssh-username is a valid user on all Gluster nodes and can execute the gluster binaries
- The quota is enabled on the gluster volume (see the example commands after this list)
- The KadaluStorage is of type 'External'
- You gave enough time for GlusterFS to realize that the PVC is full and to block any writes
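As a quick check for the quota-related items above, assuming the gluster CLI is available on one of the Gluster nodes (the volume name is a placeholder):
# gluster volume info <volname> | grep -i quota
# gluster volume quota <volname> list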
PVC is pending and the error 'No Hosting Volumes available, add more storage' is observed in the logs, but there is enough space
In case the PVC remains in Pending state and the PV is not created, you can check the following:
- Check the logs. Sample error message:
W0111 07:10:42.877267 1 controller.go:943] Retrying syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8", failure 0
E0111 07:10:42.878137 1 controller.go:966] error syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8": failed to provision volume with StorageClass "kadalu.gluster": rpc error: code = ResourceExhausted desc = No Hosting Volumes available, add more storage
- Connect to the provisioner and verify that the volume is mounted and writable:
kubectl -n kadalu exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
df -h /mnt/<KadaluStorage's name>
dd if=/dev/zero of=/mnt/<KadaluStorage's name>/iotest_file bs=1M count=10
- Verify that the requested PVC size is at least 10% less than the KadaluStorage size. Kadalu adheres to Gluster's reserve requirement (10%) and will refuse to create the PV/PVC if the PVC request > (total size - reserve); see the example below.
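For example, on a 100 GiB KadaluStorage roughly 10 GiB is kept as reserve, so a PVC requesting 95 GiB will be rejected with the above error even though df still shows free space.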
Pod is unable to attach or mount volumes: "driver name kadalu not found in the list of registered CSI drivers"
Describing a pod shows the following events:
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               10m                 default-scheduler        Successfully assigned openshift-monitoring/alertmanager-main-0 to okd4-compute-1
  Normal   SuccessfulAttachVolume  10m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066"
  Warning  FailedMount             8m32s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy]: timed out waiting for the condition
  Warning  FailedMount             6m18s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv]: timed out waiting for the condition
  Warning  FailedMount             4m4s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager]: timed out waiting for the condition
  Warning  FailedMount             106s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume]: timed out waiting for the condition
  Warning  FailedMount             11s (x13 over 10m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name kadalu not found in the list of registered CSI drivers
Reapply the csi-nodeplugin-platform.yaml manifest from the latest Kadalu release.
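A minimal sketch, assuming the nodeplugin manifest matching your platform has already been downloaded from the release page (the file name is a placeholder):
$ kubectl apply -f csi-nodeplugin-<platform>.yaml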