Troubleshooting
This page is a collection of some common problems and their solutions.
Not all pods are started - error when creating "/kadalu/templates/csi-driver-object.yaml"
The Kadalu operator spins up several pods, like csi-provisioner or csi-nodeplugin. In case you don't see them (except the operator pod), check the log of the operator pod.
$ kubectl get pods -n kadalu
NAME READY STATUS RESTARTS AGE
operator-68649f4bb6-zq7fp 1/1 Running 0 126m
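To see why the other pods are missing, fetch the operator log (the pod name below is taken from the listing above and will differ in your cluster):
$ kubectl logs -n kadalu operator-68649f4bb6-zq7fp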
...
Traceback (most recent call last):
  File "/kadalu/main.py", line 475, in <module>
    main()
  File "/kadalu/main.py", line 458, in main
    deploy_csi_pods(core_v1_client)
  File "/kadalu/main.py", line 394, in deploy_csi_pods
    execute(KUBECTL_CMD, CREATE_CMD, "-f", filename)
  File "/kadalu/kadalulib.py", line 60, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1] b'' b'Error from server (AlreadyExists): error when creating "/kadalu/templates/csi-driver-object.yaml": csidrivers.storage.k8s.io "kadalu" already exists'
If the log complains about `error when creating "/kadalu/templates/csi-driver-object.yaml"`, you can delete the stale CSIDriver object as follows:
$ kubectl delete CSIDriver kadalu
Note: Use the cleanup script to properly clean up kadalu.
Storage cannot be created - Failed to create file system fstype=xfs device=/dev/md3
If storage cannot be created, check the logs. In case of the following error
+ pid=0
+ cmd=/usr/bin/python3
+ script=/kadalu/server.py
+ trap 'kill ${!}; term_handler' SIGTERM
+ pid=6
+ true
+ /usr/bin/python3 /kadalu/server.py
+ wait 7
+ tail -f /dev/null
[2020-01-06 13:21:41,200] ERROR [glusterfsd - 107:create_and_mount_brick] - Failed to create file system fstype=xfs device=/dev/md3
you might check your disk config and ensure that there are no partitions and, especially, no partition table on the disk. The following commands may be handy to delete the partition table:
$ dd if=/dev/zero of=/dev/md3 bs=512 count=1
$ wipefs -a -t dos -f /dev/md3
Note: In the commands above, you may need to replace 'md3' with the proper device of your choice.
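To double-check that the wipe worked, listing the remaining signatures should return no output; 'md3' is again only an example device:
$ wipefs /dev/md3
$ lsblk -f /dev/md3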
Different Pods and where to look for logs
When everything is fine, the kadalu namespace has many pods running, including the storage pods. Let's look at which pod has the required information when you run into an error!
operator
This pod is the first pod to be started in the namespace, and it starts the other required pods. This is the pod which keeps a watch on the CRDs and starts the storage service too.
If you have any error in starting the storage pods, check the logs here.
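For example, assuming the Deployment created by the Kadalu manifests is named operator (as the operator pod name earlier on this page suggests):
$ kubectl logs -n kadalu deploy/operator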
csi-provisioner
This pod creates the PV, and assigns the size (quota) to the PV. If PV creation fails, this pod’s log is what we need to check.
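For example, using the provisioner pod and container names referenced elsewhere on this page:
$ kubectl logs -n kadalu kadalu-csi-provisioner-0 -c kadalu-provisioner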
csi-nodeplugin
If the PVC is created successfully but fails to move to the Bound state, then this is where the issue can be. This pod performs the mount of all the PVs.
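The nodeplugin runs as one pod per node, so check the pod scheduled on the node where the mount failed; the grep pattern and pod name below are only placeholders:
$ kubectl get pods -n kadalu -o wide | grep nodeplugin
$ kubectl logs -n kadalu <nodeplugin-pod-name> --all-containers=true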
server-*-N
These are the pods which have glusterfsd processes running, exporting the storage provided in the storage config. One may need to check the server logs too if PVC creation fails.
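For example, list the server pods and fetch the logs of the one exporting the affected storage pool (pod names depend on the storage config):
$ kubectl get pods -n kadalu | grep '^server-'
$ kubectl logs -n kadalu <server-pod-name> --all-containers=true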
All pods' logs using the CLI
If you have installed the kubectl_kadalu package, you can run the command below to get the logs of all pods running in the kadalu namespace. It is helpful when you are not sure where to look for errors.
$ kubectl kadalu logs
Quota of PVCs
kadalu uses the simple-quota feature of glusterfs, which is present only in the kadalu storage releases of glusterfs.
As this is a new feature of glusterfs, there is a possibility that a user hits a bug which stops its usage in production. Hence, we have provided an option to disable the quota limit check on PVCs of a particular storage pool. Use the steps below to get this working:
$ kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
# setfattr -n glusterfs.quota.disable-check -v "1" /mnt/${storage-pool-name}
This disable-check is a 'runtime'-only fix right now, so if the server pods are restarted, this command may need to be issued again. Similarly, to enable the check again, just pass the value as "0", as shown below.
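For example, to re-enable the quota check on the same storage pool:
$ kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
# setfattr -n glusterfs.quota.disable-check -v "0" /mnt/${storage-pool-name}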
Troubleshooting External GlusterFS with native directory quota
When PVCs can grow beyond their defined size while using external GlusterFS, and quota management is delegated to GlusterFS, you have to verify that:
- The namespace 'kadalu' is created prior to the 'glusterquota-ssh-secret' secret
- The SSH private key and user are defined in the 'glusterquota-ssh-secret' secret and kadalu-provisioner has the correct values:
# kubectl get secret glusterquota-ssh-secret -o json | jq '.data | map_values(@base64d)'
# kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'df -h | grep -P secret-volume; echo $SECRET_GLUSTERQUOTA_SSH_USERNAME'
- The value of 'ssh-privatekey' is not padded (base64 adds "=" padding when the encoded input length is not a multiple of three bytes)
- glusterquota-ssh-username is a valid user on all Gluster nodes and can execute the gluster binaries
- The quota is enabled on the gluster volume (see the example commands after this list)
- The KadaluStorage is of type 'External'
- You gave enough time for GlusterFS to realize that the PVC is full and to block any writes
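As a quick check for the quota-related items above, assuming the gluster CLI is available on one of the Gluster nodes (the volume name is a placeholder):
# gluster volume info <volname> | grep -i quota
# gluster volume quota <volname> list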
PVC is pending and the error 'No Hosting Volumes available, add more storage' is observed in the logs, but there is enough space
In case the PVC remains in Pending state and the PV is not created, you can check the following:
- Check the logs. Sample error message:
W0111 07:10:42.877267 1 controller.go:943] Retrying syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8", failure 0
E0111 07:10:42.878137 1 controller.go:966] error syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8": failed to provision volume with StorageClass "kadalu.gluster": rpc error: code = ResourceExhausted desc = No Hosting Volumes available, add more storage
- Connect to the provisioner and verify that the volume is mounted and writable:
kubectl -n kadalu exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
df -h /mnt/<KadaluStorage's name>
dd if=/dev/zero of=/mnt/<KadaluStorage's name>/iotest_file bs=1M count=10
- Verify that the requested PVC size is at least 10% less than the KadaluStorage size. Kadalu adheres to Gluster's reserve requirement (10%) and will refuse to create the PV/PVC if the PVC request > (total size - reserve); see the example below.
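For example, on a 100 GiB KadaluStorage roughly 10 GiB is kept as reserve, so a PVC requesting 95 GiB will be rejected with the above error even though df still shows free space.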
Pod is unable to attach or mount volumes: "driver name kadalu not found in the list of registered CSI drivers"
Describing a pod shows the following events:
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               10m                 default-scheduler        Successfully assigned openshift-monitoring/alertmanager-main-0 to okd4-compute-1
  Normal   SuccessfulAttachVolume  10m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066"
  Warning  FailedMount             8m32s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy]: timed out waiting for the condition
  Warning  FailedMount             6m18s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv]: timed out waiting for the condition
  Warning  FailedMount             4m4s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager]: timed out waiting for the condition
  Warning  FailedMount             106s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume]: timed out waiting for the condition
  Warning  FailedMount             11s (x13 over 10m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name kadalu not found in the list of registered CSI drivers
Reapply the csi-nodeplugin-platform.yaml manifest from the latest Kadalu release.
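A minimal sketch, assuming the nodeplugin manifest matching your platform has already been downloaded from the release page (the file name is a placeholder):
$ kubectl apply -f csi-nodeplugin-<platform>.yaml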