Kubernetes Troubleshooting Cheat Sheet: Tips for Debugging Common Issues

Kubernetes is powerful, but it can be tricky to troubleshoot when something goes wrong. This cheat sheet walks through common Kubernetes issues with simple explanations and examples.

1. Pod is not starting

✅ Check pod status:

kubectl get pods

✅ Describe the pod:

kubectl describe pod <pod-name>

Common causes:

Image pull error
CrashLoopBackOff
Insufficient resources
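When many pods are listed, it helps to filter out the healthy ones first. Below is a minimal sketch of a shell helper, assuming the default column layout of `kubectl get pods --no-headers` (NAME READY STATUS RESTARTS AGE). The name `unhealthy_pods` is hypothetical. It prints pods whose STATUS is not Running or Completed, plus Running pods whose READY count is short.

```shell
# Hypothetical helper: filter `kubectl get pods --no-headers` output (on stdin)
# down to pods that need attention.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY looks like "1/2"
    if ($3 != "Running" && $3 != "Completed")
      print $1, $3, "restarts=" $4   # e.g. ImagePullBackOff, CrashLoopBackOff, Pending
    else if ($3 == "Running" && ready[1] != ready[2])
      print $1, "NotReady", "restarts=" $4
  }'
}

# Usage:
#   kubectl get pods --no-headers | unhealthy_pods
# For "Insufficient resources", scheduling failures show up as events:
#   kubectl get events --field-selector reason=FailedScheduling
```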

🛠️ Example: Image pull error

Events:
  Warning  Failed     Failed to pull image "nginx:wrong-tag": rpc error: code = Unknown

Fix: Check the image name and tag, then correct them in your YAML file.

Before:

image: nginx:wrong-tag

After:

image: nginx:latest
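To confirm exactly which image reference the kubelet tried to pull, you can extract it from the `kubectl describe pod` events. This is a small sketch; `failed_image` is a hypothetical helper that only parses the "Failed to pull image" event text shown above.

```shell
# Hypothetical helper: print the image reference from a
# 'Failed to pull image "<image>"' event line read on stdin.
failed_image() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p'
}

# Usage:
#   kubectl describe pod <pod-name> | failed_image
# A quick in-cluster fix (still update your YAML, or the next apply reverts it):
#   kubectl set image deployment/<deployment-name> <container-name>=nginx:latest
```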

2. Pods stuck in `CrashLoopBackOff`

✅ Get pod logs:

kubectl logs <pod-name>
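In a crash loop, the running container may have just restarted and logged nothing yet, so the logs of the previous instance are often more useful. A sketch of a hypothetical `crash_report` helper that prints the last termination reason (for example `Error` or `OOMKilled`) and the previous container's logs. For multi-container pods you would add `-c <container-name>` to the `kubectl logs` call.

```shell
# Hypothetical helper: show why the last container instance exited
# and what it logged before exiting.
crash_report() {
  pod="$1"
  echo "Last termination reason(s):"
  kubectl get pod "$pod" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
  echo
  echo "Logs from the previous (crashed) container:"
  kubectl logs "$pod" --previous --tail=50   # last 50 lines only
}

# Usage:
#   crash_report <pod-name>
```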

🛠️ Example: App is crashing due to a missing config

Error: Missing environment variable DB_HOST

Fix: Make sure the environment variable is defined in the container spec of your Deployment YAML:

env:
  - name: DB_HOST
    value: my-database
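If you control the container image, a fail-fast check in its entrypoint makes this kind of crash obvious in the logs. This is a minimal sketch for a hypothetical shell entrypoint; `require_env` is an illustrative name, not part of Kubernetes or any tool.

```shell
# Hypothetical entrypoint helper: exit early with a clear message
# if any of the named environment variables is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"   # read the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "Error: Missing environment variable $name" >&2
      return 1
    fi
  done
}

# In a real entrypoint you might then start the app:
#   require_env DB_HOST || exit 1
#   exec "$@"
```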

3. Service not reachable

✅ Check service:

kubectl get svc

✅ Check endpoints:

kubectl get endpoints <service-name>

Common issues:

No endpoints available
Wrong targetPort in the Service definition (it must match the port the container listens on)

🛠️ Example: No endpoints

NAME         ENDPOINTS   AGE
my-service   <none>      5m
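To spot every Service in this state at once, you can filter the endpoints list. A minimal sketch, assuming the default `kubectl get endpoints` table with a header row; `services_without_endpoints` is a hypothetical helper name.

```shell
# Hypothetical helper: print Services whose ENDPOINTS column is <none>.
# Expects `kubectl get endpoints` output (including the header) on stdin.
services_without_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Usage:
#   kubectl get endpoints -n <namespace> | services_without_endpoints
```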

Fix: Make sure your pods are labeled correctly and match the selector in the service.

# Pod labels
labels:
  app: myapp

# Service selector
selector:
  app: myapp
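A quick way to verify the match is to read the Service's selector and list the pods it actually selects. This sketch uses a hypothetical `pods_for_service` helper; it assumes the Service is in the current namespace and uses kubectl's `go-template` output to turn the selector map into a `key=value,...` label selector.

```shell
# Hypothetical helper: show which pods (if any) a Service's selector matches.
pods_for_service() {
  svc="$1"
  # Render .spec.selector as "key=value,key=value," with a Go template.
  selector="$(kubectl get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "Service $svc has no selector (or could not be read)" >&2
    return 1
  fi
  echo "Selector: $selector"
  kubectl get pods -l "$selector" --show-labels
}

# Usage:
#   pods_for_service my-service
# "No resources found" here means no pod carries the selector's labels.
```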

4. DNS not working inside the cluster

✅ Exec into a pod and try DNS resolution (the container image must include nslookup):

kubectl exec -it <pod-name> -- nslookup kubernetes.default
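It is also worth checking the pod's `/etc/resolv.conf`, which should point at the cluster DNS Service and include the cluster search domains. This sketch uses a hypothetical `check_resolv_conf` helper and assumes the default cluster domain `cluster.local`; pass a different domain as the first argument if your cluster uses one.

```shell
# Hypothetical helper: sanity-check a pod's resolv.conf read on stdin.
# Assumes the cluster domain is "cluster.local" unless given as $1.
check_resolv_conf() {
  awk -v domain="${1:-cluster.local}" '
    $1 == "nameserver" { ns = 1 }
    $1 == "search" {
      for (i = 2; i <= NF; i++) if ($i == ("svc." domain)) svc = 1
    }
    END {
      if (!ns) print "missing nameserver line"
      if (!svc) print "search path lacks svc." domain
      if (ns && svc) exit 0
      exit 1
    }'
}

# Usage:
#   kubectl exec <pod-name> -- cat /etc/resolv.conf | check_resolv_conf
# If the app image has no nslookup, a throwaway pod works too:
#   kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
```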

Common cause: the CoreDNS pods are not running or are failing

✅ Check CoreDNS pods:

kubectl get pods -n kube-system -l k8s-app=kube-dns

Next step: If the CoreDNS pods are failing, check their logs:

kubectl logs <coredns-pod-name> -n kube-system
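Instead of looking up pod names one by one, the CoreDNS checks can be gathered in one go. A sketch with a hypothetical `coredns_diag` helper; it assumes CoreDNS runs in `kube-system` with the label `k8s-app=kube-dns` (the selector used above) behind a Service named `kube-dns`, which is common but not guaranteed on every distribution.

```shell
# Hypothetical helper: collect basic CoreDNS diagnostics.
# Assumes namespace kube-system, label k8s-app=kube-dns, Service kube-dns.
coredns_diag() {
  echo "== CoreDNS pods =="
  kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
  echo "== kube-dns endpoints =="
  kubectl get endpoints kube-dns -n kube-system
  echo "== Recent CoreDNS logs =="
  # --prefix tags each line with the pod/container it came from
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20 --prefix
}

# Usage:
#   coredns_diag
```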

5. Node issues

✅ Check node status:

kubectl get nodes

Common issues:

Node is NotReady
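On larger clusters you can filter the node list down to the ones that are not simply Ready. A minimal sketch with a hypothetical `problem_nodes` helper, assuming the default `kubectl get nodes --no-headers` columns (NAME STATUS ROLES AGE VERSION). Cordoned nodes show `Ready,SchedulingDisabled` and are reported separately, since they are healthy but unschedulable.

```shell
# Hypothetical helper: print nodes that are not plainly "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
problem_nodes() {
  awk '
    $2 == "Ready" { next }                                   # healthy
    $2 ~ /^Ready,/ { print $1, "cordoned (" $2 ")"; next }   # e.g. Ready,SchedulingDisabled
    { print $1, $2 }                                         # NotReady, Unknown, ...
  '
}

# Usage:
#   kubectl get nodes --no-headers | problem_nodes
```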

✅ Describe the node to check its conditions:

kubectl describe node <node-name>

🛠️ Example: Disk pressure

Conditions:
  Type=DiskPressure Status=True

Fix: Free up disk space on the node (for example, remove unused container images and old logs) or increase the disk size.
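On the node itself, `df -P` shows which filesystem is filling up. This sketch uses a hypothetical `full_filesystems` helper; the 85% default is an arbitrary example value, not a kubelet eviction setting. The `crictl` cleanup in the comments assumes a CRI runtime such as containerd and a `crictl` version that supports `rmi --prune`.

```shell
# Hypothetical helper: flag filesystems at or above a usage threshold (percent).
# Expects `df -P` output (including the header) on stdin.
full_filesystems() {
  threshold="${1:-85}"   # arbitrary example threshold
  awk -v t="$threshold" 'NR > 1 {
    use = $5; sub(/%/, "", use)            # "92%" -> 92
    if (use + 0 >= t + 0) print $6, $5     # mount point and usage
  }'
}

# Usage (on the node):
#   df -P | full_filesystems 85
# Possible cleanup of unused images on CRI nodes:
#   crictl rmi --prune
```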

6. Deployment not updating

✅ Check rollout status:

kubectl rollout status deployment <deployment-name>

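✅ Check whether the rollout is paused:

kubectl get deployment <deployment-name> -o jsonpath='{.spec.paused}'

This prints true for a paused Deployment and nothing otherwise. To see past revisions, use kubectl rollout history deployment <deployment-name>.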

Fix: Resume rollout if paused:

kubectl rollout resume deployment <deployment-name>
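The check, resume, and wait steps can be combined so a stuck rollout does not hang forever. A sketch with a hypothetical `resume_and_wait` helper; it assumes the Deployment is in the current namespace, and the 120-second timeout is an arbitrary example.

```shell
# Hypothetical helper: resume a paused Deployment, then wait for its rollout.
resume_and_wait() {
  deploy="$1"
  paused="$(kubectl get deployment "$deploy" -o jsonpath='{.spec.paused}')"
  if [ "$paused" = "true" ]; then
    echo "Deployment $deploy is paused; resuming"
    kubectl rollout resume deployment "$deploy"
  fi
  # Stop waiting after 2 minutes (example value) instead of blocking forever.
  kubectl rollout status deployment "$deploy" --timeout=120s
}

# Usage:
#   resume_and_wait <deployment-name>
```

If the status check times out on a bad release, `kubectl rollout undo deployment <deployment-name>` returns to the previous revision.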

✅ Bonus Tips

Use kubectl explain <resource> to understand a resource's fields and structure
Use kubectl top pods and kubectl top nodes to see resource usage (requires the Metrics API, usually provided by metrics-server)
Add -n <namespace> to commands if your resources are not in the default namespace
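`kubectl top pods --sort-by=memory` already orders pods by memory; to keep only the heavy ones, you can filter its output. A sketch with a hypothetical `memory_hogs` helper, assuming the `kubectl top pods --no-headers` columns NAME CPU MEMORY and memory values in Ki, Mi, or Gi.

```shell
# Hypothetical helper: list pods using more than a given amount of memory (MiB).
# Expects `kubectl top pods --no-headers` output on stdin.
memory_hogs() {
  limit_mib="${1:-500}"   # example default
  awk -v limit="$limit_mib" '{
    mem = $3
    if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem *= 1024 }
    else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem) }
    else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mem /= 1024 }
    if (mem + 0 > limit + 0) print $1, $3
  }'
}

# Usage:
#   kubectl top pods -n <namespace> --no-headers | memory_hogs 500
```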

🚀 Final Thoughts

Troubleshooting in Kubernetes can feel overwhelming, especially for beginners. But once you know how to read pod events, logs, and kubectl describe output, it becomes much easier to diagnose and fix issues. Keep this cheat sheet bookmarked for quick reference during those frustrating debugging sessions.

If you found this helpful, consider sharing it with your team or on LinkedIn. Also, leave a comment or reach out if you’d like more detailed tutorials, downloadable PDFs, or coverage of advanced topics like network policies, RBAC debugging, or Helm chart issues.

Happy Debugging! 🧠🔧
