Kubernetes is powerful, but it can be tricky to troubleshoot when something goes wrong. This cheat sheet will help you debug common Kubernetes issues, with simple explanations and examples.
1. Pod is not starting
✅ Check pod status:
kubectl get pods
✅ Describe the pod:
kubectl describe pod <pod-name>
Common causes:
- Image pull error
- CrashLoopBackOff
- Insufficient resources
🛠️ Example: Image pull error
Events:
Warning Failed Failed to pull image "nginx:wrong-tag": rpc error: code = Unknown
Fix: Check your image tag. Correct the image name in your YAML file.
Before:
image: nginx:wrong-tag
After:
image: nginx:latest
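When a namespace has many pods, it can help to filter the list down to the ones that are not healthy. The following is a minimal sketch, not an official kubectl feature: unhealthy_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Print the name and status of every pod that is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` style output on stdin.
# (unhealthy_pods is a hypothetical helper, not part of kubectl.)
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster (assumes kubectl is already configured):
#   kubectl get pods --no-headers | unhealthy_pods
```

Any pod it prints (for example with status ImagePullBackOff or Pending) is a good candidate for kubectl describe pod.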
2. Pods stuck in CrashLoopBackOff
✅ Get pod logs:
kubectl logs <pod-name>
🛠️ Example: App is crashing due to a missing config
Error: Missing environment variable DB_HOST
Fix: Check your deployment YAML to ensure the environment variable is defined.
env:
  - name: DB_HOST
    value: my-database
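With CrashLoopBackOff, the container you are looking at may have just restarted, so the interesting error is often in the previous container's logs. kubectl logs has a --previous flag for this. Here is a small sketch that tries the previous logs first and falls back to the current ones; crash_logs is a hypothetical helper name.

```shell
# Show logs from the previous (crashed) container if there is one,
# otherwise fall back to the logs of the current container.
# Extra arguments such as "-n <namespace>" are passed straight to kubectl.
# (crash_logs is a hypothetical helper, not part of kubectl.)
crash_logs() {
  kubectl logs "$@" --previous 2>/dev/null || kubectl logs "$@"
}

# Usage (assumes kubectl is already configured):
#   crash_logs <pod-name>
#   crash_logs <pod-name> -n <namespace>
```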
3. Service not reachable
✅ Check service:
kubectl get svc
✅ Check endpoints:
kubectl get endpoints <service-name>
Common issues:
- No endpoints available
- Wrong targetPort in service definition
🛠️ Example: No endpoints
NAME ENDPOINTS AGE
my-service <none> 5m
Fix: Make sure your pods are labeled correctly and match the selector in the service.
# Pod labels
labels:
  app: myapp
# Service selector
selector:
  app: myapp
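A quick way to test a selector is to ask kubectl how many pods it actually matches. The sketch below assumes the selector is written as key=value (for example app=myapp); matching_pods is a hypothetical helper name.

```shell
# Count the pods that match a label selector in the current namespace.
# A count of 0 means the Service selector cannot produce any endpoints.
# (matching_pods is a hypothetical helper, not part of kubectl.)
matching_pods() {
  selector="$1"
  kubectl get pods -l "$selector" --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Usage (assumes kubectl is already configured):
#   matching_pods app=myapp
```

If the count is greater than zero but the endpoints are still empty, the pods may not be Ready yet, so check their status next.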
4. DNS not working inside the cluster
✅ Exec into pod and try DNS resolution:
kubectl exec -it <pod-name> -- nslookup kubernetes.default
Common cause: CoreDNS is not working
✅ Check CoreDNS pods:
kubectl get pods -n kube-system -l k8s-app=kube-dns
Fix: If CoreDNS pods are failing, get their logs:
kubectl logs <coredns-pod-name> -n kube-system
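If your application image does not include nslookup, one option is to run the lookup from a short-lived pod instead. This is only a sketch under a few assumptions: the pod name dns-test and the helper name dns_check are made up for illustration, and busybox:1.28 is just one image that ships nslookup (use whatever image your cluster can pull).

```shell
# Run nslookup from a temporary pod that is deleted when the command exits.
# Defaults to resolving kubernetes.default; pass another name to test it.
# (dns_check and the pod name dns-test are hypothetical; busybox:1.28 is an
# assumed image that includes nslookup.)
dns_check() {
  name="${1:-kubernetes.default}"
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$name"
}

# Usage (assumes kubectl is already configured):
#   dns_check
#   dns_check my-service
```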
5. Node issues
✅ Check node status:
kubectl get nodes
Common issues:
- Node is NotReady
🛠️ Example: Disk pressure (from kubectl describe node <node-name>)
Conditions:
Type=DiskPressure Status=True
Fix: Clean up unused files or increase disk space.
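On larger clusters, it can be handy to list only the nodes that need attention. A minimal sketch, assuming the default column layout of kubectl get nodes (NAME, STATUS, ...); not_ready_nodes is a hypothetical helper name.

```shell
# Print the name and status of every node whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin.
# Note: cordoned nodes ("Ready,SchedulingDisabled") are listed too.
# (not_ready_nodes is a hypothetical helper, not part of kubectl.)
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}

# Usage (assumes kubectl is already configured):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Run kubectl describe node on anything it prints to see which condition is failing.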
6. Deployment not updating
✅ Check rollout status:
kubectl rollout status deployment <deployment-name>
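✅ Check rollout history:
kubectl rollout history deployment <deployment-name>
✅ Check for a paused rollout (prints true when paused, nothing otherwise):
kubectl get deployment <deployment-name> -o jsonpath='{.spec.paused}'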
Fix: Resume rollout if paused:
kubectl rollout resume deployment <deployment-name>
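The check and the fix can be combined so that a rollout is only resumed when it is actually paused. This is a rough sketch that reads the Deployment's .spec.paused field; resume_if_paused is a hypothetical helper name.

```shell
# Resume a Deployment's rollout only if its spec says it is paused.
# (resume_if_paused is a hypothetical helper, not part of kubectl.)
resume_if_paused() {
  deploy="$1"
  # .spec.paused is "true" when paused and usually absent (empty) otherwise.
  paused=$(kubectl get deployment "$deploy" -o jsonpath='{.spec.paused}')
  if [ "$paused" = "true" ]; then
    kubectl rollout resume deployment "$deploy"
  else
    echo "deployment $deploy is not paused"
  fi
}

# Usage (assumes kubectl is already configured):
#   resume_if_paused <deployment-name>
```

If it reports that the deployment is not paused, the rollout is stuck for another reason, and kubectl describe deployment is the next place to look.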
✅ Bonus Tips
- Use kubectl explain <resource> to understand resource structure
- Use kubectl top pods and kubectl top nodes for resource usage
- Add -n <namespace> to commands if you’re not in the default namespace
🚀 Final Thoughts
Troubleshooting in Kubernetes can feel overwhelming, especially for beginners. But once you understand how to read pod events, logs, and describe outputs, it becomes much easier to diagnose and fix issues. Keep this cheat sheet bookmarked for quick reference during those frustrating debugging sessions.
Happy Debugging! 🧠🔧