pod
Here’s a breakdown of Kubernetes pod troubleshooting questions across basic, intermediate, and advanced levels, along with explanations and approaches for each:
Basic Questions
1. How do you troubleshoot a pod that is in a CrashLoopBackOff state?
Steps:
Check Logs: Use kubectl logs <pod-name> to check the logs of the current container instance, and add --previous to see the output of the instance that crashed. If the pod has multiple containers, specify the container name with -c <container-name>.
Describe the Pod: Run kubectl describe pod <pod-name> to view recent events, restart reasons, and any error messages.
Check Exit Codes: Look at the exit code reported under Last State to identify known errors (like 127 for command not found or 137 for a container killed by SIGKILL, most often OOMKilled).
Fix the Issue: Based on logs and exit codes, fix issues like incorrect commands, missing environment variables, or configuration errors.
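The exit-code check can be scripted. The following is a minimal sketch, not a complete diagnostic: the function names last_exit_code and explain_exit_code are hypothetical, the first helper only looks at the first container in the pod, and the code-to-cause table covers only a few common cases.

```shell
# Print the exit code of the previous (crashed) instance of the pod's
# first container. Assumes the pod has restarted at least once.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Map a few common exit codes to a likely cause (a rough guide only).
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the main process may simply have finished" ;;
    1)   echo "application error; read the logs of the previous instance" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL, often OOMKilled" ;;
    143) echo "terminated by SIGTERM" ;;
    *)   echo "unrecognized; check the application's documentation" ;;
  esac
}
```

A typical call would be explain_exit_code "$(last_exit_code <pod-name>)", followed by kubectl logs <pod-name> --previous to read what the crashed instance printed.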
2. What do you do if a pod is in a Pending state?
Steps:
Check Node Availability: Run kubectl get nodes to confirm nodes are available and Ready.
Describe the Pod: kubectl describe pod <pod-name> can show if there are unsatisfied resource requests or unbound PersistentVolumeClaims.
Check Resource Requests: If the pod requests more resources than any node has available, lower the resource requests in the YAML spec or add capacity.
Verify Node Affinity/Tolerations: Ensure node affinity rules and tolerations match your node pool configurations.
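The checks above usually come down to reading the scheduler's FailedScheduling message. Here is a rough sketch that pulls those events and sorts a message into one of the causes listed. The function names are hypothetical, and the matched phrases are assumptions about common scheduler wording, which varies between Kubernetes versions.

```shell
# Show only the scheduling failures recorded for one pod.
scheduling_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}

# Classify a FailedScheduling message into a likely cause.
# The patterns are assumptions, not an exhaustive list.
classify_scheduling_message() {
  case "$1" in
    *"Insufficient "*)                   echo "resources" ;;
    *"node affinity"*|*"node selector"*) echo "affinity" ;;
    *taint*)                             echo "taints" ;;
    *PersistentVolumeClaim*)             echo "storage" ;;
    *)                                   echo "unknown" ;;
  esac
}
```

For example, a message containing "Insufficient cpu" is classified as resources, which points back to the resource request step.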
3. How do you access the terminal of a running container in a pod?
Solution:
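Use kubectl exec -it <pod-name> -c <container-name> -- /bin/sh (or /bin/bash, if the image includes it) to access the terminal.
If the image ships only a minimal shell or none at all, installing one at runtime is rarely practical; kubectl debug can instead attach an ephemeral container that brings its own shell and tools.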
4. How can you restart a pod in Kubernetes?
Solution:
Kubernetes has no command that restarts a pod in place. Instead:
Delete the pod: kubectl delete pod <pod-name>. If it is managed by a controller (e.g., a deployment), it will be recreated automatically; a standalone pod will not come back.
For deployments, use kubectl rollout restart deployment <deployment-name> to replace all pods in the deployment through a rolling update.
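Because deleting a pod only counts as a restart when something recreates it, it can help to check for an owner first. The following is a minimal sketch under that assumption; pod_owner_kind and safe_restart_pod are hypothetical helper names.

```shell
# Print the kind of controller that owns a pod, or "none" for a bare pod.
pod_owner_kind() {
  owner_kind=$(kubectl get pod "$1" -o jsonpath='{.metadata.ownerReferences[*].kind}')
  echo "${owner_kind:-none}"
}

# Delete a pod only when a controller exists to recreate it.
safe_restart_pod() {
  if [ "$(pod_owner_kind "$1")" = "none" ]; then
    echo "refusing: $1 has no owner and would not be recreated" >&2
    return 1
  fi
  kubectl delete pod "$1"
}
```

For pods owned by a Deployment's ReplicaSet, kubectl rollout restart remains the gentler option because it replaces pods gradually.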
Intermediate Questions
1. How do you troubleshoot a pod stuck in the ContainerCreating state?
Steps:
Describe the Pod: Run kubectl describe pod <pod-name> to see events. Look for common issues like volume mount failures, image pull errors, or network setup errors.
Check Node Disk Space: Insufficient disk space on the node can prevent images from being pulled and containers from being created. Use kubectl describe node <node-name> to check disk pressure and other resources.
Check ImagePull Policy: Ensure the imagePullPolicy is set correctly. Use kubectl get pod <pod-name> -o yaml to review and confirm policies.
Check Network/Storage Availability: Verify that the network plugin and the storage classes required by the pod are available and properly configured.
2. How can you troubleshoot slow pod startups?
Steps:
Check Resource Requests and Limits: Ensure that resource requests and limits are properly configured. Overloaded nodes can slow down pod startup.
Image Pull Policy: With imagePullPolicy: Always, the kubelet contacts the registry on every container start and downloads the image again whenever the tag points to a new digest, which slows startup for large images. For pinned tags, consider setting it to IfNotPresent.
Readiness/Liveness Probes: Misconfigured probes can make the pod stay in a not-ready state. Check the probe configurations and ensure the endpoints respond within the expected time.
Node Capacity: Confirm that the node has enough capacity for the pod’s resource requests. Use kubectl describe node <node-name> for node details.
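To see where startup time is actually spent, the pod's condition timestamps give a rough timeline (scheduled, initialized, containers ready, ready). This is a minimal sketch; pod_timeline is a hypothetical helper name, and it relies on the timestamps sorting correctly as text.

```shell
# Print the pod's conditions ordered by the time each last changed,
# as "timestamp<TAB>condition" lines.
pod_timeline() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.conditions[*]}{.lastTransitionTime}{"\t"}{.type}{"\n"}{end}' |
    sort
}
```

A long gap between PodScheduled and Initialized suggests slow init containers or image pulls, while a long gap before Ready points at the application or its readiness probe.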
3. What do you do if a pod is not responding to readiness/liveness probes?
Steps:
Describe the Pod: Use kubectl describe pod <pod-name> to check whether the pod is failing the probe and why.
Check Probe Configurations: Verify that the readiness and liveness probe settings are correct (e.g., paths, ports, initial delay, and timeout).
Test Probe Endpoints: Use kubectl exec to connect to the pod and manually check if the endpoint is accessible.
Adjust Timeout and Interval: Sometimes, increasing the timeout or interval for probes helps if the application takes longer to respond.
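Two of these steps lend themselves to a quick sketch: estimating how long the probe settings give the application, and calling the probe endpoint from inside the container. Both helper names are hypothetical. The budget formula is an approximation (initial delay plus period times failure threshold), and probe_http assumes the image contains wget, which many minimal images do not.

```shell
# Approximate seconds before a continuously failing probe takes effect.
# Arguments: initialDelaySeconds periodSeconds failureThreshold
probe_budget() {
  echo $(( $1 + $2 * $3 ))
}

# Request the probe path from inside the pod with a 2-second timeout.
# Arguments: pod-name port path (the path must start with "/")
probe_http() {
  kubectl exec "$1" -- wget -q -O- -T 2 "http://localhost:$2$3"
}
```

With an initial delay of 10 seconds, a period of 10 seconds, and a failure threshold of 3, probe_budget 10 10 3 prints 40; an application that needs longer than that to respond will keep failing its probe.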
4. How do you troubleshoot high resource usage in a pod?
Steps:
Inspect Pod Metrics: Use kubectl top pod <pod-name> to check the CPU and memory usage (this requires the Metrics Server to be running in the cluster).
Logs: High resource usage can sometimes be explained by the logs (kubectl logs <pod-name>), which may show errors, retry loops, or unusually heavy processing.
Limit and Request Adjustments: If the pod needs more resources than initially requested, consider adjusting the limits and requests.
Optimize the Application: Profile and optimize application code or configuration to reduce resource consumption.
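When many pods are involved, filtering the kubectl top output is quicker than reading it. This is a minimal sketch; pods_over_cpu is a hypothetical helper name, and it assumes CPU is reported in millicores with an "m" suffix (for example 250m).

```shell
# Print pods in the current namespace whose CPU usage exceeds a threshold.
# Argument: threshold in millicores, e.g. 500
pods_over_cpu() {
  kubectl top pod --no-headers | awk -v limit="$1" '
    { cpu = $2; sub(/m$/, "", cpu) }        # strip the millicore suffix
    cpu + 0 > limit + 0 { print $1, $2 }'   # numeric comparison
}
```

For a per-container breakdown of a single pod, kubectl top pod <pod-name> --containers shows which container is responsible.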
Advanced Questions
1. How can you troubleshoot inter-pod communication issues?
Steps:
Network Policies: Check if network policies block communication between pods. Run kubectl describe networkpolicy to review policies.
DNS Resolution: Use kubectl exec <pod> -- nslookup <service-name> to verify if DNS resolution works.
Connectivity Testing: Use kubectl exec to run ping or curl commands between pods to ensure connectivity.
Check Services and Endpoints: Verify that the service has the correct endpoints using kubectl get endpoints <service-name>.
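A service with no endpoints is one of the most common causes, so that check is worth automating. The following is a minimal sketch; service_has_endpoints is a hypothetical helper name.

```shell
# Print the ready endpoint IPs behind a service, or fail with a hint
# when there are none.
service_has_endpoints() {
  endpoint_ips=$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$endpoint_ips" ]; then
    echo "no ready endpoints for $1: check the selector and pod readiness"
    return 1
  fi
  echo "$1 -> $endpoint_ips"
}
```

An empty result usually means the service selector does not match the pod labels or the matching pods are failing their readiness probes.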
2. How do you handle pod evictions?
Steps:
Describe the Node: Run kubectl describe node <node-name> to check for resource pressure (memory, disk, or process IDs).
Check Pod Status: kubectl describe pod <pod-name> will show if the pod was evicted and which resource constraint triggered the eviction.
Set Pod Priority: If the pod is critical, assign it a higher priority using a PriorityClass.
Consider Node Scaling: If evictions are frequent, consider adding more nodes or increasing node capacity.
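The node and pod checks can be reduced to two short queries. This is a sketch with hypothetical helper names; evicted_pods only looks at the current namespace.

```shell
# Print node conditions that signal pressure and are currently True,
# e.g. "MemoryPressure=True". Returns non-zero when there are none.
node_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    grep 'Pressure=True$'
}

# List the names of evicted pods in the current namespace.
evicted_pods() {
  kubectl get pods \
    -o jsonpath='{range .items[?(@.status.reason=="Evicted")]}{.metadata.name}{"\n"}{end}'
}
```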
3. How do you troubleshoot persistent volume (PV) mount issues in a pod?
Steps:
Describe the Pod: kubectl describe pod <pod-name> will show events if the PV mount fails.
Verify PVC and PV Status: Use kubectl get pvc and kubectl get pv to check that the PVC is Bound to an available PV.
Check Storage Class: Ensure the pod’s PVC requests a valid storage class. Use kubectl get sc to list storage classes.
Permissions: Check that the volume's access modes and file ownership allow the pod to use it. Certain storage providers may require specific access permissions.
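The PVC status check maps naturally onto the three claim phases. Here is a minimal sketch; pvc_check is a hypothetical helper name and the hints are only starting points.

```shell
# Report a PVC's phase together with a hint about what to look at next.
pvc_check() {
  pvc_phase=$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')
  case "$pvc_phase" in
    Bound)   echo "$1 is Bound" ;;
    Pending) echo "$1 is Pending: check its storage class and provisioner events" ;;
    Lost)    echo "$1 is Lost: the backing PV no longer exists" ;;
    *)       echo "$1 has unexpected phase '$pvc_phase'" ;;
  esac
}
```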
4. How do you troubleshoot image pull issues in Kubernetes?
Steps:
Check Pod Events: Run kubectl describe pod <pod-name> to check for image pull errors, such as ErrImagePull or ImagePullBackOff.
Validate Image Credentials: If pulling from a private registry, ensure the image pull secrets are correctly configured.
Inspect Image Name and Tag: Verify that the image name and tag are correctly specified in the pod spec.
Registry Availability: Confirm the container registry is accessible from your cluster’s network.
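Inspecting the image name and tag can be partly automated. The sketch below lists the images a pod references and reports how each reference pins its version. Both helper names are hypothetical, and the parsing is deliberately simple: it only looks at the last path segment so that a registry port is not mistaken for a tag.

```shell
# Print the image references used by a pod's containers.
pod_images() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

# Report how an image reference pins its version:
# digest, latest, tag, or untagged (which defaults to latest).
image_pin() {
  image_name=${1##*/}   # drop registry and repository path
  case "$image_name" in
    *@*)      echo "digest" ;;
    *:latest) echo "latest" ;;
    *:*)      echo "tag" ;;
    *)        echo "untagged" ;;
  esac
}
```

An untagged or latest reference is worth a second look, since a typo in the name or a tag that was never pushed produces the same pull errors.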
5. How do you investigate and troubleshoot CPU throttling in a pod?
Steps:
Check CPU Requests and Limits: CPU throttling occurs when a container reaches its CPU limit. Review and adjust these values in the pod spec if needed.
Monitor Pod Metrics: Use kubectl top pod <pod-name> to see current CPU usage. Look for high CPU usage relative to the limit.
Profile Application Performance: CPU-intensive workloads may need optimization or distribution across more pods to reduce throttling.
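kubectl top shows usage but not throttling itself; the kernel's cgroup counters do. This is a rough sketch that turns a container's cpu.stat into a throttling percentage. throttle_pct is a hypothetical helper name, and the file path in the usage note assumes cgroup v2 (on cgroup v1 the file is usually /sys/fs/cgroup/cpu/cpu.stat).

```shell
# Read cpu.stat on stdin and print the percentage of scheduler periods
# in which the container was throttled.
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"
    }'
}
```

One way to use it is kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat | throttle_pct. A value that stays high while kubectl top shows usage near the limit suggests raising the limit or spreading the load.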
Each of these scenarios combines Kubernetes commands and diagnostic approaches that help you understand and fix pod-related issues. For advanced cases, learning tools like Prometheus, Grafana, and kubectl-debug plugins can enhance troubleshooting capabilities.