
pod

Here’s a breakdown of Kubernetes pod troubleshooting questions across basic, intermediate, and advanced levels, along with explanations and approaches for each:


Basic Questions

1. How do you troubleshoot a pod that is in a CrashLoopBackOff state?

Steps:

  • Check Logs: Use kubectl logs <pod-name> to see the current container's output, and kubectl logs <pod-name> --previous to see the logs of the instance that just crashed, which usually contain the error. If the pod has multiple containers, specify one with -c <container-name>.

  • Describe the Pod: Run kubectl describe pod <pod-name> to view recent events, restart reasons, and any error messages.

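  • Check Exit Codes: kubectl describe pod shows the Last State, Reason, and Exit Code of the terminated container. Common values include 127 (command not found) and 137, which means the process was killed with SIGKILL (128 + 9); a Reason of OOMKilled confirms the container exceeded its memory limit.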

  • Fix the Issue: Based on logs and exit codes, fix issues like incorrect commands, missing environment variables, or configuration errors.
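The log and exit-code checks above can be partly scripted. The following is a minimal Python sketch, assuming you saved the pod with kubectl get pod <pod-name> -o json > pod.json: it reads each container's lastState.terminated entry and translates the exit code. The helper names and lookup tables are illustrative, not part of kubectl, and they cover only a few common codes.

```python
import json

# Illustrative lookup for a few common exit codes; not exhaustive.
KNOWN_EXIT_CODES = {
    0: "completed successfully",
    1: "generic application error",
    126: "command found but not executable",
    127: "command not found",
}

# Linux signal numbers commonly seen as 128 + N exit codes.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable hint."""
    if code in KNOWN_EXIT_CODES:
        return KNOWN_EXIT_CODES[code]
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"killed by {name}"
    return "application-specific exit code"


def last_terminations(pod: dict) -> list:
    """Return (container, restartCount, reason, hint) for each container
    whose previous instance terminated, from `kubectl get pod -o json`."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue  # no previous termination recorded for this container
        code = terminated.get("exitCode", 0)
        results.append(
            (status["name"], status.get("restartCount", 0),
             terminated.get("reason", ""), explain_exit_code(code))
        )
    return results


def load_pod(path: str) -> dict:
    """Load a pod saved with: kubectl get pod <pod-name> -o json > pod.json"""
    with open(path) as fh:
        return json.load(fh)
```

For an OOMKilled container, last_terminations(load_pod("pod.json")) would report the reason OOMKilled with the hint "killed by SIGKILL"; the reason field, not the exit code alone, is what confirms an OOM kill.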

2. What do you do if a pod is in a Pending state?

Steps:

  • Check Node Availability: Run kubectl get nodes to confirm nodes are available and ready.

  • Describe the Pod: kubectl describe pod <pod-name> shows FailedScheduling events that explain why no node fits, such as insufficient CPU or memory, taints the pod does not tolerate, or an unbound PersistentVolumeClaim.

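  • Check Resource Requests: The scheduler compares the pod's resource requests (not actual usage) with each node's allocatable capacity minus the requests of pods already running there. If no node has enough room, lower the requests in the YAML spec or add capacity.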

  • Verify Node Selectors, Affinity, and Tolerations: Ensure nodeSelector and node affinity rules match labels that exist on your nodes, and that the pod tolerates any taints on the target node pool.
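To see why a request does not fit, compare it with the Allocatable and Allocated resources sections of kubectl describe node. Below is a simplified Python sketch of that comparison. The function names are hypothetical, only a subset of Kubernetes quantity suffixes is handled, and the real scheduler also accounts for pod overhead, init containers, extended resources, taints, and affinity.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes. Two-letter suffixes come first so
# "Mi" is matched before "M"; "m" (milli) is case-sensitive and distinct from "M".
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
    ("m", Decimal("0.001")),
]


def parse_quantity(q: str) -> Decimal:
    """Parse quantities such as '500m', '2', '512Mi' or '1G'."""
    q = q.strip()
    for suffix, factor in _SUFFIXES:
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def insufficient_resources(pod_requests: dict, allocatable: dict,
                           already_requested: dict) -> list:
    """Return the resources the node cannot fit (empty list means it fits).

    Like the scheduler, this compares *requests* (not live usage) with
    allocatable capacity minus the requests of pods already on the node.
    """
    short = []
    for name, request in pod_requests.items():
        free = (parse_quantity(allocatable.get(name, "0"))
                - parse_quantity(already_requested.get(name, "0")))
        if parse_quantity(request) > free:
            short.append(name)
    return short
```

For example, a pod requesting 500m CPU does not fit on a node with 2 allocatable cores of which 1800m are already requested, even if the node's measured CPU usage is low.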

3. How do you access the terminal of a running container in a pod?

Solution:

  • Use kubectl exec -it <pod-name> -c <container-name> -- /bin/sh or /bin/bash to access the terminal.

  • Minimal or distroless images may have no shell at all. In that case, attach an ephemeral debug container that has one, for example kubectl debug -it <pod-name> --image=busybox --target=<container-name>.

4. How can you restart a pod in Kubernetes?

Solution:

  • kubectl has no command to restart a pod in place. Instead:

    • Delete the pod: kubectl delete pod <pod-name>. If it is managed by a controller (e.g., a deployment), it will be recreated automatically.

    • For deployments, use kubectl rollout restart deployment <deployment-name>. It triggers a rolling replacement of every pod in the deployment; the same command works for StatefulSets and DaemonSets.
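kubectl rollout restart works by stamping the pod template with a kubectl.kubernetes.io/restartedAt annotation. Because the template changes, the controller rolls out new pods. A minimal Python sketch that builds an equivalent patch body follows; the function name is illustrative, and applying the result with kubectl patch deployment <deployment-name> -p '<json>' is an assumed workflow, not something the rollout command requires.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def restart_patch(now: Optional[datetime] = None) -> dict:
    """Build a patch that changes the pod template and so forces a rollout."""
    now = now or datetime.now(timezone.utc)
    # RFC 3339 timestamp; any new value changes the template hash.
    stamp = now.astimezone(timezone.utc).isoformat(timespec="seconds")
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"kubectl.kubernetes.io/restartedAt": stamp}
                }
            }
        }
    }


if __name__ == "__main__":
    # Print JSON suitable for: kubectl patch deployment <deployment-name> -p '...'
    print(json.dumps(restart_patch()))
```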


Intermediate Questions

1. How do you troubleshoot a pod stuck in the ContainerCreating state?

Steps:

  • Describe the Pod: Run kubectl describe pod <pod-name> to see events. Look for common issues like volume mount failures, image pull errors, or network issues.

  • Check Node Disk Space: Low disk space on the node can make image pulls or container creation fail. Use kubectl describe node <node-name> and look for a DiskPressure condition and other resource shortages.

  • Check ImagePull Policy: Ensure the imagePullPolicy is set correctly. Use kubectl get pod <pod-name> -o yaml to review and confirm policies.

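  • Check Network and Storage Setup: FailedCreatePodSandBox events usually point to a CNI network plugin problem, while FailedMount or FailedAttachVolume events point to storage. Verify that the referenced StorageClasses, PVCs, ConfigMaps, and Secrets exist and are properly configured.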
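When kubectl describe pod lists many events, grouping the Warning events by reason makes the cause easier to spot. A Python sketch that works on the JSON from kubectl get events --field-selector involvedObject.name=<pod-name> -o json; the hint table is an illustrative assumption, not an exhaustive list of event reasons.

```python
from collections import Counter

# Illustrative hints for event reasons often seen while a pod is ContainerCreating.
HINTS = {
    "FailedMount": "volume could not be mounted (missing PVC, ConfigMap, Secret, or CSI error)",
    "FailedAttachVolume": "volume could not be attached to the node",
    "FailedCreatePodSandBox": "pod sandbox/network setup failed (check the CNI plugin)",
    "Failed": "container or image operation failed (read the message, e.g. pull errors)",
}


def summarize_warnings(events: dict) -> list:
    """Return (reason, occurrences, hint) tuples, most frequent first."""
    counts = Counter()
    for event in events.get("items", []):
        if event.get("type") != "Warning":
            continue  # skip Normal events such as Scheduled or Pulling
        # `count` may be missing or null on some event objects; treat it as 1.
        counts[event.get("reason", "Unknown")] += event.get("count") or 1
    return [(reason, n, HINTS.get(reason, "see the event message"))
            for reason, n in counts.most_common()]
```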

2. How can you troubleshoot slow pod startups?

Steps:

  • Check Resource Requests and Limits: A CPU limit set too low throttles CPU-heavy startup work, and overloaded nodes slow every container on them. Make sure requests and limits reflect what the application needs during startup.

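  • Image Pull Policy: With imagePullPolicy: Always, the kubelet contacts the registry on every container start. Cached layers are reused, but the lookup adds latency and fails if the registry is unreachable. For immutable, versioned tags, IfNotPresent avoids the round trip. Large images slow the first pull on each node regardless, so reducing image size also helps.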

  • Readiness/Liveness Probes: Misconfigured probes can keep the pod in a not-ready state or, for liveness probes, restart the container before it finishes starting. For slow-starting applications, a startupProbe holds off liveness and readiness checks until the application is up.

  • Node Capacity: Confirm that the node has enough capacity for the pod’s resource requests. Use kubectl describe node <node-name> for node details.

3. What do you do if a pod is not responding to readiness/liveness probes?

Steps:

  • Describe the Pod: Use kubectl describe pod <pod-name> to check if the pod is failing the probe and why.

  • Check Probe Configurations: Verify that the readiness and liveness probe settings are correct (e.g., paths, ports, initial delay, and timeout).

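  • Test Probe Endpoints: Use kubectl exec to send the same request from inside the container, for example kubectl exec <pod-name> -- wget -qO- http://localhost:<port>/<path> if the image includes wget, and confirm it responds within the probe's timeout.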

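  • Adjust Timing: If the application is healthy but slow to respond, increase timeoutSeconds, periodSeconds, or failureThreshold. A failing liveness probe restarts the container, while a failing readiness probe only removes the pod from Service endpoints.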
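A quick sanity check is how long a container can keep failing a probe before the kubelet acts. The following rough Python sketch assumes the standard probe fields with their documented defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3). It ignores timeoutSeconds and scheduling jitter, so treat the result as an approximation. The observed startup time is a hypothetical value you would measure yourself.

```python
# Documented Kubernetes defaults for the fields used here.
DEFAULTS = {"initialDelaySeconds": 0, "periodSeconds": 10, "failureThreshold": 3}


def failure_window(probe: dict) -> int:
    """Approximate seconds, from container start, that the probe may keep
    failing before its failureThreshold is reached."""
    p = {**DEFAULTS, **probe}
    return p["initialDelaySeconds"] + p["failureThreshold"] * p["periodSeconds"]


def liveness_too_aggressive(container: dict, observed_startup_seconds: int) -> bool:
    """True if liveness checks could kill the container before it is up.

    When a startupProbe is present, liveness checks wait until it succeeds,
    so the startup probe's window is the one that must cover startup time.
    """
    if "startupProbe" in container:
        return failure_window(container["startupProbe"]) < observed_startup_seconds
    if "livenessProbe" in container:
        return failure_window(container["livenessProbe"]) < observed_startup_seconds
    return False
```

With the defaults, a liveness probe tolerates roughly 30 seconds of failures, so an application that needs a minute to start is better served by a startupProbe than by a longer liveness delay.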

4. How do you troubleshoot high resource usage in a pod?

Steps:

  • Inspect Pod Metrics: Use kubectl top pod <pod-name> (add --containers for a per-container breakdown) to check CPU and memory usage. This requires the metrics-server add-on.

  • Logs: kubectl logs <pod-name> can reveal the cause of high resource usage, such as retry loops, repeated errors, or unusually heavy processing.

  • Limit and Request Adjustments: If the pod genuinely needs more than it requests, raise the requests and limits. A container that exceeds its memory limit is OOMKilled, while one that hits its CPU limit is throttled.

  • Optimize the Application: Profile and optimize application code or configuration to reduce resource consumption.
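To compare kubectl top numbers with a pod's limits, a small Python sketch can parse the command's text output. It assumes the default table layout (NAME, CPU(cores), MEMORY(bytes)) with CPU in millicores and memory carrying a Ki/Mi/Gi suffix, as kubectl usually prints it; the helper names are illustrative.

```python
# Binary memory suffixes that kubectl top typically prints.
_MEM_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def _cpu_millicores(value: str) -> float:
    """'250m' -> 250.0 millicores; a bare number is whole cores."""
    return float(value[:-1]) if value.endswith("m") else float(value) * 1000


def _memory_bytes(value: str) -> float:
    """'300Mi' -> bytes; a bare number is already bytes."""
    for suffix, factor in _MEM_FACTORS.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)


def parse_top(output: str) -> dict:
    """Parse `kubectl top pod` text into {pod: (millicores, bytes)}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, mem = line.split()[:3]
        usage[name] = (_cpu_millicores(cpu), _memory_bytes(mem))
    return usage


def percent_of_limit(used: float, limit: float) -> float:
    """Usage as a percentage of the limit, rounded to one decimal place."""
    return round(100 * used / limit, 1)
```

Memory that sits close to 100% of its limit is a sign the container is at risk of being OOMKilled; CPU near its limit suggests throttling.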


Advanced Questions

1. How can you troubleshoot inter-pod communication issues?

Steps:

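  • Network Policies: Check whether NetworkPolicies block traffic between the pods. Run kubectl get networkpolicy -n <namespace> and kubectl describe networkpolicy <name> -n <namespace> to review them; policies are only enforced if the cluster's CNI plugin supports them.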

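  • DNS Resolution: Use kubectl exec <pod> -- nslookup <service-name> to verify that DNS resolution works. A short service name resolves only within the same namespace; from another namespace use <service-name>.<namespace>.svc.cluster.local (cluster.local is the default cluster domain).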

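  • Connectivity Testing: Use kubectl exec to run curl (or nc) against the target pod or service port. Prefer TCP/HTTP checks over ping: Service ClusterIPs are virtual and typically do not answer ICMP, so a failed ping does not prove the service is down.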

  • Check Services and Endpoints: Verify that the service has endpoints using kubectl get endpoints <service-name>. An empty list usually means the service's selector does not match any ready pods.
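If the image lacks curl or nslookup but ships a Python interpreter, a short script run through kubectl exec can perform the same DNS and TCP checks. This is a sketch. The service and namespace names are placeholders. The sketch also assumes cluster.local as the cluster domain; that is only the default, and your cluster may use a different one.

```python
import socket


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified Service name; cluster.local is only the usual default."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(host: str) -> list:
    """Return the addresses a name resolves to (raises socket.gaierror on failure)."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Open a TCP connection, as curl would, instead of relying on ICMP ping."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False
```

For example, tcp_reachable(service_fqdn("my-svc", "default"), 80), where my-svc is a hypothetical service, distinguishes a DNS failure (resolve raises) from a port that is not accepting connections (returns False).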

2. How do you handle pod evictions?

Steps:

  • Describe the Node: Run kubectl describe node <node-name> to check for resource pressure (CPU, memory, or disk).

  • Check Pod Status: Evicted pods show Status: Failed and Reason: Evicted in kubectl describe pod <pod-name>, with a message naming the resource the node was low on.

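  • Set Requests and Priority: Under node pressure, the kubelet first evicts pods whose usage exceeds their requests, ranked by priority and then by how far usage exceeds requests. Setting accurate requests (or requests equal to limits, for Guaranteed QoS) and assigning critical pods a higher PriorityClass makes them less likely to be evicted.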

  • Consider Node Scaling: If evictions are frequent, consider adding more nodes or increasing node capacity.
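Evicted pods remain in the API with phase Failed and reason Evicted until they are deleted or garbage-collected, so listing them together with their node helps show where pressure occurs. A Python sketch over the output of kubectl get pods -A -o json; the function name is illustrative.

```python
def evicted_pods(pod_list: dict) -> list:
    """Return (namespace, name, node, message) for every evicted pod in
    `kubectl get pods -A -o json` output."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            found.append((
                meta.get("namespace", ""),
                meta.get("name", ""),
                pod.get("spec", {}).get("nodeName", ""),  # node that evicted it
                status.get("message", ""),                # names the scarce resource
            ))
    return found
```

If most tuples share the same node, that node is the one to inspect with kubectl describe node.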

3. How do you troubleshoot persistent volume (PV) mount issues in a pod?

Steps:

  • Describe the Pod: kubectl describe pod <pod-name> will show events if the PV mount fails.

  • Verify PVC and PV Status: Use kubectl get pvc and kubectl get pv. The PVC's STATUS should be Bound; a Pending claim means no matching PV exists or dynamic provisioning failed.

  • Check Storage Class: Ensure the pod's PVC requests an existing storage class, or that a default StorageClass exists if none is specified. Use kubectl get sc to list storage classes.

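  • Access Modes and Permissions: Check that the volume's access mode fits how it is used (a ReadWriteOnce volume can be mounted read-write by only one node at a time) and that the pod can write to it; some storage providers require a securityContext setting such as fsGroup to set file ownership.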
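A Python sketch of the PVC checks above, working on the JSON from kubectl get pvc -o json and kubectl get pods -o json for a single namespace. It flags claims that are not Bound, plus ReadWriteOnce claims used by pods on more than one node, a common cause of Multi-Attach errors. The function names are illustrative, and the sketch ignores ReadWriteOncePod and cross-namespace lookups.

```python
def unbound_claims(pvc_list: dict) -> list:
    """Return (name, phase, storageClassName, accessModes) for PVCs that
    are not Bound, from `kubectl get pvc -o json`."""
    problems = []
    for pvc in pvc_list.get("items", []):
        phase = pvc.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            spec = pvc.get("spec", {})
            problems.append((pvc["metadata"]["name"], phase,
                             spec.get("storageClassName"),
                             spec.get("accessModes", [])))
    return problems


def rwo_conflicts(pod_list: dict, pvc_list: dict) -> dict:
    """Map each ReadWriteOnce claim to the set of nodes using it, keeping
    only claims used from more than one node (assumes one namespace)."""
    rwo = {p["metadata"]["name"] for p in pvc_list.get("items", [])
           if "ReadWriteOnce" in p.get("spec", {}).get("accessModes", [])}
    nodes = {}
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        for vol in pod.get("spec", {}).get("volumes", []):
            claim = vol.get("persistentVolumeClaim", {}).get("claimName")
            if claim in rwo and node:  # unscheduled pods have no nodeName
                nodes.setdefault(claim, set()).add(node)
    return {claim: used_on for claim, used_on in nodes.items() if len(used_on) > 1}
```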

4. How do you troubleshoot image pull issues in Kubernetes?

Steps:

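  • Check Pod Events: Run kubectl describe pod <pod-name> to look for image pull errors such as ErrImagePull or ImagePullBackOff, and read the accompanying message (for example, an image that was not found or an unauthorized request).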

  • Validate Image Credentials: If pulling from a private registry, ensure an image pull secret exists in the pod's namespace and is referenced in imagePullSecrets, either in the pod spec or in its service account.

  • Inspect Image Name and Tag: Verify that the image name and tag are correctly specified in the pod spec.

  • Registry Availability: Confirm the container registry is accessible from your cluster’s network.
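Many pull failures come from a mistyped reference or an unexpected default, such as Docker Hub as the registry or latest as the tag. A simplified Python sketch of how a reference like registry/repository:tag@digest is usually split; it follows the common Docker normalization rules as an assumption and skips validation such as allowed characters.

```python
from typing import Optional, Tuple


def parse_image_ref(ref: str) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Split an image reference into (registry, repository, tag, digest)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    # A tag is a colon after the last slash; a colon before it is a registry port.
    name, tag = ref, None
    if ref.rfind(":") > ref.rfind("/"):
        colon = ref.rfind(":")
        name, tag = ref[:colon], ref[colon + 1:]

    # The first path component is a registry only if it looks like a host.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repo = first, rest
    else:
        registry, repo = "docker.io", name
        if "/" not in repo:
            repo = "library/" + repo  # official Docker Hub images

    if tag is None and digest is None:
        tag = "latest"  # the implicit default tag
    return registry, repo, tag, digest
```

For example, parse_image_ref("nginx") yields ("docker.io", "library/nginx", "latest", None), which shows that an untagged image silently resolves to latest on Docker Hub.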

5. How do you investigate and troubleshoot CPU throttling in a pod?

Steps:

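  • Check CPU Requests and Limits: A CPU limit is enforced as a CFS quota per scheduling period (100 ms by default), so a container is throttled whenever it exhausts its quota within a period, even if its average usage stays below the limit. Review and adjust the limit in the pod spec if needed.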

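  • Monitor Pod Metrics: kubectl top pod <pod-name> shows averaged CPU usage, which can hide short bursts. To measure throttling itself, check the container_cpu_cfs_throttled_periods_total metric in Prometheus, or the nr_throttled counter in the container's cgroup cpu.stat file.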

  • Profile Application Performance: CPU-intensive workloads may need optimization or distribution across more pods to reduce throttling.
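The cgroup counters make throttling measurable. The Python sketch below assumes cgroup v2, where the file is /sys/fs/cgroup/cpu.stat inside the container, read twice, for example a minute apart, with kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat. On cgroup v1 the file lives under the cpu controller directory but also has nr_periods and nr_throttled. The counters are cumulative, so the ratio is computed from the difference between two samples. The helper names are illustrative.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse cgroup cpu.stat 'key value' lines into integers."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttle_ratio(before: dict, after: dict) -> float:
    """Fraction of CFS periods between two samples in which the container
    was throttled (0.0 if no periods elapsed)."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


def cfs_quota_usec(limit_millicores: int, period_usec: int = 100_000) -> int:
    """CPU time allowed per period for a limit: 500m -> 50000us per 100ms."""
    return limit_millicores * period_usec // 1000
```

A ratio well above zero while kubectl top shows usage below the limit is the burst pattern described above: the container uses its 50 ms (for a 500m limit) early in each 100 ms period and then waits.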


Each of these scenarios combines Kubernetes commands and diagnostic approaches that help you understand and fix pod-related issues. For advanced cases, tools such as Prometheus, Grafana, and kubectl debug can extend your troubleshooting capabilities.


Last updated 7 months ago