Understanding Kubernetes: Fixing Exit Code 137 and Pod Termination

Have you ever encountered a mysterious exit code 137 in your Kubernetes logs, leaving you scratching your head and wondering what went wrong?
You’re not alone. According to a recent survey by the Cloud Native Computing Foundation (CNCF), 55% of Kubernetes users reported experiencing pod termination issues, with exit code 137 being one of the most common culprits.
In the world of Kubernetes, exit code 137 is like a cryptic message that something has gone awry with your containers. It’s a signal that can cause frustration and confusion, especially for those new to container orchestration. But fear not! This blog post will demystify exit code 137 and equip you with the knowledge to tackle this issue head-on.
We’ll dive deep into:
- What exit code 137 really means
- The common reasons behind its occurrence
- Step-by-step troubleshooting techniques
- Practical solutions to resolve and prevent this issue
Understanding and resolving exit code 137 is crucial for maintaining the stability and operational efficiency of your Kubernetes clusters. By the end of this article, you’ll have the tools and insights needed to keep your containers running smoothly and your deployments on track.
Let’s begin this journey to master one of Kubernetes’ most notorious exit codes and ensure your pods stay healthy and operational!
What is Exit Code 137 in Kubernetes?
Exit code 137 in Kubernetes indicates that a SIGKILL signal terminated a container. This signal is typically sent when a container exceeds its memory limits or needs to be forcefully shut down.
Understanding Exit Code 137
Exit code 137 is calculated by adding 128 to the signal number 9 (SIGKILL): 128 + 9 = 137.
When you see this code, it means:
- The container was terminated abruptly
- No graceful shutdown was possible
- The process couldn’t perform any cleanup operations
Common Scenarios Leading to SIGKILL
- Memory Limits Exceeded: The most frequent cause of exit code 137 is when a container uses more memory than allocated.
- Manual Termination: Administrators might forcefully terminate a container, for example with ‘kubectl delete pod <pod-name> --grace-period=0 --force’.
- Node Memory Pressure: When a Kubernetes node is under memory pressure, it may terminate containers to free up resources.
- Failed Health Checks: In some cases, failed health checks can result in a container being terminated with exit code 137.
Key Technical Terminology
OOMKilled (Out of Memory Killed):
- The status given to containers terminated due to exceeding memory limits.
- Kubernetes marks containers as OOMKilled when they’re terminated for memory-related issues.
Linux Signals:
- SIGKILL (Signal 9): The signal sent to terminate a process immediately.
- SIGTERM (Signal 15): A softer termination signal, usually sent before SIGKILL.
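In practice, a normal pod shutdown uses both signals: the kubelet sends SIGTERM first and only falls back to SIGKILL once the pod’s grace period expires. A minimal pod sketch showing where that grace period is configured (the name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-demo        # illustrative name
spec:
  # On deletion, Kubernetes sends SIGTERM to the container process; if it is
  # still running after this many seconds, it receives SIGKILL (exit code 137).
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      image: nginx:1.25               # placeholder image
```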
Understanding exit code 137 is crucial for troubleshooting container terminations and optimizing resource allocation in Kubernetes environments.
Why Does Exit Code 137 Occur?
Exit code 137 in Kubernetes can occur due to several reasons, all related to container termination. Let’s explore the most common causes:
Memory Resource Limits
The most frequent cause of exit code 137 is when a container exceeds its specified memory quota. This happens when:
- The container uses more memory than allocated in its pod definition
- Kubernetes terminates the container to prevent it from consuming excessive resources
- The OOM (Out of Memory) killer is triggered, resulting in the SIGKILL signal
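The limit the OOM killer enforces is the limits.memory value in the container spec. A minimal sketch of such a spec (name, image, and sizes are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-limited-app            # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          memory: "256Mi"             # used by the scheduler to place the pod
        limits:
          memory: "512Mi"             # exceeding this triggers the OOM killer (exit code 137)
```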
Node Resource Constraints
Sometimes, the issue stems from the node itself running out of available memory. This can happen when:
- Multiple containers on the node consume high amounts of memory
- The node’s total memory is insufficient for all running containers
- Kubernetes terminates containers to free up resources for critical system processes
Manual Termination
Exit code 137 can also result from intentional termination by a user or script. This occurs when:
- An administrator forcefully deletes a pod, for example with ‘kubectl delete pod <pod-name> --grace-period=0 --force’
- A script or automation tool sends a SIGKILL signal to the container
- The container is stopped abruptly without a graceful shutdown period
Underlying Application Issues
In some cases, the root cause lies within the application running in the container. Common issues include:
- Memory leaks: The application fails to release memory properly over time
- Inefficient resource utilization: Poor code optimization leads to excessive memory consumption
- Unexpected spikes in memory usage due to specific operations or user loads
Understanding these causes is crucial for effectively troubleshooting and preventing exit code 137 errors in your Kubernetes environment. By identifying the specific reason for the termination, you can implement targeted solutions to improve your container’s stability and resource management.
How to Identify Exit Code 137 in Your Kubernetes Cluster

Detecting exit code 137 in your Kubernetes cluster is crucial for maintaining optimal performance and preventing unexpected pod terminations. Here are some effective tools and methods to identify this issue:
Using kubectl Commands
1. kubectl logs:
- Run ‘kubectl logs <pod-name>’ to check for any memory-related errors or warnings.
- Look for messages indicating memory pressure or out-of-memory conditions.
2. kubectl describe pod:
- Execute ‘kubectl describe pod <pod-name>’ to view the pod’s detailed status, container states, and recent events.
- Check the container’s “State” or “Last State” for a “Terminated” state with reason “OOMKilled”, which corresponds to exit code 137.
- Review the “Events” section for any memory-related events or terminations.
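If you prefer inspecting the raw object, ‘kubectl get pod <pod-name> -o yaml’ exposes the same information in the pod’s status; the relevant fragment looks roughly like this (names and timestamps are placeholders):

```yaml
status:
  containerStatuses:
    - name: app                       # illustrative container name
      restartCount: 3
      lastState:
        terminated:
          exitCode: 137               # 128 + 9 (SIGKILL)
          reason: OOMKilled
          finishedAt: "2024-01-01T00:00:00Z"   # placeholder timestamp
```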
Inspecting Resource Usage
1. kubectl top:
- Use ‘kubectl top nodes’ to view CPU and memory usage of cluster nodes.
- Run ‘kubectl top pods’ to check resource consumption of individual pods.
- Look for pods or nodes with consistently high memory usage.
2. Monitoring Dashboards:
- Utilize tools like Grafana to create custom dashboards for resource monitoring.
- Set up alerts for high memory usage or frequent OOMKilled events.
Leveraging Logs and Events
1. Cluster-wide Events:
- Run ‘kubectl get events --sort-by=.metadata.creationTimestamp’ to view recent cluster events.
- Filter for memory-related events or pod terminations.
2. Application Logs:
- Analyze application logs for memory-related errors or unusual patterns.
- Look for indicators of memory leaks or inefficient resource usage.
3. Node Logs:
- Check node-level logs for system memory pressure indications.
- Use ‘journalctl’ on the node to investigate system-level memory issues.
Proactive Monitoring
1. Resource Quotas and Limits:
- Regularly review and adjust resource quotas and limits in pod specifications.
- Use ‘kubectl describe resourcequota’ to check current quota usage.
2. Trend Analysis:
- Implement long-term monitoring to identify trends in memory usage.
- Use tools like Prometheus for collecting and analyzing historical data.
By combining these methods, you can effectively identify and address exit code 137 issues in your Kubernetes cluster, ensuring better stability and performance of your applications.
Fixing Exit Code 137: Troubleshooting Steps
When encountering exit code 137 in your Kubernetes environment, follow these step-by-step solutions to resolve the issue and prevent its recurrence:
Analyze Resource Limits
1. Review pod configuration:
- Use ‘kubectl describe pod <pod_name>’ to check current resource settings.
- Look for memory requests and limits in the container spec.
2. Adjust memory limits:
- Increase memory limits if they’re too restrictive.
- Example: ‘resources.limits.memory: “512Mi”‘
3. Understand memory patterns:
- Use monitoring tools to analyze application memory usage over time.
- Identify peak usage periods and adjust limits accordingly.
Optimize Applications
1. Fix memory leaks:
- Review application code for potential memory leaks.
- Use profiling tools to identify problematic areas.
2. Optimize resource-heavy processes:
- Refactor code to reduce memory consumption.
- Consider using more efficient algorithms or data structures.
Scale Your Cluster
1. Add node resources:
- Use ‘kubectl top nodes’ to check current node utilization.
- Scale up nodes if cluster-wide resources are constrained.
2. Implement autoscaling:
- Set up Horizontal Pod Autoscaler (HPA) for automatic scaling.
- Configure Cluster Autoscaler to dynamically adjust node count.
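As one possible sketch of step 2, assuming the metrics-server is installed, an autoscaling/v2 HorizontalPodAutoscaler can add replicas as memory utilization rises (the target Deployment and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa                       # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                         # placeholder Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75      # average usage as a percentage of requested memory
```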
Use Monitoring and Alerts
1. Implement proactive monitoring:
- Set up Prometheus and Grafana for comprehensive resource tracking.
- Create dashboards to visualize memory usage trends.
2. Configure alerts:
- Set up alerts for high memory usage or frequent OOMKilled events.
- Use tools like AlertManager to notify teams of potential issues.
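As a sketch of such an alert, assuming kube-state-metrics and the Prometheus Operator are installed, a PrometheusRule can fire whenever a container’s last termination reason is OOMKilled (rule name and labels are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: oomkill-alerts                # illustrative name
spec:
  groups:
    - name: memory.rules
      rules:
        - alert: ContainerOOMKilled
          # kube-state-metrics exposes the last termination reason per container
          expr: sum(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}) by (namespace, pod) > 0
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Container in {{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"
```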
Configure Eviction Policies
1. Set appropriate QoS classes:
- Assign Guaranteed or Burstable QoS to critical pods.
- Example: Set both requests and limits for Guaranteed QoS (see the sketches after this list).
2. Adjust kubelet eviction thresholds:
- Configure soft and hard eviction thresholds.
- Example: ‘--eviction-hard=memory.available<100Mi’
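Two hedged sketches of the ideas above: a pod whose matching requests and limits place it in the Guaranteed QoS class, and a kubelet configuration fragment with explicit eviction thresholds (all values are illustrative):

```yaml
# Guaranteed QoS: requests equal limits for every resource in every container
apiVersion: v1
kind: Pod
metadata:
  name: critical-app                  # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
---
# Kubelet eviction thresholds (set in the kubelet config file on each node)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"           # evict immediately when available memory drops below this
evictionSoft:
  memory.available: "200Mi"           # evict after the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"
```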
Best Practices for Avoiding Recurrence
1. Regular resource audits:
- Periodically review and adjust resource allocations.
- Use tools like Kubecost for cost and resource optimization.
2. Implement graceful shutdown:
- Ensure applications can handle SIGTERM for proper cleanup.
- Set appropriate terminationGracePeriodSeconds in pod specs.
3. Use resource quotas:
- Implement namespace-level resource quotas to prevent overallocation.
- Example: ‘kubectl create quota compute-resources --hard=requests.cpu=1,limits.cpu=2,requests.memory=1Gi,limits.memory=2Gi’ (a manifest version is sketched after this list)
4. Educate development teams:
- Provide guidelines for efficient resource usage in Kubernetes.
- Encourage regular application profiling and optimization.
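The quota from step 3 can also be declared as a manifest; a minimal sketch, with the namespace name as a placeholder:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: team-a                   # illustrative namespace
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
```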
By following these steps and best practices, you can effectively troubleshoot and prevent exit code 137 issues, ensuring a more stable and efficient Kubernetes environment.

Best Practices to Prevent Exit Code 137
Preventing pod terminations due to exit code 137 is crucial for maintaining a stable Kubernetes environment. Here are some best practices to help you avoid these issues:
Proactive Monitoring and Alerts
1. Implement robust monitoring systems:
- Use tools like Grafana and Prometheus to track resource usage.
- Monitor key metrics such as CPU, memory, network traffic, and pod status.
2. Set up alerts for resource thresholds:
- Create alerts from panel data in Kubernetes monitoring dashboards.
- Configure notifications for finance and DevOps teams when resources approach limits.
3. Use custom metrics:
- Enable application-specific metrics to measure errors and unexpected behaviours.
- Set up alerts based on these custom metrics for early detection of issues.
Regular Code Audits and Memory Optimization
1. Periodic code reviews:
- Regularly audit and optimize application code to reduce memory footprint.
- Refactor complex functions and optimize data structures.
2. Use profiling tools:
- Employ tools like HeapTrack, Memcheck, or LeakSanitizer during development.
- Analyze memory usage patterns to identify potential leaks or inefficiencies.
3. Optimize container images:
- Build small container images to reduce resource overhead.
- Remove unnecessary dependencies and files from your containers.
Ensuring Fair Resource Allocation
1. Set appropriate resource limits and requests:
- Define accurate memory and CPU limits for all containers.
- Use ‘kubectl describe pod’ to review current resource settings.
2. Utilize Kubernetes QoS classes:
- Understand and leverage Guaranteed, Burstable, and BestEffort QoS classes.
- Assign critical workloads to Guaranteed QoS for consistent resource availability.
3. Implement resource quotas:
- Use namespace-level resource quotas to prevent overallocation.
- Regularly review and adjust quotas based on application needs.
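As a complementary, hedged option, a LimitRange can give every container in a namespace default requests and limits so nothing runs unbounded (names and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits      # illustrative name
  namespace: team-a                   # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:                 # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:                        # applied when a container omits limits
        cpu: "500m"
        memory: "512Mi"
```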
By implementing these best practices, you can significantly reduce the occurrence of pod terminations due to exit code 137, ensuring a more stable and efficient Kubernetes environment for your applications.
How NudgeBee Helps with Kubernetes Troubleshooting
NudgeBee offers a comprehensive suite of tools designed to simplify and enhance Kubernetes management, with a particular focus on troubleshooting, cost optimization, and operational efficiency.
Troubleshooting Agent
NudgeBee’s Troubleshooting Agent significantly improves issue resolution in Kubernetes environments:
- Accelerates problem-solving from hours to minutes.
- Improves incident handling productivity by 3-5 times.
- Provides automated remediation and guided troubleshooting steps.
- Automatically detects issues like exit code 137 and other Kubernetes errors.
The agent offers clear, actionable instructions for resolving problems, making it particularly valuable for teams with varying levels of Kubernetes expertise.
FinOps Agent
NudgeBee’s FinOps Agent focuses on cost optimization in cloud-native environments:
- Achieves 30-60% cost reduction beyond existing manual efforts.
- Offers continuous real-time optimization.
- Includes features like right-sizing, autonomous optimization, and cost anomaly detection.
- Provides insights for optimal container limits and configuration improvements.
CloudOps Agent
The CloudOps Agent enhances operational efficiency:
- Boosts operational productivity by 100-200%.
- Automates time-consuming actions.
- Integrates with existing tools and supports automated remediation.
- Streamlines tasks with a customized runbook library.
Additional Features
NudgeBee also offers:
- Certificate management and expiry tracking
- Image CVE scanning
- Compliance – CIS Scan reports
- Kubernetes version upgrade dependency reports
- Helm chart upgrade dependency reports
- Integration with Jira, Git, Slack, MS Teams, and Google Chat
NudgeBee’s platform is designed to be extensible, allowing teams to add new agents, tools, and APIs for more internal use cases. It provides a comprehensive solution for Kubernetes management, addressing common challenges such as slow incident resolution, high cloud costs, and limited automation control.
Conclusion
Addressing exit code 137 is crucial for maintaining stable and efficient Kubernetes operations. This error, often indicative of memory-related issues, can significantly impact your application’s performance and reliability if left unchecked. Let’s summarize how to tackle exit code 137:
- Proactive Monitoring: Implementing robust monitoring practices is essential for early detection and prevention of exit code 137 and other Kubernetes issues.
- Resource Management: Properly configuring resource limits and requests is critical to prevent OOMKilled errors and ensure optimal cluster performance.
- Continuous Optimization: Regularly reviewing and adjusting your Kubernetes configurations can help prevent resource-related issues and improve overall efficiency.
- Automated Solutions: Leveraging tools like NudgeBee can significantly enhance your ability to troubleshoot and resolve Kubernetes issues quickly.
The Value of Guided Solutions
NudgeBee’s suite of tools offers significant advantages for Kubernetes management:
- Rapid Issue Resolution: NudgeBee’s Troubleshooting Agent accelerates problem-solving from hours to minutes, improving incident handling productivity by 3-5 times.
- Cost Optimization: The FinOps Agent achieves 30-60% cost reduction beyond existing manual efforts through continuous real-time optimization.
- Operational Efficiency: The CloudOps Agent enhances operational productivity by 100-200%, automating time-consuming actions and integrating with existing tools.
By implementing these solutions and best practices, you can create a more resilient, efficient, and cost-effective Kubernetes environment.
Want to resolve Kubernetes issues faster? Try NudgeBee and experience the difference in your Kubernetes operations.