Error command failed with exit code 137 yarn


Contents

  1. Migrations crash with exit code 137 on VPS #4986
  2. Comments
  3. How do I resolve "Container killed on request. Exit code is 137" errors in Spark on Amazon EMR?
  4. Short description
  5. Resolution
  6. Increase driver or executor memory
  7. Add more Spark partitions
  8. Increase the number of shuffle partitions
  9. Reduce the number of executor cores
  10. Increase instance size
  11. How to Fix Exit Code 137 | Memory Issues
  12. What Is Exit Code 137?
  13. Causes of Container Memory Issues
  14. Container Memory Limit Exceeded
  15. Application Memory Leak
  16. Natural Increases in Load
  17. Requesting More Memory Than Your Compute Nodes Can Provide
  18. Running Too Many Containers Without Memory Limits
  19. Preventing Pods and Containers From Causing Memory Issues
  20. Setting Memory Limits
  21. Investigating Application Problems
  22. Using ContainIQ to Monitor and Debug Memory Problems
  23. Final Thoughts

Migrations crash with exit code 137 on VPS #4986

Issue type:

[x] question
[ ] bug report
[ ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[ ] mssql
[ ] mysql / mariadb
[ ] oracle
[x] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

Steps to reproduce or a small repository showing the problem:

Locally, all is fine. When I try to run migrations on a $5/mo DigitalOcean droplet, it hangs for a while and then dies. It seems likely it's because it runs out of memory, but is there a way around this without resizing the droplet every time I need to do a migration? That works, but is obviously kinda lame lol.


Did you end up fixing this @dannytatom? I am running into the same issue on an AWS micro box that I am trying to use for dev branches.

Had a further look into this. It is an npm run issue running out of memory rather than a typeorm migration one (unless the migration itself is taking too much memory, which is a different discussion).

You can either use a box with more memory or you can hack around it with swapfiles if you are on Ubuntu. (This is what I have done for my short-lived feature branches.)
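For reference, the swapfile workaround on Ubuntu looks roughly like the sketch below (the 1G size is an assumption; size it to cover the migration's peak memory). This is a stopgap for short-lived boxes, not a substitute for adequate RAM:

```shell
# Create and enable a 1 GiB swapfile (Ubuntu); requires root.
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile      # swap must not be world-readable
sudo mkswap /swapfile         # format the file as swap space
sudo swapon /swapfile         # enable it immediately
# To make it survive a reboot, add this line to /etc/fstab:
# /swapfile none swap sw 0 0
```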

In my case, I added --transpile-only to the migration script.
example:
DATABASE_NAME=myDatabase MIGRATION_DIR=myApp yarn run ts-node --transpile-only ./node_modules/typeorm/cli.js migration:run

Pretty sure this is an issue of not having enough memory on the smaller droplets.

If you can find that the migrations cause an undue amount of memory usage please include more information to help replicate.

Until then, I’ll be closing this.

Had a further look into this. It is an npm run issue running out of memory rather than a typeorm migration one (unless the migration itself is taking too much memory, which is a different discussion).

You can either use a box with more memory or you can hack around it with swapfiles if you are on Ubuntu. (This is what I have done for my short-lived feature branches.)

@hbthegreat how did you determine or figure out that it was because of npm run vs. the typeorm migration? Doesn't npm run execute the typeorm migration script, so they're technically the same thing, right?

Source

How do I resolve "Container killed on request. Exit code is 137" errors in Spark on Amazon EMR?

Last updated: 2022-08-01

My Apache Spark job on Amazon EMR fails with a "Container killed on request" stage failure:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 3.0 failed 4 times, most recent failure: Lost task 2.3 in stage 3.0 (TID 23, ip-xxx-xxx-xx-xxx.compute.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1516900607498_6585_01_000008 on host: ip-xxx-xxx-xx-xxx.compute.internal. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137

Short description

When a container (Spark executor) runs out of memory, YARN automatically kills it. This causes the "Container killed on request. Exit code is 137" error. These errors can happen in different job stages, both in narrow and wide transformations. YARN containers can also be killed by the OS oom_reaper when the OS is running out of memory, causing the "Container killed on request. Exit code is 137" error.

Resolution

Use one or more of the following methods to resolve "Exit status: 137" stage failures.

Increase driver or executor memory

Increase container memory by tuning the spark.executor.memory or spark.driver.memory parameters (depending on which container caused the error).

On a running cluster:

Modify spark-defaults.conf on the master node. Example:

sudo vim /etc/spark/conf/spark-defaults.conf
spark.executor.memory 10g
spark.driver.memory 10g

For a single job:

Use the --executor-memory or --driver-memory option to increase memory when you run spark-submit. Example:

spark-submit --executor-memory 10g --driver-memory 10g ...

Add more Spark partitions

If you can’t increase container memory (for example, if you’re using maximizeResourceAllocation on the node), then increase the number of Spark partitions. Doing this reduces the amount of data that’s processed by a single Spark task, and that reduces the overall memory used by a single executor. Use the following Scala code to add more Spark partitions:

val numPartitions = 500
val newDF = df.repartition(numPartitions)

Increase the number of shuffle partitions

If the error happens during a wide transformation (for example join or groupBy), add more shuffle partitions. The default value is 200.

On a running cluster:

Modify spark-defaults.conf on the master node. Example:

sudo vim /etc/spark/conf/spark-defaults.conf
spark.sql.shuffle.partitions 500

For a single job:

Use the --conf spark.sql.shuffle.partitions option to add more shuffle partitions when you run spark-submit. Example:

spark-submit --conf spark.sql.shuffle.partitions=500 ...

Reduce the number of executor cores

Reducing the number of executor cores reduces the maximum number of tasks that the executor processes simultaneously. Doing this reduces the amount of memory that the container uses.

On a running cluster:

Modify spark-defaults.conf on the master node. Example:

sudo vim /etc/spark/conf/spark-defaults.conf
spark.executor.cores 1

For a single job:

Use the --executor-cores option to reduce the number of executor cores when you run spark-submit. Example:

spark-submit --executor-cores 1 ...

Increase instance size

YARN containers can also be killed by the OS oom_reaper when the OS is running out of memory. If this error happens due to oom_reaper, use a larger instance with more RAM. You can also lower yarn.nodemanager.resource.memory-mb to keep YARN containers from using up all of the Amazon EC2 instance's RAM.
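The NodeManager ceiling is set in yarn-site.xml on the node; a sketch is shown below (the 12288 MB value is only illustrative — size it below the instance's physical RAM to leave headroom for the OS):

```xml
<!-- /etc/hadoop/conf/yarn-site.xml (illustrative value) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- Cap the total memory YARN may allocate to containers on this node -->
  <value>12288</value>
</property>
```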

You can detect if the error is due to oom_reaper by reviewing your Amazon EMR Instance logs for the dmesg command output. Start by finding the core or task node where the killed YARN container was running. You can find this information by using the YARN Resource Manager UI or logs. Then, check the Amazon EMR Instance state logs on this node before and after the container was killed to see what killed the process.

In the following example, the process with ID 36787, corresponding to YARN container_165487060318_0001_01_000244, was killed by the kernel (Linux’s OOM killer):

dmesg | tail -n 25

[ 3910.032284] Out of memory: Kill process 36787 (java) score 96 or sacrifice child
[ 3910.043627] Killed process 36787 (java) total-vm:15864568kB, anon-rss:13876204kB, file-rss:0kB, shmem-rss:0kB
[ 3910.748373] oom_reaper: reaped process 36787 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Source

How to Fix Exit Code 137 | Memory Issues

Exit Code 137 errors happen when a container or pod is terminated because it used more memory than allowed. The purpose of this tutorial is to show readers how to fix Exit Code 137 errors related to memory issues.

Exit code 137 occurs when a process is terminated because it’s using too much memory. Your container or Kubernetes pod will be stopped to prevent the excessive resource consumption from affecting your host’s reliability.

Processes that end with exit code 137 need to be investigated. The problem could be that your system simply needs more physical memory to meet user demands. However, there might also be a memory leak or sub-optimal programming inside your application that’s causing resources to be consumed excessively.

In this article, you’ll learn how to identify and debug exit code 137 so your containers run reliably. This will reduce your maintenance overhead and help stop inconsistencies caused by services stopping unexpectedly. Although some causes of exit code 137 can be highly specific to your environment, most problems can be solved with a simple troubleshooting sequence.

What Is Exit Code 137?

All processes emit an exit code when they terminate. Exit codes provide a mechanism for informing the user, operating system, and other applications why the process stopped. Each code is a number between 0 and 255. The meaning of codes below 125 is application-dependent, while higher values have special meanings.

A 137 code is issued when a process is terminated externally because of its memory consumption. The operating system's out-of-memory (OOM) killer intervenes to stop the program before it destabilizes the host.

When you start a foreground program in your shell, you can read the $? variable to inspect the process exit code:
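A minimal way to see this in action (demo-binary is a placeholder name; any process killed with SIGKILL — signal 9, which is what the OOM killer sends — exits with 128 + 9 = 137):

```shell
# Simulate an OOM kill: a process terminated by SIGKILL (signal 9)
# reports exit code 128 + 9 = 137, just as ./demo-binary would.
sh -c 'kill -KILL $$'
echo $?   # prints 137
```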

As this example returned 137, you know that demo-binary was stopped because it used too much memory. The same thing happens for container processes, too: when a memory limit is being approached, the process will be terminated and a 137 code issued.

Pods running in Kubernetes will show a status of OOMKilled when they encounter a 137 exit code. Although this looks like any other Kubernetes status, it’s caused by the operating system’s OOM killer terminating the pod’s process. You can check for pods that have used too much memory by running Kubectl’s get pods command:

$ kubectl get pods

NAME READY STATUS RESTARTS AGE
demo-pod 0/1 OOMKilled 2m05s

Memory consumption problems can affect anyone, not just organizations using Kubernetes. You could run into similar issues with Amazon ECS, Red Hat OpenShift, Nomad, Cloud Foundry, and plain Docker deployments. Regardless of the platform, if a container fails with a 137 exit code, the root cause will be the same: there's not enough memory to keep it running.

For example, you can view a stopped Docker container's exit code by running docker ps -a:

CONTAINER ID   IMAGE                        COMMAND         CREATED      STATUS
cdefb9ca658c   demo-org/demo-image:latest   "demo-binary"   2 days ago   Exited (137) 1 day ago

The exit code is shown in brackets under the STATUS column. The 137 value confirms this container stopped because of a memory problem.

Causes of Container Memory Issues

Understanding the situations that lead to memory-related container terminations is the first step towards debugging exit code 137. Here are some of the most common issues that you might experience.

Container Memory Limit Exceeded

Kubernetes pods will be terminated when they try to use more memory than their configured limit allows. You might be able to resolve this situation by increasing the limit if your cluster has spare capacity available.

Application Memory Leak

Poorly optimized code can create memory leaks. A memory leak occurs when an application uses memory, but doesn’t release it when the operation’s complete. This causes the memory to gradually fill up, and will eventually consume all the available capacity.

Natural Increases in Load

Sometimes adding physical memory is the only way to solve a problem. Growing services that experience an increase in active users can reach a point where more memory is required to serve the increase in traffic.

Requesting More Memory Than Your Compute Nodes Can Provide

Kubernetes pods configured with memory resource requests can use more memory than the cluster's nodes have if limits aren't also used. A request allows consumption overages because it's only an indication of how much memory a pod will consume; it doesn't prevent the pod from consuming more memory if it's available.

Running Too Many Containers Without Memory Limits

Running several containers without memory limits can create unpredictable Kubernetes behavior when the node’s memory capacity is reached. Containers without limits have a greater chance of being killed, even if a neighboring container caused the capacity breach.

Preventing Pods and Containers From Causing Memory Issues

Debugging container memory issues in Kubernetes—or any other orchestrator—can seem complex, but using the right tools and techniques helps make it less stressful. Kubernetes assigns memory to pods based on the requests and limits they declare. Unless it resides in a namespace with a default memory limit, a pod that doesn’t use these mechanisms can normally access limitless memory.

Setting Memory Limits

Pods without memory limits increase the chance of OOM kills and exit code 137 errors. These pods are able to use more memory than the node can provide, which poses a stability risk. When memory consumption gets close to the physical limit, the Linux kernel OOM killer intervenes to stop processes that are using too much memory.

Making sure each of your pods includes a memory limit is a good first step towards preventing OOM kill issues. Here’s a sample pod manifest:
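A minimal sketch (the pod and container names are placeholders; the request and limit values match those discussed below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # placeholder name
spec:
  containers:
    - name: demo-container  # placeholder name
      image: demo-org/demo-image:latest
      resources:
        requests:
          memory: 256Mi     # scheduler places the pod on a node with this much free
        limits:
          memory: 512Mi     # exceeding this makes the pod an OOM-kill candidate
```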

The requests field indicates the pod wants 256 Mi of memory. Kubernetes will use this information to influence scheduling decisions, and will ensure that the pod is hosted by a node with at least 256 Mi of memory available. Requests help to reduce resource contention, ensuring your applications have the resources they need. It’s important to note, though, that they don’t prevent the pod from using more memory if it’s available on the node.

This sample pod also includes a memory limit of 512 Mi. If memory consumption goes above 512 Mi, the pod becomes a candidate for termination. If there’s too much memory pressure and Kubernetes needs to free up resources, the pod could be stopped. Setting limits on all of your pods helps prevent excessive memory consumption in one from affecting the others.

Investigating Application Problems

Once your pods have appropriate memory limits, you can start investigating why those limits are being reached. Start by analyzing traffic levels to identify anomalies as well as natural growth in your service. If memory use has grown in correlation with user activity, it could be time to scale your cluster with new nodes, or to add more memory to existing ones.

If your nodes have sufficient memory, you’ve set limits on all your pods, and service use has remained relatively steady, the problem is likely to be within your application. To figure out where, you need to look at the nature of your memory consumption issues: is usage suddenly spiking, or does it gradually increase over the course of the pod’s lifetime?

A memory usage graph that shows large peaks can point to poorly optimized functions in your application. Specific parts of your codebase could be allocating a lot of memory to handle demanding user requests. You can usually work out the culprit by reviewing pod logs to determine which actions were taken around the time of the spike. It might be possible to refactor your code to use less memory, such as by explicitly freeing up variables and destroying objects after you’ve finished using them.

Memory graphs that show continual increases over time usually mean you’ve got a memory leak. These problems can be tricky to find, but reviewing application logs and running language-specific analysis tools can help you discover suspect code. Unchecked memory leaks will eventually fill all the available physical memory, forcing the OOM killer to stop processes so the capacity can be reclaimed.

Using ContainIQ to Monitor and Debug Memory Problems

Debugging Kubernetes problems manually is time-consuming and error-prone. You have to inspect pod status codes and retrieve their logs using terminal commands, which can create delays in your incident response. Kubernetes also lacks a built-in way of alerting you when memory consumption’s growing. You might not know about spiraling resource usage until your pods start to terminate and knock parts of your service offline.

ContainIQ addresses these challenges by providing a complete monitoring solution for your Kubernetes cluster. With ContainIQ, you can view real-time metrics using visual dashboards, and create alerts for when limits are breached. The platform surfaces events within your cluster, such as pod OOM kills, and provides convenient access to the logs created by your containers.

You can start inspecting pod memory issues in ContainIQ by heading to the Events tab from your dashboard. This lets you search for cluster activity, or Kubernetes events, that led up to a pod being terminated. Try “OOMKilled” as your search term to find a pod’s termination event, then review the events that occurred immediately prior to the termination to understand why the container was stopped.

You can access a live overview of current memory usage compared to pod limits by going to the Nodes tab and scrolling down to the “MEM Per Node” graph. Toggle the Show Limits button to include limits on the graph. If the limits are higher than the available memory, this is a sign they’re too permissive, and memory exhaustion could occur. Conversely, relatively low limits might mean there are pods in your cluster that haven’t been configured with a limit. This could also lead to skyrocketing memory use.

Finally, you can set up alerts to notify you when pods are terminated by the OOM killer. Click the New Monitor button in the top right of the screen, and choose the “Event” alert type in the popup that appears. On the next screen, type “OOMKilled” as the event reason. You’ll now be notified each time a pod terminates with exit code 137. You can set up monitors that alert based on metrics, too, letting you detect high memory consumption before your containers are terminated.

Final Thoughts

Exit code 137 means a container or pod is trying to use more memory than it’s allowed. The process gets terminated to prevent memory usage ballooning indefinitely, which could cause your host system to become unstable.

Excessive memory usage can occur due to natural growth in your application’s use, or as the result of a memory leak in your code. It’s important to set correct memory limits on your pods to guard against these issues; while reaching the limit will prompt termination with a 137 exit code, this mechanism is meant to protect you against worse problems that will occur if system memory is depleted entirely.

When you’re using Kubernetes, you should proactively monitor your cluster so you’re aware of normal memory consumption and can identify any spikes. ContainIQ is an all-inclusive solution for analyzing your cluster’s health that can track metrics like memory use and send you alerts when pods are close to their limits. This provides a single source of truth when you’re inspecting and debugging Kubernetes performance.

Source

I am trying to run a chainlink node from source and am following the install instructions.

I’m currently running make install

It looks like the error code says there is not enough memory on my machine, but my Linux container on my Chromebook Duet has 50GB. I don't know the problem.
I ran the command:

These are the errors:

error Command failed with exit code 137.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 137.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
 | `yarn setup` failed with exit code 137
Stopping 2 active children
Aborted execution due to previous error
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

What do I do?

asked Nov 24, 2021 at 16:18 by Clint Oka

Per the documentation:

If you got any errors regarding locked yarn package, try running yarn install before this step

So run yarn install

answered Nov 24, 2021 at 21:02 by Patrick Collins

