(Inf1) Neuron Compilation OOM when model's weight changes #1064

takipipo opened this issue Dec 11, 2024 · 2 comments

takipipo commented Dec 11, 2024


I am able to compile the pretrained COCO detection weights from ultralytics. However, when I load the weights from the download link, I cannot compile the model for Neuron because the compiler runs out of memory (OOM).


neuron-cc -V

Neuron Compiler version

HWM version
NEFF version Dynamic
TVM version
NumPy version 1.23.4
MXNet not available
TF not available

Log Output from Neuron Compiler

(aws_neuron_venv_pytorch_1_13_inf1) root@ip-10-104-110-148:/var/snap/amazon-ssm-agent/6312/ultralytics# ipython

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ultralytics import NeuronYOLO
   ...: model = NeuronYOLO("")
   ...: model.export(format = "neuron")
Ultralytics YOLOv8.2.48 🚀 Python-3.10.12 torch-1.13.1+cu117 CPU (Intel Xeon Platinum 8275CL 3.00GHz)
Model summary (fused): 268 layers, 68125494 parameters, 0 gradients, 257.4 GFLOPs

PyTorch: starting from '' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 6, 8400) (130.4 MB)

AWS Neuron: starting export with torch
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 278, fused = 278, percent fused = 100.0%
/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/dask/dataframe/ FutureWarning: 
Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
INFO:Neuron:Compiling function _NeuronGraph$1070 with neuron-cc
INFO:Neuron:Compiling with command line: '/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]} --verbose 35'
............................WARNING:Neuron:The neuron-cc (neuron compiler) process was killed (SIG_KILL).  This typically happens when there is insufficient memory to compile and the linux Out Of Memory (OOM) killer terminates the compiler.  Consider trying compilation on an instance with more memory
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1070; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
Traceback (most recent call last):
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/", line 413, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/", line 263, in trace
    raise subprocess.SubprocessError(
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 278, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 7 [supported]
INFO:Neuron: => aten::_convolution: 104 [supported]
INFO:Neuron: => aten::add: 20 [supported]
INFO:Neuron: => aten::cat: 19 [supported]
INFO:Neuron: => aten::chunk: 1 [supported]
INFO:Neuron: => aten::div: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 3 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::silu_: 97 [supported]
INFO:Neuron: => aten::size: 3 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
INFO:Neuron: => aten::split_with_sizes: 9 [supported]
INFO:Neuron: => aten::sub: 2 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::unsqueeze: 1 [supported]
INFO:Neuron: => aten::upsample_nearest2d: 2 [supported]
INFO:Neuron: => aten::view: 5 [supported]
AWS Neuron: export failure ❌ 644.2s: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 3
      1 from ultralytics import NeuronYOLO
      2 model = NeuronYOLO("")
----> 3 model.export(format = "neuron")

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/, in NeuronModel.export(self, **kwargs)
     43 custom = {
     44     "imgsz": self.model.args["imgsz"],
     45     "batch": 1,
     46     "data": None,
     47     "verbose": False,
     48 }  # method defaults
     49 args = {
     50     **self.overrides,
     51     **custom,
     52     **kwargs,
     53     "mode": "export",
     54 }  # highest priority args on the right
---> 55 return NeuronExporter(overrides=args, _callbacks=self.callbacks)(model=self.model)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch/autograd/, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/, in NeuronExporter.__call__(self, model)
    317     f[12], _ = self.export_neuronx()
    318 if neuron:  # Neuron
--> 319     f[13], _ = self.export_neuron()
    321 # Finish
    322 f = [str(x) for x in f if x]  # filter out '' and None

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/, in try_export.<locals>.outer_func(*args, **kwargs)
    128 except Exception as e:
    129"{prefix} export failure ❌ {dt.t:.1f}s: {e}")
--> 130     raise e

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/, in try_export.<locals>.outer_func(*args, **kwargs)
    123 try:
    124     with Profile() as dt:
--> 125         f, model = inner_func(*args, **kwargs)
    126"{prefix} export success ✅ {dt.t:.1f}s, saved as '{f}' ({file_size(f):.1f} MB)")
    127     return f, model

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/, in NeuronExporter.export_neuron(self, prefix)
    370"\n{prefix} starting export with torch {torch_neuron.__version__}...")
    371 f = self.file.with_suffix(".neuron")
--> 372 ts = torch_neuron.trace(self.model,, strict=False)
    373 extra_files = {"config.txt": json.dumps(self.metadata)}
    374, _extra_files=extra_files)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/, in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, separate_weights, verbose, **kwargs)
    215     logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    216     neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 217 cu.stats_post_compiler(neuron_graph)
    219 # Wrap the compiled version of the model in a script module. Note that this is
    220 # necessary for torch==1.8.1 due to the usage of `torch.classes.model.Model`. The
    221 # custom class must be a submodule of the traced graph.
    222 neuron_graph = AwsNeuronGraphModule(neuron_graph)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/, in CompilationUnit.stats_post_compiler(self, neuron_graph)
    526   ' => {}: {} {}'.format(
    527                 name, remaining_count, supported_string))
    529 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 530     raise RuntimeError(
    531         "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    533 if percent_operations_compiled < 50.0:
    534     logger.warning(
    535         "torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!")

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
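
The two lines that matter most in the log above are the SIG_KILL warning and the sub-graph summary. A minimal sketch of pulling them out of a saved log follows; it assumes the log was captured to a string, and `summarize_neuron_log` is a hypothetical helper name, not part of torch-neuron. The patterns are copied from the log text quoted above.

```python
import re

# Marker and pattern copied from the torch-neuron log lines quoted in this issue.
OOM_MARKER = "process was killed (SIG_KILL)"
SUBGRAPH_RE = re.compile(
    r"Neuron successfully compiled (\d+) sub-graphs, Total fused subgraphs = (\d+)"
)


def summarize_neuron_log(log_text: str) -> dict:
    """Report whether neuron-cc was killed and how many sub-graphs compiled.

    Hypothetical helper for illustration; it only does string matching.
    """
    match = SUBGRAPH_RE.search(log_text)
    if match:
        compiled, total = int(match.group(1)), int(match.group(2))
    else:
        compiled, total = None, None
    return {
        "killed_by_signal": OOM_MARKER in log_text,
        "compiled_subgraphs": compiled,
        "total_subgraphs": total,
    }


if __name__ == "__main__":
    sample = (
        "WARNING:Neuron:The neuron-cc (neuron compiler) process was killed (SIG_KILL).\n"
        "INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1\n"
    )
    print(summarize_neuron_log(sample))
```

For the log in this issue, the summary would show the compiler was killed and 0 of 1 sub-graphs compiled, which is what triggers the final RuntimeError.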

How to Reproduce

  1. Start EC2 instance c5.2xlarge with AMI: ami-09c4564a5c7fa27d5
  2. Install required libraries to compile the model
source /opt/aws_neuron_venv_pytorch_1_13_inf1/bin/activate
git clone
cd ultralytics
git checkout v8.2.48-aws-neuron
pip install .
pip install numpy==1.23.4
  3. Compile the model
from ultralytics import NeuronYOLO
model = NeuronYOLO("")
model.export(format = "neuron")
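
Because the failure is memory related, it may help to record how much RAM the instance has before running step 3. This is a sketch using only the Python standard library on Linux; `total_memory_gib` and `has_enough_memory` are hypothetical helper names, and the threshold is a placeholder to choose yourself, not a documented Neuron requirement.

```python
import os


def total_memory_gib() -> float:
    """Total physical RAM in GiB, read through POSIX sysconf (works on Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 2**30


def has_enough_memory(required_gib: float) -> bool:
    """True if the machine has at least `required_gib` GiB of physical RAM."""
    return total_memory_gib() >= required_gib


if __name__ == "__main__":
    # 64.0 is only an example threshold (the size of instance tried in this issue).
    print(f"Total RAM: {total_memory_gib():.1f} GiB")
    print("Meets 64 GiB threshold:", has_enough_memory(64.0))
```

This reports total RAM only; free memory at compile time can be lower if other processes are running.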

What I've Tried

  1. Tried compiling on an instance with 64 GB of memory, but it still failed.
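
To see how much memory a compile attempt actually uses before it is killed, one option is to run the command as a child process and read its peak resident set size afterwards. The sketch below uses only the standard library; `run_and_report_peak_rss` is a hypothetical helper, and the command in the usage example is a stand-in, not the real neuron-cc invocation. On Linux `ru_maxrss` is reported in KiB, and it reflects the largest single waited-for descendant rather than a sum across processes.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd: list[str]) -> tuple[int, float]:
    """Run `cmd` and return (exit code, peak child RSS in MiB).

    A negative exit code -N means the child was terminated by signal N,
    so -9 corresponds to SIGKILL, the symptom of the Linux OOM killer.
    """
    completed = subprocess.run(cmd, check=False)
    # Linux reports ru_maxrss in KiB (macOS uses bytes, which this sketch ignores).
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak_kib / 1024


if __name__ == "__main__":
    # Stand-in command; replace with the compile command copied from the log.
    code, peak_mib = run_and_report_peak_rss([sys.executable, "-c", "print('hello')"])
    print(f"exit code {code}, peak RSS {peak_mib:.1f} MiB")
```

If the process is killed partway through, the reported peak is a lower bound on what the compile needs, not the full requirement.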
Thanks @takipipo for filing this issue. We will take a look and get back to you.

FThompsonAWS commented Dec 19, 2024

@takipipo this model is a 68B model, and it takes twice that much memory to compile in Neuron V1. Can you try with a larger instance that has at least 192GB of memory?

Additionally, you may encounter issues running a model this size on Inf1. We recommend that you upgrade to Inferentia2 and the latest version of the Neuron SDK (which includes neuronx_cc and torch_neuronx).
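
For reference, the parameter count printed in the export log (68125494) gives a rough size for the weights themselves. The sketch below assumes 4 bytes per parameter for float32 and 2 bytes for float16. It covers the weights only and says nothing about the compiler's working memory, which is what the OOM killer reacts to.

```python
# "68125494 parameters" is taken from the Model summary line in the export log.
PARAM_COUNT = 68_125_494


def weights_size_mib(param_count: int, bytes_per_param: int) -> float:
    """Size of the raw weights in MiB for a given parameter width."""
    return param_count * bytes_per_param / 2**20


fp32_mib = weights_size_mib(PARAM_COUNT, 4)  # float32: 4 bytes per parameter
fp16_mib = weights_size_mib(PARAM_COUNT, 2)  # float16: 2 bytes per parameter

if __name__ == "__main__":
    print(f"float32 weights: {fp32_mib:.1f} MiB")
    print(f"float16 weights: {fp16_mib:.1f} MiB")
```

This works out to about 260 MiB in float32 and about 130 MiB in float16, the latter close to the 130.4 MB checkpoint size shown in the log.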

