(Inf1) Neuron Compilation OOM when model's weight changes #1064

Open
takipipo opened this issue Dec 11, 2024 · 2 comments

takipipo commented Dec 11, 2024

Description

I can compile the pretrained COCO detection weights from ultralytics (e.g. yolov8l.pt, yolov8x.pt). However, when I load the weights from the Download link at https://github.com/WildChlamydia/MiVOLO?tab=readme-ov-file#demo, I cannot compile the model to Neuron because the compiler runs out of memory (OOM).

Environments

pip list

Package                   Version
------------------------- ------------------
absl-py                   2.1.0
aiohappyeyeballs          2.4.3
aiohttp                   3.10.10
aiosignal                 1.3.1
amqp                      5.2.0
annotated-types           0.7.0
ansicolors                1.1.8
antlr4-python3-runtime    4.9.3
anyio                     4.6.2.post1
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
astor                     0.8.1
astroid                   3.3.5
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     24.2.0
Automat                   24.8.1
awscli                    1.35.14
babel                     2.16.0
beautifulsoup4            4.12.3
billiard                  4.2.1
bleach                    6.1.0
boto3                     1.35.48
botocore                  1.35.48
build                     1.2.2.post1
celery                    5.4.0
certifi                   2024.8.30
cffi                      1.17.1
charset-normalizer        3.4.0
click                     8.1.7
click-didyoumean          0.3.1
click-plugins             1.1.1
click-repl                0.3.0
cloudpickle               3.1.0
cmake                     3.30.5
colorama                  0.4.6
comm                      0.2.2
constantly                23.10.4
contourpy                 1.3.0
cryptography              43.0.3
cssselect                 1.2.0
cycler                    0.12.1
dask                      2024.10.0
debugpy                   1.8.7
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.9
distlib                   0.3.9
dmlc-nnvm                 1.19.6.0+0
dmlc-topi                 1.19.6.0+0
dmlc-tvm                  1.19.6.0+0
docutils                  0.16
dparse                    0.6.3
entrypoints               0.4
environment-kernels       1.2.0
exceptiongroup            1.2.2
executing                 2.1.0
fastapi                   0.115.3
fastjsonschema            2.20.0
filelock                  3.16.1
fonttools                 4.54.1
fqdn                      1.5.1
frozenlist                1.5.0
fsspec                    2024.10.0
gast                      0.2.2
google-pasta              0.2.0
grpcio                    1.67.0
h11                       0.14.0
h5py                      3.6.0
httpcore                  1.0.6
httpie                    3.2.3
httpx                     0.27.2
hyperlink                 21.0.0
idna                      3.10
imageio                   2.36.0
importlib_metadata        8.5.0
incremental               24.7.2
inferentia-hwm            1.17.6.0+fbcd6c853
iniconfig                 2.0.0
ipykernel                 6.29.5
ipython                   8.28.0
ipywidgets                8.1.5
islpy                     2023.1
isoduration               20.11.0
isort                     5.13.2
itemadapter               0.9.0
itemloaders               1.3.2
jedi                      0.19.1
Jinja2                    3.1.4
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.9.25
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter                   1.1.1
jupyter_client            8.6.3
jupyter-console           6.6.3
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.5
jupyter_server            2.14.2
jupyter_server_terminals  0.5.3
jupyterlab                4.2.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
Keras-Applications        1.0.8
Keras-Preprocessing       1.1.2
kiwisolver                1.4.7
kombu                     5.4.2
llvmlite                  0.43.0
locket                    1.0.0
lxml                      5.3.0
Markdown                  3.7
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib                3.9.2
matplotlib-inline         0.1.7
mccabe                    0.7.0
mdurl                     0.1.2
mistune                   3.0.2
multidict                 6.1.0
nbclient                  0.10.0
nbconvert                 7.16.4
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  2.6.3
neuron-cc                 1.24.0.0+d58fa6134
notebook                  7.2.2
notebook_shim             0.2.4
numba                     0.60.0
numpy                     1.23.4
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
omegaconf                 2.3.0
opencv-python             4.10.0.84
opt_einsum                3.4.0
overrides                 7.7.0
packaging                 21.3
pandas                    2.2.3
pandocfilters             1.5.1
papermill                 2.6.0
parsel                    1.9.1
parso                     0.8.4
partd                     1.4.2
pexpect                   4.9.0
pillow                    11.0.0
pip                       24.2
pip-tools                 7.4.1
pipenv                    2024.2.0
platformdirs              4.3.6
plotly                    5.24.1
pluggy                    1.5.0
prometheus_client         0.21.0
prompt_toolkit            3.0.48
propcache                 0.2.0
Protego                   0.3.1
protobuf                  3.20.1
psutil                    6.1.0
ptyprocess                0.7.0
pure_eval                 0.2.3
py-cpuinfo                9.0.0
pyasn1                    0.6.1
pyasn1_modules            0.4.1
pycparser                 2.22
pydantic                  2.9.2
pydantic_core             2.23.4
PyDispatcher              2.0.7
Pygments                  2.18.0
pylint                    3.3.1
pyOpenSSL                 24.2.1
pyparsing                 3.2.0
pyproject_hooks           1.2.0
PySocks                   1.7.1
pytest                    8.3.3
python-dateutil           2.9.0.post0
python-json-logger        2.0.7
pytz                      2024.2
PyYAML                    6.0.2
pyzmq                     26.2.0
queuelib                  1.7.0
referencing               0.35.1
requests                  2.31.0
requests-file             2.1.0
requests-toolbelt         1.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.9.3
rpds-py                   0.20.0
rsa                       4.7.2
ruamel.yaml               0.18.6
ruamel.yaml.clib          0.2.12
s3transfer                0.10.3
safety                    2.3.5
scikit-learn              1.5.2
scipy                     1.11.4
Scrapy                    2.11.2
seaborn                   0.13.2
Send2Trash                1.8.3
service-identity          24.1.0
setuptools                69.5.1
shap                      0.46.0
six                       1.16.0
slicer                    0.0.8
sniffio                   1.3.1
soupsieve                 2.6
stack-data                0.6.3
starlette                 0.41.0
tenacity                  9.0.0
tensorboard               1.15.0
tensorflow                1.15.5.post1
tensorflow-estimator      1.15.1
termcolor                 2.5.0
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.4.0
tldextract                5.1.2
tomli                     2.0.2
tomlkit                   0.13.2
toolz                     1.0.0
torch                     1.13.1
torch-neuron              1.13.1.2.11.7.0
torchvision               0.14.1
tornado                   6.4.1
tqdm                      4.66.5
traitlets                 5.14.3
Twisted                   24.7.0
types-python-dateutil     2.9.0.20241003
typing_extensions         4.12.2
tzdata                    2024.2
ultralytics               8.2.48
ultralytics-thop          2.0.12
uri-template              1.3.0
urllib3                   2.2.3
vine                      5.1.0
virtualenv                20.27.0
w3lib                     2.2.1
wcwidth                   0.2.13
webcolors                 24.8.0
webencodings              0.5.1
websocket-client          1.8.0
Werkzeug                  3.0.5
wget                      3.2
wheel                     0.44.0
widgetsnbextension        4.0.13
wrapt                     1.16.0
yarl                      1.16.0
zipp                      3.20.2
zope.interface            7.1.1

neuron-cc -V

Neuron Compiler version 1.24.0.0+d58fa6134

HWM version 1.17.6.0-fbcd6c853
NEFF version Dynamic
TVM version 1.19.6.0+0
NumPy version 1.23.4
MXNet not available
TF not available

Log Output from Neuron Compiler

(aws_neuron_venv_pytorch_1_13_inf1) root@ip-10-104-110-148:/var/snap/amazon-ssm-agent/6312/ultralytics# ipython

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ultralytics import NeuronYOLO
   ...: model = NeuronYOLO("yolov8x_person_face.pt")
   ...: model.export(format = "neuron")
   ...: 
Ultralytics YOLOv8.2.48 🚀 Python-3.10.12 torch-1.13.1+cu117 CPU (Intel Xeon Platinum 8275CL 3.00GHz)
Model summary (fused): 268 layers, 68125494 parameters, 0 gradients, 257.4 GFLOPs

PyTorch: starting from 'yolov8x_person_face.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 6, 8400) (130.4 MB)

AWS Neuron: starting export with torch 1.13.1.2.11.7.0...
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 278, fused = 278, percent fused = 100.0%
/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/dask/dataframe/__init__.py:42: FutureWarning: 
Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
INFO:Neuron:Compiling function _NeuronGraph$1070 with neuron-cc
INFO:Neuron:Compiling with command line: '/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]} --verbose 35'
............................WARNING:Neuron:The neuron-cc (neuron compiler) process was killed (SIG_KILL).  This typically happens when there is insufficient memory to compile and the linux Out Of Memory (OOM) killer terminates the compiler.  Consider trying compilation on an instance with more memory
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1070; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
Traceback (most recent call last):
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py", line 413, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/decorators.py", line 263, in trace
    raise subprocess.SubprocessError(
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 278, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 7 [supported]
INFO:Neuron: => aten::_convolution: 104 [supported]
INFO:Neuron: => aten::add: 20 [supported]
INFO:Neuron: => aten::cat: 19 [supported]
INFO:Neuron: => aten::chunk: 1 [supported]
INFO:Neuron: => aten::div: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 3 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::silu_: 97 [supported]
INFO:Neuron: => aten::size: 3 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
INFO:Neuron: => aten::split_with_sizes: 9 [supported]
INFO:Neuron: => aten::sub: 2 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::unsqueeze: 1 [supported]
INFO:Neuron: => aten::upsample_nearest2d: 2 [supported]
INFO:Neuron: => aten::view: 5 [supported]
AWS Neuron: export failure ❌ 644.2s: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 3
      1 from ultralytics import NeuronYOLO
      2 model = NeuronYOLO("yolov8x_person_face.pt")
----> 3 model.export(format = "neuron")

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_model.py:55, in NeuronModel.export(self, **kwargs)
     43 custom = {
     44     "imgsz": self.model.args["imgsz"],
     45     "batch": 1,
     46     "data": None,
     47     "verbose": False,
     48 }  # method defaults
     49 args = {
     50     **self.overrides,
     51     **custom,
     52     **kwargs,
     53     "mode": "export",
     54 }  # highest priority args on the right
---> 55 return NeuronExporter(overrides=args, _callbacks=self.callbacks)(model=self.model)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:319, in NeuronExporter.__call__(self, model)
    317     f[12], _ = self.export_neuronx()
    318 if neuron:  # Neuron
--> 319     f[13], _ = self.export_neuron()
    321 # Finish
    322 f = [str(x) for x in f if x]  # filter out '' and None

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:130, in try_export.<locals>.outer_func(*args, **kwargs)
    128 except Exception as e:
    129     LOGGER.info(f"{prefix} export failure ❌ {dt.t:.1f}s: {e}")
--> 130     raise e

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:125, in try_export.<locals>.outer_func(*args, **kwargs)
    123 try:
    124     with Profile() as dt:
--> 125         f, model = inner_func(*args, **kwargs)
    126     LOGGER.info(f"{prefix} export success ✅ {dt.t:.1f}s, saved as '{f}' ({file_size(f):.1f} MB)")
    127     return f, model

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:372, in NeuronExporter.export_neuron(self, prefix)
    370 LOGGER.info(f"\n{prefix} starting export with torch {torch_neuron.__version__}...")
    371 f = self.file.with_suffix(".neuron")
--> 372 ts = torch_neuron.trace(self.model, self.im, strict=False)
    373 extra_files = {"config.txt": json.dumps(self.metadata)}
    374 ts.save(str(f), _extra_files=extra_files)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py:217, in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, separate_weights, verbose, **kwargs)
    215     logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    216     neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 217 cu.stats_post_compiler(neuron_graph)
    219 # Wrap the compiled version of the model in a script module. Note that this is
    220 # necessary for torch==1.8.1 due to the usage of `torch.classes.model.Model`. The
    221 # custom class must be a submodule of the traced graph.
    222 neuron_graph = AwsNeuronGraphModule(neuron_graph)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py:530, in CompilationUnit.stats_post_compiler(self, neuron_graph)
    526             logger.info(' => {}: {} {}'.format(
    527                 name, remaining_count, supported_string))
    529 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 530     raise RuntimeError(
    531         "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    533 if percent_operations_compiled < 50.0:
    534     logger.warning(
    535         "torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!")

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

How to Reproduce

  1. Start EC2 instance c5.2xlarge with AMI: ami-09c4564a5c7fa27d5
  2. Install required libraries to compile the model
source /opt/aws_neuron_venv_pytorch_1_13_inf1/bin/activate
git clone https://github.com/wisesight/ultralytics.git
cd ultralytics
git checkout v8.2.48-aws-neuron
pip install .
pip install numpy==1.23.4
  3. Compile the model
from ultralytics import NeuronYOLO
model = NeuronYOLO("yolov8x_person_face.pt")
model.export(format = "neuron")

What I've Tried

  1. Tried compiling on an instance with 64 GB of memory, but it still failed.
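
When sizing an instance for a compile that gets OOM-killed, it can help to record how much memory the compile actually reached before it died. The following is a minimal sketch, not part of the Neuron tooling: it runs any command as a child process and reports the peak resident set size via Python's standard `resource` module. The helper name `peak_child_rss_mib` and the script name `export_neuron.py` are hypothetical, and the KiB unit for `ru_maxrss` assumes Linux.

```python
import resource
import subprocess
import sys


def peak_child_rss_mib(cmd):
    """Run cmd to completion; return (returncode, peak child RSS in MiB).

    Hypothetical helper for illustration. Assumes Linux, where
    ru_maxrss is reported in KiB. The value covers the largest
    waited-for child, including grandchildren that the child itself
    waited for (e.g. a compiler launched by an export script).
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss / 1024.0


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. [sys.executable, "export_neuron.py"]
    # (hypothetical script wrapping the export call from this issue).
    rc, peak = peak_child_rss_mib(
        [sys.executable, "-c", "x = bytearray(64 * 1024 * 1024)"]
    )
    # With subprocess, a negative return code is the signal number,
    # so -9 means the child was terminated by SIGKILL.
    print(f"return code: {rc}, peak RSS: {peak:.0f} MiB")
```

If the reported peak sits close to the instance's total memory when the return code is -9, that is consistent with the OOM killer rather than a compiler bug.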
@FThompsonAWS
Contributor

Thanks @takipipo for filing this issue. We will take a look and get back to you.

FThompsonAWS commented Dec 19, 2024

@takipipo this model is a 68B model, and it takes twice that much memory to compile in Neuron V1. Can you try with a larger instance that has at least 192GB of memory?

Additionally, you may encounter issues running a model this size on Inf1. We recommend that you upgrade to Inferentia2 and the latest version of the Neuron SDK (which includes neuronx_cc and torch_neuronx).
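
For reference when discussing model size, the parameter count printed in the export log ("268 layers, 68125494 parameters") can be converted into a rough size for the weights alone. This is a back-of-the-envelope sketch under stated assumptions: the function name `weight_size_mib` is hypothetical, the byte widths are the standard sizes of float32 and float16, and the result says nothing about the compiler's working memory, which is what the OOM killer reacts to.

```python
def weight_size_mib(num_params, bytes_per_param):
    """Size of the raw weights in MiB for a given parameter width."""
    return num_params * bytes_per_param / (1024 ** 2)


# Parameter count taken from the "Model summary (fused)" line in the log.
PARAMS = 68_125_494

for dtype_name, width in (("float32", 4), ("float16", 2)):
    size = weight_size_mib(PARAMS, width)
    print(f"{dtype_name}: {size:.1f} MiB")
```

The same arithmetic applies to any other checkpoint once its parameter count is known, for example from the model summary that ultralytics prints at export time.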
