
Facing context_gpu.cu:415: out of memory while running inference #269

SriRamGovardhanam opened this issue Apr 24, 2020 · 2 comments

SriRamGovardhanam commented Apr 24, 2020

Hello,
While running inference with the following command:

cd DensePose && python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg

I got this error:

Found Detectron ops lib: /home/sriram/anaconda2/lib/libcaffe2_detectron_ops_gpu.so

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 51: Loading weights from: /tmp/detectron-download-cache/DensePose_ResNet50_FPN_s1x-e2e.pkl
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 5.0165e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 4.2673e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1072e-05 secs
INFO infer_simple.py: 103: Processing DensePoseData/demo_data/demo_im.jpg -> DensePoseData/infer_out/demo_im.jpg.pdf
[I net_async_base.h:211] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:216] Created new CPU pool, size: 4; device id: -1
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator:
input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)

frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)

frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator:
input: "gpu_0/res3_0_branch2c" input: "gpu_0/res3_0_branch2c_bn_s" input: "gpu_0/res3_0_branch2c_bn_b" output: "gpu_0/res3_0_branch2c_bn" name: "" type: "AffineChannel" device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x2afcd73 (0x7f99959b4d73 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #9: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)
, op AffineChannel
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 204: Original python traceback for operator 44 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 209: File "tools/infer_simple.py", line 140, in
WARNING workspace.py: 209: File "tools/infer_simple.py", line 91, in main
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/core/test_engine.py", line 334, in initialize_model_from_cfg
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 119, in create
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 84, in generalized_rcnn
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/optimizer.py", line 46, in build_data_parallel_model
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 165, in _single_gpu_build_func
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 40, in add_fpn_ResNet50_conv5_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 96, in add_fpn_onto_conv_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 32, in add_ResNet50_conv5_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 98, in add_ResNet_convX_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 77, in add_stage
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 174, in add_residual_block
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 322, in bottleneck_transformation
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 402, in ConvAffine
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 97, in AffineChannel

Traceback (most recent call last):
File "tools/infer_simple.py", line 140, in
main(args)
File "tools/infer_simple.py", line 109, in main
model, im, None, timers=timers
File "/home/sriram/DensePose/detectron/core/test.py", line 58, in im_detect_all
model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
File "/home/sriram/DensePose/detectron/core/test.py", line 158, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const
, int, char const
, std::string const&, void const
) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)

Also, I ran test_spatial_narrow_as_op.py successfully, but when I tried to run test_zero_even_op.py,
I got this OSError:
OSError: /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so: undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev
I have seen somewhere that this only works with gcc 4.9.2, but on Colab the default gcc version is 7.5.0; my machine has the same 7.5.0, and that didn't solve the issue.
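The "B5cxx11" fragment in the missing symbol is the libstdc++ [abi:cxx11] tag, which often points to a C++11 ABI mismatch between the custom ops build and the installed Caffe2 rather than the gcc version alone. A minimal sketch for probing this, assuming the library paths shown in the logs above:

```bash
# Demangle the missing symbol; the [abi:cxx11] tag shows the custom ops
# library was built expecting the new libstdc++ C++11 ABI
echo '_ZN6caffe219CPUOperatorRegistryB5cxx11Ev' | c++filt

# Check whether the installed Caffe2 library exports that symbol at all
# (path taken from the stack traces above)
nm -D /home/sriram/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so \
    | grep CPUOperatorRegistry | c++filt
```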

I heard somewhere that ResNet50 will work fine for inference if VRAM is > 4 GB and < 2 GB, so I don't understand the out-of-memory error.

Please, someone help me out; I have been struggling with this for the last 7 days.

vkhalidov (Contributor) commented:

Hello @SriRamGovardhanam,

  1. Regarding the GPU OOM: you can track memory usage with nvidia-smi; in particular, please verify that no other process occupies GPU memory before launching inference. Reducing the input image size will also decrease the amount of GPU memory required (see the sketch after this list);
  2. Please use ldd to track the dependencies of libcaffe2_detectron_custom_ops_gpu.so and nm to verify the symbols it defines and requires;
  3. Finally, DensePose is now available as a project inside Detectron2. It is based on PyTorch, is faster, and requires less memory. You might want to give it a try.
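
A minimal sketch of the checks in points 1 and 2, assuming the paths from the report above:

```bash
# Point 1: verify that no other process holds GPU memory before launching
# inference, and watch usage while it runs
nvidia-smi
watch -n 1 nvidia-smi

# Point 2: check the shared-library dependencies of the custom ops library;
# anything reported as "not found" is a missing dependency
ldd /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so

# List the symbols it expects from other libraries (marked "U" = undefined)
# and demangle them for readability
nm -D /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so | grep ' U ' | c++filt
```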

SriRamGovardhanam (Author) commented:

Hey, thank you so much for responding.
Anyway, I have checked points 1 and 2 that you mentioned, and they look fine;
I still get the same output, even after resizing the image.
I think that if there is no other possibility, I should go with the 3rd point.
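
Note that resizing the image file on disk may not reduce memory usage, because test.py rescales the input to cfg.TEST.SCALE / cfg.TEST.MAX_SIZE (visible in the traceback above). A sketch of lowering those values, assuming the standard Detectron config keys; the numbers are illustrative only:

```bash
# Edit configs/DensePose_ResNet50_FPN_s1x-e2e.yaml and lower the test-time
# input resolution, for example (standard Detectron keys, illustrative values):
#
#   TEST:
#     SCALE: 400
#     MAX_SIZE: 667
#
# then rerun the same inference command as in the original report
cd DensePose && python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg
```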
