
Facing context_gpu.cu:415: out of memory while running inference #269

SriRamGovardhanam opened this issue Apr 24, 2020 · 2 comments

SriRamGovardhanam commented Apr 24, 2020

Hello,
While running inference with the following command:

cd DensePose && python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg

I got this error:

Found Detectron ops lib: /home/sriram/anaconda2/lib/libcaffe2_detectron_ops_gpu.so

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 51: Loading weights from: /tmp/detectron-download-cache/DensePose_ResNet50_FPN_s1x-e2e.pkl
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 5.0165e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 4.2673e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1072e-05 secs
INFO infer_simple.py: 103: Processing DensePoseData/demo_data/demo_im.jpg -> DensePoseData/infer_out/demo_im.jpg.pdf
[I net_async_base.h:211] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:216] Created new CPU pool, size: 4; device id: -1
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator:
input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)

frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)

frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator:
input: "gpu_0/res3_0_branch2c" input: "gpu_0/res3_0_branch2c_bn_s" input: "gpu_0/res3_0_branch2c_bn_b" output: "gpu_0/res3_0_branch2c_bn" name: "" type: "AffineChannel" device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x2afcd73 (0x7f99959b4d73 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #9: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)
, op AffineChannel
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 204: Original python traceback for operator 44 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 209: File "tools/infer_simple.py", line 140, in
WARNING workspace.py: 209: File "tools/infer_simple.py", line 91, in main
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/core/test_engine.py", line 334, in initialize_model_from_cfg
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 119, in create
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 84, in generalized_rcnn
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/optimizer.py", line 46, in build_data_parallel_model
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 165, in _single_gpu_build_func
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 40, in add_fpn_ResNet50_conv5_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 96, in add_fpn_onto_conv_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 32, in add_ResNet50_conv5_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 98, in add_ResNet_convX_body
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 77, in add_stage
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 174, in add_residual_block
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 322, in bottleneck_transformation
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 402, in ConvAffine
WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 97, in AffineChannel

Traceback (most recent call last):
File "tools/infer_simple.py", line 140, in
main(args)
File "tools/infer_simple.py", line 109, in main
model, im, None, timers=timers
File "/home/sriram/DensePose/detectron/core/test.py", line 58, in im_detect_all
model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
File "/home/sriram/DensePose/detectron/core/test.py", line 158, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const
, int, char const
, std::string const&, void const
) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6)

Also, I ran test_spatial_narrow_as_op.py successfully, but when I tried to run test_zero_even_op.py,
I got this OSError:
OSError: /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so: undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev
I have seen somewhere that this only works with gcc 4.9.2, but on Colab the default gcc version is 7.5.0; my machine has the same 7.5.0, and that didn't solve the issue.
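The "B5cxx11" fragment in the missing symbol is the libstdc++ [abi:cxx11] tag, which often points to a C++11 ABI mismatch between the custom ops build and the installed Caffe2 rather than the gcc version alone. A minimal sketch for probing this, assuming the library paths shown in the logs above:

```bash
# Demangle the missing symbol; the [abi:cxx11] tag shows the custom ops
# library was built expecting the new libstdc++ C++11 ABI
echo '_ZN6caffe219CPUOperatorRegistryB5cxx11Ev' | c++filt

# Check whether the installed Caffe2 library exports that symbol at all
# (path taken from the stack traces above)
nm -D /home/sriram/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so \
    | grep CPUOperatorRegistry | c++filt
```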

I heard somewhere that ResNet50 will work fine for inference if VRAM is > 4 GB and < 2 GB, so I don't understand the out-of-memory error.

Please, someone help me out; I have been struggling with this for the last 7 days.

vkhalidov (Contributor) commented:

Hello @SriRamGovardhanam,

  1. Regarding the GPU OOM: you can track memory usage with nvidia-smi; in particular, please verify that no other process occupies GPU memory before launching inference. Reducing the input image size will also decrease the amount of GPU memory required (see the sketch after this list);
  2. Please use ldd to track the dependencies of libcaffe2_detectron_custom_ops_gpu.so and nm to verify the symbols it defines and requires;
  3. Finally, DensePose is now available as a project inside Detectron2. It is based on PyTorch, is faster, and requires less memory. You might want to give it a try.
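
A minimal sketch of the checks in points 1 and 2, assuming the paths from the report above:

```bash
# Point 1: verify that no other process holds GPU memory before launching
# inference, and watch usage while it runs
nvidia-smi
watch -n 1 nvidia-smi

# Point 2: check the shared-library dependencies of the custom ops library;
# anything reported as "not found" is a missing dependency
ldd /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so

# List the symbols it expects from other libraries (marked "U" = undefined)
# and demangle them for readability
nm -D /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so | grep ' U ' | c++filt
```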

SriRamGovardhanam (Author) commented:

Hey, thank you so much for responding.
Anyway, I have checked points 1 and 2 that you mentioned, and they look fine;
I still get the same output, even after resizing the image.
I think that if there is no other possibility, I should go with the 3rd point.
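
Note that resizing the image file on disk may not reduce memory usage, because test.py rescales the input to cfg.TEST.SCALE / cfg.TEST.MAX_SIZE (visible in the traceback above). A sketch of lowering those values, assuming the standard Detectron config keys; the numbers are illustrative only:

```bash
# Edit configs/DensePose_ResNet50_FPN_s1x-e2e.yaml and lower the test-time
# input resolution, for example (standard Detectron keys, illustrative values):
#
#   TEST:
#     SCALE: 400
#     MAX_SIZE: 667
#
# then rerun the same inference command as in the original report
cd DensePose && python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg
```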
