OOM for TECO GAN #224

Open
stalagmite7 opened this issue Jun 24, 2021 · 9 comments · Fixed by #225

Comments

@stalagmite7

Seems like using even a height of 360 (while maintaining aspect ratio) for TeCoGAN gives runtime OOM errors; what's the largest size I can use to try to upscale to 4K? I imagine that if I want to upscale to 4K, I would use 1080p as the resolution for my input, but that's too big for the GPU to handle; is there a way to use only the CPU for this?

@TakuyaNarihira
Contributor

Thanks for reporting.

It's probably because the clear_buffer option is not specified in the forward() calls in the following code block.
https://github.com/sony/nnabla-examples/blob/master/GANs/tecogan/generate.py#L83-L85

With .forward(clear_buffer=True), it will aggressively release unused memory in the network.

Could you try this quickly?

```python
            pre_gen_warp.forward(clear_buffer=True)
            pre_warp.data.copy_from(pre_gen_warp.data)
        outputs.forward(clear_buffer=True)
```

We'll also verify soon that it works properly and reduces memory usage.

@stalagmite7
Author

stalagmite7 commented Jun 24, 2021 via email

@stalagmite7
Author

Tried this, got an invalid configuration error from CUDA:

```
Error during forward propagation:
  TransposeCuda <-- ERROR
Traceback (most recent call last):
  File "generate.py", line 105, in <module>
    main()
  File "generate.py", line 84, in main
    pre_gen_warp.forward(clear_buffer=True)
  File "_variable.pyx", line 564, in nnabla._variable.Variable.forward
RuntimeError: target_specific error in forward_impl
/home/gitlab-runner/builds/zxvvzZDJ/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/function/./generic/transpose.cu:184
(cudaGetLastError()) failed with "invalid configuration argument" (cudaErrorInvalidConfiguration).
```

A cursory check suggests it could be a number-of-blocks error from CUDA. I'll need to dig in further on my end later today.

@TakuyaNarihira
Contributor

TakuyaNarihira commented Jun 25, 2021

Looks like it exceeds the limit on the number of blocks. We should introduce a grid-strided loop in the CUDA kernel. I created an issue in sony/nnabla-ext-cuda#321 (let's continue there on this specific matter).
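For context, a grid-strided loop lets a fixed-size grid of threads cover an arbitrarily large array, so a kernel launch never needs more blocks than the hardware limit allows. The real fix belongs in the CUDA kernel (tracked in sony/nnabla-ext-cuda#321); the sketch below is only an illustration of the indexing pattern in plain Python, with hypothetical helper names:

```python
# Sketch of a grid-strided loop: each "thread" starts at its global index
# and advances by the total grid size, so any number of elements is covered
# by a fixed, hardware-safe number of threads.

def grid_stride_indices(thread_id, grid_size, n_elements):
    """Indices that one thread would process under a grid-stride loop."""
    return list(range(thread_id, n_elements, grid_size))

def simulate_kernel(n_elements, grid_size):
    """Check that every element is touched exactly once across all threads."""
    touched = [0] * n_elements
    for tid in range(grid_size):
        for i in grid_stride_indices(tid, grid_size, n_elements):
            touched[i] += 1
    return touched

# 10 elements covered completely by a "grid" of only 4 threads:
assert simulate_kernel(10, 4) == [1] * 10
```

The same pattern in CUDA replaces `thread_id` with `blockIdx.x * blockDim.x + threadIdx.x` and `grid_size` with `gridDim.x * blockDim.x`, which is what removes the dependence of block count on problem size.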

Btw, how long is your input video sequence?

@stalagmite7
Author

Checking back in: I know it says the fix has been deployed, but the OOM error persists. Like I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but I get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

@Srinidhi-Srinivasa
Contributor

Srinidhi-Srinivasa commented Feb 1, 2022

> Checking back in: I know it says the fix has been deployed, but the OOM error persists. Like I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but I get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

@stalagmite7, could you share more information about your computation environment?

@Srinidhi-Srinivasa
Contributor

Srinidhi-Srinivasa commented Feb 3, 2022

> Checking back in: I know it says the fix has been deployed, but the OOM error persists. Like I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but I get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

@stalagmite7
The following are approximate peak memory requirements for running TeCoGAN:

| Resolution | Peak memory usage (MB) |
|------------|------------------------|
| 144p       | 708                    |
| 280p       | 2816                   |
| 360p       | 4074                   |
| 480p       | 6818                   |

Please note that it may not be possible to run TeCoGAN at any resolution higher than this, even on GPUs with up to 32 GB of memory.
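As a rough back-of-the-envelope check (my own extrapolation, not an official figure from the maintainers), the table above works out to roughly 0.017-0.019 MB per input pixel, which puts a 1080p input well past 32 GB. The 16:9 widths below are assumptions, and the 280p row is omitted because its width is ambiguous:

```python
# Rough linear extrapolation from the measurements above.
# Widths assume 16:9 frames; these are estimates, not official benchmarks.

measurements = {
    (256, 144): 708,    # 144p
    (640, 360): 4074,   # 360p
    (854, 480): 6818,   # 480p
}

mb_per_pixel = [mb / (w * h) for (w, h), mb in measurements.items()]
avg = sum(mb_per_pixel) / len(mb_per_pixel)

# Estimated peak memory for a 1920x1080 input frame.
est_1080p_mb = avg * 1920 * 1080
print(f"~{avg:.4f} MB/pixel, 1080p estimate: {est_1080p_mb / 1024:.1f} GB")
```

The estimate lands above 32 GB, which is consistent with the OOM reported for 1080p inputs even after the clear_buffer fix.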

The current pre-trained weights are in NHWC (channel-last) format, which is not supported by the CPU version. However, it is possible to run inference on CPU only, by transposing the weights into NCHW format and setting the "channel_last" flag to "False" in the PF.conv functions.
The following are reference codes for that:
Memory-Layout-Conversion
convert_parameter_format.py
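The core of that conversion is a transpose of each convolution weight from a channel-last layout to a channel-first one; the linked convert_parameter_format.py script does this for the real parameter files, but the idea can be sketched with NumPy. The (O, H, W, I) axis order below is an assumption for illustration:

```python
import numpy as np

# Illustration only: convert one conv weight from an assumed channel-last
# layout (out, height, width, in) to channel-first (out, in, height, width).
# The repo's convert_parameter_format.py applies this to every parameter.

def nhwc_weight_to_nchw(w_nhwc):
    return np.ascontiguousarray(np.transpose(w_nhwc, (0, 3, 1, 2)))

# Hypothetical weight: 16 output maps, 3x3 kernel, 4 input maps.
w = np.arange(16 * 3 * 3 * 4, dtype=np.float32).reshape(16, 3, 3, 4)
w_chw = nhwc_weight_to_nchw(w)
assert w_chw.shape == (16, 4, 3, 3)
```

After converting the weights, the model definition must also be built with channel_last=False so the layouts agree at inference time.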

@stalagmite7
Author

Sorry it took me so long; the GPU is an Nvidia 3060 Ti. The input video, as I mentioned, was 1080p; you're saying this is too high for TeCoGAN to process, then?

@Srinidhi-Srinivasa
Contributor

> Sorry it took me so long; the GPU is an Nvidia 3060 Ti. The input video, as I mentioned, was 1080p; you're saying this is too high for TeCoGAN to process, then?

Yes.
