Single machine, single GPU keeps reporting OOM #47
Comments
Maybe we need more error logs to reproduce it. @Lllllolita
This is the log file produced by running `python -m up train --config configs/det/efl/efl_yolox_medium_test.yaml --nm 1 --ng 1 --launch pytorch 2>&1 | tee log.train`. These two files are the logs produced by running `./easy_setup.sh`.
The batch_size in your log is 8. Maybe you need to recompile with `export TORCH_CUDA_ARCH_LIST='3.5;5.0+PTX;6.0;7.0;8.0;8.6'` set in easy_setup.sh. @Lllllolita
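The suggestion above can be sketched as a shell snippet. This is a minimal sketch, assuming the arch list from the comment; the key point is that the RTX 3090 mentioned in this issue is compute capability 8.6, so `8.6` must be present when the CUDA extensions are compiled:

```shell
# Set the target CUDA architectures before building; without 8.6,
# kernels compiled for older archs can misbehave on an RTX 3090.
export TORCH_CUDA_ARCH_LIST='3.5;5.0+PTX;6.0;7.0;8.0;8.6'
echo "$TORCH_CUDA_ARCH_LIST"

# Then re-run the build so extensions are recompiled for these archs:
# ./easy_setup.sh
```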
Thank you very much for the suggestion. With batch size set to 4, single machine single GPU now runs successfully, but single machine multi-GPU still fails, and `torch.cuda.is_available()` returns False.
Maybe you need to check your CUDA env? @Lllllolita
@Lllllolita Hi, have you solved the problem of `torch.cuda.is_available()` returning False when the number of GPUs > 1? I am hitting the same problem now; how did you solve it?
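When `torch.cuda.is_available()` returns False, two common causes are an empty `CUDA_VISIBLE_DEVICES` (which hides every GPU from the process) and a missing or broken NVIDIA driver. A minimal standard-library sketch that checks for these without importing torch (the helper name `cuda_env_hints` is hypothetical, not part of the up framework):

```python
import os
import shutil

def cuda_env_hints():
    """Collect quick hints about why torch.cuda.is_available() might be False.

    Uses only the standard library, so it runs even where torch is broken.
    """
    hints = []
    # An empty (but set) CUDA_VISIBLE_DEVICES hides all GPUs from the process.
    cvd = os.environ.get("CUDA_VISIBLE_DEVICES")
    if cvd == "":
        hints.append("CUDA_VISIBLE_DEVICES is set but empty: all GPUs hidden")
    # If nvidia-smi is not on PATH, the NVIDIA driver may not be installed.
    if shutil.which("nvidia-smi") is None:
        hints.append("nvidia-smi not found: NVIDIA driver may be missing")
    return hints

print(cuda_env_hints())
```

If this prints an empty list, the next things to check are the torch/CUDA version match (e.g. a torch 1.10.0 wheel built for CUDA 11.3, as in this issue) and the driver version reported by `nvidia-smi`.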
@yqyao Why are the version requirements so difficult? |
I ran into the following problem while reproducing EFL with the up framework: the server GPUs have plenty of memory, but training keeps reporting OOM. I have checked that no zombie processes are holding GPU memory on the server, and even with batch size set to 1 it still reports OOM. My server configuration is as follows:
python: 3.7
cuda: 11.3
torch: 1.10.0
gpu: RTX 3090
config: configs/det/efl/efl_yolox_medium.yaml
What could the problem be?