Single-machine, single-GPU training keeps reporting OOM #47

Open
Lllllolita opened this issue Nov 14, 2022 · 7 comments

Comments

@Lllllolita

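I ran into the following problem while reproducing EFL with the UP framework. The server's GPU has plenty of free memory, yet training always reports OOM. I have checked that no zombie processes are holding GPU memory, and OOM still occurs even with the batch size set to 1. My server configuration:
python: 3.7
cuda: 11.3
torch: 1.10.0
gpu: RTX 3090
config: configs/det/efl/efl_yolox_medium.yaml
What could be causing this?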

@yqyao

yqyao commented Nov 16, 2022

Maybe we need more error logs to reproduce it @Lllllolita

@Lllllolita
Author

This is the log produced by running python -m up train --config configs/det/efl/efl_yolox_medium_test.yaml --nm 1 --ng 1 --launch pytorch 2>&1 | tee log.train:
train.log

These two files are the logs produced by running ./easy_setup.sh:
compile.log
compile_err.log

@yqyao

yqyao commented Nov 16, 2022

The batch_size in your log is 8. You may also need to export TORCH_CUDA_ARCH_LIST='3.5;5.0+PTX;6.0;7.0;8.0;8.6' in easy_setup.sh and recompile. @Lllllolita
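
To illustrate why the architecture list matters, here is a minimal sketch (not part of the UP framework) that checks whether a TORCH_CUDA_ARCH_LIST value covers a given compute capability. The helper names `parse_arch_list` and `arch_list_covers` are hypothetical, the coverage rule is deliberately simplified, and only numeric entries such as `8.6` or `5.0+PTX` are handled. The RTX 3090 has compute capability 8.6.

```python
from typing import List, Tuple


def parse_arch_list(arch_list: str) -> List[Tuple[int, int, bool]]:
    """Parse numeric 'X.Y' or 'X.Y+PTX' entries separated by ';' or spaces.

    Hypothetical helper: named architectures (e.g. 'Ampere') are not handled
    and will raise ValueError.
    """
    entries = []
    for token in arch_list.replace(";", " ").split():
        has_ptx = token.upper().endswith("+PTX")
        version = token[:-4] if has_ptx else token
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def arch_list_covers(arch_list: str, capability: Tuple[int, int]) -> bool:
    """Simplified rule (an assumption of this sketch): the GPU is covered if
    its capability is listed exactly, or if a '+PTX' entry at or below its
    capability exists, since PTX can be JIT-compiled for newer GPUs."""
    for major, minor, has_ptx in parse_arch_list(arch_list):
        if (major, minor) == capability:
            return True
        if has_ptx and (major, minor) <= capability:
            return True
    return False


if __name__ == "__main__":
    suggested = "3.5;5.0+PTX;6.0;7.0;8.0;8.6"
    # With PyTorch installed, the capability of GPU 0 can be read with
    # torch.cuda.get_device_capability(0); (8, 6) is hard-coded here.
    print(arch_list_covers(suggested, (8, 6)))
```

If the check fails for the list used at build time, the compiled extensions probably contain no kernels for that GPU, which is when adding the missing architecture and recompiling is worth trying.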

@Lllllolita
Author

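Thank you very much for the suggestion. Single-machine, single-GPU training now runs successfully with the batch size set to 4, but single-machine multi-GPU training still fails: torch.cuda.is_available() returns False.
The command is: python -m up train --config configs/det/efl/efl_yolox_medium_test.yaml --nm 1 --ng 2 --launch pytorch 2>&1 | tee log.train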
train(1).log

@yqyao

yqyao commented Nov 18, 2022

Maybe you need to check your CUDA environment? @Lllllolita
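
One possible way to check the CUDA environment is sketched below. It is a generic diagnostic, not something the UP framework provides, and the function names `visible_gpu_count` and `torch_cuda_summary` are hypothetical. It assumes that a restrictive CUDA_VISIBLE_DEVICES value or a CPU-only PyTorch build are among the causes worth ruling out; the thread does not confirm either one.

```python
import os
from typing import Dict, Mapping, Optional


def visible_gpu_count(env: Mapping[str, str]) -> Optional[int]:
    """Count the entries in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all GPUs visible) and 0 when it
    is set to an empty string (no GPUs visible). Entries are counted only,
    not validated against the hardware.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def torch_cuda_summary() -> Dict[str, object]:
    """Collect what PyTorch itself reports about CUDA, if it is installed."""
    try:
        import torch
    except ImportError:
        return {"torch": "not installed"}
    return {
        "torch": torch.__version__,
        # None here means a CPU-only PyTorch build.
        "built_with_cuda": torch.version.cuda,
        "is_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES entries:", visible_gpu_count(os.environ))
    for key, value in torch_cuda_summary().items():
        print(f"{key}: {value}")
```

Running it in the same shell and Python environment that launches training shows whether fewer GPUs are visible than the value passed to --ng, and whether PyTorch can see CUDA at all.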

@happygds

@Lllllolita Hi, have you solved the problem where torch.cuda.is_available() returns False when the number of GPUs is greater than 1? I'm hitting the same problem now. How can it be solved?

@happygds

@yqyao Why are the version requirements so difficult?
