I have another question about the training process. When I follow your guideline, I get the following error:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "tools/train.py", line 167, in <module>
main()
File "tools/train.py", line 98, in main
init_dist(args.launcher, **cfg.dist_params)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 20, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 34, in _init_dist_pytorch
dist.init_process_group(backend=backend, **kwargs)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 422, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 172, in _env_rendezvous_handler
store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Permission denied
Traceback (most recent call last):
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/home/dongxiaoxiao/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/dongxiaoxiao/anaconda3/envs/open-mmlab/bin/python', '-u', 'tools/train.py', '--local_rank=3', '--config', 'configs/foodnet/fpn_r50_512x1024_80k_RM.py', '--work-dir', 'checkpoints/FPN_r50_RM', '--launcher', 'pytorch']' returned non-zero exit status 1.
I remember that about 20 days ago I saw someone say that "--launcher pytorch" may cause problems, but I am not sure whether that applies here. I hope you can reply. Thanks a lot!
@Mark1Dong Sorry for the late reply; I have been very busy recently. Could you first paste your environment (OS, GPU, etc.) so that I can look into it in more detail?
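If it helps, here is a minimal sketch for collecting part of that information with the Python standard library only. It also checks whether the current user can bind a TCP socket on the rendezvous port, since the traceback fails inside `TCPStore(master_addr, master_port, ...)`. One possible cause of "Permission denied" there is a port the user is not allowed to bind (for example a port below 1024 without root), but that is a guess and not a confirmed diagnosis. The helper names `can_bind` and `describe_env` are hypothetical, and the fallback port 29500 is an assumption about the launcher default; GPU details (for example the output of `nvidia-smi`) would still need to be pasted separately.

```python
import os
import platform
import socket
import sys


def can_bind(host: str, port: int) -> bool:
    """Return True if the current user can bind a TCP socket to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers both "permission denied" and "address already in use".
            return False
    return True


def describe_env() -> dict:
    """Collect basic environment details worth pasting into the issue."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        # Read by the env:// rendezvous that the traceback goes through.
        "MASTER_ADDR": os.environ.get("MASTER_ADDR", "<unset>"),
        "MASTER_PORT": os.environ.get("MASTER_PORT", "<unset>"),
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
    # Assumption: fall back to 29500 when MASTER_PORT is not set.
    port = int(os.environ.get("MASTER_PORT", "29500"))
    print(f"can bind 127.0.0.1:{port}: {can_bind('127.0.0.1', port)}")
```

If the bind check fails for the port in use, trying a different unprivileged, unused port would be a reasonable next experiment.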