Hi,
Thanks to the author for this amazing repository. I am having problems training the model on multiple GPUs and I wonder if anyone else has run into this. Training works fine on a single RTX 3090, but it fails whenever I try to use 2 GPUs with the following command:
python main.py configs/resa/resa34_openlane.py --gpus 0 1
The following error occurs:
/home/anaconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
Traceback (most recent call last):
File "main.py", line 66, in
main()
File "main.py", line 36, in main
runner.train()
File "/home/Documents/git/lanedet/lanedet/engine/runner.py", line 99, in train
self.train_epoch(epoch, train_loader)
File "/home/Documents/git/lanedet/lanedet/engine/runner.py", line 75, in train_epoch
loss.backward()
File "/home/anaconda3/envs/lanedet/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/anaconda3/envs/lanedet/lib/python3.8/site-packages/torch/autograd/init.py", line 141, in backward
grad_tensors = make_grads(tensors, grad_tensors)
File "/home/anaconda3/envs/lanedet/lib/python3.8/site-packages/torch/autograd/init.py", line 50, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
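For what it's worth, I can reproduce the same behaviour with a tiny standalone script (this is just my own sketch to illustrate what I think is going on, not lanedet code): nn.DataParallel gathers the per-GPU scalar losses into a 1-D tensor of length num_gpus, which is exactly the "unsqueeze and return a vector" warning above, and calling backward() on that vector raises the error.

import torch
import torch.nn as nn

class ToyNet(nn.Module):
    # Toy module that returns a scalar "loss" per replica, similar to how a
    # detection head returns a single loss value.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x).mean()

if torch.cuda.device_count() >= 2:
    net = nn.DataParallel(ToyNet().cuda(), device_ids=[0, 1])
    loss = net(torch.randn(8, 4).cuda())
    print(loss.shape)   # torch.Size([2]): one gathered scalar per GPU
    loss.backward()     # RuntimeError: grad can be implicitly created only for scalar outputs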
After searching the Internet, I found that this error can be avoided by changing loss.backward() to loss.sum().backward(). However, that change makes the recorder and logging fail:
--- Logging error ---
Traceback (most recent call last):
File "/home/anaconda3/envs/lanedet/lib/python3.8/logging/init.py", line 1085, in emit
msg = self.format(record)
File "/home/anaconda3/envs/lanedet/lib/python3.8/logging/init.py", line 929, in format
return fmt.format(record)
File "/home/anaconda3/envs/lanedet/lib/python3.8/logging/init.py", line 668, in format
record.message = record.getMessage()
File "/home/anaconda3/envs/lanedet/lib/python3.8/logging/init.py", line 371, in getMessage
msg = str(self.msg)
File "/home/Documents/git/lanedet/lanedet/utils/recorder.py", line 116, in str
loss_state.append('{}: {:.4f}'.format(k, v.avg))
File "/home/Documents/git/lanedet/lanedet/utils/recorder.py", line 32, in avg
d = torch.tensor(list(self.deque))
ValueError: only one element tensors can be converted to Python scalars
Call stack:
File "main.py", line 66, in
main()
File "main.py", line 36, in main
runner.train()
File "/home/Documents/git/lanedet/lanedet/engine/runner.py", line 99, in train
self.train_epoch(epoch, train_loader)
File "/home/Documents/git/lanedet/lanedet/engine/runner.py", line 89, in train_epoch
self.recorder.record('train')
File "/home/Documents/git/lanedet/lanedet/utils/recorder.py", line 97, in record
self.logger.info(self)
Message: <lanedet.utils.recorder.Recorder object at 0x7fd865ac7eb0>
Arguments: ()
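My current guess is that the cleaner fix is to reduce the loss to a scalar once, right after the forward pass, and to reduce the logged stats the same way before they reach the recorder, instead of only calling sum() inside backward(). Roughly like this (just a sketch of what I would change in train_epoch; output['loss'], loss_stats and update_loss_stats are my guesses at the names used in runner.py and may not match the repo exactly):

output = self.net(data)
loss = output['loss'].mean()   # collapse the gathered per-GPU vector to a scalar once
self.optimizer.zero_grad()
loss.backward()                # scalar output, so no implicit-grad error
self.optimizer.step()

# Reduce every logged value the same way, so Recorder.avg only ever sees 0-d tensors
loss_stats = {k: v.mean() for k, v in output['loss_stats'].items()}
self.recorder.update_loss_stats(loss_stats)

But I am not sure whether mean() or sum() is the right reduction here, or whether there is a recommended way to run this repo with DataParallel.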
Does anyone have an idea of how to solve this? Any help is appreciated! Thank you.