I encountered an issue when finetuning on the realrobot dataset. The error is intermittent: some runs fail with the traceback below, while others complete without it.
Error executing job with overrides: ['training=finetune', 'dataset=realrobot']
Traceback (most recent call last):
  File "/home/dodo/ljx/LIV/liv/train_liv.py", line 194, in main
    workspace.train()
  File "/home/dodo/ljx/LIV/liv/train_liv.py", line 100, in train
    metrics, st = trainer.update(self.model, batch, self.global_step)
  File "/home/dodo/ljx/LIV/liv/trainer.py", line 122, in update
    model.module.encoder_opt.step()
  File "/home/dodo/miniconda3/envs/liv-env/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/dodo/miniconda3/envs/liv-env/lib/python3.9/site-packages/torch/optim/optimizer.py", line 23, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/dodo/miniconda3/envs/liv-env/lib/python3.9/site-packages/torch/optim/adam.py", line 218, in step
    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
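The last line of the error suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at the call that actually launched it, rather than at a later one such as the optimizer step. A minimal sketch of doing that from Python follows. The helper name `enable_sync_cuda_errors` is hypothetical and not part of LIV or PyTorch, and the sketch assumes the variable should be set before `torch` is imported, which is the conservative choice because it is read when CUDA is initialised.

```python
import os
import sys


def enable_sync_cuda_errors() -> bool:
    """Request synchronous CUDA kernel launches for debugging.

    Hypothetical helper, not part of LIV or PyTorch.
    Returns True if torch had not been imported yet when the variable
    was set, i.e. the setting is expected to take effect.
    """
    torch_already_imported = "torch" in sys.modules
    # Same effect as exporting CUDA_LAUNCH_BLOCKING=1 in the shell
    # before launching the training script.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


# Call this at the very top of the entry script, before `import torch`.
set_early = enable_sync_cuda_errors()
if not set_early:
    print("warning: torch was imported first; the setting may be ignored")
```

With synchronous launches the run will be slower, but the traceback should point at the operation that triggered the illegal memory access, which helps separate a data or indexing problem from a fault in the optimizer itself.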
Hi,
thanks for sharing the code and the model!