RuntimeError: CUDA error: device-side assert triggered #69
Comments
@PhamLeQuangNhat did you solve this? Could you share your solution, please?
@bharatsubedi |
@PhamLeQuangNhat |
I am using a Korean dataset for training and I only modified the character set. What is meant by "wrong class index"? Could you explain, please? Special characters are also included in my data.
For example, if the channel dimension of your logits is C and your gt contains a value of C or larger (valid class indices run from 0 to C-1), using these logits and gt to compute the cross entropy loss will cause errors like
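To make the point about out-of-range ground-truth values concrete, here is a minimal, library-free sketch that scans a batch of target indices and reports every value outside the valid range [0, C). The helper name `find_invalid_targets` and the sample data are hypothetical and not part of vedastr. The `ignore_index` argument mirrors the idea of a loss that skips one designated padding value.

```python
def find_invalid_targets(targets, num_classes, ignore_index=None):
    """Return (row, col, value) for every target outside [0, num_classes).

    `targets` is a list of integer sequences, one sequence per sample.
    A value equal to `ignore_index` is skipped, since a loss configured
    with that ignore index would not look it up as a class.
    """
    bad = []
    for row, seq in enumerate(targets):
        for col, value in enumerate(seq):
            if ignore_index is not None and value == ignore_index:
                continue
            if not 0 <= value < num_classes:
                bad.append((row, col, value))
    return bad


# Hypothetical example: logits with C = 5 channels accept targets 0..4 only.
C = 5
gt = [[0, 4, 2], [1, 5, 3]]
print(find_invalid_targets(gt, C))                  # [(1, 1, 5)]
print(find_invalid_targets(gt, C, ignore_index=5))  # []
```

Running a check like this on the CPU before the loss is computed usually gives a clearer message than the device-side assert.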
@ChaseMonsterAway could you please let me know which part of the code I should change to solve that problem? The length of the character set is 1800 in my case,
I think you should change the ignore index of |
@ChaseMonsterAway after changing criterion equal to |
@bharatsubedi |
@ChaseMonsterAway |
@ChaseMonsterAway batch_text[i][:len(text)] = torch.LongTensor(text) @bharatsubedi |
@PhamLeQuangNhat |
@bharatsubedi @PhamLeQuangNhat |
@bharatsubedi @PhamLeQuangNhat |
@ChaseMonsterAway Yes, I tried running the latest code on the cstr branch. It works well. Thank you very much.
When I was training on my own dataset, I modified the following in the config file:
character = 'aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ0123456789'
batch_max_length = 25
num_class = len(character) + 1 # num_class = 197
gpu_id='5,7'
and I ran the command: bash tools/dist_train.sh configs/stn_cstr.py 2
then I got the error:
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "/home/recognition/vedastr-cstr/tools/train.py", line 49, in <module>
    main()
  File "/home/recognition/vedastr-cstr/tools/train.py", line 45, in main
    runner()
  File "/home/recognition/vedastr-cstr/tools/../vedastr/runners/train_runner.py", line 165, in __call__
    self._train_batch(img, label)
  File "/home/recognition/vedastr-cstr/tools/../vedastr/runners/train_runner.py", line 118, in _train_batch
    loss.backward()
  File "/root/anaconda3/envs/vedastr/lib/python3.9/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/anaconda3/envs/vedastr/lib/python3.9/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
What might cause this, and how can I fix it? Thanks in advance.
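The assertion `t >= 0 && t < n_classes` in the log above says that some target index fell outside the range the loss expects. One possible cause, offered here only as an assumption and not confirmed for this dataset, is that the labels contain characters (spaces, punctuation, other symbols) that are missing from the configured `character` string. The sketch below counts such characters. The helper name `unsupported_characters` and the sample labels are hypothetical, and how vedastr itself maps unknown characters to indices is not shown here.

```python
def unsupported_characters(labels, character):
    """Count label characters that do not appear in the character set.

    `labels` is an iterable of ground-truth strings and `character` is the
    string of supported characters, as in the config above.
    """
    allowed = set(character)
    missing = {}
    for label in labels:
        for ch in label:
            if ch not in allowed:
                missing[ch] = missing.get(ch, 0) + 1
    return missing


# Hypothetical stand-ins for the real config value and dataset labels.
character = "abc0123456789"
labels = ["a1", "b c", "ab-9"]
print(unsupported_characters(labels, character))  # {' ': 1, '-': 1}
```

An empty result would rule this cause out and point toward the class count or the ignore index instead.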