What seems to be the problem? #12261
Unanswered
blackbolttt
asked this question in
Q&A
Replies: 0 comments
AutoAnchor: 4.39 anchors/target, 0.996 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp5/labels.jpg...
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/train/exp5
Starting training for 10 epochs...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Class Images Instances P R mAP50 mAP50-95: 21%|██ | 6/29 [00:22<01:24, 3.68s/it]
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe6777a5617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe67776098d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fe6a1b28518 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x125f71d (0x7fe6075c371d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x5197b6 (0x7fe670e9f7b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #5: + 0x55ca7 (0x7fe67778aca7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x1e3 (0x7fe677782cb3 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fe677782e49 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #8: + 0x7c9738 (0x7fe67114f738 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #9: THPVariable_subclass_dealloc(_object*) + 0x305 (0x7fe67114fac5 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #32: + 0x29d90 (0x7fe6ab253d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #33: __libc_start_main + 0x80 (0x7fe6ab253e40 in /lib/x86_64-linux-gnu/libc.so.6)
I am getting this error while training on my custom dataset. What might be the reason for this error? I am running my code in Google Colab.
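A frequent cause of an `index out of bounds` assertion during detection training is a label whose class index falls outside `0..nc-1`, where `nc` is the number of classes declared in the dataset config. That is an assumption about this dataset, not something the log above confirms. A minimal sketch for checking it, assuming YOLO-format label files (one `class x y w h` row per object) in a single directory; `find_bad_labels`, the example path, and the class count are hypothetical names chosen for illustration:

```python
from pathlib import Path


def find_bad_labels(label_dir, num_classes):
    """Return (file name, line number, class id) for every out-of-range label.

    Assumes YOLO-format .txt files: each non-blank row starts with a class id.
    """
    bad = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            class_id = int(float(parts[0]))  # tolerate ids written as "3.0"
            if not 0 <= class_id < num_classes:
                bad.append((path.name, line_no, class_id))
    return bad


# Hypothetical usage; replace the path and class count with your own:
# for name, line_no, class_id in find_bad_labels("dataset/labels/train", 3):
#     print(f"{name}:{line_no} has class id {class_id}")
```

If this reports nothing, rerunning with `CUDA_LAUNCH_BLOCKING=1` set in the environment, as the log suggests, should make the stack trace point at the call that actually failed.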