What seems to be the problem? #12261

blackbolttt · 2023-10-20T18:05:48Z

AutoAnchor: 4.39 anchors/target, 0.996 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp5/labels.jpg...
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/train/exp5
Starting training for 10 epochs...

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
    0/9       1.8G     0.1074     0.1802    0.03663        263        640: 100%|██████████| 337/337 [07:49<00:00,  1.39s/it]
             Class     Images  Instances          P          R      mAP50   mAP50-95:   3%|▎         | 1/29 [00:03<01:27,  3.13s/it]WARNING ⚠️ NMS time limit 1.300s exceeded
             Class     Images  Instances          P          R      mAP50   mAP50-95:  10%|█         | 3/29 [00:07<00:56,  2.16s/it]WARNING ⚠️ NMS time limit 1.300s exceeded
             Class     Images  Instances          P          R      mAP50   mAP50-95:  17%|█▋        | 5/29 [00:12<00:54,  2.25s/it]WARNING ⚠️ NMS time limit 1.300s exceeded
             Class     Images  Instances          P          R      mAP50   mAP50-95:  21%|██        | 6/29 [00:15<01:00,  2.61s/it]WARNING ⚠️ NMS time limit 1.300s exceeded

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [12,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [13,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [14,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [15,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [16,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [17,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
Class Images Instances P R mAP50 mAP50-95: 21%|██ | 6/29 [00:22<01:24, 3.68s/it]
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe6777a5617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe67776098d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fe6a1b28518 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x125f71d (0x7fe6075c371d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x5197b6 (0x7fe670e9f7b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #5: + 0x55ca7 (0x7fe67778aca7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x1e3 (0x7fe677782cb3 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fe677782e49 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #8: + 0x7c9738 (0x7fe67114f738 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #9: THPVariable_subclass_dealloc(_object*) + 0x305 (0x7fe67114fac5 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)

frame #32: + 0x29d90 (0x7fe6ab253d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #33: __libc_start_main + 0x80 (0x7fe6ab253e40 in /lib/x86_64-linux-gnu/libc.so.6)

I get this error while training on my custom dataset in Google Colab. Training for epoch 0 completes, and the crash happens during the first validation pass (batch 6 of 29). What might be causing it?
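One possible cause, which is an assumption and not something the log confirms, is a label file containing a class id outside the range 0 to nc-1, where nc is the class count in the dataset YAML. Indexing with such an id can trigger the "index out of bounds" device-side assert shown above. Below is a minimal sketch for checking this, assuming YOLO-format labels (one `class x_center y_center width height` row per object). The function name `find_out_of_range_labels` is hypothetical and not part of YOLOv5.

```python
from pathlib import Path


def find_out_of_range_labels(label_dir, num_classes):
    """Return (file name, line number, class id) for every label row
    whose class id falls outside [0, num_classes)."""
    problems = []
    # Walk every .txt label file under label_dir, in a stable order.
    for txt in sorted(Path(label_dir).rglob("*.txt")):
        for line_no, line in enumerate(txt.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # The first field is the class id; accept "3" as well as "3.0".
            class_id = int(float(fields[0]))
            if not 0 <= class_id < num_classes:
                problems.append((txt.name, line_no, class_id))
    return problems
```

To use it, call something like `find_out_of_range_labels("path/to/labels", num_classes=3)`, where both the path and the class count are placeholders for your own dataset. An empty list means every class id is in range, and the cause lies elsewhere. Rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the error message suggests, should then give a more accurate stack trace.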
