
RuntimeError: The size of tensor a (4) must match the size of tensor b (400) at non-singleton dimension 1 #17

Open
olojuwin opened this issue Dec 31, 2024 · 3 comments

Comments

@olojuwin

I encountered this bug with the latest code.

DEIM/engine/deim/hybrid_encoder.py", line 243, in with_pos_embed
[rank0]:     return tensor if pos_embed is None else tensor + pos_embed

RuntimeError: The size of tensor a (4) must match the size of tensor b (400) at non-singleton dimension 1
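For context, the two sizes in the error line up with the encoder's stride arithmetic: a position embedding baked for a 640×640 eval size at stride 32 covers 20 × 20 = 400 flattened tokens, while a much smaller input yields far fewer (for example, a 64×64 input would yield only 4). A minimal sketch of that arithmetic — the stride-32 assumption is inferred from the 640 → 400 relationship, and the `num_tokens` helper is hypothetical, not from the repository:

```python
def num_tokens(spatial_size, stride=32):
    """Flattened feature-token count for an (H, W) input at the given stride."""
    h, w = spatial_size
    return (h // stride) * (w // stride)

print(num_tokens((640, 640)))  # 400 -> matches tensor b in the error
print(num_tokens((64, 64)))    # 4   -> would match tensor a
```

So the error is the position embedding (built for the training eval size) failing to broadcast against features from a differently sized input.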
@mathisTH

mathisTH commented Jan 29, 2025

It occurs for me when I modify the input size during training and then run inference with my fine-tuned model. Here are my changes to get it working:

diff --git a/tools/inference/torch_inf.py b/tools/inference/torch_inf.py
index 5103ad8..4aa5c73 100644
--- a/tools/inference/torch_inf.py
+++ b/tools/inference/torch_inf.py
@@ -33,13 +33,13 @@ def draw(images, labels, boxes, scores, thrh=0.4):
         im.save('torch_results.jpg')
 
 
-def process_image(model, device, file_path):
+def process_image(model, device, file_path, input_size):
     im_pil = Image.open(file_path).convert('RGB')
     w, h = im_pil.size
     orig_size = torch.tensor([[w, h]]).to(device)
 
     transforms = T.Compose([
-        T.Resize((640, 640)),
+        T.Resize(input_size),
         T.ToTensor(),
     ])
     im_data = transforms(im_pil).unsqueeze(0).to(device)
@@ -50,7 +50,7 @@ def process_image(model, device, file_path):
     draw([im_pil], labels, boxes, scores)
 
 
-def process_video(model, device, file_path):
+def process_video(model, device, file_path, input_size):
     cap = cv2.VideoCapture(file_path)
 
     # Get video properties
@@ -63,7 +63,7 @@ def process_video(model, device, file_path):
     out = cv2.VideoWriter('torch_results.mp4', fourcc, fps, (orig_w, orig_h))
 
     transforms = T.Compose([
-        T.Resize((640, 640)),
+        T.Resize(input_size),
         T.ToTensor(),
     ])
 
@@ -140,11 +140,11 @@ def main(args):
     file_path = args.input
     if os.path.splitext(file_path)[-1].lower() in ['.jpg', '.jpeg', '.png', '.bmp']:
         # Process as image
-        process_image(model, device, file_path)
+        process_image(model, device, file_path, input_size=cfg.global_cfg["eval_spatial_size"])
         print("Image processing complete.")
     else:
         # Process as video
-        process_video(model, device, file_path)
+        process_video(model, device, file_path, input_size=cfg.global_cfg["eval_spatial_size"])
 
 
 if __name__ == '__main__':

Hope it helps
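To illustrate why resizing to the training eval size resolves this, here is a minimal PyTorch reproduction of the failing add in `with_pos_embed` — the 256-dim embedding and the 400/4 token counts are illustrative assumptions, not values read from the repository:

```python
import torch

# Position embedding baked at training eval size 640x640 (stride 32 -> 20*20 = 400 tokens)
pos_embed = torch.zeros(1, 400, 256)

# Feature tokens from a mismatched, smaller inference input (e.g. 64x64 -> 4 tokens)
feat_small = torch.zeros(1, 4, 256)

try:
    feat_small + pos_embed  # the `tensor + pos_embed` line from the traceback
except RuntimeError as e:
    print(e)  # size of tensor a (4) must match size of tensor b (400) ...

# Resizing inference inputs to the training eval size restores agreement:
feat_matched = torch.zeros(1, 400, 256)
out = feat_matched + pos_embed
print(out.shape)  # torch.Size([1, 400, 256])
```

This is exactly what passing `cfg.global_cfg["eval_spatial_size"]` into `T.Resize` in the diff above achieves.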

@picosankaricpp

I get the same error after trying to adjust the input size and finetune the network. Is there more code that needs to be modified to change input size?

@Yawen-Tan

I encountered this error during training. Did you finally solve this problem?


4 participants