Question about multi points and boxes as prompt #435

Open
ZhilunT opened this issue Nov 4, 2024 · 1 comment


ZhilunT commented Nov 4, 2024

Hello, I am using the predictor._predict function for prediction. input_points contains 86 points and input_bbox contains 2 bounding boxes, since each of the 2 boxes contains multiple points.

masks, scores, logits = predictor._predict( point_coords = input_points, point_labels = np.ones([input_points.shape[0],1]), box=input_bbox )
The goal is to use both points and bounding boxes for prediction simultaneously. However, the points and bounding boxes may not be equal in number.

The error shown below occurs because the current implementation expects the number of points and the number of bounding boxes to match. Prediction works if the number of boxes in input_bbox is set to match the number of points, but in practice a single bounding box may contain multiple points.

How can this issue be resolved to handle cases where a bounding box contains multiple points?

```
    masks, scores, logits = predictor.predict(
                            ^^^^^^^^^^^^^^^^^^
  File "sam2/sam2_image_predictor.py", line 271, in predict
    masks, iou_predictions, low_res_masks = self._predict(
                                            ^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "sam2/sam2_image_predictor.py", line 384, in _predict
    concat_coords = torch.cat([box_coords, concat_points[0]], dim=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 86 for tensor number 1 in the list.
```
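For readers hitting the same traceback, here is a minimal sketch of the shape mismatch behind it, using NumPy in place of torch. It assumes `input_points` was shaped `(86, 1, 2)` (86 prompts with one point each), which the error message suggests but the post does not state, and that the two boxes are viewed as `(2, 2, 2)` corner points. All array names here are illustrative and are not taken from the SAM 2 code.

```python
import numpy as np

# Assumed shapes, inferred from the error message (not confirmed by the post):
# 2 boxes viewed as corner points -> (2, 2, 2); 86 prompts of 1 point -> (86, 1, 2).
box_coords = np.zeros((2, 2, 2), dtype=np.float32)
points_per_prompt = np.zeros((86, 1, 2), dtype=np.float32)

# Joining along axis 1 requires every other axis to match, so batch sizes
# 2 and 86 collide, mirroring the RuntimeError raised by torch.cat.
try:
    np.concatenate([box_coords, points_per_prompt], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# One box with all 86 points in a single prompt: batch size 1 on both sides,
# with the points along axis 1.
single_box = box_coords[:1]                              # (1, 2, 2)
points_one_prompt = points_per_prompt.reshape(1, -1, 2)  # (1, 86, 2)
combined = np.concatenate([single_box, points_one_prompt], axis=1)
```

The second concatenation succeeds because both arrays share a batch size of 1, so the two box corners and the 86 points end up side by side along axis 1.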


heyoeyo commented Nov 8, 2024

The short answer is that having more than 1 box per prompt requires code changes, and even then the model doesn't seem to handle it well. Having a single box and many points should work, however, as long as the N points sit along the second axis (index 1) of the points array, i.e. the given points should have shape BxNx2, where B is the batch size, N is the number of points, and 2 is for the (x, y) coordinates. This is discussed in more detail in issue #235.
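One way to apply this to the two-box case is to split the points by the box that contains them and run one prediction per box. The sketch below is only an illustration under assumptions: the helper name `group_points_by_box` is hypothetical, boxes are assumed to be in `(x0, y0, x1, y1)` pixel format, and all points are treated as foreground (label 1). The call to `predictor.predict` is left as a comment because it depends on a loaded model.

```python
import numpy as np

def group_points_by_box(points, boxes):
    """Hypothetical helper: pair each box with the points that fall inside it.

    points: array-like of (x, y) pairs, any shape that reshapes to (N, 2).
    boxes:  array-like of (x0, y0, x1, y1) boxes, reshaped to (K, 4).
    Returns a list of (box, pts, labels) with pts shaped (1, M, 2)
    and labels shaped (1, M), i.e. one prompt per box.
    """
    points = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    prompts = []
    for box in boxes:
        x0, y0, x1, y1 = box
        inside = (
            (points[:, 0] >= x0) & (points[:, 0] <= x1)
            & (points[:, 1] >= y0) & (points[:, 1] <= y1)
        )
        pts = points[inside][None, :, :]                     # (1, M, 2)
        labels = np.ones((1, pts.shape[1]), dtype=np.int32)  # all foreground
        prompts.append((box, pts, labels))
    return prompts

# Assumed usage with an already configured predictor (not run here):
# for box, pts, labels in group_points_by_box(input_points, input_bbox):
#     masks, scores, logits = predictor.predict(
#         point_coords=pts, point_labels=labels, box=box
#     )
```

A box that contains no points yields an empty points array here, so in that case it may be safer to pass only the box. Points that fall inside both boxes are assigned to both prompts.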
