Same prompt input produces different results in video vs. single-image prediction #533
Comments
Box selection tends to create lots of errors/artifacts if the box isn't extremely tight around the object you want to segment. With a 'loose' box around an object, the model seems to try to segment the stuff around/behind the main object, making a mess of things, and none of the other mask predictions are useful in that case. So the errors you're getting may just be due to a loose-fitting box (and maybe the box used when running single images fits tighter?). If tightening the boxes doesn't help, you could also try switching models, since the different model sizes can behave differently. If none of that helps, then using point prompts might be the best option, if possible, since the v2 models seem to handle points much better than boxes.
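For what it's worth, here's a minimal sketch of trying both prompt types on a single frame with the image predictor and inspecting all candidate masks; the config/checkpoint paths, frame file, and coordinates are placeholders, not values from this issue:

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder config/checkpoint paths.
sam2_model = build_sam2(
    "configs/sam2.1/sam2.1_hiera_t.yaml", "checkpoints/sam2.1_hiera_tiny.pt"
)
predictor = SAM2ImagePredictor(sam2_model)

image = np.array(Image.open("frame_000.jpg").convert("RGB"))  # placeholder frame
predictor.set_image(image)

# Box prompt: keep the box as tight as possible around the object.
tight_box = np.array([210, 140, 360, 420])  # placeholder [x0, y0, x1, y1]
masks, scores, _ = predictor.predict(box=tight_box, multimask_output=True)

# Point prompt: a single foreground click, often more robust with the v2 models.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[285, 280]]),  # placeholder click inside the object
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # return all candidates to inspect artifacts
)
best_mask = masks[np.argmax(scores)]  # pick the highest-scoring candidate
```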
Could this be the cause of the problem? The object is stationary, and the camera moves around it while capturing the dataset. (Is SAM2 only trained for a fixed camera?)
From what I've seen, there aren't any issues with moving cameras, though maybe if the camera is moving very fast it could become a problem due to blurring? I don't have any rotating-camera examples of my own, but trying it on a video from Pexels, it seems to work (this is the tiny v2.1 model), though the rotation isn't fast enough to cause blur.
Strong blurring could probably break the tracking over time. For your example, do the problems only appear after some number of frames? I was assuming the issue happens on the first frame, but if it's something that happens over time, then yes, it could be a blurring/movement issue. Again, I don't have any extreme-blurring examples, but using a section of the crab rave video (around 1:15 in), the tracking seems to hold on even though almost every frame has severe blurring (of the object, not due to camera movement, to be fair). If your video has a section that is slower/less blurry, maybe it's worth trying the segmentation there first, to see if the problem goes away, as in the sketch below.
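A rough sketch of that idea, assuming `predictor`, `state`, and `input_box` are already set up as in the issue's snippet, and using the `start_frame_idx` argument of `propagate_in_video` (the anchor frame index is a placeholder):

```python
# Anchor the prompt on a slower, sharper frame instead of frame 0.
anchor_frame = 120  # placeholder index of a less blurry section

predictor.add_new_points_or_box(
    inference_state=state,
    frame_idx=anchor_frame,
    obj_id=0,
    box=input_box,
)

# Propagate forward starting from the sharp anchor frame.
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(
    state, start_frame_idx=anchor_frame
):
    pass  # collect/inspect per-frame masks here
```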
Original issue description:

I'm currently using SAM2 to segment a custom dataset with both videos and single images. With the same prompt, the segmentation works fine on a single image, just like in the demo, but when applied to a video the result doesn't come out as expected, as shown in the attached image. When I checked the internal mask data, the values were between -0.05 and 0.05, so I thought it might be a confidence-value issue. I tried adjusting the threshold, but the result is still not as high quality as with a single image and instead shows strange, patterned artifacts. I don't know what the cause of the issue is and need help troubleshooting. The box prompt is added with:

```python
_, _, masks = predictor.add_new_points_or_box(
    inference_state=state, frame_idx=0, obj_id=0, box=input_box
)
```
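(For reference on the -0.05 to 0.05 values: the video predictor returns raw mask logits, and the official SAM2 video notebook binarizes them at 0.0 rather than treating them as confidences. A minimal end-to-end sketch along those lines, with placeholder paths and box coordinates:)

```python
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

# Placeholder config/checkpoint paths.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml", "checkpoints/sam2.1_hiera_tiny.pt"
)
state = predictor.init_state(video_path="video_frames/")  # dir of JPEG frames

input_box = np.array([210, 140, 360, 420])  # placeholder [x0, y0, x1, y1]
_, _, mask_logits = predictor.add_new_points_or_box(
    inference_state=state, frame_idx=0, obj_id=0, box=input_box
)

# Propagate through the video; logits > 0.0 is the decision boundary.
video_masks = {}
for frame_idx, obj_ids, logits in predictor.propagate_in_video(state):
    video_masks[frame_idx] = (logits > 0.0).cpu().numpy()
```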