Hey! I guess this could be a dumb question, but I honestly couldn't answer it by reading your paper. Do you have a clear idea of what happens to slots that are not required to explain the input image? In the original SA algorithm, unnecessary slots clearly tend to encode background information, but this does not seem to happen with SA-MESH: some slots end up "binding" to regions that belong to already-encoded objects (e.g., in Figure 7, in images where objects are not strongly occluded). Since the attention masks are sparse, it seems like this over-segmentation issue is more likely to occur. Am I wrong? Thanks in advance!
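To make my intuition concrete, here is a toy sketch (not taken from your code; the logits, marginals, and slot/pixel counts are made up) contrasting the original SA softmax-over-slots normalization with a Sinkhorn-style doubly stochastic normalization, which is how I understand the SA-MESH attention to behave before the entropy minimization:

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinkhorn(logits, n_iters=100):
    # Alternate row/column normalization in log space so every pixel's column
    # sums to 1 and every slot's row sums to n_pixels / n_slots.
    n_slots, n_pixels = logits.shape
    log_row_mass = np.log(n_pixels / n_slots)
    log_a = logits.astype(float).copy()
    for _ in range(n_iters):
        log_a = log_a - np.logaddexp.reduce(log_a, axis=1, keepdims=True) + log_row_mass
        log_a = log_a - np.logaddexp.reduce(log_a, axis=0, keepdims=True)
    return np.exp(log_a)

# 3 slots, 6 "pixels": pixels 0-2 are object A, pixels 3-5 are background.
# Slot 0 matches the object, slot 1 the background, slot 2 matches nothing.
logits = np.array([
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],   # slot 0: object A
    [0.0, 0.0, 0.0, 4.0, 4.0, 4.0],   # slot 1: background
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # slot 2: "unnecessary" slot
])

# Original SA: softmax over the slot axis only. The unnecessary slot can shrink
# to near-zero, diffuse weight everywhere; nothing forces it to claim pixels.
attn_sa = softmax(logits, axis=0)

# Sinkhorn-style normalization: each slot also has a fixed row budget, so the
# unnecessary slot is forced to claim a share of pixels that slots 0 and 1
# already explain. A sparse / entropy-minimized version would then concentrate
# that share into a contiguous chunk of an object -> over-segmentation.
attn_mesh = sinkhorn(logits)

print("SA attention (columns sum to 1):\n", attn_sa.round(2))
print("Sinkhorn attention (rows also have a fixed budget):\n", attn_mesh.round(2))
```

This is just how I'm picturing it, so please correct me if the Sinkhorn marginals or the normalization axes don't match what SA-MESH actually does.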