We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images for a given viewpoint by projecting the Gaussians onto it and α-blending their colors. Following this approach, we train a model that also includes a segmentation feature vector for each Gaussian. These vectors can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and α-blending their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks that are consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU of the predicted masks by +8% over the state of the art. Code and trained models will be released soon.
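The α-blending of per-Gaussian segmentation features described above follows the same front-to-back compositing used for colors. A minimal sketch of this idea (function name, array shapes, and the use of NumPy are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def composite_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian feature vectors.

    features: (N, D) array, one feature vector per depth-sorted Gaussian
              overlapping the pixel (nearest first).
    alphas:   (N,) effective opacities in [0, 1] after projection.
    Returns the blended D-dimensional feature, analogous to color blending.
    """
    blended = np.zeros(features.shape[1])
    transmittance = 1.0  # fraction of light not yet absorbed
    for f, a in zip(features, alphas):
        blended += transmittance * a * f
        transmittance *= (1.0 - a)
    return blended
```

The same accumulation rule that produces a pixel's color from Gaussian colors thus produces a pixel's segmentation feature, which is what allows 2D masks to be rendered from any viewpoint.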