RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
Gaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. Whereas existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 generalizes to unseen scenes. Our method first extracts view-independent 3D Gaussian features in a self-supervised manner and then applies a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency across different views. Extensive experiments on three different datasets show that RT-GS2 surpasses state-of-the-art methods in semantic segmentation quality, exemplified by an 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, a 901× speedup over existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.
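To make the View-Dependent / View-Independent fusion step concrete, the following is a minimal sketch of what such a fusion head could look like. The channel sizes, the concatenation-plus-convolution design, and the class `VDVIFusionHead` are illustrative assumptions and not the paper's actual architecture, which the abstract does not specify; the sketch only shows the idea of combining view-independent features rasterized from the 3D Gaussians with view-dependent features from a 2D image encoder to predict per-pixel semantics.

```python
import torch
import torch.nn as nn


class VDVIFusionHead(nn.Module):
    """Hypothetical VDVI fusion head: concatenate the two feature maps and
    predict per-pixel semantic logits. All sizes below are assumptions."""

    def __init__(self, vi_channels=32, vd_channels=64, num_classes=51):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(vi_channels + vd_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, vi_feat, vd_feat):
        # vi_feat: view-independent features rendered from the 3D Gaussians, (B, C_vi, H, W)
        # vd_feat: view-dependent features from a 2D image encoder,          (B, C_vd, H, W)
        fused = torch.cat([vi_feat, vd_feat], dim=1)
        return self.fuse(fused)  # per-pixel class logits, (B, num_classes, H, W)


if __name__ == "__main__":
    head = VDVIFusionHead()
    vi = torch.randn(1, 32, 120, 160)  # rendered Gaussian feature map (assumed resolution)
    vd = torch.randn(1, 64, 120, 160)  # image-encoder feature map at the same resolution
    print(head(vi, vd).shape)          # torch.Size([1, 51, 120, 160])
```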