MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking using loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves a 3× improvement in tracking accuracy and a 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map.
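The combined objective described above can be sketched as a weighted sum of the three measurement terms; note that the specific weights $\lambda_{\mathrm{photo}}, \lambda_{\mathrm{depth}}, \lambda_{\mathrm{inertial}}$ and the exact form of each term are illustrative assumptions, not the paper's definitive formulation:

$$
\mathcal{L} \;=\; \lambda_{\mathrm{photo}} \, \mathcal{L}_{\mathrm{photo}}
\;+\; \lambda_{\mathrm{depth}} \, \mathcal{L}_{\mathrm{depth}}
\;+\; \lambda_{\mathrm{inertial}} \, \big\| \Delta T_{\mathrm{est}} \ominus \Delta T_{\mathrm{IMU}} \big\|^{2},
$$

where $\mathcal{L}_{\mathrm{photo}}$ penalizes the photometric error between rendered and observed images, $\mathcal{L}_{\mathrm{depth}}$ penalizes disagreement with depth estimates, and the last term compares the estimated relative pose $\Delta T_{\mathrm{est}}$ between keyframes against the relative transformation $\Delta T_{\mathrm{IMU}}$ obtained from pre-integrated inertial measurements ($\ominus$ denoting a relative-pose error on SE(3)).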