We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. While existing Neural SLAM or 3DGS-based SLAM methods often trade off between rendering quality and geometry accuracy, our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then to use 3D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates, explicitly deforming the 3D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to improve the scale consistency of prior depths and recover finer depth details. Through extensive experiments on Replica, ScanNet, and ScanNet++, we demonstrate significant improvements over existing Neural SLAM methods, even surpassing RGB-D-based methods in both reconstruction and rendering quality.
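To make the loop-closure map update more concrete, the sketch below illustrates one way such an explicit Gaussian deformation can be realized: each Gaussian is tied to an anchor keyframe, and when bundle adjustment corrects that keyframe's pose, the relative correction is applied rigidly to the Gaussian's mean and orientation. This is a minimal sketch under that assumption, not the HI-SLAM2 implementation; all function and variable names (`deform_gaussians`, `anchor_ids`, etc.) are illustrative.

```python
# Minimal sketch (not the authors' code): rigidly deform 3D Gaussians after a
# loop closure updates the poses of their anchor keyframes.
import numpy as np
from scipy.spatial.transform import Rotation as R

def deform_gaussians(means, quats, anchor_ids, T_old, T_new):
    """Apply each anchor keyframe's pose correction to its Gaussians.

    means:      (N, 3) Gaussian centers in world coordinates
    quats:      (N, 4) Gaussian orientations as quaternions (x, y, z, w)
    anchor_ids: (N,)   index of the keyframe each Gaussian is anchored to
    T_old:      (K, 4, 4) camera-to-world keyframe poses before loop closure
    T_new:      (K, 4, 4) camera-to-world keyframe poses after loop closure
    """
    # Per-keyframe correction mapping old world coordinates to corrected ones
    T_corr = T_new @ np.linalg.inv(T_old)          # (K, 4, 4)
    T_g = T_corr[anchor_ids]                       # (N, 4, 4), one per Gaussian

    # Rigidly move Gaussian centers
    means_h = np.concatenate([means, np.ones((len(means), 1))], axis=1)
    new_means = np.einsum('nij,nj->ni', T_g, means_h)[:, :3]

    # Rotate Gaussian orientations by the correction rotation
    R_corr = R.from_matrix(T_g[:, :3, :3])
    new_quats = (R_corr * R.from_quat(quats)).as_quat()
    return new_means, new_quats
```

Because the deformation is a closed-form rigid transform per anchor, the map can be updated instantly after pose graph bundle adjustment without re-optimizing the Gaussians.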