Constructing a high-fidelity representation of a 3D scene with a monocular camera can enable a wide range of applications on mobile devices such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory capacity is often limited, and memory access often dominates the energy consumed by computation. Although Gaussian Splatting (GS) enables high-fidelity reconstruction of 3D scenes, current GS-based SLAM is not memory-efficient: it stores a large number of past images and replays them to retrain the Gaussians, which mitigates catastrophic forgetting. These images often require two orders of magnitude more memory than the map itself and thus dominate total memory usage. In this work, we present GEVO, a GS-based monocular SLAM framework that achieves fidelity comparable to prior methods by rendering past images from the existing map instead of storing them. We propose novel Gaussian initialization and optimization techniques that remove artifacts from the map and delay the degradation of the rendered images over time. Across a variety of environments, GEVO achieves comparable map fidelity while reducing the memory overhead to around 58 MB, up to 94× lower than prior work.
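To make the memory argument concrete, the sketch below compares two replay strategies. The first keeps every past keyframe image. The second keeps only each keyframe's camera pose and re-renders the image from the map when it is needed for retraining. All sizes here are hypothetical and chosen only for illustration: 500 keyframes, 640×480 RGB at 8 bits per channel, and a 7-float pose. None of these are GEVO's actual settings. The `KeyframeStoreConfig`, `stored_image_bytes`, and `stored_pose_bytes` names are also hypothetical. The sketch does not count the map itself, which must be kept in either case, and it ignores the extra compute that rendering costs.

```python
from dataclasses import dataclass


@dataclass
class KeyframeStoreConfig:
    # Hypothetical illustration values; not taken from GEVO or any prior system.
    num_keyframes: int = 500
    height: int = 480
    width: int = 640
    channels: int = 3           # RGB
    bytes_per_channel: int = 1  # uint8 pixels
    pose_floats: int = 7        # translation (3) + unit quaternion (4)
    bytes_per_float: int = 4    # float32


def stored_image_bytes(cfg: KeyframeStoreConfig) -> int:
    """Bytes needed to keep every past keyframe image for replay."""
    per_image = cfg.height * cfg.width * cfg.channels * cfg.bytes_per_channel
    return cfg.num_keyframes * per_image


def stored_pose_bytes(cfg: KeyframeStoreConfig) -> int:
    """Bytes needed if only poses are kept and images are re-rendered from the map."""
    return cfg.num_keyframes * cfg.pose_floats * cfg.bytes_per_float


if __name__ == "__main__":
    cfg = KeyframeStoreConfig()
    print(f"stored images: {stored_image_bytes(cfg) / 2**20:.1f} MiB")
    print(f"stored poses:  {stored_pose_bytes(cfg) / 2**10:.1f} KiB")
```

Under these assumptions, the image buffer grows with resolution times keyframe count. The pose-only buffer is negligible by comparison. That gap is why rendering past images from the map, rather than storing them, removes most of the replay memory.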