Reconstructing scenes and tracking motion are two sides of the same coin. Tracked points enable geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes enables 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking, lifting tracks directly into 3D to overcome occlusion ambiguities [38]. However, the above approaches require either offline processing or multi-view camera setups, both of which are unrealistic for real-world applications such as robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input and introduce Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends the set of 3D Gaussians to capture new content and object motion while estimating camera motion from a single RGB frame. DynOMo stands out by letting point trajectories emerge through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It establishes the first baseline for online point tracking with monocular, unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding applicability to diverse real-world scenarios.
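The abstract names, but does not define, the similarity-enhanced regularization term. The sketch below is a purely illustrative reading: a local rigidity penalty between neighboring Gaussians whose per-pair weight is their feature similarity, so Gaussians with similar appearance are encouraged to move together. The function name, the k-nearest-neighbor construction, and the cosine-similarity weighting are our assumptions, not the paper's definition.

```python
import torch

def similarity_weighted_rigidity(means_prev, means_curr, feats, k=8):
    """Illustrative regularizer (hypothetical, not the paper's exact loss):
    penalize changes in relative offsets between neighboring Gaussians
    across consecutive frames, weighted by their feature similarity.

    means_prev, means_curr: (N, 3) Gaussian centers at frames t-1 and t.
    feats:                  (N, D) per-Gaussian image features.
    """
    # k-nearest neighbors based on the previous frame's geometry
    d2 = torch.cdist(means_prev, means_prev)             # (N, N) distances
    knn = d2.topk(k + 1, largest=False).indices[:, 1:]   # drop self -> (N, k)

    # cosine similarity between each Gaussian and its neighbors in [0, 1]
    f = torch.nn.functional.normalize(feats, dim=-1)
    sim = (f.unsqueeze(1) * f[knn]).sum(-1).clamp(min=0)  # (N, k)

    # rigidity: relative offsets should stay constant for similar neighbors
    off_prev = means_prev.unsqueeze(1) - means_prev[knn]  # (N, k, 3)
    off_curr = means_curr.unsqueeze(1) - means_curr[knn]  # (N, k, 3)
    return (sim * (off_curr - off_prev).norm(dim=-1)).mean()

# Toy usage with random Gaussians:
if __name__ == "__main__":
    N, D = 100, 32
    means_prev = torch.randn(N, 3)
    means_curr = means_prev + 0.01 * torch.randn(N, 3)
    feats = torch.randn(N, D)
    print(similarity_weighted_rigidity(means_prev, means_curr, feats))
```

Under this reading, the similarity weighting is what lets trajectories emerge without correspondence-level supervision: the photometric and feature reconstruction losses move individual Gaussians, while the regularizer propagates coherent motion among Gaussians that the features identify as belonging together.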