Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it from various directions. However, owing to the ill-posed nature of this problem, no existing solution can provide consistent, high-quality novel views from camera positions that differ significantly from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by adopting a two-stage approach: first, we fit a low-rank neural deformation model, which is then used as a regularizer for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that they remain consistent in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in, and deformed by, local oriented volumes. The resulting combined model can be rendered as radiance fields, yielding high-quality, photo-realistic reconstructions of non-rigidly deforming objects while maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues.
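To make the two-stage idea concrete, the sketch below illustrates one plausible reading of it in PyTorch: a low-rank deformation model where a small MLP maps a timestamp to mixing coefficients over learned per-point offset bases (stage one), and a helper that poses time-invariant Gaussians via the rigid frames of local oriented volumes (stage two). This is a minimal illustration under our own assumptions, not the authors' implementation; all names (LowRankDeformation, anchor_gaussians, num_bases, etc.) are hypothetical.

```python
# Hypothetical sketch of NPGs' two stages; shapes and names are illustrative.
import torch
import torch.nn as nn

class LowRankDeformation(nn.Module):
    """Stage 1 (sketch): coarse points as a low-rank combination of bases."""
    def __init__(self, num_points=1024, num_bases=16, hidden=128):
        super().__init__()
        # Learned low-rank basis: each of the K bases is a full set of
        # per-point 3D offsets; time only controls their mixing weights,
        # which is what keeps the deformation low-rank and regularized.
        self.basis = nn.Parameter(torch.randn(num_bases, num_points, 3) * 0.01)
        self.rest_points = nn.Parameter(torch.randn(num_points, 3) * 0.1)
        # Small MLP mapping a scalar time t to K mixing coefficients.
        self.coeff_mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases),
        )

    def forward(self, t):
        # t: (B,) normalized timestamps in [0, 1]
        coeffs = self.coeff_mlp(t.unsqueeze(-1))            # (B, K)
        offsets = torch.einsum('bk,knd->bnd', coeffs, self.basis)
        return self.rest_points.unsqueeze(0) + offsets      # (B, N, 3)

def anchor_gaussians(volume_centers, volume_rotations, local_means):
    """Stage 2 (sketch): pose temporally shared Gaussians by local volumes.

    Gaussians store time-invariant coordinates in local oriented volumes;
    posing them at time t only applies each volume's rigid frame, which is
    derived from the coarse model above.
      volume_centers:   (B, V, 3)    per-volume translation at each time
      volume_rotations: (B, V, 3, 3) per-volume orientation at each time
      local_means:      (V, G, 3)    Gaussian means, shared across time
    """
    world = torch.einsum('bvij,vgj->bvgi', volume_rotations, local_means)
    return world + volume_centers.unsqueeze(2)              # (B, V, G, 3)

# Usage: query coarse geometry at three timestamps.
model = LowRankDeformation()
points_t = model(torch.tensor([0.0, 0.5, 1.0]))
print(points_t.shape)  # torch.Size([3, 1024, 3])
```

The key design point the sketch tries to capture is that only the K mixing coefficients vary with time, so the coarse motion lives in a low-dimensional space that constrains the second stage, while per-Gaussian appearance detail is shared across all frames.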