# PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence

We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length. Unlike previous approaches, PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images, enabling efficient novel-view rendering. We leverage DUSt3R's capability for pairwise 3D structure reconstruction and extend it to sequential multi-view input via a spatial memory network, eliminating the need for optimization-based global alignment. Additionally, PreF3R incorporates a dense Gaussian parameter prediction head, which enables subsequent novel-view synthesis with differentiable rasterization. This allows us to supervise the model with a combination of photometric loss and pointmap regression loss, enhancing both photorealism and structural accuracy. Given a sequence of ordered images, PreF3R incrementally reconstructs the 3D Gaussian field at 20 FPS, thereby enabling real-time novel-view rendering. Experiments demonstrate that PreF3R is an effective solution for the challenging task of pose-free feed-forward novel-view synthesis, while also exhibiting robust generalization to unseen scenes.
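To make the pipeline the abstract outlines more concrete — per-frame feed-forward inference conditioned on a spatial memory, a dense head predicting per-pixel Gaussian parameters in a canonical frame, and supervision mixing a photometric loss with pointmap regression — here is a minimal, hypothetical PyTorch sketch. All module names, feature dimensions, the 11-channel Gaussian parameterization, and the weight `lambda_pm` are illustrative placeholders under our own assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a pose-free, feed-forward, memory-conditioned
# reconstruction loop. Not the PreF3R code; toy modules only.
import torch
import torch.nn as nn


class ToyPreF3R(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)    # image -> dense features
        self.memory_update = nn.GRUCell(feat_dim, feat_dim)    # "spatial memory" reduced to a global state
        self.pointmap_head = nn.Conv2d(feat_dim, 3, 1)         # per-pixel 3D point in the canonical frame
        self.gaussian_head = nn.Conv2d(feat_dim, 11, 1)        # opacity(1) + scale(3) + rotation(4) + color(3)

    def forward(self, frames):
        """frames: (T, 3, H, W) ordered, unposed images processed incrementally."""
        mem = frames.new_zeros(1, self.encoder.out_channels)
        pointmaps, gaussians = [], []
        for t in range(frames.shape[0]):                        # one feed-forward pass per frame, no optimization
            feat = self.encoder(frames[t:t + 1])                # (1, C, H, W)
            mem = self.memory_update(feat.mean(dim=(2, 3)), mem)  # fold frame context into the memory
            feat = feat + mem[:, :, None, None]                 # condition features on accumulated memory
            pointmaps.append(self.pointmap_head(feat))
            gaussians.append(self.gaussian_head(feat))
        return torch.cat(pointmaps), torch.cat(gaussians)


def total_loss(rendered, target, pred_points, gt_points, lambda_pm=0.1):
    """Photometric loss on rasterized views plus pointmap regression, as in the abstract's supervision mix."""
    photo = (rendered - target).abs().mean()
    pm = (pred_points - gt_points).abs().mean()
    return photo + lambda_pm * pm


# Toy usage: a 4-frame unposed sequence.
points, gauss = ToyPreF3R()(torch.rand(4, 3, 32, 32))
```

The design point the abstract stresses is that every frame is processed exactly once, in order, with no per-scene optimization or global alignment step, which is what makes incremental reconstruction at around 20 FPS and real-time rendering feasible.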
