Wide-baseline panoramic images are frequently used in applications such as VR and simulation to reduce capture labor and storage costs. However, synthesizing novel views from these panoramic images in real time remains a significant challenge, mainly because of their high resolution and inherent distortions. Although existing 3D Gaussian splatting (3DGS) methods can produce photo-realistic views under narrow baselines, they often overfit the training views on wide-baseline panoramic images, because precise geometry is difficult to learn from sparse 360° views. This paper presents Splatter-360, a novel end-to-end generalizable 3DGS framework designed to handle wide-baseline panoramic images. Unlike previous approaches, Splatter-360 performs multi-view matching directly in the spherical domain by constructing a spherical cost volume through a spherical sweep algorithm, which enhances the network's depth perception and geometry estimation. Additionally, we introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images and integrate cross-view attention to improve feature interactions across multiple viewpoints. This enables robust 3D-aware feature representations and real-time rendering. Experimental results on the HM3D and Replica datasets demonstrate that Splatter-360 significantly outperforms state-of-the-art NeRF and 3DGS methods (e.g., PanoGRF, MVSplat, DepthSplat, and HiSplat) in both synthesis quality and generalization performance for wide-baseline panoramic images.
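Wide-baseline panoramic images are often used in applications such as virtual reality (VR) and simulation to reduce capture labor and storage requirements. However, generating novel views from these panoramas in real time remains a major challenge, particularly because of their high resolution and inherent distortion. Although existing 3D Gaussian Splatting (3DGS) methods can produce realistic views under narrow baselines, they tend to overfit the training views on sparse 360° wide-baseline panoramas, because precise geometry is hard to learn from sparse viewpoints. To address this problem, the paper proposes Splatter-360, an end-to-end generalizable 3DGS framework for wide-baseline panoramic images. Unlike prior methods, Splatter-360 performs multi-view matching directly in the spherical domain, building a spherical cost volume with a spherical sweep algorithm to strengthen the network's depth perception and geometry estimation. It also introduces a 3D-aware bi-projection encoder to mitigate panoramic distortion and integrates cross-view attention to improve feature interaction across viewpoints. This design yields robust 3D-aware feature representations and supports real-time rendering. Experiments on the HM3D and Replica datasets show that Splatter-360 significantly outperforms recent state-of-the-art methods (such as PanoGRF, MVSplat, DepthSplat, and HiSplat) in novel view synthesis quality and generalization on wide-baseline panoramic images. The framework not only improves synthesis accuracy but also offers an efficient solution for real-time processing of wide-baseline panoramic images.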
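
To make the idea of a spherical sweep more concrete, the following is a minimal geometric sketch, not the Splatter-360 implementation. It assumes equirectangular panoramas, a y-up camera frame, and a known relative pose between a reference and a source view. All function names (`pixel_to_ray`, `point_to_pixel`, `spherical_sweep`) and parameters are hypothetical. For one reference pixel, it places a 3D point at each candidate depth along the pixel's ray on the sphere and reports where that point lands in the source panorama. A cost volume would then compare features sampled at those locations, which is omitted here.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map the centre of equirectangular pixel (u, v) to a unit direction.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans (pi/2, -pi/2) top to bottom, y axis points up.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def point_to_pixel(point, width, height):
    """Project a 3D point in the camera frame to continuous pixel coordinates."""
    x, y, z = point
    radius = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(y / radius)
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v


def spherical_sweep(u, v, width, height, rotation, translation, depths):
    """Return the source-view pixel hit by each depth hypothesis of one reference pixel.

    rotation (3x3 nested lists) and translation (length 3) are assumed to map
    reference-camera coordinates into source-camera coordinates.
    """
    ray = pixel_to_ray(u, v, width, height)
    hits = []
    for depth in depths:
        # Point on the sphere of radius `depth` around the reference camera.
        p_ref = [depth * c for c in ray]
        # Rigid transform into the source camera frame.
        p_src = [
            sum(rotation[i][j] * p_ref[j] for j in range(3)) + translation[i]
            for i in range(3)
        ]
        hits.append(point_to_pixel(p_src, width, height))
    return hits
```

With an identity pose, every depth hypothesis maps back to the original pixel. With a non-zero translation, nearby depths shift the projection noticeably while distant depths barely move it. That depth-dependent shift is the signal a matching cost can exploit.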