We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians per view and therefore generalize poorly as the number of input views grows. In contrast, our PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter (CGA) that adjusts the Gaussian distribution according to local geometric complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner (IGR) module that refines Gaussian representations through direct image-Gaussian interactions. As a result, PixelGaussian effectively reduces Gaussian redundancy as input views increase. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance and generalizes well across varying numbers of views.
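The score-guided pruning and splitting performed by CGA can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the function name, the two thresholds (`tau_prune`, `tau_split`), and the axis-aligned child offsets are all assumptions, and the keypoint scores would in practice come from the learned scorer rather than be given directly.

```python
import numpy as np

def adapt_gaussians(means, scales, scores, tau_prune=0.2, tau_split=0.8):
    """Hypothetical sketch: prune Gaussians with low keypoint scores and
    split high-score ones into two offset children with halved scale.
    Thresholds and the offset scheme are illustrative assumptions."""
    keep = scores >= tau_prune                  # drop redundant Gaussians
    means, scales, scores = means[keep], scales[keep], scores[keep]

    split = scores >= tau_split                 # densify complex regions
    offset = 0.5 * scales[split]                # offset children by half scale
    child_a = means[split] + offset
    child_b = means[split] - offset
    new_means = np.concatenate([means[~split], child_a, child_b])
    new_scales = np.concatenate([scales[~split],
                                 scales[split] / 2.0, scales[split] / 2.0])
    return new_means, new_scales

# Toy example: 4 Gaussians with varying keypoint scores.
means = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
scales = np.full((4, 3), 0.1)
scores = np.array([0.1, 0.5, 0.9, 0.95])
m, s = adapt_gaussians(means, scales, scores)
print(m.shape[0])  # 1 pruned, 2 split -> 1 kept + 4 children = 5
```

The point of the sketch is the asymmetry the abstract describes: the representation grows only where the scorer flags geometric complexity, and shrinks elsewhere, so total Gaussian count tracks scene difficulty rather than pixel count.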