Reconstructing 3D scenes from multiple viewpoints is a fundamental task in stereo vision. Recently, advances in generalizable 3D Gaussian Splatting have enabled high-quality novel view synthesis for unseen scenes from sparse input views by predicting per-pixel Gaussian parameters in a feed-forward manner, without extra optimization. However, existing methods typically generate single-scale 3D Gaussians, which struggle to represent both large-scale structures and fine texture details, resulting in mislocalization and artifacts. In this paper, we propose HiSplat, a novel framework that introduces a hierarchical, coarse-to-fine design into generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians. Specifically, HiSplat first generates large, coarse-grained Gaussians to capture large-scale structures, followed by fine-grained Gaussians to enhance delicate texture details. To promote inter-scale interaction, we propose an Error Aware Module for Gaussian compensation and a Modulating Fusion Module for Gaussian repair. Our method jointly optimizes the hierarchical representations, enabling novel view synthesis from only two reference views. Comprehensive experiments on various datasets demonstrate that HiSplat significantly improves reconstruction quality and cross-dataset generalization compared with prior single-scale methods. The corresponding ablation studies and analysis of different-scale 3D Gaussians further reveal the mechanism behind this effectiveness.
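To make the coarse-to-fine idea concrete, the following is a minimal conceptual sketch, not the authors' implementation, of predicting per-pixel Gaussian parameters at several scales, where each finer scale adds a residual correction on top of the upsampled coarser prediction. The class name, feature dimensions, number of scales, and parameter layout are all hypothetical placeholders; the actual Error Aware and Modulating Fusion Modules are more involved than this simple residual scheme.

```python
# Hedged sketch of hierarchical (coarse-to-fine) per-pixel Gaussian prediction.
# All names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class HierarchicalGaussianHead(nn.Module):
    """Predicts per-pixel Gaussian parameters at multiple scales.

    Coarse-scale Gaussians capture large structures; each finer scale adds a
    residual on top of the upsampled coarser prediction (assumed behaviour).
    """

    def __init__(self, feat_dim: int = 64, num_scales: int = 3, gauss_dim: int = 11):
        super().__init__()
        # gauss_dim: e.g. 3 (center offset) + 3 (scale) + 4 (rotation) + 1 (opacity)
        self.heads = nn.ModuleList(
            nn.Conv2d(feat_dim, gauss_dim, kernel_size=1) for _ in range(num_scales)
        )
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, feats: list[torch.Tensor]) -> list[torch.Tensor]:
        # feats: multi-scale image features, coarsest first, each of shape [B, C, H_i, W_i]
        # with spatial resolution doubling between consecutive scales.
        gaussians = []
        prev = None
        for head, f in zip(self.heads, feats):
            pred = head(f)  # raw per-pixel Gaussian parameters at this scale
            if prev is not None:
                # finer scale compensates errors of the upsampled coarse prediction
                pred = pred + self.upsample(prev)
            gaussians.append(pred)
            prev = pred
        return gaussians


if __name__ == "__main__":
    B, C = 1, 64
    feats = [
        torch.randn(B, C, 32, 32),
        torch.randn(B, C, 64, 64),
        torch.randn(B, C, 128, 128),
    ]
    outs = HierarchicalGaussianHead(feat_dim=C)(feats)
    print([tuple(o.shape) for o in outs])  # per-scale Gaussian parameter maps
```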