Reconstructing photo-realistic drivable human avatars from multi-view image sequences is a popular and challenging topic in computer vision and graphics. While existing NeRF-based methods can achieve high-quality novel-view rendering of human models, both training and inference are time-consuming. Recent approaches represent the human body with 3D Gaussians, enabling faster training and rendering. However, they underestimate the importance of mesh guidance and predict Gaussians directly in 3D space with only coarse mesh guidance. This hinders the learning of the Gaussians and tends to produce blurry textures. We therefore propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We use an embedding of the UV map to learn Gaussian textures in 2D space, leveraging powerful 2D networks for feature extraction. In addition, an independent Mesh network optimizes pose-dependent geometric deformations, which guide Gaussian rendering and significantly enhance rendering quality. We also collect and process a new human motion dataset that includes multi-view images, scanned models, parametric model registrations, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art novel-view and novel-pose synthesis.
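To make the idea of mesh-guided, UV-space Gaussians more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that each UV texel falls inside one mesh triangle, that the texel is lifted onto the (deformed) mesh surface by barycentric interpolation, and that a learned per-texel 3D offset is then added to obtain the Gaussian center. The function names `barycentric_weights` and `texel_to_gaussian_center`, and the offset parameterization, are hypothetical and chosen only for illustration.

```python
import numpy as np


def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01  # zero only for a degenerate triangle
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - w1 - w2, w1, w2])


def texel_to_gaussian_center(uv, tri_uv, tri_xyz, offset):
    """Lift a UV texel onto the mesh surface and add a per-texel offset.

    uv      : (2,)   texel coordinate in UV space
    tri_uv  : (3, 2) UV coordinates of the enclosing triangle's vertices
    tri_xyz : (3, 3) 3D positions of those vertices on the deformed mesh
    offset  : (3,)   hypothetical learned displacement for this texel
    """
    w = barycentric_weights(uv, tri_uv[0], tri_uv[1], tri_uv[2])
    return w @ tri_xyz + offset


# Toy example: one triangle, one texel.
tri_uv = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tri_xyz = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
center = texel_to_gaussian_center(
    np.array([0.25, 0.25]), tri_uv, tri_xyz, np.array([0.0, 0.0, 0.1])
)
print(center)
```

Under these assumptions, moving the mesh vertices in `tri_xyz` (for example, through pose-dependent deformation) moves every Gaussian anchored to that triangle along with the surface, which is one way the mesh could guide the Gaussians.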