2312.11461.md

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.

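As a powerful 3D representation, Gaussian splatting combines the advantages of explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we aim to use Gaussian splatting to generate realistic animatable avatars from text descriptions alone, addressing the limitations (e.g., flexibility and efficiency) of mesh- or NeRF-based representations. However, applying Gaussian splatting naively cannot produce high-quality animatable avatars and suffers from unstable learning; it also fails to capture fine avatar geometry and often leads to degenerate body parts. To address these problems, we first propose a primitive-based 3D Gaussian representation in which Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize the learning of millions of Gaussians and amortize its cost, we propose using neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometry and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometry and extracts highly detailed textured meshes. Our proposed method, GAvatar, makes it possible to generate diverse animatable avatars at scale using only text prompts. GAvatar significantly surpasses existing methods in both appearance and geometry quality, and achieves very fast rendering (100 fps) at 1K resolution.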
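
The abstract does not give formulas, so the following is only a minimal sketch of what "Gaussians defined inside pose-driven primitives" could look like in practice. It assumes each primitive carries a rotation, a translation, and a per-axis scale, and that each Gaussian stores its mean and covariance in the primitive's local frame. The function names (`rotation_z`, `gaussians_to_world`) and the parameterization are hypothetical and are not taken from the paper.

```python
import numpy as np

def rotation_z(angle):
    """Rotation matrix about the z-axis (helper for the example only)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def gaussians_to_world(local_means, local_covs, prim_rotation,
                       prim_translation, prim_scale):
    """Map Gaussians from a primitive's local frame to world space.

    Assumed (hypothetical) parameterization:
      local_means:      (N, 3) Gaussian centers in the primitive frame
      local_covs:       (N, 3, 3) Gaussian covariances in the primitive frame
      prim_rotation:    (3, 3) rotation of the primitive for the current pose
      prim_translation: (3,) position of the primitive for the current pose
      prim_scale:       (3,) per-axis scale of the primitive
    """
    # Linear part of the affine map: scale first, then rotate.
    A = prim_rotation @ np.diag(prim_scale)
    # Means transform affinely: x_world = A x_local + t.
    world_means = local_means @ A.T + prim_translation
    # Covariances transform as A Sigma A^T (broadcast over the N Gaussians).
    world_covs = A @ local_covs @ A.T
    return world_means, world_covs

# One Gaussian at (1, 0, 0) in a primitive rotated 90 degrees about z.
means = np.array([[1.0, 0.0, 0.0]])
covs = 0.01 * np.eye(3)[None, :, :]
world_means, world_covs = gaussians_to_world(
    means, covs,
    prim_rotation=rotation_z(np.pi / 2),
    prim_translation=np.array([0.0, 0.0, 1.0]),
    prim_scale=np.array([2.0, 1.0, 1.0]),
)
```

Under these assumptions, animating the avatar only requires updating each primitive's rotation and translation for a new pose; the Gaussians follow because they are stored relative to the primitive.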