Generating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible reconstruction of unseen parts. Existing methods typically rely on multi-view diffusion models for 3D generation, but these often suffer from view inconsistency, which hinders high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating a 3D human from a single RGB image using video diffusion models. Human-VDM provides temporally consistent views for 3D human generation via Gaussian Splatting. It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, the input image is fed into the human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video frame interpolation to enhance the texture detail and geometric smoothness of the generated video. Finally, the 3D human Gaussian Splatting module learns a lifelike human under the guidance of these high-resolution, view-consistent frames. Experiments demonstrate that Human-VDM generates high-quality 3D humans from a single image, outperforming state-of-the-art methods in both qualitative and quantitative evaluations.
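The three-stage pipeline above can be sketched as a sequence of transformations from a single image to a 3D representation. The following is a minimal, hypothetical illustration of that data flow only; all function names, class names, and parameter values (e.g. `num_views`, `sr_factor`) are illustrative placeholders, not the authors' actual API or implementation.

```python
# Hypothetical sketch of the Human-VDM pipeline described above.
# Every name and value here is a placeholder assumption, not the real system.

from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Stand-in for an RGB image (e.g. an HxWx3 array)."""
    resolution: int


def video_diffusion(image: Frame, num_views: int = 24) -> List[Frame]:
    # Stage 1: generate a temporally consistent human video (a set of
    # coherent views) from the single input image. Placeholder: replicate.
    return [Frame(image.resolution) for _ in range(num_views)]


def augment(frames: List[Frame], sr_factor: int = 2,
            interp_factor: int = 2) -> List[Frame]:
    # Stage 2: super-resolve each frame for sharper texture, then interpolate
    # in-between frames for smoother geometry supervision. Placeholder logic.
    upscaled = [Frame(f.resolution * sr_factor) for f in frames]
    interpolated: List[Frame] = []
    for f in upscaled:
        interpolated.extend([f] * interp_factor)  # stand-in for interpolation
    return interpolated


def fit_gaussians(frames: List[Frame]) -> str:
    # Stage 3: optimize a 3D Gaussian Splatting representation against the
    # augmented, view-consistent frames. Placeholder: report fit statistics.
    return (f"3D Gaussian model fit to {len(frames)} frames "
            f"at {frames[0].resolution}px")


model = fit_gaussians(augment(video_diffusion(Frame(512))))
print(model)
```

Under these placeholder settings, a single 512px input yields 24 generated views, which augmentation expands to 48 frames at 1024px before Gaussian fitting; the actual view counts and resolutions used by Human-VDM are not specified here.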