Rendering dynamic 3D humans from monocular videos is crucial for applications such as virtual reality and digital entertainment. Most methods assume that the person is in an unobstructed scene, whereas in real-life scenarios various objects may occlude body parts. A previous method uses NeRF for surface rendering to recover the occluded areas, but it requires more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings at up to 160 FPS from occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space and performs an occlusion feature query at occluded regions, extracting the aggregated pixel-aligned feature to compensate for the missing information. We then use a Gaussian Feature MLP to further process the feature, together with occlusion-aware loss functions, to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method, while improving training and inference speeds by 250x and 800x, respectively.
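As a rough consistency check on the speed figures quoted above, the short sketch below back-computes the baseline cost implied by the stated speedups. It assumes each speedup is a plain ratio of baseline time to OccGaussian time, and it treats "within 6 minutes" and "up to 160 FPS" as exact values, so the results are only indicative. The variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
train_minutes = 6      # "trained within 6 minutes"
render_fps = 160       # "up to 160 FPS"
train_speedup = 250    # "250x" training speedup
render_speedup = 800   # "800x" inference speedup

# Assumption: speedup = baseline time / OccGaussian time.
baseline_train_hours = train_minutes * train_speedup / 60
# Time per frame is 1 / FPS, so the baseline takes render_speedup / render_fps seconds.
baseline_seconds_per_frame = render_speedup / render_fps

print(f"implied baseline training time: {baseline_train_hours:.1f} h")
print(f"implied baseline render time: {baseline_seconds_per_frame:.2f} s/frame")
```

Under these assumptions the implied baseline is about 25 hours of training and about 5 seconds per rendered frame, which agrees with the "more than one day" and "several seconds" attributed to the NeRF-based method.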