We present PhyCAGE, the first approach for physically plausible compositional 3D asset generation from a single image. Given an input image, we first generate consistent multi-view images for components of the assets. These images are then fitted with 3D Gaussian Splatting representations. To ensure that the Gaussians representing objects are physically compatible with each other, we introduce a Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS) technique to further optimize the positions of the Gaussians. It is achieved by setting the gradient of the SDS loss as the initial velocity of the physical simulation, allowing the simulator to act as a physics-guided optimizer that progressively corrects the Gaussians' positions to a physically compatible state. Experimental results demonstrate that the proposed method can generate physically plausible compositional 3D assets given a single image.
我们提出了 PhyCAGE,这是第一个从单张图像生成物理合理的可组合 3D 资产的方法。对于输入图像,我们首先为资产的各个组件生成一致的多视角图像。这些图像随后通过 3D 高斯投影(Gaussian Splatting)表示进行拟合。 为了确保表示对象的高斯与彼此之间在物理上相兼容,我们引入了一种 物理模拟增强的得分蒸馏采样(Physical Simulation-Enhanced Score Distillation Sampling, PSE-SDS) 技术,用于进一步优化高斯的位置。具体而言,我们将 SDS 损失的梯度设置为物理模拟的初始速度,使模拟器能够作为一个物理引导的优化器,逐步校正高斯的位置以达到物理兼容状态。 实验结果表明,所提出的方法能够基于单张图像生成物理合理的可组合 3D 资产,为单视角下的 3D 场景生成提供了一种新颖且有效的解决方案。