We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene.
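To make the described pipeline concrete, here is a minimal, hypothetical sketch in Python of the data flow the abstract outlines: an LLM-proposed coarse layout of per-object bounding boxes, per-object 3D Gaussians optimized inside those boxes, and a joint object-scene loop that also refines the layout. All names (`LayoutBox`, `layout_from_llm`, `optimize_scene`) are illustrative assumptions rather than the paper's actual API, and the diffusion-guided updates are elided as placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LayoutBox:
    """One object in the LLM-proposed coarse layout: a text prompt plus an
    oriented 3D bounding box (center, size, yaw) that constrains where the
    object's Gaussians may live. (Hypothetical structure, not the paper's.)"""
    prompt: str
    center: Tuple[float, float, float]   # (x, y, z) in scene coordinates
    size: Tuple[float, float, float]     # (width, depth, height)
    yaw_deg: float = 0.0                 # rotation about the vertical axis

@dataclass
class SceneLayout:
    scene_prompt: str
    boxes: List[LayoutBox] = field(default_factory=list)

def layout_from_llm(scene_prompt: str) -> SceneLayout:
    """Hypothetical stand-in for querying an LLM for the initial layout.
    A real system would parse the LLM's structured response; here we return
    a fixed example so the sketch runs end to end."""
    return SceneLayout(
        scene_prompt=scene_prompt,
        boxes=[
            LayoutBox("a wooden dining table", center=(0.0, 0.0, 0.4), size=(1.6, 0.9, 0.8)),
            LayoutBox("a ceramic vase with flowers", center=(0.0, 0.0, 0.95), size=(0.2, 0.2, 0.3)),
        ],
    )

def optimize_scene(layout: SceneLayout, steps: int = 3) -> SceneLayout:
    """Skeleton of an object-scene compositional loop: per-object Gaussians would be
    optimized inside their boxes (object-level diffusion guidance), the composed
    scene optimized jointly (scene-level guidance), and the coarse layout refined
    to align with the generated scene. All updates below are placeholders."""
    for _ in range(steps):
        for box in layout.boxes:
            pass  # placeholder: object-level guided update constrained to `box`
        pass      # placeholder: scene-level layout-conditioned diffusion update
        for box in layout.boxes:
            # placeholder: small layout refinement step (identity update here)
            box.center = tuple(c + 0.0 for c in box.center)
    return layout

if __name__ == "__main__":
    layout = layout_from_llm("a dining room with a table and a vase of flowers")
    layout = optimize_scene(layout)
    for box in layout.boxes:
        print(box.prompt, box.center, box.size)
```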