[ICLR 2025] Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
- [March 2025] Inference code released! 🚀 🚀 🚀
- [Feb. 2025] Phidias has been accepted to ICLR 2025! 🔥 🔥 🔥
Demo video: phidias_video.mp4
Project page | Paper | Video
Zhenwei Wang, Tengfei Wang, Zexin He, Gerhard Hancke, Ziwei Liu and Rynson W.H. Lau.
In 3D modeling, designers often use an existing 3D model as a reference to create new ones. This practice has inspired the development of Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing the generation quality, generalization ability, and controllability. Our model integrates three key components: 1) meta-ControlNet that dynamically modulates the conditioning strength, 2) dynamic reference routing that mitigates misalignment between the input image and 3D reference, and 3) self-reference augmentations that enable self-supervised training with a progressive curriculum. Collectively, these designs result in a clear improvement over existing methods. Phidias establishes a unified framework for 3D generation using text, image, and 3D conditions with versatile applications.
- Release model weights, inference code, rendering code, and usage instructions. The inference code has been tested on both RTX 4090 (24 GB) and A100 GPUs.
- Release Gradio and Hugging Face demos
- Release training code and instructions
- Environment Setup:
conda create -n phidias python==3.10
conda activate phidias
# install PyTorch and xFormers
# xformers is required! please refer to https://github.com/facebookresearch/xformers for details.
# for example, we use torch 2.1.0 + cuda 11.8
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 xformers --index-url https://download.pytorch.org/whl/cu118
# a modified gaussian splatting (+ depth, alpha rendering)
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
# for mesh extraction
pip install git+https://github.com/NVlabs/nvdiffrast
# other dependencies
pip install -r requirements.txt
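Optionally, you can sanity-check the environment from a Python shell before moving on. This is a generic check, not part of the repo; it simply confirms that PyTorch sees the GPU and that xFormers imports cleanly:
# generic environment check (not part of this repo)
import torch
import xformers

print(torch.__version__, torch.version.cuda)   # expect 2.1.0 and 11.8 with the example install above
print(torch.cuda.is_available())               # expect True on an RTX 4090 / A100 machine
print(xformers.__version__)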
- If you want to render reference maps yourself, please install Blender 3.2.2:
wget https://download.blender.org/release/Blender3.2/blender-3.2.2-linux-x64.tar.xz && \
tar -xf blender-3.2.2-linux-x64.tar.xz && \
rm blender-3.2.2-linux-x64.tar.xz
- Download our pretrained models and precomputed point-cloud features from Hugging Face, and place them under model/:
mkdir model
huggingface-cli download ZhenweiWang/Phidias-Diffusion --local-dir model/
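If you prefer Python over the CLI, the same download can be done with the huggingface_hub API. This is a minimal sketch; the repo id matches the command above:
# alternative to huggingface-cli: download the checkpoint with huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(repo_id="ZhenweiWang/Phidias-Diffusion", local_dir="model")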
# Example 1:
# without 3D reference (vanilla Zero123++ with white background)
python infer.py big --workspace results/no_ref --mv_controlnet_path None --rembg --test_path data_test/image_to_3d/chair_watermelon.png
# Example 2:
# user-specified 3D reference (3D-to-3D), using pre-rendered reference maps
python infer.py big --workspace results/3d_to_3d --rembg --test_path data_test/3d_to_3d --no-use_retrieval
# Example 3:
# retrieve 3D reference from objaverse subset and perform online rendering
python infer.py big --workspace results/image_to_3d --rembg --test_path data_test/image_to_3d --use_retrieval --top_k_retrieval 1 2 3 --online_rendering --blender_path blender-3.2.2-linux-x64/blender --render_azimuth 0 --render_elevation 0
- Use --online_rendering to render the retrieved 3D object during inference. This is more convenient but makes inference slower. Note that the results may be unsatisfactory due to misaligned view angles between the concept image and the rendered CCMs. Please check the visualization (xxx_mv_refs_64.png) under results/ and adjust --render_azimuth and --render_elevation to align the angles manually for better results.
- Online-rendered reference maps are saved to data_test/ref_maps_retrieved.
- To reduce rendering cost during inference, you can also set --no-online_rendering and prepare all reference maps by pre-rendering all Objaverse objects listed in data_train/objaverse/meta.json under --root_dirs.
- Use --rembg to remove the background of your image automatically. Remember to adjust --resize_fg_ratio so that the object has the same size as the reference maps. (See the preprocessing sketch after this list.)
- We support retrieving references with top-1 to top-5 similarity. You can set --top_k_retrieval 1 2 3 to get results using the top-1, top-2, and top-3 3D references.
- Adjust --seed to find the best results. Default: 42.
- Use --controlnet_conditioning_scale to adjust the control strength manually. Default: 1.0.
- The default results are 3D Gaussians. You can convert them into meshes as in LGM by running python convert.py big --test_path results/saved.ply, but the mesh quality is limited.
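If you would rather remove backgrounds offline instead of passing --rembg at inference time, here is a minimal sketch using the rembg Python package. Whether infer.py uses the same package and the same preprocessing (e.g. the foreground resizing controlled by --resize_fg_ratio) is an assumption, so treat this only as a starting point:
# minimal offline background removal (assumption: the rembg package; infer.py's
# internal preprocessing, e.g. foreground resizing, may differ)
from PIL import Image
from rembg import remove

img = Image.open("data_test/image_to_3d/chair_watermelon.png")
rgba = remove(img)                  # RGBA image with the background made transparent
rgba.save("chair_watermelon_rgba.png")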
- We provide the full list of Objaverse UIDs used in our paper under data_train/objaverse (see the download sketch after this list).
- We also provide a code example for rendering training and testing images (RGBA, normal, and CCM maps); see scripts/blender_script.py.
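As a convenience, the listed assets can be fetched with the objaverse Python package. This is a hedged sketch that assumes data_train/objaverse/meta.json parses into a flat list of UIDs; check the actual file structure before running it at scale:
# hedged sketch: fetch the listed Objaverse assets with the objaverse package
# (assumes meta.json parses into a list of UIDs -- verify the actual structure first)
import json
import multiprocessing
import objaverse  # pip install objaverse

with open("data_train/objaverse/meta.json") as f:
    meta = json.load(f)
uids = list(meta)  # works for a list of UIDs or a dict keyed by UID

objects = objaverse.load_objects(
    uids=uids,
    download_processes=multiprocessing.cpu_count(),
)
print(f"downloaded {len(objects)} objects")  # dict: uid -> local path of the .glb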
If you find this work helpful for your research, please cite:
@article{wang2024phidias,
title={Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion},
author={Zhenwei Wang and Tengfei Wang and Zexin He and Gerhard Hancke and Ziwei Liu and Rynson W.H. Lau},
eprint={2409.11406},
archivePrefix={arXiv},
primaryClass={cs.CV},
year={2024},
url={https://arxiv.org/abs/2409.11406},
}