[ICLR 2025] Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

NEWS:

  • [March 2025] Inference code released! 🚀 🚀 🚀
  • [Feb. 2025] Phidias has been accepted to ICLR 2025! 🔥 🔥 🔥
Demo video: phidias_video.mp4

Zhenwei Wang, Tengfei Wang, Zexin He, Gerhard Hancke, Ziwei Liu and Rynson W.H. Lau.

Abstract

In 3D modeling, designers often use an existing 3D model as a reference to create new ones. This practice has inspired the development of Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing the generation quality, generalization ability, and controllability. Our model integrates three key components: 1) meta-ControlNet that dynamically modulates the conditioning strength, 2) dynamic reference routing that mitigates misalignment between the input image and 3D reference, and 3) self-reference augmentations that enable self-supervised training with a progressive curriculum. Collectively, these designs result in a clear improvement over existing methods. Phidias establishes a unified framework for 3D generation using text, image, and 3D conditions with versatile applications.

Overview

Todo (Latest update: 2025/03/03)

  • Release model weights, inference code, rendering code, and usage instructions. The inference code has been tested on both an RTX 4090 (24 GB) and an A100.
  • Release Gradio and Hugging Face demos
  • Release training code and instructions

Installation

  • Environment Setup:
    conda create -n phidias python==3.10
    conda activate phidias
    # install PyTorch and xFormers
    # xformers is required! please refer to https://github.com/facebookresearch/xformers for details.
    # for example, we use torch 2.1.0 + cuda 11.8
    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 xformers --index-url https://download.pytorch.org/whl/cu118
  

    # a modified gaussian splatting (+ depth, alpha rendering)
    git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
    pip install ./diff-gaussian-rasterization

    # for mesh extraction
    pip install git+https://github.com/NVlabs/nvdiffrast

    # other dependencies
    pip install -r requirements.txt
  • If you want to render reference maps by yourself, please install Blender 3.2.2:
wget https://download.blender.org/release/Blender3.2/blender-3.2.2-linux-x64.tar.xz && \
  tar -xf blender-3.2.2-linux-x64.tar.xz && \
  rm blender-3.2.2-linux-x64.tar.xz
  • Download our pretrained models and precomputed point-cloud features from Hugging Face, and place them under model/
  mkdir model
  huggingface-cli download ZhenweiWang/Phidias-Diffusion --local-dir model/
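
Optional sanity check: the following one-liner is not part of the official setup; it simply confirms that PyTorch, xFormers, and CUDA are visible inside the phidias environment before you run inference.

  # verify that torch, xformers and CUDA are importable in the phidias env (optional)
  python -c "import torch, xformers; print(torch.__version__, xformers.__version__, torch.cuda.is_available())"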

Inference

  # Example 1:
  # without 3D reference (vanilla zero123++ with white background)
  python infer.py big --workspace results/no_ref --mv_controlnet_path None --rembg --test_path data_test/image_to_3d/chair_watermelon.png

  # Example 2:
  # user-specified 3D reference (3D-to-3D), using pre-rendered reference maps
  python infer.py big --workspace results/3d_to_3d --rembg --test_path data_test/3d_to_3d --no-use_retrieval

  # Example 3:
  # retrieve 3D reference from objaverse subset and perform online rendering
  python infer.py big --workspace results/image_to_3d --rembg --test_path data_test/image_to_3d --use_retrieval --top_k_retrieval 1 2 3 --online_rendering --blender_path blender-3.2.2-linux-x64/blender  --render_azimuth 0 --render_elevation 0

Tips:

  • Use --online_rendering to render the retrieved 3D object during inference, which is more convenient but slows inference down. Note that results may be unsatisfactory when the view angles of the concept image and the rendered CCMs are misaligned. Please check the visualization (xxx_mv_refs_64.png) under results/ and adjust --render_azimuth and --render_elevation to align the angles manually for better results (see the example after this list).
  • Online rendered reference maps are saved to data_test/ref_maps_retrieved.
  • To reduce rendering cost during inference, you can also set --no-online_rendering and prepare all reference maps in advance by pre-rendering all Objaverse objects listed in data_train/objaverse/meta.json under --root_dirs.
  • Use --rembg to remove the background of your image automatically. Remember to adjust --resize_fg_ratio so that the object has the same size as the reference maps.
  • We support retrieving references with top-1 to top-5 similarity. You can set --top_k_retrieval 1 2 3 to get results using the top-1, top-2, and top-3 3D references.
  • Adjust --seed to find the best results. Default: 42.
  • Use --controlnet_conditioning_scale to adjust the control strength manually. Default: 1.0.
  • The default outputs are 3D Gaussians. You can convert them into meshes, as in LGM, by running python convert.py big --test_path results/saved.ply, but the mesh quality is limited.
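
Putting several of the tips above together, here is an illustrative workflow: re-run retrieval-based inference with manually aligned reference angles and a different seed, then convert the saved Gaussians to a mesh. The azimuth, elevation, and seed values below are placeholders to be tuned per input, not recommended settings.

  # placeholder angle/seed values -- inspect xxx_mv_refs_64.png and tune per input
  python infer.py big --workspace results/image_to_3d --rembg --test_path data_test/image_to_3d \
    --use_retrieval --top_k_retrieval 1 --online_rendering \
    --blender_path blender-3.2.2-linux-x64/blender \
    --render_azimuth 30 --render_elevation 10 --seed 123

  # convert the saved 3D Gaussians to a mesh (adjust the .ply path to the file
  # actually written under your --workspace); mesh quality is limited
  python convert.py big --test_path results/saved.ply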

Data rendering

  • We provide the full Objaverse UIDs that we used in our paper under data_train/objaverse.
  • We also provide a code example for rendering images for training and testing, including RGBA, normal, and CCM maps. See scripts/blender_script.py (an example invocation is sketched below).
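
Blender scripts of this kind are typically run headless with Blender's --background and --python flags. The exact arguments accepted by scripts/blender_script.py are not documented here, so the options after the -- separator below (--object_path, --output_dir) are assumed names for illustration; check the script's argument parser for the real interface.

  # headless rendering sketch; --object_path and --output_dir are assumed argument
  # names -- see scripts/blender_script.py for the actual CLI it defines
  blender-3.2.2-linux-x64/blender --background --python scripts/blender_script.py -- \
    --object_path /path/to/object.glb --output_dir data_train/renders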

Citation

If you find this work helpful for your research, please cite:

@article{wang2024phidias,
  title={Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion},
  author={Zhenwei Wang and Tengfei Wang and Zexin He and Gerhard Hancke and Ziwei Liu and Rynson W.H. Lau},
  eprint={2409.11406},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  year={2024},
  url={https://arxiv.org/abs/2409.11406},
}