360-1M is a large-scale 360° video dataset consisting of over 1 million videos for training video and 3D foundation models. This repository contains the following:

  1. Links to the video URLs for download from YouTube.
  2. Metadata for each video including category, resolution, and views.
  3. Code for downloading the videos locally and to Google Cloud Platform (recommended).
  4. Code for filtering, processing, and obtaining camera pose for the videos.
  5. Code for training the novel view synthesis model, ODIN.

[Figure: three examples, each pairing a reference image with a generated scene trajectory: NYC, Living Room, and Picnic.]

Downloading Videos

Metadata and video URLs can be downloaded from here: Metadata with Video URLs

To download the videos we recommend using the yt-dlp package. To run our download scripts you'll also need pandas and pyarrow to parse the metadata parquet:

# Install packages for downloading videos
pip install yt-dlp
pip install pandas
pip install pyarrow

The videos can be downloaded using the provided script:

python DownloadVideos/download_local.py --in_path 360-1M.parquet --out_dir /path/to/videos
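
For readers curious about what a single download involves, here is a minimal sketch of building a yt-dlp command line for one video. It is not the logic of `download_local.py`, which may differ. The function name `build_ytdlp_command`, the output naming scheme, and the format selector are assumptions chosen for illustration; `-f` and `-o` are standard yt-dlp options.

```python
from pathlib import Path


def build_ytdlp_command(url: str, out_dir: str) -> list[str]:
    """Return an argument list asking yt-dlp to save one video into out_dir.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Name each file after its YouTube video ID so reruns are easy to de-duplicate.
    output_template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return [
        "yt-dlp",
        "-f", "bestvideo+bestaudio/best",  # best quality, falling back to a single-file format
        "-o", output_template,
        url,
    ]


if __name__ == "__main__":
    # Placeholder URL; real URLs come from the metadata parquet.
    cmd = build_ytdlp_command("https://www.youtube.com/watch?v=VIDEO_ID", "/path/to/videos")
    print(" ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the download, assuming yt-dlp is installed and on the `PATH`.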

The total size of all videos at max resolution is about 200 TB. Because of bandwidth limitations, we recommend downloading to a cloud platform; a script for use with Google Cloud Platform (GCP) is provided:

python DownloadVideos/Download_GCP.py --path 360-1M.parquet
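
To make the bandwidth concern concrete, the following back-of-the-envelope sketch estimates transfer time for the roughly 200 TB total. The function names and the example link speeds are assumptions for illustration. It uses decimal terabytes and assumes a perfectly sustained link, so real downloads will take longer.

```python
def download_days(total_terabytes: float, bandwidth_mbps: float) -> float:
    """Days needed to transfer total_terabytes at a sustained bandwidth_mbps."""
    total_bits = total_terabytes * 1e12 * 8          # decimal terabytes -> bits
    seconds = total_bits / (bandwidth_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 86_400                          # seconds -> days


def average_video_megabytes(total_terabytes: float, num_videos: int) -> float:
    """Average size per video in decimal megabytes."""
    return total_terabytes * 1e6 / num_videos


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):  # hypothetical link speeds
        print(f"{mbps} Mbps: about {download_days(200, mbps):.1f} days")
    print(f"about {average_video_megabytes(200, 1_000_000):.0f} MB per video on average")
```

At a sustained 1 Gbps the full dataset takes roughly 18.5 days, and at 100 Mbps roughly 185 days. With about 1 million videos, the average works out to about 200 MB per video.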

We will soon release a filtered, high-quality subset for those who want to work with a smaller version of 360-1M locally.


Installation Guide for Video Processing and Training

Environment Setup

  1. Create a new Conda environment:
    conda create -n ODIN python=3.9
    conda activate ODIN

  2. Clone the repository, then install its requirements from the repository root:
    cd ODIN
    pip install -r requirements.txt

  3. Install additional dependencies:
    git clone https://github.com/CompVis/taming-transformers.git
    pip install -e taming-transformers/
    git clone https://github.com/openai/CLIP.git
    pip install -e CLIP/

  4. Clone the MAST3R repository:
    git clone --recursive https://github.com/naver/mast3r
    cd mast3r

  5. Install MAST3R dependencies:
    pip install -r requirements.txt
    pip install -r dust3r/requirements.txt

For detailed installation instructions, visit the MAST3R repository.
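
After the steps above, a quick check like the following can show which packages are still missing from the active environment. This is a minimal sketch: the helper `missing_modules` is hypothetical, and the import names `taming` and `clip` are assumptions about what the taming-transformers and CLIP installs expose.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names are assumptions: yt-dlp imports as yt_dlp; taming-transformers
    # and CLIP are assumed to import as taming and clip.
    expected = ["yt_dlp", "pandas", "pyarrow", "taming", "clip"]
    missing = missing_modules(expected)
    print("All expected modules found." if not missing else f"Missing: {missing}")
```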

Extracting Frames

To extract frames from videos, use the video_to_frames.py script:

python video_to_frames.py --path /path/to/videos --out /path/to/frames
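
As a rough illustration of what frame extraction involves, the sketch below builds an ffmpeg command that samples frames at a fixed rate. It does not describe `video_to_frames.py`, whose sampling strategy and file naming may differ. The helper name, the default of one frame per second, and the `frame_%06d.png` pattern are assumptions, and ffmpeg must be installed separately.

```python
from pathlib import Path


def build_ffmpeg_command(video_path: str, frames_dir: str, fps: float = 1.0) -> list[str]:
    """Return an ffmpeg argument list that writes frames of video_path into frames_dir.

    Hypothetical helper for illustration; not part of the repository.
    """
    # %06d is expanded by ffmpeg into a zero-padded frame counter.
    pattern = str(Path(frames_dir) / "frame_%06d.png")
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",  # the fps filter resamples the video to a fixed frame rate
        pattern,
    ]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("/path/to/videos/example.mp4", "/path/to/frames")))
```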

Extracting Pairwise Poses

Once frames are extracted, pairwise poses can be calculated using:

python extract_poses.py --path /path/to/frames
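
The source does not say how `extract_poses.py` chooses which frames to pair. The sketch below shows one possible strategy, purely as an illustration: pair each frame with the frames that follow it within a fixed window. The function name `frame_pairs` and the `max_gap` parameter are hypothetical.

```python
from itertools import combinations


def frame_pairs(frame_names, max_gap):
    """Return (earlier, later) frame pairs that are at most max_gap positions apart.

    Frames are ordered by name, so zero-padded file names sort chronologically.
    """
    ordered = sorted(frame_names)
    return [
        (ordered[i], ordered[j])
        for i, j in combinations(range(len(ordered)), 2)
        if j - i <= max_gap
    ]


if __name__ == "__main__":
    frames = ["frame_000001.png", "frame_000002.png", "frame_000003.png", "frame_000004.png"]
    for earlier, later in frame_pairs(frames, max_gap=2):
        print(earlier, "->", later)
```

Limiting the gap keeps the number of pairs linear in the number of frames rather than quadratic, which matters when each pair requires a pose estimate.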

Training

Download the image-conditioned Stable Diffusion checkpoint released by Lambda Labs:

wget https://cv.cs.columbia.edu/zero123/assets/sd-image-conditioned-v2.ckpt

Run the training script:

python main.py \
    -t \
    --base configs/sd-ODIN-finetune-c_concat-256.yaml \
    --gpus 0,1,2,3,4,5,6,7 \
    --scale_lr False \
    --num_nodes 1 \
    --check_val_every_n_epoch 1 \
    --finetune_from sd-image-conditioned-v2.ckpt
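
The `--scale_lr False` flag is worth a short illustration. The sketch below assumes `main.py` follows the convention of the CompVis latent-diffusion training script, where enabling learning-rate scaling multiplies the base rate by the number of GPUs, the per-GPU batch size, and the gradient-accumulation steps. That is an assumption about this repository, not something the README states, and the numeric values used are hypothetical rather than taken from the config file.

```python
def count_gpus(gpus_arg: str) -> int:
    """Count device IDs in a comma-separated --gpus value such as '0,1,2,3'."""
    return len([g for g in gpus_arg.split(",") if g.strip()])


def effective_learning_rate(base_lr, batch_size, num_gpus,
                            accumulate_grad_batches=1, scale_lr=False):
    """Learning rate under the assumed latent-diffusion scaling convention."""
    if scale_lr:
        return accumulate_grad_batches * num_gpus * batch_size * base_lr
    return base_lr  # with scaling disabled, the configured base rate is used as-is


if __name__ == "__main__":
    n = count_gpus("0,1,2,3,4,5,6,7")
    # base_lr and batch_size here are hypothetical example values.
    print("GPUs:", n)
    print("scale_lr=False:", effective_learning_rate(1e-4, 4, n))
    print("scale_lr=True: ", effective_learning_rate(1e-4, 4, n, scale_lr=True))
```

Under this assumption, running with fewer GPUs and `--scale_lr False` leaves the learning rate unchanged, so only the effective batch size differs.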

Coming Soon

  • High-quality subset for easier experimentation.
  • Model weights with inference and fine-tuning code.