Some intermediate steps are re-implemented in CUDA (`lib/cuda/`), which improves training speed by about 1.8x to 3.5x. Below are the results for dense grids with 256^3 voxels and 160^3 voxels, tested on a Tesla V100, an RTX 2080 Ti, and a GTX 1080 Ti. The PSNRs of the different versions on different machines drift by about 0.2 dB. The training speed of the final version is 2 to 3 times that of the original native PyTorch implementation.
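For context on the `prevent atomic add in alpha2weight` row below: `alpha2weight` turns per-sample opacities into volume-rendering weights w_i = alpha_i * prod_{j<i}(1 - alpha_j). Below is a minimal PyTorch sketch of that computation, assuming a dense `[n_rays, n_samples]` layout (an illustrative simplification; the repository uses a flattened per-ray layout). Because the running transmittance is private to each ray, a CUDA kernel that scans one ray per thread needs no atomicAdd across samples, which is one way to realize the optimization named in the table.

```python
import torch

def alpha2weight(alpha):
    # alpha: [n_rays, n_samples] per-sample opacities.
    # Returns w_i = alpha_i * T_i with T_i = prod_{j<i} (1 - alpha_j).
    # A CUDA kernel can compute the same per-ray running product with one
    # thread per ray, so no atomicAdd is needed between samples.
    T = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=-1),
        dim=-1)
    return alpha * T
```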
| num_voxels=256^3 | lego psnr | lego mm:ss | mic psnr | mic mm:ss | ship psnr | ship mm:ss |
|---|---|---|---|---|---|---|
| **GPU=V100** | | | | | | |
| native pytorch `b076912` | 35.51 | 15:10 | 34.39 | 14:11 | 30.05 | 17:04 |
| cuda re-impl. Adam optimizer `d3783f4` | 35.47 | 08:54 (1.7x) | 34.34 | 06:41 (2.1x) | 30.05 | 10:23 (1.6x) |
| cuda re-impl. rendering `3de7a6d` | 35.63 | 06:31 (2.3x) | 34.48 | 04:31 (3.1x) | 30.30 | 08:20 (2.0x) |
| prevent atomic add in alpha2weight `4f4ac99` | 35.61 | 05:35 (2.7x) | 34.51 | 04:00 (3.5x) | 30.29 | 07:20 (2.3x) |
| **GPU=2080Ti** | | | | | | |
| native pytorch `b076912` | - | OOM | 34.44 | 18:01 | - | OOM |
| cuda re-impl. `4f4ac99` | 35.61 | 07:19 | 34.49 | 04:30 (4.0x) | 30.29 | 09:53 |
| **GPU=1080Ti** | | | | | | |
| native pytorch `b076912` | 35.76 | 37:22 | 34.47 | 31:18 | 30.09 | 45:28 |
| cuda re-impl. `4f4ac99` | 35.62 | 14:32 (2.6x) | 34.50 | 08:55 (3.5x) | 30.29 | 21:00 (2.2x) |
```python
# The model & training config for the results above
coarse_train = dict(N_iters=5000)
fine_train = dict(pg_scale=[1000,2000,3000,4000,5000,6000])
fine_model_and_render = dict(num_voxels=256**3)
```
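The `pg_scale` milestones progressively scale the grid during fine-stage training. A sketch of the assumed schedule (the exact rule lives in the training code; treat this as an illustration): the grid starts at `num_voxels / 2**len(pg_scale)` total voxels and doubles its voxel count at each listed iteration, reaching the full `256**3` after the last milestone.

```python
def voxel_schedule(num_voxels, pg_scale):
    # Assumed schedule: start at num_voxels / 2**len(pg_scale) total voxels
    # and double the voxel count at each milestone iteration.
    counts = {0: num_voxels // (2 ** len(pg_scale))}
    for k, it in enumerate(pg_scale, start=1):
        counts[it] = num_voxels // (2 ** (len(pg_scale) - k))
    return counts

# voxel_schedule(256**3, [1000,2000,3000,4000,5000,6000])
# -> starts at 64**3-equivalent and reaches 256**3 at iteration 6000
```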
| num_voxels=160^3 | lego psnr | lego mm:ss | mic psnr | mic mm:ss | ship psnr | ship mm:ss |
|---|---|---|---|---|---|---|
| **GPU=V100** | | | | | | |
| native pytorch `b076912` | 34.65 | 08:29 | 33.19 | 07:04 | 29.08 | 10:38 |
| cuda re-impl. Adam optimizer `d3783f4` | 34.66 | 06:01 (1.4x) | 33.14 | 04:38 (1.5x) | 29.04 | 08:06 (1.3x) |
| cuda re-impl. rendering `3de7a6d` | 34.56 | 04:50 (1.8x) | 33.10 | 03:22 (2.1x) | 29.19 | 06:31 (1.6x) |
| prevent atomic add in alpha2weight `4f4ac99` | 34.58 | 03:58 (2.1x) | 33.12 | 03:00 (2.4x) | 29.17 | 05:46 (1.8x) |
| **GPU=2080Ti** | | | | | | |
| native pytorch `b076912` | 34.68 | 11:27 | 33.18 | 09:19 | 29.13 | 14:35 |
| cuda re-impl. `4f4ac99` | 34.59 | 04:59 (2.3x) | 33.15 | 03:04 (3.0x) | 29.19 | 07:32 (1.9x) |
| **GPU=1080Ti** | | | | | | |
| native pytorch `b076912` | 34.66 | 22:01 | 33.19 | 17:14 | 29.10 | 29:57 |
| cuda re-impl. `4f4ac99` | 34.56 | 10:29 (2.1x) | 33.11 | 06:21 (2.7x) | 29.18 | 16:48 (1.8x) |
```python
# The model & training config for the results above
coarse_train = dict(N_iters=5000)
fine_train = dict(pg_scale=[1000,2000,3000,4000])
fine_model_and_render = dict(num_voxels=160**3)
```
The model for forward-facing scenes is implemented in `lib/dmpigo.py`. The main modifications include:
- Use NeRF's NDC warping.
- Use a Multiplane Image (MPI) representation.
- Initialize the probability of a ray stopping at each plane to 1/(# of planes); see the sketch after this list.
- Skip the coarse-stage training, as it doesn't help in forward-facing scenes.
- Adopt a total variation loss, without which quality degrades.
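A uniform stopping probability pins down the initial per-plane alphas: requiring `alpha_i * prod_{j<i}(1 - alpha_j) = 1/D` for `D` planes telescopes to `alpha_i = 1/(D - i)`. Below is a minimal NumPy sketch of that derivation; the function name is illustrative, not the repository's actual API.

```python
import numpy as np

def uniform_stop_alphas(mpi_depth):
    """Per-plane alphas so a ray stops at every plane with probability 1/D."""
    D = mpi_depth
    alpha = 1.0 / (D - np.arange(D))  # alpha_0 = 1/D, ..., alpha_{D-1} = 1
    # Sanity check: stop probability alpha_i * prod_{j<i}(1 - alpha_j) == 1/D.
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    assert np.allclose(alpha * T, 1.0 / D)
    return alpha
```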
All config files are in `configs/llff/`. The base config for the small model is:
```python
# See configs/llff/llff_default.py
data = dict(
    dataset_type='llff',   # use the llff dataloader
    ndc=True,              # use NDC coordinates (forward-facing only; other cases not supported yet)
    width=1008,            # enforce image width
    height=756,            # enforce image height
)
coarse_train = dict(
    N_iters=0,             # we don't need the coarse-stage training
)
fine_train = dict(
    N_iters=30000,
    N_rand=4096,           # a larger batch doesn't seem to help
    pg_scale=[2000,4000,6000,8000],
    ray_sampler='flatten',
    tv_before=1e9,         # enable total variation loss
    tv_dense_before=10000, # dense version of total variation loss for the first 10k iterations
    weight_tv_density=1e-5,
    weight_tv_k0=1e-6,
)
fine_model_and_render = dict(
    num_voxels=256**3,
    mpi_depth=128,         # number of planes in the Multiplane Image (used when ndc=True)
    rgbnet_dim=9,          # a larger rgbnet_dim doesn't seem to help
    rgbnet_width=64,       # a larger rgbnet_width doesn't seem to help
    world_bound_scale=1,   # no need to slightly enlarge the NDC bound
    fast_color_thres=1e-3, # the initial probability of stopping at each plane is 1/mpi_depth,
                           # so the original 1e-4 would be too conservative here
)
```
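The `weight_tv_density` and `weight_tv_k0` entries weight a total variation penalty on the density grid and the color-feature grid, respectively. A minimal sketch of such a regularizer on a dense voxel grid is shown below; it illustrates the idea and is not the repository's exact implementation.

```python
import torch

def tv_loss(grid):
    """Total variation on a [1, C, X, Y, Z] voxel grid: penalize squared
    differences between adjacent voxels along each spatial axis."""
    tv_x = (grid[..., 1:, :, :] - grid[..., :-1, :, :]).pow(2).mean()
    tv_y = (grid[..., :, 1:, :] - grid[..., :, :-1, :]).pow(2).mean()
    tv_z = (grid[..., :, :, 1:] - grid[..., :, :, :-1]).pow(2).mean()
    return tv_x + tv_y + tv_z
```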
See `configs/llff/llff_default_lg.py` for the large-model modifications. Basically, we double `mpi_depth` and use a larger MLP, as sketched below.
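A hypothetical sketch of what the large-model overrides could look like; only the doubled `mpi_depth` is stated above, and the MLP sizes here are illustrative assumptions (see `configs/llff/llff_default_lg.py` for the actual values).

```python
fine_model_and_render = dict(
    mpi_depth=256,     # doubled from the small model's 128
    rgbnet_dim=12,     # assumption: a larger MLP feature dim
    rgbnet_width=128,  # assumption: a wider MLP
)
```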
Results:
- Our training times are measured on a single Tesla V100 GPU.
- Training time (mm:ss)

| Method | Avg. | Room | Fern | Leaves | Fortress | Orchids | Flower | T-Rex | Horns |
|---|---|---|---|---|---|---|---|---|---|
| NeRF | 30+ hr | | | | | | | | |
| Ours small | 05:30 | 05:55 | 06:12 | 04:36 | 05:38 | 05:26 | 05:28 | 05:07 | 05:23 |
| Ours large | 16:27 | 17:38 | 18:21 | 14:11 | 16:03 | 17:14 | 16:27 | 15:46 | 16:00 |

- PSNR

| Method | Avg. | Room | Fern | Leaves | Fortress | Orchids | Flower | T-Rex | Horns |
|---|---|---|---|---|---|---|---|---|---|
| NeRF | 26.50 | 32.70 | 25.17 | 20.92 | 31.16 | 20.36 | 27.40 | 26.80 | 27.45 |
| Ours small | 25.83 | 30.88 | 24.69 | 20.81 | 30.09 | 19.82 | 27.34 | 26.04 | 26.98 |
| Ours large | 26.37 | 32.16 | 24.99 | 21.01 | 30.79 | 20.07 | 27.62 | 26.63 | 27.69 |

- SSIM

| Method | Avg. | Room | Fern | Leaves | Fortress | Orchids | Flower | T-Rex | Horns |
|---|---|---|---|---|---|---|---|---|---|
| NeRF | 0.811 | 0.948 | 0.792 | 0.690 | 0.881 | 0.641 | 0.827 | 0.880 | 0.828 |
| Ours small | 0.826 | 0.940 | 0.810 | 0.735 | 0.871 | 0.663 | 0.849 | 0.891 | 0.850 |
| Ours large | 0.840 | 0.951 | 0.821 | 0.745 | 0.890 | 0.673 | 0.856 | 0.909 | 0.877 |

- LPIPS (VGG)

| Method | Avg. | Room | Fern | Leaves | Fortress | Orchids | Flower | T-Rex | Horns |
|---|---|---|---|---|---|---|---|---|---|
| NeRF | 0.250 | 0.178 | 0.280 | 0.316 | 0.171 | 0.321 | 0.219 | 0.249 | 0.268 |
| Ours small | 0.215 | 0.191 | 0.231 | 0.215 | 0.185 | 0.252 | 0.187 | 0.229 | 0.233 |
| Ours large | 0.200 | 0.172 | 0.222 | 0.205 | 0.161 | 0.247 | 0.181 | 0.215 | 0.203 |