Neural Network Diffusion

Figure: Motivation of p-diff.
This repository contains the code and implementation details for the research paper Neural Network Diffusion. The paper introduces neural network parameter diffusion (p-diff), a novel approach that employs a standard latent diffusion model to synthesize new sets of high-performing neural network parameters.

Authors

Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You

Overview

Figure: Our approach consists of two processes: parameter autoencoder and parameter generation. The parameter autoencoder extracts latent representations of model parameters and reconstructs the parameters via its decoder. The extracted representations are used to train a diffusion model (DM). During inference, a random noise vector is fed into the DM, and the resulting latent is passed through the trained decoder to generate new parameters.

Abstract: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained ones. Our results encourage more exploration into the versatile use of diffusion models.
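
For intuition, here is a minimal, self-contained sketch of the two-stage pipeline described above. The module architectures, dimensions, and DDPM-style noise schedule are illustrative assumptions and do not mirror the repository's actual implementation.

# Minimal sketch of the p-diff pipeline; for intuition only.
# All architectures, dimensions, and the noise schedule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamAutoencoder(nn.Module):
    """Maps flattened parameter vectors to latents and back."""
    def __init__(self, param_dim, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(param_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, param_dim))

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a latent at diffusion step t."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, z_t, t):
        return self.net(torch.cat([z_t, t.float().unsqueeze(-1)], dim=-1))

param_dim, n_ckpts, T = 2048, 64, 1000
ckpt_params = torch.randn(n_ckpts, param_dim)  # stands in for flattened parameter subsets
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Stage 1: train the autoencoder to reconstruct the parameter vectors.
ae = ParamAutoencoder(param_dim)
opt_ae = torch.optim.AdamW(ae.parameters(), lr=1e-3)
for _ in range(200):
    loss = F.mse_loss(ae.decoder(ae.encoder(ckpt_params)), ckpt_params)
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()

# Stage 2: train the diffusion model on the frozen latents (noise-prediction loss).
dm = LatentDenoiser()
opt_dm = torch.optim.AdamW(dm.parameters(), lr=1e-3)
z0 = ae.encoder(ckpt_params).detach()
for _ in range(200):
    t = torch.randint(0, T, (n_ckpts,))
    eps = torch.randn_like(z0)
    a = alpha_bar[t].unsqueeze(-1)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps
    loss = F.mse_loss(dm(z_t, t), eps)
    opt_dm.zero_grad(); loss.backward(); opt_dm.step()

# Inference: denoise random latents with ancestral sampling, then decode them.
z = torch.randn(8, 128)
for step in reversed(range(T)):
    t = torch.full((8,), step)
    eps = dm(z, t)
    z = (z - betas[step] / (1 - alpha_bar[step]).sqrt() * eps) / (1.0 - betas[step]).sqrt()
    if step > 0:
        z = z + betas[step].sqrt() * torch.randn_like(z)
new_params = ae.decoder(z)  # eight freshly generated parameter vectors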

Environment

We support all PyTorch versions >= 2.0.0, but we recommend python==3.11 and pytorch==2.5.1, which we have fully tested.

conda create -n pdiff python=3.11
conda activate pdiff
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
git clone https://github.com/NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion.git --depth=1
cd Neural-Network-Parameter-Diffusion
pip install -r requirements.txt
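
Optionally, a quick sanity check (our suggestion, not part of the repository's scripts) that the install succeeded and the GPU is visible:

import torch
print(torch.__version__)          # expected >= 2.0.0, e.g. 2.5.1
print(torch.cuda.is_available())  # True if the CUDA build can see a GPU
print(torch.cuda.device_count())  # number of visible devices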

Quick Start

This will run three steps sequentially: preparing the dataset, training p-diff, and evaluating. The results will be saved in the root directory, and the checkpoints will be saved in ./checkpoint.

cd workspace
bash run_all.sh main cifar100_resnet18 0
# bash run_all.sh <category> <tag> <device>

Detailed Usage

Prepare the checkpoint dataset. The two scripts below populate it with trained and fine-tuned model checkpoints; a rough illustration of the saved artifact follows the commands.

cd ./dataset/main/cifar100_resnet18
rm performance.cache  # optional
CUDA_VISIBLE_DEVICES=0 python train.py
CUDA_VISIBLE_DEVICES=0 python finetune.py
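
As a rough illustration of what these scripts produce (the actual train.py and finetune.py perform real training and are more involved), each entry of the checkpoint dataset is simply a saved state dict:

# Illustrative only; the repository's train.py / finetune.py do the actual training.
import torch
from torchvision.models import resnet18
model = resnet18(num_classes=100)  # matches the cifar100_resnet18 example tag
# ... training / fine-tuning would happen here ...
torch.save(model.state_dict(), "checkpoint/checkpoint001.pth")  # one .pth per saved model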

Train p-diff and generate models.

cd ../../../workspace
bash launch.sh main cifar100_resnet18 0
# bash launch.sh <category> <tag> <device>
CUDA_VISIBLE_DEVICES=0 python generate.py main cifar100_resnet18
# CUDA_VISIBLE_DEVICES=<device> python generate.py <category> <tag>

Test the original checkpoints and the generated checkpoints, and measure their similarity (a hedged sketch of one possible similarity measure follows the command).

CUDA_VISIBLE_DEVICES=0 python evaluate.py main cifar100_resnet18
# CUDA_VISIBLE_DEVICES=<device> python evaluate.py <category> <tag>
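
As a sketch of what such a similarity check could look like (the exact metric in evaluate.py may differ), one option is the IoU of the test samples that two checkpoints both misclassify, which helps verify that generated models are not mere copies of the originals:

# Illustrative sketch only; evaluate.py's actual metric may differ.
import torch

def wrong_indices(model, loader, device="cuda"):
    """Return the set of sample indices the model classifies incorrectly.
    The loader must iterate in a fixed order (shuffle=False) so indices align across models."""
    model.eval().to(device)
    wrong, offset = set(), 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            mism = (preds != labels).nonzero(as_tuple=True)[0]
            wrong.update((mism + offset).tolist())
            offset += labels.size(0)
    return wrong

def similarity(model_a, model_b, loader):
    """IoU of the two models' wrong-prediction sets (1.0 means identical mistakes)."""
    wa, wb = wrong_indices(model_a, loader), wrong_indices(model_b, loader)
    return len(wa & wb) / max(len(wa | wb), 1)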

All available <category> and <tag> values can be found under ./dataset/<category>/<tag>.

Register Your Own Dataset

  1. Create a directory that mimics the existing dataset folders and contains the following three items:
mkdir ./dataset/main/<tag>
cd ./dataset/main/<tag>

checkpoint: A directory containing many .pth files, each holding a dictionary of model parameters.
generated: An empty directory where the generated models will be stored.
test.py: A script that evaluates a single checkpoint. It should be callable as follows (a minimal illustrative sketch is given after the command):

CUDA_VISIBLE_DEVICES=0 python test.py ./checkpoint/checkpoint001.pth
# CUDA_VISIBLE_DEVICES=<device> python test.py <checkpoint_file>
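
A minimal test.py could look like the following. The model class, dataset, and transforms are assumptions for the cifar100_resnet18 example; the repository's actual scripts differ.

# Illustrative minimal test.py; loads one .pth state dict and prints test accuracy.
import sys
import torch
from torchvision import datasets, transforms
from torchvision.models import resnet18

def main(ckpt_path):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = resnet18(num_classes=100)  # assumed CIFAR-100 head
    model.load_state_dict(torch.load(ckpt_path, map_location=device))
    model.to(device).eval()
    test_set = datasets.CIFAR100("./data", train=False, download=True,
                                 transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(test_set, batch_size=256, shuffle=False)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"accuracy: {correct / total:.4f}")

if __name__ == "__main__":
    main(sys.argv[1])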
  2. Register the dataset.
    Add a class to the end of the dataset file.
cd ../../../dataset
vim __init__.py  
# This __init__.py is the dataset file.
# on line 392
+ class <Tag>(MainDataset): pass
  3. Create your launch script.
    Other hyperparameters can also be changed in this file.
cd ../workspace/main
cp cifar10_resnet18.py main_<tag>.py
vim main_<tag>.py
# on line 33
- from dataset import Cifar100_ResNet18 as Dataset
+ from dataset import <Tag> as Dataset
  4. Train p-diff and generate models.
    Follow the section "Detailed Usage".

  5. Test the original and generated checkpoints and their similarity.
    Follow the section "Detailed Usage".

Acknowledgments

We thank Kaiming He, Dianbo Liu, Mingjia Shi, Zheng Zhu, Bo Zhao, Jiawei Liu, Yong Liu, Ziheng Qin, Zangwei Zheng, Yifan Zhang, Xiangyu Peng, Hongyan Chang, Zirui Zhu, Dave Zhenyu Chen, Ahmad Sajedi and George Cazenavette for valuable discussions and feedback.

Citation

If you find our work useful, please consider citing us.

@misc{wang2024neural,
      title={Neural Network Diffusion}, 
      author={Kai Wang and Dongwen Tang and Boya Zeng and Yida Yin and Zhaopan Xu and Yukun Zhou and Zelin Zang and Trevor Darrell and Zhuang Liu and Yang You},
      year={2024},
      eprint={2402.13144},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
