RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement

Accepted to ECCV 2024 | Project Page | arXiv


In this paper we propose a novel modification of CLIP guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. Learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of additional encoders that do not have a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further dramatically reduces training time, stabilizes training and produces high quality enhanced images without artifacts, both in supervised and unsupervised training regimes. Additionally, we show that residual vectors can be interpreted, revealing biases in training data, and thereby enabling potential bias correction.

🔖 Approach

RAVE exploits vector arithmetic in the CLIP latent space. Using well-lit and backlit training data, we construct a residual vector that points from backlit images towards well-lit images in the CLIP embedding space. This vector then guides the image enhancement model during training, so that the model learns to produce images whose CLIP latent vectors are close to those of the well-lit training images.
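
The residual construction can be illustrated with a minimal sketch. It assumes image embeddings are already available as plain lists of floats; the toy 3-dimensional vectors, the helper names, and the cosine-similarity check are illustrative assumptions, not the repository's actual code or loss.

```python
import math

def mean_embedding(embeddings):
    """Average a list of equal-length embedding vectors component-wise."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def residual_vector(well_lit, backlit):
    """Mean well-lit embedding minus mean backlit embedding."""
    mu_well = mean_embedding(well_lit)
    mu_back = mean_embedding(backlit)
    return [w - b for w, b in zip(mu_well, mu_back)]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

# Hypothetical toy stand-ins for CLIP image embeddings (real ones are much larger).
well_lit = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
backlit = [[0.0, 1.0, 0.0], [0.0, 1.0, 2.0]]

residual = residual_vector(well_lit, backlit)
print(residual)

# Illustrative check only: a well-lit embedding aligns with the residual
# direction more than a backlit one does.
print(cosine_similarity(well_lit[0], residual))
print(cosine_similarity(backlit[1], residual))
```

In this toy setting the well-lit embedding has a positive similarity with the residual and the backlit one a negative similarity, which is the kind of signal a guidance term could use during training.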

🔖 Updates

  • 2024.08.26: Code for training and testing, as well as model checkpoints, is now publicly available.

🔖 Usage

✔️ Training and Testing Data:

Training and testing data can be downloaded from:

  • BAID dataset (train and test parts);
  • DIV2K images (used in place of the well-lit images from BAID when training models in the unpaired setting);
  • LOL-v1 dataset for the low-light image enhancement task (see the supplementary material of the RAVE paper for results on this data).

✔️ Run Training:

➖ CLIP-LIT and CLIP-LIT-Latent

Train CLIP-LIT:

python train.py --cfg ./configs/train/clip_lit.yaml

Train CLIP-LIT-Latent:

python train.py --cfg ./configs/train/clip_lit_latent.yaml

Before running, make sure the paths to the training data in the config are correct (backlit_images_path and welllit_images_path).
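
A quick way to check these paths before launching training is sketched below. It assumes the config has already been loaded into a flat dictionary; the helper name `missing_data_paths` is hypothetical, and the real YAML files may nest these keys differently.

```python
import os

def missing_data_paths(cfg, keys=("backlit_images_path", "welllit_images_path")):
    """Return the config keys whose value is absent or not an existing directory."""
    return [k for k in keys if not os.path.isdir(cfg.get(k) or "")]

# Hypothetical config values: one valid directory and one placeholder path.
example_cfg = {
    "backlit_images_path": os.getcwd(),
    "welllit_images_path": "/path/that/does/not/exist",
}
print(missing_data_paths(example_cfg))
```

An empty list means both directories were found.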

If you have pre-trained Unet and/or guidance model checkpoints, you can resume training by changing the load_pretrain arguments for the Unet and/or guidance model in the config. For more information on config arguments, see Readme.md in the config directory.

➖ RAVE

Train RAVE:

python train_rave.py --cfg ./configs/train/rave.yaml

Before running, make sure the paths to the training data in the config are correct (backlit_images_path and welllit_images_path).

To train RAVE with the residual shifted by n tokens, change the remove_first_n_tokens argument in the config.

✔️ Inference and Testing:

➖ Pretrained checkpoints

Pretrained checkpoints for all the models are stored in the pretrained_models directory.

Models trained on paired data:

  • CLIP-LIT: clip_lit_paired.pth;
  • CLIP-LIT-Latent: clip_lit_latent_paired.pth;
  • RAVE: rave_paired.pth.

Models trained on unpaired data:

  • CLIP-LIT: clip_lit_unpaired.pth;
  • CLIP-LIT-Latent: clip_lit_latent_unpaired.pth;
  • RAVE without shifting the residual: rave_unpaired.pth;
  • RAVE with shifting the residual by 15 tokens: rave_unpaired_shifted.pth.

➖ Inference

To run a trained model on backlit images, use the following command:

python inference.py --cfg ./configs/inference/inference.yaml

Before running, make sure that the path to the testing data in the config is correct (input).

➖ Testing (computing metrics)

To compute metrics (SSIM, PSNR, LPIPS) on a set of enhanced images and their corresponding ground-truth well-lit images, use the following command:

python compute_metrics.py --cfg ./configs/inference/metrics.yaml 

Before running, make sure that the paths to the ground-truth well-lit data and the enhanced images in the config are correct (gt_images_path and enhanced_images_path).
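
For reference, PSNR (one of the three metrics) can be sketched in a few lines. This is a minimal illustration of the standard definition on flat pixel sequences with an assumed 8-bit range; the function name and toy values are hypothetical, and compute_metrics.py may implement the metric differently.

```python
import math

def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(enhanced) or not reference:
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical flattened pixel values: ground truth vs. enhanced output.
gt_pixels = [10, 20, 30, 40]
enhanced_pixels = [11, 21, 31, 41]
print(f"{psnr(gt_pixels, enhanced_pixels):.2f} dB")
```

Higher PSNR means the enhanced image is closer to the ground truth; identical images give infinity.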

🔖 Citation

If you find our work useful, please consider citing the paper:

@misc{gaintseva2024raveresidualvectorembedding,
      title={RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement}, 
      author={Tatiana Gaintseva and Martin Benning and Gregory Slabaugh},
      year={2024},
      eprint={2404.01889},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2404.01889}, 
}

🔖 Contacts

Please feel free to reach out at [email protected].