This is the official repo for the INTERSPEECH 2023 paper "Adapting Language-Audio Models as Few-Shot Audio Learners". We propose the Treff adapter, which bootstraps pretrained language-audio models (such as CLAP) with a handful of labelled examples. It outperforms the zero-shot classifier and existing few-shot learning methods even WITHOUT training.
To get started, all you need is Python 3 and the dependencies listed in requirements.txt; an example installation is shown below.
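For example (the virtual-environment step is optional and just one common way to isolate the dependencies; only the `pip install` command comes from this repo):

```sh
# Optional: create an isolated environment first (venv shown here; any tool works).
python3 -m venv .venv
source .venv/bin/activate

# Install the repo's dependencies.
pip install -r requirements.txt
```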
Access the CLAP weights: Pretrained Model [Zenodo]. Alternatively, you may want to try out other versions of the CLAP model in Pretrained Model.
Please take a look at the scripts for usage examples. You may need to change WORK_DIR and STORAGE_DIR to your own directories. Also, to try out the Treff adapter in more scenarios, you can directly adjust hyperparameters such as MODEL_NAME and BATCH_SIZE; see the sketch below.
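As a sketch, the configurable part of a run script might look like this. Only the variable names come from this README; the file layout and default values are illustrative assumptions, not the repo's actual settings:

```sh
# Hypothetical excerpt from a script under scripts/ -- the values below
# are illustrative placeholders, not the repo's actual defaults.
WORK_DIR="/path/to/this/repo"   # root of your local checkout
STORAGE_DIR="/path/to/storage"  # where datasets and pretrained weights live

MODEL_NAME="CLAP"               # hyperparameter: which pretrained backbone to load
BATCH_SIZE=32                   # hyperparameter: evaluation batch size
```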
If you use this repo, please cite our work:
@misc{liang2023adapting,
      title={Adapting Language-Audio Models as Few-Shot Audio Learners},
      author={Jinhua Liang and Xubo Liu and Haohe Liu and Huy Phan and Emmanouil Benetos and Mark D. Plumbley and Wenwu Wang},
      year={2023},
      eprint={2305.17719},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
This repo is built upon the great work of Microsoft CLAP and LAION CLAP.