Skip to content

Latest commit

 

History

History
23 lines (13 loc) · 1.09 KB

README.md

File metadata and controls

23 lines (13 loc) · 1.09 KB

Infant Cry

Implementation for the detection of infant cries using the TRILLsson speech representations.

Installation

apt-get update && apt-get install -y libsndfile1 ffmpeg
conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt

Running

To run a training, call:

python run.py --config 'path/to/config.yaml'

with the config linking to compatible data.

Results

Training using the configuration found in 'config/config_cluster.yaml' the model achieves an F1 score of 0.899 on previously unseen data (unseen babies). This model was trained on non-augmented data, with a batch size of 8 and a learning rate of 0.001. The model was trained on a single GPU (RTXA6000 48GB). Other experiments were run using augmented data which previously had led to improved performance. Augmentation methods can be found the repository https://github.com/timherzig/speech_augment, and using generated room impulse responses from a modified version of anton-jeran/FAST-RIR found here: https://github.com/timherzig/FAST-RIR.