How to add a dataset in AudioEvals

In practice, you may need to evaluate your own custom audio dataset.

Before you start, you need to know how to launch a custom eval task; see how launch a custom eval task.md

Here are the steps:

JSONL file:

Register the dataset

Make sure your dataset file is in JSONL format and has a WavPath column that specifies the audio file path.
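It can help to check the file before registering it. The following is a minimal sketch, not part of AudioEvals: the helper `check_jsonl` and its `ref_col` and `require_audio` parameters are hypothetical names introduced here, and the only assumptions are the ones stated above (one JSON object per line, each with a WavPath field).

```python
import json
import os


def check_jsonl(path, ref_col=None, require_audio=False):
    """Return a list of problems found in a JSONL dataset file.

    Hypothetical helper for illustration; not an AudioEvals API.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {line_no}: invalid JSON ({e.msg})")
                continue
            if not isinstance(row, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            wav = row.get("WavPath")
            if not isinstance(wav, str):
                problems.append(f"line {line_no}: missing WavPath")
            elif require_audio and not os.path.isfile(wav):
                # Optional check that the referenced audio file exists on disk
                problems.append(f"line {line_no}: audio file not found: {wav}")
            if ref_col is not None and ref_col not in row:
                problems.append(f"line {line_no}: missing {ref_col}")
    return problems
```

An empty list means every row has a WavPath (and the reference column, if one was given). Pass `require_audio=True` to also confirm that each audio path exists relative to the current directory.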

Create a new file **.yaml in registry/dataset/ with content like this:

$name:  # the dataset name, used on the CLI as --dataset $name
  class: audio_evals.dataset.dataset.JsonlFile
  args:
    default_task: alei_asr  # the default eval task; valid tasks are listed in `registry/eval_task`
    f_name:  # the file name
    ref_col:  # the name of the reference answer column in the file

After registering the dataset, you can evaluate it with --dataset $name. Enjoy 😘
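If you register several datasets, the registry entry can be generated instead of written by hand. This is a rough sketch only: `registry_entry` is a hypothetical helper, not something AudioEvals provides, and it assumes the registry file needs just the four fields shown above.

```python
import json


def registry_entry(name, f_name, ref_col, default_task):
    """Return the YAML text for one dataset registry entry.

    Hypothetical helper for illustration; not an AudioEvals API.
    """
    # json.dumps emits double-quoted strings, which are also valid YAML scalars,
    # so file names containing spaces or colons stay intact.
    return (
        f"{name}:\n"
        f"  class: audio_evals.dataset.dataset.JsonlFile\n"
        f"  args:\n"
        f"    default_task: {default_task}\n"
        f"    f_name: {json.dumps(f_name)}\n"
        f"    ref_col: {json.dumps(ref_col)}\n"
    )
```

Write the returned text to a .yaml file under registry/dataset/ to register the dataset.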

Example:

Create a file my_dataset.jsonl with WavPath and Transcript columns, with content like this:

{"WavPath": "path/to/audio1.wav", "Transcript": "this is the first audio"}
{"WavPath": "path/to/audio2.wav", "Transcript": "this is the second audio"}
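When the audio paths and transcripts already live in another form (a spreadsheet, a list in memory), a file like this can be produced with the standard `json` module. A minimal sketch, assuming the two column names used in this example; `write_jsonl` is a hypothetical helper name, not an AudioEvals function.

```python
import json


def write_jsonl(rows, path):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII transcripts readable in the file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


rows = [
    {"WavPath": "path/to/audio1.wav", "Transcript": "this is the first audio"},
    {"WavPath": "path/to/audio2.wav", "Transcript": "this is the second audio"},
]

if __name__ == "__main__":
    write_jsonl(rows, "my_dataset.jsonl")
```

Using `json.dumps` for each row avoids hand-escaping quotes or backslashes inside transcripts.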

Create a file my_dataset.yaml in registry/dataset/ with this content:

my_dataset:
  class: audio_evals.dataset.dataset.JsonlFile
  args:
    default_task: asr
    f_name: my_dataset.jsonl     # the file name
    ref_col: Transcript           # the reference answer column name in file

Evaluate your dataset with --dataset my_dataset:

export PYTHONPATH=$PWD:$PYTHONPATH
export OPENAI_API_KEY=$your-key
python audio_evals/main.py --dataset my_dataset --model gpt4o_audio
