======================
CLEAN SPEECH FILES
======================
For reproducible evaluation of speaker recognition systems, we provide lists of
clean files chosen from the NIST SRE 2010 data set. This data set consists of:
- 16 ksps PCM recordings
- 3-to-15-minute-long interviews
- Clean recordings made with lavalier microphones
- 361 speakers in each of the train and test sets
- At least one test case for each train file
Two list files are provided:
- train.list: enrollment files, speaker ids and genders (m: male, f: female)
- test.list: test cases (2nd column), their speaker ids (1st column) and
  genders (3rd column)
Note that filenames have the letter 'A' appended to them, indicating that only
the first audio channel needs to be extracted from the files.
From these lists, you should prepare the corresponding lists of train and test
files with absolute pathnames, i.e. pathnames pointing to the NIST SRE2010
audio files on your system. These file lists will then be degraded under
diverse acoustic conditions using the provided simulator.
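A minimal sketch of this preparation step is shown below. It assumes that the
filename is the 1st column of train.list and the 2nd column of test.list, that
the trailing 'A' is only a channel marker and not part of the on-disk filename,
and that the audio is stored as .sph files under a single root directory;
adjust these details (and the hypothetical names SRE10_ROOT,
train-files-absolute.txt, test-files-absolute.txt) to your setup.

  import os

  SRE10_ROOT = "/path/to/SRE2010"   # hypothetical root of your NIST SRE2010 audio

  def to_absolute(list_file, name_column, out_file):
      # Map each listed name to an absolute path on this system.
      with open(list_file) as fin, open(out_file, "w") as fout:
          for line in fin:
              cols = line.split()
              if not cols:
                  continue
              name = cols[name_column]
              # Drop the trailing 'A' channel marker (assumed not to be part of
              # the on-disk filename) and append the assumed .sph extension.
              base = name[:-1] if name.endswith("A") else name
              fout.write(os.path.join(SRE10_ROOT, base + ".sph") + "\n")

  to_absolute("train.list", 0, "train-files-absolute.txt")  # filenames in 1st column
  to_absolute("test.list", 1, "test-files-absolute.txt")    # test cases in 2nd column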
=========================================
DEVELOPMENT, TRAIN AND TEST DATA SETS
=========================================
This package targets both the evaluation and development of robust speaker
recognition systems. System development can become sensitive to acoustic
conditions that appear in the development set and are also present in the
train and test sets. We address this issue in two ways:
- A large database of noises (~2000 files, 80 h), impulse responses (~120
  device and space impulse responses) and codecs (~12 speech and audio
  codecs). The rationale is that reproducing a very large number of combined
  degradation processes promotes good generalization of machine learning
  algorithms.
- A split of the noise data into dev, train and test sets, to prevent
  overfitting to the noise data. Codecs and impulse responses, on the other
  hand, are treated as processes rather than data and, given their small
  number, are shared across the dev, train and test data sets.
To build the degradation dev, train and test sets, we first assign the train
noise files (361), then the test noise files (644), and keep the rest for the
development set (1016). To do so, just run:
./split-dev-train-test.py noise-file-list.txt train.list test.list
This will generate three files:
noise-file-list-trn.txt
noise-file-list-tst.txt
noise-file-list-dev.txt
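As a quick sanity check (optional, and only a sketch), you can verify that the
generated lists have the sizes quoted above:

  # Verify the split sizes: 361 train, 644 test and 1016 dev noise files.
  for name, expected in [("noise-file-list-trn.txt", 361),
                         ("noise-file-list-tst.txt", 644),
                         ("noise-file-list-dev.txt", 1016)]:
      with open(name) as f:
          count = sum(1 for line in f if line.strip())
      print(f"{name}: {count} files (expected {expected})")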
These noise file lists will be used when degrading a list of clean speech
files using the command:

./degrade-audio-list-safe-random.py \
  -N noise-file-list-XXX.txt \
  -D ir-device-file-list.txt \
  -P ir-space-file-list.txt \
  condition \
  file-list.txt \
  output-dir-XXX
XXX: either 'trn', 'tst' or 'dev'
condition: [landline|cellular|satellite|voip|playback|interview].[|noisy08|noisy15]
file-list.txt: list of clean files to degrade (with absolute paths)
output-dir-XXX: output directory for the degraded audio files
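For example, to degrade the prepared train file list under a noisy interview
condition, the call might look as follows ('train-files-absolute.txt' and
'degraded-trn' are placeholder names; substitute your own):

./degrade-audio-list-safe-random.py \
  -N noise-file-list-trn.txt \
  -D ir-device-file-list.txt \
  -P ir-space-file-list.txt \
  interview.noisy08 \
  train-files-absolute.txt \
  degraded-trn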
Note that 'safe-random' in the script name refers to reproducible random number
generation across machines. This is implemented simply as a list of
pregenerated random integers stored in the file 'random'. Please do not change
the file 'random'.
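The following sketch only illustrates the idea (it is not the script's actual
implementation, and it assumes the 'random' file holds whitespace-separated
integers): drawing from a fixed list of pregenerated numbers makes every
selection reproducible on any machine, unlike a locally seeded generator.

  # Every machine reads the same pregenerated integers from 'random', so every
  # machine makes the same "random" choices.
  with open("random") as f:
      pregen = [int(tok) for tok in f.read().split()]

  def pick(items, step):
      # Select an element of 'items' using the step-th pregenerated number.
      return items[pregen[step] % len(items)]

  noise_files = [ln.strip() for ln in open("noise-file-list-trn.txt") if ln.strip()]
  print(pick(noise_files, 5))   # the same file is chosen on every machine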