Infocom2021

Flowchart of the AI network.

Deployment Guidance (Python 3 preferred)

Google API + WER/MER/WIL Metric

pip install SpeechRecognition google-cloud-speech google-api-python-client oauth2client jiwer

Test: cd utils && python speech2text.py
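As a point of comparison with speech2text.py, here is a minimal, self-contained sketch of the same metric pipeline: transcribe a wav with SpeechRecognition's Google Web Speech backend and score it with jiwer. The file name and reference transcript are placeholders:

    import speech_recognition as sr
    import jiwer

    # Transcribe a local wav file (placeholder path) with the free
    # Google Web Speech API bundled with SpeechRecognition.
    recognizer = sr.Recognizer()
    with sr.AudioFile("sample.wav") as source:
        audio = recognizer.record(source)
    hypothesis = recognizer.recognize_google(audio)

    # Score against a known reference transcript (placeholder text).
    reference = "the quick brown fox jumps over the lazy dog"
    print("WER:", jiwer.wer(reference, hypothesis))
    print("MER:", jiwer.mer(reference, hypothesis))
    print("WIL:", jiwer.wil(reference, hypothesis))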

The SDR/SIR/SAR separation metrics come from mir_eval (pip install mir_eval); note that bss_eval_sources returns four values, the last being the best-match source permutation:

sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(reference_sources, estimated_sources, compute_permutation=True)
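A self-contained example of calling it (random arrays stand in for real reference and separated wavs, shaped (n_sources, n_samples)):

    import numpy as np
    import mir_eval

    # Stand-in data: 2 sources x 1 s of 16 kHz audio. In practice, load
    # the reference and estimated wavs into arrays of this shape.
    rng = np.random.default_rng(0)
    reference_sources = rng.standard_normal((2, 16000))
    estimated_sources = reference_sources + 0.1 * rng.standard_normal((2, 16000))

    # Returns per-source SDR/SIR/SAR plus the best-match permutation.
    sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
        reference_sources, estimated_sources, compute_permutation=True)
    print(sdr, sir, sar, perm)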

Benchmark

focus my voice:

python metric_eva_focus.py -c config/focusTest.yaml -e model/embedder.pt --checkpoint_path ../trained_model/enhance_my_voice/chkpt_201000.pt -o eva-focus -m focus -g 0 -x [noise]-[XdB].xlsx

hide my voice:

python metric_eva_hide.py -c config/hideTest.yaml -e model/embedder.pt --checkpoint_path ../trained_model/hide_my_voice/chkpt_304000.pt -o eva-hide -m hide -g 0 -x [noise]-[XdB].xlsx
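To run the benchmark over several [noise]/[XdB] combinations, one option is a small driver script; the noise names and SNR values below are placeholders rather than values from this repo:

    import subprocess

    # Placeholder noise types and SNRs; substitute the combinations
    # your test set actually uses for [noise] and [XdB].
    for noise in ["babble", "music"]:
        for snr_db in [0, 5, 10]:
            subprocess.run([
                "python", "metric_eva_focus.py",
                "-c", "config/focusTest.yaml",
                "-e", "model/embedder.pt",
                "--checkpoint_path", "../trained_model/enhance_my_voice/chkpt_201000.pt",
                "-o", "eva-focus", "-m", "focus", "-g", "0",
                "-x", f"{noise}-{snr_db}dB.xlsx",
            ], check=True)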

Evaluate

python inference.py -c [config yaml] -e [path of embedder pt file] --checkpoint_path [path of chkpt pt file] -m [path of mixed wav file] -r [path of reference wav file] -g 1 -o [output directory]

Note that the checkpoint loaded via --checkpoint_path determines whether the model hides or focuses the target voice.

Train

python trainer.py -c [config yaml] -e [path of embedder pt file] -g 1 -l power/mse -m [name] -h 1/0

-h selects whether to train a hide-my-voice model or a focus-my-voice model

Version Control

Version  Description
V0       Original version of VoiceFilter
V1.0     After the * mask: purified_mag -> istft -> purified_wav -> + mixed_wav -> denoised_wav -> stft -> denoised_mag -> loss
V2.0     Put istft into the model: model_output -> istft -> audio_mask -> + mixed_wav -> denoised_wav -> stft -> denoised_mag -> loss
V2.1     Change the normalize function in stft
V3.0     Use linearity of the Fourier transform: only change the * operation to - relative to V0 (see the sketch after this table)
V3.1     Apply normalization after mixed_mag - noise_mag
V3.1.1   Add 3 different evaluations for wavs, based on V3.1
V3.1.2   Change the dataloader: new_target_wav = mixed_wav - target_wav
V3.2     Use the + operation instead of -; compare with V3.1
V3.2.1   Add 3 different evaluations for wavs, based on V3.2
V3.2.2   Add generator2 and a new dataloader, based on V3.2.1
V3.2.3   Use + to train hide-my-voice; add a dataloader option for the old and new datasets
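The V3.x rows rely on the linearity of the Fourier transform, which holds exactly for the complex STFT (magnitude-only arithmetic, as in V3.1, is an approximation of it). A quick self-contained check, with random signals standing in for real wavs:

    import numpy as np
    import librosa

    # The complex STFT is linear: STFT(a + b) == STFT(a) + STFT(b).
    rng = np.random.default_rng(0)
    a = rng.standard_normal(16000).astype(np.float32)
    b = rng.standard_normal(16000).astype(np.float32)
    assert np.allclose(librosa.stft(a + b),
                       librosa.stft(a) + librosa.stft(b), atol=1e-4)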

Dataset paths on the server

Dataset                                                    PATH
train-100: audios after normalize.sh                       /srv/node/sdc1/LibriSpeech
train-100: spectrograms after generator.py                 /srv/node/sdd1/small-processed-audio
train-360: audios after normalize.sh                       /srv/node/sdc1/medium-LibriSpeech
train-360: spectrograms and phases after v2 generator.py   /srv/node/sdc1/medium-processed-audio
New dataset: based on train-360                            /srv/node/sdd1/new-processed-audio

Schedule

Period     chenning                                                hanqing
0701-0703  Power loss [x]                                          Reproduction [x]
0704-0704  Code review [x]                                         Dataset production [x]
0705-0708  Paper introduction draft & pipeline optimization [x]    Code v3 [x]
0709-0711  System design                                           Preliminary on public dataset
0713-0718  Finish experimental evaluation                          Finish experimental evaluation
0720-0725  Finish user case 1                                      Finish user case 1
0727-0801  Finish user case 2                                      Finish user case 2
0803-0808  Paper v1                                                Paper v1
0809-0814  Paper submission                                        Paper submission

VoiceFilter

Unofficial PyTorch implementation of Google AI's VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

Dependencies

  1. Python and packages

    pip install -r requirements.txt

Prepare Dataset

  1. Download LibriSpeech dataset

    To replicate the VoiceFilter paper, get the LibriSpeech dataset at http://www.openslr.org/12/. train-clean-100.tar.gz (6.3G) contains speech from 252 speakers, and train-clean-360.tar.gz (23G) contains 922 speakers. You may use either, but the more speakers in the dataset, the better VoiceFilter will perform.

  2. Resample & Normalize wav files

    First, unzip tar.gz file to desired folder:

    tar -xvzf train-clean-360.tar.gz

    Next, copy utils/normalize-resample.sh to the root directory of the unzipped data folder. Then:

    vim normalize-resample.sh # set "N" as your CPU core number.
    chmod a+x normalize-resample.sh
    ./normalize-resample.sh # this may take long
  3. Edit config.yaml

    cd config
    cp default.yaml config.yaml
    vim config.yaml

Tips:

Change train_dir and test_dir. Maintain separate config.yaml files on the desktop and on the server.
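A quick, hypothetical sanity check that the directories in the active config.yaml exist on the current machine; it assumes top-level train_dir/test_dir keys, so adjust the lookups to the file's actual nesting:

    import os
    import yaml

    with open("config/config.yaml") as f:
        cfg = yaml.safe_load(f)
    # Assumed top-level keys; the real config may nest these differently.
    for key in ("train_dir", "test_dir"):
        path = cfg.get(key)
        print(key, path, os.path.isdir(str(path)))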

  4. Preprocess wav files

    To boost training speed, perform the STFT for each file before training (a rough sketch of the idea follows the tips below):

    python generator.py -c [config yaml] -d [data directory] -o [output directory] -p [processes to run]

    This will create 100,000 (train) + 1,000 (test) examples (about 160 GB).

Tips:

  1. Run the v0 generator.py to get mixed_mag, mixed_wav, target_mag, target_wav, and d_vector.txt. Note that this d_vector.txt stores the path of the reference audio.

  2. Running the v1.0 or v2.0 generator.py also produces mixed_phase and target_phase.

  3. On the server side, DO NOT use -p to run multiple processes.
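For intuition only, a minimal sketch of the caching idea behind generator.py (not the repo's actual code); the paths and STFT parameters are placeholders, so take the real values from config.yaml:

    import glob
    import os
    import numpy as np
    import librosa

    wav_dir, out_dir = "data/wavs", "data/specs"  # placeholder paths
    os.makedirs(out_dir, exist_ok=True)
    for path in glob.glob(os.path.join(wav_dir, "*.wav")):
        y, _ = librosa.load(path, sr=16000)
        # Cache the magnitude spectrogram so training can skip the STFT.
        mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
        np.save(os.path.join(out_dir, os.path.basename(path) + ".npy"), mag)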

Train VoiceFilter

  1. Run

    After specifying train_dir and test_dir in config.yaml, run:

    python trainer.py -c [config yaml] -e [path of embedder pt file] -g 1 -l power/mse -m [name]

    This will create chkpt/name and logs/name under the base directory (-b option, . by default).

Tips:

  1. Add -g to choose the CUDA device; the examples above use device 1. This argument is required.

  2. Add -l to select the loss type; the default is power loss. Switch to MSE loss by setting this argument to mse.

  3. View tensorboardX

    tensorboard --logdir ./logs
  4. Resuming from checkpoint

    python trainer.py -c [config yaml] --checkpoint_path [chkpt/name/chkpt_{step}.pt] -e [path of embedder pt file] -g 1 -l power/mse -m name

Evaluate

python inference.py -c [config yaml] -e [path of embedder pt file] --checkpoint_path [path of chkpt pt file] -m [path of mixed wav file] -r [path of reference wav file] -g 1 -o [output directory]

Possible improvements

  • Try power-law compressed reconstruction error as loss function, instead of MSE. (See #14)
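A minimal PyTorch sketch of one way such a loss could look; the exponent value 0.3 and the exact function shape are assumptions drawn from the literature, not taken from this repo or issue #14:

    import torch

    def power_law_loss(est_mag: torch.Tensor, target_mag: torch.Tensor,
                       p: float = 0.3) -> torch.Tensor:
        # Compress spectrogram magnitudes with exponent p before the MSE,
        # which de-emphasizes errors in high-energy time-frequency bins.
        est_c = est_mag.clamp(min=1e-8) ** p
        tgt_c = target_mag.clamp(min=1e-8) ** p
        return torch.mean((est_c - tgt_c) ** 2)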
