idan-arm committed Oct 31, 2021
1 parent 0e0fa4b, commit 22211f6
Showing 23 changed files with 1,946 additions and 0 deletions.
@@ -0,0 +1,108 @@
# RNNoise INT8

## Description
RNNoise is a noise reduction network that removes noise from audio signals while preserving speech. This is a quantized TFLite version that takes traditional signal processing features as input and outputs gain values that can be used to remove noise from the audio. It also detects whether voice activity is present.

This is a single-step model that requires the GRU hidden states to be fed in at each time step. It was trained on the Noisy speech database for training speech enhancement algorithms and TTS models.

Dataset license link: https://datashare.ed.ac.uk/handle/10283/2791

This model was converted from FP32 to INT8 using post-training quantization.

## License
[Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)

## Network Information
| Network Information | Value |
|---------------------|----------------|
| Framework | TensorFlow Lite |
| SHA-1 Hash | 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5 |
| Size (Bytes) | 113472 |
| Provenance | https://arxiv.org/pdf/1709.08243.pdf |
| Paper | https://arxiv.org/pdf/1709.08243.pdf |
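
A downloaded copy of the model can be checked against the size and SHA-1 hash listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha1_of_file` and `verify_model` are illustrative and not part of this repository.

```python
import hashlib
import os

# Values copied from the Network Information table.
EXPECTED_SHA1 = "2d973fe7116e0bc3674f0f3f0f7185ffe105bba5"
EXPECTED_SIZE = 113472  # bytes


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_sha1=EXPECTED_SHA1, expected_size=EXPECTED_SIZE):
    """Return True if the file has the expected size and SHA-1 digest."""
    return (os.path.getsize(path) == expected_size
            and sha1_of_file(path) == expected_sha1)
```

For example, `verify_model("rnnoise_INT8.tflite")` returns `True` only when both values match; the path to the file on your machine is an assumption.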

## Performance

| Platform | Optimized |
|----------|:---------:|
| Cortex-A | :heavy_check_mark: |
| Cortex-M | :heavy_check_mark: |
| Mali GPU | :heavy_check_mark: |
| Ethos U | :heavy_check_mark: |

### Key
* :heavy_check_mark: - Will run on this platform.
* :heavy_multiplication_x: - Will not run on this platform.

## Accuracy
Dataset: Noisy speech database for training speech enhancement algorithms and TTS models

| Metric | Value |
|--------|-------|
| Average PESQ | 2.945 |

## Optimizations
| Optimization | Value |
|--------------|---------|
| Quantization | INT8 |
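
Because the model's inputs and outputs are INT8 tensors, float features usually have to be quantized before inference and the outputs dequantized afterwards. TensorFlow Lite uses the affine mapping `real = scale * (q - zero_point)`. The sketch below shows that mapping in plain Python. The `scale` and `zero_point` values used in the example are illustrative only and are not read from this model; in practice they can be taken from the `quantization` entry of each tensor returned by the TFLite interpreter's `get_input_details()` and `get_output_details()`. Note that Python's `round` resolves exact ties to the nearest even integer, which may differ from the runtime's rounding in tie cases.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to INT8 using q = round(x / scale) + zero_point."""
    quantized = []
    for value in values:
        q = round(value / scale) + zero_point
        # Clamp to the representable INT8 range.
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


def dequantize(q_values, scale, zero_point):
    """Map INT8 values back to floats using x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Illustrative parameters (assumed, not taken from the model file).
example_scale = 1.0 / 256.0
example_zero_point = -128

print(quantize([0.0, 0.5, 1.0], example_scale, example_zero_point))
print(dequantize([-128, 0, 127], example_scale, example_zero_point))
```

With these illustrative parameters, 1.0 falls just outside the representable range and is clamped to 127.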

## Network Inputs
<table>
  <tr>
    <th width="200">Input Node Name</th>
    <th width="100">Shape</th>
    <th width="300">Description</th>
  </tr>
  <tr>
    <td>main_input_int8</td>
    <td>(1, 1, 42)</td>
    <td>Pre-processed signal features extracted from 480 samples (10 ms) of a 48 kHz wav file</td>
  </tr>
  <tr>
    <td>vad_gru_prev_state_int8</td>
    <td>(1, 24)</td>
    <td>Previous GRU state for the voice activity detection GRU</td>
  </tr>
  <tr>
    <td>noise_gru_prev_state_int8</td>
    <td>(1, 48)</td>
    <td>Previous GRU state for the noise GRU</td>
  </tr>
  <tr>
    <td>denoise_gru_prev_state_int8</td>
    <td>(1, 96)</td>
    <td>Previous GRU state for the denoise GRU</td>
  </tr>
</table>

## Network Outputs
<table>
  <tr>
    <th width="200">Output Node Name</th>
    <th width="100">Shape</th>
    <th width="300">Description</th>
  </tr>
  <tr>
    <td>Identity_int8</td>
    <td>(1, 1, 96)</td>
    <td>Next GRU state for the denoise GRU</td>
  </tr>
  <tr>
    <td>Identity_1_int8</td>
    <td>(1, 1, 22)</td>
    <td>Gain values that can be used to remove noise from this audio sample</td>
  </tr>
  <tr>
    <td>Identity_2_int8</td>
    <td>(1, 1, 48)</td>
    <td>Next GRU state for the noise GRU</td>
  </tr>
  <tr>
    <td>Identity_3_int8</td>
    <td>(1, 1, 24)</td>
    <td>Next GRU state for the voice activity detection GRU</td>
  </tr>
  <tr>
    <td>Identity_4_int8</td>
    <td>(1, 1, 1)</td>
    <td>Probability that this audio sample contains voice activity</td>
  </tr>
</table>
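
Since each "next state" output has to be fed back as the matching "previous state" input on the following time step, the wiring between the two tables can be sketched as below. This is plain Python over nested lists, not the repository's inference code: the helper names are hypothetical, the pairing of outputs to inputs is inferred from the descriptions above, and the starting state value of `0` is a placeholder (the quantized value that represents 0.0 is the tensor's zero point, which is not stated here).

```python
# Output node -> input node that receives it on the next time step.
STATE_FEEDBACK = {
    "Identity_3_int8": "vad_gru_prev_state_int8",
    "Identity_2_int8": "noise_gru_prev_state_int8",
    "Identity_int8": "denoise_gru_prev_state_int8",
}

# Length of each state vector, from the Network Inputs table.
STATE_SIZES = {
    "vad_gru_prev_state_int8": 24,
    "noise_gru_prev_state_int8": 48,
    "denoise_gru_prev_state_int8": 96,
}


def initial_states(fill_value=0):
    """Build starting states of shape (1, N). `fill_value` is a placeholder."""
    return {name: [[fill_value] * size] for name, size in STATE_SIZES.items()}


def next_inputs(features, outputs):
    """Assemble the inputs for the next step from this step's outputs.

    features: nested list of shape (1, 1, 42) for the next frame.
    outputs:  dict of output node name -> nested list of shape (1, 1, N).
    """
    inputs = {"main_input_int8": features}
    for out_name, in_name in STATE_FEEDBACK.items():
        # State outputs are (1, 1, N) but state inputs are (1, N),
        # so drop the middle axis.
        inputs[in_name] = [outputs[out_name][0][0]]
    return inputs
```

In a real loop, the gains (`Identity_1_int8`) and the voice activity probability (`Identity_4_int8`) would be consumed on each step, while the three state outputs are only passed forward.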
models/noise_suppression/RNNoise/tflite_int8/definition.yaml
108 changes: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
benchmark:
  Noisy speech database for training speech enhancement algorithms and TTS models:
    Average pesq: '2.945'
description: "RNNoise is a noise reduction network, that helps to remove noise from\
  \ audio signals while maintaining any speech. This is a TFLite quantized version\
  \ that takes traditional signal processing features and outputs gain values that\
  \ can be used to remove noise from audio. It also detects if voice activity is present.\r\
  \nThis is a 1 step model trained on Noisy speech database for training speech enhancement\
  \ algorithms and TTS models that requires hidden states to be fed in at each time\
  \ step.\r\nDataset license link:https://datashare.ed.ac.uk/handle/10283/2791\r\n\
  This model is converted from FP32 to INT8 using post-training quantization."
license:
- Apache-2.0
network:
  file_size_bytes: 113472
  filename: rnnoise_INT8.tflite
  framework: TensorFlow Lite
  hash:
    algorithm: sha1
    value: 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5
  provenance: https://arxiv.org/pdf/1709.08243.pdf
  quality_level: null
network_parameters:
  input_nodes:
  - description: Pre-processed signal features extracted from 480 values of a 48KHz
      wav file
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/main_input_int8
    name: main_input_int8
    shape:
    - 1
    - 1
    - 42
  - description: Previous GRU state for the voice activity detection GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/vad_gru_prev_state_int8
    name: vad_gru_prev_state_int8
    shape:
    - 1
    - 24
  - description: Previous GRU state for the noise GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/noise_gru_prev_state_int8
    name: noise_gru_prev_state_int8
    shape:
    - 1
    - 48
  - description: Previous GRU state for the denoise GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/denoise_gru_prev_state_int8
    name: denoise_gru_prev_state_int8
    shape:
    - 1
    - 96
  output_nodes:
  - description: Next GRU state for the denoise GRU
    name: Identity_int8
    shape:
    - 1
    - 1
    - 96
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_int8
  - description: Gain values that can be used to remove noise from this audio sample
    name: Identity_1_int8
    shape:
    - 1
    - 1
    - 22
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_1_int8
  - description: Next GRU state for the noise GRU
    name: Identity_2_int8
    shape:
    - 1
    - 1
    - 48
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_2_int8
  - description: Next GRU state for the voice activity detection GRU
    name: Identity_3_int8
    shape:
    - 1
    - 1
    - 24
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_3_int8
  - description: Probability that this audio sample contains voice activity
    name: Identity_4_int8
    shape:
    - 1
    - 1
    - 1
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_4_int8
operators:
  TensorFlow Lite:
  - ADD
  - CONCATENATION
  - DEQUANTIZE
  - FULLY_CONNECTED
  - LOGISTIC
  - MUL
  - PACK
  - QUANTIZE
  - RELU
  - RESHAPE
  - SPLIT
  - SPLIT_V
  - SUB
  - TANH
  - UNPACK
paper: https://arxiv.org/pdf/1709.08243.pdf
models/noise_suppression/RNNoise/tflite_int8/recreate_model/README.md
51 changes: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Recreate RNNoise model

This folder contains code that allows you to recreate the model and benchmark its performance.

## How to train

To recreate the RNNoise model exactly, you need to download the same dataset that we used to train it.
You can download and unzip the data by running the following command:

```bash
./get_data.sh
```

This downloads the [Noisy speech database for training speech enhancement algorithms and TTS models](https://datashare.ed.ac.uk/handle/10283/2791)
and unzips it into training and testing folders for both clean and noisy data.

Next, you need to create the training and test .h5 files. These contain the input features and labels for
training the model.

There are two ways to do this:

The first, and recommended, way is to follow the first 3 instructions in the [TRAINING-README](https://github.com/xiph/rnnoise/blob/master/TRAINING-README)
of the original RNNoise repository to generate the h5 files. You will need to combine all
the clean audio into one wav file and all the isolated noise audio into another. We provide an example function that
can do this: see ```create_combined_wavs``` in ```data.py```.
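
For orientation, combining a folder of wav files into one can be sketched with Python's standard `wave` module as follows. This is not the `create_combined_wavs` function from `data.py`, whose implementation is not shown here; the function name `combine_wavs` and its behaviour (sorting by file name, requiring matching channel count, sample width and sample rate) are assumptions made for illustration.

```python
import glob
import os
import wave


def combine_wavs(input_folder, output_path):
    """Concatenate every .wav file in input_folder (sorted by name) into one file.

    Returns the number of files that were combined.
    """
    paths = sorted(glob.glob(os.path.join(input_folder, "*.wav")))
    if not paths:
        raise ValueError(f"no .wav files found in {input_folder}")

    with wave.open(output_path, "wb") as out:
        expected = None
        for path in paths:
            with wave.open(path, "rb") as src:
                # Channels, sample width and sample rate must match across files.
                fmt = src.getparams()[:3]
                if expected is None:
                    expected = fmt
                    out.setparams(src.getparams())
                elif fmt != expected:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
    return len(paths)
```

The output path should be outside `input_folder`, or at least not already exist there, so that the combined file is not picked up as one of its own inputs.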

Alternatively, you can use our Python implementation:
```bash
python data.py --clean_train_wav_folder=./clean_trainset_56spk_wav --noisy_train_wav_folder=./noisy_trainset_56spk_wav \
    --clean_test_wav_folder=./clean_testset_wav --noisy_test_wav_folder=./noisy_testset_wav
```

However, this is much slower than the original implementation.

Once you have created the train and test h5 files, run the following shell script to train the model and generate
the final TFLite files:

```bash
./train_and_quantise_model.sh
```

This shell script expects your training h5 file to be called ```train.h5``` and your testing h5 file to be
called ```test.h5```.

Finally, to evaluate the performance of the models, run the following Python script:
```bash
python test.py --clean_wav_folder=./clean_testset_wav --noisy_wav_folder=./noisy_testset_wav --tflite_path=<path_to_tflite>
```

This evaluation may take some time to complete.