Rnnoise-PMZ
idan-arm committed Oct 31, 2021
1 parent 0e0fa4b commit 22211f6
Showing 23 changed files with 1,946 additions and 0 deletions.
27 changes: 27 additions & 0 deletions README.md
@@ -245,6 +245,33 @@

**Dataset**: Google Speech Commands Test Set

## Noise Suppression

<table>
<tr>
<th width="250">Network</th>
<th width="100">Type</th>
<th width="160">Framework</th>
<th width="120">Cortex-A</th>
<th width="120">Cortex-M</th>
<th width="120">Mali GPU</th>
<th width="120">Ethos U</th>
<th width="90">Score (Average PESQ)</th>
</tr>
<tr>
<td><a href="models/noise_suppression/RNNoise/tflite_int8">RNNoise INT8 *</a></td>
<td align="center">INT8</td>
<td align="center">TensorFlow Lite</td>
<td align="center">:heavy_check_mark: </td>
<td align="center">:heavy_check_mark: </td>
<td align="center">:heavy_check_mark: </td>
<td align="center">:heavy_check_mark: </td>
<td align="center">2.945</td>
</tr>
</table>

**Dataset**: Noisy Speech Database for Training Speech Enhancement Algorithms and TTS Models

## Object Detection

<table>
108 changes: 108 additions & 0 deletions models/noise_suppression/RNNoise/tflite_int8/README.md
@@ -0,0 +1,108 @@
# RNNoise INT8

## Description
RNNoise is a noise reduction network that removes noise from audio signals while preserving speech. This TFLite quantized version takes traditional signal processing features as input and outputs gain values that can be used to remove noise from the audio. It also detects whether voice activity is present.

This is a single-step model: it processes one time step per inference, so the hidden states must be fed back in at each time step. It was trained on the Noisy speech database for training speech enhancement algorithms and TTS models.

Dataset license link: https://datashare.ed.ac.uk/handle/10283/2791

This model is converted from FP32 to INT8 using post-training quantization.
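
As a rough illustration of what INT8 quantization means for the tensors listed below, the following sketch applies the affine mapping used by TensorFlow Lite, `real = scale * (q - zero_point)`. The `SCALE` and `ZERO_POINT` values are hypothetical and chosen only for the example; the actual parameters are stored per tensor in the `.tflite` file and should be read from there.

```python
def quantize(values, scale, zero_point):
    """Map real values to int8 using real = scale * (q - zero_point)."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        out.append(max(-128, min(127, q)))  # clamp to the int8 range
    return out


def dequantize(q_values, scale, zero_point):
    """Map int8 values back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical parameters for illustration only; the real ones are stored
# in the .tflite file alongside each input and output tensor.
SCALE, ZERO_POINT = 1.0 / 256.0, -128

gains_int8 = quantize([0.0, 0.25, 0.5, 1.0], SCALE, ZERO_POINT)
gains_real = dequantize(gains_int8, SCALE, ZERO_POINT)
```

With these example parameters, a real value of 1.0 falls just outside the representable range and is clamped to 127, which is the kind of small rounding error post-training quantization introduces.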


## License
[Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)

## Network Information
| Network Information | Value |
|---------------------|----------------|
| Framework | TensorFlow Lite |
| SHA-1 Hash | 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5 |
| Size (Bytes) | 113472 |
| Provenance | https://arxiv.org/pdf/1709.08243.pdf |
| Paper | https://arxiv.org/pdf/1709.08243.pdf |
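
A downloaded copy of the model can be checked against the size and SHA-1 hash in the table above. This is a minimal sketch using only the Python standard library; the helper names are illustrative and not part of this repository.

```python
import hashlib
import os

# Values copied from the Network Information table.
EXPECTED_SHA1 = "2d973fe7116e0bc3674f0f3f0f7185ffe105bba5"
EXPECTED_SIZE = 113472


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_model(path):
    """True if the file has the published size and SHA-1 hash."""
    return (os.path.getsize(path) == EXPECTED_SIZE
            and sha1_of_file(path) == EXPECTED_SHA1)
```

For example, `matches_published_model("rnnoise_INT8.tflite")` should return `True` for an unmodified copy of the model file.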

## Performance

| Platform | Optimized |
|----------|:---------:|
| Cortex-A |:heavy_check_mark: |
| Cortex-M |:heavy_check_mark: |
| Mali GPU |:heavy_check_mark: |
| Ethos U |:heavy_check_mark: |

### Key
* :heavy_check_mark: - Will run on this platform.
* :heavy_multiplication_x: - Will not run on this platform.

## Accuracy
Dataset: Noisy Speech Database for Training Speech Enhancement Algorithms and TTS Models

| Metric | Value |
|--------|-------|
| Average PESQ | 2.945 |

## Optimizations
| Optimization | Value |
|--------------|---------|
| Quantization | INT8 |

## Network Inputs
<table>
<tr>
<th width="200">Input Node Name</th>
<th width="100">Shape</th>
<th width="300">Description</th>
</tr>
<tr>
<td>main_input_int8</td>
<td>(1, 1, 42)</td>
<td>Pre-processed signal features extracted from 480 samples (10 ms) of a 48 kHz WAV file</td>
</tr>
<tr>
<td>vad_gru_prev_state_int8</td>
<td>(1, 24)</td>
<td>Previous GRU state for the voice activity detection GRU</td>
</tr>
<tr>
<td>noise_gru_prev_state_int8</td>
<td>(1, 48)</td>
<td>Previous GRU state for the noise GRU</td>
</tr>
<tr>
<td>denoise_gru_prev_state_int8</td>
<td>(1, 96)</td>
<td>Previous GRU state for the denoise GRU</td>
</tr>
</table>
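
The main input covers 480 audio values at 48 kHz, which is 10 ms of audio per model step, or 100 steps per second. The sketch below shows only that framing arithmetic; it assumes simple non-overlapping frames and drops a trailing partial frame, both of which are illustrative choices. Extracting the 42 features from each frame is not shown.

```python
SAMPLE_RATE_HZ = 48_000
FRAME_SIZE = 480  # audio values consumed per model step


def frame_duration_ms(frame_size=FRAME_SIZE, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Yield consecutive, non-overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        yield samples[start:start + frame_size]


one_second = [0] * SAMPLE_RATE_HZ          # one second of silence
frames = list(split_into_frames(one_second))
```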

## Network Outputs
<table>
<tr>
<th width="200">Output Node Name</th>
<th width="100">Shape</th>
<th width="300">Description</th>
</tr>
<tr>
<td>Identity_int8</td>
<td>(1, 1, 96)</td>
<td>Next GRU state for the denoise GRU</td>
</tr>
<tr>
<td>Identity_1_int8</td>
<td>(1, 1, 22)</td>
<td>Gain values that can be used to remove noise from this audio frame</td>
</tr>
<tr>
<td>Identity_2_int8</td>
<td>(1, 1, 48)</td>
<td>Next GRU state for the noise GRU</td>
</tr>
<tr>
<td>Identity_3_int8</td>
<td>(1, 1, 24)</td>
<td>Next GRU state for the voice activity detection GRU</td>
</tr>
<tr>
<td>Identity_4_int8</td>
<td>(1, 1, 1)</td>
<td>Probability that this audio frame contains voice activity</td>
</tr>
</table>
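
Because the model runs one step at a time, each "next state" output has to be fed back into the matching "previous state" input on the following step. The sketch below shows that loop using the node names from the tables above. `step_fn` is a hypothetical callable standing in for one inference (for example, a thin wrapper around a TFLite interpreter), and `fake_step` is a stub so the loop can run without the model. The initial state value is also an assumption: for a quantized tensor, the int8 value that represents a real 0.0 is that tensor's zero point.

```python
# Which output carries the next value of each recurrent input.
STATE_FEEDBACK = {
    "vad_gru_prev_state_int8": "Identity_3_int8",
    "noise_gru_prev_state_int8": "Identity_2_int8",
    "denoise_gru_prev_state_int8": "Identity_int8",
}
STATE_SIZES = {
    "vad_gru_prev_state_int8": 24,
    "noise_gru_prev_state_int8": 48,
    "denoise_gru_prev_state_int8": 96,
}


def run_sequence(step_fn, feature_frames, initial_value=0):
    """Run step_fn once per frame, threading the three GRU states through."""
    # State inputs have shape (1, N); the starting value is an assumption.
    states = {name: [[initial_value] * size] for name, size in STATE_SIZES.items()}
    gains, vad = [], []
    for features in feature_frames:  # each entry has shape (1, 1, 42)
        inputs = dict(states, main_input_int8=features)
        outputs = step_fn(inputs)
        gains.append(outputs["Identity_1_int8"][0][0])   # 22 gain values
        vad.append(outputs["Identity_4_int8"][0][0][0])  # single value
        # Outputs are (1, 1, N) but state inputs are (1, N): drop one axis.
        states = {name: outputs[out][0] for name, out in STATE_FEEDBACK.items()}
    return gains, vad


def fake_step(inputs):
    """Stand-in for the real model: adds one to every state value."""
    def bump(name):
        return [[[v + 1 for v in inputs[name][0]]]]
    return {
        "Identity_int8": bump("denoise_gru_prev_state_int8"),
        "Identity_1_int8": [[[0] * 22]],
        "Identity_2_int8": bump("noise_gru_prev_state_int8"),
        "Identity_3_int8": bump("vad_gru_prev_state_int8"),
        "Identity_4_int8": [[[inputs["vad_gru_prev_state_int8"][0][0]]]],
    }


demo_frames = [[[[0] * 42]] for _ in range(3)]
demo_gains, demo_vad = run_sequence(fake_step, demo_frames)
```

The stub reports the first VAD state value it received, so `demo_vad` increasing from step to step confirms that the state is being carried forward rather than reset.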
108 changes: 108 additions & 0 deletions models/noise_suppression/RNNoise/tflite_int8/definition.yaml
@@ -0,0 +1,108 @@
benchmark:
  Noisy speech database for training speech enhancement algorithms and TTS models:
    Average pesq: '2.945'
description: "RNNoise is a noise reduction network, that helps to remove noise from\
  \ audio signals while maintaining any speech. This is a TFLite quantized version\
  \ that takes traditional signal processing features and outputs gain values that\
  \ can be used to remove noise from audio. It also detects if voice activity is present.\r\
  \nThis is a 1 step model trained on Noisy speech database for training speech enhancement\
  \ algorithms and TTS models that requires hidden states to be fed in at each time\
  \ step.\r\nDataset license link:https://datashare.ed.ac.uk/handle/10283/2791\r\n\
  This model is converted from FP32 to INT8 using post-training quantization."
license:
- Apache-2.0
network:
  file_size_bytes: 113472
  filename: rnnoise_INT8.tflite
  framework: TensorFlow Lite
  hash:
    algorithm: sha1
    value: 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5
  provenance: https://arxiv.org/pdf/1709.08243.pdf
  quality_level: null
network_parameters:
  input_nodes:
  - description: Pre-processed signal features extracted from 480 values of a 48KHz
      wav file
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/main_input_int8
    name: main_input_int8
    shape:
    - 1
    - 1
    - 42
  - description: Previous GRU state for the voice activity detection GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/vad_gru_prev_state_int8
    name: vad_gru_prev_state_int8
    shape:
    - 1
    - 24
  - description: Previous GRU state for the noise GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/noise_gru_prev_state_int8
    name: noise_gru_prev_state_int8
    shape:
    - 1
    - 48
  - description: Previous GRU state for the denoise GRU
    example_input:
      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/denoise_gru_prev_state_int8
    name: denoise_gru_prev_state_int8
    shape:
    - 1
    - 96
  output_nodes:
  - description: Next GRU state for the denoise GRU
    name: Identity_int8
    shape:
    - 1
    - 1
    - 96
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_int8
  - description: Gain values that can be used to remove noise from this audio sample
    name: Identity_1_int8
    shape:
    - 1
    - 1
    - 22
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_1_int8
  - description: Next GRU state for the noise GRU
    name: Identity_2_int8
    shape:
    - 1
    - 1
    - 48
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_2_int8
  - description: Next GRU state for the voice activity detection GRU
    name: Identity_3_int8
    shape:
    - 1
    - 1
    - 24
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_3_int8
  - description: Probability that this audio sample contains voice activity
    name: Identity_4_int8
    shape:
    - 1
    - 1
    - 1
    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_4_int8
operators:
  TensorFlow Lite:
  - ADD
  - CONCATENATION
  - DEQUANTIZE
  - FULLY_CONNECTED
  - LOGISTIC
  - MUL
  - PACK
  - QUANTIZE
  - RELU
  - RESHAPE
  - SPLIT
  - SPLIT_V
  - SUB
  - TANH
  - UNPACK
paper: https://arxiv.org/pdf/1709.08243.pdf
@@ -0,0 +1,51 @@
# Recreate RNNoise model

This folder contains code to recreate the model and benchmark its performance.

## How to train

To recreate the RNNoise model exactly, you need to download the same dataset that we used to train the model.
To download and unzip the data, run the following command:

```bash
./get_data.sh
```

This will download the [Noisy speech database for training speech enhancement algorithms and TTS models](
https://datashare.ed.ac.uk/handle/10283/2791) and unzip it into training and testing folders for both clean and
noisy data.

Next, you will need to create the training and test .h5 files. These contain the input features and labels for
training the model.

There are two ways to do this:

The first, and recommended, way is to follow the first three instructions in the original RNNoise repository
[here](https://github.com/xiph/rnnoise/blob/master/TRAINING-README) to generate the h5 files. You will need to combine
all the clean audio into one wav file and all the isolated noise audio into another. We provide an example function
that can do this; see ```create_combined_wavs``` in ```data.py```.
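
To illustrate what combining the audio involves, here is a minimal sketch using Python's standard `wave` module. It is not the `create_combined_wavs` function from `data.py`; the function name and the format check are assumptions made for this example.

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Append the audio of every input WAV file, in order, into one output file."""
    if not input_paths:
        raise ValueError("at least one input file is required")
    params = None
    with wave.open(output_path, "wb") as out:
        for path in input_paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    # The first file decides the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path} has a different audio format: {fmt} != {params}")
                out.writeframes(src.readframes(src.getnframes()))
```

Each input file is read fully into memory before being written, which is fine for short utterances but may need chunked reads for very long recordings.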

Alternatively, you can use our Python implementation like so:
```bash
python data.py --clean_train_wav_folder=./clean_trainset_56spk_wav --noisy_train_wav_folder=./noisy_trainset_56spk_wav \
    --clean_test_wav_folder=./clean_testset_wav --noisy_test_wav_folder=./noisy_testset_wav
```

However, this is much slower than the original implementation.

Once the train and test h5 files have been created, run the following shell script to train the model and generate
the final TFLite files.

```bash
./train_and_quantise_model.sh
```

This shell script expects your training h5 file to be called ```train.h5``` and your testing h5 file to be
called ```test.h5```.
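
Since training can take a long time, it may be worth confirming that both files are in place before starting. The helper below is a hypothetical pre-flight check; it assumes the script looks for the files in the folder you pass in (the current directory by default), which you should confirm against the script itself.

```python
from pathlib import Path

# File names the training script expects.
REQUIRED_H5_FILES = ("train.h5", "test.h5")


def missing_h5_files(folder="."):
    """Return the expected feature files that are absent from folder."""
    return [name for name in REQUIRED_H5_FILES
            if not (Path(folder) / name).is_file()]
```

An empty list from `missing_h5_files()` means both files were found.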

Finally, to evaluate the performance of the models, run the following Python script:
```bash
python test.py --clean_wav_folder=./clean_testset_wav --noisy_wav_folder=./noisy_testset_wav --tflite_path=<path_to_tflite>
```

This evaluation may take some time to complete.
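
Evaluation compares each denoised file against its clean reference, so the two folders need to line up. The sketch below pairs files across the folders and is not taken from `test.py`; it assumes that a clean recording and its noisy counterpart share the same file name, which is worth checking against your copy of the dataset.

```python
from pathlib import Path


def pair_wavs(clean_folder, noisy_folder):
    """Return sorted (clean_path, noisy_path) pairs for names present in both folders."""
    clean = {p.name: p for p in Path(clean_folder).glob("*.wav")}
    noisy = {p.name: p for p in Path(noisy_folder).glob("*.wav")}
    shared = sorted(clean.keys() & noisy.keys())
    return [(clean[name], noisy[name]) for name in shared]
```

Files that appear in only one folder are silently skipped here; comparing `len(pair_wavs(...))` with the number of files in each folder is a quick way to spot a mismatch.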