Rnnoise-PMZ

ARM-software · Oct 31, 2021 · 22211f6
1 parent 0e0fa4b
commit 22211f6

Showing 23 changed files with 1,946 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -245,6 +245,33 @@
 
 **Dataset**: Google Speech Commands Test Set
 
+## Noise Suppression
+
+<table>
+    <tr>
+        <th width="250">Network</th>
+        <th width="100">Type</th>
+        <th width="160">Framework</th>
+        <th width="120">Cortex-A</th>
+        <th width="120">Cortex-M</th>
+        <th width="120">Mali GPU</th>
+        <th width="120">Ethos U</th>
+        <th width="90">Score (Average PESQ)</th>
+    </tr>
+    <tr>
+        <td><a href="models/noise_suppression/RNNoise/tflite_int8">RNNoise INT8 *</a></td>
+        <td align="center">INT8</td>
+        <td align="center">TensorFlow Lite</td>
+        <td align="center">:heavy_check_mark: </td>
+        <td align="center">:heavy_check_mark: </td>
+        <td align="center">:heavy_check_mark: </td>
+        <td align="center">:heavy_check_mark: </td>
+        <td align="center">2.945</td>
+    </tr>
+</table>
+
+**Dataset**: Noisy Speech Database For Training Speech Enhancement Algorithms And TTS Models
+
 ## Object Detection
 
 <table>

diff --git a/models/noise_suppression/RNNoise/tflite_int8/README.md b/models/noise_suppression/RNNoise/tflite_int8/README.md
@@ -0,0 +1,108 @@
+# RNNoise INT8
+
+## Description
+RNNoise is a noise reduction network that removes noise from audio signals while preserving speech. This TFLite quantized version takes traditional signal processing features as input and outputs gain values that can be used to remove noise from the audio. It also detects whether voice activity is present.
+This is a single-step model, trained on the Noisy speech database for training speech enhancement algorithms and TTS models. Because it processes one time step per inference, the GRU hidden states must be fed back in at each time step.
+Dataset license link: https://datashare.ed.ac.uk/handle/10283/2791
+This model is converted from FP32 to INT8 using post-training quantization.
+
+
+## License
+[Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)
+
+## Network Information
+| Network Information |  Value         |
+|---------------------|----------------|
+|  Framework          | TensorFlow Lite |
+|  SHA-1 Hash         | 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5 |
+|  Size (Bytes)       | 113472 |
+|  Provenance         | https://arxiv.org/pdf/1709.08243.pdf |
+|  Paper              | https://arxiv.org/pdf/1709.08243.pdf |
+
+## Performance
+
+| Platform | Optimized |
+|----------|:---------:|
+| Cortex-A |:heavy_check_mark:          |
+| Cortex-M |:heavy_check_mark:          |
+| Mali GPU |:heavy_check_mark:          |
+| Ethos U  |:heavy_check_mark:          |
+
+### Key
+* :heavy_check_mark: - Will run on this platform.
+* :heavy_multiplication_x: - Will not run on this platform.
+
+## Accuracy
+Dataset: Noisy Speech Database For Training Speech Enhancement Algorithms And TTS Models
+
+| Metric | Value |
+|--------|-------|
+| Average PESQ | 2.945 |
+
+## Optimizations
+| Optimization |  Value  |
+|--------------|---------|
+| Quantization | INT8 |
+
+## Network Inputs
+<table>
+    <tr>
+        <th width="200">Input Node Name</th>
+        <th width="100">Shape</th>
+        <th width="300">Description</th>
+    </tr>
+    <tr>
+        <td>main_input_int8</td>
+        <td>(1, 1, 42)</td>
+        <td>Pre-processed signal features extracted from 480 samples of a 48 kHz wav file</td>
+    </tr>
+    <tr>
+        <td>vad_gru_prev_state_int8</td>
+        <td>(1, 24)</td>
+        <td>Previous GRU state for the voice activity detection GRU</td> 
+    </tr>
+    <tr>
+        <td>noise_gru_prev_state_int8</td>
+        <td>(1, 48)</td>
+        <td>Previous GRU state for the noise GRU</td> 
+    </tr>
+    <tr>
+        <td>denoise_gru_prev_state_int8</td>
+        <td>(1, 96)</td>
+        <td>Previous GRU state for the denoise GRU</td> 
+    </tr>
+</table>
+
+## Network Outputs
+<table>
+    <tr>
+        <th width="200">Output Node Name</th>
+        <th width="100">Shape</th>
+        <th width="300">Description</th>
+    </tr>
+    <tr>
+        <td>Identity_int8</td>
+        <td>(1, 1, 96)</td>
+        <td>Next GRU state for the denoise GRU</td> 
+    </tr>
+    <tr>
+        <td>Identity_1_int8</td>
+        <td>(1, 1, 22)</td>
+        <td>Gain values that can be used to remove noise from this audio sample</td> 
+    </tr>
+    <tr>
+        <td>Identity_2_int8</td>
+        <td>(1, 1, 48)</td>
+        <td>Next GRU state for the noise GRU</td> 
+    </tr>
+    <tr>
+        <td>Identity_3_int8</td>
+        <td>(1, 1, 24)</td>
+        <td>Next GRU state for the voice activity detection GRU</td> 
+    </tr>
+    <tr>
+        <td>Identity_4_int8</td>
+        <td>(1, 1, 1)</td>
+        <td>Probability that this audio sample contains voice activity</td> 
+    </tr>
+</table>
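
Note on the inputs and outputs above: each "previous state" input has shape (1, N), while the matching "next state" output has shape (1, 1, N), so a caller has to drop the middle axis before feeding a state back in. The following is a minimal sketch of that feedback loop, not code from this commit. `stub_model`, `run_frames`, the dictionary keys and the zero initial states are all hypothetical names and assumptions; a real caller would replace `stub_model` with an invocation of the TFLite model and would also handle INT8 quantization of inputs and outputs, which is omitted here.

```python
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE_HZ = 48_000
FRAME_SIZE = 480  # samples per model step, i.e. 480 / 48000 s = 10 ms of audio
NUM_FEATURES = 42
NUM_GAINS = 22
STATE_SIZES = {"vad": 24, "noise": 48, "denoise": 96}

Tensor2D = List[List[float]]        # shape (1, N)
Tensor3D = List[List[List[float]]]  # shape (1, 1, N)


def zero_states() -> Dict[str, Tensor2D]:
    """Initial GRU states of shape (1, N). Starting from zeros is an assumption."""
    return {name: [[0.0] * size] for name, size in STATE_SIZES.items()}


def squeeze_step_axis(state: Tensor3D) -> Tensor2D:
    """Convert an output state of shape (1, 1, N) to the input shape (1, N)."""
    return [state[0][0]]


def stub_model(features, states: Dict[str, Tensor2D]) -> Dict[str, Tensor3D]:
    """Hypothetical stand-in for one inference; only the shapes are meaningful."""
    out = {name: [[list(states[name][0])]] for name in STATE_SIZES}
    out["gains"] = [[[1.0] * NUM_GAINS]]  # shape (1, 1, 22)
    out["vad_prob"] = [[[0.0]]]           # shape (1, 1, 1)
    return out


def run_frames(frame_features, model: Callable = stub_model) -> List[Tuple[List[float], float]]:
    """Run the model once per frame, carrying the three GRU states forward."""
    states = zero_states()
    results = []
    for features in frame_features:
        out = model(features, states)
        # The next-state outputs become the previous-state inputs of the next step.
        states = {name: squeeze_step_axis(out[name]) for name in STATE_SIZES}
        results.append((out["gains"][0][0], out["vad_prob"][0][0][0]))
    return results
```

Each element of the returned list holds the 22 gain values and the voice activity value for one 10 ms frame.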
diff --git a/models/noise_suppression/RNNoise/tflite_int8/definition.yaml b/models/noise_suppression/RNNoise/tflite_int8/definition.yaml
@@ -0,0 +1,108 @@
+benchmark:
+  Noisy speech database for training speech enhancement algorithms and TTS models:
+    Average pesq: '2.945'
+description: "RNNoise is a noise reduction network, that helps to remove noise from\
+  \ audio signals while maintaining any speech. This is a TFLite quantized version\
+  \ that takes traditional signal processing features and outputs gain values that\
+  \ can be used to remove noise from audio. It also detects if voice activity is present.\r\
+  \nThis is a 1 step model trained on Noisy speech database for training speech enhancement\
+  \ algorithms and TTS models that requires hidden states to be fed in at each time\
+  \ step.\r\nDataset license link:https://datashare.ed.ac.uk/handle/10283/2791\r\n\
+  This model is converted from FP32 to INT8 using post-training quantization.
+license:
+- Apache-2.0
+network:
+  file_size_bytes: 113472
+  filename: rnnoise_INT8.tflite
+  framework: TensorFlow Lite
+  hash:
+    algorithm: sha1
+    value: 2d973fe7116e0bc3674f0f3f0f7185ffe105bba5
+  provenance: https://arxiv.org/pdf/1709.08243.pdf
+  quality_level: null
+network_parameters:
+  input_nodes:
+  - description: Pre-processed signal features extracted from 480 values of a 48KHz
+      wav file
+    example_input:
+      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/main_input_int8
+    name: main_input_int8
+    shape:
+    - 1
+    - 1
+    - 42
+  - description: Previous GRU state for the voice activity detection GRU
+    example_input:
+      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/vad_gru_prev_state_int8
+    name: vad_gru_prev_state_int8
+    shape:
+    - 1
+    - 24
+  - description: Previous GRU state for the noise GRU
+    example_input:
+      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/noise_gru_prev_state_int8
+    name: noise_gru_prev_state_int8
+    shape:
+    - 1
+    - 48
+  - description: Previous GRU state for the denoise GRU
+    example_input:
+      path: models/noise_suppression/RNNoise/tflite_int8/testing_input/denoise_gru_prev_state_int8
+    name: denoise_gru_prev_state_int8
+    shape:
+    - 1
+    - 96
+  output_nodes:
+  - description: Next GRU state for the denoise GRU
+    name: Identity_int8
+    shape:
+    - 1
+    - 1
+    - 96
+    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_int8
+  - description: Gain values that can be used to remove noise from this audio sample
+    name: Identity_1_int8
+    shape:
+    - 1
+    - 1
+    - 22
+    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_1_int8
+  - description: Next GRU state for the noise GRU
+    name: Identity_2_int8
+    shape:
+    - 1
+    - 1
+    - 48
+    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_2_int8
+  - description: Next GRU state for the voice activity detection GRU
+    name: Identity_3_int8
+    shape:
+    - 1
+    - 1
+    - 24
+    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_3_int8
+  - description: Probability that this audio sample contains voice activity
+    name: Identity_4_int8
+    shape:
+    - 1
+    - 1
+    - 1
+    test_output_path: models/noise_suppression/RNNoise/tflite_int8/testing_output/Identity_4_int8
+operators:
+  TensorFlow Lite:
+  - ADD
+  - CONCATENATION
+  - DEQUANTIZE
+  - FULLY_CONNECTED
+  - LOGISTIC
+  - MUL
+  - PACK
+  - QUANTIZE
+  - RELU
+  - RESHAPE
+  - SPLIT
+  - SPLIT_V
+  - SUB
+  - TANH
+  - UNPACK
+paper: https://arxiv.org/pdf/1709.08243.pdf
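
Note on the `network` block above: the recorded `file_size_bytes` and SHA-1 `hash` can be used to check that a downloaded copy of `rnnoise_INT8.tflite` matches this definition. A minimal sketch using only the Python standard library follows; the function names `sha1_of_file` and `matches_definition` are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path

# Values copied from definition.yaml in this commit.
EXPECTED_SHA1 = "2d973fe7116e0bc3674f0f3f0f7185ffe105bba5"
EXPECTED_SIZE_BYTES = 113472


def sha1_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_definition(path,
                       expected_sha1: str = EXPECTED_SHA1,
                       expected_size: int = EXPECTED_SIZE_BYTES) -> bool:
    """True if the file has both the expected size and the expected SHA-1."""
    file_path = Path(path)
    if file_path.stat().st_size != expected_size:
        return False
    return sha1_of_file(file_path) == expected_sha1
```

For example, `matches_definition("rnnoise_INT8.tflite")` would return `True` only for a file that is byte-identical to the one described here.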
diff --git a/models/noise_suppression/RNNoise/tflite_int8/recreate_model/README.md b/models/noise_suppression/RNNoise/tflite_int8/recreate_model/README.md
@@ -0,0 +1,51 @@
+# Recreate RNNoise model
+
+This folder contains code that will allow you to recreate the model and benchmark the performance.
+
+## How to train
+
+To recreate the RNNoise model exactly, you will need to download the same dataset that we used to train
+the model. You can download and unzip the data by running the following command:
+
+```bash
+./get_data.sh
+```
+
+This will download the [Noisy speech database for training speech enhancement algorithms and TTS models](
+https://datashare.ed.ac.uk/handle/10283/2791) and unzip it into training and testing folders for both clean and
+noisy data.
+
+Next you will need to create test and training .h5 files. These will contain the input features and labels for
+training the model.
+
+There are two ways to do this:
+
+The first, and recommended, way is to follow the first 3 instructions in the original RNNoise repository [here](
+https://github.com/xiph/rnnoise/blob/master/TRAINING-README) to generate the h5 files. You will need to combine all
+the clean audio into one wav file and all the isolated noise audio into another. We provide an example function that
+can do this; see ```create_combined_wavs``` in ```data.py```.
+
+Alternatively, you can use our Python implementation like so:
+```bash
+python data.py --clean_train_wav_folder=./clean_trainset_56spk_wav --noisy_train_wav_folder=./noisy_trainset_56spk_wav \
+  --clean_test_wav_folder=./clean_testset_wav --noisy_test_wav_folder=./noisy_testset_wav
+```
+
+However, this is much slower than the original implementation.
+
+Once the train and test h5 files have been created, run the following shell script to train the model and generate
+the final TFLite files.
+
+```bash
+./train_and_quantise_model.sh
+```
+
+This shell script expects your training h5 file to be called ```train.h5``` and your testing h5 file to be
+called ```test.h5```.
+
+Finally, to evaluate the performance of the models, run the following Python script:
+```bash
+python test.py --clean_wav_folder=./clean_testset_wav --noisy_wav_folder=./noisy_testset_wav --tflite_path=<path_to_tflite>
+```
+
+This evaluation may take some time to complete.
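
Note on the training step above: since `train_and_quantise_model.sh` expects files named `train.h5` and `test.h5`, a quick check before launching a long training run can save time. This is a minimal sketch, not part of the commit. `missing_h5_files` is a hypothetical helper, and it assumes the script looks for both files in the folder it is run from.

```python
from pathlib import Path

# File names the README says the training script expects.
REQUIRED_FILES = ("train.h5", "test.h5")


def missing_h5_files(folder=".") -> list:
    """Return the names of the expected h5 files that are absent from folder."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means both files are present. Otherwise the returned names show which feature files still need to be generated.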