Merge pull request #22 from ARM-software/wav2letter_pruned

Added Wav2letter Pruned INT8
ARM-software · May 18, 2021 · ed37a3b
2 parents 35040a9 + 1a92aa0

Showing 8 changed files with 181 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -259,6 +259,15 @@
         <th width="100">Mali GPU</th>
         <th width="100">Ethos U</th>
     </tr>
+    <tr>
+        <td><a href="models/speech_recognition/wav2letter/tflite_pruned_int8">Wav2letter Pruned INT8</a></td>
+        <td align="center">INT8</td>
+        <td align="center">TensorFlow Lite</td>
+        <td align="center">:heavy_check_mark:</td>
+        <td align="center">:heavy_check_mark:</td>
+        <td align="center">:heavy_check_mark:</td>
+        <td align="center">:heavy_check_mark:</td>
+    </tr>
     <tr>
         <td><a href="models/speech_recognition/wav2letter/tflite_int8">Wav2letter INT8</a></td>
         <td align="center">INT8</td>

diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/README.md b/models/speech_recognition/wav2letter/tflite_pruned_int8/README.md
@@ -0,0 +1,73 @@
+# Wav2letter Pruned INT8
+
+## Description
+Wav2letter is a convolutional speech recognition neural network. This implementation was created by Arm, pruned to 50% sparsity, fine-tuned, and quantized using the TensorFlow Model Optimization Toolkit.
+
+## License
+[Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)
+
+## Related Materials
+### Class Labels
+The class labels associated with this model can be downloaded by running the script `get_class_labels.sh`.
+
+## Network Information
+| Network Information |  Value         |
+|---------------------|----------------|
+|  Framework          | TensorFlow Lite |
+|  SHA-1 Hash         | e389797705f5f8a7973c3280954dd5cdf54284a1 |
+|  Size (Bytes)       | 23815520 |
+|  Provenance         | https://github.com/ARM-software/ML-zoo/tree/master/models/speech_recognition/wav2letter/tflite_pruned_int8 |
+|  Paper              | https://arxiv.org/abs/1609.03193 |
+
+## Performance
+| Platform | Optimized |
+|----------|:---------:|
+| Cortex-A |:heavy_check_mark:         |
+| Cortex-M |:heavy_check_mark:         |
+| Mali GPU |:heavy_check_mark:         |
+| Ethos U  |:heavy_check_mark:         |
+
+### Key
+* :heavy_check_mark: - Will run on this platform.
+* :heavy_multiplication_x: - Will not run on this platform.
+
+## Accuracy
+Dataset: LibriSpeech
+
+| Metric | Value |
+|--------|-------|
+| LER | 0.07981431 |
+
+## Optimizations
+| Optimization |  Value  |
+|--------------|---------|
+| Quantization | INT8 |
+| Sparsity | 50% |
+
+## Network Inputs
+<table>
+    <tr>
+        <th width="200">Input Node Name</th>
+        <th width="100">Shape</th>
+        <th width="300">Description</th>
+    </tr>
+    <tr>
+        <td>input_2_int8</td>
+        <td>(1, 296, 39)</td>
+        <td>Speech converted to MFCCs and quantized to INT8</td> 
+    </tr>
+</table>
+
+## Network Outputs
+<table>
+    <tr>
+        <th width="200">Output Node Name</th>
+        <th width="100">Shape</th>
+        <th width="300">Description</th>
+    </tr>
+    <tr>
+        <td>Identity_int8</td>
+        <td>(1, 1, 148, 29)</td>
+        <td>A tensor of (batch, time, class probabilities) that represents the probability of each class at each timestep. Should be passed to a decoder e.g. ctc_beam_search_decoder.</td> 
+    </tr>
+</table>
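The README above describes the network input as MFCCs quantized to INT8. The following is a minimal sketch of that quantization step, assuming the standard TensorFlow Lite affine scheme `q = round(x / scale) + zero_point` clamped to the int8 range. The function name and the `scale` and `zero_point` values in the usage note are hypothetical; the real parameters are stored with the model's input tensor and are not given in this PR.

```python
def quantize_to_int8(values, scale, zero_point):
    """Affine-quantize floats to int8 values.

    Assumption: q = round(x / scale) + zero_point, clamped to [-128, 127].
    scale and zero_point must come from the model's input tensor.
    """
    quantized = []
    for x in values:
        q = int(round(x / scale)) + zero_point
        # Clamp so out-of-range features saturate instead of wrapping.
        quantized.append(max(-128, min(127, q)))
    return quantized
```

For example, with the placeholder parameters `scale=0.25` and `zero_point=3`, `quantize_to_int8([0.0, 1.0, -1.0, 1000.0], 0.25, 3)` saturates the last value at 127. A full input would apply this to every element of the (1, 296, 39) MFCC tensor.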
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/definition.yaml b/models/speech_recognition/wav2letter/tflite_pruned_int8/definition.yaml
@@ -0,0 +1,45 @@
+benchmark:
+  LibriSpeech:
+    LER: 0.07981431
+description: Wav2letter is a convolutional speech recognition neural network. This
+  implementation was created by Arm, pruned to 50% sparsity, fine-tuned, and quantized
+  using the TensorFlow Model Optimization Toolkit.
+license:
+- Apache-2.0
+network:
+  file_size_bytes: 23815520
+  filename: wav2letter_pruned_int8.tflite
+  framework: TensorFlow Lite
+  hash:
+    algorithm: sha1
+    value: e389797705f5f8a7973c3280954dd5cdf54284a1
+  provenance: https://github.com/ARM-software/ML-zoo/tree/master/models/speech_recognition/wav2letter/tflite_pruned_int8
+network_parameters:
+  input_nodes:
+  - description: Speech converted to MFCCs and quantized to INT8
+    example_input:
+      path: models/speech_recognition/wav2letter/tflite_pruned_int8/testing_input/input_2_int8
+    name: input_2_int8
+    shape:
+    - 1
+    - 296
+    - 39
+    type: int8
+  output_nodes:
+  - description: A tensor of (batch, time, class probabilities) that represents the
+      probability of each class at each timestep. Should be passed to a decoder e.g.
+      ctc_beam_search_decoder.
+    name: Identity_int8
+    shape:
+    - 1
+    - 1
+    - 148
+    - 29
+    test_output_path: models/speech_recognition/wav2letter/tflite_pruned_int8/testing_output/Identity_int8
+operators:
+  TensorFlow Lite:
+  - CONV_2D
+  - RESHAPE
+  - LEAKY_RELU
+  - SOFTMAX
+paper: https://arxiv.org/abs/1609.03193
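The `benchmark` entry above reports an LER of 0.07981431 on LibriSpeech, but the PR does not say how it was computed. As a rough sketch, assuming LER denotes letter error rate (total character-level edit distance divided by total reference length), the metric could be calculated as below. The function names are illustrative and are not part of the repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(pairs):
    """LER over (reference, hypothesis) pairs.

    Assumption: errors and reference lengths are summed over the whole set
    before dividing, rather than averaging per-utterance rates.
    """
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_letters = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_letters
```

Under this definition, one wrong letter across ten reference letters gives an LER of 0.1.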
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/get_class_labels.sh b/models/speech_recognition/wav2letter/tflite_pruned_int8/get_class_labels.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# Copyright (C) 2021 Arm Limited or its affiliates. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the License); you may
+# not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an AS IS BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Writes labelmappings.txt to the current directory; run from this model's folder.
+python scripts/create_labels.py
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/scripts/create_labels.py b/models/speech_recognition/wav2letter/tflite_pruned_int8/scripts/create_labels.py
@@ -0,0 +1,26 @@
+# Copyright (C) 2021 Arm Limited or its affiliates. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the License); you may
+# not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an AS IS BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+def get_label_dict():
+    alphabet = "abcdefghijklmnopqrstuvwxyz' @"
+    return [c for c in alphabet]
+
+if __name__ == "__main__":
+    labels = get_label_dict()
+
+    with open("labelmappings.txt", "w") as f:
+        for label in labels:
+            f.write('{}\n'.format(label))
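The README in this PR says the output should be passed to a decoder such as `ctc_beam_search_decoder`. As a simpler illustration of how the 29 labels written by `create_labels.py` map onto the last output dimension, here is a sketch of greedy (best-path) CTC decoding, which is a swapped-in technique and not the beam search the README names. It assumes the final symbol `@` is the CTC blank, which the PR does not state, and that the (1, 1, 148, 29) output has already been reduced to 148 rows of 29 scores.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz' @"  # copied from create_labels.py
BLANK_INDEX = len(ALPHABET) - 1             # assumption: '@' is the CTC blank


def greedy_ctc_decode(timestep_scores):
    """Best-path CTC decoding over rows of per-class scores.

    Takes the arg-max class at each timestep, collapses consecutive
    repeats, then drops blanks. Raw int8 scores can be used directly,
    because dequantization with a positive scale preserves the arg-max.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in timestep_scores]
    chars = []
    previous = None
    for index in best:
        if index != previous and index != BLANK_INDEX:
            chars.append(ALPHABET[index])
        previous = index
    return "".join(chars)
```

A blank between two identical letters keeps both, so the class sequence `a, @, a` decodes to `aa`, while `a, a` decodes to `a`.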
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/testing_input/input_2_int8/0.npy b/models/speech_recognition/wav2letter/tflite_pruned_int8/testing_input/input_2_int8/0.npy
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/testing_output/Identity_int8/0.npy b/models/speech_recognition/wav2letter/tflite_pruned_int8/testing_output/Identity_int8/0.npy
diff --git a/models/speech_recognition/wav2letter/tflite_pruned_int8/wav2letter_pruned_int8.tflite b/models/speech_recognition/wav2letter/tflite_pruned_int8/wav2letter_pruned_int8.tflite
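The binary diffs above carry no visible content, so a downloaded copy of `wav2letter_pruned_int8.tflite` can only be checked against the SHA-1 hash and byte size recorded in `definition.yaml`. A minimal sketch of that check follows; the helper names are illustrative and the expected values are copied from this PR.

```python
import hashlib

EXPECTED_SHA1 = "e389797705f5f8a7973c3280954dd5cdf54284a1"  # from definition.yaml
EXPECTED_SIZE = 23815520                                    # file_size_bytes


def sha1_and_size(chunks):
    """Return (hex SHA-1 digest, total byte count) for an iterable of bytes."""
    digest = hashlib.sha1()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in chunks to avoid loading it all at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def model_matches_definition(path):
    """True if the file at path has the hash and size listed in this PR."""
    return sha1_and_size(read_chunks(path)) == (EXPECTED_SHA1, EXPECTED_SIZE)
```

Calling `model_matches_definition("wav2letter_pruned_int8.tflite")` from the model folder should return `True` for an intact download.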