Add translation example from Chinese, French to English (#681)
This adds:

- a VAD node that detects voice in the audio, to avoid passing too much noise downstream
- Bump distil-whisper to whisper-turbo
- Use Argotranslate and OPUS-MT for translation
- Add example dataflow within the example folder

## Get started

```bash
cd examples/translation
dora up
dora build dataflow_zh_en_terminal.yml
dora start dataflow_zh_en_terminal.yml --detach
python pretty_print.py
```
Showing 30 changed files with 768 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
*.pt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Dora argo example | ||
|
||
Make sure to have dora, pip and cargo installed. | ||
|
||
```bash | ||
dora up | ||
|
||
## For Chinese to English | ||
dora build dataflow_zh_en_terminal.yml | ||
dora start dataflow_zh_en_terminal.yml --detach | ||
|
||
python pretty_print.py | ||
|
||
dora stop | ||
|
||
|
||
## For English to Chinese | ||
dora build dataflow_en_zh_terminal.yml | ||
dora start dataflow_en_zh_terminal.yml --detach | ||
|
||
python pretty_print.py | ||
|
||
dora stop | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: english | ||
      TRANSLATE: false | ||
|
||
  - id: dora-argotranslate | ||
    build: pip install -e ../../node-hub/dora-argotranslate | ||
    path: dora-argotranslate | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: en | ||
      TARGET_LANGUAGE: zh | ||
|
||
  - id: dora-rerun | ||
    build: cargo build -p dora-rerun --release | ||
    path: dora-rerun | ||
    inputs: | ||
      text: dora-argotranslate/text | ||
    env: | ||
      IMAGE_WIDTH: 640 | ||
      IMAGE_HEIGHT: 480 |
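
The dataflow above wires each node's inputs to another node's outputs through `node-id/output` references. A minimal sketch of a consistency check for that wiring follows. It assumes the YAML has already been loaded into a plain Python dict; the `DATAFLOW` literal and the `dangling_inputs` helper are hypothetical illustrations and are not part of dora.

```python
# Hypothetical, hand-written mirror of the dataflow above (build/path/env keys omitted).
DATAFLOW = {
    "nodes": [
        {"id": "dora-microphone", "outputs": ["audio"]},
        {"id": "dora-vad", "inputs": {"audio": "dora-microphone/audio"}, "outputs": ["audio"]},
        {"id": "dora-distil-whisper", "inputs": {"input": "dora-vad/audio"}, "outputs": ["text"]},
        {"id": "dora-argotranslate", "inputs": {"text": "dora-distil-whisper/text"}, "outputs": ["text"]},
        {"id": "dora-rerun", "inputs": {"text": "dora-argotranslate/text"}},
    ]
}


def dangling_inputs(dataflow):
    """Return (node_id, input_name, source) triples whose source output is not declared."""
    # Collect every "node-id/output" pair that some node declares.
    declared = {
        f"{node['id']}/{output}"
        for node in dataflow["nodes"]
        for output in node.get("outputs", [])
    }
    # Keep only the inputs that point at an undeclared pair.
    return [
        (node["id"], name, source)
        for node in dataflow["nodes"]
        for name, source in node.get("inputs", {}).items()
        if source not in declared
    ]


print(dangling_inputs(DATAFLOW))  # prints [] because every input has a matching output
```

This sketch only understands `node-id/output` references as they appear in this file; any other kind of input source would need its own handling.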
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: english | ||
      TRANSLATE: false | ||
|
||
  - id: dora-opus | ||
    build: pip install -e ../../node-hub/dora-opus | ||
    path: dora-opus | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: en | ||
      TARGET_LANGUAGE: zh | ||
|
||
  - id: pretty-print | ||
    path: dynamic | ||
    inputs: | ||
      translated_text: dora-opus/text | ||
      original_text: dora-distil-whisper/text | ||
    env: | ||
      IMAGE_WIDTH: 640 | ||
      IMAGE_HEIGHT: 480 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: english | ||
      TRANSLATE: false | ||
|
||
  - id: dora-argotranslate | ||
    build: pip install -e ../../node-hub/dora-argotranslate | ||
    path: dora-argotranslate | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: en | ||
      TARGET_LANGUAGE: zh | ||
|
||
  - id: pretty-print | ||
    build: cargo build -p dora-rerun --release | ||
    path: dynamic | ||
    inputs: | ||
      translated_text: dora-argotranslate/text | ||
      original_text: dora-distil-whisper/text | ||
    env: | ||
      IMAGE_WIDTH: 640 | ||
      IMAGE_HEIGHT: 480 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: french | ||
      TRANSLATE: false | ||
|
||
  - id: dora-argotranslate | ||
    build: pip install -e ../../node-hub/dora-argotranslate | ||
    path: dora-argotranslate | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: fr | ||
      TARGET_LANGUAGE: en | ||
|
||
  - id: dora-rerun | ||
    build: cargo build -p dora-rerun --release | ||
    path: dora-rerun | ||
    inputs: | ||
      translated_text: dora-argotranslate/text | ||
      original_text: dora-distil-whisper/text | ||
    env: | ||
      IMAGE_WIDTH: 640 | ||
      IMAGE_HEIGHT: 480 |
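
In every dataflow of this commit, the `TARGET_LANGUAGE` of `dora-distil-whisper` is a full language name (`english`, `chinese`, `french`), while the `SOURCE_LANGUAGE` of the translation node is the two-letter code of that same language. The sketch below checks that the two settings agree. It assumes this naming convention holds in general, which the diff does not confirm; `LANGUAGE_CODES` and `languages_agree` are hypothetical names and cover only the three languages used here.

```python
# Assumption: full names on the whisper side, two-letter codes on the translator side,
# as seen in the dataflow files of this commit. Only the languages used here are listed.
LANGUAGE_CODES = {"english": "en", "chinese": "zh", "french": "fr"}


def languages_agree(whisper_env, translator_env):
    """Return True when the transcription language matches the translator's source language."""
    spoken = str(whisper_env["TARGET_LANGUAGE"]).lower()
    # Unknown language names map to None, which never equals a language code.
    return LANGUAGE_CODES.get(spoken) == translator_env["SOURCE_LANGUAGE"]


# Values copied from the French-to-English dataflow above.
whisper_env = {"TARGET_LANGUAGE": "french", "TRANSLATE": False}
translator_env = {"SOURCE_LANGUAGE": "fr", "TARGET_LANGUAGE": "en"}
print(languages_agree(whisper_env, translator_env))  # prints True
```

A check like this could run before `dora start` to catch a file where only one of the two nodes was switched to a new language.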
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: chinese | ||
      TRANSLATE: false | ||
|
||
  - id: dora-opus | ||
    build: pip install -e ../../node-hub/dora-opus | ||
    path: dora-opus | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: zh | ||
      TARGET_LANGUAGE: en | ||
|
||
  - id: plot | ||
    build: cargo build -p dora-rerun --release | ||
    path: dora-rerun | ||
    inputs: | ||
      translated_text: dora-opus/text | ||
      original_text: dora-distil-whisper/text | ||
    env: | ||
      IMAGE_WIDTH: 640 | ||
      IMAGE_HEIGHT: 480 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
nodes: | ||
  - id: dora-microphone | ||
    build: pip install -e ../../node-hub/dora-microphone | ||
    path: dora-microphone | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-vad | ||
    build: pip install -e ../../node-hub/dora-vad | ||
    path: dora-vad | ||
    inputs: | ||
      audio: dora-microphone/audio | ||
    outputs: | ||
      - audio | ||
|
||
  - id: dora-distil-whisper | ||
    build: pip install -e ../../node-hub/dora-distil-whisper | ||
    path: dora-distil-whisper | ||
    inputs: | ||
      input: dora-vad/audio | ||
    outputs: | ||
      - text | ||
    env: | ||
      TARGET_LANGUAGE: chinese | ||
      TRANSLATE: false | ||
|
||
  - id: dora-opus | ||
    build: pip install -e ../../node-hub/dora-opus | ||
    path: dora-opus | ||
    inputs: | ||
      text: dora-distil-whisper/text | ||
    outputs: | ||
      - text | ||
    env: | ||
      SOURCE_LANGUAGE: zh | ||
      TARGET_LANGUAGE: en | ||
|
||
  - id: pretty-print | ||
    path: dynamic | ||
    inputs: | ||
      translated_text: dora-opus/text | ||
      original_text: dora-distil-whisper/text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
import os | ||
import shutil | ||
|
||
|
||
def clear_screen(): | ||
    # Clear the screen based on the operating system | ||
    os.system("cls" if os.name == "nt" else "clear") | ||
|
||
|
||
def print_centered(texts): | ||
    # Get terminal size | ||
    terminal_size = shutil.get_terminal_size() | ||
|
||
    # For each input id, print the id followed by its recent sentences | ||
    for k, v in texts.items(): | ||
        print(k) | ||
        print("\n" * 1) | ||
        # Center each sentence horizontally using the terminal width | ||
        for line in v: | ||
            print(line.center(terminal_size.columns)) | ||
        print("\n" * 1) | ||
|
||
|
||
from dora import Node | ||
|
||
node = Node("pretty-print") | ||
|
||
previous_texts = {} | ||
|
||
clear_screen() | ||
print("Waiting for speech...") | ||
for event in node: | ||
    if event["type"] == "INPUT": | ||
        # The sentence to be printed | ||
        sentence = event["value"][0].as_py() | ||
        if event["id"] not in previous_texts: | ||
|
||
            previous_texts[event["id"]] = ["", "", sentence] | ||
        else: | ||
            previous_texts[event["id"]] += [sentence] | ||
        previous_texts[event["id"]] = previous_texts[event["id"]][-3:] | ||
        # Clear the screen | ||
        clear_screen() | ||
|
||
        # Print the recent sentences centered horizontally in the terminal | ||
        print_centered(previous_texts) |
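
The script above keeps the last three sentences per input id by appending to a list and slicing with `[-3:]`. The same rolling window can be sketched with `collections.deque(maxlen=3)`, which drops the oldest entry automatically. This is an alternative illustration and is not what the commit implements; `HISTORY_LENGTH`, `new_history` and `remember` are hypothetical names, and the input ids are taken from the dataflow files.

```python
from collections import defaultdict, deque

HISTORY_LENGTH = 3  # same window as the `[-3:]` slice in pretty_print.py


def new_history():
    # Pre-fill with empty strings so every input id always shows three lines.
    return deque([""] * HISTORY_LENGTH, maxlen=HISTORY_LENGTH)


# One rolling history per input id, created on first use.
previous_texts = defaultdict(new_history)


def remember(input_id, sentence):
    """Append a sentence to the rolling history of one input id and return it as a list."""
    previous_texts[input_id].append(sentence)
    return list(previous_texts[input_id])


print(remember("original_text", "bonjour"))   # prints ['', '', 'bonjour']
print(remember("translated_text", "hello"))   # prints ['', '', 'hello']
```

With `maxlen` set, the first-seen and already-seen cases collapse into a single `append`, so the `if`/`else` branch in the event loop is no longer needed.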
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Dora text translation Node using Argo translate | ||
|
11 changes: 11 additions & 0 deletions
node-hub/dora-argotranslate/dora_argotranslate/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
import os | ||
|
||
# Define the path to the README file relative to the package directory | ||
readme_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "README.md") | ||
|
||
# Read the content of the README file | ||
try: | ||
with open(readme_path, "r", encoding="utf-8") as f: | ||
__doc__ = f.read() | ||
except FileNotFoundError: | ||
__doc__ = "README file not found." |
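
The module above exposes the package README as its docstring and falls back to a fixed message when the file is missing. A minimal sketch of the same pattern with `pathlib` follows; `load_readme` is a hypothetical helper name, and the directory layout (README one level above the package directory) is taken from the `os.path` calls in the diff.

```python
from pathlib import Path


def load_readme(package_file, fallback="README file not found."):
    """Return the text of the README.md one directory above the package, or a fallback."""
    # package_file is expected to be the package's __init__.py path (i.e. __file__).
    readme_path = Path(package_file).resolve().parent.parent / "README.md"
    try:
        return readme_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return fallback


# Intended use inside a package's __init__.py:
# __doc__ = load_readme(__file__)
```

Wrapping the lookup in a function keeps the fallback string in one place if several node packages share the pattern.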