
# machine-translation-nar-en-ru-0002

## Use Case and High-Level Description

This is an English-Russian machine translation model based on a non-autoregressive Transformer topology.

Tokenization is performed with the SentencePieceBPETokenizer (see the demo code for implementation details); the source and target tokenizer files are stored in the `tokenizer_src` and `tokenizer_tgt` folders.

## Specification

| Metric           | Value    |
| ---------------- | -------- |
| GOps             | 23.17    |
| MParams          | 69.29    |
| Source framework | PyTorch* |

## Accuracy

The quality metrics were calculated on the wmt19-ru-en dataset (lowercased "test" split).

| Metric | Value  |
| ------ | ------ |
| BLEU   | 22.7 % |

Use `accuracy_check [...] --model_attributes <path_to_folder_with_downloaded_model>` to specify the path to additional model attributes, where `<path_to_folder_with_downloaded_model>` is the folder into which the model was downloaded by the Model Downloader tool.

## Input

- Name: `tokens`
- Shape: `1, 192`
- Description: sequence of tokens (integer values) representing the tokenized sentence. The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary): `<s>` + tokenized sentence + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192).
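The input layout described above can be sketched in plain Python. The special-token IDs below are hypothetical placeholders for illustration only; the real values must be taken from the model's dictionary:

```python
# Sketch: assemble the 1 x 192 model input from already-tokenized IDs.
# BOS_ID/EOS_ID/PAD_ID are illustrative placeholders, not the model's
# actual dictionary values.
BOS_ID = 0    # hypothetical ID for <s>
EOS_ID = 2    # hypothetical ID for </s>
PAD_ID = 1    # hypothetical ID for <pad>
MAX_LEN = 192  # maximum sequence length expected by the model

def build_input(token_ids):
    """Wrap token IDs with <s>/</s> and pad to MAX_LEN."""
    seq = [BOS_ID] + list(token_ids) + [EOS_ID]
    if len(seq) > MAX_LEN:
        raise ValueError("sentence too long for the 192-token input")
    return seq + [PAD_ID] * (MAX_LEN - len(seq))

tokens = build_input([17, 254, 96])  # toy token IDs for a 3-token sentence
```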

## Output

- Name: `pred`
- Shape: `1, 192`
- Description: sequence of tokens (integer values) representing the tokenized translation. The sequence structure is as follows (`<s>`, `</s>` and `<pad>` should be replaced by the corresponding token IDs as specified by the dictionary): `<s>` + tokenized translation + `</s>` + (`<pad>` tokens to pad to the maximum sequence length of 192).
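Conversely, the translated token IDs can be recovered from the output row by stripping the special tokens. Again, the special-token IDs here are hypothetical placeholders; the real values come from the model's dictionary:

```python
# Sketch: strip <s>, then keep output IDs up to the first </s> or <pad>.
# BOS_ID/EOS_ID/PAD_ID are illustrative placeholders, not the model's
# actual dictionary values.
BOS_ID = 0   # hypothetical ID for <s>
EOS_ID = 2   # hypothetical ID for </s>
PAD_ID = 1   # hypothetical ID for <pad>

def strip_special(pred_ids):
    """Return the translation token IDs without special tokens."""
    out = []
    for tid in pred_ids:
        if tid == BOS_ID:
            continue          # drop the leading <s>
        if tid in (EOS_ID, PAD_ID):
            break             # stop at </s> or padding
        out.append(tid)
    return out

clean = strip_special([0, 41, 7, 380, 2, 1, 1])  # toy output row
```

The cleaned ID list would then be passed to the target tokenizer's decode step to obtain the Russian text.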

## Demo usage

The model can be used in the machine translation demo provided by the Open Model Zoo to show its capabilities.

## Legal Information

[*] Other names and brands may be claimed as the property of others.