BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another (Wikipedia). Since there can be multiple good translations of one input, the question remains which one is the best. BLEU provides a metric that scores a translation by whether the generated words appear in the references.
In general, the BLEU metric counts the words generated in the translation and compares them to their occurrences in the references.
- Generated words = machine output
- References = human labels
Precision:
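The formula for this heading seems to be missing from the notes; a standard definition (my reconstruction) counts how many candidate words appear in any reference:

```latex
P = \frac{\text{number of candidate words that also appear in a reference}}{\text{total number of words in the candidate}}
```

This naive precision is easy to game: a candidate that repeats a single matching word gets a perfect score.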
Modified precision:
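No formula survives under this heading either; the standard formulation clips each word's count by its maximum count in any single reference, so repeating a matching word cannot inflate the score:

```latex
P = \frac{\sum_{w \in \text{cand}} \min\bigl(\mathrm{Count}(w),\; \mathrm{Count}_{\max\text{-ref}}(w)\bigr)}{\sum_{w \in \text{cand}} \mathrm{Count}(w)}
```

For example, the candidate "the the the the" against the reference "the cat is on the mat" gets min(4, 2)/4 = 0.5 instead of 1.0.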
N-gram precision:
- if the machine translation exactly equals one reference -> P = 1.0
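The clipping above generalizes from single words to n-grams; writing Count_clip for the clipped count, the standard n-gram precision is:

```latex
p_n = \frac{\sum_{\text{n-gram} \in C} \mathrm{Count}_{\text{clip}}(\text{n-gram})}{\sum_{\text{n-gram} \in C} \mathrm{Count}(\text{n-gram})}
```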
BLEU Score:
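The score formula itself is missing here; the usual form combines the n-gram precisions with a geometric mean (weights w_n, typically uniform 1/N with N = 4):

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)
```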
where
- BP: penalizes score for short sentences
- short sentences tend to have good scores, because of the number of word occurrences
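With c the candidate length and r the effective reference length, the brevity penalty from the bullets above is usually written as:

```latex
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Since the n-gram precisions have the candidate length in the denominator, a very short candidate that only emits safe words can score well; BP counteracts exactly this.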
The BLEU metric is useful for NMT (neural machine translation) and image captioning, but a poor fit for speech recognition, where there is mostly only one ground truth.
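The pieces above can be put together in a short sketch. This is a simplified implementation, not a reference one: it uses uniform weights, returns 0 as soon as any n-gram precision is 0 (instead of smoothing), and takes the shortest reference length as r rather than the closest-length reference used in the original paper.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped
    by the maximum count of that n-gram in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """BLEU with uniform weights and the standard brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) would blow up; unsmoothed BLEU collapses to 0
    c = len(candidate)
    r = min(len(ref) for ref in references)  # simplification: shortest reference
    bp = 1.0 if c > r else exp(1 - r / c)   # brevity penalty
    return bp * exp(sum(log(p) for p in precisions) / max_n)
```

As the bullet above says, a candidate identical to one reference scores 1.0, e.g. `bleu("the cat is on the mat".split(), ["the cat is on the mat".split(), "there is a cat on the mat".split()])`.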