User score calculation

Taras Semenenko edited this page May 26, 2016 · 3 revisions

We calculate a user score to estimate the actual amount of work done when translating or reviewing each unit. The calculation is based on the “similarity ratio” concept.

Similarity ratio

The similarity ratio is a real number in the [0..1] range that shows how similar the submitted translation is to the most similar translation already known. 0 means the string is totally different from anything else; 1 means the translation is identical to one we already know about.

S(Strnew, Strold) = (1 - levenshtein_words(Strnew, Strold) / max(length_words(Strnew), length_words(Strold)))

where

levenshtein_words(Strnew, Strold) is the number of word-level edits, calculated using the Levenshtein algorithm, needed to transform Strold into Strnew, and length_words(Str) is the number of words in Str.

In terms of similarity calculation, “words” are just chunks of text split by one or more whitespace symbols.
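The formula above can be sketched in Python as follows. This is a minimal illustration, not the project's actual implementation: the function names mirror the formula, and returning 1.0 for two empty strings is an assumption made here to avoid division by zero.

```python
def levenshtein_words(new_words, old_words):
    """Word-level Levenshtein distance between two lists of words."""
    # previous[j] holds the distance between the words processed so far
    # from new_words and the first j words of old_words.
    previous = list(range(len(old_words) + 1))
    for i, new_word in enumerate(new_words, start=1):
        current = [i]
        for j, old_word in enumerate(old_words, start=1):
            substitution = previous[j - 1] + (new_word != old_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def similarity_ratio(str_new, str_old):
    """S(Strnew, Strold) as defined above; words are whitespace-separated."""
    new_words = str_new.split()
    old_words = str_old.split()
    longest = max(len(new_words), len(old_words))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein_words(new_words, old_words) / longest
```

For example, replacing one word out of four gives one edit over a maximum length of four words, so the ratio is 0.75.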

We will be storing two different similarity value calculations with each submission and suggestion:

  1. similarity: the similarity ratio based on translations gathered from suggestions and from ‘similar translations’ results. It is calculated from all suggestions and similar translations visible to the translator in the editor at the moment they submit a translation.
  2. mt_similarity: the similarity ratio based on comparison with the machine-provided translation (e.g. Google Translate). It is calculated only if the user has used the pre-translate function on a unit prior to submitting it.

When the user translates a new unit, we store both types of similarity, but use S = max(similarity, mt_similarity) for any further calculations.

There are two major translator activities: 1) translation and 2) reviewing. We agreed that raw translation accounts for 5/7 of the price and reviewing for 2/7 (in terms of both labor and money).

Fuzzy (‘Needs Work’) feature changes

We don’t want regular approved translators (the ones that have Submit rights) to submit any fuzzy strings. Only administrators will see the [_] Needs work flag and be able to set it (the flag can also be set by the system itself, e.g. when units are synced from .po). Reasoning: we don’t want translators to submit unfinished translations, since these might go into production. Instead, they should be able to suggest translations (and later accept/reject those or use them as a source for further edits).

Use cases

Common variables / formula parts:

  • NS → number of source words
  • NT → number of words in the new (replacement) translation
  • S → calculated similarity between [previous translations for the current user, previously existing translation, suggestions, similar translations] and the translation being submitted (current text in the editor)
  • editCost → NS * 5/7
  • reviewCost → NS * 2/7
  • analyzeCost → NS * 0.1
  • rawTranslationCost → editCost * (1-S)
  • suggestionAdvanceCoefficient = 0.2
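The shared formula parts above can be collected into one helper. The sketch below assumes NS and S are already known; the function name `cost_terms` and the dictionary layout are hypothetical and only serve the illustration.

```python
EDIT_SHARE = 5 / 7      # raw translation share of the price
REVIEW_SHARE = 2 / 7    # review share of the price
ANALYZE_SHARE = 0.1     # cost of analyzing a suggestion
SUGGESTION_ADVANCE_COEFFICIENT = 0.2


def cost_terms(ns, s):
    """Return the common formula parts for NS source words and similarity S."""
    edit_cost = ns * EDIT_SHARE
    return {
        "editCost": edit_cost,
        "reviewCost": ns * REVIEW_SHARE,
        "analyzeCost": ns * ANALYZE_SHARE,
        # Only the part of the text that differs from known translations
        # counts as raw translation work.
        "rawTranslationCost": edit_cost * (1 - s),
    }
```

With NS = 7 and S = 0.5, editCost is 5, reviewCost is 2, and rawTranslationCost is 2.5 (up to floating-point rounding).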

1. Approved translator submits a new translation

Score += rawTranslationCost + reviewCost

Registered score log events (see below for explanation):

  • TA

2. Approved translator reviews the translation (removes the fuzzy flag) and optionally edits it

Score += rawTranslationCost + reviewCost
Original reviewer’s Score -= reviewCost // penalty

The “Original reviewer” is the person who reviewed the previous translation, which is either the reviewer or the translator (if the reviewer is not defined for the unit).

Note 1: if the translator makes no edits, S will be 1 and rawTranslationCost will be 0, so in this case (when the translator just removes the fuzzy flag) the formula will look like this:

Score += reviewCost

Note 2: if the author edits their own translation (the translation they edited/reviewed previously), then reviewCost will be subtracted from them (as a penalty) and then added again, which essentially means:

Score += rawTranslationCost

Registered score log events (see below for explanation):

  • XR (for original reviewer)
  • TE (if edited after someone else) or TX (if edited after themselves)
  • R (for reviewer)
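The score changes for use case 2, including both notes, can be sketched as a function that returns a delta per user. The function name and the use of plain user names as dictionary keys are assumptions for illustration only.

```python
def review_deltas(ns, s, reviewer, original_reviewer):
    """Score deltas when `reviewer` reviews (and optionally edits) a unit."""
    review_cost = ns * 2 / 7
    raw_translation_cost = ns * 5 / 7 * (1 - s)

    deltas = {reviewer: raw_translation_cost + review_cost}
    # The penalty goes to whoever reviewed the previous translation.
    # If that is the same person, the penalty cancels the review cost (Note 2).
    deltas[original_reviewer] = deltas.get(original_reviewer, 0.0) - review_cost
    return deltas
```

When S is 1 the reviewer gains only reviewCost (Note 1), and when `reviewer` and `original_reviewer` are the same user the net change is rawTranslationCost (Note 2).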

3. Admin removes the translation

Original translator’s Score -= rawTranslationCost // penalty
Original reviewer’s Score -= reviewCost // penalty

Registered score log events (see below for explanation):

  • XT (for original translator)
  • XR (for original reviewer)

4. Admin raises the fuzzy flag

Original reviewer’s Score -= reviewCost // penalty

5. Volunteer translator adds a suggestion

Score += rawTranslationCost * suggestionAdvanceCoefficient

Registered score log events (see below for explanation):

  • S

6. Reviewer accepts suggestion

Reviewer’s Score += reviewCost
Suggester’s Score += rawTranslationCost * (1 - suggestionAdvanceCoefficient)
Original reviewer’s Score -= reviewCost // penalty

Note 1: if the translator accepts their own suggestion, the total (including the advance received when the suggestion was added) is essentially the same score as for submitting a new translation directly:

reviewCost + rawTranslationCost * suggestionAdvanceCoefficient + rawTranslationCost * (1 - suggestionAdvanceCoefficient) = reviewCost + rawTranslationCost

Note 2: the original reviewer gets a penalty only if there was a previous translation on this unit and the unit was not fuzzy.

Registered score log events (see below for explanation):

  • SA (for original suggester)
  • RA (for reviewer)
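A minimal sketch of the acceptance formulas, under the assumption that the caller passes `original_reviewer=None` when the Note 2 condition does not hold (no previous translation, or the unit was fuzzy). All names are hypothetical.

```python
SUGGESTION_ADVANCE_COEFFICIENT = 0.2


def accept_suggestion_deltas(ns, s, reviewer, suggester, original_reviewer=None):
    """Score deltas when `reviewer` accepts a suggestion made by `suggester`."""
    review_cost = ns * 2 / 7
    raw_translation_cost = ns * 5 / 7 * (1 - s)
    deltas = {}

    def add(user, amount):
        deltas[user] = deltas.get(user, 0.0) + amount

    add(reviewer, review_cost)
    # The suggester already received the advance when adding the suggestion,
    # so only the remaining share is credited here.
    add(suggester, raw_translation_cost * (1 - SUGGESTION_ADVANCE_COEFFICIENT))
    if original_reviewer is not None:
        add(original_reviewer, -review_cost)  # penalty
    return deltas
```

If the same user is both reviewer and suggester, adding the earlier advance to this delta gives reviewCost + rawTranslationCost, matching Note 1.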

7. Reviewer rejects suggestion

Reviewer’s Score += analyzeCost
Suggester’s Score -= (rawTranslationCost * suggestionAdvanceCoefficient + analyzeCost) // penalty

Note: if the translator rejects their own suggestion, their net score change is zero, counting the advance received when the suggestion was added:

rawTranslationCost * suggestionAdvanceCoefficient + analyzeCost - (rawTranslationCost * suggestionAdvanceCoefficient + analyzeCost) = 0

Registered score log events (see below for explanation):

  • SR (for original suggester)
  • RR (for reviewer)
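The rejection formulas can be sketched the same way. The function name is hypothetical, and the example only mirrors the formulas for use case 7; it does not model how the events are stored.

```python
SUGGESTION_ADVANCE_COEFFICIENT = 0.2


def reject_suggestion_deltas(ns, s, reviewer, suggester):
    """Score deltas when `reviewer` rejects a suggestion made by `suggester`."""
    analyze_cost = ns * 0.1
    advance = ns * 5 / 7 * (1 - s) * SUGGESTION_ADVANCE_COEFFICIENT
    deltas = {}

    def add(user, amount):
        deltas[user] = deltas.get(user, 0.0) + amount

    add(reviewer, analyze_cost)
    # The suggester loses the advance paid at suggestion time
    # plus the cost of the reviewer's analysis.
    add(suggester, -(advance + analyze_cost))
    return deltas
```

For a user who rejects their own suggestion, this delta equals minus the advance, so the advance earned earlier and the rejection sum to zero.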

Score

User’s score is a real number. When displaying a meter, which is an integer, we will multiply the score by 1000 and round it:

publicScore = round(Score * 1000)

Logging

We must store the log of any changes done to a score:

table ‘pootle_score_log’: (

  1. id → primary key, autoincrement
  2. datetime → date/time of the event (required, not null)
  3. user_id → ID of the user whose score is affected (required, not null)
  4. rate → current user’s rate (not null, defaults to 0) [copied from PootleProfile at the action moment]
  5. words → number of words in the original source string (required, not null)
  6. similarity → the reported similarity ratio (required, real number)
  7. score_delta → the final calculated score delta for the action (required, real number)
  8. action_code → code of the action, see below (required, char[2])
  9. object_id → submission or suggestion id, depending on a code (required, integer)

)

Action codes:

  1. TA → unit translated+reviewed (initial translation)
  2. TE → unit edited after someone else
  3. TX → unit edited after themselves
  4. TD → translation deleted by admin
  5. R → translation reviewed
  6. XT → translation edit penalty
  7. XR → translation review penalty
  8. TF → translation’s fuzzy flag is set by admin
  9. S → suggestion added
  10. SA → suggestion accepted (counted towards the suggestion author)
  11. SR → suggestion rejected (counted towards the suggestion author)
  12. RA → suggestion accepted (counted towards the reviewer)
  13. RR → suggestion rejected (counted towards the reviewer)

This table will be used to calculate score changes between dates and to provide invoice stats or details.
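As a rough illustration of the table and the date-range calculation, the sketch below uses an in-memory SQLite database. The choice of SQLite, the ISO-formatted text timestamps, the `score_change` helper, and the sample rows are all assumptions made for this example; they are not taken from the actual Pootle schema or data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pootle_score_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    datetime TEXT NOT NULL,
    user_id INTEGER NOT NULL,
    rate REAL NOT NULL DEFAULT 0,
    words INTEGER NOT NULL,
    similarity REAL NOT NULL,
    score_delta REAL NOT NULL,
    action_code CHAR(2) NOT NULL,
    object_id INTEGER NOT NULL
)
"""


def score_change(conn, user_id, start, end):
    """Sum of score deltas for a user with start <= datetime < end."""
    row = conn.execute(
        "SELECT COALESCE(SUM(score_delta), 0) FROM pootle_score_log "
        "WHERE user_id = ? AND datetime >= ? AND datetime < ?",
        (user_id, start, end),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO pootle_score_log "
    "(datetime, user_id, words, similarity, score_delta, action_code, object_id) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2014-05-03 03:56:42", 1, 21, 0.23, 12.5, "TA", 101),
        ("2014-05-03 03:57:12", 1, 21, 0.23, -4.0, "XR", 101),
        ("2014-06-01 10:00:00", 1, 10, 0.0, 10.0, "TA", 102),
    ],
)
```

Calling `score_change(conn, 1, "2014-05-01", "2014-06-01")` then sums only the two May rows.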

Also, we must log all score changes in pootle-activity.log, like this:

[05/03/2014 03:56:42]  someuser  SC  +12.5273 T #1899825, NS=21, S=0.23 (total: 8962.9297)
[05/03/2014 03:57:12]  admin  SC  0 TF #1899825, NS=21, S=0.23 (total: 2878639.0107)
[05/03/2014 03:57:12]  someuser  SC  -4.124 XR #1899825, NS=21, S=0.23 (total: 8958.8057)
[05/03/2014 04:22:39]  someuser  SC  +4.124 R #1899825, NS=21, S=0.23 (total: 8962.9297)

where SC = Score Change

For paid translators, we will also store separate logs (pootle-activity-someuser.log) with items related to this particular user only (including log entries from admins and other reviewers that affected the user’s score).

Payments

Paid contractors will have a certain full translation rate + review rate (and a currency) associated with their account. When making payments for any given period, we will calculate the due amount by applying a payment formula to each score log event (see below).

Action codes + Payment formulas:

rate = translation rate (raw translation + review). E.g. 7 cents per word
review_rate = review rate. E.g. 2 cents per word 
raw_rate = rate - review_rate

If similarity S < 50%, we treat it as 0 (and the contractor gets paid for 100% of the words).

Common variables / formula parts:

  • NS → number of source words
  • S → calculated similarity between [previous translations for the current user, previously existing translation, suggestions, similar translations] and the translation being submitted (current text in the editor)
  • TA → unit translated+reviewed (initial translation) -- NS * (1 - S) * raw_rate + NS * review_rate
  • TE → unit edited after someone else -- NS * review_rate
  • TX → unit edited after themselves -- skip
  • TD → translation deleted by admin -- skip
  • R → translation reviewed -- NS * review_rate
  • XT → translation edit penalty -- skip
  • XR → translation review penalty -- skip
  • TF → translation’s fuzzy flag is set by admin -- skip
  • S → suggestion added -- skip
  • SA → suggestion accepted (counted towards the suggestion author) -- if a new translation was added, see TA
  • SR → suggestion rejected (counted towards the suggestion author) -- skip
  • RA → reviewer accepted suggestion (counted towards the reviewer) -- if a new translation was added and the suggestion was made by the translator himself, see TA; else, if the suggestion was added by someone else, see R; else skip (filter out TX-like events when the suggestion is added on top of a previous translation done by the reviewer)
  • RR → reviewer rejected suggestion (counted towards the reviewer) -- skip
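The unconditional payment rules above can be sketched as a single function. This is an illustration under stated assumptions: the function name and signature are hypothetical, S is passed as a fraction in [0, 1], and SA and RA are deliberately left unimplemented because their rules depend on context (whether a new translation was added and who made the suggestion) that these arguments do not carry.

```python
SKIPPED_CODES = {"TX", "TD", "XT", "XR", "TF", "S", "SR", "RR"}


def payment(action_code, ns, s, rate, review_rate):
    """Amount due for one score log event, in the unit of `rate`."""
    raw_rate = rate - review_rate
    if s < 0.5:
        s = 0.0  # similarity below 50% is treated as 0
    if action_code == "TA":
        return ns * (1 - s) * raw_rate + ns * review_rate
    if action_code in ("TE", "R"):
        return ns * review_rate
    if action_code in SKIPPED_CODES:
        return 0
    if action_code in ("SA", "RA"):
        # These reduce to TA, R, or skip depending on information
        # that is not available in this sketch.
        raise NotImplementedError("SA/RA need extra context")
    raise ValueError("unknown action code: %r" % (action_code,))
```

With the example rates of 7 and 2 cents per word, a 100-word TA event with S = 0.75 pays 100 * 0.25 * 5 + 100 * 2 = 325 cents, while the same event with S = 0.3 pays the full 700 cents.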