During onboarding, when the user supplies a translation or back translation, it would be helpful to capture some statistics about the text of the project, such as:
- words and word frequencies
- characters and character frequencies
- distribution plots of this word and character information
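As a rough illustration of the statistics listed above, here is a minimal sketch using only the Python standard library. The function name `text_statistics` and the whitespace-based word splitting are assumptions made for this example, not part of the project; a real implementation would likely need a language-aware tokenizer.

```python
from collections import Counter


def text_statistics(lines):
    """Count words and characters over an iterable of text lines.

    Hypothetical helper: words are split on whitespace, which is an
    assumption and will not suit every language or script.
    """
    word_counts = Counter()
    char_counts = Counter()
    for line in lines:
        # Count whitespace-separated words.
        word_counts.update(line.split())
        # Count every non-whitespace character.
        char_counts.update(ch for ch in line if not ch.isspace())
    return word_counts, char_counts


sample = ["the cat sat", "the dog sat down"]
words, chars = text_statistics(sample)

# The most frequent entries could feed a distribution plot.
print(words.most_common(3))
print(chars.most_common(3))
```

The resulting `Counter` objects can be passed to whatever plotting library the project prefers to produce the distribution plots.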
In addition, after a preprocess/train/test run, it would be helpful to capture token and word statistics indicating which tokens and words appeared in the train, validation, and/or test sets for both the source and target texts. In particular, it would be helpful to flag any inconsistencies, such as tokens or words in the source or target validation or test set that were not part of the training set.
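The inconsistency check described above could be sketched roughly as follows. The function name `unseen_in_training`, the split names, and the dictionary layout are all assumptions for illustration; the actual pipeline may store its splits differently. The same function would be run once for the source side and once for the target side.

```python
from collections import Counter


def unseen_in_training(splits):
    """Report tokens in held-out splits that never occur in training.

    `splits` is a hypothetical dict mapping a split name ("train",
    "validation", "test") to a list of tokens or words for one side
    (source or target) of the corpus.
    """
    train_vocab = set(splits["train"])
    report = {}
    for name in ("validation", "test"):
        counts = Counter(splits.get(name, []))
        # Keep only items absent from the training vocabulary,
        # together with how often they occur in this split.
        report[name] = {
            tok: n for tok, n in counts.items() if tok not in train_vocab
        }
    return report


source_splits = {
    "train": ["the", "cat", "sat"],
    "validation": ["the", "dog", "dog"],
    "test": ["cat", "ran"],
}
report = unseen_in_training(source_splits)
for split_name, unseen in report.items():
    print(split_name, unseen)
```

Reporting the counts alongside the unseen items gives a sense of how much of each held-out set the model could not have learned from the training data.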