Hi,
The train_lstm step writes an “all chars” text file, which assumes that training encounters every character in the corpus. That is not necessarily true: training uses limited data, so it may miss rare characters that show up in the correction step.
Is this OK, or is it something that needs to be addressed?
Thanks!
Omri
Actually, the chars are extracted from all text (train set, test set, and val set).
Whether this is correct (fair) is open for discussion. It is probably more correct to use only the characters in the train set (and maybe the validation set) and add an 'unknown' character. Presumably the 'unknown' character should appear only in the input text and not in the output text; otherwise incorrect text will be produced.
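As a rough illustration of the 'unknown' character idea, here is a minimal sketch. It is not the project's actual code: `UNKNOWN`, `build_char_vocab`, and `encode` are hypothetical names, and the choice of placeholder symbol is an assumption. The vocabulary is built from the training texts only, and any character outside it is mapped to the placeholder's index when encoding input.

```python
# Hypothetical placeholder symbol; assumed not to occur in real text.
UNKNOWN = "\u25a1"


def build_char_vocab(train_texts, val_texts=()):
    """Map each character seen in the train (and optional val) texts to an index.

    Index 0 is reserved for the 'unknown' placeholder.
    """
    chars = set()
    for text in list(train_texts) + list(val_texts):
        chars.update(text)
    chars.discard(UNKNOWN)  # keep the placeholder out of the regular characters
    vocab = [UNKNOWN] + sorted(chars)
    return {c: i for i, c in enumerate(vocab)}


def encode(text, char_to_index):
    """Encode input text, mapping unseen characters to the 'unknown' index."""
    unk = char_to_index[UNKNOWN]
    return [char_to_index.get(c, unk) for c in text]


train = ["hello world", "good morning"]
char_to_index = build_char_vocab(train)

# 'é' never occurs in the training texts, so it falls back to index 0.
encoded = encode("héllo", char_to_index)
```

With this setup, rare characters seen only at correction time no longer break the encoding step. The open question from the comment above remains: the output side should presumably never emit the placeholder.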