I tried using the vocab builder on the German Wikipedia, but some words aren't split into the subwords I expected. For example, "eintausendneunhundertneunzig" is treated as a single subword, although I expected "ein", "tausend", "neun", "hundert", "neun", "zig". Are there any tweaks to make the model better suited to German, which relies heavily on compound words?
Thank you
First, check whether each of the subwords, such as 'ein' and 'tausend', is actually in your vocab.
If they are, the word 'eintausendneunhundertneunzig' may have appeared many times, more often than the threshold set by the min_count flag, in which case it would be kept in the vocab as a single token.
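To illustrate the point, here is a minimal sketch of greedy longest-match segmentation over a toy vocab. It is not the vocab builder's actual code; the function name `segment` and the tiny vocab are made up for illustration, and the real builder may segment differently. It only shows why a compound that is itself in the vocab comes out as one piece, while the same compound is split once the whole word is absent.

```python
def segment(word, vocab):
    """Greedy longest-match-first segmentation of `word` using `vocab`.

    Hypothetical helper for illustration only, not the vocab builder's API.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shrink the window.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocab entry matches here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# Toy vocab containing only the parts (assumed, not taken from a real run).
parts = {"ein", "tausend", "neun", "hundert", "zig"}
word = "eintausendneunhundertneunzig"

# Whole word absent from the vocab: it is split into its parts.
without_whole = segment(word, parts)

# Whole word present (e.g. because it was frequent enough): one token.
with_whole = segment(word, parts | {word})

print(without_whole)
print(with_whole)
```

So if the compound is frequent enough to enter the vocab on its own, it will likely stay unsplit; checking the generated vocab file for the whole word and for its parts is a quick way to confirm which case applies.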