This question does not have a single clear-cut answer.
Since Hebrew is not covered by the (Latin-based) vocab our models were trained on, you cannot simply fine-tune our already trained models and will have to train from scratch.
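Whether a pretrained recognition model can be reused depends on whether its output vocab contains the characters of the target script. A minimal sketch of such a coverage check follows. `LATIN_VOCAB` is only an approximate stand-in for a French vocab (it is not doctr's actual vocab definition), and `missing_chars` is a hypothetical helper.

```python
import string

# Hebrew letters U+05D0 (alef) through U+05EA (tav), including the five final forms.
HEBREW_LETTERS = "".join(chr(cp) for cp in range(0x05D0, 0x05EB))

# Assumption: rough stand-in for a French/Latin recognition vocab, NOT doctr's exact vocab.
LATIN_VOCAB = (
    string.digits
    + string.ascii_letters
    + string.punctuation
    + "àâçéèêëîïôûùüÿæœÀÂÇÉÈÊËÎÏÔÛÙÜŸÆŒ"
)


def missing_chars(text: str, vocab: str) -> set:
    """Return the non-whitespace characters of `text` that the vocab cannot output."""
    vocab_set = set(vocab)
    return {ch for ch in text if not ch.isspace() and ch not in vocab_set}


if __name__ == "__main__":
    missing = missing_chars(HEBREW_LETTERS, LATIN_VOCAB)
    print(f"{len(missing)} of {len(HEBREW_LETTERS)} Hebrew letters are missing from the vocab")
```

If every character of the new script is missing, the model's output layer has no classes for it, which is why reusing the existing weights as-is does not work.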
As a rule of thumb, it is highly recommended to validate on real data, even if you only have a small amount of it.
For training, you can also try to generate synthetic data, for example with: https://github.com/clovaai/synthtiger
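Synthetic text renderers generally need a source of text to draw on the images. A minimal sketch that samples random Hebrew pseudo-words into a UTF-8 corpus file is shown below. The function names, word-length range, and one-word-per-line format are assumptions; the exact corpus format SynthTIGER expects should be checked in its own configuration, and random letter strings are only a crude substitute for a real Hebrew word list.

```python
import random

# Hebrew final (word-end) letter forms and their regular counterparts.
FINAL_TO_REGULAR = {"ך": "כ", "ם": "מ", "ן": "נ", "ף": "פ", "ץ": "צ"}
REGULAR_TO_FINAL = {v: k for k, v in FINAL_TO_REGULAR.items()}
ALL_LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EB)]  # U+05D0..U+05EA
NON_FINAL_LETTERS = [ch for ch in ALL_LETTERS if ch not in FINAL_TO_REGULAR]


def random_hebrew_word(rng: random.Random, min_len: int = 2, max_len: int = 8) -> str:
    """Sample a pseudo-word; the last letter is switched to its final form when one exists."""
    letters = [rng.choice(NON_FINAL_LETTERS) for _ in range(rng.randint(min_len, max_len))]
    letters[-1] = REGULAR_TO_FINAL.get(letters[-1], letters[-1])
    return "".join(letters)


def write_corpus(path: str, n_words: int, seed: int = 0) -> None:
    """Write one pseudo-word per line, in logical (reading) order, encoded as UTF-8."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(n_words):
            f.write(random_hebrew_word(rng) + "\n")
```

For example, `write_corpus("hebrew_corpus.txt", 50_000)` writes 50K lines. Labels are stored in logical order even though Hebrew is displayed right-to-left.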
Alternatively, if possible, label real data with AWS Textract or Azure Document AI.
~50K samples should be a good starting point.
For reference, the current models were trained from scratch on the Mindee internal dataset with the French vocab (~11M word crop images).
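The advice above (train on mostly synthetic data, but validate on real data) can be sketched as a small split helper. `build_splits`, the `(image path, label)` record format, and the default 50/50 split of the real data are assumptions for illustration, not part of doctr; with very little real data, `val_fraction=1.0` keeps all of it for validation.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # hypothetical record format: (image path, label)


def build_splits(
    synthetic: List[Sample],
    real: List[Sample],
    val_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Train on synthetic data plus part of the real data; validate only on held-out real data."""
    if not real:
        raise ValueError("at least some real samples are needed for validation")
    rng = random.Random(seed)
    real = real[:]  # copy so the caller's list is not shuffled in place
    rng.shuffle(real)
    n_val = max(1, int(len(real) * val_fraction))
    val = real[:n_val]
    train = synthetic + real[n_val:]
    rng.shuffle(train)
    return train, val
```

Keeping synthetic samples out of the validation set means the reported metrics reflect how the model behaves on real documents rather than on the renderer's output.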
🚀 The feature
Support Hebrew characters
Motivation, pitch
Increase user base
Alternatives
No response
Additional context
No response