Training or finetuning segmentation/recognition model on new data #666
I want to train a new model for segmentation and then recognition on a pretty decent amount of ground truth: about 730 pages from 24 different manuscripts. It is the same script throughout, written by many hands (Hebrew/Aramaic Samaritan, to be precise). Should I fine-tune some existing models or start from scratch? So far, with other data in the same script, I have had pretty good results with fine-tuning, but that data was much smaller: 50/100/200 pages. Thanks!
For the segmenter I'd start by fine-tuning the base model that you can find in the kraken repository. For the recognizer you might try to train a generalized model based on BiblIA instead of training from scratch, but you'll still easily get a good-quality recognition model when training from scratch.
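A minimal sketch of how that could look on the command line (the model and file names are placeholders, and exact option spellings vary between kraken versions, so check `ketos --help`; newer releases renamed the `--resize` value `union` to `add`):

```shell
# Fine-tune the base segmentation model from the kraken repository
ketos segtrain -i blla.mlmodel -o samaritan_seg -f xml ground_truth/*.xml

# Train a recognizer from scratch on the transcribed lines
ketos train -o samaritan_rec -f xml ground_truth/*.xml

# ...or start from a BiblIA model instead; --resize reconciles the
# character set of the loaded model with the new ground truth
ketos train -i biblia.mlmodel --resize union -o samaritan_rec -f xml ground_truth/*.xml
```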
> And, by the way, should I train with `topline`, `centerline` or `baseline`? I think in this case `centerline` would fit, just want to be sure.
The topline, centerline, and baseline switches are hints for the polygonizer during inference; the training procedure itself isn't affected by them. Fundamentally, the switches tell the polygonizer how you annotated the ground truth so it can shift lines internally to make polygonization more robust: baselines are translated slightly upwards, toplines slightly downwards, and centerlines not at all. It looks like you annotated centerlines and the polygons already look good, so the `-cl` switch is most likely appropriate.
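Concretely, that would mean declaring the annotation style when training the segmenter, along these lines (a sketch; `-cl` is taken to be the short form of `--centerline`, and the file names are placeholders):

```shell
# Record that the ground truth was annotated on the centerline so the
# polygonizer applies the right offset at inference time
ketos segtrain -cl -i blla.mlmodel -o samaritan_seg -f xml ground_truth/*.xml
```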
Thank you so much @mittagessen