Work in progress. Far from production ready.
...
To test training, we're currently using the UD (Universal Dependencies) dataset, automatically converted to Cyrillic. This is temporary: we will eventually switch to our own training data.
- data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012)
- currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function
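The notes don't describe how the UD data was converted to Cyrillic. As an illustration only, here is a minimal sketch of rule-based Serbian Latin-to-Cyrillic transliteration in Python. The name `latin_to_cyrillic` is hypothetical, and this is not necessarily the method used for the UD conversion. It relies on the one-to-one correspondence between Gaj's Latin and Vuk's Cyrillic, checking the digraphs lj, nj and dž before single letters.

```python
import unicodedata

# Hypothetical helper (not the project's converter): Serbian Latin -> Cyrillic.
# Digraphs are checked before single letters so "lj" becomes "љ", not "лј".
_DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
_SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    # Compose decomposed input (e.g. "c" + combining caron) into "č" first.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        if pair in _DIGRAPHS:
            cyr, step = _DIGRAPHS[pair], 2
        elif text[i].lower() in _SINGLE:
            cyr, step = _SINGLE[text[i].lower()], 1
        else:
            # Digits, punctuation and non-Serbian letters pass through unchanged.
            out.append(text[i])
            i += 1
            continue
        # Keep the case of the first Latin letter: "Lj" and "LJ" both become "Љ".
        out.append(cyr.upper() if text[i].isupper() else cyr)
        i += step
    return "".join(out)
```

For example, `latin_to_cyrillic("Ljubljana")` gives `Љубљана`. A purely mechanical digraph rule like this one will mis-convert the few words where n+j or l+j are separate letters (e.g. *injekcija*, which should be *инјекција*). Any automatic conversion of this kind is therefore likely to leave some noise in the data.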
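If the Jekavian forms were moved out of the data and into the normalization function, one possible approach is a lookup table applied to tokens before lexicon lookup. The following is a minimal Python sketch under stated assumptions: the table `JEKAVIAN_TO_EKAVIAN`, the functions `normalize_jekavian` and `normalize_tokens`, and the choice of Ekavian as the target are all hypothetical. It is not the project's actual normalization function.

```python
# Hypothetical table: Jekavian surface form -> Ekavian form (lowercase Cyrillic).
# In practice it would be generated from the lexicon, not typed by hand.
JEKAVIAN_TO_EKAVIAN = {
    "млијеко": "млеко",
    "дијете": "дете",
    "вријеме": "време",
    "цвијет": "цвет",
    "лијеп": "леп",
    "снијег": "снег",
}


def normalize_jekavian(token: str) -> str:
    """Map a Jekavian token to its Ekavian form; leave other tokens unchanged."""
    ekavian = JEKAVIAN_TO_EKAVIAN.get(token.lower())
    if ekavian is None:
        return token
    # Restore an initial capital, e.g. for a sentence-initial word.
    return ekavian.capitalize() if token[:1].isupper() else ekavian


def normalize_tokens(tokens: list[str]) -> list[str]:
    return [normalize_jekavian(t) for t in tokens]
```

A table is used here instead of a blanket substitution such as "ије" → "е" because that sequence also occurs outside yat positions. For example, *пије* ("drinks") is the same in both variants, and a regex would wrongly turn it into *пе*.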