Serbian Language Pipeline for spaCy

Work in progress. Far from production-ready.

How to use with spaCy

...

Data files

To test training, we are using the Universal Dependencies (UD) dataset, automatically converted to Cyrillic. This is temporary; we will eventually use our own training data.
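For readers wondering what an automatic Latin-to-Cyrillic conversion involves, here is a minimal sketch. It is not the script used for this repository; the function name `latin_to_cyrillic` and the mapping tables are assumptions made for illustration. The main subtlety is that the Latin digraphs lj, nj and dž each map to a single Cyrillic letter, so they have to be matched before single letters.

```python
# Hypothetical helper for illustration; not part of this repository.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text):
    """Transliterate Serbian Latin text to Cyrillic, letter by letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        # Try the two-letter digraphs first so "lj" becomes one letter.
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            # Digits, punctuation and unknown characters pass through.
            out.append(ch)
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


print(latin_to_cyrillic("Ljubav, njiva, džep."))
```

A purely mechanical mapping like this mishandles the few words where n+j or d+ž are separate letters rather than a digraph (for example "injekcija"), which is one reason automatically converted data is only a stopgap.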

Lemmatizer data

The data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012).
It currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function.
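As a rough idea of what handling Jekavian forms in a normalization step could look like, here is a minimal sketch. Everything in it is hypothetical: the `normalize` function, the `JEKAVIAN_TO_EKAVIAN` table and its three hand-picked entries are illustrations, not this repository's code or data.

```python
# Illustrative only: a tiny hand-written table, not the repository's data.
JEKAVIAN_TO_EKAVIAN = {
    "млијеко": "млеко",
    "дијете": "дете",
    "лијеп": "леп",
}


def normalize(token):
    """Return the Ekavian form of a known Jekavian variant.

    Known variants come back lowercased; any other token is
    returned unchanged.
    """
    return JEKAVIAN_TO_EKAVIAN.get(token.lower(), token)


for word in ["млијеко", "дијете", "кућа"]:
    print(word, "->", normalize(word))
```

With a step like this in front of the lemmatizer, the lemma table would only need Ekavian entries, at the cost of maintaining the variant mapping separately.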
