This repository has been archived by the owner on Aug 12, 2021. It is now read-only.
New Italian and German pipelines and models
maziyarpanahi released this 10 Jun 15:38 · 388 commits to master since this release
We are happy to announce our new Italian and German pipelines and models. We are also releasing new `entity_recognizer_lg` and `entity_recognizer_md` pipelines for Italian and French.
Pipelines
Italian
Pipeline | Name | Language |
---|---|---|
Explain Document Large | `explain_document_lg` | `it` |
Explain Document Medium | `explain_document_md` | `it` |
Entity Recognizer Large | `entity_recognizer_lg` | `it` |
Entity Recognizer Medium | `entity_recognizer_md` | `it` |
French
Pipeline | Name | Language |
---|---|---|
Entity Recognizer Large | `entity_recognizer_lg` | `fr` |
Entity Recognizer Medium | `entity_recognizer_md` | `fr` |
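The pipelines listed above are loaded by name and language code. The following is a minimal sketch, assuming Spark NLP is on the classpath (for example in a `spark-shell` session) and that the pipeline can be downloaded on first use. The sample sentence is illustrative only.

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Assumption: Spark NLP is available and the model repository is reachable;
// the pipeline is downloaded and cached the first time it is requested.
val pipeline = PretrainedPipeline("entity_recognizer_md", lang = "it")

// annotate returns a map from each output column name to its annotation results
val result: Map[String, Seq[String]] =
  pipeline.annotate("Giuseppe Verdi è nato a Le Roncole.")

// Print every output column with its values
result.foreach { case (column, values) =>
  println(s"$column: ${values.mkString(", ")}")
}
```

Swapping the name or the language code (for instance `entity_recognizer_lg` with `fr`) should load the corresponding pipeline from the tables above.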
Models
Italian
Model | Name | Language |
---|---|---|
PerceptronModel (POS UD) | `pos_ud_isdt` | `it` |
NerDLModel (glove_6B_300 and glove_840B_300) | `ner_dl` | `it` |
German
Model | Name | Language |
---|---|---|
LemmatizerModel (Lemmatizer) | `lemma` | `de` |
PerceptronModel (POS UD) | `pos_ud_hdt` | `de` |
NerDLModel (glove_6B_300 and glove_840B_300) | `ner_dl` | `de` |
Dataset
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies treebanks |
NER | Trained by the NerDLApproach annotator with BiLSTM-CNN on the WikiNER corpus; identifies `PER`, `LOC`, `ORG`, and `MISC` entities |
Example
German POS model
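import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
// Load the pretrained German POS tagger; it reads the sentence and token columns
val perceptronModel = PerceptronModel.pretrained("pos_ud_hdt", lang = "de")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("pos")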
German NerDL model
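import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
// Load the pretrained German NER model; it also needs a word embeddings column
val ner = NerDLModel.pretrained("ner_dl", lang = "de")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")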
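The German models need `sentence`, `token`, and (for NER) `embeddings` columns from upstream stages. The sketch below shows one way to wire them into a Spark ML pipeline. It rests on several assumptions: Spark NLP is on the classpath, the embeddings are published as `glove_840B_300` under the multilingual code `xx`, and the column names and sample sentence are illustrative.

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{LemmatizerModel, Tokenizer}
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("german-models-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Upstream stages that produce the document, sentence and token columns
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val sentence = new SentenceDetector().setInputCols("document").setOutputCol("sentence")
val token = new Tokenizer().setInputCols("sentence").setOutputCol("token")

// Pretrained German models from this release
val lemma = LemmatizerModel.pretrained("lemma", lang = "de")
  .setInputCols("token")
  .setOutputCol("lemma")
val pos = PerceptronModel.pretrained("pos_ud_hdt", lang = "de")
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

// Assumption: embeddings name and language code; they must match
// the embeddings the NER model was trained with
val embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", lang = "xx")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")
val ner = NerDLModel.pretrained("ner_dl", lang = "de")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val pipeline = new Pipeline()
  .setStages(Array(document, sentence, token, lemma, pos, embeddings, ner))

val data = Seq("Angela Merkel besuchte gestern Berlin.").toDF("text")
val result = pipeline.fit(data).transform(data)

// Show the lemmas, POS tags and NER labels per token
result.select("lemma.result", "pos.result", "ner.result").show(truncate = false)
```

If the NER model was trained with `glove_6B_300` instead, the embeddings stage would need to load that variant.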