This repository has been archived by the owner on Aug 12, 2021. It is now read-only.

New Italian and German pipelines and models

@maziyarpanahi maziyarpanahi released this 10 Jun 15:38
· 388 commits to master since this release
732cb25

We are happy to announce our new Italian and German pipelines and models. We are also releasing new entity_recognizer_lg and entity_recognizer_md pipelines for Italian and French.

Pipelines

Italian

| Pipeline | Name | Language |
|---|---|---|
| Explain Document Large | explain_document_lg | it |
| Explain Document Medium | explain_document_md | it |
| Entity Recognizer Large | entity_recognizer_lg | it |
| Entity Recognizer Medium | entity_recognizer_md | it |

French

| Pipeline | Name | Language |
|---|---|---|
| Entity Recognizer Large | entity_recognizer_lg | fr |
| Entity Recognizer Medium | entity_recognizer_md | fr |
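
As a rough sketch of how one of the pipelines listed above might be loaded, the snippet below uses Spark NLP's `PretrainedPipeline` with the Italian `explain_document_md` name from the table. The sample sentence and the default session settings are illustrative assumptions, and the pipeline is downloaded on first use, so network access is required.

```scala
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Start a local Spark session configured for Spark NLP.
val spark = SparkNLP.start()

// Pipeline name and language code are taken from the Italian table above.
val pipeline = PretrainedPipeline("explain_document_md", lang = "it")

// annotate returns a map from each output column name to its results.
// The sentence is an arbitrary example, not taken from the release notes.
val result: Map[String, Seq[String]] =
  pipeline.annotate("Roma è la capitale d'Italia.")

result.foreach { case (column, values) =>
  println(s"$column: ${values.mkString(", ")}")
}
```

Swapping in `entity_recognizer_md` with `lang = "fr"` should work the same way for the French pipelines.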

Models

Italian

| Model | Name | Language |
|---|---|---|
| PerceptronModel (POS UD) | pos_ud_isdt | it |
| NerDLModel (glove_6B_300 and glove_840B_300) | ner_dl | it |

German

| Model | Name | Language |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | de |
| PerceptronModel (POS UD) | pos_ud_hdt | de |
| NerDLModel (glove_6B_300 and glove_840B_300) | ner_dl | de |

Dataset

| Feature | Description |
|---|---|
| Lemma | Trained with the Lemmatizer annotator on the lemmatization-lists by Michal Měchura |
| POS | Trained with the PerceptronApproach annotator on Universal Dependencies treebanks |
| NER | Trained with the NerDLApproach annotator (BiLSTM-CNN) on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities |
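
To illustrate how the four entity labels might be consumed downstream, here is a minimal sketch that groups per-token tags into entity spans. It assumes the model emits IOB-style tags such as `B-PER` and `I-PER` with `O` for non-entities; the release notes name only the four labels, so the tag scheme, the helper `groupEntities`, and the sample tokens are all hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: merges IOB-style tags into (entity text, label) pairs.
def groupEntities(tokens: Seq[String], tags: Seq[String]): Seq[(String, String)] = {
  val finished = ArrayBuffer.empty[(String, String)]
  var current: Option[(Vector[String], String)] = None

  // Close the entity currently being built, if any.
  def flush(): Unit = {
    current.foreach { case (words, label) => finished += ((words.mkString(" "), label)) }
    current = None
  }

  tokens.zip(tags).foreach { case (token, tag) =>
    if (tag == "O") flush()
    else {
      val label = tag.drop(2) // strip the "B-" or "I-" prefix
      current match {
        // Continue the open entity only on an I- tag with the same label.
        case Some((words, open)) if tag.startsWith("I-") && open == label =>
          current = Some((words :+ token, open))
        // Otherwise start a new entity.
        case _ =>
          flush()
          current = Some((Vector(token), label))
      }
    }
  }
  flush()
  finished.toSeq
}

// Assumed sample input, not real model output.
val tokens = Seq("Angela", "Merkel", "besuchte", "Berlin")
val tags = Seq("B-PER", "I-PER", "O", "B-LOC")
val entities = groupEntities(tokens, tags)
```

In a real pipeline the tokens and tags would come from the `token` and `ner` output columns rather than hard-coded sequences.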

Example

German POS model

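import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel

// Load the pretrained German POS tagger; it reads the "sentence" and "token" columns.
val perceptronModel = PerceptronModel.pretrained("pos_ud_hdt", lang = "de")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("pos")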

German NerDL model

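import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel

// Load the pretrained German NER model; "embeddings" must come from an upstream word-embeddings stage.
val ner = NerDLModel.pretrained("ner_dl", lang = "de")
    .setInputCols("sentence", "token", "embeddings")
    .setOutputCol("ner")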
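
The NER snippet above expects `sentence`, `token` and `embeddings` columns to exist already. The following is a sketch of one way the upstream stages could be wired together, not a definitive setup. It assumes the GloVe embeddings are available as `glove_6B_300` under the multilingual language code `xx`; that name and code, the sample sentence, and the choice of the 6B variant over `glove_840B_300` are assumptions that should be checked against the model listing, since the embeddings must match the ones the NER model was trained with.

```scala
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val spark = SparkNLP.start()
import spark.implicits._

// Turn raw text into a "document" annotation column.
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val token = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

// Assumption: embeddings name and "xx" language code are not stated in these notes.
val embeddings = WordEmbeddingsModel.pretrained("glove_6B_300", lang = "xx")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

// Model name and language come from the German models table.
val ner = NerDLModel.pretrained("ner_dl", lang = "de")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val pipeline = new Pipeline()
  .setStages(Array(document, sentence, token, embeddings, ner))

// Arbitrary sample sentence for illustration.
val data = Seq("Angela Merkel besuchte Berlin.").toDF("text")
val annotated = pipeline.fit(data).transform(data)

annotated.select("token.result", "ner.result").show(truncate = false)
```

The German POS model can be added as another stage in the same way, since it only needs the `sentence` and `token` columns.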