GreynirSeq is a natural language parsing toolkit for Icelandic focused on sequence modeling with neural networks.

mideind/GreynirSeq

GreynirSeq

GreynirSeq is a natural language parsing toolkit for Icelandic focused on sequence modeling with neural networks. It is under active development and is in its early stages.

The modeling part (nicenlp) of GreynirSeq is built on top of the excellent fairseq from Facebook, which is in turn built on top of PyTorch.

GreynirSeq is licensed under the MIT license unless otherwise stated at the top of a file. Model files hosted by Miðeind or on Hugging Face are under the GNU Affero GPLv3 (AGPL-3.0) license unless released elsewhere under CC-BY, e.g. on CLARIN.


Be aware that using the CLI, or otherwise downloading model files, will download gigabytes of data. Simply installing greynirseq will not download any models; they are downloaded automatically on demand.

Installation

In a suitable virtual environment:

# From PyPI
$ pip install greynirseq
# or from git main branch
$ pip install git+https://github.com/mideind/greynirseq@main

Features

TL;DR give me the CLI

The greynirseq CLI can be used to run pretrained models for various tasks. Run pip install greynirseq && greynirseq -h to see which options are available.

POS

Input is accepted from a file containing a single tokenized sentence per line, or from stdin.

$ echo "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ." | greynirseq pos --input -

nvfng nven-s c n---s sfg3fþ lvfnsf af nhfog af n----s pl
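The tagger prints one tag per input token, in the same order, so the output line can be aligned with the input sentence by position. The following is a minimal sketch of that alignment in plain Python; it does not call GreynirSeq, and the helper name `pair_tokens_with_tags` is hypothetical. It assumes tokens and tags are both whitespace-separated, as in the example above.

```python
from typing import List, Tuple


def pair_tokens_with_tags(sentence: str, tag_line: str) -> List[Tuple[str, str]]:
    """Pair each whitespace-separated token with the tag at the same position."""
    tokens = sentence.split()
    tags = tag_line.split()
    # A length mismatch usually means the input was not tokenized as expected.
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return list(zip(tokens, tags))


# Input sentence and tagger output copied from the example above.
sentence = "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ."
tag_line = "nvfng nven-s c n---s sfg3fþ lvfnsf af nhfog af n----s pl"

for token, tag in pair_tokens_with_tags(sentence, tag_line):
    print(f"{token}\t{tag}")
```

The length check is there because an untokenized sentence (for example, with the final period attached to the last word) would silently shift every tag otherwise.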

NER

Input is accepted from a file containing a single tokenized sentence per line, or from stdin.

$ echo "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ." | greynirseq ner --input -

O B-Person O B-Person O O O O O B-Organization O
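The labels above follow a B-/O tagging pattern, with one label per token. A sketch of how such a label line could be turned into a list of entities is shown below. It is plain Python and does not call GreynirSeq; the function name `decode_bio` is hypothetical, and the handling of `I-` continuation labels is an assumption based on the common BIO convention, since the example output only contains `B-` and `O` labels.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO labels into (entity_type, entity_text) pairs."""
    if len(tokens) != len(labels):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")

    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        # Assumed convention: "I-X" continues an open entity of type X.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        # Any other label closes the entity that is currently open, if any.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        # "B-X" (or a stray "I-X") starts a new entity; "O" starts nothing.
        if prefix in ("B", "I"):
            current_type, current_tokens = entity_type, [token]

    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Input sentence and NER output copied from the example above.
tokens = "Systurnar Guðrún og Monique átu einar um jólin á McDonalds .".split()
labels = "O B-Person O B-Person O O O O O B-Organization O".split()

for entity_type, text in decode_bio(tokens, labels):
    print(f"{entity_type}: {text}")
```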

Translation

Input is accepted from a file containing a single untokenized sentence per line, or from stdin.

# For en->is translation
$ echo "This is an awesome test that shows how to use a pretrained translation model." | greynirseq translate --source-lang en --target-lang is

Þetta er æðislegt próf sem sýnir hvernig nota má forprófað þýðingarlíkan.

# For is->en translation
$ echo "Þetta er æðislegt próf sem sýnir hvernig nota má forprófað þýðingarlíkan." | greynirseq translate --source-lang is --target-lang en

This is an awesome test that shows how a pre-tested translation model can be used.
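The same command can be driven from a script by piping sentences to the CLI on stdin. The sketch below wraps the `greynirseq translate --source-lang ... --target-lang ...` invocation shown above using Python's `subprocess` module. The helper names are hypothetical, and it assumes the CLI writes exactly one translation per input line to stdout with no other output; that should be checked against the installed version. Note that the first call will download the model files.

```python
import subprocess
from typing import List


def build_translate_command(source_lang: str, target_lang: str) -> List[str]:
    """Build the argument list for the translate subcommand shown above."""
    return [
        "greynirseq",
        "translate",
        "--source-lang",
        source_lang,
        "--target-lang",
        target_lang,
    ]


def translate_lines(lines: List[str], source_lang: str, target_lang: str) -> List[str]:
    """Send one untokenized sentence per line to the CLI and return its stdout lines.

    Assumption: stdout contains only the translations, one per input line.
    """
    result = subprocess.run(
        build_translate_command(source_lang, target_lang),
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",  # Icelandic text needs a Unicode-capable encoding
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return result.stdout.splitlines()


# Example usage (requires greynirseq to be installed; downloads models on first run):
# translate_lines(["This is a test."], source_lang="en", target_lang="is")
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with sentences that contain quotes or other shell metacharacters.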

Neural Icelandic Language Processing - NIceNLP

IceBERT is an Icelandic BERT-based (RoBERTa) language model that is suitable for fine-tuning on downstream tasks.

The following fine-tuning tasks are available both through the greynirseq CLI and for loading programmatically.

  1. POS tagging
  2. NER tagging

There are also some translation models available. They are Transformer models, either trained from scratch or fine-tuned from mBART25.

  1. Translation

Development

To install GreynirSeq in development mode, we recommend using poetry as shown below:

pip install poetry && poetry install

Linting

All code is checked with Super-Linter in a GitHub Action. We recommend running it locally before pushing:

./run_linter.sh

Type annotation

Type annotations will soon be checked with mypy, so new code should include them.
