heliport

A language identification tool that aims to be both fast and accurate. It started as a Rust port of HeLI-OTS.

Installation

From PyPI

Install it in your environment:

pip install heliport

Then download the model:

heliport-download

From source

Install the requirements:

Python
pip
Rust
OpenSSL

Clone the repo, build the package, and compile the model:

git clone https://github.com/ZJaume/heliport
cd heliport
pip install .
heliport-convert

Usage

CLI

Run the heliport command, which reads lines from stdin and prints one language label per line:

cat sentences.txt | heliport

eng_latn
cat_latn
rus_cyrl
...
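
When the sentences already live in a Python program, the same CLI can be driven through a pipe. The following is a minimal sketch, not part of heliport: the helper name `identify_lines` and its `command` parameter are hypothetical, and it assumes only what is shown above (the command reads one sentence per line from stdin and prints one label per line), plus UTF-8 input.

```python
import subprocess
from typing import Iterable, List, Sequence


def identify_lines(lines: Iterable[str],
                   command: Sequence[str] = ("heliport",)) -> List[str]:
    """Pipe one sentence per line to `command` and return its output lines.

    `identify_lines` and `command` are illustrative names, not heliport API.
    The default assumes the `heliport` executable is on PATH.
    """
    # A newline inside a sentence would split it into several input lines,
    # so collapse embedded newlines to spaces before sending.
    payload = "".join(line.replace("\n", " ") + "\n" for line in lines)
    result = subprocess.run(
        list(command),
        input=payload,
        capture_output=True,
        encoding="utf-8",  # assumption: the tool reads and writes UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

Starting one process for a whole batch avoids paying the startup cost for every sentence.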

Python package

>>> from heliport import Identifier
>>> i = Identifier()
>>> i.identify("L'aigua clara")
'cat_latn'
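
To label many sentences, the `identify` call shown above can be applied in a loop. A minimal sketch: `count_languages` is a hypothetical helper, not part of heliport, and it takes the identification function as an argument, so it relies only on the behavior shown above, where `identify` accepts a string and returns a label.

```python
from collections import Counter
from typing import Callable, Iterable


def count_languages(sentences: Iterable[str],
                    identify: Callable[[str], str]) -> Counter:
    """Count how many sentences receive each language label.

    Hypothetical helper: `identify` is any function mapping text to a label,
    for example the `identify` method of a heliport `Identifier`.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank lines instead of labeling them
        counts[identify(text)] += 1
    return counts
```

With heliport installed and the model downloaded, `count_languages(sentences, Identifier().identify)` would tally labels such as 'cat_latn'.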

Rust crate

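use std::sync::Arc;
use heliport::identifier::Identifier;
use heliport::lang::Lang;
use heliport::load_models;

// Load the character and word models from the model directory
let (charmodel, wordmodel) = load_models("/dir/to/models");
// The identifier shares both models through reference-counted pointers
let identifier = Identifier::new(
    Arc::new(charmodel),
    Arc::new(wordmodel),
    );
// identify returns the predicted language together with its score
let (lang, score) = identifier.identify("L'aigua clara");
assert_eq!(lang, Lang::cat_Latn);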

Benchmarks

Speed benchmarks on 100k random sentences from OpenLID, with all tools running single-threaded:

tool	time (s)
CLD2	1.12
HeLI-OTS	60.37
lingua all high preloaded	56.29
lingua all low preloaded	23.34
fasttext openlid193	8.44
heliport	2.33
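
The times above are easier to compare as throughput and as ratios against heliport. The sketch below only does arithmetic on the numbers in the table; the names `TIMES_S`, `throughput`, and `relative_time` are illustrative, and 100k is taken to mean 100,000 sentences.

```python
# Times in seconds for 100k sentences, copied from the table above.
TIMES_S = {
    "CLD2": 1.12,
    "HeLI-OTS": 60.37,
    "lingua all high preloaded": 56.29,
    "lingua all low preloaded": 23.34,
    "fasttext openlid193": 8.44,
    "heliport": 2.33,
}
N_SENTENCES = 100_000


def throughput(seconds: float, n: int = N_SENTENCES) -> float:
    """Sentences processed per second."""
    return n / seconds


def relative_time(tool: str, baseline: str = "heliport") -> float:
    """How many times longer `tool` took than `baseline`."""
    return TIMES_S[tool] / TIMES_S[baseline]


if __name__ == "__main__":
    # Print the tools from fastest to slowest.
    for tool, seconds in sorted(TIMES_S.items(), key=lambda kv: kv[1]):
        print(f"{tool:28s} {throughput(seconds):10.0f} sent/s "
              f"{relative_time(tool):6.2f}x heliport time")
```

By these figures heliport processes roughly 43,000 sentences per second. HeLI-OTS takes about 26 times as long, and CLD2 about half as long.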
