Adding Language Validation Test #257

Merged
46 commits
8fc4b9a
test to validate languages
NIXBLACK11 Oct 31, 2023
9a3228b
test to validate languages
NIXBLACK11 Oct 31, 2023
ad9a588
Delete flores directory
NIXBLACK11 Oct 31, 2023
7f32d7a
Update validate_models.py
NIXBLACK11 Oct 31, 2023
ff3254b
Update validate_models.py
NIXBLACK11 Oct 31, 2023
cb2d91a
Update validate_models.py
NIXBLACK11 Oct 31, 2023
f4e84d2
Update validate_models.py
NIXBLACK11 Oct 31, 2023
109eac2
Update .gitignore
NIXBLACK11 Oct 31, 2023
2236fe0
added pytest to validate_models.py
NIXBLACK11 Nov 1, 2023
472657b
Update validate_models.py
NIXBLACK11 Nov 1, 2023
c744030
Update validate_models.py
NIXBLACK11 Nov 1, 2023
c71aec7
Update validate_models.py using mock downloader
NIXBLACK11 Nov 4, 2023
c816d79
Update validate_models.py
NIXBLACK11 Nov 6, 2023
31aa252
Update validate_models.py
NIXBLACK11 Nov 6, 2023
c34279d
Update validate_models.py
NIXBLACK11 Nov 6, 2023
8b25a3d
Update validate_models.py
NIXBLACK11 Nov 6, 2023
302d068
Update validate_models.py
NIXBLACK11 Nov 7, 2023
73f873f
Update download_models.py according to 1.
NIXBLACK11 Nov 7, 2023
5e04a2a
Update download_models.py
NIXBLACK11 Nov 7, 2023
e3552a7
Update download_models.py
NIXBLACK11 Nov 7, 2023
1d74246
Update download_models.py
NIXBLACK11 Nov 7, 2023
1bddd81
Update validate_models.py
NIXBLACK11 Nov 8, 2023
e4f3fd0
Update models.py
NIXBLACK11 Nov 8, 2023
03284a2
Update laser_tokenizer.py
NIXBLACK11 Nov 8, 2023
43f4d1a
Update download_models.py
NIXBLACK11 Nov 8, 2023
6ef54c2
Update validate_models.py
NIXBLACK11 Nov 8, 2023
89c9dde
Update validate_models.py
NIXBLACK11 Nov 8, 2023
d883ee0
Added slow and fast tests to validate_models.py
NIXBLACK11 Nov 9, 2023
e1e22a3
Update validate_models.py
NIXBLACK11 Nov 9, 2023
a8f4135
Update validate_models.py
NIXBLACK11 Nov 9, 2023
4cd83e8
Create test_validate_models.py
NIXBLACK11 Nov 9, 2023
e0be04f
Rename test_validate_models.py to test_models_initialization.py
NIXBLACK11 Nov 9, 2023
9ec012f
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
fbbc6fc
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
99ebbfd
Update download_models.py
NIXBLACK11 Nov 9, 2023
6356c4d
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
eac3674
Update test_models_initialization.py
NIXBLACK11 Nov 9, 2023
d3935f9
Update download_models.py
NIXBLACK11 Nov 9, 2023
18c1657
Update validate_models.py
NIXBLACK11 Nov 14, 2023
c26e775
Update validate_models.py
NIXBLACK11 Nov 14, 2023
023eab2
Update validate_models.py
NIXBLACK11 Nov 14, 2023
3944556
Update validate_models.py
NIXBLACK11 Nov 14, 2023
0a4d983
Update validate_models.py
NIXBLACK11 Nov 14, 2023
e5823d6
Update validate_models.py
NIXBLACK11 Nov 14, 2023
92345be
Update validate_models.py
NIXBLACK11 Nov 14, 2023
87a08e9
Update validate_models.py
NIXBLACK11 Nov 14, 2023
32 changes: 32 additions & 0 deletions laser_encoders/validate_models.py
@@ -0,0 +1,32 @@
import os
import tempfile

from laser_encoders.download_models import LaserModelDownloader
from laser_encoders.language_list import LASER2_LANGUAGE, LASER3_LANGUAGE
from laser_encoders.laser_tokenizer import initialize_tokenizer
from laser_encoders.models import initialize_encoder


def validate_language_models_and_tokenize():
Contributor

Could we use pytest.mark.parametrize for this test instead of looping over the languages inside it?
That would make it easier to rerun the test for one particular language, e.g. if we decide to fix a single language code.

    with tempfile.TemporaryDirectory() as tmp_dir:
        print("Created temporary directory", tmp_dir)

        downloader = LaserModelDownloader(model_dir=tmp_dir)

        for lang in LASER3_LANGUAGE:
            # Use the downloader to download the model
            downloader.download_laser3(lang)
            encoder = initialize_encoder(lang, model_dir=tmp_dir)
            tokenizer = initialize_tokenizer(lang, model_dir=tmp_dir)
            # Test tokenization with a sample sentence
            tokenized = tokenizer.tokenize("This is a sample sentence.")

        for lang in LASER2_LANGUAGE:
            # Use the downloader to download the model
            downloader.download_laser2()
            encoder = initialize_encoder(lang, model_dir=tmp_dir, laser="laser2")
            tokenizer = initialize_tokenizer(lang, model_dir=tmp_dir)
            # Test tokenization with a sample sentence
            tokenized = tokenizer.tokenize("This is a sample sentence.")

    print("All language models validated and deleted successfully.")
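
A minimal sketch of what the review comment's pytest.mark.parametrize suggestion could look like. It is not the code that was merged. So that it runs without the package or any downloads, the language lists are shortened placeholders and `FakeTokenizer` / `fake_initialize_tokenizer` are hypothetical stand-ins for `initialize_tokenizer`; a real test would import the actual lists and call the downloader as in the diff above.

```python
import pytest

# Placeholder values (assumptions): the real test would import LASER3_LANGUAGE
# and LASER2_LANGUAGE from laser_encoders.language_list instead.
LASER3_LANGUAGE = ["lang_a", "lang_b"]
LASER2_LANGUAGE = ["lang_c"]


class FakeTokenizer:
    """Hypothetical stand-in for the tokenizer object used in the diff."""

    def tokenize(self, text):
        # Lowercase and normalize whitespace; the real tokenizer does more.
        return " ".join(text.lower().split())


def fake_initialize_tokenizer(lang, model_dir):
    """Hypothetical stand-in for initialize_tokenizer(lang, model_dir=...)."""
    return FakeTokenizer()


# One test case is generated per language, so a failure names the language
# and a single language can be rerun on its own.
@pytest.mark.parametrize("lang", LASER3_LANGUAGE)
def test_laser3_tokenizer(lang, tmp_path):
    tokenizer = fake_initialize_tokenizer(lang, model_dir=str(tmp_path))
    tokenized = tokenizer.tokenize("This is a sample sentence.")
    assert tokenized


@pytest.mark.parametrize("lang", LASER2_LANGUAGE)
def test_laser2_tokenizer(lang, tmp_path):
    tokenizer = fake_initialize_tokenizer(lang, model_dir=str(tmp_path))
    tokenized = tokenizer.tokenize("This is a sample sentence.")
    assert tokenized
```

With this layout each language gets its own test ID, so one case can be selected with pytest's `-k` option (for example `-k lang_a`) or by its node ID, such as `test_laser3_tokenizer[lang_a]`. `tmp_path` is pytest's built-in temporary-directory fixture and replaces the manual `tempfile.TemporaryDirectory()` block.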