Releases: embeddings-benchmark/mteb

1.29.1

13 Jan 21:27

1.29.1 (2025-01-13)

Fix

  • fix: Added C-MTEB (#1786)

Added C-MTEB (3ba7e22)

1.29.0

13 Jan 17:51

1.29.0 (2025-01-13)

Ci

  • ci: fix model loading test (#1775)

  • pass base branch into the make command as an arg

  • test a file that has custom wrapper

  • what about overview

  • just don't check overview

  • revert instance check

  • explicitly omit overview and init

  • remove test change

  • try on a lot of models

  • revert test model file


Co-authored-by: Isaac Chung (9b117a8)
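
The commit notes above mention passing the base branch into the make command and explicitly omitting the overview and init files. A minimal sketch of that selection step follows. It assumes model files live under a `mteb/models` directory and that the skipped files are named `overview.py` and `__init__.py`; the notes only say "overview and init". The helper name `select_model_files` is hypothetical, and this is not the project's actual CI script.

```python
from pathlib import PurePosixPath

# Assumed names for illustration; the release notes only say "overview and init".
MODELS_DIR = "mteb/models"
SKIPPED_FILES = {"overview.py", "__init__.py"}


def select_model_files(changed_paths):
    """Return changed Python files directly under MODELS_DIR, minus SKIPPED_FILES."""
    selected = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.suffix != ".py":
            continue  # not a Python file
        if str(path.parent) != MODELS_DIR:
            continue  # not a model definition file
        if path.name in SKIPPED_FILES:
            continue  # explicitly omitted from the loading test
        selected.append(str(path))
    return selected


# Hypothetical list of changed paths, as a diff against the base branch might report.
changed = [
    "mteb/models/overview.py",
    "mteb/models/__init__.py",
    "mteb/models/e5_models.py",
    "docs/README.md",
]
print(select_model_files(changed))  # ['mteb/models/e5_models.py']
```

In a CI job the input list would typically come from `git diff --name-only` against the base branch, which is why the base branch has to be available to the command.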

Feature

  • feat: Update task filtering, fixing a bug that included cross-lingual tasks in too many benchmarks (#1787)

  • feat: Update task filtering, fixing bug on MTEB

  • Updated task filtering adding exclusive_language_filter and hf_subset
  • fix bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# previously, filtering to English returned:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
  • format

  • remove "en-ext" from AmazonCounterfactualClassification

  • fixed mteb(deu)

  • fix: simplify in a few areas (4a70e5d)
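
To make the difference between the two filtering modes in this release concrete, here is a small self-contained sketch. It does not use mteb. The subset-to-language mapping, the language codes, and the helper `filter_subsets` are hypothetical stand-ins modelled on the STS22 subsets quoted above, and the `de` entry is added only to show a subset that neither mode keeps.

```python
# Hypothetical mapping from subset name to the languages it contains.
SUBSET_LANGUAGES = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}


def filter_subsets(subset_languages, languages, exclusive=False):
    """Select subsets by language.

    Inclusive mode keeps a subset if it contains ANY requested language.
    Exclusive mode keeps a subset only if ALL its languages were requested.
    """
    wanted = set(languages)
    kept = []
    for subset, langs in subset_languages.items():
        if exclusive:
            match = set(langs) <= wanted
        else:
            match = bool(set(langs) & wanted)
        if match:
            kept.append(subset)
    return kept


print(filter_subsets(SUBSET_LANGUAGES, ["eng"]))
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(filter_subsets(SUBSET_LANGUAGES, ["eng"], exclusive=True))
# ['en']
```

The exclusive mode returns every subset made up solely of the requested languages, so a task with several English-only subsets would still yield more than one entry. Naming the subset directly is the more precise option in that case.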

1.28.7

13 Jan 11:01

1.28.7 (2025-01-13)

Ci

  • ci: skip AfriSentiLID for now (#1785)

  • skip AfriSentiLID for now

  • skip relevant test case instead


Co-authored-by: Isaac Chung (71dbd61)

Fix

  • fix: update max tokens for OpenAI (#1772)

update max tokens (0c5c3a5)

1.28.6

11 Jan 17:05

1.28.6 (2025-01-11)

Fix

  • fix: added annotations for training data (#1742)

  • fix: Added annotations for arctic embed models

  • added google and bge

  • added cohere

  • Added e5

  • added bge based model2vec

  • annotated oAI

  • format and update annotations (3f093c8)

1.28.5

11 Jan 16:22

1.28.5 (2025-01-11)

Fix

  • fix: Leaderboard: K instead of M (#1761)

Fixes #1752 (972463e)

Unknown

  • other: add script for leaderboard compare (#1758)

  • add script

  • remove changes

  • remove changes

  • add comment

  • lint

  • order like in benchmark object

  • round results (8bc80aa)

1.28.4

10 Jan 15:32

1.28.4 (2025-01-10)

Fix

  • fix: fixes implementation of similarity() (#1748)

  • fix(#1594): fixes implementation of similarity()

  • fix: add similarity to SentenceTransformerWrapper


Co-authored-by: sam021313 (3fe9264)

1.28.3

10 Jan 14:24

1.28.3 (2025-01-10)

Fix

  • fix: Fixed definition of zero-shot in ModelMeta (#1747)

  • Corrected zero_shot definition to be based on task names, not dataset path (407e205)
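
A name-based zero-shot check can be sketched as below. The release note only states that the definition now compares task names rather than dataset paths. The exact rule used here (no overlap between the two sets of names) and the function `is_zero_shot` are assumptions for illustration, not the `ModelMeta` implementation.

```python
def is_zero_shot(training_task_names, benchmark_task_names):
    """Return True when no benchmark task name appears among the training task names."""
    return set(training_task_names).isdisjoint(benchmark_task_names)


# Hypothetical training annotations for a model.
trained_on = {"MSMARCO", "NQ"}

print(is_zero_shot(trained_on, ["STS22", "Banking77Classification"]))  # True
print(is_zero_shot(trained_on, ["NQ", "STS22"]))  # False
```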

1.28.2

10 Jan 14:06

1.28.2 (2025-01-10)

Fix

  • fix: Fixed task_type aggregation on leaderboard (#1746)

  • Fixed task_type aggregation in leaderboard

  • Fixed an error due to unnecessary indentation in get_score (76bb070)

1.28.1

10 Jan 13:24

1.28.1 (2025-01-10)

Fix

  • fix: Leaderboard Speedup (#1745)

  • Added get_scores_fast

  • Made leaderboard faster with smarter dependency graph and event management and caching

  • Changed print to logger.info (9eff8ca)
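
The notes above do not say how the leaderboard caching is implemented. As a rough illustration of the two ideas they name (caching repeated computations and logging through `logger.info` instead of `print`), here is a sketch that uses the standard library's `functools.lru_cache` as a stand-in technique. The score table and the function `mean_score` are hypothetical.

```python
import logging
from functools import lru_cache

logger = logging.getLogger(__name__)

# Hypothetical per-task scores; a real leaderboard would load these from result files.
SCORES = {
    ("model-a", "STS22"): 0.5,
    ("model-a", "STS12"): 0.75,
}


@lru_cache(maxsize=None)
def mean_score(model_name, task_names):
    """Average a model's scores over a tuple of task names (result is cached)."""
    # logger.info rather than print, so the host application controls the output.
    logger.info("computing mean score for %s", model_name)
    values = [SCORES[(model_name, task)] for task in task_names]
    return sum(values) / len(values)


tasks = ("STS22", "STS12")  # a tuple, because cache keys must be hashable
first = mean_score("model-a", tasks)
second = mean_score("model-a", tasks)  # served from the cache, not recomputed
print(first, mean_score.cache_info().hits)  # 0.625 1
```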

Test

  • test: Add script to test model loading below n_parameters threshold (#1698)

  • add model loading test for models below 2B params

  • add failure message to include model name

  • use the real get_model_meta

  • use cache folder

  • teardown per function

  • fix directory removal

  • write to file

  • wip loading from before

  • wip

  • Rename model_loading_testing.py to model_loading.py

  • Delete tests/test_models/test_model_loading.py

  • checks for models below 2B

  • try not using cache folder

  • update script with scan_cache_dir and add args

  • add github CI: detect changed model files and run model loading test

  • install all model dependencies

  • dependency installations and move file location

  • should trigger a model load test in CI

  • find correct commit for diff

  • explicitly fetch base branch

  • add make command

  • try to run in python instead and add pytest

  • fix attribute error and add read mode

  • separate script calling

  • let pip install be cached and specify repo path

  • check ancestry

  • add cache and rebase

  • try to merge instead of rebase

  • try without merge base

  • check if file exists first

  • Apply suggestions from code review
    Co-authored-by: Kenneth Enevoldsen

  • Update .github/workflows/model_loading.yml
    Co-authored-by: Kenneth Enevoldsen

  • address review comments to run test once from CI and not pytest


Co-authored-by: Kenneth Enevoldsen (8d033f3)
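
The selection step behind "models below 2B params" might look like the following sketch. `ModelInfo` is a hypothetical stand-in for the library's model metadata, and skipping models whose parameter count is unknown is an assumption, not something the notes state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelInfo:
    """Hypothetical stand-in for model metadata."""
    name: str
    n_parameters: Optional[int]  # None when the count is not recorded


def models_below(models, threshold=2_000_000_000):
    """Return names of models whose known parameter count is below `threshold`."""
    return [
        m.name
        for m in models
        if m.n_parameters is not None and m.n_parameters < threshold
    ]


# Hypothetical catalogue for illustration.
catalogue = [
    ModelInfo("small-model", 110_000_000),
    ModelInfo("large-model", 7_000_000_000),
    ModelInfo("unknown-size-model", None),
]
print(models_below(catalogue))  # ['small-model']
```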

Unknown

  • Fixed result loading on leaderboard (#1739)

  • Only main_score gets loaded for leaderboard thereby avoiding OOM errors

  • Fixed plot failing because of missing embedding dimensions

  • Ran linting (752d2b8)
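
Loading only `main_score` keeps memory use down because every other metric is dropped before results are held in memory. A minimal sketch, assuming a result layout of splits mapping to per-subset score entries (the layout and the helper `keep_main_score` are hypothetical, not the library's loader):

```python
def keep_main_score(result):
    """Return a slimmed copy of `result` that keeps only main_score per subset."""
    slim_scores = {}
    for split, entries in result["scores"].items():
        slim_scores[split] = [
            {"hf_subset": e["hf_subset"], "main_score": e["main_score"]}
            for e in entries
        ]
    return {"task_name": result["task_name"], "scores": slim_scores}


# Hypothetical full result with extra metrics that the leaderboard does not need.
full = {
    "task_name": "STS22",
    "scores": {
        "test": [
            {"hf_subset": "en", "main_score": 0.5, "pearson": 0.52, "spearman": 0.5},
        ]
    },
}
print(keep_main_score(full))
# {'task_name': 'STS22', 'scores': {'test': [{'hf_subset': 'en', 'main_score': 0.5}]}}
```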

1.28.0

09 Jan 12:11

1.28.0 (2025-01-09)

Feature

  • feat: Add nomic modern bert (#1684)

  • add nomic modern bert

  • use SentenceTransformerWrapper

  • use SentenceTransformerWrapper

  • try nomic wrapper

  • update

  • use all prompts

  • pass prompts

  • use fp16

  • lint

  • change to version

  • remove commented code (95f143a)

Fix

  • fix: allow kwargs in init for RerankingWrapper (#1676)

  • allow kwargs in init

  • fix retrieval

  • convert corpus_in_pair to list (f5962c6)