1.12.3
Main changes
- New option to use multiple templates and/or num_demos in a single dataset recipe. For each instance, Unitxt randomly samples a template and a number of demos from the provided lists.
  See example: https://github.com/IBM/unitxt/blob/main/examples/evaluate_different_templates_num_demos.py
- A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it.
  See more details on how to deal with conflicting metric names in https://www.unitxt.ai/en/latest/docs/adding_metric.html#metric-outputs-with-multiple-metrics
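The per-instance sampling described above can be pictured with a small, library-independent sketch. It is not Unitxt's implementation: the helper name `assign_template_and_demos`, its `seed` parameter, and the template names are all hypothetical and only illustrate the idea of drawing one template and one demo count per instance.

```python
import random


def assign_template_and_demos(instances, templates, num_demos_options, seed=0):
    """Hypothetical helper: pick a random template and demo count per instance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    assigned = []
    for instance in instances:
        assigned.append(
            {
                **instance,
                "template": rng.choice(templates),
                "num_demos": rng.choice(num_demos_options),
            }
        )
    return assigned


# Placeholder names, not real catalog entries.
templates = ["templates.a", "templates.b"]
instances = [{"text": f"example {i}"} for i in range(5)]

result = assign_template_and_demos(instances, templates, [0, 1, 3])
for row in result:
    print(row["text"], row["template"], row["num_demos"])
```

For the actual API usage, the linked example script is the authoritative reference.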
Non backward compatible changes in catalog
- Change RAG metrics naming convention (e.g. "metrics.rag.mrr" -> "metrics.rag.context_correctness.mrr") by @assaftibm in #1104
- Update summarization task and templates to support multiple reference summaries - by @yoavkatz in #1126
- Fix belebele due to new convention by @elronbandel in #1145
Additions to catalog
- Add DeepSeek-Coder format and system prompt by @oktie in #1105
- Add a metric to calculate the ratio of references included in the prediction by @marukaz in #1091
- Add RAG bge metrics by @assaftibm
New Features
- Add option to run multiple templates and/or num_demos in a single dataset recipe. It is now possible to give a list of templates or num_demos; Unitxt randomly samples from the list and assigns each instance a random template. by @elronbandel in #1110
- A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it, by @dafnapension in #1124
- The MetricPipeline field postpreprocess_steps has been renamed to postprocess_steps. The old field (postpreprocess_steps) still exists for backward compatibility but is deprecated. by @dafnapension in #1117
- Decrease runtime of demo examples
- Add tests for RAG metrics by @matanor
- Adding dedicated Unitxt warning and error classes to link online documentation by @yoavkatz in
- The code now uses a central controllable deepcopy function by @elronbandel in #1120
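The score-name conflict warning listed above can be illustrated with a minimal sketch. This is an assumption about the general pattern, not Unitxt's code: `merge_metric_scores` is a hypothetical helper, and the warning text is invented for the example.

```python
import warnings


def merge_metric_scores(existing, new_scores, metric_name):
    """Hypothetical helper: merge scores, warning when a score name is overwritten."""
    merged = dict(existing)
    for score_name, value in new_scores.items():
        if score_name in merged:
            warnings.warn(
                f"Metric '{metric_name}' overwrites existing score '{score_name}'.",
                UserWarning,
                stacklevel=2,
            )
        merged[score_name] = value
    return merged


# Capture warnings so the sketch can show how many were raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = merge_metric_scores(
        {"f1": 0.8}, {"f1": 0.5, "accuracy": 0.9}, "metric_b"
    )

print(scores)
print(len(caught))
```

Here the second metric reuses the name `f1`, so the earlier value is replaced and a single warning is recorded. The documentation linked under Main changes describes how to avoid such conflicts in practice.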
Bug Fixes
- Create a dedicated nltk mixin for downloading all versions of punkt, which are needed by the metrics code. by @elronbandel in #1151
- For bulk instance metrics, replace the mean function with nanmean to support aggregation when some scores are NaN. by @elronbandel in #1150
- Fix helm test by @elronbandel in #1109
- Fix bug with RAG metrics: Fix use of minilm model by @assaftibm in #1115
- Fix data classification of WML model to include 'public' classification by @yoavkatz in #1118
- Fix WMLInferenceEngine by @pawelknes in #1122
- Fix belebele HF path due to new convention by @elronbandel in #1145
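To show why the switch from mean to nanmean matters, here is a pure-Python sketch of the behavior. It is a stand-in for a nanmean-style function (such as `numpy.nanmean`), not the code used in Unitxt; the local `nanmean` helper and the sample scores are assumptions made for illustration.

```python
import math


def nanmean(values):
    """Mean over non-NaN values; NaN if there are no valid values."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


scores = [1.0, float("nan"), 0.5]

# A plain mean propagates NaN: one bad score poisons the aggregate.
plain_mean = sum(scores) / len(scores)

# A NaN-aware mean ignores the missing score and averages the rest.
robust_mean = nanmean(scores)

print(plain_mean, robust_mean)
```

With a plain mean, a single NaN instance score makes the whole aggregate NaN; the NaN-aware version averages only the valid scores.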
Documentation changes
- Improve debugging.rst wording
- Improve examples.rst wording by @welisheva22 in #1138
- Improve data_classification_policy.rst wording by @welisheva22 in #1139
- Improve rag_support.rst wording by @welisheva22 in #1139
- Improve production.rst wording by @welisheva22 in #1148
- Improve the clarity of the code examples.
- Improve load_datasets.rst wording by @welisheva22
- Improve introduction.rst wording by @welisheva22
- Improve installation.rst wording by @welisheva22
- Improve adding_format.rst wording by @welisheva22
- Improve adding_task.rst wording by @welisheva22
- Improve adding_template.rst wording by @welisheva22
- Improve adding_dataset.rst wording by @hanansinger
- Improve index.rst page by @yoavkatz
- Fix link to llama blog in adding_format.rst by @andersonm-ibm in #1113
- Added example of RAG response by @yoavkatz in #1121
New Contributors
- @andersonm-ibm made their first contribution in #1113 by @welisheva22 in #1152