Releases: instructlab/eval
v0.1.0
What's Changed
- Fixing up test case after api changes to add error_rate by @danmcp in #63
- Inherit logging from caller rather than from vLLM by @danmcp in #66
- Update batch size description and allow for str by @danmcp in #67
- Don't set basicConfig from libraries by @danmcp in #69
Full Changelog: v0.0.9...v0.1.0
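The logging changes in this release (#66, #69) follow the usual Python convention that a library emits log records but leaves configuration to the calling application. The sketch below illustrates that convention only. The logger name `example_eval_lib` and the function `run_evaluation` are hypothetical and are not taken from instructlab/eval.

```python
import logging

# Library side (hypothetical module): create a named logger and attach a
# NullHandler. The library never calls logging.basicConfig() itself.
logger = logging.getLogger("example_eval_lib")
logger.addHandler(logging.NullHandler())


def run_evaluation(task):
    """Hypothetical library function that logs but does not configure logging."""
    logger.debug("starting evaluation for task=%s", task)
    logger.info("evaluation finished for task=%s", task)
    return {"task": task}


# Application side: the caller decides the level, format, and destination.
if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(name)s %(levelname)s %(message)s",
    )
    run_evaluation("demo")
```

Because the library logger has no level of its own, it inherits whatever the caller configures on the root logger. With the `INFO` level chosen above, the debug record is dropped and the info record is emitted.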
v0.0.9
v0.0.8
v0.0.7
What's Changed
- Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
- Add missing license identifiers by @danmcp in #56
- Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
- Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55
Full Changelog: v0.0.6...v0.0.7
v0.0.6
What's Changed
- Add MMLU tests by @nathan-weinberg in #27
- feat: Include error rate in judgment results by @danmcp in #49
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's Changed
- Remove task categories from mmlu tasklist by @alinaryan in #46
- Add entry points for evaluator classes by @tiran in #45
- e2e: Only run one job at a time for a given PR by @russellb in #47
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's Changed
- e2e: Fix permissions error by @russellb in #34
- Include qna_file in mt_bench_branch results by @danmcp in #33
- Include task scores with mmlu results + adjust default api retries by @danmcp in #37
- Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
- Allow first_n option for gen answers, fix return values with max_workers=1 and only print api errors on final failure by @danmcp in #41
- Read question_id as a string to preserve precision by @danmcp in #42
Full Changelog: v0.0.3...v0.0.4
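The precision fix in #42 can be illustrated with a generic example. This is a sketch of one plausible failure mode, not a description of how instructlab/eval parses its data: if a large integer ID ends up in a 64-bit float (as can happen when a data loader infers a numeric column type), its low digits can be lost, whereas a string keeps them exactly. The record below and its field values are made up for illustration.

```python
import json

# Hypothetical record. 2**53 + 1 is the smallest positive integer that a
# 64-bit float cannot represent exactly.
raw_line = '{"question_id": 9007199254740993, "score": 7}'

record = json.loads(raw_line)

# What a float-typed column would hold for this ID.
as_float = float(record["question_id"])

# Keeping the ID as a string preserves every digit.
as_string = str(record["question_id"])

print(int(as_float))  # 9007199254740992 (last digit changed)
print(as_string)      # 9007199254740993
```

Two distinct IDs that round to the same float would become indistinguishable, which is why treating such identifiers as strings is the safer default.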
v0.0.3
What's Changed
- Add e2e job to CI by @nathan-weinberg in #14
- Add list of default MMLU tasks as a constant by @nathan-weinberg in #24
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
- Add MT-Bench and PR-Bench Support by @danmcp in #9
- Implement MMLU_Evaluator.run() by @alinaryan in #10
- Updating to > 1.0 openai by @danmcp in #17
- mmlu branch run() complete by @alimaredia in #19
- add cpu support for MMLU bench by @cdoern in #21
New Contributors
- @danmcp made their first contribution in #9
- @alinaryan made their first contribution in #10
- @alimaredia made their first contribution in #19
- @cdoern made their first contribution in #21
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- Fixes pyproject.toml by @bjhargrave in #4
- Bump step-security/harden-runner from 2.8.0 to 2.8.1 by @dependabot in #5
- Bump rojopolis/spellcheck-github-actions from 0.37.0 to 0.38.0 by @dependabot in #7
- Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 by @dependabot in #8
- Initial skeleton for Evaluator classes and exceptions by @nathan-weinberg in #6
New Contributors
- @bjhargrave made their first contribution in #4
Full Changelog: v0.0.1rc1...v0.0.1