Releases: instructlab/eval

v0.1.0

15 Jul 16:27
ae6097f

What's Changed

  • Fixing up test case after api changes to add error_rate by @danmcp in #63
  • Inherit logging from caller rather than from vLLM by @danmcp in #66
  • Update batch size description and allow for str by @danmcp in #67
  • Don't set basicConfig from libraries by @danmcp in #69
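
The two logging changes above (#66 and #69) follow a common Python convention: a library only emits records through its own named logger and leaves handler and level configuration to the calling application. The sketch below illustrates that convention under stated assumptions. It is not the code from those pull requests, and the names `example_eval_lib`, `run_evaluation`, and `configure_app_logging` are hypothetical.

```python
import logging

# Library side (hypothetical module): use a named logger and never call
# logging.basicConfig(). The NullHandler keeps the library silent unless
# the application configures logging itself.
logger = logging.getLogger("example_eval_lib")
logger.addHandler(logging.NullHandler())


def run_evaluation(score: float) -> float:
    """Hypothetical library function that logs through the library logger."""
    logger.info("evaluation finished with score %.2f", score)
    return score


# Application side: the caller decides the format and level once, and the
# library's records propagate up to the root logger configured here.
def configure_app_logging(level: int = logging.INFO) -> None:
    logging.basicConfig(level=level, format="%(name)s %(levelname)s %(message)s")


if __name__ == "__main__":
    configure_app_logging()
    run_evaluation(0.75)
```

With this split, importing the library has no effect on the root logger, so the application's own logging setup is not overridden.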

Full Changelog: v0.0.9...v0.1.0

v0.0.9

12 Jul 14:48
5257e23

What's Changed

  • [mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
  • Error handling with sdg_path not found and invalid by @danmcp in #61
  • Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

v0.0.8

09 Jul 16:37
2a27715

What's Changed

  • fix: Add specific error handling around git repo input by @danmcp in #52
  • Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

v0.0.7

08 Jul 23:16
450acaf

What's Changed

  • Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
  • Add missing license identifiers by @danmcp in #56
  • Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
  • Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7

v0.0.6

08 Jul 13:43
6c537c5

What's Changed

Full Changelog: v0.0.5...v0.0.6

v0.0.5

03 Jul 15:09
4c89cc3

What's Changed

  • Remove task categories from mmlu tasklist by @alinaryan in #46
  • Add entry points for evaluator classes by @tiran in #45
  • e2e: Only run one job at a time for a given PR by @russellb in #47

New Contributors

  • @tiran made their first contribution in #45

Full Changelog: v0.0.4...v0.0.5

v0.0.4

01 Jul 15:35
7642cab

What's Changed

  • e2e: Fix permissions error by @russellb in #34
  • Include qna_file in mt_bench_branch results by @danmcp in #33
  • Include task scores with mmlu results + adjust default api retries by @danmcp in #37
  • Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
  • Allow first_n option for gen answers, fix return values with max_workers=1 and only print api errors on final failure by @danmcp in #41
  • Read question_id as a string to preserve precision by @danmcp in #42
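
The release notes do not say how precision was being lost for `question_id` (#42). One plausible cause, assumed here purely for illustration, is a large integer ID passing through a 64-bit float, which cannot represent every integer above 2^53. The sketch below shows that failure mode and one way to avoid it with the standard library's `json.loads(..., parse_int=str)`. The sample record is made up and this is not the project's actual loader.

```python
import json

# Hypothetical record with an ID just above 2**53 (9007199254740992).
raw = '{"question_id": 9007199254740993, "turns": ["hello"]}'

# Failure mode (assumed): the ID is converted to a float somewhere along
# the way, and the nearest representable float is a different integer.
lossy_id = int(float(json.loads(raw)["question_id"]))

# Safer: parse every JSON integer as a string so the digits are preserved.
# Note that parse_int applies to all integers in the document, not only
# to question_id.
record = json.loads(raw, parse_int=str)

if __name__ == "__main__":
    print(lossy_id)               # differs from the original ID
    print(record["question_id"])  # original digits, as a string
```

Treating the ID as an opaque string also avoids accidental arithmetic or reformatting of a value that is only ever used as a key.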

New Contributors

Full Changelog: v0.0.3...v0.0.4

v0.0.3

27 Jun 18:42
f79ce58

What's Changed

Full Changelog: v0.0.2...v0.0.3

v0.0.2

26 Jun 18:55
caa1e4c

What's Changed

New Contributors

Full Changelog: v0.0.1...v0.0.2

v0.0.1

18 Jun 03:32
6632002

What's Changed

  • Fixes pyproject.toml by @bjhargrave in #4
  • Bump step-security/harden-runner from 2.8.0 to 2.8.1 by @dependabot in #5
  • Bump rojopolis/spellcheck-github-actions from 0.37.0 to 0.38.0 by @dependabot in #7
  • Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 by @dependabot in #8
  • Initial skeleton for Evaluator classes and exceptions by @nathan-weinberg in #6

New Contributors

Full Changelog: v0.0.1rc1...v0.0.1