Releases: instructlab/eval

v0.1.0

15 Jul 16:27

nathan-weinberg

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

What's Changed

Fixing up test case after api changes to add error_rate by @danmcp in #63
Inherit logging from caller rather than from vLLM by @danmcp in #66
Update batch size description and allow for str by @danmcp in #67
Don't set basicConfig from libraries by @danmcp in #69
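The two logging changes above (#66 and #69) match a common Python convention: a library logs through its own named logger and leaves handlers, levels and formats to the calling application. The following is a minimal sketch of that convention, not the project's actual code; the logger name `my_eval_lib` and both functions are hypothetical.

```python
import io
import logging

# Hypothetical library-side logger; the name "my_eval_lib" is illustrative only.
lib_logger = logging.getLogger("my_eval_lib")
# NullHandler keeps the library silent unless the caller configures logging.
lib_logger.addHandler(logging.NullHandler())


def run_evaluation() -> str:
    """Stand-in for a library function that logs but never calls logging.basicConfig()."""
    lib_logger.info("evaluation started")
    return "done"


def capture_library_logs() -> str:
    """Caller-side code: the application chooses the handler, level and format."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))
    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        run_evaluation()
    finally:
        # Restore the root logger so the example has no lasting side effects.
        root.removeHandler(handler)
        root.setLevel(previous_level)
    return stream.getvalue()
```

Because the library logger has no level of its own, it inherits the level set on the root logger by the caller, and its records propagate to whatever handlers the caller installed.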

Full Changelog: v0.0.9...v0.1.0

Contributors

danmcp

v0.0.9

12 Jul 14:48

danmcp

What's Changed

[mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
Error handling with sdg_path not found and invalid by @danmcp in #61
Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

Contributors

danmcp and alinaryan

v0.0.8

09 Jul 16:37

alinaryan

What's Changed

fix: Add specific error handling around git repo input by @danmcp in #52
Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

Contributors

danmcp

v0.0.7

08 Jul 23:16

danmcp

What's Changed

Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
Add missing license identifiers by @danmcp in #56
Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7

Contributors

danmcp and dependabot

v0.0.6

08 Jul 13:43

danmcp

What's Changed

Add MMLU tests by @nathan-weinberg in #27
feat: Include error rate in judgment results by @danmcp in #49

Full Changelog: v0.0.5...v0.0.6

Contributors

danmcp and nathan-weinberg

v0.0.5

03 Jul 15:09

nathan-weinberg

What's Changed

Remove task categories from mmlu tasklist by @alinaryan in #46
Add entry points for evaluator classes by @tiran in #45
e2e: Only run one job at a time for a given PR by @russellb in #47

New Contributors

@tiran made their first contribution in #45

Full Changelog: v0.0.4...v0.0.5

Contributors

russellb, tiran, and alinaryan

v0.0.4

01 Jul 15:35

alinaryan

What's Changed

e2e: Fix permissions error by @russellb in #34
Include qna_file in mt_bench_branch results by @danmcp in #33
Include task scores with mmlu results + adjust default api retries by @danmcp in #37
Bump lm-eval minimum version to 0.4.3 by @nathan-weinberg in #44
Allow first_n option for gen answers, fix return values with max_workers=1 and only print api errors on final failure by @danmcp in #41
Read question_id as a string to preserve precision by @danmcp in #42
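The last entry above (#42) concerns numeric identifiers losing precision. The release note does not say how the loss occurred; one common cause is a large integer ID being coerced to a 64-bit float, which cannot represent every integer above 2^53. The sketch below illustrates that general effect with the standard `json` module only; the record and the helper names are made up for illustration and are not taken from the project.

```python
import json

# Made-up record with an ID just above 2**53, the limit for exact float integers.
RAW = '{"question_id": 9007199254740993, "text": "example"}'


def id_via_float(raw: str) -> int:
    """Parse integers as floats, as a float-typed column would, then convert back."""
    record = json.loads(raw, parse_int=float)
    return int(record["question_id"])


def id_via_str(raw: str) -> str:
    """Parse integers as strings so every digit of the ID is kept as written."""
    record = json.loads(raw, parse_int=str)
    return record["question_id"]


if __name__ == "__main__":
    print(id_via_float(RAW))  # last digit differs from the input
    print(id_via_str(RAW))    # digits unchanged
```

Treating the ID as an opaque string also makes the intent explicit: it is a key for matching records, not a quantity to do arithmetic on.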

New Contributors

@russellb made their first contribution in #34

Full Changelog: v0.0.3...v0.0.4

Contributors

russellb, danmcp, and nathan-weinberg

v0.0.3

27 Jun 18:42

nathan-weinberg

What's Changed

Add e2e job to CI by @nathan-weinberg in #14
Add list of default MMLU tasks as a constant by @nathan-weinberg in #24

Full Changelog: v0.0.2...v0.0.3

Contributors

nathan-weinberg

v0.0.2

26 Jun 18:55

nathan-weinberg

What's Changed

Add MT-Bench and PR-Bench Support by @danmcp in #9
Implement MMLU_Evaluator.run() by @alinaryan in #10
Updating to > 1.0 openai by @danmcp in #17
mmlu branch run() complete by @alimaredia in #19
add cpu support for MMLU bench by @cdoern in #21

New Contributors

@danmcp made their first contribution in #9
@alinaryan made their first contribution in #10
@alimaredia made their first contribution in #19
@cdoern made their first contribution in #21

Full Changelog: v0.0.1...v0.0.2

Contributors

danmcp, alimaredia, and 2 other contributors

v0.0.1

18 Jun 03:32

nathan-weinberg

What's Changed

Fixes pyproject.toml by @bjhargrave in #4
Bump step-security/harden-runner from 2.8.0 to 2.8.1 by @dependabot in #5
Bump rojopolis/spellcheck-github-actions from 0.37.0 to 0.38.0 by @dependabot in #7
Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 by @dependabot in #8
Initial skeleton for Evaluator classes and exceptions by @nathan-weinberg in #6

New Contributors

@bjhargrave made their first contribution in #4

Full Changelog: v0.0.1rc1...v0.0.1

Contributors

bjhargrave, dependabot, and nathan-weinberg
