This repository has been archived by the owner on May 17, 2024. It is now read-only.

Human evaluation should be applied to a portion of the dataset #55

Open
yujonglee opened this issue Aug 29, 2023 · 0 comments

Comments

@yujonglee
Owner

  • Conflict resolver - if consensus fails, we should collect those cases and hand them to a human
  • Vibe check https://www.latent.space/p/mosaic-mpt-7b

    The vibe-based eval cannot be underrated. … One of our evals was just having a bunch of prompts and watching the answers as the models trained and see if they change. Honestly, I don’t really believe that any of these eval metrics capture what we care about. One of our prompts was “suggest games for a 3-year-old and a 7-year-old to play” and that was a lot more valuable to see how the answer changed during the course of training. — Jonathan Frankle

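A minimal sketch of the conflict-resolver idea above, assuming majority voting over automated judge labels (all names here, e.g. `resolve` and `human_queue`, are hypothetical, not part of this repo):

```python
# Hypothetical sketch: accept unanimous judge labels, route the rest to humans.
from collections import Counter

def resolve(item, judge_labels, human_queue):
    """Return the consensus label if all judges agree; otherwise
    append the item to human_queue for manual evaluation."""
    counts = Counter(judge_labels)
    label, votes = counts.most_common(1)[0]
    if votes == len(judge_labels):  # unanimous consensus
        return label
    human_queue.append(item)        # consensus failed -> hand to human
    return None

queue = []
print(resolve("q1", ["pass", "pass", "pass"], queue))  # pass
print(resolve("q2", ["pass", "fail", "pass"], queue))  # None
print(queue)                                           # ['q2']
```

The same loop could double as the vibe check: run a fixed set of prompts at each checkpoint and surface the disagreeing (or changing) answers for a human to eyeball.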