Final evaluation notebook #97

menamerai · 2024-03-31T17:26:47Z

Specify the model names (e.g. llama2_100k, llama2_200k, ..., or a list of different checkpoints)
Load the next logprobs data (for each of the model names)
→ for now we load only a slice of the dataset (to develop faster)
Specify the tokens we are interested in (a sort of dynamic category)
→ using Rai’s GUI (needs to be tested on integration; for now, just try using selected_ids as a dynamic variable and see how it turns out)
Compute the statistics for the plot (copy from end2end.ipynb, currently commented out)
→ these calculate the percentiles for a group of tokens (the group should be arbitrary)
Plot: using the function in delphi.eval.vis_per_token_model
Specify 2 models for comparison
Get prompt examples (Rai’s notebook model_diff.ipynb)
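
A minimal sketch of the percentile step, to make the plan concrete. It is not the code from end2end.ipynb and not part of the delphi API: the function name token_group_percentiles, the array shapes, and the toy values are assumptions for illustration only. It also assumes that "next logprobs" means one log-probability per token position and that selected_ids is a plain collection of token ids.

```python
import numpy as np


def token_group_percentiles(tokens, logprobs, selected_ids, qs=(25, 50, 75)):
    """Percentiles of logprobs over positions whose token id is in selected_ids.

    Hypothetical helper: tokens and logprobs are assumed to be arrays of the
    same shape (documents x positions). Returns None if no position matches.
    """
    tokens = np.asarray(tokens)
    logprobs = np.asarray(logprobs, dtype=float)
    # Boolean mask of positions that belong to the selected token group.
    mask = np.isin(tokens, list(selected_ids))
    if not mask.any():
        return None
    return np.percentile(logprobs[mask], qs)


# Toy stand-in for a small slice of the dataset (2 documents, 4 positions).
tokens = np.array([[1, 2, 3, 2],
                   [4, 2, 1, 3]])

# One logprobs array per model name; the values are made up.
base = np.array([[-1.0, -2.0, -3.0, -4.0],
                 [-5.0, -6.0, -7.0, -8.0]])
logprobs_by_model = {
    "llama2_100k": base,
    "llama2_200k": base + 0.5,
}

# The "dynamic category": any arbitrary group of token ids.
selected_ids = [2, 3]

stats = {
    name: token_group_percentiles(tokens, lp, selected_ids)
    for name, lp in logprobs_by_model.items()
}
for name, values in stats.items():
    print(name, values)
```

Keeping the token group as an ordinary argument means the same function works whether the ids come from the GUI or are hard-coded while developing on a slice.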


menamerai · 2024-03-31T17:28:08Z

The notebook will be called eval_notebook.ipynb for now. Working on the first 2 bullet points today. I'll try to test this on Colab as well.

Post-note: did the first two bullets and got started on the third. Unsure whether it will work with the current ecosystem; to be tested when integrating with the visualization functions.

jettjaniak · 2024-04-03T20:03:33Z

we have a PR and a trello card, we don't need this issue

menamerai added eval vis visualizations labels Mar 31, 2024

menamerai assigned siwei-li, menamerai and transcendingvictor Mar 31, 2024

menamerai linked a pull request Mar 31, 2024 that will close this issue

final evaluation notebook #98

Merged

jettjaniak closed this as not planned Apr 3, 2024
