This repository includes the slides and some of the notebooks used in my Evaluation workshops.
Some of the notebooks require an OpenAI API key.
These notebooks are intended to illustrate key points of the talk; please don't take them to production use. If you want to dig deeper or run into issues, go to the source repository for each of these projects.
I will run an updated workshop in April 2025, so check back here for updates.
Prompting a Chatbot: Colab notebook
Testing Properties of a System: Guidance AI
Langtest tutorials from John Snow Labs: Colab Notebooks
LLM Evaluation Harness from EleutherAI: GitHub or Colab notebook (a minimal usage sketch follows this list)
Ragas showing Model as an evaluator: GitHub or Colab notebook (a minimal usage sketch follows this list)
Ragas using LangFuse: Colab notebook
Evaluate LLMs and RAG, a practical example using LangChain and Hugging Face: GitHub
MLFlow Automated Evaluation: Blog
LLM Grader on AWS: Video and Notebook
Argilla for Annotation: Spaces (login: admin, password: 12345678)
LLM AutoEval for RunPod by Maxime Labonne: Colab
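
To give a flavor of what the EleutherAI harness does, here is a minimal sketch of calling it from Python. It assumes lm-eval 0.4+ (`pip install lm-eval`) and its `simple_evaluate` entry point; the model name and task are placeholders you would swap for your own.

```python
# Minimal sketch of running the EleutherAI LLM Evaluation Harness from Python.
# Assumes lm-eval >= 0.4; the model and task below are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # small model for a quick run
    tasks=["hellaswag"],                             # any registered task name works
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.)
print(results["results"])
```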
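And here is a minimal sketch of the "model as an evaluator" idea behind Ragas, assuming the 0.1-style `ragas.evaluate` API and an OPENAI_API_KEY in the environment; the sample question, answer, and context are made up for illustration.

```python
# Minimal sketch of LLM-as-a-judge scoring with Ragas.
# Assumes ragas 0.1.x and an OPENAI_API_KEY in the environment; the data is illustrative.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

sample = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and most populous city of France."]],
}

# Each metric prompts an LLM to grade the answer against the question and contexts.
scores = evaluate(Dataset.from_dict(sample), metrics=[faithfulness, answer_relevancy])
print(scores)
```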
Generative AI Summit, Austin (Oct 2023) - Slides
ODSC West, San Francisco (Nov 2023) - Slides
Arize Holiday Conference (Dec 2023) - Slides
Data Innovation Conference (Apr 2024) - Slides
Evaluation for Large Language Models and Generative AI - A Deep Dive - YouTube
Constructing an Evaluation Approach for Generative AI Models - YouTube
Large Language Models (LLMs) Can Explain Their Predictions - YouTube & Slides
Josh Tobin's Evaluation talk: YouTube
LLM Evaluation Tooling Review