Playground: Side-by-Side Expert Evaluation #345
Conversation
Added dependencies for the free Font Awesome packages.
Adds an attribute (false by default) that allows hiding the detail header (References / unreferenced, Grading Criterion ID etc.) from inline feedback.
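A minimal sketch of what such a flag could look like; the prop name `hideDetailHeader` and the component shape below are assumptions for illustration, not the PR's actual code:

```tsx
import React from "react";

// Hypothetical component: the flag defaults to false, so the detail header
// (references / unreferenced, grading criterion ID, ...) stays visible
// unless a parent explicitly hides it.
type InlineFeedbackProps = {
  text: string;
  hideDetailHeader?: boolean;
};

function InlineFeedback({ text, hideDetailHeader = false }: InlineFeedbackProps) {
  return (
    <div>
      {!hideDetailHeader && <header>{/* detail header content */}</header>}
      <p>{text}</p>
    </div>
  );
}

export default InlineFeedback;
```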
Adds the expert view and its auxiliary data, plus a temporary auxiliary button for accessing the expert view during testing.
Playground: Side-by-Side Expert Evaluation (WIP)
- Replaced sanitize-html with Markdown rendering for metric descriptions.
- Simplified the process of editing descriptions by allowing Markdown syntax.
- Improved visualization of descriptions in popups and during the editing process, enhancing the user experience and readability.
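The general pattern behind this change, sketched with `react-markdown` as the renderer (an assumption; the PR may use a different Markdown library):

```tsx
import React from "react";
import ReactMarkdown from "react-markdown";

// The stored description is plain Markdown source; rendering it directly
// replaces the previous sanitize-html step.
function MetricDescription({ description }: { description: string }) {
  return <ReactMarkdown>{description}</ReactMarkdown>;
}

export default MetricDescription;
```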
- Refactored the Metric type, which was previously defined in two separate locations, to consistently use the Metric type from the model.
- Moved the ExpertEvaluationConfig definition from the evaluation_management component to the model, enabling reuse across multiple components for better maintainability and consistency.
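A rough sketch of the centralized model types; the exact fields are assumptions based on the surrounding discussion:

```ts
// playground/src/model/metric.ts (sketch, not the actual file contents)
export type Metric = {
  title: string;
  description: string; // Markdown source, see the rendering change above
};

// Previously defined inside the evaluation_management component; living in
// the model lets multiple components share one definition.
export type ExpertEvaluationConfig = {
  id: string;
  name: string;
  metrics: Metric[];
};
```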
- Fixes an issue where switching the ExpertEvaluationConfig via the dropdown did not correctly update the selected exercises.
- Refactors the ExpertEvaluationExerciseSelect component to simplify its structure by moving data fetching and error handling to the child component.
- Ensures proper synchronization of selected exercises between the parent and child components.
- Implements better separation of concerns by letting the parent manage state while the child handles fetching and rendering exercises (see the sketch after this list).
- Adds multi-exercise selection with clear communication between the parent and child components.

This commit improves code clarity, ensures reliable state management, and enhances the user experience during config switching and exercise selection.
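A compressed sketch of the described split, with assumed prop names: the parent owns the selection state, while the child fetches exercises, handles errors, and reports selection changes upward:

```tsx
import React, { useState } from "react";

// Child: fetches and renders exercises, reports multi-selection changes.
function ExpertEvaluationExerciseSelect({
  selectedExerciseIds,
  onChange,
}: {
  selectedExerciseIds: number[];
  onChange: (ids: number[]) => void;
}) {
  // ...fetch exercises, handle errors, render checkboxes that call onChange...
  return null; // rendering elided in this sketch
}

// Parent: owns the state, so switching configs can reset it reliably.
function EvaluationManagement() {
  const [selectedExerciseIds, setSelectedExerciseIds] = useState<number[]>([]);
  return (
    <ExpertEvaluationExerciseSelect
      selectedExerciseIds={selectedExerciseIds}
      onChange={setSelectedExerciseIds}
    />
  );
}
```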
- Added a creationDate attribute to better distinguish configs with the same name.
- Added eager loading of submissions and categorizedFeedbacks in multiple_exercises_select for simpler access.
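For illustration, a hypothetical dropdown label that uses the new attribute (the helper name and format are made up):

```ts
// Two configs named "Evaluation" stay distinguishable once the creation
// date is part of the label shown in the config dropdown.
function configLabel(config: { name: string; creationDate: Date }): string {
  return `${config.name} (${config.creationDate.toLocaleString()})`;
}
```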
- Made the header in the expert view of the side-by-side tool sticky, allowing experts to navigate more efficiently without needing to scroll back to the top after each evaluation.
- Reduced the header's footprint, giving more screen space for critical evaluation features and improving overall usability.
- Enhanced the visual appeal by adding a cleaner, more functional progress bar for tracking evaluation progress.
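The sticky behavior itself needs little more than `position: sticky`; a sketch with inline styles (the PR likely styles this differently):

```tsx
import React from "react";

// Header stays pinned to the top while the expert scrolls through
// submissions; the progress bar tracks evaluation progress.
function EvaluationHeader({ progress }: { progress: number }) {
  return (
    <header style={{ position: "sticky", top: 0, zIndex: 10 }}>
      <progress value={progress} max={100} />
    </header>
  );
}
```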
Once an evaluation is started, the user can no longer change it.
Enables researchers to create links for the expert evaluation. Experts can access the side-by-side tool via these links.
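One plausible shape for such links is a route containing the config ID plus a per-expert token; the route and parameter names below are purely illustrative assumptions:

```ts
// Hypothetical helper: builds a shareable expert link. The actual route
// and token scheme in the PR may differ.
function buildExpertLink(baseUrl: string, configId: string, expertToken: string): string {
  return `${baseUrl}/expert_evaluation/${configId}?expert=${encodeURIComponent(expertToken)}`;
}
```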
…um/Athena into feature/side-by-side-tool
When an experiment is started, no new metrics can be added. Therefore, the new metric form is now hidden accordingly.
- If disabled, buttons are hidden from the evaluation management rather than shown as disabled.
- Use sections instead of labels where fitting.
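This entry and the previous one follow the same pattern: once the experiment has started, controls are not rendered at all instead of being shown disabled. A sketch with assumed names:

```tsx
import React from "react";

// Nothing is rendered once the experiment has started: the new-metric form
// and the management buttons disappear rather than appearing greyed out.
function EvaluationManagementActions({ started }: { started: boolean }) {
  if (started) return null;
  return (
    <section>
      <form>{/* new metric form */}</form>
      <button>Add exercise</button>
    </section>
  );
}
```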
Changed title: Playground: Side-by-Side Expert Evaluation (WIP) → Playground: Side-by-Side Expert Evaluation
playground/package.json (Outdated)

```diff
@@ -14,6 +14,9 @@
   },
   "dependencies": {
     "@blueprintjs/core": "5.5.1",
+    "@fortawesome/fontawesome-svg-core": "^6.6.0",
+    "@fortawesome/free-solid-svg-icons": "^6.6.0",
+    "@fortawesome/react-fontawesome": "^0.2.2",
```
Please pin the dependencies
We have now pinned the dependencies 👍
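Pinning here means dropping the caret ranges so installs resolve to exactly the listed versions; a diff illustrating the idea, assuming the versions stayed the same:

```diff
 "dependencies": {
-    "@fortawesome/fontawesome-svg-core": "^6.6.0",
-    "@fortawesome/free-solid-svg-icons": "^6.6.0",
-    "@fortawesome/react-fontawesome": "^0.2.2",
+    "@fortawesome/fontawesome-svg-core": "6.6.0",
+    "@fortawesome/free-solid-svg-icons": "6.6.0",
+    "@fortawesome/react-fontawesome": "0.2.2",
```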
playground/src/model/data_mode.ts (Outdated)

```diff
@@ -1 +1 @@
-export type DataMode = "example" | "evaluation" | string;
+export type DataMode = "example" | "evaluation" | "expert_evaluation" | "expert_evaluation/exercises" | string;
```
Why do you have two different modes here instead of just one?
Good catch! This was part of a previous iteration for storing evaluations. It is not needed anymore. We have now removed it 👍
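With the extra mode removed, the type presumably ends up as follows (inferred from the thread, not verified against the final diff):

```ts
export type DataMode = "example" | "evaluation" | "expert_evaluation" | string;
```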
Just implement the comments from above; otherwise this PR looks good to me. I also tested again on the test server and could not see any issues with it. Good job!
Please incorporate the changes requested by @FelixTJDietrich.
Otherwise, it looks good! Great job 👍
9d9c852
Motivation and Context
By GPT-4o and Suno (sound on 🎶)
Side.by.Side.-.SD.480p.mov
The primary motivation behind this tool is to create a robust and versatile benchmark for evaluating feedback on student submissions across specific use cases.
Description
This PR introduces the Side-by-Side Evaluation Tool, designed to assist experts in evaluating feedback provided on student submissions. It is especially useful for researchers seeking to assess the quality and relevance of feedback across multiple criteria and with multiple evaluators. The Side-by-Side Tool provides two views: a researcher view for importing exercises, defining metrics, and creating expert links, and an expert view for conducting the evaluations.
Steps for Testing
Testserver States
Note
These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.
Test Data
Use the following exercises for testing:
exercise-5000.json
exercise-800.json
exercise-700.json
exercise-123.json
Screenshots
Researcher View - Imported Exercises and Metrics
Researcher View - Defining Metrics and creating expert links
Expert View - Welcome Screen
Expert View - Evaluation View