This project runs debates between agents in a system called Agent Arena and evaluates the agents' reasoning capabilities. The workflow first selects the best-performing agents automatically, then has users evaluate the selected agents in order to identify the best reasoning agent.
General Debate in Agent Arena:
- Multiple agents engage in a debate.
- The debate follows structured argumentation or free-form reasoning, depending on the setup (a minimal round loop is sketched after this list).
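The debate protocol itself is not specified here, so the following is only a minimal sketch of a structured, turn-based debate in Python. The `Agent` class and its `respond()` method are hypothetical placeholders, not the Agent Arena API.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical debater; a real agent would wrap an LLM or reasoning engine."""
    name: str

    def respond(self, topic: str, transcript: list[str]) -> str:
        # Placeholder argument; real agents would condition on the transcript so far.
        return f"{self.name} argues about '{topic}' (turn {len(transcript) + 1})"


def run_debate(agents: list[Agent], topic: str, rounds: int = 3) -> list[str]:
    """Structured debate: each agent speaks once per round, in a fixed order."""
    transcript: list[str] = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent.respond(topic, transcript))
    return transcript


if __name__ == "__main__":
    debaters = [Agent("Agent A"), Agent("Agent B"), Agent("Agent C")]
    for line in run_debate(debaters, "Is top-down planning better than bottom-up?"):
        print(line)
```

A free-form setup would replace the fixed round loop with a turn-taking policy decided at runtime rather than a strict rotation.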
Evaluation:
- After the debate, each agent's performance is evaluated against a set of predefined Grading Notes. These notes define the criteria or metrics that quantify the quality of reasoning exhibited by each agent (a minimal scoring sketch follows this list).
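The actual criteria and weights are not given in this description; the sketch below assumes illustrative criteria ("logical consistency", "use of evidence", "rebuttal quality") and a simple weighted-average score on a 0-1 scale.

```python
from dataclasses import dataclass


@dataclass
class GradingNote:
    criterion: str   # e.g. "logical consistency"; criteria here are assumptions
    weight: float    # relative importance of this criterion


GRADING_NOTES = [
    GradingNote("logical consistency", 0.4),
    GradingNote("use of evidence", 0.3),
    GradingNote("rebuttal quality", 0.3),
]


def score_agent(ratings: dict[str, float], notes: list[GradingNote]) -> float:
    """Weighted average of per-criterion ratings, each rating on a 0-1 scale."""
    total_weight = sum(note.weight for note in notes)
    return sum(ratings.get(note.criterion, 0.0) * note.weight for note in notes) / total_weight


# Example: ratings produced by an automated grader for one agent.
print(score_agent({"logical consistency": 0.9, "use of evidence": 0.7, "rebuttal quality": 0.8},
                  GRADING_NOTES))
```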
Best K Agents Selected:
- Based on the Grading Notes evaluation, the top K agents are selected (a one-line selection sketch follows this list).
- These selected agents then move on to evaluation by users.
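Top-K selection over the Grading Notes scores reduces to a sort; the scores and the value of K below are made up for illustration.

```python
def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the names of the K highest-scoring agents."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


grading_scores = {"Agent A": 0.83, "Agent B": 0.91, "Agent C": 0.77, "Agent D": 0.88}
print(select_top_k(grading_scores, k=2))  # -> ['Agent B', 'Agent D']
```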
User Evaluation:
- In the User Evaluation phase, human evaluators assess the reasoning quality of the selected agents in a hands-on manner (a small aggregation sketch follows).
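How user feedback is collected and aggregated is not described; the sketch below assumes simple 1-5 ratings that are averaged per agent and normalised to the same 0-1 scale as the Grading Notes score.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user-evaluation records: (agent name, rating on a 1-5 scale).
votes = [
    ("Agent B", 5), ("Agent B", 4), ("Agent D", 3),
    ("Agent D", 4), ("Agent B", 4), ("Agent D", 5),
]


def aggregate_user_ratings(votes: list[tuple[str, int]]) -> dict[str, float]:
    """Average each agent's user ratings and normalise to a 0-1 scale."""
    by_agent: dict[str, list[int]] = defaultdict(list)
    for agent, rating in votes:
        by_agent[agent].append(rating)
    return {agent: mean(ratings) / 5 for agent, ratings in by_agent.items()}


print(aggregate_user_ratings(votes))  # e.g. {'Agent B': 0.87, 'Agent D': 0.8}
```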
Outcome:
- The final outcome is the identification of the Best Reasoning Agent, which stands out based on both the objective evaluation (Grading Notes) and user feedback (a simple score combination is sketched below).
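The description does not say how the two signals are combined, so the 50/50 weighting below is purely an assumption for illustration; any monotone combination of the two scores would fit the described workflow.

```python
def best_reasoning_agent(
    grading_scores: dict[str, float],
    user_scores: dict[str, float],
    grading_weight: float = 0.5,  # assumed weighting, not a project-defined value
) -> str:
    """Pick the agent with the highest weighted blend of objective and user scores."""
    combined = {
        agent: grading_weight * grading_scores[agent]
               + (1 - grading_weight) * user_scores.get(agent, 0.0)
        for agent in grading_scores
    }
    return max(combined, key=combined.get)


# Only the top-K agents reach this stage, so both dicts cover the same shortlist.
print(best_reasoning_agent({"Agent B": 0.91, "Agent D": 0.88},
                           {"Agent B": 0.87, "Agent D": 0.80}))  # -> 'Agent B'
```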
Overall, the project aims to identify and highlight the agent that excels at logical argumentation and critical thinking, as determined by the combination of automated (Grading Notes) and user evaluations.