More flexible evaluation options #36

Open
emilradix opened this issue Oct 22, 2024 · 0 comments
Labels: enhancement (New feature or request)

Comments

@emilradix (Contributor)

Right now the evaluation module uses ragas `evaluate` and always outputs the same four metrics: answer relevancy, faithfulness, context_recall, and context_precision.

Answer relevancy and faithfulness are interesting metrics; however, they are better suited to scenarios where no ground-truth answer is available.

I think we need a simple score that compares the ground-truth answer with the generated answer, as an alternative to those two. Since we do have the ground truth, it is likely more informative, and it should be quite a bit faster. That can also easily be done with ragas; we just need to expose the option (see the sketch below).
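
As a rough sketch of what this could look like (assuming a ragas 0.1-style API, an OpenAI key in the environment, and a `datasets.Dataset` with `question`, `answer`, `contexts`, and `ground_truth` columns; the data is made up and `answer_correctness` is just one candidate metric):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, context_precision, context_recall

# Hypothetical evaluation set; real data would come from the evaluation module.
eval_data = Dataset.from_dict({
    "question": ["What does the evaluation module output?"],
    "answer": ["It always outputs four ragas metrics."],
    "contexts": [["The evaluation module always runs four ragas metrics."]],
    "ground_truth": ["Four ragas metrics are always computed."],
})

# Score the generated answer against the ground truth directly,
# instead of relying on answer relevancy + faithfulness.
result = evaluate(
    eval_data,
    metrics=[answer_correctness, context_precision, context_recall],
)
print(result)
```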

Finally, right now you are forced to rerun answer_evals even when you only want to benchmark the search itself; we should make that step optional as well (see the retrieval-only sketch below).
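
A retrieval-only run could then pass just the context metrics, for example (same assumptions as above; exact required column names depend on the ragas version):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

# Retrieval-only benchmark: no generated answers are scored,
# so answer_evals would not need to be rerun.
retrieval_data = Dataset.from_dict({
    "question": ["What does the evaluation module output?"],
    "contexts": [["The evaluation module always runs four ragas metrics."]],
    "ground_truth": ["Four ragas metrics are always computed."],
})

result = evaluate(retrieval_data, metrics=[context_precision, context_recall])
print(result)
```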

@emilradix added the enhancement label on Oct 22, 2024