Synth gen #19

Merged
merged 8 commits into from
Dec 13, 2024

matthewcoole
Collaborator

Synthetic test set generation was initially done manually. This pull request adds a script to perform this process on the fly based on the data (metadata and supporting documentation) gathered from the EIDC.

The synthetic test set should not be changed regularly, since doing so would partly invalidate comparisons between evaluation runs over time. Test set regeneration should probably be performed only every few months, or after a period in which notable changes to the EIDC metadata or supporting documentation would be expected.

Before merging, this should be tested on the scicom cluster.
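
The "only every few months" guidance above could be enforced with a small guard before the generation script runs. This is a minimal sketch, not part of this pull request: the function name `should_regenerate` and the 90-day default are hypothetical, chosen only to illustrate the idea.

```python
from datetime import date, timedelta


def should_regenerate(
    last_generated: date,
    today: date,
    min_age: timedelta = timedelta(days=90),  # hypothetical stand-in for "every few months"
) -> bool:
    """Return True once the existing test set is at least `min_age` old."""
    return today - last_generated >= min_age


# Example: a test set generated on 1 September 2024 is old enough by 13 December 2024.
print(should_regenerate(date(2024, 9, 1), date(2024, 12, 13)))
```

A guard like this only covers the time-based condition; deciding whether the EIDC metadata has changed notably would still need a separate check or a manual decision.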

answer_correctness: 0.5301486449452893
context_recall: 0.48994649644992144
answer_relevancy: 0.507238666960303
context_precision: 0.5160380342683453

matthewcoole marked this pull request as ready for review December 13, 2024 09:26
matthewcoole merged commit 2ffcec5 into main Dec 13, 2024
1 check passed
matthewcoole deleted the synth-gen branch December 13, 2024 09:26

context_precision: 0.5173989459123115
answer_relevancy: 0.5339675674904192
answer_correctness: 0.5103777571005693
context_recall: 0.5014947906424193

answer_correctness: 0.5279516582668016
context_precision: 0.49952555821721517
answer_relevancy: 0.4706517310875107
context_recall: 0.5026754765893211
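
To compare the three sets of scores posted above, the per-metric mean and spread can be computed as below. This is a rough sketch that assumes each comment reports one evaluation run; the `runs` list and the `summarise` helper are illustrative names, not part of the repository.

```python
from statistics import mean

# Scores copied from the three comments above (assumed to be one run each).
runs = [
    {"answer_correctness": 0.5301486449452893, "context_recall": 0.48994649644992144,
     "answer_relevancy": 0.507238666960303, "context_precision": 0.5160380342683453},
    {"answer_correctness": 0.5103777571005693, "context_recall": 0.5014947906424193,
     "answer_relevancy": 0.5339675674904192, "context_precision": 0.5173989459123115},
    {"answer_correctness": 0.5279516582668016, "context_recall": 0.5026754765893211,
     "answer_relevancy": 0.4706517310875107, "context_precision": 0.49952555821721517},
]


def summarise(runs):
    """Return the mean and max-min spread of each metric across runs."""
    summary = {}
    for metric in sorted(runs[0]):
        values = [run[metric] for run in runs]
        summary[metric] = {"mean": mean(values), "spread": max(values) - min(values)}
    return summary


summary = summarise(runs)
for metric, stats in summary.items():
    print(f"{metric}: mean={stats['mean']:.4f} spread={stats['spread']:.4f}")
```

Such a comparison is only meaningful while the test set stays fixed between runs, which is the reason given in the description for regenerating it rarely.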
