## Goals

Collect data on HyDE effectiveness with open-source LLMs.
## Tested LLMs

- Mistral 7B instruct v0.2
- Llama2 7B chat hf
- Zephyr-7B beta
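The HyDE setup being measured can be sketched as follows. `generate_hypothetical_docs` and `encode` below are stand-in stubs (the real runs use the LLMs above for generation and Contriever as the dense encoder); `iters` corresponds to the number of generated passages whose embeddings are averaged into the query vector.

```python
import numpy as np

def generate_hypothetical_docs(query: str, iters: int) -> list[str]:
    # Stand-in for an LLM call: in the real setup, each iteration asks the
    # model to write a passage that would answer the query.
    return [f"Hypothetical answer {i} to: {query}" for i in range(iters)]

def encode(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a dense encoder such as Contriever: a deterministic
    # pseudo-random unit vector keyed on the text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def hyde_query_vector(query: str, iters: int) -> np.ndarray:
    # HyDE: average the embeddings of the generated passages together
    # with the embedding of the original query.
    vecs = [encode(d) for d in generate_hypothetical_docs(query, iters)]
    vecs.append(encode(query))
    return np.mean(vecs, axis=0)

def search(query: str, corpus: list[str], iters: int = 8) -> list[str]:
    # Rank the corpus by inner product against the HyDE query vector.
    q = hyde_query_vector(query, iters)
    scores = [float(q @ encode(doc)) for doc in corpus]
    return [doc for _, doc in sorted(zip(scores, corpus), reverse=True)]
```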
## Benchmark Results

### Output Quality (Contriever)

| Model | iters | map | ndcg_cut_10 | recall_1000 |
|---|---|---|---|---|
| Llama2 7B chat hf | 1 | 0.3118 | 0.4728 | 0.7900 |
| Llama2 7B chat hf | 8 | 0.3722 | 0.5561 | 0.8185 |
| Mistral 7B instruct v0.2 | 1 | 0.3201 | 0.4918 | 0.8021 |
| Mistral 7B instruct v0.2 | 8 | 0.3725 | 0.5578 | 0.8319 |
| Zephyr-7B beta | 1 | 0.2368 | 0.3935 | 0.7286 |
| Zephyr-7B beta | 8 | 0.3613 | 0.5231 | 0.8196 |
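One plausible reading of the consistent `iters = 1` vs `iters = 8` gap is variance reduction: each generated passage gives a noisy estimate of the "ideal" answer embedding, and averaging several estimates lands closer to it. A minimal numpy illustration with synthetic vectors (not the actual Contriever embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 128
ideal = rng.normal(size=dim)  # the "true" answer embedding

def avg_estimate(n: int) -> np.ndarray:
    # Average n noisy samples of the ideal vector, mimicking averaging
    # the embeddings of n independently generated passages.
    return np.mean([ideal + rng.normal(size=dim) for _ in range(n)], axis=0)

trials = 50
err_1 = np.mean([np.linalg.norm(avg_estimate(1) - ideal) for _ in range(trials)])
err_8 = np.mean([np.linalg.norm(avg_estimate(8) - ideal) for _ in range(trials)])
print(err_1, err_8)  # averaging 8 samples gives a much smaller error
```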
### Output Quality (BM25)

| Model | iters | map | ndcg_cut_10 | recall_1000 |
|---|---|---|---|---|
| Llama2 7B chat hf | 1 | 0.3291 | 0.5293 | 0.8022 |
| Llama2 7B chat hf | 8 | 0.3463 | 0.5554 | 0.8272 |
| Mistral 7B instruct v0.2 | 1 | 0.3602 | 0.5238 | 0.8401 |
| Mistral 7B instruct v0.2 | 8 | 0.3678 | 0.5601 | 0.8380 |
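For the BM25 runs, a common way to use HyDE output with a lexical retriever is plain query expansion: concatenate the generated passages onto the original query and score documents with BM25 as usual. A self-contained sketch with a toy Okapi BM25 scorer (the actual runs presumably use a standard toolkit rather than this hand-rolled version):

```python
import math
from collections import Counter

def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 0.9, b: float = 0.4) -> list[float]:
    # Standard Okapi BM25 scoring over tokenized documents.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def hyde_bm25_query(query: str, hypothetical_docs: list[str]) -> list[str]:
    # Expand the lexical query with the tokens of every generated passage.
    expanded = query + " " + " ".join(hypothetical_docs)
    return expanded.lower().split()
```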
### 3-shot HyDE Output Quality (Contriever)

Only run on Mistral 7B instruct v0.2.

| Model | iters | map | ndcg_cut_10 | recall_1000 |
|---|---|---|---|---|
| Mistral 7B instruct v0.2 | 1 | 0.0328 | 0.0506 | 0.1050 |
| Mistral 7B instruct v0.2 | 8 | 0.0942 | 0.1612 | 0.2576 |
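The 3-shot variant presumably prepends worked query-to-passage examples to the generation prompt. A hypothetical prompt builder for that setup (the field names and layout here are illustrative; the exact template used in the runs may differ, and template choice could well explain the sharp drop above):

```python
def build_3shot_prompt(query: str, exemplars: list[tuple[str, str]]) -> str:
    # exemplars: (question, hypothetical passage) pairs; 3-shot uses three.
    parts = ["Write a short passage that answers the question.", ""]
    for q, passage in exemplars:
        parts += [f"Question: {q}", f"Passage: {passage}", ""]
    # The model is left to complete the final "Passage:" line.
    parts += [f"Question: {query}", "Passage:"]
    return "\n".join(parts)
```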
## Potential Next Steps

- Look into fine-tuned versions of the above LLMs for improved task-specific effectiveness, e.g. better code retrieval with code fine-tunes of Llama, Mistral, and MPT.
- HyDE + AlphaGeometry performance