First, we need to establish a reliable evaluation system.

  • Compare the different evaluation methods:
    • Evaluate the metric prompts for Context Precision, Context Recall, Factual Correctness, and Semantic Similarity using t-test validation (three runs each).
    • Iterate on each prompt to stabilize that metric's score.
    • We can also use another prompt to evaluate the different prompts.
  • For different tasks we need to combine different metrics into different final scores, so we will construct a formula to calculate the final score.
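The evaluation steps above can be sketched in Python. All numbers, metric weights, and run scores below are hypothetical placeholders; the real weights would come from the formula we construct per task.

```python
import statistics

# Sketch of the evaluation step: run each metric prompt three times,
# check stability via the standard deviation of the runs, then combine
# per-metric means with hypothetical task-specific weights.

def stability(runs: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)

def final_score(means: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted combination of per-metric mean scores (weights sum to 1)."""
    return sum(means[name] * w for name, w in weights.items())

# Hypothetical scores from three repeated runs of each metric prompt.
runs = {
    "context_precision": [0.80, 0.82, 0.84],
    "context_recall": [0.74, 0.75, 0.76],
    "factual_correctness": [0.90, 0.90, 0.90],
    "semantic_similarity": [0.87, 0.88, 0.89],
}
means = {name: stability(r)[0] for name, r in runs.items()}

# Hypothetical weights for a question-answering-style task.
qa_weights = {
    "context_precision": 0.2,
    "context_recall": 0.2,
    "factual_correctness": 0.4,
    "semantic_similarity": 0.2,
}
print(round(final_score(means, qa_weights), 3))
```

A small standard deviation across the three runs would indicate the prompt has stabilized; a large one means further prompt iteration is needed.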

Second, we can generate results with different retrieval methods:

  • Embedding Search
  • BM25
  • Hybrid
  • GraphRAG
  • Agent
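The hybrid method, for instance, fuses a lexical ranking (BM25) with a dense ranking (embedding search). A minimal sketch using Reciprocal Rank Fusion (RRF), with hypothetical document IDs; RRF is one common fusion choice, not necessarily the one we will adopt:

```python
# Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.
# Each document scores 1 / (k + rank + 1) per list it appears in; k = 60
# is the commonly used default constant.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d2"]       # hypothetical lexical results
embedding_ranking = ["d3", "d4", "d1"]  # hypothetical dense results
print(rrf([bm25_ranking, embedding_ranking]))  # ['d3', 'd1', 'd4', 'd2']
```

Documents ranked highly by both retrievers (here `d3`) float to the top, which is the main appeal of hybrid retrieval.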

We then need to test the performance of each method on different datasets.
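The benchmark is a grid over methods and datasets, which can be sketched as follows. The dataset names are hypothetical, and `run_method` is a stub standing in for a real retrieve-generate-evaluate pipeline:

```python
# Sketch of the benchmark grid: score every retrieval method on every dataset.
def run_method(method: str, dataset: str) -> float:
    """Placeholder: would retrieve contexts, generate answers, and score them."""
    return 0.0  # stub score

methods = ["embedding_search", "bm25", "hybrid", "graphrag", "agent"]
datasets = ["dataset_a", "dataset_b", "dataset_c"]  # hypothetical names

results = {(m, d): run_method(m, d) for m in methods for d in datasets}
print(len(results))  # one score per (method, dataset) pair: 5 x 3 = 15
```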

The conclusion of our paper will include:

  • The effects of parameters and prompts on the metric scores.
  • The formula for calculating the final score.
  • The performance of the different retrieval methods.