Evaluation Package

Step 1

Install dependencies

Run pip3 install -r setup.txt

Set up the bleurt package

git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .
cd ..
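
Optionally, you can confirm the bleurt package is importable with a quick scoring check. This is only a sanity-check sketch: it assumes a BLEURT checkpoint such as BLEURT-20 has already been downloaded and unpacked locally (the steps above do not cover this), and the checkpoint used by the evaluation script itself may differ.

# verify_bleurt.py -- optional sanity check, not part of the evaluation package
from bleurt import score

# Path to a locally downloaded BLEURT checkpoint (assumption: BLEURT-20).
checkpoint = "BLEURT-20"

scorer = score.BleurtScorer(checkpoint)
scores = scorer.score(
    references=["The cat sat on the mat."],
    candidates=["A cat was sitting on the mat."],
)
print(scores)  # one float per reference/candidate pair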

Step 2

Setting Up JSON Configuration for Model Evaluation

JSON Configuration

The JSON configuration consists of three main sections: model, data, and result.

  1. Model Configuration

In the model section, you need to provide your API key and specify the model name.

{
  "model": {
    "watsonxai_token": "Bearer Your-API-Key",
    "model_name": "google/flan-t5-xxl"
  },
  ...
}

Replace Your-API-Key with your actual API key.

  2. Data Configuration

The data section contains parameters related to the input data and evaluation.

{
  ...
  "data": {
    "data_path": "path/to/dataset",
    "question": "instruction",
    "context": "input",
    "idea_answer": "output",
    "q_num": 5
  },
  ...
}
  • data_path: Provide the path to the dataset file (e.g., CoQA.json).
  • question: Specify the column or field name in the dataset that contains the questions.
  • context: Specify the column or field name in the dataset that contains the context or input information.
  • idea_answer: Specify the column or field name in the dataset that contains the ideal answers for evaluation.
  • q_num: Specify the number of questions to be evaluated from the dataset.
  3. Result Configuration

The result section is used to define the file where the evaluation results will be saved.

{
  ...
  "result": {
    "result_file": "path/to/result-file.csv"
  }
}
  • result_file: Provide the path to the CSV file where the evaluation results will be stored.

Naming convention:

modelname_sourcedata_retriever_reranker_evaluatedOn.csv

Examples:
  • flan-t5-xxl_excludeRedbooks_ES_colBERT_IBMTest.csv
  • flan-t5-xxl_passageAvailable_NA_NA_QuAC.csv
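
Putting the three sections together, a complete configuration file might look like the following. The paths and file names here are illustrative; check eval_script.py for the configuration file name it actually reads.

{
  "model": {
    "watsonxai_token": "Bearer Your-API-Key",
    "model_name": "google/flan-t5-xxl"
  },
  "data": {
    "data_path": "path/to/CoQA.json",
    "question": "instruction",
    "context": "input",
    "idea_answer": "output",
    "q_num": 5
  },
  "result": {
    "result_file": "path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv"
  }
}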

Step 3

Run the evaluation script

python eval_script.py

Once the evaluation run finishes, the results are written to the path provided in result_file.
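
A quick way to inspect the generated CSV is shown below. This is a minimal sketch; the exact column layout depends on what eval_script.py writes.

import pandas as pd

# Path must match the result_file value from the JSON configuration.
results = pd.read_csv("path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv")
print(results.head())      # first few evaluated questions
print(results.describe())  # summary statistics of the numeric metric columns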

Finally, make sure your result path is connected with your MLflow results.
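
One possible way to do this is to log the result CSV as an MLflow artifact. This is a sketch only: it assumes an MLflow tracking server is already configured (or that a local ./mlruns store is acceptable), and the experiment and run names below are placeholders.

import mlflow

# Assumes MLFLOW_TRACKING_URI is set, or that a local ./mlruns store is fine.
mlflow.set_experiment("rag-evaluation")  # placeholder experiment name

with mlflow.start_run(run_name="flan-t5-xxl_passageAvailable_NA_NA_CoQA"):
    # Attach the evaluation output so it is stored alongside the run.
    mlflow.log_artifact("path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv")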