Evaluation Package

Step 1

Install dependencies

Run pip3 install -r setup.txt

Set up the bleurt package

git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .
cd ..
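
Optionally, you can confirm the bleurt package is importable with a quick scoring check. This is only a sanity-check sketch: it assumes a BLEURT checkpoint such as BLEURT-20 has already been downloaded and unpacked locally (the steps above do not cover this), and the checkpoint used by the evaluation script itself may differ.

# verify_bleurt.py -- optional sanity check, not part of the evaluation package
from bleurt import score

# Path to a locally downloaded BLEURT checkpoint (assumption: BLEURT-20).
checkpoint = "BLEURT-20"

scorer = score.BleurtScorer(checkpoint)
scores = scorer.score(
    references=["The cat sat on the mat."],
    candidates=["A cat was sitting on the mat."],
)
print(scores)  # one float per reference/candidate pair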

Step 2

Setting Up JSON Configuration for Model Evaluation

JSON Configuration

The JSON configuration consists of three main sections: model, data, and result.

  1. Model Configuration

In the model section, you need to provide your API key and specify the model name.

{
  "model": {
    "watsonxai_token": "Bearer Your-API-Key",
    "model_name": "google/flan-t5-xxl"
  },
  ...
}

Replace Your-API-Key with your actual API key.

  2. Data Configuration

The data section contains parameters related to the input data and evaluation.

{
  ...
  "data": {
    "data_path": "path/to/dataset",
    "question": "instruction",
    "context": "input",
    "idea_answer": "output",
    "q_num": 5
  },
  ...
}
  • data_path: Provide the path to the dataset file (e.g., CoQA.json).
  • question: Specify the column or field name in the dataset that contains the questions.
  • context: Specify the column or field name in the dataset that contains the context or input information.
  • idea_answer: Specify the column or field name in the dataset that contains the ideal answers for evaluation.
  • q_num: Specify the number of questions to be evaluated from the dataset.
  3. Result Configuration

The result section is used to define the file where the evaluation results will be saved.

{
  ...
  "result": {
    "result_file": "path/to/result-file.csv"
  }
}
  • result_file: Provide the path to the CSV file where the evaluation results will be stored.

Naming convention:

modelname_sourcedata_retriever_reranker_evaluatedOn.csv

Examples:
  • flan-t5-xxl_excludeRedbooks_ES_colBERT_IBMTest.csv
  • flan-t5-xxl_passageAvailable_NA_NA_QuAC.csv
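
Putting the three sections together, a complete configuration file might look like the following. The paths and file names here are illustrative; check eval_script.py for the configuration file name it actually reads.

{
  "model": {
    "watsonxai_token": "Bearer Your-API-Key",
    "model_name": "google/flan-t5-xxl"
  },
  "data": {
    "data_path": "path/to/CoQA.json",
    "question": "instruction",
    "context": "input",
    "idea_answer": "output",
    "q_num": 5
  },
  "result": {
    "result_file": "path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv"
  }
}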

Step 3

Run the evaluation script

python eval_script.py

Once the evaluation run finishes, the results are written to the path provided in result_file.
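
A quick way to inspect the generated CSV is shown below. This is a minimal sketch; the exact column layout depends on what eval_script.py writes.

import pandas as pd

# Path must match the result_file value from the JSON configuration.
results = pd.read_csv("path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv")
print(results.head())      # first few evaluated questions
print(results.describe())  # summary statistics of the numeric metric columns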

Finally, make sure your result path is connected with your MLflow results.
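
One possible way to do this is to log the result CSV as an MLflow artifact. This is a sketch only: it assumes an MLflow tracking server is already configured (or that a local ./mlruns store is acceptable), and the experiment and run names below are placeholders.

import mlflow

# Assumes MLFLOW_TRACKING_URI is set, or that a local ./mlruns store is fine.
mlflow.set_experiment("rag-evaluation")  # placeholder experiment name

with mlflow.start_run(run_name="flan-t5-xxl_passageAvailable_NA_NA_CoQA"):
    # Attach the evaluation output so it is stored alongside the run.
    mlflow.log_artifact("path/to/flan-t5-xxl_passageAvailable_NA_NA_CoQA.csv")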