Merge pull request #24 from NERC-CEH/synth-eval
Synth eval
matthewcoole authored Dec 20, 2024
2 parents d9bd639 + 2b9c1b4 commit 8016933
Showing 6 changed files with 106 additions and 3 deletions.
1 change: 1 addition & 0 deletions data/.gitignore
@@ -17,3 +17,4 @@
 /eidc_rag_test_set.csv
 /rag-pipeline.yml
 /pipeline.yml
+/cleaned_testset.csv
11 changes: 9 additions & 2 deletions dvc.yaml
@@ -62,17 +62,24 @@ stages:
       - scripts/generate_synthetic_testset.py
     outs:
       - ${files.test-set}
+  evaluate-synthetic-testset:
+    cmd: uv run scripts/evaluate_synthetic_testset.py ${files.test-set} ${files.cleaned-test-set}
+    deps:
+      - ${files.test-set}
+      - scripts/evaluate_synthetic_testset.py
+    outs:
+      - ${files.cleaned-test-set}
   run-rag-pipeline:
     cmd: >-
       uv run scripts/run_rag_pipeline.py
-      -i ${files.test-set}
+      -i ${files.cleaned-test-set}
       -o ${files.eval-set}
       -ds ${files.doc-store}
       -c ${doc-store.collection}
       -m ${rag.model}
       -p ${files.pipeline}
     deps:
-      - ${files.test-set}
+      - ${files.cleaned-test-set}
       - ${files.doc-store}
       - scripts/run_rag_pipeline.py
     outs:
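With this stage wired in, the cleaned test set can be rebuilt on its own (assuming a standard DVC checkout of the repository):

    dvc repro evaluate-synthetic-testset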
1 change: 1 addition & 0 deletions params.yaml
@@ -13,6 +13,7 @@ files:
   embeddings: data/embeddings.json
   doc-store: data/chroma-data
   test-set: data/eidc_rag_testset.csv
+  cleaned-test-set: data/cleaned_testset.csv
   eval-set: data/evaluation_data.csv
   metrics: data/metrics.json
   eval-plot: data/eval.png
58 changes: 58 additions & 0 deletions prompts/synth-eval.txt
@@ -0,0 +1,58 @@
You are a helpful assistant.
Your task is to assess whether a given question is appropriate for use in evaluating a retrieval-augmented generation system.
The system being evaluated is designed to answer questions about environmental science datasets stored in a data centre called the Environmental Information Data Centre (EIDC).
The data centre stores metadata about these datasets, and this metadata is the information the system uses to answer the questions.
The metadata includes information about each dataset, such as the title, description, keywords, and supporting documentation that may provide a limited description of the data collection methods and data processing steps.
Assess each question against the following criteria:
1. Is the question clear?
2. If the question appears to be specific to one dataset, is it clear which dataset it refers to?
3. If the question is more general, is it clear what type of information is being requested?

You must also assess whether or not a pre-generated "ground truth" answer is reasonable. In some cases, no appropriate answer will have been generated because the question is not answerable from the provided context, in which case the question should be marked as not appropriate.

The question (along with the ground truth) will be provided in JSON format.
Examples:
{
    "question": "What specific parameters are recorded for each tree within the surveyed 10m x 10m plots in the 'UK Environmental Change Network (ECN) woodland vegetation data' dataset?",
    "ground_truth": "The specific parameters recorded for each tree within the surveyed 10m x 10m plots in the 'UK Environmental Change Network (ECN) woodland vegetation data' dataset are seedlings, diameter at breast height (dbh), height and species dominance."
}
{
    "question": "What are the specific types of structures and features included in the GIS shapefiles for the 'Building, infrastructure, and river channel reaches' dataset related to the Chamoli event?",
    "ground_truth": "The GIS shapefiles include information about bridges, buildings, roads, and river valleys."
}

Your response should be a JSON object containing the question and a boolean value indicating whether the question is appropriate or not.
The object should also include a reason for the assessment.

Examples:
{
    "question": "What was the average weed abundance across the surveyed lowland arable fields over the three-year period?",
    "ground_truth": "The answer to given question is not present in context",
    "appropriate": false,
    "reason": "The question appears to be referring to a specific dataset but it is not clear which one. The ground_truth does not give an appropriate answer."
}
{
    "question": "What specific parameters are recorded for each tree within the surveyed 10m x 10m plots in the 'UK Environmental Change Network (ECN) woodland vegetation data' dataset?",
    "ground_truth": "The specific parameters recorded for each tree within the surveyed 10m x 10m plots in the 'UK Environmental Change Network (ECN) woodland vegetation data' dataset are seedlings, diameter at breast height (dbh), height and species dominance.",
    "appropriate": true,
    "reason": "The question is clear and specific to a dataset and the ground truth provides a reasonable answer."
}
{
    "question": "What are the specific types of structures and features included in the GIS shapefiles for the 'Building, infrastructure, and river channel reaches' dataset related to the Chamoli event?",
    "ground_truth": "The GIS shapefiles include information about bridges, buildings, roads, and river valleys.",
    "appropriate": true,
    "reason": "The question is asking for specific information from a particular dataset and the ground_truth provides a reasonable answer."
}
{
    "question": "What were the earliest recorded first egg dates for blue tits (Cyanistes caeruleus) across the three woods between 1993 and 2014?",
    "appropriate": false,
    "reason": "The question appears to refer to a specific dataset by referencing three woods, but it is not clear which dataset."
}
{
    "question": "What are the estimated annual loads of nitrogen from non-agricultural sources to rivers in Scotland?",
    "ground_truth": "The answer to given question is not present in context",
    "appropriate": false,
    "reason": "The ground_truth does not provide an answer to the question."
}

The question to assess is:
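For reference, scripts/evaluate_synthetic_testset.py (below) appends the question to this prompt as a JSON object, so the model sees something like the following after the final line above (placeholder values, for illustration only):

    {
        "question": "<synthetic question from the test set>",
        "ground_truth": "<pre-generated answer, if present>"
    }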
35 changes: 35 additions & 0 deletions scripts/evaluate_synthetic_testset.py
@@ -0,0 +1,35 @@
import json
from argparse import ArgumentParser
from json import JSONDecodeError
from pathlib import Path

import ollama
import pandas as pd
from tqdm import tqdm


def main(input: str, output: str, model: str, prompt_file: str) -> None:
    df = pd.read_csv(input)
    prompt = Path(prompt_file).read_text()
    df["appropriate"] = False
    df["reason"] = ""
    for i, row in tqdm(df.iterrows(), total=len(df)):
        json_q = json.dumps({"question": row["question"], "ground_truth": row["ground_truth"]}, indent=4)
        response = ollama.generate(model=model, prompt=prompt + json_q)
        try:
            result = json.loads(response["response"])
            df.loc[i, "appropriate"] = result["appropriate"]
            df.loc[i, "reason"] = result["reason"]
        except JSONDecodeError:
            df.loc[i, "reason"] = "Error decoding response"
    df.to_csv(output, index=False)


if __name__ == "__main__":
    parser = ArgumentParser("evaluate_synthetic_data.py")
    parser.add_argument("eval_dataset", help="File containing the synthetic questions.")
    parser.add_argument("output", help="File to output the evaluated synthetic data.")
    parser.add_argument("-m", "--model", help="The model to use for evaluation.", default="mistral-nemo")
    parser.add_argument("-p", "--prompt-file", help="File containing the prompt to use for evaluation.", default="prompts/synth-eval.txt")
    args = parser.parse_args()
    main(args.eval_dataset, args.output, args.model, args.prompt_file)
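As invoked from dvc.yaml above, the script takes the raw and cleaned test-set paths positionally, with the model and prompt falling back to their argparse defaults (mistral-nemo and prompts/synth-eval.txt). A typical invocation, with paths taken from params.yaml and assuming a locally running Ollama server with the chosen model already pulled:

    uv run scripts/evaluate_synthetic_testset.py data/eidc_rag_testset.csv data/cleaned_testset.csv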
3 changes: 2 additions & 1 deletion scripts/run_rag_pipeline.py
@@ -120,8 +120,9 @@ def main(
         rag_pipe.dump(f)

     df = pd.read_csv(test_data_file)
+    df = df[df['appropriate'] == True]
     df.drop(
-        columns=["contexts", "evolution_type", "metadata", "episode_done"],
+        columns=["contexts", "evolution_type", "metadata", "episode_done", "appropriate", "reason"],
         inplace=True,
     )

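The added line keeps only the rows that the synthetic-evaluation step marked appropriate, and the helper columns are then dropped alongside the existing ones. A minimal sketch of the same filter-and-drop pattern on a toy frame (invented data, for illustration only):

    import pandas as pd

    df = pd.DataFrame(
        {
            "question": ["q1", "q2"],
            "appropriate": [True, False],
            "reason": ["clear question", "unclear dataset"],
        }
    )
    df = df[df["appropriate"] == True]  # equivalent to df[df["appropriate"]]
    df.drop(columns=["appropriate", "reason"], inplace=True)
    print(df)  # only the q1 row remains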
