Commit `4fd8f48` (parent `5398f9e`): 17 changed files with 193 additions and 79 deletions.
@@ -0,0 +1,19 @@
## Consistency evaluation

::: consistence_eval

## Streamlit labelling app

::: labellingapp

### Merging labels

::: merge_labels

### Run Batch Training

::: run_all_training

### Evaluation Utils

::: evaluation_utils
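Each `:::` line here is presumably a mkdocstrings directive: when the docs are built, it is replaced by the rendered docstrings of the named module. As a sketch, with entirely hypothetical contents, `::: merge_labels` would pick up a module like this:

```python
# merge_labels.py (hypothetical contents): mkdocstrings renders the module
# docstring plus each public function's signature and docstring on this page.
"""Merge per-annotator label files into a single dataframe."""


def merge_labels(paths: list[str]) -> dict[str, list[int]]:
    """Combine label JSON files into a topic -> dataset-ID mapping."""
    merged: dict[str, list[int]] = {}
    # the real merging logic lives in the project's merge_labels module
    return merged
```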
@@ -0,0 +1,29 @@
# Evaluation of LLM models and techniques

## How to run

- Start the language server at the root of this repository with `./start_llm_service.sh`. This step is important; do not skip it.
- Run `python run_all_training.py` to train all models (fetch the data, create a vector store for each model, etc.).
- Run `python evaluate.py` to run all evaluations.
- Results are written to `./evaluation_results.csv` and `evaluation_results.png`.
## How to add a new evaluation

- It is pretty easy to add a new evaluation (see the sketch after this list).
- Note that `training_utils.py` already overloads some classes from the original training code, which means you can modify it to your heart's content without affecting the main code. Enjoy~
- Step 1: Find the method you want to override and overload that class/method in `training_utils.py`.
- Step 2: Add the corresponding if-statements in `class ExperimentRunner` to make sure you don't break everything else.
- Step 3: Follow the ExperimentRunner templates in `run_all_training.py` to register whatever you added in Step 2 as a new experiment.
  - Give it a descriptive custom name so it is easy to understand what it does.
  - Do not worry: experiments are cached and won't run again if they have already been run.
- Step 4: If you changed anything in the config, make sure you reset it afterwards. Since the file runs in one go, a leftover change would otherwise affect the following experiments.
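A minimal, self-contained sketch of Steps 1-3. Apart from `ExperimentRunner`, `training_utils.py`, and `run_all_training.py`, every name below (the embedder classes, the `use_lowercase` flag) is an illustrative assumption, not the project's real API:

```python
# Step 1: overload the class/method you want to change (in training_utils.py).
class BaseEmbedder:
    """Stand-in for a class the original training code already provides."""

    def embed(self, texts: list[str]) -> list[str]:
        return texts


class LowercaseEmbedder(BaseEmbedder):
    """Hypothetical override: lowercase all text before embedding."""

    def embed(self, texts: list[str]) -> list[str]:
        return super().embed([t.lower() for t in texts])


# Step 2: guard the new behaviour with an if-statement in ExperimentRunner.
class ExperimentRunner:
    def __init__(self, name: str, use_lowercase: bool = False):
        self.name = name
        # the if-statement keeps existing experiments on the original path
        self.embedder = LowercaseEmbedder() if use_lowercase else BaseEmbedder()

    def run(self) -> None:
        print(self.name, self.embedder.embed(["An Example Dataset"]))


# Step 3: register the change as a new experiment with a descriptive name
# (in run_all_training.py, following the existing templates).
if __name__ == "__main__":
    ExperimentRunner("embed_lowercase_v1", use_lowercase=True).run()
```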
## How to add a new metric

- In `evaluation_utils.py`, go to `class EvaluationProcessor` and add a new function that calculates your metric (you can use the templates provided). A hedged sketch follows the block below.
- Register the new metric in `self.metric_methods`.
- While running the evaluation, add it to your metrics list:
```python
from pathlib import Path

from evaluation_utils import EvaluationProcessor

metrics = ["precision", "recall", "map"]
eval_path = Path("../data/evaluation/")
processor = EvaluationProcessor(eval_path, sort_by=None, metrics=metrics)
```
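As an illustration of the first two bullets, here is a hedged sketch of registering a new `"f1"` metric. Only the class name and `self.metric_methods` come from this document; the method names and signatures are assumptions about the real class:

```python
# Hypothetical sketch of adding an "f1" metric to EvaluationProcessor.
class EvaluationProcessor:
    def __init__(self, eval_path, sort_by=None, metrics=None):
        self.eval_path = eval_path
        self.sort_by = sort_by
        self.metrics = metrics or []
        # register the new metric alongside the existing ones
        self.metric_methods = {
            "precision": self.calculate_precision,
            "f1": self.calculate_f1,  # newly added
        }

    # an existing template metric (illustrative)
    def calculate_precision(self, relevant: set, retrieved: list) -> float:
        if not retrieved:
            return 0.0
        return len(relevant & set(retrieved)) / len(retrieved)

    # the new metric function
    def calculate_f1(self, relevant: set, retrieved: list) -> float:
        true_positives = len(relevant & set(retrieved))
        precision = true_positives / len(retrieved) if retrieved else 0.0
        recall = true_positives / len(relevant) if relevant else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)
```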
@@ -0,0 +1,18 @@
# Evaluating the AI search

- The challenge with evaluation in this case was the lack of labels. To solve that, we created a simple Streamlit app that let us label datasets according to a few tags.
- The evaluation pipeline runs the entire RAG + Query LLM pipeline on the subset of labelled data. The RAG does not have access to the entire OpenML database, only the labelled subset.

## Manual labelling

### Streamlit labelling app

- Refer to [labelling app](./labelling_tool.md) for more information.

### Merging labels

- Since multiple people labelled the datasets, it was useful to have a script that merges their labels into a single dataframe.
- The labels were generated per person using the labelling app and then merged into a single consistent dataframe using this script.
- Refer to [merging labels](./merging_labels.md) for more information.
### Consistency evaluation

- Since multiple people labelled the same datasets differently, Cohen's kappa was used to evaluate the consistency of the labelling. A score of ~0.45 was obtained, which indicates moderate agreement (a minimal computation sketch follows).
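As a minimal sketch, assuming the two annotators' labels are aligned per dataset (the data below is made up for illustration), Cohen's kappa can be computed with scikit-learn:

```python
# Inter-annotator agreement via Cohen's kappa; the label lists below stand
# in for two annotators' tags over the same sequence of datasets.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["vision", "nlp", "tabular", "vision", "nlp"]
annotator_b = ["vision", "nlp", "vision", "vision", "nlp"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.41-0.60 is read as moderate agreement
```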
## Running the evaluation

- Refer to [run training](./evaluation) for more information.
@@ -0,0 +1,7 @@
# Merging labels

- Takes multiple JSON files as input and merges them into a single CSV file with columns `Topics,Dataset IDs`.

## How to use

- Place all the label JSON files in the folder `/tools/data/all_labels`.
- Run `python merge_labels.py` from the `tools` directory.
- The results will be written to `/data/evaluation/merged_labels.csv` (see the sketch below).
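A hedged sketch of what the merge could look like, assuming each JSON file maps topic names to lists of dataset IDs; the real `merge_labels.py` may differ in its input format and details:

```python
# Hypothetical sketch of merging label JSON files into merged_labels.csv.
# Assumes each input file maps topic names to lists of dataset IDs.
import json
from pathlib import Path

import pandas as pd

label_dir = Path("tools/data/all_labels")
merged: dict[str, set[str]] = {}

for label_file in label_dir.glob("*.json"):
    labels = json.loads(label_file.read_text())
    for topic, dataset_ids in labels.items():
        merged.setdefault(topic, set()).update(map(str, dataset_ids))

pd.DataFrame(
    {
        "Topics": list(merged),
        "Dataset IDs": [",".join(sorted(ids)) for ids in merged.values()],
    }
).to_csv("data/evaluation/merged_labels.csv", index=False)
```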