Resume Evaluation

In practice, evaluation processes may occasionally fail due to various technical issues, such as model request network interruptions, system failures, or unexpected errors. To ensure the continuity and integrity of the evaluation, follow these steps to effectively restart and complete the process.

Example Scenario:

If the evaluation process for the GPT-4o-Audio model with the dataset my_dataset fails due to a model request network interruption, the last checkpoint is saved in the res/gpt4o_audio/last_res.jsonl file.

To restart the evaluation process, follow these steps:

python audio_evals/main.py --dataset my_dataset --model gpt4o_audio --resume res/gpt4o_audio/last_res.jsonl

This command will resume the evaluation from the last saved checkpoint, ensuring that the process continues seamlessly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Procedures for Restarting an Incomplete Evaluation.md

Procedures for Restarting an Incomplete Evaluation.md

Resume Evaluation

Files

Procedures for Restarting an Incomplete Evaluation.md

Latest commit

History

Procedures for Restarting an Incomplete Evaluation.md

File metadata and controls

Resume Evaluation