Note that for certain models, the output prediction files need to be processed before evaluation. Then run the evaluation scripts:
python evaluation/calculate_best_acc.py
python evaluation/calculate_mean_relative_gain.py
python evaluation/calculate_stabability.py
python evaluation/calculate_topk_hit_ratio.py
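To run all four scripts in one step, a driver like the following may help. This is a minimal sketch, not part of the repository. It assumes three things: it is launched from the repository root so the relative `evaluation/` paths resolve, each script needs no extra arguments (as in the commands above), and it should stop at the first failing script. `run_all` and `EVAL_SCRIPTS` are hypothetical names.

```python
"""Hypothetical helper: run the evaluation scripts above in order."""
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths copied verbatim from the commands above (assumed to be run from the repo root).
EVAL_SCRIPTS = [
    "evaluation/calculate_best_acc.py",
    "evaluation/calculate_mean_relative_gain.py",
    "evaluation/calculate_stabability.py",
    "evaluation/calculate_topk_hit_ratio.py",
]


def run_all(
    scripts: Sequence[str] = EVAL_SCRIPTS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[int]:
    """Run each script with the current Python interpreter, stopping at the first failure.

    Returns the exit codes of the scripts that were actually run.
    """
    codes: List[int] = []
    for script in scripts:
        # Same as typing `python <script>`, but reuses the active interpreter/environment.
        result = runner([sys.executable, script])
        codes.append(result.returncode)
        if result.returncode != 0:
            print(f"{script} exited with code {result.returncode}", file=sys.stderr)
            break
    return codes
```

Calling `run_all()` from the repository root runs the four scripts in the order listed. The `runner` parameter exists only so the sequencing logic can be exercised without the real scripts. Invoking the interpreter through `sys.executable` instead of a bare `python` ensures the same virtual environment is used.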