TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models
Pumjun Kim, Yoojin Jang, Jisu Kim, Jaejun Yoo
Paper | Project Page | Quick Start: TopPR | Colab
- Our TopP&R has been accepted at NeurIPS 2023 🎉!
We propose a robust and reliable evaluation metric for generative models called Topological Precision and Recall (TopP&R, pronounced “topper”), which systematically estimates supports by retaining only topologically and statistically significant features with a certain level of confidence. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and various Precision and Recall (P&R) variants, rely heavily on support estimates derived from sample features. However, the reliability of these estimates has been overlooked, even though the quality of the evaluation hinges entirely on their accuracy. In this paper, we demonstrate that current methods not only fail to accurately assess sample quality when support estimation is unreliable, but also yield inconsistent results. In contrast, TopP&R reliably evaluates the sample quality and ensures statistical consistency in its results. Our theoretical and experimental findings reveal that TopP&R provides a robust evaluation, accurately capturing the true trend of change in samples, even in the presence of outliers and non-independent and identically distributed (Non-IID) perturbations where other methods result in inaccurate support estimation. To our knowledge, TopP&R is the first evaluation metric specifically focused on the robust estimation of supports, offering statistical consistency under noisy conditions.
The proposed metric TopP&R is defined in three steps: (a) confidence band estimation with bootstrapping (Section 2 of our paper), (b) robust support estimation, and (c) evaluation via TopP&R (Section 3 of our paper).
We define the precision and recall of data points as

$$\mathrm{TopP}(\mathcal{X}, \mathcal{Y}) := \frac{\sum_{j=1}^{m} 1\big(Y_j \in \mathrm{supp}(P)\big)\, 1\big(\hat{q}_m(Y_j) > c_{\mathcal{Y}}\big)}{\sum_{j=1}^{m} 1\big(\hat{q}_m(Y_j) > c_{\mathcal{Y}}\big)}, \qquad \mathrm{TopR}(\mathcal{X}, \mathcal{Y}) := \frac{\sum_{i=1}^{n} 1\big(X_i \in \mathrm{supp}(Q)\big)\, 1\big(\hat{p}_n(X_i) > c_{\mathcal{X}}\big)}{\sum_{i=1}^{n} 1\big(\hat{p}_n(X_i) > c_{\mathcal{X}}\big)}.$$

In practice, $\mathrm{supp}(P)$ and $\mathrm{supp}(Q)$ are unknown, so we estimate them by the superlevel sets of the kernel density estimators (KDEs) $\hat{p}_n$ and $\hat{q}_m$ at the bootstrap confidence levels $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$, which retain only topologically and statistically significant features. Plugging these estimates in gives

$$\widehat{\mathrm{TopP}}(\mathcal{X}, \mathcal{Y}) := \frac{\sum_{j=1}^{m} 1\big(\hat{p}_n(Y_j) > c_{\mathcal{X}}\big)\, 1\big(\hat{q}_m(Y_j) > c_{\mathcal{Y}}\big)}{\sum_{j=1}^{m} 1\big(\hat{q}_m(Y_j) > c_{\mathcal{Y}}\big)}, \qquad \widehat{\mathrm{TopR}}(\mathcal{X}, \mathcal{Y}) := \frac{\sum_{i=1}^{n} 1\big(\hat{q}_m(X_i) > c_{\mathcal{Y}}\big)\, 1\big(\hat{p}_n(X_i) > c_{\mathcal{X}}\big)}{\sum_{i=1}^{n} 1\big(\hat{p}_n(X_i) > c_{\mathcal{X}}\big)}.$$

The kernel bandwidths $h_n$ and $h_m$ and the confidence levels $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$ are estimated from the data by bootstrapping; see Sections 2 and 3 of our paper for details.
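To make steps (a)-(c) concrete, here is a minimal NumPy sketch of these estimators. It is an illustration only: the Gaussian kernel, the fixed bandwidth `h`, and the simple sup-norm bootstrap below are simplifying assumptions, and the official implementation in top_pr/top_pr.py (with its random projection and bandwidth selection) differs in detail.

```python
import numpy as np

def kde(points, data, h):
    # (Unnormalized) Gaussian KDE of `data`, evaluated at `points`.
    d2 = ((points[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h ** 2)).mean(axis=1)

def confidence_threshold(data, h, alpha=0.1, n_boot=100, seed=None):
    # Step (a): bootstrap the sup-norm deviation of the KDE to get a
    # threshold c such that {x : p_hat(x) > c} keeps only significant features.
    rng = np.random.default_rng(seed)
    p_hat = kde(data, data, h)
    devs = []
    for _ in range(n_boot):
        idx = rng.integers(len(data), size=len(data))
        devs.append(np.abs(kde(data, data[idx], h) - p_hat).max())
    return np.quantile(devs, 1 - alpha)

def top_pr_sketch(real, fake, h, alpha=0.1):
    # Step (b): estimate supp(P) and supp(Q) as KDE superlevel sets.
    c_x = confidence_threshold(real, h, alpha)
    c_y = confidence_threshold(fake, h, alpha)
    # Step (c): TopP counts significant fake samples inside supp(P);
    # TopR counts significant real samples inside supp(Q).
    fake_in_p = kde(fake, real, h) > c_x   # 1(p_hat(Y_j) > c_X)
    fake_in_q = kde(fake, fake, h) > c_y   # 1(q_hat(Y_j) > c_Y)
    real_in_q = kde(real, fake, h) > c_y   # 1(q_hat(X_i) > c_Y)
    real_in_p = kde(real, real, h) > c_x   # 1(p_hat(X_i) > c_X)
    top_p = (fake_in_p & fake_in_q).sum() / max(fake_in_q.sum(), 1)
    top_r = (real_in_q & real_in_p).sum() / max(real_in_p.sum(), 1)
    return top_p, top_r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 2))
    fake = rng.normal(loc=0.5, size=(500, 2))
    print(top_pr_sketch(real, fake, h=0.5))
```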
Our method can be installed with a single pip command:

```
pip install top-pr
```
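As a quick sanity check after installation, you can run TopP&R on feature arrays. The placeholder features below are assumptions for illustration only; in practice you would pass the embedded features of your real and generated samples.

```python
import numpy as np
from top_pr import compute_top_pr as TopPR

# Placeholder features for illustration; substitute real/generated embeddings.
real_features = np.random.randn(1000, 32)
fake_features = np.random.randn(1000, 32)

result = TopPR(real_features=real_features, fake_features=fake_features,
               alpha=0.1, kernel="cosine", random_proj=True, f1_score=True)
print(result)  # dict with 'fidelity', 'diversity', and 'Top_F1'
```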
In the example below, we evaluate the mode-drop case. Note that we fix the random seed for the random projection (a linear layer) in top_pr/top_pr.py. If you also want to evaluate with PRDC, refer to its metric implementation and install the prdc package.
```python
# Call packages
import matplotlib.pyplot as plot
import numpy as np

# Call mode drop example case
from top_pr import mode_drop

# Call metrics
from top_pr import compute_top_pr as TopPR

# For comparison with P&R and D&C, install prdc first: 'pip install prdc'
from prdc import compute_prdc

# Evaluation step
start = 0
for Ratio in [0, 1, 2, 3, 4, 5, 6]:
    # Define real and fake datasets
    REAL = mode_drop.gaussian_mode_drop(method='sequential', ratio=0)
    FAKE = mode_drop.gaussian_mode_drop(method='sequential', ratio=Ratio)

    # Evaluation with TopPR
    Top_PR = TopPR(real_features=REAL, fake_features=FAKE, alpha=0.1,
                   kernel="cosine", random_proj=True, f1_score=True)

    # Evaluation with P&R and D&C
    PR = compute_prdc(REAL, FAKE, 3)
    DC = compute_prdc(REAL, FAKE, 5)

    if start == 0:
        pr = [PR.get('precision'), PR.get('recall')]
        dc = [DC.get('density'), DC.get('coverage')]
        Top_pr = [Top_PR.get('fidelity'), Top_PR.get('diversity'), Top_PR.get('Top_F1')]
        start = 1
    else:
        pr = np.vstack((pr, [PR.get('precision'), PR.get('recall')]))
        dc = np.vstack((dc, [DC.get('density'), DC.get('coverage')]))
        Top_pr = np.vstack((Top_pr, [Top_PR.get('fidelity'), Top_PR.get('diversity'), Top_PR.get('Top_F1')]))

# Visualization of results
x = [0, 0.17, 0.34, 0.50, 0.67, 0.85, 1]
fig = plot.figure(figsize=(12, 3))
for i in range(1, 3):
    axes = fig.add_subplot(1, 2, i)

    # Fidelity
    if i == 1:
        axes.set_title("Fidelity", fontsize=15)
        plot.ylim([0.5, 1.5])
        plot.plot(x, Top_pr[:, 0], color=[255/255, 110/255, 97/255], linestyle='-', linewidth=3, marker='o', label="TopP")
        plot.plot(x, pr[:, 0], color=[77/255, 110/255, 111/255], linestyle=':', linewidth=3, marker='o', label="precision (k=3)")
        plot.plot(x, dc[:, 0], color=[15/255, 76/255, 130/255], linestyle='-.', linewidth=3, marker='o', label="density (k=5)")
        plot.plot(x, np.linspace(1.0, 1.0, 7), color='black', linestyle=':', linewidth=2)  # ideal reference
        plot.legend(fontsize=9)

    # Diversity
    elif i == 2:
        axes.set_title("Diversity", fontsize=15)
        plot.plot(x, Top_pr[:, 1], color=[255/255, 110/255, 97/255], linestyle='-', linewidth=3, marker='o', label="TopR")
        plot.plot(x, pr[:, 1], color=[77/255, 110/255, 111/255], linestyle=':', linewidth=3, marker='o', label="recall (k=3)")
        plot.plot(x, dc[:, 1], color=[15/255, 76/255, 130/255], linestyle='-.', linewidth=3, marker='o', label="coverage (k=5)")
        plot.plot(x, np.linspace(1.0, 0.14, 7), color='black', linestyle=':', linewidth=2)  # ideal reference
        plot.legend(fontsize=9)
```
Running the test code above produces the following figure.
The same comparison can be run for the simultaneous mode-drop case:

```python
# Evaluation step
start = 0
for Ratio in [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    # Define real and fake datasets
    REAL = mode_drop.gaussian_mode_drop(method='simultaneous', ratio=0)
    FAKE = mode_drop.gaussian_mode_drop(method='simultaneous', ratio=Ratio)

    # Evaluation with TopPR
    Top_PR = TopPR(real_features=REAL, fake_features=FAKE, alpha=0.1,
                   kernel="cosine", random_proj=True, f1_score=True)

    # Evaluation with P&R and D&C
    PR = compute_prdc(REAL, FAKE, 3)
    DC = compute_prdc(REAL, FAKE, 5)

    if start == 0:
        pr = [PR.get('precision'), PR.get('recall')]
        dc = [DC.get('density'), DC.get('coverage')]
        Top_pr = [Top_PR.get('fidelity'), Top_PR.get('diversity'), Top_PR.get('Top_F1')]
        start = 1
    else:
        pr = np.vstack((pr, [PR.get('precision'), PR.get('recall')]))
        dc = np.vstack((dc, [DC.get('density'), DC.get('coverage')]))
        Top_pr = np.vstack((Top_pr, [Top_PR.get('fidelity'), Top_PR.get('diversity'), Top_PR.get('Top_F1')]))

# Visualization of results
x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
fig = plot.figure(figsize=(12, 3))
for i in range(1, 3):
    axes = fig.add_subplot(1, 2, i)

    # Fidelity
    if i == 1:
        axes.set_title("Fidelity", fontsize=15)
        plot.ylim([0.5, 1.5])
        plot.plot(x, Top_pr[:, 0], color=[255/255, 110/255, 97/255], linestyle='-', linewidth=3, marker='o', label="TopP")
        plot.plot(x, pr[:, 0], color=[77/255, 110/255, 111/255], linestyle=':', linewidth=3, marker='o', label="precision (k=3)")
        plot.plot(x, dc[:, 0], color=[15/255, 76/255, 130/255], linestyle='-.', linewidth=3, marker='o', label="density (k=5)")
        plot.plot(x, np.linspace(1.0, 1.0, 11), color='black', linestyle=':', linewidth=2)  # ideal reference
        plot.legend(fontsize=9)

    # Diversity
    elif i == 2:
        axes.set_title("Diversity", fontsize=15)
        plot.plot(x, Top_pr[:, 1], color=[255/255, 110/255, 97/255], linestyle='-', linewidth=3, marker='o', label="TopR")
        plot.plot(x, pr[:, 1], color=[77/255, 110/255, 111/255], linestyle=':', linewidth=3, marker='o', label="recall (k=3)")
        plot.plot(x, dc[:, 1], color=[15/255, 76/255, 130/255], linestyle='-.', linewidth=3, marker='o', label="coverage (k=5)")
        plot.plot(x, np.linspace(1.0, 0.14, 11), color='black', linestyle=':', linewidth=2)  # ideal reference
        plot.legend(fontsize=9)
```
Running the test code above produces the following figure.
If you find this repository useful for your research, please cite the following work.
```bibtex
@article{kim2023topp,
  title={TopP\&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models},
  author={Kim, Pum Jun and Jang, Yoojin and Kim, Jisu and Yoo, Jaejun},
  journal={arXiv preprint arXiv:2306.08013},
  year={2023}
}
```