-
Hello,

As I am trying to use Topaz more, I figured I should take a closer look at the training results, so I decided to systematically plot the log file in addition to inspecting the picks visually. But I don't know much about machine learning, so I am not sure how to interpret these plots. Here is an attempt below; I would appreciate any comments. Thank you in advance. 😃

Loss
As I understand it, this metric should go down during training, so the plot of the loss looks OK to me.

GE penalty
Should this metric also go down (or stay down)? It looks like it does anyway.

Precision
This one takes values between 0 and 1. It will always approach 1 for the training set, because this is one of the quantities being optimized during training. What we want is for it to also approach 1 for the test set (micrographs and coordinates never seen in training, used to assess how well the model predicts on new data). By this metric, this training run doesn't look very good. Or could the low precision measured on the test set result from sparsely picked micrographs? (I mean sparsely picked by me when preparing the training set, so the model likely picked a lot more particles than were labeled in the test set.)

True/false positive rate
Both take values between 0 and 1. We want the true positive rate to be high and the false positive rate to be low.

Test set
By this metric, it looks to me that my trained model doesn't perform much better than the built-in one.

Training set
(plot only)

Area under precision/recall curve
Based on this article: https://en.wikipedia.org/wiki/Precision_and_recall
Both precision and recall vary in a range from 0 to 1, so when plotting one versus the other, the resulting curve can have a maximum area of 1. Training is optimal when the area under this curve approaches 1. It looks like this training run is far from optimal, unless in practice it's normal for this metric to stay well below 1 on the test set?
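For context, here is a minimal sketch of how such plots can be made from the log, assuming the table written by topaz train is whitespace-separated with an epoch column, a split column (train/test), and one column per metric; the exact column names may differ between Topaz versions, and the file name below is just a placeholder:

```python
# Rough sketch: plot per-epoch training/test metrics from a Topaz training log.
# Assumption: the log is a whitespace-separated table with columns 'epoch' and
# 'split' plus some subset of the metric columns listed below.
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("training_log.txt", sep=r"\s+")

metrics = [c for c in ("loss", "ge_penalty", "precision", "tpr", "fpr", "auprc")
           if c in log.columns]

fig, axes = plt.subplots(1, len(metrics), figsize=(4 * len(metrics), 3), squeeze=False)
for ax, metric in zip(axes[0], metrics):
    for split, group in log.groupby("split"):
        # average over iterations within each epoch to smooth the curves
        per_epoch = group.groupby("epoch")[metric].mean()
        ax.plot(per_epoch.index, per_epoch.values, label=split)
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
fig.tight_layout()
plt.show()
```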
-
An important thing to note for precision and AUPR/average-precision scores: these only go to 1 for a perfect classifier if all of the ground truth particles are labeled! Therefore, we should not expect this to go to 1. Here's a rough example:

Let A be the number of ground truth positives, let B be the number of predicted positives, and let TP be the number of true positives, that is, the number of ground truth positives that are also predicted positives (A ∩ B). The precision is TP/B. If all of the predicted positives are ground truth positives, then precision = 1.

Now, imagine that A is incompletely labeled. What if, instead of having all ground truth positives, A, we only have a labeled subsample, A'? Given a perfect predictor (i.e. B = A), the measured true positives TP' are just A', because we predict all of A but only A' is labeled. This means the measured precision is TP'/B = A'/A! Therefore, a perfect predictor would only achieve a precision of A'/A, which is the fraction of ground truth positives that are labeled. The AUPR is similarly upper bounded.
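To make this concrete, here is a tiny numerical sketch of the same argument (the numbers are made up for illustration):

```python
# Toy illustration of the precision ceiling with incompletely labeled ground truth.
# Suppose a test micrograph really contains 1000 particles (A), but only 200 of
# them were labeled when building the test set (A').  A perfect picker returns
# all 1000 true particles (B = A).
A = 1000          # true ground truth positives
A_labeled = 200   # labeled subsample A'
B = 1000          # predictions of a perfect picker (B = A)

TP_measured = A_labeled              # only labeled particles count as true positives
precision_measured = TP_measured / B
print(precision_measured)            # 0.2 == A'/A, even though the picker is perfect
```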
-
So, it looks like I was not downscaling the micrographs enough. I used a factor of 4 for the training run described above. When I used a factor of 10 (and a few more particles labeled in the training set too), I got much better tpr/fpr and auprc.
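For anyone hitting the same issue, the effect of the downsampling factor on the apparent particle size can be sanity-checked with a couple of lines. The pixel size and particle diameter below are hypothetical, not the values from this dataset:

```python
# Illustrative arithmetic only (made-up numbers): how the downsampling factor
# changes the particle size in pixels seen by the picker.
pixel_size_A = 1.2         # raw pixel size in Angstrom/pixel (hypothetical)
particle_diameter_A = 180  # particle diameter in Angstrom (hypothetical)

for factor in (1, 4, 10):
    diameter_px = particle_diameter_A / (pixel_size_A * factor)
    print(f"downsample x{factor}: particle ~{diameter_px:.0f} px across")
# With a factor of 4 the particle is still quite large in pixels; a factor of
# 10 brings it down to a size that the default Topaz model (which has a fixed
# receptive field) appears to handle better.
```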