This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Loss, entropy, accuracy trends #188

Open
slala2121 opened this issue Jan 27, 2022 · 5 comments

Comments

@slala2121

I'm trying to understand the relationships and trends among these quantities.

From some experiments, I find that

  1. as the training loss declines, the entropy of the output distribution increases rather than decreases. This seems plausible because, near convergence, the probability scores for the negative examples are all relatively low and roughly equal, while the probability score for the positive example increases.

  2. for a small training dataset (10^3 samples), I find that the accuracy nevertheless declines. Why might this occur?

Thanks.
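For concreteness, the reasoning in point 1 can be checked with a minimal numerical sketch (all logits below are made-up illustrative values, not from any real run): as the positive's score grows and the negatives' scores equalize, the distribution restricted to the negatives approaches uniform, so its entropy rises, even though the full softmax becomes more peaked.

```python
import math

def softmax(logits):
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]

# Made-up logits for a single anchor: index 0 is the positive,
# indices 1..7 are negatives.
early = softmax([1.0, 3.0, 0.5, 0.2, 0.1, 0.0, -0.2, -0.5])  # early: peaked on a wrong example
late  = softmax([6.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])    # near convergence: confident positive

# Entropy over the negatives alone rises toward log(7) as they equalize...
print(entropy(renormalize(early[1:])), entropy(renormalize(late[1:])))
# ...while the entropy of the full softmax falls as the positive dominates.
print(entropy(early), entropy(late))
```

So with these invented numbers, whether "the entropy" rises or falls depends on whether it is measured over the full softmax or over the negatives only.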

@chentingpc
Contributor

chentingpc commented Jan 28, 2022

as the training loss declines, the entropy of the output distribution increases rather than decreases. This seems plausible because, near convergence, the probability scores for the negative examples are all relatively low and roughly equal, while the probability score for the positive example increases.

Yes, the model becomes more certain about which example is the positive as it trains.

for a small training dataset (10^3 samples), I find that the accuracy nevertheless declines. Why might this occur?

Is it overfitting? Otherwise a hyperparameter may be problematic, e.g. the learning rate is too large (if you warm up for too long, the learning rate will still be large after a certain number of epochs, and training would get worse).
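To illustrate the warmup point, here is a minimal sketch of a linear-warmup plus cosine-decay schedule (a common schedule in this setting; the epoch counts and base rate below are invented for illustration). Stretching the warmup keeps the learning rate high much later into training:

```python
import math

def lr_at(epoch, total_epochs, warmup_epochs, base_lr):
    # linear warmup followed by cosine decay (illustrative schedule)
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

total, base = 100, 1.0
short = lr_at(50, total, 5, base)    # short warmup: LR has decayed to ~0.54 * base by epoch 50
long_ = lr_at(50, total, 40, base)   # long warmup: LR is still ~0.93 * base at epoch 50
print(short, long_)
```

With the long warmup, halfway through training the learning rate is still near its peak, which is one way a too-long warmup can hurt late-stage training.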

@slala2121
Author

slala2121 commented Jan 28, 2022 via email

@chentingpc
Contributor

chentingpc commented Jan 29, 2022 via email

@slala2121
Author

Okay. Then I'm not sure why overfitting would occur, since the accuracy is measured on the same samples as the training dataset.

@sagi-ezri

It is possible that as the training loss declines, the model becomes more confident in its predictions, and the effect on entropy depends on which distribution you measure. The model assigns a higher probability to the positive and low, roughly uniform probabilities to the negatives, so the full output distribution becomes narrower (lower entropy), while the distribution over the negatives alone approaches uniform (higher entropy), which is consistent with your observation.

Regarding the second observation, one possibility is that the model is too complex for the small training dataset and therefore fails to generalize well to new examples. In that case, reducing the model's complexity or collecting more training data could improve performance.

Another possibility is that the model is underfitting the training data, which can also result in poor accuracy. Underfitting occurs when the model is not complex enough to capture the underlying patterns in the data. In that case, increasing the model's complexity or changing the architecture may help.

Finally, the accuracy measure being used may not be sensitive enough to detect differences in performance. In such cases, other evaluation metrics such as precision, recall, or F1 score may be more appropriate.

I hope this helps clarify these issues.
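Since part of the confusion is what each quantity measures, here is a minimal sketch (invented similarity scores and temperature, not from the repo) of how loss, entropy, and top-1 contrastive accuracy can all be derived from the same softmax for a single anchor:

```python
import math

# Made-up similarity scores for one anchor; by convention index 0 is its positive pair.
sims = [4.2, 1.0, 0.8, 0.3, -0.1]
temperature = 0.5  # illustrative value

scaled = [s / temperature for s in sims]
m = max(scaled)
exps = [math.exp(s - m) for s in scaled]
total = sum(exps)
probs = [e / total for e in exps]

loss = -math.log(probs[0])                           # cross-entropy toward the positive
ent = -sum(p * math.log(p) for p in probs if p > 0)  # entropy of the softmax
correct = probs[0] == max(probs)                     # top-1 contrastive "accuracy" for this anchor

print(loss, ent, correct)
```

All three are functions of the same probabilities, which is why they can move together in some regimes (confident, correct positives give low loss and low full-softmax entropy) and diverge in others.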
