GLM Model Difference when running with Standardization, Weights, and Beta Constraints #15519
Problem
When running with weights, the expectation is that the model will match the one obtained by up-sampling the training dataset. For example, if the class 1 rows are duplicated five times and added back into the original training dataset, the resulting model should match one trained on the original dataset with a weight of 6 on the class 1 response. This is exactly what we observe when standardization is turned on and no beta constraints are supplied.
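As a minimal sketch of this equivalence, the snippet below uses scikit-learn's LogisticRegression on a synthetic dataset rather than H2O; the frequency-weight argument is the same. Duplicating a class 1 row five extra times adds the same loss terms to the objective as giving that row a weight of 6.

```python
# Sketch: an up-sampled fit and a weighted fit of the same logistic
# regression agree (no beta constraints involved here).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

# Up-sampled run: each class 1 row appears 6 times in total.
reps = np.where(y == 1, 6, 1)
fit_up = LogisticRegression(tol=1e-10, max_iter=10_000).fit(
    np.repeat(X, reps, axis=0), np.repeat(y, reps))

# Weighted run: weight of 6 on the class 1 rows instead.
w = np.where(y == 1, 6.0, 1.0)
fit_w = LogisticRegression(tol=1e-10, max_iter=10_000).fit(
    X, y, sample_weight=w)

# Coefficients agree to tight numerical tolerance.
print(np.allclose(fit_up.coef_, fit_w.coef_, atol=1e-4))
```

Both runs minimize the same penalized objective, which is why the coefficients coincide when no constraints are involved.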
However, when beta_constraints are added, the up-sampled and weighted runs produce different GLM models with different coefficients.
Explanation
Priors are standardized, and therefore changed, when standardization is turned on for the model build
When standardization is turned on, the beta_given values in beta_constraints are standardized as well. In the code, each given beta is multiplied by a factor d: `_betaGiven *= d;`, where d = 1/sd. Be particularly careful when using previous coefficients as priors with standardization turned on, because the penalty is then taken against these rescaled priors rather than the ones you supplied.
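To make the rescaling concrete, here is a small numeric sketch (not H2O's actual code path) of what happens to a supplied prior when standardization applies d = 1/sd:

```python
# Sketch: the prior the penalty actually sees is beta_given * d,
# where d = 1 / sd of the predictor column.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # one predictor column
sd = x.std(ddof=1)                    # sample standard deviation
d = 1.0 / sd

beta_given = 0.8                      # prior supplied in beta_constraints
internal_prior = beta_given * d       # value used by the standardized build
print(sd, internal_prior)             # the prior is rescaled by 1/sd
```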
Weighted Variance
When weights are used, the variance will differ from the variance of an up-sampled dataset. The way variance is typically calculated is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
When using weights, you calculate the weighted variance as:

$$\sigma_w^2 = \frac{N}{N-1} \cdot \frac{\sum_{i=1}^{N} w_i\,(x_i - \bar{x}_w)^2}{\sum_{i=1}^{N} w_i}, \qquad \bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$
So when the sum of weights equals the number of observations, you get exactly the same variance; otherwise the two variances differ by a factor of approximately N/(N−1). This is a relatively small difference, but it is one you will observe in the resulting coefficients.
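A quick numeric check of this in NumPy (a sketch; the weighted formula is the N/(N−1)-corrected form given above):

```python
# Sketch: weighted variance on N rows vs. plain variance on the
# up-sampled rows; the two differ by roughly N/(N-1).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 6.0])   # weight of 6 on the last row
N = len(x)

# Up-sampled variance: replicate the last row six times, then use
# the usual (N' - 1) denominator over the N' = sum(w) rows.
var_up = np.repeat(x, w.astype(int)).var(ddof=1)

# Weighted variance computed on the original N rows.
xbar_w = np.average(x, weights=w)
var_w = (N / (N - 1)) * np.sum(w * (x - xbar_w) ** 2) / np.sum(w)

# Exact ratio: (N/(N-1)) * ((sum(w)-1)/sum(w)) -> N/(N-1) for large sum(w).
print(var_up, var_w, var_w / var_up)
```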
Solution
If you want to supply beta constraints for a standardized model build, scale the bounds and priors in beta_constraints by each predictor's standard deviation, i.e. multiply by 1/d. The internal standardization then computes (1/d) · betaGiven · d, which equals the original betaGiven.
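A sketch of this workaround with the h2o Python client. The file path, predictor names, prior values, and bounds below are hypothetical placeholders; the beta_constraints frame columns (names, beta_given, lower_bounds, upper_bounds, rho) follow H2O's documented schema.

```python
# Sketch: pre-scale beta_constraints by each predictor's standard
# deviation so H2O's internal *= d (d = 1/sd) cancels out.
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()
train = h2o.import_file("train.csv")       # hypothetical path
predictors = ["x1", "x2"]                   # hypothetical predictors

# Per-predictor sample standard deviations from the training frame.
sds = {p: train[p].sd()[0] for p in predictors}

# Multiply each prior and bound by that predictor's sd (= 1/d).
bc = h2o.H2OFrame({
    "names":        predictors,
    "beta_given":   [0.5 * sds["x1"], -0.2 * sds["x2"]],
    "lower_bounds": [-1.0 * sds["x1"], -1.0 * sds["x2"]],
    "upper_bounds": [1.0 * sds["x1"], 1.0 * sds["x2"]],
    "rho":          [1.0, 1.0],
})

glm = H2OGeneralizedLinearEstimator(family="binomial",
                                    standardize=True,
                                    beta_constraints=bc)
glm.train(x=predictors, y="y", training_frame=train)
```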
JIRA Issue Migration Info
Jira Issue: TN-10
Assignee: Amy Wang
Reporter: Amy Wang
State: Closed