Replies: 1 comment
Marking as stale. No activity in 60 days.
Your question
Is there an explanation for this? The difference between LinearWithFrozenWeight and LinearWithGradAccumulationAndAsyncCommunication seems to be that LinearWithFrozenWeight does not calculate the gradient of the weight. This looks like a performance optimization, but it turns out it can produce different results.
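
To make the distinction described above concrete, here is a minimal pure-Python sketch of a linear layer's backward pass with and without the weight gradient. It is not the library's code: the helper names (`matmul`, `transpose`, `backward_full`, `backward_frozen`) and the sample values are hypothetical, and it deliberately leaves out gradient accumulation and async communication. It only illustrates the stated difference and does not attempt to explain why results might diverge.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (n, k) @ (k, m) -> (n, m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def linear_forward(x, w):
    # y = x @ w^T, with x of shape (batch, in) and w of shape (out, in).
    return matmul(x, transpose(w))


def backward_full(x, w, grad_y):
    # Assumed behaviour of the "full" variant: both gradients are computed.
    grad_x = matmul(grad_y, w)               # (batch, in)
    grad_w = matmul(transpose(grad_y), x)    # (out, in)
    return grad_x, grad_w


def backward_frozen(x, w, grad_y):
    # Assumed behaviour of the "frozen weight" variant: the weight gradient
    # is skipped, only the input gradient is propagated.
    grad_x = matmul(grad_y, w)
    return grad_x, None


# Hypothetical sample data, chosen only for illustration.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]
grad_y = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]

y = linear_forward(x, w)
gx_full, gw_full = backward_full(x, w, grad_y)
gx_frozen, gw_frozen = backward_frozen(x, w, grad_y)
```

In this simplified model the input gradient is identical in both variants, so any difference in results would have to come from something the sketch does not model.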