EDIT: the authors of ModelStock confirmed that theta for N-model merging is indeed computed as the average of all pairwise angles; see this linked issue: naver-ai/model-stock#2
Hey,
Thanks for your great implementation, very useful for the community. I have a question about the ModelStock N-model merging. I see a comment in the implementation here:
mergekit/mergekit/merge_methods/model_stock.py, lines 72 to 76 in 57e7d14
I see that the implementation averages the pairwise angles:
mergekit/mergekit/merge_methods/model_stock.py, lines 78 to 91 in 57e7d14
However, from fig Da on page 24 of the ModelStock paper, it seems that theta is taken as the max over all pairwise angles? Did you run any N-model merging experiments, and if so, did you see any strange results, or did they roughly follow those of the paper? I'd be curious to see whether the mean or the max is the right aggregation method here.
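To make the mean-versus-max question concrete, here is a minimal sketch of how the two aggregations change the interpolation ratio. It is not mergekit's code: the function names (`pairwise_angles`, `interpolation_ratio`, `model_stock_ratio`) and the `aggregate` switch are hypothetical, weights are plain Python lists standing in for a flattened layer, and I am assuming the ratio t = N cos(theta) / (1 + (N - 1) cos(theta)) from the paper.

```python
import itertools
import math


def pairwise_angles(base, finetuned):
    """Angles in radians between every pair of task vectors w_i - w_0.

    Assumes no fine-tuned model is identical to the base (non-zero deltas).
    """
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    angles = []
    for u, v in itertools.combinations(deltas, 2):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angles


def interpolation_ratio(theta, n):
    """t = N cos(theta) / (1 + (N - 1) cos(theta)); assumed from the paper."""
    c = math.cos(theta)
    return n * c / (1 + (n - 1) * c)


def model_stock_ratio(base, finetuned, aggregate="mean"):
    """Hypothetical helper: collapse the pairwise angles with mean or max, then compute t."""
    angles = pairwise_angles(base, finetuned)
    theta = max(angles) if aggregate == "max" else sum(angles) / len(angles)
    return interpolation_ratio(theta, len(finetuned))


# Toy example: three fine-tuned "models" around a zero base, with pairwise
# angles of 45, 90 and 45 degrees.
base = [0.0, 0.0]
finetuned = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
t_mean = model_stock_ratio(base, finetuned, aggregate="mean")  # roughly 0.75
t_max = model_stock_ratio(base, finetuned, aggregate="max")    # roughly 0
```

In this toy case the mean angle is 60 degrees and the max is 90 degrees, so the max pulls the merge almost entirely back to the base weights while the mean keeps most of the fine-tuned average. The gap between the two grows as the pairwise angles become less uniform.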
Thanks and looking forward to your response.