N-model ModelStock merging #453

Closed
vishaal27 opened this issue Nov 5, 2024 · 1 comment

Comments

@vishaal27

Hey,

Thanks for your great implementation, very useful for the community. I have a question about the ModelStock N-model merging. I see a comment in the implementation here:

    # now there is a question of how to come up with a value for theta.
    # in the two-vector case, we can get an exact angle between the two vectors
    # but the paper doesn't explicitly say what to do in the multi-vector case -
    # they keep using a singular theta value and don't elaborate on how to
    # calculate it. i'm going to assume an average of pairwise angles for now? i guess?

I see that the implementation averages over all pairwise angles:

    cos_thetas = []
    for i, w_0_offset in enumerate(offsets):
        for j in range(i + 1, len(offsets)):
            w_1_offset = offsets[j]
            norm_product = torch.norm(w_0_offset, dim=-1) * torch.norm(
                w_1_offset, dim=-1
            )
            cos_theta = (
                (w_0_offset * w_1_offset).sum(dim=-1) / norm_product.clamp(min=1e-6)
            ).clamp(-1, 1)
            cos_thetas.append(cos_theta)
    cos_theta = torch.stack(cos_thetas).mean(dim=0).unsqueeze(-1)

However, Fig. Da on page 24 of the ModelStock paper seems to suggest that theta is taken as the maximum over the pairwise angles. Did you run any N-model merging experiments, and if so, did you see any strange results, or did they roughly follow those in the paper? I'd be curious to know whether the mean or the max is the right aggregation method here.
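To make the two candidate aggregations concrete, here is a minimal sketch in plain Python (standard library only, no torch) that computes the pairwise cosines of a set of offset vectors and reduces them either by mean or by largest angle. The function names `pairwise_cosines` and `aggregate_cos_theta` and the `mode` strings are hypothetical and not part of mergekit. Note that, like the snippet quoted above, the "mean" mode averages the cosines rather than the angles themselves.

```python
import math
from itertools import combinations


def pairwise_cosines(offsets):
    """Return the clamped cosine similarity of every unordered pair of vectors."""
    cosines = []
    for a, b in combinations(offsets, 2):
        dot = sum(x * y for x, y in zip(a, b))
        norm_product = math.hypot(*a) * math.hypot(*b)
        # Same guards as the quoted snippet: avoid division by ~0, clamp to [-1, 1].
        cos = dot / max(norm_product, 1e-6)
        cosines.append(max(-1.0, min(1.0, cos)))
    return cosines


def aggregate_cos_theta(offsets, mode="mean"):
    """Reduce pairwise cosines to one value (hypothetical helper, for illustration)."""
    cosines = pairwise_cosines(offsets)
    if not cosines:
        raise ValueError("need at least two offset vectors")
    if mode == "mean":
        # Average of pairwise cosines, as in the quoted implementation.
        return sum(cosines) / len(cosines)
    if mode == "max_angle":
        # The largest pairwise angle corresponds to the smallest cosine.
        return min(cosines)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    offsets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print("mean:", aggregate_cos_theta(offsets, "mean"))
    print("max_angle:", aggregate_cos_theta(offsets, "max_angle"))
```

With more than two models the two reductions can differ noticeably whenever one pair of offsets is much less aligned than the rest, which is why the choice matters for the question above.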

Thanks and looking forward to your response.

@vishaal27

EDIT: the ModelStock authors confirmed that theta for N-model merging is indeed computed as the average of all pairwise angles; see this linked issue: naver-ai/model-stock#2
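As a rough illustration of why the aggregation choice matters, the sketch below feeds a single aggregated cosine into an interpolation ratio. It assumes the ratio has the form t = N·cos(theta) / (1 + (N-1)·cos(theta)), which is how I recall it from the ModelStock paper; that formula is not quoted anywhere in this thread, so treat it as an assumption. The function name `interpolation_ratio` is hypothetical.

```python
def interpolation_ratio(cos_theta, n_models):
    """Assumed ModelStock-style ratio: t = N*cos / (1 + (N-1)*cos).

    The formula is an assumption recalled from the paper, not taken from this thread.
    The denominator is zero when cos_theta == -1 / (n_models - 1); that case is not handled.
    """
    return n_models * cos_theta / (1 + (n_models - 1) * cos_theta)


if __name__ == "__main__":
    n_models = 3
    # Two illustrative aggregated cosines for the same set of three offsets:
    # a mean-style value and a max-angle-style value (made-up numbers).
    for label, cos_theta in [("mean", 0.47), ("max_angle", 0.0)]:
        print(label, interpolation_ratio(cos_theta, n_models))
```

Under this assumed formula, a lower aggregated cosine gives a smaller ratio, so a max-angle reduction would give less weight to the merged fine-tuned weights than the confirmed averaging does.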

@cg123 cg123 closed this as completed Dec 5, 2024