Routing #118

Open
alexliap opened this issue Jun 27, 2024 · 1 comment

Comments

@alexliap

Does the router implement the noisy top-k routing proposed in the paper "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"?

In the router code, you seem to apply the noise to the router's input, not to the router scores as in the paper above:

```python
def forward(self, x):
    if self.training and self.args.moe_jitter_eps is not None:
        x = x * self.jitter(x)

    scores = self.layer(x.view(-1, x.shape[-1])).softmax(dim=-1)
    expert_weights, expert_indices = self._top_k(scores)
    if self.args.moe_normalize_expert_weights:
        expert_weights = expert_weights / torch.norm(
            expert_weights, p=self.args.moe_normalize_expert_weights, dim=-1, keepdim=True)

    expert_indices = (
        _uniform_expert_assignment(expert_indices, self.args.moe_num_experts)
        if self.args.uniform_expert_assignment else expert_indices
    )
    return scores, expert_weights, expert_indices
```
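For comparison, here is a minimal, framework-free sketch of what the quoted `forward` does for a single token. It assumes that `self.jitter(x)` returns multiplicative noise drawn uniformly from `[1 - eps, 1 + eps]` per element; the snippet does not show `jitter`, so that is an assumption. All names here (`input_jitter_route`, `w_gate`, `eps`) are hypothetical, and a plain d × E list-of-lists matrix stands in for `self.layer`.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    # Values and indices of the k largest entries, largest first.
    idx = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in idx], idx


def input_jitter_route(x, w_gate, k, eps=None, training=True, rng=random):
    """Route one token. x: list of d floats; w_gate: d x E matrix (list of rows)."""
    if training and eps is not None:
        # ASSUMED jitter: multiply each *input* feature by U(1 - eps, 1 + eps).
        x = [xi * rng.uniform(1.0 - eps, 1.0 + eps) for xi in x]
    num_experts = len(w_gate[0])
    # Router logits x @ w_gate (hypothetical stand-in for self.layer, no bias).
    logits = [sum(x[d] * w_gate[d][e] for d in range(len(x))) for e in range(num_experts)]
    scores = softmax(logits)  # softmax over ALL experts, before top-k
    weights, indices = top_k(scores, k)
    return scores, weights, indices
```

Because the noise multiplies `x` before the linear layer, each logit is perturbed by a weighted sum of the per-feature noise terms; under this assumption there is no separately learned noise scale.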

In that paper, noisy top-k gating works as follows:
[Image: noisy top-k gating equations from the paper]
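For reference, the paper defines noisy top-k gating as follows, with $W_g$ the gating weights and $W_{noise}$ a separate trainable noise weight matrix:

```latex
\begin{aligned}
G(x) &= \mathrm{Softmax}\big(\mathrm{KeepTopK}(H(x), k)\big) \\
H(x)_i &= (x \cdot W_g)_i + \mathrm{StandardNormal}() \cdot \mathrm{Softplus}\big((x \cdot W_{noise})_i\big) \\
\mathrm{KeepTopK}(v, k)_i &= \begin{cases} v_i & \text{if } v_i \text{ is in the top } k \text{ elements of } v \\ -\infty & \text{otherwise} \end{cases}
\end{aligned}
```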

Is this equivalent? I'm not arguing that it is wrong; I'm just trying to figure out whether the two are the same.
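A stdlib-only sketch of the paper's noisy top-k gating for a single token, written directly from the paper's equations. The names (`noisy_top_k_gating`, `w_g`, `w_noise`) are hypothetical; the weight matrices are plain d × E list-of-lists, and the paper's load-balancing losses are omitted.

```python
import math
import random


def softplus(v):
    # log(1 + exp(v)), written to avoid overflow for large |v|.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def matvec(x, w):
    # x (length d) times w (d x E, list of rows) -> list of length E.
    return [sum(x[d] * w[d][e] for d in range(len(x))) for e in range(len(w[0]))]


def noisy_top_k_gating(x, w_g, w_noise, k, training=True, rng=random):
    """G(x) = Softmax(KeepTopK(H(x), k)) for a single token x."""
    clean = matvec(x, w_g)
    if training:
        # Per-expert, input-dependent noise scale: Softplus((x W_noise)_i).
        noise_std = [softplus(v) for v in matvec(x, w_noise)]
        # H(x)_i = (x W_g)_i + StandardNormal() * Softplus((x W_noise)_i)
        h = [c + rng.gauss(0.0, 1.0) * s for c, s in zip(clean, noise_std)]
    else:
        h = clean
    # KeepTopK: keep the k largest entries; the rest act as -inf.
    keep = set(sorted(range(len(h)), key=lambda i: h[i], reverse=True)[:k])
    m = max(h[i] for i in keep)
    # Softmax after masking: exp(-inf) = 0, so only top-k experts get weight.
    exps = [math.exp(h[i] - m) if i in keep else 0.0 for i in range(len(h))]
    total = sum(exps)
    return [e / total for e in exps]
```

Here the noise is added to the logits with a learned scale, and the softmax is taken after masking, so the k weights sum to 1. In the noise-free case, the selection can match the quoted router: softmax is monotonic, so top-k of the scores picks the same experts as top-k of the logits, and L1-renormalizing those k scores (`moe_normalize_expert_weights=1`) cancels the shared softmax denominator. As far as the snippet shows, the main difference is where and how the noise enters.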

@alexliap alexliap changed the title Routing Routing #question Jun 27, 2024
@alexliap alexliap changed the title Routing #question Routing Jun 27, 2024
@mvpatel2000
Contributor

@tgale96, what do you think, since you implemented this? It does seem different to me, but I'm not sure whether it was taken from some other paper.
