
[DO NOT MERGE] Ranran hide a2a #1029

Draft · wants to merge 18 commits into base: main
Conversation

@RissyRan (Collaborator, Author)

No description provided.

@wang2yn84 (Collaborator) left a comment:

What do X, E, and M stand for here?

@RissyRan (Collaborator, Author):

> What do X, E, and M stand for here?

Could you point me to the specific line? X/E should be expert, and M should be the model dimension.

@wang2yn84 (Collaborator):

> Could you point me to the specific line?

Sure, I'm looking at jnp.einsum("BXM,XEM -> BXE") from hide_ff2_a2a.py.

@RissyRan (Collaborator, Author):

> Sure, I'm looking at jnp.einsum("BXM,XEM -> BXE") from hide_ff2_a2a.py.

I think this is one example for @gobbleturk's test? X seems to be sequence here, M is the model dimension, and E is the MLP dimension.

@gobbleturk (Collaborator) commented Nov 13, 2024:

This script is testing the second a2a at the end of the feed-forward layer, where we move from activations sharded on the expert dimension to activations sharded on the batch dimension. This is a toy script, so I combine sequence and batch into one dimension "B" representing the token batch.

  • X is for experts
  • E is for embed (the smaller dimension, often called "hidden dim" or "model dim")
  • M is for MLP (the larger dimension, often called "intermediate dim" or "FF dim")
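With that naming, the shapes in the einsum can be sketched with a tiny standalone example. The sizes below are made up for illustration; the real values and sharding annotations live in hide_ff2_a2a.py and are not reproduced here. The contraction sums over the MLP dimension M, keeping a separate weight matrix per expert X:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy sizes (not the ones used in hide_ff2_a2a.py):
B, X, M, E = 8, 4, 512, 128  # token batch, experts, MLP dim, embed dim

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
acts = jax.random.normal(k1, (B, X, M))  # per-expert FF activations: [B, X, M]
w2 = jax.random.normal(k2, (X, E, M))    # second feed-forward weight, one per expert: [X, E, M]

# Contract over M; X is a shared (batched) dimension, so each expert
# applies its own [E, M] weight to its own slice of the activations.
out = jnp.einsum("BXM,XEM -> BXE", acts, w2)
assert out.shape == (B, X, E)
```

In the real script, `out` would be the tensor whose sharding changes from the expert axis to the batch axis across the second a2a.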

3 participants