
[Kernel] Upload a MoE config file for Mixtral8x7B 8GPU on AMD_Instinct_MI300X_OAM machine (fp16) #261

Open
wants to merge 4 commits into main
Conversation

Jacob0226

This PR adds a tuned MoE config file that covers a broader range of batch sizes.
The Jira ticket GPUAI-2067 adopts this config for better MoE performance.

@shajrawi
Collaborator

@divakar-amd Can you please review? See details here: https://ontrack-internal.amd.com/browse/GPUAI-2067

@Jacob0226 I cannot access the documentation wiki in JIRA. Can you please explain how the tuning results were gathered?

@Jacob0226
Author

> @divakar-amd Can you please review? See details here: https://ontrack-internal.amd.com/browse/GPUAI-2067
>
> @Jacob0226 I cannot access the documentation wiki in JIRA. Can you please explain how the tuning results were gathered?

I shared the ticket with both of you.
I used the script benchmarks/kernels/benchmark_mixtral_moe_rocm.py to measure the runtime at various batch-size (bs) values.
Mixtral8x7B needs more bs entries, and this new MoE config covers bs values from 1 to 32768. The original config in the vllm repo only goes up to a maximum bs of 4000.
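For context, the tuned fused-MoE configs in vLLM are JSON files keyed by batch size, with each entry holding Triton kernel launch parameters. The sketch below uses placeholder values rather than the tuned numbers from this PR, plus a simple nearest-batch-size lookup to show how such a table is typically consumed.

```python
import json

# Minimal sketch of the config file shape (placeholder values, NOT the
# tuned numbers from this PR): vLLM's fused-MoE configs are JSON objects
# keyed by batch size, each entry holding Triton kernel launch parameters.
example_config = {
    "1":     {"BLOCK_SIZE_M": 16,  "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 64,
              "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 2},
    "4096":  {"BLOCK_SIZE_M": 64,  "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
              "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 2},
    "32768": {"BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
              "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 2},
}

def pick_config(config: dict, num_tokens: int) -> dict:
    """Return the tuned entry whose batch size is closest to num_tokens,
    roughly how a kernel falls back when the current batch size has no
    exact entry in the table."""
    best = min((int(k) for k in config), key=lambda k: abs(k - num_tokens))
    return config[str(best)]

print(json.dumps(pick_config(example_config, 3000), indent=2))
```

Covering batch sizes up to 32768 matters because the nearest-entry fallback gets coarser the further the actual batch size is from the largest tuned entry.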

@Jacob0226
Author

The Jira ticket may contain more information than you need here.
When I benchmarked Mixtral with vLLM, I found I needed more bs entries in the MoE config, so I used benchmark_mixtral_moe_rocm.py to do the tuning.
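As a rough, hypothetical skeleton of what such a tuning sweep looks like (not the actual code of benchmark_mixtral_moe_rocm.py): sweep the bs values, time the kernel at each, and record the results. `run_fused_moe` below is a stand-in fp16 matmul of a Mixtral-like hidden size so the sketch stays self-contained; it requires a GPU (ROCm builds of PyTorch expose the same `cuda` device API).

```python
import torch

HIDDEN = 4096  # Mixtral 8x7B hidden size

def run_fused_moe(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Placeholder for the real fused-MoE kernel call being tuned.
    return x @ w

def time_once(bs: int, device: str) -> float:
    """Time one kernel invocation at batch size `bs`, in milliseconds."""
    x = torch.randn(bs, HIDDEN, dtype=torch.float16, device=device)
    w = torch.randn(HIDDEN, HIDDEN, dtype=torch.float16, device=device)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    run_fused_moe(x, w)              # warm-up
    torch.cuda.synchronize()
    start.record()
    run_fused_moe(x, w)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)

if __name__ == "__main__":
    device = "cuda"  # ROCm PyTorch also uses the "cuda" device string
    batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
                   2048, 4096, 8192, 16384, 32768]
    for bs in batch_sizes:
        print(f"bs={bs:6d}  {time_once(bs, device):8.3f} ms")
```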
