[REQUEST] Some questions about deepspeed sequence parallel #6708
Comments
@yingtongxiong, the recommended use of DeepSpeed sequence parallelism (DeepSpeed Ulysses) is to call it from a client framework/script. Please take a look at these two examples: Megatron-DeepSpeed, HuggingFace transformers.
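For illustration only (not taken from the linked examples), a minimal sketch of what calling Ulysses from a client script can look like: wrap your own attention module with `DistributedAttention` from `deepspeed.sequence.layer` and hand it a sequence-parallel process group. The `LocalAttention` stand-in and the `[seq, batch, heads, head_dim]` layout are assumptions here; the linked Megatron-DeepSpeed and HuggingFace examples show the real integrations.

```python
import torch
from deepspeed.sequence.layer import DistributedAttention

class LocalAttention(torch.nn.Module):
    """Stand-in for the framework's own attention. It sees the full sequence
    but only the subset of heads this rank holds after the all-to-all that
    DistributedAttention performs."""
    def forward(self, q, k, v):
        # q, k, v: [seq, batch, heads, head_dim]
        scores = torch.einsum("sbhd,tbhd->bhst", q, k) / q.shape[-1] ** 0.5
        probs = torch.softmax(scores, dim=-1)
        return torch.einsum("bhst,tbhd->sbhd", probs, v)

def wrap_with_ulysses(sp_group):
    # scatter_idx=2 / gather_idx=0 correspond to a [seq, batch, heads, head_dim]
    # layout where inputs arrive sharded along the sequence dimension.
    return DistributedAttention(LocalAttention(), sp_group, scatter_idx=2, gather_idx=0)
```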
Okay, thank you very much.
https://github.com/microsoft/DeepSpeedExamples/blob/uly-hf/post_training/sequence_parallelism/test_ulysses.py#L113 I see that the mesh_param is commented out here, so I think that if I want to use SP, this parameter should be passed in, is that right? @samadejacobs
Also, when I use the SP all2all overlap, I found a little bug: DeepSpeed/deepspeed/sequence/layer.py, line 242 in a1b0c35
@yingtongxiong I think
Thank you. Also, I don't know how to set sequence_parallel_size. In the config? How is it passed to deepspeed.initialize? Maybe you can give me an example?
Sure, in the test it is here:
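Roughly, and assuming the mesh_param route used in that test, sequence_parallel_size is passed to deepspeed.initialize as part of a (data_parallel_size, sequence_parallel_size) tuple rather than through the JSON config. A minimal sketch with placeholder values:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)      # placeholder model
dp_size, sp_size = 2, 4                  # hypothetical 8-GPU launch: 2 x 4 mesh

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
}

# mesh_param is a (data_parallel_size, sequence_parallel_size) pair; the engine
# builds the corresponding device mesh and process groups from it.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
    mesh_param=(dp_size, sp_size),
)
```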
Note that the number of processes you want to launch is still the number of GPUs you are using. So, for example, if you're using

Each data parallel group takes in a separate copy of the full train dataset, and each sequence parallel rank in each data parallel group needs to be given the slice of data according to that rank. You need to do this splitting yourself. You must also let the model know which parts of the sequence it's processing by providing a

AFAIK, the above

There is another attempt at integrating Ulysses requiring changes across

It seems likely that the changes to

The main question I have is whether we need to use this new loss if we're using
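A minimal, single-process sketch of the manual splitting described above; the contiguous slicing and the position_ids name are assumptions for illustration, not the exact scheme from any of the linked integrations.

```python
import torch

def shard_for_sequence_parallel(input_ids: torch.Tensor, sp_rank: int, sp_size: int):
    """Give each sequence-parallel rank its contiguous slice of the sequence
    dimension, plus the absolute positions it owns."""
    batch, seq_len = input_ids.shape
    assert seq_len % sp_size == 0, "sequence length must divide evenly across SP ranks"
    chunk = seq_len // sp_size
    start, end = sp_rank * chunk, (sp_rank + 1) * chunk
    local_ids = input_ids[:, start:end]
    # position_ids (assumed name) tell the model which positions this rank holds
    local_positions = torch.arange(start, end).unsqueeze(0).expand(batch, -1)
    return local_ids, local_positions

# Example: seq_len 8 split over 4 SP ranks -> rank 1 gets positions 2 and 3
ids = torch.arange(8).unsqueeze(0)
print(shard_for_sequence_parallel(ids, sp_rank=1, sp_size=4))
```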
@yingtongxiong: I will try to put together a working example soon (if what I have is actually working).
Thank you very much.
Okay, thank you.
Hello, I want to run sequence parallelism with the pure DeepSpeed repo. However, I found that the developer has to create the sequence parallel process group themselves, is that right? I want to know whether there are any solutions for using sequence parallelism or MoE (which also requires expert_data_process_group and so on) on pure DeepSpeed.
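For what it's worth, a rough sketch of what creating the sequence parallel process group yourself could look like on top of plain torch.distributed; the grouping scheme below (adjacent ranks share one SP group) is only an assumption, and the resulting group is what you would hand to the Ulysses layers (e.g. DistributedAttention).

```python
import torch.distributed as dist

def make_sequence_parallel_group(sp_size: int):
    """Partition the world into consecutive groups of sp_size ranks and
    return the group this rank belongs to."""
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    assert world_size % sp_size == 0, "world size must be a multiple of sp_size"
    my_group = None
    for start in range(0, world_size, sp_size):
        ranks = list(range(start, start + sp_size))
        # new_group must be called by every rank for every group,
        # even the groups this rank is not a member of
        group = dist.new_group(ranks)
        if rank in ranks:
            my_group = group
    return my_group
```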