USP: A Unified Sequence Parallelism Approach for Long Context Generative AI

Chinese Blog 1; Chinese Blog 2

DeepSpeed-Ulysses and Ring-Attention are not mutually exclusive options. Both should be used in a mixed manner to jointly split the sequence dimension. By adjusting their parallelism degrees to ensure that ulysses-degree multiplied by ring-degree equals sp-degree, we refer to this as Unified-SP. The advantage of Unified-SP is that it encompasses the capabilities of both original methods without any loss, only offering additional benefits. Firstly, it eliminates the restriction that Ulysses' sp-degree must be less than the number of attention heads. Moreover, the communication pattern of mixed parallelism is more friendly to heterogeneous networks, providing acceleration over PCIe and in multi-machine multi-GPU environments compared to using Ulysses or Ring alone. Therefore, we recommend using the Unified-SP implementation as the default sequence parallelism solution.

In xDiT, we utilize the USP implementation from feifeibear/long-context-attention. Since DiT does not use Causal Attention, there is no need for load balancing operations on Ring-Attention. For more details, please refer to the following paper.

@article{fang2024unified,
  title={USP: A Unified Sequence Parallelism Approach for Long Context Generative AI},
  author={Fang, Jiarui and Zhao, Shangchun},
  journal={arXiv preprint arXiv:2405.07719},
  year={2024}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

usp.md

usp.md

USP: A Unified Sequence Parallelism Approach for Long Context Generative AI

Files

usp.md

Latest commit

History

usp.md

File metadata and controls

USP: A Unified Sequence Parallelism Approach for Long Context Generative AI