Fix Rope Compatibility with Cos/Sin Position Embedding for Batch Size > 1 #477

wizyoung · 2024-12-14T08:04:16Z

Summary

Fix Rope Compatibility with Cos/Sin Position Embedding for Batch Size > 1

This PR fixes the RoPE implementation so that it works with cos/sin position embeddings when the batch size is greater than 1.
By default in transformers, position_ids is None during training, so the model computes it as follows:

```python
cache_position = torch.arange(seq_len)
position_ids = cache_position.unsqueeze(0)
```

As a result, the cos/sin position embeddings have shape (1, seq_len, head_dim), which is the shape the Liger implementation expects.
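To make the shapes concrete, here is a simplified sketch of how Llama-style rotary embeddings turn position_ids into cos/sin tensors, compared with position_ids pre-computed with one row per sample. The helper `rotary_cos_sin` is hypothetical and only approximates what the transformers rotary embedding modules do; the point it illustrates is that the leading dimension of cos/sin follows the leading dimension of position_ids.

```python
import torch


def rotary_cos_sin(position_ids: torch.Tensor, head_dim: int, base: float = 10000.0):
    # Hypothetical helper: simplified Llama-style cos/sin computation.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    # (batch, seq_len, 1) * (head_dim // 2,) -> (batch, seq_len, head_dim // 2)
    freqs = position_ids[..., None].float() * inv_freq
    emb = torch.cat((freqs, freqs), dim=-1)  # (batch, seq_len, head_dim)
    return emb.cos(), emb.sin()


bsz, seq_len, head_dim = 2, 4, 8

# Default training path: position_ids is None, so it is built from cache_position.
cache_position = torch.arange(seq_len)
default_ids = cache_position.unsqueeze(0)  # (1, seq_len)
cos_default, sin_default = rotary_cos_sin(default_ids, head_dim)

# Pre-computed position_ids with one row per sample.
precomputed_ids = default_ids.expand(bsz, -1)  # (bsz, seq_len)
cos_pre, sin_pre = rotary_cos_sin(precomputed_ids, head_dim)

print(tuple(cos_default.shape))  # (1, 4, 8)
print(tuple(cos_pre.shape))  # (2, 4, 8)
```

Any kernel that assumes a leading dimension of 1 for cos/sin will therefore see a different layout as soon as position_ids carries a batch dimension.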
However, if position_ids are pre-computed for any reason (in my experiment, I implemented M-RoPE in a different way, which makes position_ids pre-computed), the cos/sin tensors can have shape (bsz, seq_len, head_dim) instead, and the current implementation does not handle this case correctly. This PR makes the RoPE implementation support pre-computed position_ids.
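The failure mode can be illustrated without Triton. Assume, as a simplification, a kernel that launches one program per (batch, position) row of q and picks its cos/sin row from the position alone (`pid % seq_len`). That lookup is correct when cos/sin have batch dimension 1, but with per-sample cos/sin every sample silently reuses batch 0's values. The sketch below emulates that indexing in plain PyTorch; `rope_rowwise` and its `batch_aware` flag are hypothetical names, and this is not Liger's actual kernel code.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Llama-style "rotate half": (x1, x2) -> (-x2, x1) along the last dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def rope_reference(q, cos, sin):
    # q: (bsz, seq_len, n_head, head_dim); cos/sin: (1 or bsz, seq_len, head_dim).
    # Broadcasting handles both batch sizes of cos/sin.
    return q * cos.unsqueeze(2) + rotate_half(q) * sin.unsqueeze(2)


def rope_rowwise(q, cos, sin, batch_aware: bool):
    # Emulates a kernel with one program per (batch, position) row (hypothetical).
    bsz, seq_len, n_head, head_dim = q.shape
    cos_bs = cos.shape[0]
    cos_rows = cos.reshape(cos_bs * seq_len, head_dim)
    sin_rows = sin.reshape(cos_bs * seq_len, head_dim)
    q_rows = q.reshape(bsz * seq_len, n_head, head_dim)
    out = torch.empty_like(q_rows)
    for pid in range(bsz * seq_len):
        batch_idx, pos = divmod(pid, seq_len)
        if batch_aware and cos_bs > 1:
            row = batch_idx * seq_len + pos  # this sample's own cos/sin row
        else:
            row = pos  # position only: every sample reads batch 0's row
        out[pid] = q_rows[pid] * cos_rows[row] + rotate_half(q_rows[pid]) * sin_rows[row]
    return out.reshape(bsz, seq_len, n_head, head_dim)


torch.manual_seed(0)
bsz, seq_len, n_head, head_dim = 2, 5, 3, 8

# Per-sample position_ids (illustrative values), e.g. from a custom M-RoPE-style scheme.
position_ids = torch.stack([torch.arange(seq_len), torch.arange(seq_len) + 3])
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = position_ids[..., None].float() * inv_freq
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()  # (bsz, seq_len, head_dim)

q = torch.randn(bsz, seq_len, n_head, head_dim)
expected = rope_reference(q, cos, sin)
print(torch.allclose(rope_rowwise(q, cos, sin, batch_aware=False), expected))  # False
print(torch.allclose(rope_rowwise(q, cos, sin, batch_aware=True), expected))  # True
```

The batch offset is applied only when cos/sin actually carry a batch dimension, so the common (1, seq_len, head_dim) case keeps using the position-only lookup.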
In the unit test test_rope.py, I added a variable expand_position_ids to simulate this case. The previous implementation fails in this scenario, and the new patch resolves the issue.
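A check along these lines can be sketched as follows. It is not the PR's actual unit test: `rope_flat_rows` is a hypothetical vectorized stand-in for the implementation under test that picks each row's cos/sin by (batch, position) index, and `check_rope` is written as a plain function so it runs without pytest (under pytest, `expand_position_ids` would typically be a `@pytest.mark.parametrize` argument). Expanding a single row of position_ids gives identical rows per sample, which mainly exercises shape handling, so the hypothetical `shift_per_sample` flag additionally gives each sample different positions.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def rotary_cos_sin(position_ids, head_dim, base=10000.0):
    # Simplified Llama-style cos/sin; shape (position_ids.shape[0], seq_len, head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = position_ids[..., None].float() * inv_freq
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()


def rope_flat_rows(q, cos, sin):
    # Hypothetical implementation under test: one row per (batch, position),
    # with a batch offset into cos/sin only when they have a real batch dim.
    bsz, seq_len, n_head, head_dim = q.shape
    rows = torch.arange(bsz * seq_len)
    batch_idx, pos = rows // seq_len, rows % seq_len
    if cos.shape[0] == 1:
        batch_idx = torch.zeros_like(batch_idx)
    c = cos[batch_idx, pos].unsqueeze(1)  # (bsz * seq_len, 1, head_dim)
    s = sin[batch_idx, pos].unsqueeze(1)
    q_rows = q.reshape(bsz * seq_len, n_head, head_dim)
    out = q_rows * c + rotate_half(q_rows) * s
    return out.reshape(bsz, seq_len, n_head, head_dim)


def check_rope(expand_position_ids: bool, shift_per_sample: bool = False,
               bsz: int = 3, seq_len: int = 6, n_head: int = 2, head_dim: int = 16) -> None:
    torch.manual_seed(0)
    q = torch.randn(bsz, seq_len, n_head, head_dim)
    pos_ids = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len): default path
    if expand_position_ids:
        pos_ids = pos_ids.expand(bsz, -1)  # (bsz, seq_len): pre-computed path
        if shift_per_sample:
            pos_ids = pos_ids + torch.arange(bsz).unsqueeze(1)  # distinct rows
    cos, sin = rotary_cos_sin(pos_ids, head_dim)
    # Reference: plain broadcasting over the batch and head dimensions.
    expected = q * cos.unsqueeze(2) + rotate_half(q) * sin.unsqueeze(2)
    actual = rope_flat_rows(q, cos, sin)
    assert actual.shape == q.shape
    assert torch.allclose(actual, expected, atol=1e-6)


for expand in (False, True):
    check_rope(expand_position_ids=expand)
check_rope(expand_position_ids=True, shift_per_sample=True)
print("all checks passed")
```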

pytest details:

Testing Done

Hardware Type:
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

wizyoung and others added 11 commits September 26, 2024 22:02

FIX: tl.program_id() does indeed not have a cast method in triton2.3.1

b1d3781

Merge branch 'main' into main

00f47dd

Merge branch 'main' into main

dc07725

Merge branch 'main' into main

ede50df

Merge branch 'linkedin:main' into main

ae7e13b

Merge branch 'linkedin:main' into main

d90f5fc

Merge branch 'linkedin:main' into main

208d536

fix rope cos sin tensor shape

35e8c4c

Refactor rope tensor handling for batch and sequence offsets

47f7120

fix rope

fe9abd6

fix style

f3c29c8
