
Is the code of Position ID Rearrangement correct? #5

Open
TimeTrapzz opened this issue Jan 7, 2025 · 0 comments

@TimeTrapzz

    if past_key_value is not None:
        cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
        key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)

        # Prepare full_position_ids for the keys (from the cache)
        full_position_ids = torch.arange(
            0, past_key_value.seen_tokens, dtype=torch.long, device=query_states.device
        )
        full_position_ids = full_position_ids.unsqueeze(0)

    key_states = apply_single_rotary_pos_emb(key_states, cos, sin, full_position_ids)

Based on the code in your repository, the position IDs for key_states are reassigned and the keys are rotated again. However, each document block was already assigned its own positions when its KV cache was prefilled, so after concatenating the caches the keys already carry the positions [0, 1, 2, ..., l, 0, 1, 2, ..., l, 0, 1, 2, ..., l]. Since rotary embeddings compose additively (rotating at position p1 and then at p2 is the same as rotating once at p1 + p2), applying the rotation in the snippet above with the cumulative full_position_ids should produce effective positions of the form [0, 2, 4, ..., 2l, l+1, l+3, ...] rather than a clean re-numbering. This doesn't seem to match expectations.
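
For reference, here is a minimal, self-contained sketch of why I expect the angles to add up. It is not code from this repository; apply_rope and rotate_half below are toy re-implementations of the standard rotary formula written just for this comment. It shows that applying the RoPE rotation for position 5 to a vector that was already rotated at position 3 gives the same result as rotating it once at position 8.

    import torch

    def rotate_half(x):
        # Standard RoPE helper: swap and negate the two halves of the last dim.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    def apply_rope(x, pos, dim=8, base=10000.0):
        # Apply the rotary embedding for scalar position `pos` to a [dim] vector.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        angles = pos * inv_freq                        # [dim/2]
        cos = torch.cat((angles.cos(), angles.cos()))  # [dim]
        sin = torch.cat((angles.sin(), angles.sin()))  # [dim]
        return x * cos + rotate_half(x) * sin

    x = torch.randn(8)
    twice = apply_rope(apply_rope(x, pos=3), pos=5)    # rotate at 3, then again at 5
    once = apply_rope(x, pos=8)                        # rotate once at 3 + 5
    print(torch.allclose(twice, once, atol=1e-5))      # True: the position angles add up

If that additivity holds, the second call to apply_single_rotary_pos_emb adds full_position_ids on top of the positions the blocks received during prefill instead of replacing them.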
