[XLA:MSA] Fix the AllocationRequest for window prefetch #21736

Open. copybara-service[bot] wants to merge 1 commit into base: main.

@copybara-service copybara-service bot commented Jan 23, 2025

[XLA:MSA] Fix the AllocationRequest for window prefetch

We currently have two performance issues in window prefetch.

The first is that the WindowPrefetchedAllocation we create is given a non-zero shape, so it consumes CopyResources. This can use up all of the available resources and interfere with the prefetching decisions for other tensors. Since the prefetching itself is not implemented yet, the allocation can be specified to use zero CopyResource.
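
To illustrate the interference, here is a minimal toy model of a per-time-step copy budget. It is only a sketch: `CopyBudget`, `TryConsume`, and `OtherPrefetchFits` are hypothetical names invented for this example and are not XLA's API, and the real resource accounting in MSA is more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model; none of these names come from XLA.
struct CopyBudget {
  // Remaining copy resource at each logical time step.
  std::vector<double> remaining;

  // Deducts `cost` from every step in [start, end) and returns true if every
  // step has at least `cost` left; otherwise leaves the budget untouched and
  // returns false.
  bool TryConsume(int start, int end, double cost) {
    assert(0 <= start && start <= end &&
           end <= static_cast<int>(remaining.size()));
    for (int t = start; t < end; ++t) {
      if (remaining[t] < cost) return false;
    }
    for (int t = start; t < end; ++t) remaining[t] -= cost;
    return true;
  }
};

// Charges `window_cost` for a window-prefetch allocation over all `steps`,
// then reports whether an ordinary prefetch costing `other_cost` per step
// still fits in the same interval.
bool OtherPrefetchFits(double per_step, int steps, double window_cost,
                       double other_cost) {
  CopyBudget budget{std::vector<double>(steps, per_step)};
  budget.TryConsume(0, steps, window_cost);
  return budget.TryConsume(0, steps, other_cost);
}
```

In this model, a window-prefetch allocation that charges the full per-step budget leaves nothing for any other prefetch in the interval, while one that charges zero leaves the budget untouched.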

The second is that the allocation generated by window prefetch currently spans too long a time range. Its earliest prefetch time is set to the operand's definition time, and its end time is the use time. The earliest prefetch time should be as close to the use time as possible, to minimize interference with the prefetching of other tensors.
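
The effect of the time span can be sketched with inclusive live ranges over logical time steps. Everything here is an assumption made for illustration: `LiveRange`, `WideRange`, `TightRange`, and `OverlapSteps` are hypothetical names, and starting one step before the use is just one possible reading of "as close to the use time as possible", not necessarily what the change does.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inclusive live range [start, end] in logical time steps.
struct LiveRange {
  int start;
  int end;
};

// Previous behavior as described: start at the operand's definition time.
LiveRange WideRange(int definition_time, int use_time) {
  assert(definition_time <= use_time);
  return {definition_time, use_time};
}

// Assumed tightened behavior: start one step before the use, but never
// earlier than the definition time.
LiveRange TightRange(int definition_time, int use_time) {
  assert(definition_time <= use_time);
  return {std::max(definition_time, use_time - 1), use_time};
}

// Number of time steps during which both ranges are live.
int OverlapSteps(const LiveRange& a, const LiveRange& b) {
  return std::max(0, std::min(a.end, b.end) - std::max(a.start, b.start) + 1);
}
```

For an operand defined at step 2 and used at step 40, the wide range overlaps every step of another tensor live over steps 10 to 20, while the tight range overlaps none of them.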

For now, we updated WindowPrefetch() to simply allocate a chunk for the exposed span vmem. Since WindowPrefetch() does not call Prefetch(), we can also simplify the AllocationRequest and PrefetchContext data structures a bit.

@copybara-service copybara-service bot force-pushed the test_718439738 branch 3 times, most recently from 0e7d18d to 4de2457 Compare January 24, 2025 23:59

PiperOrigin-RevId: 718439738