Rework GPU runtime system and copies #1991

Merged
merged 121 commits into from
Aug 8, 2023

@athas athas commented Jul 21, 2023

This PR removes the explicit tracking of permutation information from LMADs. The motivation is primarily simplicity: the presence of permutations made some of the LMAD functions much more complicated. Further, the tracking was never complete: a column-major array, for example, could be expressed without using the permutation mechanism at all, simply by permuting the strides (and shape) of the LMAD instead.
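
To illustrate the point about strides, here is a minimal Python sketch, not the compiler's actual Haskell representation. The `LMAD` class, `row_major`, and `permute` are hypothetical names, and the model keeps only an offset, a shape, and strides. It shows a column-major layout falling out of a plain permutation of shape and strides, with no separate permutation field.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LMAD:
    """Simplified model (an assumption): offset plus per-dimension shape and stride."""
    offset: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

    def index(self, idx):
        # Flat offset of a multi-dimensional index.
        return self.offset + sum(i * s for i, s in zip(idx, self.strides))


def row_major(shape):
    # Innermost dimension gets stride 1; each outer stride is the
    # product of the sizes of all dimensions inside it.
    strides = []
    acc = 1
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return LMAD(0, tuple(shape), tuple(reversed(strides)))


def permute(lmad, perm):
    # Permute shape and strides together; no extra bookkeeping is needed.
    return LMAD(
        lmad.offset,
        tuple(lmad.shape[p] for p in perm),
        tuple(lmad.strides[p] for p in perm),
    )


# A 2x3 column-major array is a 3x2 row-major array with its dimensions swapped.
col_major = permute(row_major((3, 2)), (1, 0))
```

In this model `col_major` has shape `(2, 3)` and strides `(1, 2)`, which is exactly the column-major layout.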

The only thing we truly use the permutations for is to detect transpositions during copies. This can be done in another way: check whether the index function basis of the source is a permutation of that of the destination. An even better solution would be to check dynamically whether the LMADs involved express a transposition. This is easily done in time quadratic in the rank of the arrays, which is usually very low (and the operations involved are just integer comparisons).
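
One way such a dynamic check could look is sketched below in Python. This is an illustrative assumption, not the code in this PR: `find_permutation` and `is_transposition` are hypothetical names, and the destination is assumed to be a dense row-major array. The function searches for an ordering of the source dimensions whose strides form a row-major layout, using only integer comparisons and a nested loop over the rank.

```python
def find_permutation(shape, strides):
    """Return perm such that the dimensions taken in order perm form a
    dense row-major layout, or None if no such ordering exists.

    Conservative sketch: size-1 dimensions with unusual strides are
    rejected rather than handled specially.
    """
    rank = len(shape)
    used = [False] * rank
    innermost_first = []
    expected = 1  # stride the next (more outer) dimension must have
    for _ in range(rank):
        found = None
        for d in range(rank):  # inner loop makes this quadratic in the rank
            if not used[d] and strides[d] == expected:
                found = d
                break
        if found is None:
            return None
        used[found] = True
        innermost_first.append(found)
        expected *= shape[found]
    return tuple(reversed(innermost_first))


def is_transposition(shape, strides):
    # A non-identity permutation means the copy rearranges dimensions.
    perm = find_permutation(shape, strides)
    return perm is not None and perm != tuple(range(len(shape)))
```

For example, a source with shape `(2, 3)` and strides `(1, 2)` is recognised as a permuted layout, while strides `(3, 1)` give the identity and strides `(4, 1)` (a non-dense slice) are rejected.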

athas added the run-benchmarks label (Makes GA run the benchmark suite.) Jul 21, 2023

athas commented Jul 28, 2023

This is no longer just about getting rid of permutations, but also about how we handle copies in code generation. The new approach will allow much more dynamic behaviour, with the goal of moving more intelligence to the runtime system, where it is much easier to understand and debug.

@athas athas mentioned this pull request Aug 3, 2023

athas commented Aug 8, 2023

This change works quite well, but unfortunately there is one performance regression, on OptionPricing. The compiler now inserts fewer copies, but one of the remaining copies is a manifest with a permutation of (2,1,0) (i.e. reversing the dimensions). Previously this was two manifests, both of which were transposes. We do not have an efficient GPU kernel for reversing array dimensions, so this ends up being rather inefficient.
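
As a small worked example of how two transposes can add up to a full reversal, the Python sketch below composes permutations. The helper names `rearrange` and `compose` are hypothetical, and the particular pair `(2,0,1)` and `(0,2,1)` is only one possible decomposition chosen for illustration; the comment above does not say which two transposes the compiler previously generated.

```python
def rearrange(perm, xs):
    # Result dimension k is taken from input dimension perm[k].
    return tuple(xs[p] for p in perm)


def compose(first, second):
    # Permutation equivalent to rearranging by `first`, then by `second`.
    return tuple(first[k] for k in second)


shape = ("a", "b", "c")

# Each step is a 2D transpose once adjacent dimensions are grouped:
# (2,0,1) swaps the block (a,b) with c; (0,2,1) swaps the last two dimensions.
step1 = rearrange((2, 0, 1), shape)   # ("c", "a", "b")
step2 = rearrange((0, 2, 1), step1)   # ("c", "b", "a")

combined = compose((2, 0, 1), (0, 2, 1))  # (2, 1, 0)
```

Here `combined` is `(2, 1, 0)`, and applying it directly to `shape` gives the same result as the two steps in sequence.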

athas commented Aug 8, 2023

Oh, a simple manifest-manifest simplification rule solved that, and now OptionPricing is actually quite a bit faster than on master.

@athas athas merged commit 13ee8c5 into master Aug 8, 2023
27 of 29 checks passed
@athas athas deleted the lmad-no-perm branch August 8, 2023 13:26