Rework GPU runtime system and copies #1991

athas · 2023-07-21T21:52:24Z

This PR removes the explicit tracking of permutation information from LMADs. The motivation is primarily simplicity: the presence of permutations made some of the LMAD functions much more complicated. Further, the permutation tracking was never complete: it was perfectly possible to express, e.g., a column-major array without using the permutation mechanism at all, simply by permuting the strides (and shape) of the LMAD instead.
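To illustrate the point about column-major arrays, here is a minimal Python sketch (not the compiler's Haskell code). It models an LMAD as an offset plus per-dimension shape and strides. The names `LMAD`, `row_major` and `permute` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LMAD:
    """Hypothetical model: flat offset = offset + sum(idx[d] * strides[d])."""
    offset: int
    shape: tuple
    strides: tuple

    def index(self, idx):
        return self.offset + sum(i * s for i, s in zip(idx, self.strides))


def row_major(shape):
    """Dense row-major LMAD: the last dimension has stride 1."""
    strides = []
    acc = 1
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return LMAD(0, tuple(shape), tuple(reversed(strides)))


def permute(lmad, perm):
    """Reorder dimensions by permuting shape and strides together.
    No separate permutation field is stored anywhere."""
    return LMAD(lmad.offset,
                tuple(lmad.shape[p] for p in perm),
                tuple(lmad.strides[p] for p in perm))


rm = row_major((2, 3))                   # shape (2, 3), strides (3, 1)
cm = permute(row_major((3, 2)), (1, 0))  # shape (2, 3), strides (1, 2)
```

Both `rm` and `cm` describe a 2x3 array, but `cm` is column-major, and that is visible from the strides alone.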

The only thing we truly use the permutations for is to detect transpositions during copies. This can be done in another way: check whether the index function basis of the source is a permutation of the destination. An even better solution would be to dynamically check whether the involved LMADs express a transposition. This is easily done in time quadratic in the rank of the arrays, which is usually very low (and the operations involved are only integer comparisons).
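One possible shape for such a dynamic check is sketched below in Python. It is an assumption for illustration, not the algorithm implemented in this PR. The hypothetical helper `dense_order` looks for the order of dimensions under which an LMAD is a dense row-major array, using only integer comparisons and two nested loops over the rank. The sketch assumes no dimension has size 1, since such a dimension can have any stride.

```python
def dense_order(shape, strides):
    """Return the dimension order (outermost first) under which the layout
    is dense row-major, or None if there is no such order.
    Quadratic in the rank: each step scans all dimensions once."""
    rank = len(shape)
    used = [False] * rank
    order = []      # built innermost-first, reversed at the end
    expected = 1    # stride the next-innermost dimension must have
    for _ in range(rank):
        for d in range(rank):
            if not used[d] and strides[d] == expected:
                used[d] = True
                order.append(d)
                expected *= shape[d]
                break
        else:
            return None  # no dimension has the required stride
    order.reverse()
    return order


def is_permuted_copy(shape, src_strides, dst_strides):
    """True if both layouts are dense and differ only in dimension order."""
    src = dense_order(shape, src_strides)
    dst = dense_order(shape, dst_strides)
    return src is not None and dst is not None and src != dst
```

For a 2x3 array, `is_permuted_copy((2, 3), (1, 2), (3, 1))` detects a copy from a column-major source to a row-major destination. A real implementation would additionally have to decide which permutations have an efficient kernel.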

athas · 2023-07-28T08:44:47Z

This is no longer just about getting rid of permutations, but also how we handle copies in code generation. The new approach will allow much more dynamic behaviour, with the goal of moving more intelligence to the runtime system, where it is much easier to understand and debug.
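As a rough picture of what a fully generic copy between two LMADs involves, here is a Python sketch. It is illustrative only: the function name `lmad_copy` and its parameters are assumptions, not the generated code or the runtime API. It walks every logical index and computes the flat source and destination offsets from the strides, which is slow but works for any pair of layouts.

```python
from itertools import product


def lmad_copy(dst, dst_offset, dst_strides, src, src_offset, src_strides, shape):
    """Element-by-element copy between two strided layouts.
    dst and src are flat buffers; both layouts share the same logical shape."""
    for idx in product(*(range(n) for n in shape)):
        d = dst_offset + sum(i * st for i, st in zip(idx, dst_strides))
        s = src_offset + sum(i * st for i, st in zip(idx, src_strides))
        dst[d] = src[s]


# Copy a row-major 2x3 array into a column-major buffer.
src = [0, 1, 2, 3, 4, 5]
dst = [None] * 6
lmad_copy(dst, 0, (1, 2), src, 0, (3, 1), (2, 3))
print(dst)  # [0, 3, 1, 4, 2, 5]
```

A runtime system can keep a fallback of this kind and dispatch to faster paths (a plain memcpy for contiguous layouts, a transposition kernel for permuted ones) when it recognises the layouts at run time.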

athas · 2023-08-08T07:47:51Z

This change works quite well, but unfortunately there is one performance regression, on OptionPricing. This regression is due to the compiler now inserting fewer copies, but one of the remaining copies is now a manifest with a permutation of (2,1,0) (i.e. reversing the dimensions). Previously this was two manifests, both of which were transposes. We do not have an efficient GPU kernel for handling the reversal of array dimensions, so this ends up being rather inefficient.

athas · 2023-08-08T08:54:49Z

Oh, a simple manifest-manifest simplification rule solved that. OptionPricing is now actually quite a bit faster than on master.

athas added 3 commits July 21, 2023 15:52

Remove permutations from LMADs.

a47a1ad

Fix this.

59ca859

Detect transpositions properly (sort of).

aaeb3da

athas added the run-benchmarks label (makes GA run the benchmark suite) Jul 21, 2023

athas added 25 commits July 22, 2023 00:07

Fix unittests.

9418f8e

This seems never sound.

876a05e

Merge branch 'master' into lmad-no-perm

bb43fd6

An ISPC write always has a varying index.

d6254cb

Make memory simplification parametric in the RuleBook.

100379d

Use inferred memory information when possible.

2723a96

No need for these permutations.

d59755b

Merge branch 'master' into lmad-no-perm

ac865c5

Merge branch 'master' into lmad-no-perm

b3a49ef

IxFun/LMAD.rank is useful.

0e67b72

New IxFun concept: embedding.

67bfc54

Replaces rebasing, I hope.

Half-baked testing.

b519de6

Remove rebasing.

63c1dd3

More readable this way.

ee0c3c7

embed=>expand, and also simplify it a bit.

1fb3ba8

I cannot believe this actually works.

e24d0e6

Better detection of transpositions.

7ccfb67

We are actually still too conservative. A fully dynamic approach is needed.

Just do the same as in the thread case.

ab177f2

We will need this cool function.

5e174c3

Begin rework of copying in codegen.

f71f4cc

Adds LMADCopy to Imp and implements generic code generation. Very slow, but at least functional for the C backend.

Add mechanism for optimised copy functions.

4672e99

Correct contiguousness checking.

36f2497

Start work on GPU copies.

35acf6f

Merge branch 'master' into lmad-no-perm

6cf7b92

Better handling of cost centre names.

0db4d43

athas added 3 commits August 2, 2023 15:32

Merge branch 'master' into lmad-no-perm

f448f53

Fix ISPC codegen.

15c0eec

Merge branch 'master' into lmad-no-perm

6fa39b5

athas mentioned this pull request Aug 3, 2023

HIP backend #2003

Closed

athas added 20 commits August 3, 2023 13:28

Restore basic Python support.

a381d4a

Centralise GPU local memory handling.

e5ce2c0

Implement LMAD copy for sequential Python.

d31df9d

Fix.

93a152f

Fix address calculation.

0539183

Fix PyOpenCL local memory.

a721bdf

Merge branch 'master' into lmad-no-perm

ce96ef5

Use efficient copies in PyOpenCL backend as well.

1dbddae

Implement efficient host<->gpu copying.

ae1aef3

Unify GPU copying.

231fea3

No need to export all the guts.

c14cbbd

Move more stuff out of boilerplate module.

08d5ee7

This is unused.

ebe4148

Simplify further.

05841e6

No need to keep these separate.

adc3e86

Eliminate need for boilerplate modules.

644a49e

Clean up these types.

28fc9ca

Proper error detection in these API functions.

1f02851

Properly ignore these.

4ae1316

Oops, these were the CUDA functions.

50ccf11

Manifest simplification rule.

ea30bb3

Fewer allocations needed.

5fc50c0

athas merged commit 13ee8c5 into master Aug 8, 2023
27 of 29 checks passed

athas deleted the lmad-no-perm branch August 8, 2023 13:26
