[AIE2] Combine 2D/3D post-increments for 512-bit load/stores #208

Merged on Oct 10, 2024 (1 commit)

@gbossu (Collaborator) commented on Oct 8, 2024

QoR results are mixed, but I believe this change goes in the right direction because it simplifies the MIR. These combines are also needed if we want to post-pipeline the benchmarks.

| Core_Compute_Cycle_Count      | Mul2D_0      | Mul2D_1      | Conv2D_2x8_0 | Mul2d_bf16_0 | ReduceSumAxis_7_aie2_int8 | ReduceMeanAxis_7_aie2_int8 | Mul2d_bf16_1 | Add2D_bf16_1 | Add2D_bf16_0 | ReduceSumAxis_6_aie2_int8 | ReduceSumAxis_3_aie2_int8 | ReduceSumAxis_5_aie2_int8 | ReduceMeanAxis_6_aie2_int8 | ReduceMeanAxis_3_aie2_int8 | ReduceMeanAxis_5_aie2_int8 | InstanceNormPart1_aie2_bf16_0 | LayerNormC8Part1_aie2_bf16_0 | ReduceSumAxis_7_aie2_bf16 | InstanceNormPart2_aie2_bf16_0 | ReduceMeanAxis_7_aie2_bf16 | InstanceNormPart1_aie2_int8_0 | ReduceSumAxis_6_aie2_bf16 | ReduceSumAxis_3_aie2_bf16 | ReduceSumAxis_5_aie2_bf16 | ReduceMeanAxis_5_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceProdAxis_6_aie2_bf16 | ReduceProdAxis_5_aie2_bf16 | ReduceProdAxis_3_aie2_bf16 | LayerNormC8Part2_aie2_bf16_0 | ReduceProdAxis_7_aie2_bf16 | ReduceSumAxis_4_aie2_int8 | ReduceSumAxis_1_aie2_int8 | ReduceMeanAxis_1_aie2_int8 | ReduceSumAxis_2_aie2_int8 | ReduceMeanAxis_4_aie2_int8 | Mish_aie2_bfloat16 | ReduceMeanAxis_2_aie2_int8 | LayerNormC8Part1_aie2_int8_0 | ReduceSumAxis_1_aie2_bf16 | ReduceSumAxis_2_aie2_bf16 | ReduceSumAxis_4_aie2_bf16 | ReduceMeanAxis_1_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | ReduceMeanAxis_2_aie2_bf16 | Conv2D_bf16_0 | ReduceProdAxis_2_aie2_bf16 | ReduceProdAxis_1_aie2_bf16 | ReduceProdAxis_4_aie2_bf16 | Conv2D_bf16_1 | Conv2D_Transpose_AIE2_0 | Conv2D_2x8_1 |    ...    | Mish_aie2_int8 | BatchNorm2D_0 | Conv2D_ReLU_1 | Conv2D_FC_0  | DilatedConv2D_1 | Conv2D_FC_1  | Conv2D_0     | Conv2D_mixed_batch_1 | BilinearInterpolation_1 | GEMM_int8_0  | GEMM_int8_1  | BilinearInterpolation_0 | Average diff |
| ----------------------------- | ------------ | ------------ | ------------ | ------------ | ------------------------- | -------------------------- | ------------ | ------------ | ------------ | ------------------------- | ------------------------- | ------------------------- | -------------------------- | -------------------------- | -------------------------- | ----------------------------- | ---------------------------- | ------------------------- | ----------------------------- | -------------------------- | ----------------------------- | ------------------------- | ------------------------- | ------------------------- | -------------------------- | -------------------------- | -------------------------- | -------------------------- | -------------------------- | ---------------------------- | -------------------------- | ------------------------- | ------------------------- | -------------------------- | ------------------------- | -------------------------- | ------------------ | -------------------------- | ---------------------------- | ------------------------- | ------------------------- | ------------------------- | -------------------------- | -------------------------- | -------------------------- | ------------- | -------------------------- | -------------------------- | -------------------------- | ------------- | ----------------------- | ------------ |    ...    | -------------- | ------------- | ------------- | ------------ | --------------- | ------------ | ------------ | -------------------- | ----------------------- | ------------ | ------------ | ----------------------- | ------------ |
| Baseline                      | 451          | 451          | 1684         | 416          | 2071                      | 2111                       | 272          | 311          | 257          | 2855                      | 2879                      | 2882                      | 2928                       | 2932                       | 2949                       | 2838                          | 8780                         | 6686                      | 13040                         | 6737                       | 11108                         | 7450                      | 7464                      | 7467                      | 7613                       | 7620                       | 10509                      | 10526                      | 10527                      | 11710                        | 2252                       | 7211                      | 7149                      | 7438                       | 7190                      | 7465                       | 5661               | 7500                       | 7712                         | 12349                     | 12372                     | 12387                     | 13438                      | 13444                      | 13474                      | 25071         | 18747                      | 37320                      | 37349                      | 39569         | 64836                   | 3811         |    ...    | 9418           | 386           | 27951         | 2692         | 5510            | 1177         | 7967         | 22344                | 378                     | 3019         | 36178        | 724                     |              |
| This PR                       | 496          | 496          | 1819         | 448          | 2215                      | 2255                       | 288          | 329          | 269          | 2945                      | 2969                      | 2972                      | 3018                       | 3022                       | 3039                       | 2906                          | 8952                         | 6814                      | 13288                         | 6865                       | 11303                         | 7578                      | 7592                      | 7595                      | 7743                       | 7750                       | 10677                      | 10694                      | 10695                      | 11894                        | 2284                       | 7302                      | 7235                      | 7527                       | 7275                      | 7553                       | 5726               | 7586                       | 7798                         | 12477                     | 12500                     | 12515                     | 13566                      | 13572                      | 13602                      | 25197         | 18835                      | 37488                      | 37517                      | 39655         | 64976                   | 3819         |    ...    | 9417           | 385           | 27503         | 2644         | 5382            | 1140         | 7687         | 21504                | 361                     | 2815         | 33499        | 667                     | +0.25%       |
| Total diff                    | REGR(+9.98%) | REGR(+9.98%) | REGR(+8.02%) | REGR(+7.69%) | REGR(+6.95%)              | REGR(+6.82%)               | REGR(+5.88%) | REGR(+5.79%) | REGR(+4.67%) | REGR(+3.15%)              | REGR(+3.13%)              | REGR(+3.12%)              | REGR(+3.07%)               | REGR(+3.07%)               | REGR(+3.05%)               | REGR(+2.40%)                  | REGR(+1.96%)                 | REGR(+1.91%)              | REGR(+1.90%)                  | REGR(+1.90%)               | REGR(+1.76%)                  | REGR(+1.72%)              | REGR(+1.71%)              | REGR(+1.71%)              | REGR(+1.71%)               | REGR(+1.71%)               | REGR(+1.60%)               | REGR(+1.60%)               | REGR(+1.60%)               | REGR(+1.57%)                 | REGR(+1.42%)               | REGR(+1.26%)              | REGR(+1.20%)              | REGR(+1.20%)               | REGR(+1.18%)              | REGR(+1.18%)               | REGR(+1.15%)       | REGR(+1.15%)               | REGR(+1.12%)                 | REGR(+1.04%)              | REGR(+1.03%)              | REGR(+1.03%)              | REGR(+0.95%)               | REGR(+0.95%)               | REGR(+0.95%)               | REGR(+0.50%)  | REGR(+0.47%)               | REGR(+0.45%)               | REGR(+0.45%)               | REGR(+0.22%)  | REGR(+0.22%)            | REGR(+0.21%) |    ...    | SAME(-0.01%)   | IMPR(-0.26%)  | IMPR(-1.60%)  | IMPR(-1.78%) | IMPR(-2.32%)    | IMPR(-3.14%) | IMPR(-3.51%) | IMPR(-3.76%)         | IMPR(-4.50%)            | IMPR(-6.76%) | IMPR(-7.41%) | IMPR(-7.87%)            | +0.25%       |
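
For readers who want to check the "Total diff" row, here is a minimal sketch of how each entry can be recomputed from the "Baseline" and "This PR" cycle counts. The function name `total_diff` and the 0.1% cutoff for the `SAME` label are assumptions made for illustration. The actual reporting tool and its threshold are not shown in this PR; the cutoff is only chosen to be consistent with the columns visible above.

```python
def total_diff(baseline: int, new: int, same_threshold_pct: float = 0.1) -> str:
    """Format the relative cycle-count change in the style of the table.

    same_threshold_pct is a hypothetical cutoff: changes smaller than this
    (in absolute percent) are labelled SAME. The real tool's cutoff is unknown.
    """
    pct = (new - baseline) / baseline * 100.0
    if abs(pct) < same_threshold_pct:
        label = "SAME"
    elif pct > 0:
        label = "REGR"  # more cycles than the baseline
    else:
        label = "IMPR"  # fewer cycles than the baseline
    return f"{label}({pct:+.2f}%)"


if __name__ == "__main__":
    # (benchmark, baseline cycles, cycles with this PR), copied from the table
    samples = [
        ("Mul2D_0", 451, 496),
        ("Conv2D_2x8_0", 1684, 1819),
        ("Mish_aie2_int8", 9418, 9417),
        ("BilinearInterpolation_0", 724, 667),
    ]
    for name, base, new in samples:
        print(f"{name}: {total_diff(base, new)}")
```

The "Average diff" column is not recomputed here because the table elides some benchmarks.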

Some notes on the regressions:

  • Mul2D goes from II=9 to II=10. We are still missing VST.SRS combining. Once that is in place, this should be a candidate for the newer pipeliner.
  • Add2D/Mul2D bf16 variants: we now introduce lots of pointer copies instead of re-using the same pointer and enforcing ordering. This should be investigated separately.
  • ReduceSum/Mean int8 variants: these should be handled in the post-pipeliner to target II=4.
  • Conv2D_2x8_0: the regression is in the outer loop. We are changing the live ranges of our 2D/3D iterators and forcing a spill.

@andcarminati (Collaborator) left a comment

LGTM. We would certainly miss these combiners; the sooner they are enabled, the better we can deal with the effects.

@gbossu gbossu merged commit af61e7b into aie-public Oct 10, 2024
9 checks passed
@gbossu gbossu deleted the gaetan.512-bit.postinc branch October 10, 2024 11:11