feat[next][dace]: Dace fieldview transformations #1594

Merged
Changes from 250 commits
285 commits
dcf3eab
Add support to translate each builtin call to a tasklet node
edopao May 6, 2024
7e6909e
Resolve dace warnings
edopao May 7, 2024
2b07cc5
Remove bultin translator for domain expressions
edopao May 7, 2024
2370fa6
Remove bultin translator for domain expressions (1)
edopao May 7, 2024
8e801df
Refactor
edopao May 7, 2024
812a6e5
Minor edit
edopao May 7, 2024
1d0b50b
Extract ITIR visitor to separate class
edopao May 7, 2024
97a1d22
Code refactoring
edopao May 7, 2024
a30cc7d
Fix formatting
edopao May 7, 2024
f595b01
Add IteratorExpr type
edopao May 10, 2024
a6bcb6c
Indirection shift implemented as tasklet node
edopao May 10, 2024
738da27
Add ConnectivityExpr type
edopao May 13, 2024
e5494d8
Remove ConnectivityExpr type, use ValueExpr instead
edopao May 13, 2024
e9455e3
Changes in preparation for shift builtin
edopao May 13, 2024
cbf55de
Refactoring
edopao May 13, 2024
9d5b1ed
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 13, 2024
801704b
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 13, 2024
c45c417
Add support for programs without computation (pure memlets)
edopao May 13, 2024
3f26d91
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 13, 2024
f173244
Merge remote-tracking branch 'origin/main' into dace-fieldview-shifts
edopao May 13, 2024
d67518a
Fix test
edopao May 13, 2024
783542f
Fix for chain of shift expressions shift(V2E(E2V(i_edge, x), y))(edges)
edopao May 14, 2024
1fa9de4
Support for multi-dimensional shift
edopao May 14, 2024
96338c2
Fix typo
edopao May 14, 2024
57e369f
Add support for cartesian shift with dynamic offset
edopao May 15, 2024
ec4714c
Add support for unstructured shift with dynamic offset
edopao May 15, 2024
46cb6c6
Code refactoring in test file
edopao May 15, 2024
c20a94d
Typo
edopao May 15, 2024
d1f7432
Code cleanup
edopao May 15, 2024
c4385c1
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 16, 2024
ed16fd4
Import updates from branch dace-fieldview-shifts
edopao May 16, 2024
9f7176f
Review comments
edopao May 16, 2024
4f40f42
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao May 16, 2024
932db7c
Avoid tasklet-to-tasklet edge connections
edopao May 16, 2024
46febb0
Avoid tasklet-to-tasklet edge connections
edopao May 16, 2024
949bad7
Add support for in-out field parameters
edopao May 16, 2024
8890f95
Refactoring: import modules, not symbols
edopao May 17, 2024
87b71a6
Minor edit
edopao May 17, 2024
665a609
Remove internal package for builtin translators
edopao May 17, 2024
82fdf64
Add wrapper function to build SDFG
edopao May 17, 2024
e4718b0
Merge pull request #4 from edopao/dace-fieldview-refactor_imports
edopao May 17, 2024
47fcabe
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 17, 2024
51aaf0f
Add fieldview flavor of all test cases
edopao May 17, 2024
6ccecf1
Code changes imported from branch dace-fieldview-shifts
edopao May 17, 2024
e66b960
Code comments updated
edopao May 17, 2024
7f89a16
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 17, 2024
4c190bd
Remove support for inlined chained shift
edopao May 21, 2024
6052de2
Add support for neighbors builtin
edopao May 21, 2024
7300864
Add support for reduce builtin
edopao May 22, 2024
55adbd5
Refactoring
edopao May 23, 2024
ad21dc4
Add support for both inlined and fieldview neighbor reduction
edopao May 23, 2024
bb9123b
Minor edit
edopao May 23, 2024
0025d77
Code refactoring
edopao May 23, 2024
9926d7d
Add support for skip values ONLY for inlined GTIR
edopao May 23, 2024
172f19e
Masked array implementation based on connectivity table
edopao May 27, 2024
b1f4a47
Merge 2 different implementations of reduce
edopao May 27, 2024
63e6e92
Add support for reduce lambda function
edopao May 28, 2024
107e295
Add support for neighbors masked array returned by select statements
edopao May 29, 2024
3c71efa
Import changes from neighbors branch
edopao May 29, 2024
e369cac
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
d0bd277
Import changes from neighbors branch
edopao May 29, 2024
afb5ed1
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
2f75cfb
Add debuginfo for ir.Program and ir.Stmt nodes
edopao May 29, 2024
695db7c
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
074f0b2
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
085f307
Fix error in debuginfo
edopao May 29, 2024
f19960b
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 29, 2024
841040e
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
b3df358
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
dc1434c
Fix error in debuginfo (1)
edopao May 29, 2024
eacde66
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
138a33c
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
3769fb5
Remove nested SDFG for neighbors builtin
edopao Jun 14, 2024
b1b5887
Remove masked array for skip values, rely on identity value
edopao Jun 26, 2024
a5b0f41
import changes from neighbors branch
edopao Jun 28, 2024
f7ac3d8
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jun 28, 2024
01ff262
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jun 28, 2024
c61e796
import changes from neighbors branch
edopao Jun 28, 2024
5a457b2
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao Jun 28, 2024
f4d9d89
Let's see what auto opt can do.
philip-paul-mueller Jul 3, 2024
9318011
Import changes from branch dace-fieldview-neighbors
edopao Jul 4, 2024
11efdeb
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jul 4, 2024
25b9048
Support field with start offset
edopao Jul 4, 2024
2dc6f97
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao Jul 4, 2024
f6e5b7c
Add test coverage for temporary with start offset (cartesian shift)
edopao Jul 4, 2024
d7312fa
Support field with start offset
edopao Jul 4, 2024
628c18b
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao Jul 4, 2024
c4f2738
Test IR updated for literal operand
edopao Jul 4, 2024
0fd0b65
Add test coverage to previous commit
edopao Jul 4, 2024
38d2720
Refactor PrimitiveTranslator interface
edopao Jul 4, 2024
d3541c1
Made a small modfication to some code.
philip-paul-mueller Jul 5, 2024
e855ef9
Fix formatting
edopao Jul 5, 2024
5726509
Started with a first nabla stuff.
philip-paul-mueller Jul 5, 2024
e44f3a2
It seems that local storage does not work well with this transformer.
philip-paul-mueller Jul 5, 2024
4cff071
Fix for domain horzontal/vertical dims
edopao Jul 5, 2024
f642e85
Fix for type inference on single value expression
edopao Jul 5, 2024
f216a36
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 5, 2024
a2af8cd
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 5, 2024
74bd468
Updated it now seems to work.
philip-paul-mueller Jul 5, 2024
667eb7e
Updated the nabla4 calculations.
philip-paul-mueller Jul 5, 2024
58b8e58
Now all the calculations are done.
philip-paul-mueller Jul 5, 2024
e898b31
Formated a bit.
philip-paul-mueller Jul 5, 2024
eae968f
Refactored the code.
philip-paul-mueller Jul 5, 2024
defb55d
Import changes from dace-fieldview-neighbors
edopao Jul 5, 2024
fc9661c
Import changes from dace-fieldview-shifts
edopao Jul 5, 2024
e424d4e
Minor edit
edopao Jul 5, 2024
7ef1d56
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 5, 2024
563ee1a
WIP: Working on accessing.
philip-paul-mueller Jul 7, 2024
0dc376e
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 8, 2024
9df80ad
Merge remote-tracking branch 'edoardo/dace-fieldview-shifts' into dac…
philip-paul-mueller Jul 8, 2024
f32fd38
Now the shift works, at least the shift in the particular dimension.
philip-paul-mueller Jul 8, 2024
538abff
Prepare to go to real input.
philip-paul-mueller Jul 8, 2024
a07fe81
nabla4 works now with the custom icon stuff.
philip-paul-mueller Jul 8, 2024
fec054a
First step in shifting.
philip-paul-mueller Jul 8, 2024
ea7bf64
Now we have one shifting.
philip-paul-mueller Jul 8, 2024
b291152
The helper function works.
philip-paul-mueller Jul 8, 2024
008209d
It now works with the normal shiftuing stuff.
philip-paul-mueller Jul 8, 2024
b832aca
Now the full nabla4 should be ported.
philip-paul-mueller Jul 8, 2024
94ab9d7
Restructured the code and removed the inline version.
philip-paul-mueller Jul 8, 2024
04cde84
Made some small update.
philip-paul-mueller Jul 8, 2024
3dd0860
This is the base of all fusion operations.
philip-paul-mueller Jul 9, 2024
3178b71
Reworked some parts.
philip-paul-mueller Jul 10, 2024
66c5fcd
Address review comments
edopao Jul 10, 2024
d5abad4
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jul 10, 2024
1df1bc3
Apply convention for map variables
edopao Jul 10, 2024
2032b60
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 10, 2024
6394243
Updated and fixed a big in the `is_interstate_transient()` function.
philip-paul-mueller Jul 10, 2024
fcd8ee3
Small corrections and format improvements.
philip-paul-mueller Jul 11, 2024
a25a6a4
Fixed some missing include.
philip-paul-mueller Jul 11, 2024
a57e108
Added a first and mostly untested version of the serial fusion transf…
philip-paul-mueller Jul 11, 2024
62ad165
Started debugin, very strange bug.
philip-paul-mueller Jul 11, 2024
7f72794
Import changes from dace-fieldview-neighbors
edopao Jul 11, 2024
abf3918
Import changes from dace-fieldview-shifts
edopao Jul 11, 2024
a6d31fb
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 11, 2024
4a2ccaa
More debugger friendly.
philip-paul-mueller Jul 11, 2024
d353d0e
It should now work and I figured out why it was not working before.
philip-paul-mueller Jul 11, 2024
073065d
A fix.
philip-paul-mueller Jul 11, 2024
19be2c4
Added some "test" for the merger.
philip-paul-mueller Jul 11, 2024
5237b13
Now the nabla4 optimizes with my fusion operation.
philip-paul-mueller Jul 11, 2024
489bb4a
Added some more test.
philip-paul-mueller Jul 12, 2024
2f6274e
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 12, 2024
4d1a3cc
Fixed some small problem in detecting recursive dataflow.
philip-paul-mueller Jul 12, 2024
2da7453
Made some imporvements to the test.
philip-paul-mueller Jul 12, 2024
ba97fd2
Made some comments better.
philip-paul-mueller Jul 12, 2024
9301dbe
Import changes from branch dace-fieldview-neighbors
edopao Jul 12, 2024
7f60cfe
Import changes from branch dace-fieldview-shifts
edopao Jul 12, 2024
699a88b
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 12, 2024
b3131db
Avoid direct import of symbols from module
edopao Jul 12, 2024
130c877
Address review comments
edopao Jul 12, 2024
7fbd7e1
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 12, 2024
fb2ba90
Started with an untested map promoted.
philip-paul-mueller Jul 12, 2024
bf06cb4
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 12, 2024
52c1d01
Updated the tests, but it still does not work.
philip-paul-mueller Jul 12, 2024
849900f
The newest version about shift from Edoardo.
philip-paul-mueller Jul 12, 2024
42f4aba
Now the shift test works too.
philip-paul-mueller Jul 12, 2024
84b2ba7
Added some more checking functionality to the base promoter.
philip-paul-mueller Jul 12, 2024
24bde91
Added a concrete promoter.
philip-paul-mueller Jul 12, 2024
fd81e75
Added a custom (okay currently not really custom) simplification pass.
philip-paul-mueller Jul 12, 2024
284b6a8
Updated the auto fusion stuff.
philip-paul-mueller Jul 12, 2024
a6b191c
Merge remote-tracking branch 'origin/main' into dace-fieldview-shifts
edopao Jul 12, 2024
033db6b
Removed all my non gt4py parts and moved it to a separate repo. The t…
philip-paul-mueller Jul 15, 2024
32712ea
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 15, 2024
ebb76de
Merge remote-tracking branch 'edoardo/dace-fieldview-shifts' into dac…
philip-paul-mueller Jul 15, 2024
0fddb8d
Added a transformation to bring the map iteration indexes in the corr…
philip-paul-mueller Jul 15, 2024
c4f64a4
Updated the auto optimizer.
philip-paul-mueller Jul 15, 2024
6143b95
Made the `gt_simplify()` function aviable.
philip-paul-mueller Jul 17, 2024
00aa64c
Fixed a porting bug.
philip-paul-mueller Jul 17, 2024
b67f0c0
Fixed an edge case in the computation of the output partition if teh …
philip-paul-mueller Jul 17, 2024
b447c2a
Added a map promoter that is able to promote trivial maps that are ge…
philip-paul-mueller Jul 17, 2024
090f08d
Added a function to turn an SDFG into one that runs on GPU.
philip-paul-mueller Jul 17, 2024
04dd63a
Updated the auto optimizer to handle GPU cases.
philip-paul-mueller Jul 17, 2024
bb34f44
Reorganized the GPU stuff.
philip-paul-mueller Jul 18, 2024
88f5245
There is now a k blocking transformation.
philip-paul-mueller Jul 19, 2024
73a01c1
Made some fixes to the k blocking stuff.
philip-paul-mueller Jul 19, 2024
dd1242c
Small fix.
Jul 19, 2024
f3798f3
Fixed an error.
philip-paul-mueller Jul 19, 2024
863bd5f
Now auto optimization also does blocking.
philip-paul-mueller Jul 19, 2024
ea0da2b
Made some fixes, but it still does not work.
Jul 22, 2024
a95bda2
Made some fixes.
philip-paul-mueller Jul 22, 2024
18a8560
If blocking is applied the name of the outer map is now also changed.
philip-paul-mueller Jul 22, 2024
5d979c9
Implemented the possibility to also set the launch bound stuff.
philip-paul-mueller Jul 23, 2024
bcd63d3
Fixed a bug in the auto omptimizer.
philip-paul-mueller Jul 23, 2024
c165f9f
Restructured and cleaned up the auto omptimizer routine.
philip-paul-mueller Jul 26, 2024
316ba9c
Fixed a bug in the `get_map_variable()` function.
philip-paul-mueller Jul 26, 2024
a796766
First batch of stuff for review.
philip-paul-mueller Jul 26, 2024
882ad44
Also checked the map fusion helper stuff.
philip-paul-mueller Jul 26, 2024
a0bf263
Made the reuse of transients optional and disabled it.
philip-paul-mueller Jul 28, 2024
37392fd
First PR candidate for the optimization pipeline.
philip-paul-mueller Jul 29, 2024
5c92c76
Made a small fix in the test function if teh intermnediate was correct.
philip-paul-mueller Jul 29, 2024
f4f5ae5
Added the first series of tests for teh serial map fusion.
philip-paul-mueller Jul 29, 2024
6d12757
Made some small modifications to the map fusion test.
philip-paul-mueller Jul 29, 2024
e590a07
Added a test for the blocking.
philip-paul-mueller Jul 29, 2024
7e99d98
Addressed Edoardo's comments.
philip-paul-mueller Jul 29, 2024
bd35c6d
Forgot to apply some of Edoardo's suggestions.
philip-paul-mueller Jul 29, 2024
0da8ae2
Added myself to the list of authors.
philip-paul-mueller Jul 29, 2024
f978ef7
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 29, 2024
888fb55
Added an utility module.
philip-paul-mueller Jul 29, 2024
0767d6f
Made it possible to extend the applicability of teh map promotion tra…
philip-paul-mueller Jul 29, 2024
c396200
Added a test for the map promotion.
philip-paul-mueller Jul 29, 2024
8e471f1
Reorganized the tests.
philip-paul-mueller Jul 30, 2024
28fcb84
Modified teh first step of teh auto optimizer.
philip-paul-mueller Jul 30, 2024
e8829c6
Added a test to ensure that fusion does not skrew up with indirect ac…
philip-paul-mueller Jul 30, 2024
9373629
Added a todo for a test.
philip-paul-mueller Jul 30, 2024
63a5112
Addressed Edoardo's comment.
philip-paul-mueller Jul 30, 2024
5a2c12c
Added the possibility to controll the iteration order also from teh o…
philip-paul-mueller Jul 31, 2024
bcfbd68
Clarified some buggy behaviour inside the GPU transformation function.
philip-paul-mueller Jul 31, 2024
03f4b1a
Inside a Map there can not be a library node for fusion.
philip-paul-mueller Jul 31, 2024
fd2366f
Applied Edoardo's comments.
philip-paul-mueller Jul 31, 2024
5ed2a8f
Applied another change.
philip-paul-mueller Jul 31, 2024
368c8ad
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 31, 2024
dbc3874
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
Aug 2, 2024
8e97cd6
Removed stray symlink.
Aug 2, 2024
390f02b
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Aug 22, 2024
46c549b
Updated the licence header.
philip-paul-mueller Aug 22, 2024
27d8ea6
This should make the names a bit more consistent.
philip-paul-mueller Aug 22, 2024
1a1a705
Removed some stra `view()` call.
philip-paul-mueller Aug 22, 2024
3c4523a
Fixed a bug in the `SerialMapFusion` transformation.
philip-paul-mueller Aug 23, 2024
cbed51a
Added the first batch of Enrique's suggestions.
philip-paul-mueller Aug 26, 2024
ca71735
Fixed a bug in the map promoter.
philip-paul-mueller Aug 26, 2024
83a5fe4
fixup! Added the first batch of Enrique's suggestions.
philip-paul-mueller Aug 26, 2024
0ee90f5
First new version of the k blocking.
philip-paul-mueller Aug 26, 2024
5d875fd
Further, refactored the KBlock transformation.
philip-paul-mueller Aug 27, 2024
57ae4ea
Added an ADRF for the DaCe parts of the toolchain.
philip-paul-mueller Aug 27, 2024
4d2e941
Made a note in to the map fusion files that we will delete them as so…
philip-paul-mueller Aug 27, 2024
243bc8e
Removed all reference to the HackMD file and changed them with refere…
philip-paul-mueller Aug 27, 2024
a74a54d
Updated the map promotion.
philip-paul-mueller Aug 27, 2024
b7400a6
Fixed a small typo in the `TrivialGPUMapPromoter`.
philip-paul-mueller Aug 28, 2024
36a6386
Added tests for the `TrivialGPUMapPromoter`.
philip-paul-mueller Aug 28, 2024
d6cde5c
Updated the map promotion implementation.
philip-paul-mueller Aug 28, 2024
b616189
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Aug 28, 2024
32d3883
Update docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformat…
philip-paul-mueller Sep 2, 2024
201c8e2
Corrected the ADR.
philip-paul-mueller Sep 2, 2024
210a8d9
Second appling.
philip-paul-mueller Sep 2, 2024
017fc9f
Renamed the `KBlocking` to `LoopBlocking`.
philip-paul-mueller Sep 2, 2024
20da858
Made some smaller modification.
philip-paul-mueller Sep 2, 2024
8c31694
Added the comment Enrique mentioned.
philip-paul-mueller Sep 2, 2024
c8ecd25
Removed the auto use fixture, it is now imported explicitly.
philip-paul-mueller Sep 2, 2024
7aed88f
Forgot to rename the `KBlocking` also in the tests.
philip-paul-mueller Sep 2, 2024
2a8494a
Further modifications.
philip-paul-mueller Sep 4, 2024
87d3ae5
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Sep 4, 2024
2cfbe20
Applied the last comments.
philip-paul-mueller Sep 4, 2024
3e7d09f
Updated Edoardo's comments.
philip-paul-mueller Sep 4, 2024
7cb5e35
Update docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformat…
philip-paul-mueller Sep 4, 2024
04b652e
Refactored the loop blocking transformation.
Sep 4, 2024
ba0ecdc
Merge branch 'main' into dace-fieldview-transformations
philip-paul-mueller Sep 5, 2024
e4df5ae
Fixed an merge issue with master.
philip-paul-mueller Sep 5, 2024
7265ecc
Fixed an issue related to the refactoring yesterday evening.
philip-paul-mueller Sep 5, 2024
5ab199d
Something is fishy.
philip-paul-mueller Sep 5, 2024
a0866a7
Switched to UUID from time.
philip-paul-mueller Sep 5, 2024
71c6681
Merge branch 'main' into dace-fieldview-transformations
philip-paul-mueller Sep 5, 2024
1 change: 1 addition & 0 deletions AUTHORS.md
@@ -23,6 +23,7 @@
- Madonna, Alberto. ETH Zurich - CSCS
- Mariotti, Kean. ETH Zurich - CSCS
- Müller, Christoph. MeteoSwiss
- Müller, Philip. ETH Zurich - CSCS
- Osuna, Carlos. MeteoSwiss
- Paone, Edoardo. ETH Zurich - CSCS
- Röthlin, Matthias. MeteoSwiss
@@ -0,0 +1,41 @@
# GT4Py - GridTools Framework
#
# Copyright (c) 2014-2024, ETH Zurich
# All rights reserved.
#
# Please, refer to the LICENSE file in the root directory.
# SPDX-License-Identifier: BSD-3-Clause

"""Transformation and optimization pipeline for the DaCe backend in GT4Py.

Please also see [this HackMD document](https://hackmd.io/@gridtools/rklwk4OIR#Requirements-on-SDFG)
that explains the general structure and requirements on the SDFG.
"""

from .auto_opt import dace_auto_optimize, gt_auto_optimize, gt_set_iteration_order, gt_simplify
from .gpu_utils import (
    GPUSetBlockSize,
    SerialMapPromoterGPU,
    gt_gpu_transformation,
    gt_set_gpu_blocksize,
)
from .k_blocking import KBlocking
from .map_orderer import MapIterationOrder
from .map_promoter import SerialMapPromoter
from .map_serial_fusion import SerialMapFusion


__all__ = [
    "GPUSetBlockSize",
    "KBlocking",
    "MapIterationOrder",
    "SerialMapFusion",
    "SerialMapPromoter",
    "SerialMapPromoterGPU",
    "dace_auto_optimize",
    "gt_auto_optimize",
    "gt_gpu_transformation",
    "gt_set_iteration_order",
    "gt_set_gpu_blocksize",
    "gt_simplify",
]
@@ -0,0 +1,341 @@
# GT4Py - GridTools Framework
#
# Copyright (c) 2014-2024, ETH Zurich
# All rights reserved.
#
# Please, refer to the LICENSE file in the root directory.
# SPDX-License-Identifier: BSD-3-Clause

"""Fast access to the auto optimization on DaCe."""

from typing import Any, Optional, Sequence

import dace
from dace.transformation import dataflow as dace_dataflow
from dace.transformation.auto import auto_optimize as dace_aoptimize

from gt4py.next import common as gtx_common
from gt4py.next.program_processors.runners.dace_fieldview import (
    transformations as gtx_transformations,
)


__all__ = [
    "dace_auto_optimize",
    "gt_simplify",
    "gt_set_iteration_order",
    "gt_auto_optimize",
]


def dace_auto_optimize(
    sdfg: dace.SDFG,
    device: dace.DeviceType = dace.DeviceType.CPU,
    use_gpu_storage: bool = True,
    **kwargs: Any,
) -> dace.SDFG:
    """This is a convenient wrapper around DaCe's `auto_optimize` function.

    Args:
        sdfg: The SDFG that should be optimized in place.
        device: The device for which optimizations should be done, defaults to CPU.
        use_gpu_storage: Assumes that the SDFG input is already on the GPU.
            This parameter is `False` in DaCe but here is changed to `True`.
        kwargs: Are forwarded to the underlying auto optimizer exposed by DaCe.
    """
    return dace_aoptimize.auto_optimize(
        sdfg,
        device=device,
        use_gpu_storage=use_gpu_storage,
        **kwargs,
    )


def gt_simplify(
    sdfg: dace.SDFG,
    validate: bool = True,
    validate_all: bool = False,
    skip: Optional[set[str]] = None,
) -> Any:
    """Performs simplifications on the SDFG in place.

    Instead of calling `sdfg.simplify()` directly, you should use this function,
    as it is specially tuned for GridTools-based SDFGs.

    Args:
        sdfg: The SDFG to optimize.
        validate: Perform validation after the pass has run.
        validate_all: Perform extensive validation.
        skip: Set of simplify passes that should not be applied.

    Note:
        The reason for this function is that we can influence how simplify works,
        since some parts of simplify might break things in the SDFG.
        However, currently nothing is customized yet, and the function just calls
        the simplification pass directly.
    """
    from dace.transformation.passes.simplify import SimplifyPass

    return SimplifyPass(
        validate=validate,
        validate_all=validate_all,
        verbose=False,
        skip=skip,
    ).apply_pass(sdfg, {})


def gt_set_iteration_order(
    sdfg: dace.SDFG,
    leading_dim: gtx_common.Dimension,
    validate: bool = True,
    validate_all: bool = False,
) -> Any:
    """Set the iteration order of the Maps correctly.

    Modifies the order of the Map parameters such that `leading_dim`
    is the fastest varying one; the order of the other dimensions in
    a Map is unspecified. `leading_dim` should be the dimension where
    the stride is one.

    Args:
        sdfg: The SDFG to process.
        leading_dim: The leading dimension.
        validate: Perform validation during the steps.
        validate_all: Perform extensive validation.
    """
    return sdfg.apply_transformations_once_everywhere(
        gtx_transformations.MapIterationOrder(
            leading_dim=leading_dim,
        )
    )


def gt_auto_optimize(
    sdfg: dace.SDFG,
    gpu: bool,
    leading_dim: Optional[gtx_common.Dimension] = None,
    aggressive_fusion: bool = True,
    make_persistent: bool = True,
    gpu_block_size: Optional[Sequence[int | str] | str] = None,
    block_dim: Optional[gtx_common.Dimension] = None,
    blocking_size: int = 10,
    reuse_transients: bool = False,
    validate: bool = True,
    validate_all: bool = False,
    **kwargs: Any,
) -> dace.SDFG:
    """Performs GT4Py specific optimizations on the SDFG in place.

    The auto optimization works in different phases, each of which focuses on
    different aspects of the SDFG. The initial SDFG is assumed to have a
    very large number of rather simple Maps.

    1. Some general simplification transformations, beyond classical simplify,
Contributor

Any reason to avoid separating each phase into its own documented function instead of having them all together in this large block?

Contributor Author

philip-paul-mueller Aug 26, 2024

There is actually no reason not to do it. However, if you look at the individual phases, it does not yet make sense to turn them into separate functions, with the exception of phase 2.

However, I agree that if the function becomes more complex we should separate the phases where it makes sense.
I will separate phase 2.
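
A minimal sketch of what the suggested split could look like: each phase becomes its own documented function and the driver only sequences them. Everything here is hypothetical and purely illustrative. The `Graph` type is a plain list that records which steps ran, standing in for an SDFG, and the function names are not taken from the PR.

```python
# Hypothetical stand-in for an SDFG: a list recording which steps were applied.
Graph = list[str]


def initial_cleanup(graph: Graph) -> None:
    """Phase 1: general simplification of the graph."""
    graph.append("cleanup")


def create_kernels(graph: Graph, aggressive_fusion: bool) -> None:
    """Phase 2: reduce the number of maps, optionally promoting maps first."""
    if aggressive_fusion:
        graph.append("promote")
    graph.append("fuse")


def auto_optimize(graph: Graph, aggressive_fusion: bool = True) -> Graph:
    """Driver that only sequences the phases; each phase documents itself."""
    initial_cleanup(graph)
    create_kernels(graph, aggressive_fusion)
    return graph
```

With this layout the long phase-by-phase description in the driver's docstring could shrink to a short overview, since each phase carries its own documentation.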

Contributor Author

You said that I should not mark threads as resolved if I do not fully accept them.
I probably clicked on the wrong button or so.

But can I mark this thread (or other threads such as this one) as resolved?

Contributor

In general I think you should only mark threads as resolved if you fully accept the suggestion from the reviewer and don't have further comments, so there is no need for the reviewer to look at it again. In the case you reply something, even if you agree with the suggestion, you should not mark the conversation as resolved because it will be collapsed and the reviewer won't see your comment.

Contributor Author

philip-paul-mueller Sep 4, 2024

Does this mean that if you made a comment and I replied, and after the next round the thread is still open but you have not written anything in it, I can close it?
My issue is that I cannot distinguish between "I (Enrique) forgot to mark the thread as resolved" and "I (Enrique) forgot to reply".

are applied to the SDFG.
2. In this phase the function tries to reduce the number of maps. This
philip-paul-mueller marked this conversation as resolved.
Show resolved Hide resolved
process mostly relies on the map fusion transformation. If
`aggressive_fusion` is set the function will also promote certain Maps, to
make them fusable. For this it will add dummy dimensions. However, currently
the function will only add horizonal dimensions.
In this phase some optimizations inside the bigger kernels themselves might
be applied as well.
3. After the function created big kernels it will apply some optimization,
inside the kernels itself. For example fuse maps inside them.
4. Afterwards it will process the map ranges and iteration order. For this
the function assumes that the dimension indicated by `leading_dim` is the
one with stride one.
5. If requested the function will now apply blocking, on the dimension indicated
by `leading_dim`. (The reason that it is not done in the kernel optimization
phase is a restriction dictated by the implementation.)
6. If requested the SDFG will be transformed to GPU. For this the
`gt_gpu_transformation()` function is used, that might apply several other
optimizations.
7. Afterwards some general transformations to the SDFG are applied.
This includes:
- Use fast implementation for library nodes.
- Move small transients to stack.
- Make transients persistent (if requested).
- Apply DaCe's `TransientReuse` transformation (if requested).

Args:
sdfg: The SDFG that should be optimized in place.
gpu: Optimize for GPU or CPU.
leading_dim: Leading dimension, indicates where the stride is 1.
aggressive_fusion: Be more aggressive in fusion, will lead to the promotion
of certain maps.
make_persistent: Turn all transients to persistent lifetime, thus they are
allocated over the whole lifetime of the program, even if the kernel exits.
Thus the SDFG can not be called by different threads.
gpu_block_size: The thread block size for maps in GPU mode, currently only
one for all.
block_dim: On which dimension blocking should be applied.
blocking_size: How many elements each block should process.
reuse_transients: Run the `TransientReuse` transformation, might reduce memory footprint.
validate: Perform validation during the steps.
validate_all: Perform extensive validation.
philip-paul-mueller marked this conversation as resolved.
Show resolved Hide resolved

Todo:
- Make sure that `SDFG.simplify()` is not called indirectly, by temporarily
overwriting it with `gt_simplify()`.
- Specify arguments to set the size of GPU thread blocks depending on the
dimensions. I.e. be able to use a different size for 1D than 2D Maps.
- Add a parallel version of Map fusion.
- Implement some model to further guide to determine what we want to fuse.
Something along the line "Fuse if operational intensity goes up, but
not if we have too much internal space (register pressure).
- Create a custom array elimination pass that honors rule 1.
- Check if a pipeline could be used to speed up some computations.
"""
device = dace.DeviceType.GPU if gpu else dace.DeviceType.CPU

with dace.config.temporary_config():
dace.Config.set("optimizer", "match_exception", value=True)
dace.Config.set("store_history", value=False)

# TODO(phimuell): Should there be a zeroth phase, in which we generate
# a canonical form of the SDFG, for example move all local maps
# to internal serial maps, such that they do not block fusion?

# Phase 1: Initial Cleanup
gt_simplify(sdfg)
sdfg.apply_transformations_repeated(
[
dace_dataflow.TrivialMapElimination,
# TODO(phimuell): Investigate if these two are appropriate.
dace_dataflow.MapReduceFusion,
dace_dataflow.MapWCRFusion,
],
validate=validate,
validate_all=validate_all,
)

# Compute the SDFG hash to see if something has changed.
sdfg_hash = sdfg.hash_sdfg()

# Phase 2: Kernel Creation
# We will now try to reduce the number of kernels and create large Maps/kernels.
# For this we essentially use Map fusion. We do this in a loop because
# after a graph modification followed by simplify new fusing opportunities
# might arise. We use the hash of the SDFG to detect if we have reached a
# fixed point.
# TODO(phimuell): Find a better upper bound for the starvation protection.
for _ in range(100):
Contributor

I was wondering whether we could set the upper limit of this range to the number of maps in the SDFG?

Contributor Author

I am not sure if the number of maps is a good value.
However, I very much agree that we should use some "informed" value.
I left a todo.

Contributor

If there is no way to automatically compute a reasonable value for the upper limit, shouldn't this limit then be an argument to the function so the user can decide?

Contributor Author

I have now added an argument for that, which defaults to 100,
although we will probably never set it anyway.

# Use map fusion to reduce their number and to create big kernels
# TODO(phimuell): Use a cost measurement to decide if fusion should be done.
# TODO(phimuell): Add parallel fusion transformation. Should it run after
# or with the serial one?
sdfg.apply_transformations_repeated(
gtx_transformations.SerialMapFusion(
only_toplevel_maps=True,
),
validate=validate,
validate_all=validate_all,
)

# Now do some cleanup tasks that may enable further fusion opportunities.
# Note that, for performance reasons, simplify is deferred.
phase2_cleanup = []
phase2_cleanup.append(dace_dataflow.TrivialTaskletElimination())

# TODO(phimuell): Should we do this all the time or only once? (probably the latter)
# TODO(phimuell): Add a criterion to decide if we should promote or not.
phase2_cleanup.append(
gtx_transformations.SerialMapPromoter(
only_toplevel_maps=True,
promote_vertical=True,
promote_horizontal=False,
promote_local=False,
)
)

sdfg.apply_transformations_once_everywhere(
phase2_cleanup,
validate=validate,
validate_all=validate_all,
)

# Use the hash to determine if the transformations did modify the SDFG.
# If not we have optimized the SDFG as much as we could, in this phase.
old_sdfg_hash = sdfg_hash
sdfg_hash = sdfg.hash_sdfg()
if old_sdfg_hash == sdfg_hash:
break

# The SDFG was modified by the transformations above.
# Call simplify and try again to further optimize.
gt_simplify(sdfg)

else:
raise RuntimeWarning("Optimization of the SDFG did not converge.")

# Phase 3: Optimizing the kernels themselves.
# Currently this only applies fusion inside Maps.
sdfg.apply_transformations_repeated(
gtx_transformations.SerialMapFusion(
only_inner_maps=True,
),
validate=validate,
validate_all=validate_all,
)
gt_simplify(sdfg)

# Phase 4: Iteration Space
# This essentially ensures that the stride 1 dimensions are handled
# by the innermost loop nest (CPU) or x-block (GPU)
if leading_dim is not None:
gt_set_iteration_order(
sdfg=sdfg,
leading_dim=leading_dim,
validate=validate,
validate_all=validate_all,
)

# Phase 5: Apply blocking
if block_dim is not None:
sdfg.apply_transformations_once_everywhere(
gtx_transformations.KBlocking(
blocking_size=blocking_size,
block_dim=block_dim,
),
validate=validate,
validate_all=validate_all,
)

# Phase 6: Going to GPU
if gpu:
# TODO(phimuell): The GPU function might modify the map iteration order.
# This is because of how it is implemented (promotion and
# fusion). However, because of its current state, this
# should not happen, but we have to look into it.
gpu_launch_factor: Optional[int] = kwargs.get("gpu_launch_factor", None)
gpu_launch_bounds: Optional[int] = kwargs.get("gpu_launch_bounds", None)
gtx_transformations.gt_gpu_transformation(
sdfg,
gpu_block_size=gpu_block_size,
gpu_launch_bounds=gpu_launch_bounds,
gpu_launch_factor=gpu_launch_factor,
validate=validate,
validate_all=validate_all,
try_removing_trivial_maps=True,
)

# Phase 7: General Optimizations
# The following operations apply regardless of whether we have a GPU or CPU.
# The DaCe auto optimizer also uses them. Note that transient reuse
# is not done by DaCe.
if reuse_transients:
# TODO(phimuell): Investigate if we should enable it, it may make things
# harder for the compiler. Maybe write our own to
# only consider big transients and not small ones (~60B)
transient_reuse = dace.transformation.passes.TransientReuse()
transient_reuse.apply_pass(sdfg, {})

# Set the implementation of the library nodes.
dace_aoptimize.set_fast_implementations(sdfg, device)
# TODO(phimuell): Fix the bug, it uses the tile value and not the stack array value.
dace_aoptimize.move_small_arrays_to_stack(sdfg)
if make_persistent:
# TODO(phimuell): Also allow setting the lifetime to `SDFG`.
dace_aoptimize.make_transients_persistent(sdfg, device)

return sdfg
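
The Phase 2 loop above follows a general fixed-point pattern: apply a modifying step, compare a hash taken before and after, stop once nothing changes, and give up after a bounded number of rounds. The following is a minimal standalone sketch of that pattern, not the code of this PR. The names `run_to_fixed_point`, `max_iterations`, and `halve_evens` are hypothetical, the list of integers merely stands in for an SDFG, and the sketch raises `RuntimeError` rather than the `RuntimeWarning` used in the diff. It also shows the iteration cap as a function argument, as discussed in the review thread.

```python
import hashlib
from typing import Callable


def run_to_fixed_point(
    state: list[int],
    step: Callable[[list[int]], None],
    max_iterations: int = 100,  # hypothetical argument name
) -> int:
    """Apply `step` in place until the fingerprint of `state` stops changing.

    Returns the number of rounds that modified `state`. Raises `RuntimeError`
    if no fixed point is reached within `max_iterations` rounds.
    """

    def fingerprint() -> str:
        # Stand-in for hashing the whole graph.
        return hashlib.sha256(repr(state).encode()).hexdigest()

    current = fingerprint()
    for iteration in range(max_iterations):
        step(state)
        previous, current = current, fingerprint()
        if previous == current:
            # The step changed nothing, so a fixed point has been reached.
            break
    else:
        # The `else` branch of a `for` loop runs only if no `break` occurred.
        raise RuntimeError(f"No fixed point after {max_iterations} iterations.")
    return iteration


def halve_evens(values: list[int]) -> None:
    """Toy modifying step: replace every even entry by half its value."""
    for i, value in enumerate(values):
        if value % 2 == 0:
            values[i] = value // 2


data = [8, 3, 12]
rounds = run_to_fixed_point(data, halve_evens)
print(rounds, data)
```

Detecting convergence through a hash keeps the loop independent of what the individual steps report, at the cost of recomputing the hash in every round.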