feat[next][dace]: Dace fieldview transformations #1594

Merged
Changes from 250 commits
285 commits
dcf3eab
Add support to translate each builtin call to a tasklet node
edopao May 6, 2024
7e6909e
Resolve dace warnings
edopao May 7, 2024
2b07cc5
Remove bultin translator for domain expressions
edopao May 7, 2024
2370fa6
Remove bultin translator for domain expressions (1)
edopao May 7, 2024
8e801df
Refactor
edopao May 7, 2024
812a6e5
Minor edit
edopao May 7, 2024
1d0b50b
Extract ITIR visitor to separate class
edopao May 7, 2024
97a1d22
Code refactoring
edopao May 7, 2024
a30cc7d
Fix formatting
edopao May 7, 2024
f595b01
Add IteratorExpr type
edopao May 10, 2024
a6bcb6c
Indirection shift implemented as tasklet node
edopao May 10, 2024
738da27
Add ConnectivityExpr type
edopao May 13, 2024
e5494d8
Remove ConnectivityExpr type, use ValueExpr instead
edopao May 13, 2024
e9455e3
Changes in preparation for shift builtin
edopao May 13, 2024
cbf55de
Refactoring
edopao May 13, 2024
9d5b1ed
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 13, 2024
801704b
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 13, 2024
c45c417
Add support for programs without computation (pure memlets)
edopao May 13, 2024
3f26d91
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 13, 2024
f173244
Merge remote-tracking branch 'origin/main' into dace-fieldview-shifts
edopao May 13, 2024
d67518a
Fix test
edopao May 13, 2024
783542f
Fix for chain of shift expressions shift(V2E(E2V(i_edge, x), y))(edges)
edopao May 14, 2024
1fa9de4
Support for multi-dimensional shift
edopao May 14, 2024
96338c2
Fix typo
edopao May 14, 2024
57e369f
Add support for cartesian shift with dynamic offset
edopao May 15, 2024
ec4714c
Add support for unstructured shift with dynamic offset
edopao May 15, 2024
46cb6c6
Code refactoring in test file
edopao May 15, 2024
c20a94d
Typo
edopao May 15, 2024
d1f7432
Code cleanup
edopao May 15, 2024
c4385c1
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 16, 2024
ed16fd4
Import updates from branch dace-fieldview-shifts
edopao May 16, 2024
9f7176f
Review comments
edopao May 16, 2024
4f40f42
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao May 16, 2024
932db7c
Avoid tasklet-to-tasklet edge connections
edopao May 16, 2024
46febb0
Avoid tasklet-to-tasklet edge connections
edopao May 16, 2024
949bad7
Add support for in-out field parameters
edopao May 16, 2024
8890f95
Refactoring: import modules, not symbols
edopao May 17, 2024
87b71a6
Minor edit
edopao May 17, 2024
665a609
Remove internal package for builtin translators
edopao May 17, 2024
82fdf64
Add wrapper function to build SDFG
edopao May 17, 2024
e4718b0
Merge pull request #4 from edopao/dace-fieldview-refactor_imports
edopao May 17, 2024
47fcabe
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 17, 2024
51aaf0f
Add fieldview flavor of all test cases
edopao May 17, 2024
6ccecf1
Code changes imported from branch dace-fieldview-shifts
edopao May 17, 2024
e66b960
Code comments updated
edopao May 17, 2024
7f89a16
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 17, 2024
4c190bd
Remove support for inlined chained shift
edopao May 21, 2024
6052de2
Add support for neighbors builtin
edopao May 21, 2024
7300864
Add support for reduce builtin
edopao May 22, 2024
55adbd5
Refactoring
edopao May 23, 2024
ad21dc4
Add support for both inlined and fieldview neighbor reduction
edopao May 23, 2024
bb9123b
Minor edit
edopao May 23, 2024
0025d77
Code refactoring
edopao May 23, 2024
9926d7d
Add support for skip values ONLY for inlined GTIR
edopao May 23, 2024
172f19e
Masked array implementation based on connectivity table
edopao May 27, 2024
b1f4a47
Merge 2 different implementations of reduce
edopao May 27, 2024
63e6e92
Add support for reduce lambda function
edopao May 28, 2024
107e295
Add support for neighbors masked array returned by select statements
edopao May 29, 2024
3c71efa
Import changes from neighbors branch
edopao May 29, 2024
e369cac
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
d0bd277
Import changes from neighbors branch
edopao May 29, 2024
afb5ed1
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
2f75cfb
Add debuginfo for ir.Program and ir.Stmt nodes
edopao May 29, 2024
695db7c
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
074f0b2
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
085f307
Fix error in debuginfo
edopao May 29, 2024
f19960b
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao May 29, 2024
841040e
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
b3df358
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
dc1434c
Fix error in debuginfo (1)
edopao May 29, 2024
eacde66
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao May 29, 2024
138a33c
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao May 29, 2024
3769fb5
Remove nested SDFG for neighbors builtin
edopao Jun 14, 2024
b1b5887
Remove masked array for skip values, rely on identity value
edopao Jun 26, 2024
a5b0f41
import changes from neighbors branch
edopao Jun 28, 2024
f7ac3d8
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jun 28, 2024
01ff262
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jun 28, 2024
c61e796
import changes from neighbors branch
edopao Jun 28, 2024
5a457b2
Merge remote-tracking branch 'origin/dace-fieldview-shifts' into dace…
edopao Jun 28, 2024
f4d9d89
Let's see what auto opt can do.
philip-paul-mueller Jul 3, 2024
9318011
Import changes from branch dace-fieldview-neighbors
edopao Jul 4, 2024
11efdeb
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jul 4, 2024
25b9048
Support field with start offset
edopao Jul 4, 2024
2dc6f97
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao Jul 4, 2024
f6e5b7c
Add test coverage for temporary with start offset (cartesian shift)
edopao Jul 4, 2024
d7312fa
Support field with start offset
edopao Jul 4, 2024
628c18b
Merge branch 'dace-fieldview' into dace-fieldview-shifts
edopao Jul 4, 2024
c4f2738
Test IR updated for literal operand
edopao Jul 4, 2024
0fd0b65
Add test coverage to previous commit
edopao Jul 4, 2024
38d2720
Refactor PrimitiveTranslator interface
edopao Jul 4, 2024
d3541c1
Made a small modfication to some code.
philip-paul-mueller Jul 5, 2024
e855ef9
Fix formatting
edopao Jul 5, 2024
5726509
Started with a first nabla stuff.
philip-paul-mueller Jul 5, 2024
e44f3a2
It seems that local storage does not work well with this transformer.
philip-paul-mueller Jul 5, 2024
4cff071
Fix for domain horzontal/vertical dims
edopao Jul 5, 2024
f642e85
Fix for type inference on single value expression
edopao Jul 5, 2024
f216a36
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 5, 2024
a2af8cd
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 5, 2024
74bd468
Updated it now seems to work.
philip-paul-mueller Jul 5, 2024
667eb7e
Updated the nabla4 calculations.
philip-paul-mueller Jul 5, 2024
58b8e58
Now all the calculations are done.
philip-paul-mueller Jul 5, 2024
e898b31
Formated a bit.
philip-paul-mueller Jul 5, 2024
eae968f
Refactored the code.
philip-paul-mueller Jul 5, 2024
defb55d
Import changes from dace-fieldview-neighbors
edopao Jul 5, 2024
fc9661c
Import changes from dace-fieldview-shifts
edopao Jul 5, 2024
e424d4e
Minor edit
edopao Jul 5, 2024
7ef1d56
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 5, 2024
563ee1a
WIP: Working on accessing.
philip-paul-mueller Jul 7, 2024
0dc376e
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 8, 2024
9df80ad
Merge remote-tracking branch 'edoardo/dace-fieldview-shifts' into dac…
philip-paul-mueller Jul 8, 2024
f32fd38
Now the shift works, at least the shift in the particular dimension.
philip-paul-mueller Jul 8, 2024
538abff
Prepare to go to real input.
philip-paul-mueller Jul 8, 2024
a07fe81
nabla4 works now with the custom icon stuff.
philip-paul-mueller Jul 8, 2024
fec054a
First step in shifting.
philip-paul-mueller Jul 8, 2024
ea7bf64
Now we have one shifting.
philip-paul-mueller Jul 8, 2024
b291152
The helper function works.
philip-paul-mueller Jul 8, 2024
008209d
It now works with the normal shiftuing stuff.
philip-paul-mueller Jul 8, 2024
b832aca
Now the full nabla4 should be ported.
philip-paul-mueller Jul 8, 2024
94ab9d7
Restructured the code and removed the inline version.
philip-paul-mueller Jul 8, 2024
04cde84
Made some small update.
philip-paul-mueller Jul 8, 2024
3dd0860
This is the base of all fusion operations.
philip-paul-mueller Jul 9, 2024
3178b71
Reworked some parts.
philip-paul-mueller Jul 10, 2024
66c5fcd
Address review comments
edopao Jul 10, 2024
d5abad4
Merge remote-tracking branch 'origin/main' into dace-fieldview
edopao Jul 10, 2024
1df1bc3
Apply convention for map variables
edopao Jul 10, 2024
2032b60
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 10, 2024
6394243
Updated and fixed a big in the `is_interstate_transient()` function.
philip-paul-mueller Jul 10, 2024
fcd8ee3
Small corrections and format improvements.
philip-paul-mueller Jul 11, 2024
a25a6a4
Fixed some missing include.
philip-paul-mueller Jul 11, 2024
a57e108
Added a first and mostly untested version of the serial fusion transf…
philip-paul-mueller Jul 11, 2024
62ad165
Started debugin, very strange bug.
philip-paul-mueller Jul 11, 2024
7f72794
Import changes from dace-fieldview-neighbors
edopao Jul 11, 2024
abf3918
Import changes from dace-fieldview-shifts
edopao Jul 11, 2024
a6d31fb
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 11, 2024
4a2ccaa
More debugger friendly.
philip-paul-mueller Jul 11, 2024
d353d0e
It should now work and I figured out why it was not working before.
philip-paul-mueller Jul 11, 2024
073065d
A fix.
philip-paul-mueller Jul 11, 2024
19be2c4
Added some "test" for the merger.
philip-paul-mueller Jul 11, 2024
5237b13
Now the nabla4 optimizes with my fusion operation.
philip-paul-mueller Jul 11, 2024
489bb4a
Added some more test.
philip-paul-mueller Jul 12, 2024
2f6274e
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 12, 2024
4d1a3cc
Fixed some small problem in detecting recursive dataflow.
philip-paul-mueller Jul 12, 2024
2da7453
Made some imporvements to the test.
philip-paul-mueller Jul 12, 2024
ba97fd2
Made some comments better.
philip-paul-mueller Jul 12, 2024
9301dbe
Import changes from branch dace-fieldview-neighbors
edopao Jul 12, 2024
7f60cfe
Import changes from branch dace-fieldview-shifts
edopao Jul 12, 2024
699a88b
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 12, 2024
b3131db
Avoid direct import of symbols from module
edopao Jul 12, 2024
130c877
Address review comments
edopao Jul 12, 2024
7fbd7e1
Merge remote-tracking branch 'origin/dace-fieldview' into dace-fieldv…
edopao Jul 12, 2024
fb2ba90
Started with an untested map promoted.
philip-paul-mueller Jul 12, 2024
bf06cb4
Merge remote-tracking branch 'edoardo/dace-fieldview' into dace-field…
philip-paul-mueller Jul 12, 2024
52c1d01
Updated the tests, but it still does not work.
philip-paul-mueller Jul 12, 2024
849900f
The newest version about shift from Edoardo.
philip-paul-mueller Jul 12, 2024
42f4aba
Now the shift test works too.
philip-paul-mueller Jul 12, 2024
84b2ba7
Added some more checking functionality to the base promoter.
philip-paul-mueller Jul 12, 2024
24bde91
Added a concrete promoter.
philip-paul-mueller Jul 12, 2024
fd81e75
Added a custom (okay currently not really custom) simplification pass.
philip-paul-mueller Jul 12, 2024
284b6a8
Updated the auto fusion stuff.
philip-paul-mueller Jul 12, 2024
a6b191c
Merge remote-tracking branch 'origin/main' into dace-fieldview-shifts
edopao Jul 12, 2024
033db6b
Removed all my non gt4py parts and moved it to a separate repo. The t…
philip-paul-mueller Jul 15, 2024
32712ea
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 15, 2024
ebb76de
Merge remote-tracking branch 'edoardo/dace-fieldview-shifts' into dac…
philip-paul-mueller Jul 15, 2024
0fddb8d
Added a transformation to bring the map iteration indexes in the corr…
philip-paul-mueller Jul 15, 2024
c4f64a4
Updated the auto optimizer.
philip-paul-mueller Jul 15, 2024
6143b95
Made the `gt_simplify()` function aviable.
philip-paul-mueller Jul 17, 2024
00aa64c
Fixed a porting bug.
philip-paul-mueller Jul 17, 2024
b67f0c0
Fixed an edge case in the computation of the output partition if teh …
philip-paul-mueller Jul 17, 2024
b447c2a
Added a map promoter that is able to promote trivial maps that are ge…
philip-paul-mueller Jul 17, 2024
090f08d
Added a function to turn an SDFG into one that runs on GPU.
philip-paul-mueller Jul 17, 2024
04dd63a
Updated the auto optimizer to handle GPU cases.
philip-paul-mueller Jul 17, 2024
bb34f44
Reorganized the GPU stuff.
philip-paul-mueller Jul 18, 2024
88f5245
There is now a k blocking transformation.
philip-paul-mueller Jul 19, 2024
73a01c1
Made some fixes to the k blocking stuff.
philip-paul-mueller Jul 19, 2024
dd1242c
Small fix.
Jul 19, 2024
f3798f3
Fixed an error.
philip-paul-mueller Jul 19, 2024
863bd5f
Now auto optimization also does blocking.
philip-paul-mueller Jul 19, 2024
ea0da2b
Made some fixes, but it still does not work.
Jul 22, 2024
a95bda2
Made some fixes.
philip-paul-mueller Jul 22, 2024
18a8560
If blocking is applied the name of the outer map is now also changed.
philip-paul-mueller Jul 22, 2024
5d979c9
Implemented the possibility to also set the launch bound stuff.
philip-paul-mueller Jul 23, 2024
bcd63d3
Fixed a bug in the auto omptimizer.
philip-paul-mueller Jul 23, 2024
c165f9f
Restructured and cleaned up the auto omptimizer routine.
philip-paul-mueller Jul 26, 2024
316ba9c
Fixed a bug in the `get_map_variable()` function.
philip-paul-mueller Jul 26, 2024
a796766
First batch of stuff for review.
philip-paul-mueller Jul 26, 2024
882ad44
Also checked the map fusion helper stuff.
philip-paul-mueller Jul 26, 2024
a0bf263
Made the reuse of transients optional and disabled it.
philip-paul-mueller Jul 28, 2024
37392fd
First PR candidate for the optimization pipeline.
philip-paul-mueller Jul 29, 2024
5c92c76
Made a small fix in the test function if teh intermnediate was correct.
philip-paul-mueller Jul 29, 2024
f4f5ae5
Added the first series of tests for teh serial map fusion.
philip-paul-mueller Jul 29, 2024
6d12757
Made some small modifications to the map fusion test.
philip-paul-mueller Jul 29, 2024
e590a07
Added a test for the blocking.
philip-paul-mueller Jul 29, 2024
7e99d98
Addressed Edoardo's comments.
philip-paul-mueller Jul 29, 2024
bd35c6d
Forgot to apply some of Edoardo's suggestions.
philip-paul-mueller Jul 29, 2024
0da8ae2
Added myself to the list of authors.
philip-paul-mueller Jul 29, 2024
f978ef7
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 29, 2024
888fb55
Added an utility module.
philip-paul-mueller Jul 29, 2024
0767d6f
Made it possible to extend the applicability of teh map promotion tra…
philip-paul-mueller Jul 29, 2024
c396200
Added a test for the map promotion.
philip-paul-mueller Jul 29, 2024
8e471f1
Reorganized the tests.
philip-paul-mueller Jul 30, 2024
28fcb84
Modified teh first step of teh auto optimizer.
philip-paul-mueller Jul 30, 2024
e8829c6
Added a test to ensure that fusion does not skrew up with indirect ac…
philip-paul-mueller Jul 30, 2024
9373629
Added a todo for a test.
philip-paul-mueller Jul 30, 2024
63a5112
Addressed Edoardo's comment.
philip-paul-mueller Jul 30, 2024
5a2c12c
Added the possibility to controll the iteration order also from teh o…
philip-paul-mueller Jul 31, 2024
bcfbd68
Clarified some buggy behaviour inside the GPU transformation function.
philip-paul-mueller Jul 31, 2024
03f4b1a
Inside a Map there can not be a library node for fusion.
philip-paul-mueller Jul 31, 2024
fd2366f
Applied Edoardo's comments.
philip-paul-mueller Jul 31, 2024
5ed2a8f
Applied another change.
philip-paul-mueller Jul 31, 2024
368c8ad
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Jul 31, 2024
dbc3874
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
Aug 2, 2024
8e97cd6
Removed stray symlink.
Aug 2, 2024
390f02b
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Aug 22, 2024
46c549b
Updated the licence header.
philip-paul-mueller Aug 22, 2024
27d8ea6
This should make the names a bit more consistent.
philip-paul-mueller Aug 22, 2024
1a1a705
Removed some stra `view()` call.
philip-paul-mueller Aug 22, 2024
3c4523a
Fixed a bug in the `SerialMapFusion` transformation.
philip-paul-mueller Aug 23, 2024
cbed51a
Added the first batch of Enrique's suggestions.
philip-paul-mueller Aug 26, 2024
ca71735
Fixed a bug in the map promoter.
philip-paul-mueller Aug 26, 2024
83a5fe4
fixup! Added the first batch of Enrique's suggestions.
philip-paul-mueller Aug 26, 2024
0ee90f5
First new version of the k blocking.
philip-paul-mueller Aug 26, 2024
5d875fd
Further, refactored the KBlock transformation.
philip-paul-mueller Aug 27, 2024
57ae4ea
Added an ADRF for the DaCe parts of the toolchain.
philip-paul-mueller Aug 27, 2024
4d2e941
Made a note in to the map fusion files that we will delete them as so…
philip-paul-mueller Aug 27, 2024
243bc8e
Removed all reference to the HackMD file and changed them with refere…
philip-paul-mueller Aug 27, 2024
a74a54d
Updated the map promotion.
philip-paul-mueller Aug 27, 2024
b7400a6
Fixed a small typo in the `TrivialGPUMapPromoter`.
philip-paul-mueller Aug 28, 2024
36a6386
Added tests for the `TrivialGPUMapPromoter`.
philip-paul-mueller Aug 28, 2024
d6cde5c
Updated the map promotion implementation.
philip-paul-mueller Aug 28, 2024
b616189
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Aug 28, 2024
32d3883
Update docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformat…
philip-paul-mueller Sep 2, 2024
201c8e2
Corrected the ADR.
philip-paul-mueller Sep 2, 2024
210a8d9
Second appling.
philip-paul-mueller Sep 2, 2024
017fc9f
Renamed the `KBlocking` to `LoopBlocking`.
philip-paul-mueller Sep 2, 2024
20da858
Made some smaller modification.
philip-paul-mueller Sep 2, 2024
8c31694
Added the comment Enrique mentioned.
philip-paul-mueller Sep 2, 2024
c8ecd25
Removed the auto use fixture, it is now imported explicitly.
philip-paul-mueller Sep 2, 2024
7aed88f
Forgot to rename the `KBlocking` also in the tests.
philip-paul-mueller Sep 2, 2024
2a8494a
Further modifications.
philip-paul-mueller Sep 4, 2024
87d3ae5
Merge remote-tracking branch 'gt4py/main' into dace-fieldview-transfo…
philip-paul-mueller Sep 4, 2024
2cfbe20
Applied the last comments.
philip-paul-mueller Sep 4, 2024
3e7d09f
Updated Edoardo's comments.
philip-paul-mueller Sep 4, 2024
7cb5e35
Update docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformat…
philip-paul-mueller Sep 4, 2024
04b652e
Refactored the loop blocking transformation.
Sep 4, 2024
ba0ecdc
Merge branch 'main' into dace-fieldview-transformations
philip-paul-mueller Sep 5, 2024
e4df5ae
Fixed an merge issue with master.
philip-paul-mueller Sep 5, 2024
7265ecc
Fixed an issue related to the refactoring yesterday evening.
philip-paul-mueller Sep 5, 2024
5ab199d
Something is fishy.
philip-paul-mueller Sep 5, 2024
a0866a7
Switched to UUID from time.
philip-paul-mueller Sep 5, 2024
71c6681
Merge branch 'main' into dace-fieldview-transformations
philip-paul-mueller Sep 5, 2024
1 change: 1 addition & 0 deletions AUTHORS.md
@@ -24,6 +24,7 @@
- Madonna, Alberto. ETH Zurich - CSCS
- Mariotti, Kean. ETH Zurich - CSCS
- Müller, Christoph. MeteoSwiss
- Müller, Philip. ETH Zurich - CSCS
- Osuna, Carlos. MeteoSwiss
- Paone, Edoardo. ETH Zurich - CSCS
- Röthlin, Matthias. MeteoSwiss
130 changes: 130 additions & 0 deletions docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformations.md
@@ -0,0 +1,130 @@
---
tags: [backend, dace, optimization]
---

# Canonical Form of an SDFG in GT4Py (Especially for Optimizations)

- **Status**: valid
- **Authors**: Philip Müller (@philip-paul-mueller)
- **Created**: 2024-08-27

In the context of the implementation of the new DaCe fieldview we decided on a particular form of the SDFG.
Its main intent is to reduce the complexity of the GT4Py-specific transformations.

## Context

The canonical form outlined in this document was mainly designed from the perspective of the optimization pipeline.
Thus it emphasizes a form that can be handled in a simple and efficient way by a transformation.
In the pipeline we distinguish between:

- Intrastate optimization: optimization of the data flow within states.
- Interstate optimization: optimization between states, these are transformations that are _intended_ to _reduce_ the number of states.

The current (GT4Py) pipeline mainly focuses on intrastate optimization and relies on DaCe, especially its simplify pass, for interstate optimizations.

## Decision

The canonical form is defined by several rules that affect different aspects of an SDFG and what a transformation can assume.
This allows simplifying the implementation of certain transformations.

#### General Aspects

The following rules especially affect transformations and how they operate:

1. Intrastate transformations and interstate transformations must run separately and cannot be mixed in the same (DaCe) pipeline.

- [Rationale]: As a consequence the number of "interstate transients" (transients that are used in multiple states) remains constant during intrastate transformations.
- [Note 1]: It is allowed to run them one after another, as long as they are strictly separated.
- [Note 2]: It is allowed for an _intrastate_ transformation to act in a way that allows state fusion by later interstate transformations.
- [Note 3]: The DaCe simplification pass violates this rule; for that reason this pass must always be called on its own, see also rule 2.

2. It is invalid to call the simplification pass directly, i.e. the usage of `SDFG.simplify()` is not allowed. The only valid way to call _simplify()_ is to call the `gt_simplify()` function provided by GT4Py.

Review comment (Contributor): I propose to apply this rule in the current PR; I mean you can replace the call to simplify, currently used in build_sdfg_from_gtir, with the gt_simplify method. Alternatively, hopefully soon when we merge #1623, we can remove the call to simplify and instead call the optimization workflow from DaCeTranslator class (see src/gt4py/next/program_processors/runners/dace_fieldview/workflow.py)

Reply (Contributor Author): The file you mentioned does not seem to exist. But I have removed all occurrences of simplify.

Reply (Contributor): It exists in the open PR. Ok, we can call the optimization in the next PR.

- [Rationale]: It was observed that some sub-passes in _simplify()_ have a negative impact and that additional passes might be needed in the future.
By using a single function later modifications to _simplify()_ are easy.
- [Note]: One issue is that the remove redundant array transformation is not able to handle all cases.
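
As an illustration of rule 1, the separation of the two kinds of transformations can be pictured as a guard on pipeline stages. The following is only a minimal sketch in plain Python and does not use DaCe; `Kind`, `Transformation` and `check_stage` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    INTRASTATE = "intrastate"
    INTERSTATE = "interstate"


@dataclass(frozen=True)
class Transformation:
    # Hypothetical stand-in for a real transformation; only its kind matters here.
    name: str
    kind: Kind


def check_stage(stage: List[Transformation]) -> Optional[Kind]:
    """Return the common kind of a stage, or raise if the kinds are mixed."""
    kinds = {t.kind for t in stage}
    if len(kinds) > 1:
        names = ", ".join(t.name for t in stage)
        raise ValueError(f"stage mixes intrastate and interstate transformations: {names}")
    # An empty stage has no kind at all.
    return next(iter(kinds), None)


# Two separate stages that run one after another are fine (Note 1).
fusion_stage = [Transformation("map_fusion", Kind.INTRASTATE)]
state_stage = [Transformation("state_fusion", Kind.INTERSTATE)]
stage_kinds = [check_stage(fusion_stage), check_stage(state_stage)]
```

Passing `fusion_stage + state_stage` as a single stage to `check_stage` raises a `ValueError`, which corresponds to the mixing that rule 1 forbids.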

#### Global Memory

The only restriction we impose on global memory is:

3. The same global memory is allowed to be used as input and output at the same time, if and only if the output depends _elementwise_ on the input.
- [Rationale 1]: This allows the removal of double buffering, that DaCe may not remove. See also rule 2.
- [Rationale 2]: This formulation allows writing expressions such as `a += 1`, with only memory for `a`.
Phrased more technically, using global memory for input and output is allowed if and only if the two computations `tmp = computation(global_memory); global_memory = tmp;` and `global_memory = computation(global_memory);` are equivalent.
- [Note]: In the long term this rule will be changed to: Global memory (an array) is either used as input (only read from) or as output (only written to) but never for both.
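
The equivalence stated in Rationale 2 of rule 3 can be tried out with plain Python lists standing in for global memory. This is a sketch under that assumption; `buffered`, `in_place`, `add_one` and `left_neighbor_sum` are illustrative names and not part of GT4Py or DaCe.

```python
def buffered(computation, memory):
    """tmp = computation(global_memory); global_memory = tmp"""
    tmp = [computation(memory, i) for i in range(len(memory))]
    return tmp


def in_place(computation, memory):
    """global_memory = computation(global_memory), updated one element at a time."""
    memory = list(memory)  # copy so that the caller's data stays untouched
    for i in range(len(memory)):
        memory[i] = computation(memory, i)
    return memory


def add_one(a, i):
    # Elementwise: output element i depends only on input element i.
    return a[i] + 1


def left_neighbor_sum(a, i):
    # Not elementwise: output element i also reads element i - 1.
    return a[i] + (a[i - 1] if i > 0 else 0)


data = [1, 2, 3, 4]
elementwise_ok = buffered(add_one, data) == in_place(add_one, data)
stencil_ok = buffered(left_neighbor_sum, data) == in_place(left_neighbor_sum, data)
```

Here `elementwise_ok` is `True` while `stencil_ok` is `False`: the in-place variant of `left_neighbor_sum` reads values that were already overwritten, which is exactly the case that rule 3 excludes.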

#### State Machine

For the SDFG state machine we assume that:

4. An interstate edge can only access scalars, i.e. use them in their assignment or condition expressions, but not arrays, even if they have shape `(1,)`.

- [Rationale]: If an array is also used in interstate edges it becomes very tedious to verify if the array could be removed or not.
- [Note]: Running _simplify()_ might actually result in the violation of this rule, see note of rule 9.

5. The state graph does not contain any cycles, i.e. the implementation of a for/while loop using states is not allowed; the new loop construct or serial maps must be used in that case.
- [Rationale]: This is a simplification that makes it much simpler to define what "later in the computation" means, as we will never have a cycle.
- [Note]: Currently the code generator does not support the `LoopRegion` construct and it is transformed to a state machine.
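
Rule 5 amounts to requiring that the state graph is a directed acyclic graph. A minimal sketch of such a check follows, using a plain adjacency dictionary instead of DaCe's state machine; `has_cycle` and the state names are hypothetical.

```python
from typing import Dict, List, Set


def has_cycle(edges: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph given as adjacency lists has a cycle."""
    visiting: Set[str] = set()  # states on the current depth-first path
    done: Set[str] = set()  # states whose successors are fully explored

    def visit(state: str) -> bool:
        if state in done:
            return False
        if state in visiting:
            return True  # back edge: we reached a state on the current path again
        visiting.add(state)
        found = any(visit(nxt) for nxt in edges.get(state, []))
        visiting.discard(state)
        done.add(state)
        return found

    return any(visit(state) for state in edges)


# A branch that joins again is allowed: it contains no cycle.
branching = {"init": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
# A loop implemented with states is not allowed: "body" jumps back to "guard".
state_loop = {"init": ["guard"], "guard": ["body", "exit"], "body": ["guard"], "exit": []}
```

With these inputs `has_cycle(branching)` is `False` and `has_cycle(state_loop)` is `True`.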

#### Transients

The rules we impose on transients are a bit more complicated, however, while sounding restrictive, they are very permissive.
It is important to note that these rules only have to be met after _simplify()_ was called once on the SDFG:

6. Downstream of a write access, i.e., in all states that follow the state where the access node is located, there are no other access nodes that are used to write to the same array.

- [Rationale 1]: This rule, together with rule 7 and 8, essentially ensures that the assignment in the SDFG follows SSA style, while allowing for expressions such as:

```python
if cond:
    a = true_branch()
else:
    a = false_branch()
```

(**NOTE:** This could also be done with references, however, they are strongly discouraged.)

- [Rationale 2]: This still allows reductions with WCR, as they write to the same access node, and loops whose body modifies a transient that outlives the loop body, as they use the same access node.

7. It is _recommended_ that a write access node should only have one incoming edge.

- [Rationale]: This case is handled poorly by some DaCe transformations, thus we should avoid it as much as possible.

8. No two access nodes in a state can refer to the same array.

- [Rationale]: Together with rule 5 this guarantees SSA style.
- [Note]: An SDFG can still be constructed using different access nodes for the same underlying data; _simplify()_ will combine them.

9. Every access node that reads from an array (having an outgoing edge) that was not written to in the same state must be a source node.

- [Rationale]: Together with rule 1, 4, 5, 6, 7 and 8 this simplifies checking if a transient can be safely removed or if it is used somewhere else.
These rules guarantee that the number of "interstate transients" remains constant and this set is given by the _set of source nodes and all access nodes that have an outgoing degree larger than one_.
- [Note]: To prevent some issues caused by the violation of rule 4 by _simplify()_, this set is extended with the transient sink nodes and all scalars.
Excess interstate transients, that will be kept alive that way, will be removed by later calls to _simplify()_.

10. Every AccessNode within a map scope must refer to a data descriptor whose lifetime must be `dace.dtypes.AllocationLifetime.Scope` and its storage class should either be `dace.dtypes.StorageType.Default` or _preferably_ `dace.dtypes.StorageType.Register`.
- [Rationale 1]: This makes optimizations operating inside maps/kernels simpler, as it guarantees that the AccessNode does not propagate outside.
- [Rationale 2]: The storage type avoids the need to dynamically allocate memory inside a kernel.
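
The set named in the rationale of rule 9 can be sketched on a toy description of a single state. The `AccessNode` record below is a hypothetical simplification that does not mirror DaCe's own `AccessNode` class, every node is assumed to refer to a transient, and the extension from the note (transient sink nodes and all scalars) is left out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class AccessNode:
    # Toy stand-in: the name of the data plus the node's degrees inside the state.
    data: str
    in_degree: int
    out_degree: int


def interstate_transient_candidates(nodes: List[AccessNode]) -> Set[str]:
    """Data of source nodes plus data of nodes with an outgoing degree larger than one."""
    return {
        node.data
        for node in nodes
        if node.in_degree == 0 or node.out_degree > 1
    }


state = [
    AccessNode("a", in_degree=0, out_degree=1),  # source node: read but not written here
    AccessNode("b", in_degree=1, out_degree=1),  # written and consumed once
    AccessNode("c", in_degree=1, out_degree=2),  # consumed by two readers
    AccessNode("d", in_degree=1, out_degree=0),  # sink node
]
candidates = interstate_transient_candidates(state)
```

For this toy state `candidates` contains `a` and `c`; `b` and `d` are not part of the set as it is defined in the rationale.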

#### Maps

For maps we assume the following:

11. The names of map variables (iteration variables) follow this pattern:

- 11.1: All map variables iterating over the same dimension (disregarding the actual range) have the same deterministic name, that includes the `gtx.Dimension.value` string.
- 11.2: The name of horizontal dimensions (`kind` attribute) always ends in `__gtx_horizontal`.
- 11.3: The name of vertical dimensions (`kind` attribute) always ends in `__gtx_vertical`.
- 11.4: The name of local dimensions always ends in `__gtx_localdim`.
- 11.5: No transformation is allowed to modify the name of an iteration variable that follows rules 11.2, 11.3 or 11.4.
- [Rationale]: Without this rule it is very hard to tell which map variable does what. This way we can transmit information from GT4Py to DaCe, see also rule 12.

12. Two map ranges, i.e. the pair map/iteration variable and range, can only be fused if they have the same name _and_ cover the same range.
- [Rationale 1]: Because of rule 11, we will only fuse maps that actually make sense to fuse.
- [Rationale 2]: This allows fusing maps without renaming the map variables.
- [Note]: This rule might be dropped in the future.
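
Rules 11 and 12 can be sketched together. The suffixes are taken from rules 11.2 to 11.4; the `i_` prefix, the `kind` strings and both function names are assumptions made for this sketch and are not the names that GT4Py actually generates.

```python
from typing import Tuple

# Suffixes from rules 11.2, 11.3 and 11.4.
SUFFIX = {
    "horizontal": "__gtx_horizontal",
    "vertical": "__gtx_vertical",
    "local": "__gtx_localdim",
}


def map_variable_name(dim_value: str, kind: str) -> str:
    """Deterministic name that contains the dimension value and ends in the kind suffix."""
    # The "i_" prefix is an assumption of this sketch.
    return f"i_{dim_value}{SUFFIX[kind]}"


def can_fuse_ranges(a: Tuple[str, range], b: Tuple[str, range]) -> bool:
    """Rule 12: the map variable names are equal and the ranges are equal."""
    return a[0] == b[0] and a[1] == b[1]


i_vertex = map_variable_name("Vertex", "horizontal")
i_k = map_variable_name("K", "vertical")
same = can_fuse_ranges((i_vertex, range(0, 10)), (i_vertex, range(0, 10)))
different_range = can_fuse_ranges((i_vertex, range(0, 10)), (i_vertex, range(1, 10)))
different_name = can_fuse_ranges((i_vertex, range(0, 10)), (i_k, range(0, 10)))
```

Only `same` is `True`. Because the name is derived deterministically from the dimension, two maps over the same dimension get the same variable name and can be compared without renaming, which is the point of Rationale 2.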

## Consequences

The rules outlined above impose a certain form of an SDFG.
Most of these rules are designed to ensure that the SDFG follows SSA style and to simplify transformations, especially making validation checks simple, while imposing a minimal number of restrictions.
3 changes: 2 additions & 1 deletion docs/development/ADRs/Index.md
@@ -38,7 +38,7 @@ _None_

### Transformations

_None_
- [0018 - Canonical Form of an SDFG in GT4Py (Especially for Optimizations)](0018-Canonical_SDFG_in_GT4Py_Transformations.md)

### Backends and Code Generation

@@ -47,6 +47,7 @@ _None_
- [0008 - Mapping Domain to Cpp Backend](0008-Mapping_Domain_to_Cpp-Backend.md)
- [0016 - Multiple Backends and Build Systems](0016-Multiple-Backends-and-Build-Systems.md)
- [0017 - Toolchain Configuration](0017-Toolchain-Configuration.md)
- [0018 - Canonical Form of an SDFG in GT4Py (Especially for Optimizations)](0018-Canonical_SDFG_in_GT4Py_Transformations.md)

### Python Integration

Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@
from gt4py.next.program_processors.runners.dace_fieldview import (
gtir_builtin_translators,
gtir_to_tasklet,
transformations as gtx_transformations,
utility as dace_fieldview_util,
)
from gt4py.next.type_system import type_specifications as ts, type_translation as tt
@@ -531,5 +532,5 @@ def build_sdfg_from_gtir(
     # we can remove unnecesssary data connectors (not done by dace simplify pass)
     sdfg.apply_transformations_repeated(dace_dataflow.PruneConnectors)

-    sdfg.simplify()
+    gtx_transformations.gt_simplify(sdfg)
     return sdfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# GT4Py - GridTools Framework
#
# Copyright (c) 2014-2024, ETH Zurich
# All rights reserved.
#
# Please, refer to the LICENSE file in the root directory.
# SPDX-License-Identifier: BSD-3-Clause

"""Transformation and optimization pipeline for the DaCe backend in GT4Py.

Please also see [ADR0018](https://github.com/GridTools/gt4py/tree/main/docs/development/ADRs/0018-Canonical_SDFG_in_GT4Py_Transformations.md)
that explains the general structure and requirements on the SDFGs.
"""

from .auto_opt import gt_auto_optimize, gt_set_iteration_order, gt_simplify
from .gpu_utils import GPUSetBlockSize, gt_gpu_transformation, gt_set_gpu_blocksize
from .loop_blocking import LoopBlocking
from .map_orderer import MapIterationOrder
from .map_promoter import SerialMapPromoter
from .map_serial_fusion import SerialMapFusion


__all__ = [
    "GPUSetBlockSize",
    "LoopBlocking",
    "MapIterationOrder",
    "SerialMapFusion",
    "SerialMapPromoter",
    "gt_auto_optimize",
    "gt_gpu_transformation",
    "gt_set_iteration_order",
    "gt_set_gpu_blocksize",
    "gt_simplify",
]