Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[next][dace]: Dace fieldview transformations #1594

Conversation

philip-paul-mueller
Copy link
Contributor

@philip-paul-mueller philip-paul-mueller commented Jul 29, 2024

Initial PR for the Optimization pipeline of the DaCe backend in GT4Py.
The PR is very basic, but contains the basis for later advancement.
Essentially, it defines the gt_auto_optimize() function which performs optimization on a basic heuristic.

There are some parts missing, such as:

  • There is no way to guide the optimization, i.e. a decision making process that determine if a map should be fused or not.
  • There is also no transformation which fuses parallel maps (was not needed for this kernel).

sdfg=sdfg,
)

def _rewire_map_scope(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function doesn't need to be a method because it doesn't use self, it should be declared as a module-level function outside of the class. Check here: https://google.github.io/styleguide/pyguide.html#217-function-and-method-decorators

Never use staticmethod unless forced to in order to integrate with an API defined in an existing library. Write a module-level function instead.

Copy link
Contributor Author

@philip-paul-mueller philip-paul-mueller Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right it does not need to be a method. However, making it a free function feels wrong to me, as the functionality is now split, as some parts are implemented in the class and one (super important one) is a free function. For me this does not make any sense.
For me a class group stuff together and now you tear it apart. It would be different, if the free function could potentially be used somewhere else, but the function is super specific.


return (independent_nodes, dependent_nodes)

def classify_node(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function doesn't need to be a method because it doesn't use self, it should be declared as a module-level function outside of the class. Check here: https://google.github.io/styleguide/pyguide.html#217-function-and-method-decorators

Never use staticmethod unless forced to in order to integrate with an API defined in an existing library. Write a module-level function instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my point above regarding _rewire_map_scope(). However, there is an inconsistency it should be private.

Copy link
Contributor

@egparedes egparedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final round of comments. Please merge main back in the PR.

Copy link
Contributor

@egparedes egparedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the ADR page.

- [Note 2]: It is allowed for an _intrastate_ transformation to act in a way that allows state fusion by later intrastate transformations.
- [Note 3]: The DaCe simplification pass violates this rule, for that reason this pass must always be called on its own, see also rule 2.

2. It is invalid to call the simplification pass directly, i.e. the usage of `SDFG.simplify()` is not allowed. The only valid way to call _simplify()_ is to call the `gt_simplify()` function provided by GT4Py.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to apply this rule in the current PR; I mean you can replace the call to simplify, currently used in build_sdfg_from_gtir, with the gt_simplify method. Alternatively, hopefully soon when we merge #1623, we can remove the call to simplify and instead call the optimization workflow from DaCeTranslator class (see src/gt4py/next/program_processors/runners/dace_fieldview/workflow.py)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file you mentioned does not seem to exists. But I have removed all occurrences of simplify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It exists in the open PR. Ok, we can call the optimization in the next PR.

- [Note]: To prevent some issues caused by the violation of rule 4 by _simplify()_, this set is extended with the transient sink nodes and all scalars.
Excess interstate transients, that will be kept alive that way, will be removed by later calls to _simplify()_.

10. Every AccessNode within a map scope must refer to a data descriptor whose lifetime must be `dace.dtypes.AllocationLifetime.Scope` and its storage class should _preferably_ be `dace.dtypes.StorageType.Register`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the storage_type: I always use default storage while building the SDFG. I believe it is responsibility of the transformations to set the optimal storage_type.

Copy link
Contributor Author

@philip-paul-mueller philip-paul-mueller Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why the word preferably is used.
If you set it to default then it will be turned to register anyway.
The main point of this rule, it to ensure that no temporary variable that is used inside a map is stored in global memory.
I changed the wording to also allow default storage location.

@philip-paul-mueller
Copy link
Contributor Author

cscs-ci run default

@egparedes
Copy link
Contributor

It looks like the tests you were trying to fix are still failing...

The error is the DaCe loading code.
Let's use only general names to see what is happening.
Also added a helper function.
@philip-paul-mueller philip-paul-mueller merged commit 70dd7c3 into GridTools:main Sep 6, 2024
43 checks passed
edopao added a commit to edopao/gt4py that referenced this pull request Sep 6, 2024
Initial PR for the Optimization pipeline of the DaCe backend in GT4Py.
The PR is very basic, but contains the basis for later advancement.
Essentially, it defines the `gt_auto_optimize()` function which performs
optimization on a basic heuristic.

There are some parts missing, such as:
- There is no way to guide the optimization, i.e. a decision making
process that determine if a map should be fused or not.
- There is also no transformation which fuses parallel maps (was not
needed for this kernel).

---------

Co-authored-by: Edoardo Paone <[email protected]>
Co-authored-by: Philip Mueller <[email protected]>
Co-authored-by: Enrique González Paredes <[email protected]>
philip-paul-mueller added a commit that referenced this pull request Dec 4, 2024
The [initial version](#1594) of
the optimization pipeline only contained a rough draft.
Currently this PR contains a copy of the map fusion transformations from
DaCe that are currently under
[review](spcl/dace#1629). As soon as that PR is
merged and DaCe was updated in GT4Py these files will be deleted.

This PR collects some general improvements:
- [x] More liberal `LoopBlocking` transformation (with tests).
- [x] Incorporate `MapFusionParallel` 
- [x] Using of `can_be_applied_to()` as soon as DaCe is updated
(`TrivialGPUMapElimination`, `SerialMapPromoter`).
- [x] Looking at strides that the Lowering generates. (Partly done)

However, it still uses MapFusion implementation that ships with GT4Py
and not the one in DaCe.


Note:
Because of commit 60e4226 this PR must be merged after
[PR1768](#1768).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants