Add multidimensional maps to GPU docs (#1608)
tbennun authored Jun 27, 2024
1 parent 6fa0212 commit fb074f2
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions doc/optimization/gpu.rst
@@ -33,6 +33,14 @@ If multiple thread-block maps are present, the maximum of their parameters will
smaller map will have an ``if`` condition predicating a subset of the threads to work. This enables optimizing programs
via thread/warp specialization.

**Multi-Dimensional Maps**: If a Map scope is multi-dimensional, the code generator assigns the block and grid dimensions
in the *reversed* order of the map dimensions. This means that the last map dimension (which would correspond to, e.g., the innermost loop
in CPU schedules) is ``{block,thread}Idx.x``. By default, any dimensions beyond the third (counting in this reversed order) are linearized,
together with the third, into ``{block,thread}Idx.z``. This can sometimes result in slower code, because recovering the original indices in
the kernel code involves delinearization, which uses division and modulo operations. For example, with the Map scope
``for i, j, k, l in dace.map[0:N, 0:M, 0:K, 0:L]``, ``threadIdx.x`` maps to ``l`` and ``threadIdx.y`` maps to ``k``, while the range of
``threadIdx.z`` spans ``N * M``, with ``threadIdx.z % M`` mapping to ``j`` and ``threadIdx.z / M`` (integer division) mapping to ``i``.
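
The index arithmetic described above can be sketched in plain Python. This is only an illustration of the reversed ordering and the delinearization of ``threadIdx.z``, not the code DaCe generates; the helper names ``block_dims`` and ``recover_indices`` are hypothetical and are not part of the DaCe API.

```python
from math import prod


def block_dims(map_sizes):
    """Return (x, y, z) extents for a map, assigning dimensions in reversed order.

    Hypothetical helper for illustration only. The last map dimension goes to x,
    the second-to-last to y, and all remaining dimensions are linearized into z.
    """
    rev = list(reversed(map_sizes))
    x = rev[0]
    y = rev[1] if len(rev) > 1 else 1
    z = prod(rev[2:])  # product of an empty list is 1
    return (x, y, z)


def recover_indices(map_sizes, tx, ty, tz):
    """Recover the map indices (in map order) from the three thread indices."""
    n = len(map_sizes)
    idx = [0] * n
    idx[-1] = tx
    if n > 1:
        idx[-2] = ty
    # Delinearize tz: the dimension closest to y varies fastest.
    rem = tz
    for d in range(n - 3, -1, -1):
        idx[d] = rem % map_sizes[d]
        rem //= map_sizes[d]
    return tuple(idx)


# Example sizes for dace.map[0:N, 0:M, 0:K, 0:L] (values chosen arbitrarily)
N, M, K, L = 2, 3, 4, 5
dims = block_dims([N, M, K, L])                      # (L, K, N * M)
i, j, k, l = recover_indices([N, M, K, L], 1, 2, 5)  # j = 5 % M, i = 5 // M
```

With these sizes the z extent is ``N * M = 6``, and a z index of 5 splits into ``j = 2`` and ``i = 1``. The modulo and integer division in the loop are the per-thread overhead that the paragraph above refers to.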

Example of an SDFG **without** a GPU thread-block map and its generated code:

.. raw:: html
