some clarifications
kaushikcfd committed Feb 11, 2020
1 parent f50af5e commit 6b665db
Showing 1 changed file with 6 additions and 7 deletions.
13 changes: 6 additions & 7 deletions pyop2/gpu/TODO.org
@@ -1,8 +1,8 @@
* Limitations/TODOs
** Changes in TSFC so that PyOP2 could have a better understanding of the variable names
- [[https://github.com/OP2/PyOP2/blob/630e55118013966e84dcc62328c45fc9061196e6/pyop2/gpu/tile.py#L65-L79][Currently]] variable names have been hard coded for CG type FE kernel on
triangular meshes.
- Once this has been done it would then be reasonable to tackle other elements
- [[https://github.com/OP2/PyOP2/blob/f50af5e819e726b97b1997f00b1ad4f66b0b574b/pyop2/gpu/tile.py#L117][Currently]], we go through a phase of metadata inference assuming a homogeneity
of kernel structure.
- Once this has been done it would then be reasonable to tackle more elements

*** Information to be fed from TSFC
- [ ] variable name of the action input
@@ -38,7 +38,7 @@ we are going from GEM representation to loopy kernel.

** Global reduction kernels. For ex. ~assemble(dot(f,f)*dx)~
- Currently all the threads write to a single memory location atomically,
thereby losing concurrency.
thereby losing some concurrency.
- Possible solution:
- Fix the block size, say 256.
- Map single cell to single thread.
@@ -47,9 +47,8 @@ we are going from GEM representation to loopy kernel.
- Finally another reduction across the newly created intermediary variable.
- One starting step would be to map the '+=' to a loopy sum node.
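
A minimal sketch of the two-stage idea, written in plain NumPy rather than as a GPU kernel. The steps hidden by the collapsed diff context are not reproduced here; the per-block partial sum and the names ~two_stage_reduction~ and ~partial~ are assumptions made for illustration, not PyOP2 or loopy API.

```python
import numpy as np

BLOCK_SIZE = 256  # fixed block size, as suggested in the list above


def two_stage_reduction(cell_values, block_size=BLOCK_SIZE):
    """Sum per-cell contributions in two stages instead of one atomic target.

    Each entry of cell_values stands for the contribution computed by one
    thread (one cell per thread). This is a hypothetical illustration only.
    """
    values = np.asarray(cell_values, dtype=np.float64)
    n_blocks = -(-values.size // block_size)  # ceiling division

    # Pad with zeros so that every block holds exactly block_size entries.
    padded = np.zeros(n_blocks * block_size, dtype=np.float64)
    padded[:values.size] = values

    # Stage 1 (assumed): each block reduces its own cells into one slot of
    # an intermediary array, so blocks never contend for the same location.
    partial = padded.reshape(n_blocks, block_size).sum(axis=1)

    # Stage 2: another reduction across the intermediary variable.
    return partial.sum()


cells = np.arange(1000, dtype=np.float64)
total = two_stage_reduction(cells)
```

With 1000 cells and a block size of 256, stage 1 produces four partial sums, so only four values remain to be combined in stage 2 instead of 1000 atomic updates to one address.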

** Do we need atomic additions of the output DoFs for a DG kernel?

** Tiling transformation logic fails for low orders
** Atomic scatter reductions for DG elements, necessary?
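
One way to look at the question, as a hedged sketch: count how many cells write to each DoF for a given cell-to-node map. The two-triangle maps and the helper ~max_writers_per_dof~ below are hypothetical and cover cell integrals only; interior facet integrals touch the DoFs of both neighbouring cells and would need separate consideration.

```python
from collections import Counter


def max_writers_per_dof(cell_node_map):
    """Return the largest number of cells that scatter into any single DoF."""
    counts = Counter(dof for cell in cell_node_map for dof in cell)
    return max(counts.values())


# Two triangles sharing an edge (hypothetical numbering).
# CG1: the two vertices on the shared edge are shared DoFs.
cg1_cell_node_map = [(0, 1, 2), (1, 2, 3)]
# DG1: every cell owns its three DoFs exclusively.
dg1_cell_node_map = [(0, 1, 2), (3, 4, 5)]

cg1_writers = max_writers_per_dof(cg1_cell_node_map)
dg1_writers = max_writers_per_dof(dg1_cell_node_map)
```

A value greater than one means that concurrent cell kernels can collide on a DoF, so the scatter needs atomics or colouring. A value of one, as for the DG map here, means that no two cells write to the same location in a cell integral.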
** Inner loop parallelization logic fails for low orders
- The received TSFC kernel has a slightly different representation at low orders
like P_0, P_1, DG0, DG1, etc. because some loops are unrolled, causing it to
diverge from the "assumed" template of the kernel's loop structure.