This repository has been archived by the owner on Nov 27, 2024. It is now read-only.

[spec] Caching

kynan edited this page Feb 23, 2013 · 1 revision

This page discusses the caching of plan objects and generated code in the OpenCL backend.

plan object caching

plan characterisation

The main characteristics of a plan object are:

  • partitioning of the iteration space, i.e. the number of partitions and the partition size.
  • an execution scheme for staging data in local memory, i.e. the ind and loc maps.
  • an execution scheme for the par_loop, i.e. the blockmap and the block and thread colouring.

These depend on the following characteristics of ParLoopCall objects:

  1. Iteration space size: affects the number of partitions.
  2. Datatype size of global reduction arguments: affects the partition size because of the storage required for the on-device part of the reduction.
  3. Datatype size of global read and direct Dat arguments: affects the partition size because of the extra kernel stub arguments passed through local memory.
  4. Datatype size and dim of indirect Dat arguments: affect the partition size because of the local memory space required for staging.
  5. Mapping values and indices of indirect Dat arguments: affect the content of the ind and loc maps.
  6. Mapping values of indirect reduction Dat arguments: affect the colouring scheme.
  7. The order of first occurrence of each (Dat, Map) pair among indirect Dat arguments: affects the content of the ind map.
  8. The order of the indices of indirect Dat arguments for a given (Dat, Map) pair: affects the content of the loc maps.
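A minimal sketch of why items 7. and 8. matter, assuming each indirect argument is reduced to a hypothetical (dat_name, map_name, index) triple: every distinct (Dat, Map) pair gets an ind map slot in order of first occurrence, and the indices for that pair are kept in argument order. The function `ind_map_layout` is illustrative only, not the backend's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of an indirect Dat argument: (dat_name, map_name, index)
IndirectArg = Tuple[str, str, int]


def ind_map_layout(args: List[IndirectArg]) -> Dict[Tuple[str, str], List[int]]:
    """Group indices by (Dat, Map) pair.

    Dict insertion order (guaranteed since Python 3.7) records the order of
    first occurrence of each pair (item 7.), and each list keeps the indices
    in argument order (item 8.).
    """
    layout: Dict[Tuple[str, str], List[int]] = {}
    for dat, map_name, idx in args:
        layout.setdefault((dat, map_name), []).append(idx)
    return layout


args = [("x", "edge2node", 0), ("y", "edge2cell", 0), ("x", "edge2node", 1)]
layout = ind_map_layout(args)
print(list(layout))                # [('x', 'edge2node'), ('y', 'edge2cell')]
print(layout[("x", "edge2node")])  # [0, 1]
```

Swapping the first two arguments would swap the two ind map slots, so argument order alone can change the resulting plan.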

parloop characterisation

To cache plan objects in the OpenCL backend, the indirect Dat arguments of a ParLoopCall are sorted by (Dat, Map) (7. and 8.), and the ParLoopCall provides a canonical representation of its arguments with respect to plan caching (method _plan_key). This representation includes:

  • The size of the iteration space (1.)
  • The partition size (2., 3., and 4.)
  • For (5.), for each Map, a tuple of the md5 digest of the map values and the list of indices through which that Map appears in the arguments.
  • For (6.), for each Dat that is indirectly reduced, the list of tuples of the map values' md5 digest and the indices through which the Dat is reduced.

This canonical representation is laid out as follows:

(iteration_space_size, partition_size, [(map.md5digest, [idx])], [[(map.md5digest, [idx])]])
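A minimal sketch of building such a key, assuming hypothetical `Map` and `IndirectArg` classes (not the PyOP2 classes) and taking the partition size as an input, since its computation from (2.) to (4.) is not specified here. Sorting by index as well as by (Dat, Map) is an assumption; tuples replace the lists of the layout so the key is hashable.

```python
import hashlib
from array import array
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Map:
    # Hypothetical stand-in for a Map: a name and its flattened values
    name: str
    values: Tuple[int, ...]

    def md5digest(self) -> str:
        # md5 digest of the raw map values, packed as C ints
        return hashlib.md5(array("i", self.values).tobytes()).hexdigest()


@dataclass(frozen=True)
class IndirectArg:
    # Hypothetical indirect Dat argument: Dat name, Map, map index, access mode
    dat: str
    map: Map
    idx: int
    access: str  # "INC" marks an indirect reduction


def plan_key(iterset_size: int, partition_size: int,
             args: Sequence[IndirectArg]) -> tuple:
    # Canonical order (7. and 8.): sort by (Dat, Map), then (assumption) by index
    args = sorted(args, key=lambda a: (a.dat, a.map.name, a.idx))

    # (5.) per Map: (digest, indices used through it)
    per_map: Dict[Map, List[int]] = {}
    for a in args:
        per_map.setdefault(a.map, []).append(a.idx)
    maps_part = tuple((m.md5digest(), tuple(ix)) for m, ix in per_map.items())

    # (6.) per reduced Dat: the (digest, indices) pairs it is incremented through
    reduced: Dict[str, Dict[Map, List[int]]] = {}
    for a in args:
        if a.access == "INC":
            reduced.setdefault(a.dat, {}).setdefault(a.map, []).append(a.idx)
    reduction_part = tuple(
        tuple((m.md5digest(), tuple(ix)) for m, ix in maps.items())
        for maps in reduced.values()
    )
    # Tuples instead of lists so the key can be used as a dict key
    return (iterset_size, partition_size, maps_part, reduction_part)


m = Map("edge2node", (0, 1, 1, 2))
a = [IndirectArg("x", m, 1, "R"), IndirectArg("x", m, 0, "INC")]
k1 = plan_key(2, 64, a)
k2 = plan_key(2, 64, list(reversed(a)))
print(k1 == k2)  # True
plan_cache = {k1: "plan"}
```

Because the arguments are canonicalised before the key is built, two par_loops that differ only in argument order hit the same cache entry.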

notes

  • Do we actually need to digest the entire mapping values? Could we digest only the elements that are actually indexed?

generated code caching

The generated code for the execution of a par_loop depends on the following characteristics of a ParLoopCall:

  1. The user kernel, since it is inlined directly in the generated code.
  2. The name of the user kernel function: affects the call statement inside the kernel stub.
  3. The type of par_loop: direct or indirect.
  4. The Const objects of the user program: arguments of the kernel stub and the user kernel.
  5. The dimension of Const objects (scalar or not): scalar consts are passed by value to the user kernel, while non-scalar consts are passed as pointers.
  6. Argument type (Dat, Global, Mat): each type has specific code generation logic.
  7. Argument datatype: affects the type declarations for arguments, local variables, etc.
  8. Staged Dat argument dimension: affects the staging code.
  9. Global reduction argument dimension: affects the work-group-wide reduction code.
  10. Staged Dat argument access mode: R and RW need staging-in code, RW and W need staging-out code, INC needs a coloured execution scheme.
  11. The unique Dat-Map pairs: affect the number of ind maps.
  12. The indirect Dat arguments: affect which ind and loc maps are used.
  13. Vec map argument dimension: affects the code populating the local vec map array.
  14. The order of the arguments: affects the user kernel call statement.
  15. The iteration space extents.
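Item 10. can be read as a small lookup from access mode to the code fragments a staged Dat argument requires. The sketch below encodes exactly that rule; the string mode names and the helper `required_code` are hypothetical.

```python
from typing import FrozenSet

# Access modes as written in item 10. (R, W, RW, INC); string names are an assumption
STAGING_IN_MODES = {"R", "RW"}   # data must be copied into local memory first
STAGING_OUT_MODES = {"RW", "W"}  # results must be copied back to global memory
COLOURED_MODES = {"INC"}         # increments need a coloured execution scheme


def required_code(access: str) -> FrozenSet[str]:
    """Return the code fragments a staged Dat argument with this access mode needs."""
    needs = set()
    if access in STAGING_IN_MODES:
        needs.add("staging-in")
    if access in STAGING_OUT_MODES:
        needs.add("staging-out")
    if access in COLOURED_MODES:
        needs.add("colouring")
    return frozenset(needs)


print(sorted(required_code("RW")))   # ['staging-in', 'staging-out']
print(sorted(required_code("INC")))  # ['colouring']
```

Since the access mode changes which fragments are emitted, two par_loops that differ only in an argument's access mode cannot share generated code.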

parloop characterisation

To cache generated code in the OpenCL backend, the canonical representation of a ParLoopCall with respect to generated code caching comprises:

  • An md5 digest of the user kernel code (before instrumentation) and of the user kernel name (1. and 2.)
  • The iteration space's extents when present (15.), otherwise an empty tuple
  • For each Const: a tuple (name, is_scalar) (4. and 5.)
  • For each argument:
    • a tuple (type, dimension, access) (3., 6., 7., 8., 9., and 10.)
    • an indirect description value: -1 for globals and direct Dats, the position of the Map in the order of appearance of Maps in the arguments (3., 11., 12., and 14.), or the negative value of the map dimension for vector arguments (13.).

This canonical representation is laid out as follows:

(user_kernel_code_and_name_digest, extents, [(argtype, argdim, argacc, indvalue)], [(const_name, const_is_scalar)])
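A minimal sketch of assembling this key, assuming a hypothetical `Arg` record (not the PyOP2 class) and that every Map, including those of vector arguments, counts towards the order of appearance. The separator byte in the digest is an added assumption so that code and name boundaries cannot be confused.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Arg:
    # Hypothetical per-argument description; not the PyOP2 Arg class
    argtype: str                    # "Dat", "Global" or "Mat"
    dim: int
    access: str                     # e.g. "R", "W", "RW", "INC"
    map_name: Optional[str] = None  # None for globals and direct Dats
    map_dim: Optional[int] = None   # Map dimension, set only for vector-map arguments


def _indvalue(arg: Arg, map_order: Dict[str, int]) -> int:
    if arg.map_name is None:
        return -1                   # global or direct Dat
    if arg.map_dim is not None:
        return -arg.map_dim         # vector argument: negative map dimension
    return map_order[arg.map_name]  # position of the Map by first appearance


def code_key(kernel_code: str, kernel_name: str,
             extents: Optional[Tuple[int, ...]],
             args: Sequence[Arg],
             consts: Sequence[Tuple[str, bool]]) -> tuple:
    # (1. and 2.) digest of the un-instrumented kernel code and the kernel name
    h = hashlib.md5()
    h.update(kernel_code.encode())
    h.update(b"\0")
    h.update(kernel_name.encode())

    # Number Maps in order of first appearance in the arguments
    map_order: Dict[str, int] = {}
    for a in args:
        if a.map_name is not None and a.map_name not in map_order:
            map_order[a.map_name] = len(map_order)

    arg_part = tuple((a.argtype, a.dim, a.access, _indvalue(a, map_order)) for a in args)
    return (h.hexdigest(), tuple(extents) if extents else (), arg_part, tuple(consts))


args = [
    Arg("Dat", 1, "R"),                                   # direct Dat
    Arg("Dat", 2, "R", map_name="edge2node"),
    Arg("Dat", 1, "INC", map_name="edge2cell"),
    Arg("Dat", 2, "R", map_name="edge2node", map_dim=2),  # vector argument
    Arg("Global", 1, "INC"),
]
key = code_key("void k(double *x) { }", "k", None, args, [("alpha", True)])
print([t[3] for t in key[2]])  # [-1, 0, 1, -2, -1]
```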

notes

  • The key is redundant with respect to (3.).