[spec] Sequential backend

Dependencies

To run the sequential backend, you need:

instant (apt-get install python-instant)
a C compiler

How it works

Rather than just compiling the user’s kernel with instant and carrying out the loop over set elements in Python, we push the entire loop down into C. This is orders of magnitude faster because the inner loop incurs no Python overhead.

Passing data to C

The kernel we compile looks like:

void wrap_user_kernel(PyObject *arg1, PyObject *arg2, ..., PyObject *argn)
{
   T1 *arg1_data = (T1 *)(((PyArrayObject *)arg1)->data);
   ...
   for ( int i = 0; i < set_size; i++ ) {
       user_kernel(...);
   }
}

For every argument to the parallel loop we pass a PyObject * referring to the numpy array that holds the data. In addition, for indirect arguments only, we pass a second PyObject * referring to the array that holds the map.

No attempt is made to determine unique Dats and Maps to pass, so multiple data and map pointers may point to the same underlying piece of memory.

For an indirect argument a with map m accessing index idx, the user kernel gets a pointer to the data at the following address, where i is the current set element:

a.data + m.values[i * m.dim + idx] * a.dim
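The address arithmetic above can be modelled in Python, with flat lists standing in for the C arrays. This is only an illustrative sketch: the helper indirect_offset and the sample map and data are hypothetical and are not part of PyOP2.

```python
def indirect_offset(map_values, map_dim, dat_dim, i, idx):
    """Offset into the flat data array for set element i and map index idx.

    Mirrors the C expression: a.data + m.values[i * m.dim + idx] * a.dim
    """
    return map_values[i * map_dim + idx] * dat_dim


# Hypothetical example: 3 edges, each mapping to 2 nodes (m.dim == 2).
map_values = [0, 1,
              1, 2,
              2, 3]
# 4 nodes with 2 values each (a.dim == 2), stored flat in C order.
data = [0.0, 0.1,
        1.0, 1.1,
        2.0, 2.1,
        3.0, 3.1]

# The second node (idx=1) of edge 1 is node 2, whose values start at offset 4.
off = indirect_offset(map_values, 2, 2, i=1, idx=1)
node_data = data[off:off + 2]
```

In the C wrapper the offset is added to the data pointer directly, so the user kernel receives a pointer rather than a slice.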

For vector maps, we declare an additional local variable in the kernel wrapper:

T *argN_vec[N];

Inside the kernel loop, this is populated with the correct data pointers (as above).
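As a rough model of how that local array is filled, the sketch below computes one data offset per map entry for a given set element. It assumes that N equals the map's dimension. The helper populate_vec and the sample map are hypothetical, and offsets stand in for the C pointers.

```python
def populate_vec(map_values, map_dim, dat_dim, i):
    """Offsets that argN_vec[0], ..., argN_vec[N-1] would point at for element i.

    Each entry uses the same arithmetic as a single indirect argument:
    m.values[i * m.dim + idx] * a.dim
    """
    return [map_values[i * map_dim + idx] * dat_dim for idx in range(map_dim)]


# Hypothetical map: 3 edges, each mapping to 2 nodes.
map_values = [0, 1, 1, 2, 2, 3]

# Edge 2 maps to nodes 2 and 3; with 2 values per node these start at 4 and 6.
vec = populate_vec(map_values, map_dim=2, dat_dim=2, i=2)
```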

Caveats (interacting with Python)

OP2, and thus PyOP2, assumes that data is actually stored as 1D arrays, irrespective of the dimension of the OP2-level data object. When populating data structures, we must therefore take care that the numpy data are laid out in C order. As an example, the following will not work as expected:

import numpy as np
a = np.array([range(10), range(1, 11)], dtype=int).transpose()
...
# instantiate op2 data structures.

Although when we print a we see it nicely displayed in what looks like C order:

In [9]: a
Out[9]: 
array([[ 0,  1],
       [ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])

in actuality, the data are stored as:

In [11]: a.T
Out[11]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
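One way to detect this situation before handing an array to OP2 is to inspect the array's layout flags. The snippet below is a small sketch using standard numpy attributes; the variable names are only for illustration.

```python
import numpy as np

a = np.array([range(10), range(1, 11)], dtype=int).transpose()

# The transposed view has the expected (10, 2) shape...
shape = a.shape
# ...but it is laid out in Fortran (column-major) order, not C order.
c_order = a.flags['C_CONTIGUOUS']
f_order = a.flags['F_CONTIGUOUS']
# The view shares its buffer with the original 2 x 10 array, so the
# underlying memory still holds 0..9 followed by 1..10.
buffer_order = a.base.ravel().tolist()
```

Here c_order is False and f_order is True, which is the sign that the buffer does not match the printed row-by-row order.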

To avoid this, either create the array correctly in the first place:

a = np.array([(i, i+1) for i in range(10)], dtype=int)

or explicitly copy into the displayed order before passing to OP2 data structures:

a = np.array([range(10), range(1, 11)], dtype=int).transpose()
a = a.ravel()
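The following sketch checks that both approaches produce the same C-ordered buffer. It also shows np.ascontiguousarray, which is not mentioned above, as an alternative that copies into C order while keeping the 2D shape. The variable names are illustrative only.

```python
import numpy as np

# Built row by row, so already in C order.
good = np.array([(i, i + 1) for i in range(10)], dtype=int)

# Transposed view: looks the same when printed, but is not in C order.
bad = np.array([range(10), range(1, 11)], dtype=int).transpose()

# ravel() copies the elements out in C order, flattened to 1D.
flat = bad.ravel()

# Alternative (an assumption, not from the text above): copy into C order
# while keeping the (10, 2) shape.
fixed = np.ascontiguousarray(bad)
```

Either result can then be passed on, since its memory order matches the order in which the values are displayed.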

Debugging the sequential backend

You can debug the generated code with gdb. This requires a few modifications to sequential.py to add debug symbols and so forth to the generated code. Add:

cppargs=['-g', '-O0'], modulename=kernel._name

to the inline_with_numpy call. This makes instant compile without optimisations and with debug symbols. Additionally, the generated code is put in the module given by modulename in the current directory, rather than in instant’s cache. To debug a failing example that executes a kernel foo in a file foo.py, do:

$ gdb --args python foo.py
...
(gdb) break wrap_foo__
Function "wrap_foo__" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
(gdb) r
...
inspect as normal
