Use NumPy Arrays for Single Process per Node
Even when running under MPI, if the node-level shared communicator contains only one process, use a plain NumPy array instead of a shared memory segment. This reduces the total number of shared memory segments, which the kernel caps through its limit on the number of open files (shared memory segments appear as files to userspace).
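
A minimal sketch of the selection logic, not the actual code in this change. It assumes mpi4py and a node communicator obtained from comm.Split_type(MPI.COMM_TYPE_SHARED); the function name allocate_node_buffer and its return convention are hypothetical, chosen only for illustration.

```python
import numpy as np


def allocate_node_buffer(node_comm, shape, dtype=np.float64):
    """Return (array, window) for a buffer used by all processes on a node.

    `node_comm` is assumed to be an mpi4py communicator created with
    comm.Split_type(MPI.COMM_TYPE_SHARED), or None when MPI is not in use.
    The window is None when no shared memory segment was created.
    """
    dtype = np.dtype(dtype)

    if node_comm is None or node_comm.Get_size() == 1:
        # Only one process on this node: nothing to share, so a plain
        # NumPy array avoids consuming a shared memory segment.
        return np.zeros(shape, dtype=dtype), None

    # Several processes on the node: allocate one MPI shared window.
    from mpi4py import MPI  # imported lazily; only needed on this path

    n_elem = int(np.prod(shape))
    # Rank 0 of the node owns the memory; the other ranks request 0 bytes
    # and attach to rank 0's allocation below.
    n_bytes = n_elem * dtype.itemsize if node_comm.Get_rank() == 0 else 0
    win = MPI.Win.Allocate_shared(n_bytes, dtype.itemsize, comm=node_comm)
    buf, _ = win.Shared_query(0)
    data = np.ndarray(buffer=buf, dtype=dtype, shape=shape)
    return data, win
```

Callers would keep the returned window alive for as long as the array is in use and free it afterwards; when the window is None there is nothing to release.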