AoS-to-SoA conversion #4

jeffhammond · 2015-10-28T20:44:35Z

Feel free to reject this one, since it does not make a huge difference in performance. However, the AoS-to-SoA conversion is a well-known optimization that is worth illustrating in a code like this.

This optimization should also help in the CUDA version, but I don't plan to submit those changes, since Intel employees are discouraged from writing CUDA ;-)

AoS = array of structs, the way your code was before.
SoA = struct of arrays, as demonstrated in these changes.
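For readers new to the two layouts, here is a minimal sketch of what the conversion amounts to. It is not the code from this PR: the field names follow the `StructureAtom` struct quoted later in this thread, while `StructureAtomsSoA` and `to_soa` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per atom (fields as in the StructureAtom struct in this thread).
struct StructureAtom {
    double x, y, z;   // Cartesian position, units: A
    double epsilon;   // Lennard-Jones epsilon parameter, units: K
    double sigma;     // Lennard-Jones sigma parameter, units: A
};

// SoA: one array per field. The name StructureAtomsSoA is hypothetical.
struct StructureAtomsSoA {
    std::vector<double> x, y, z, epsilon, sigma;
    std::size_t size() const { return x.size(); }
};

// Convert AoS to SoA by copying each field into its own contiguous array.
StructureAtomsSoA to_soa(const std::vector<StructureAtom>& atoms) {
    StructureAtomsSoA soa;
    soa.x.reserve(atoms.size());
    soa.y.reserve(atoms.size());
    soa.z.reserve(atoms.size());
    soa.epsilon.reserve(atoms.size());
    soa.sigma.reserve(atoms.size());
    for (const StructureAtom& a : atoms) {
        soa.x.push_back(a.x);
        soa.y.push_back(a.y);
        soa.z.push_back(a.z);
        soa.epsilon.push_back(a.epsilon);
        soa.sigma.push_back(a.sigma);
    }
    assert(soa.size() == atoms.size());
    return soa;
}
```

Atom `i` is then read as `soa.x[i]`, `soa.y[i]`, and so on, instead of `atoms[i].x`, `atoms[i].y`.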


CorySimon · 2015-10-29T23:30:57Z

@jeffhammond Thanks. Regarding SoA versus AoS, I was thinking that AoS

struct StructureAtom {
    // Cartesian position, units: A
    double x;
    double y;
    double z;
    // Lennard-Jones epsilon parameter with adsorbate
    double epsilon;  // units: K
    // Lennard-Jones sigma parameter with adsorbate
    double sigma;  // units: A
};

StructureAtom * structureatoms;

would be more efficient, since I loop over each atom during the simulations and access all the data for a given atom inside the loop. AoS makes the memory for a given atom contiguous, which is why I would expect AoS to be more efficient.

Or is SoA more efficient here because of padding/alignment of memory?
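One rough way to look at the layout question is to measure how far apart the same field of two consecutive atoms sits in memory. The sketch below assumes the five-`double` `StructureAtom` shown above; the helper names `aos_x_stride_bytes` and `soa_x_stride_bytes` are hypothetical. On common ABIs a struct of five doubles has no internal padding, so the difference between the layouts shows up as stride rather than padding.

```cpp
#include <cassert>
#include <cstddef>

struct StructureAtom {
    double x;
    double y;
    double z;
    double epsilon;
    double sigma;
};

// Byte distance between the x fields of two consecutive atoms in an AoS array.
std::size_t aos_x_stride_bytes() {
    StructureAtom atoms[2] = {};
    const char* first = reinterpret_cast<const char*>(&atoms[0].x);
    const char* second = reinterpret_cast<const char*>(&atoms[1].x);
    return static_cast<std::size_t>(second - first);
}

// Byte distance between consecutive x values when x has its own array (SoA).
std::size_t soa_x_stride_bytes() {
    double x[2] = {};
    const char* first = reinterpret_cast<const char*>(&x[0]);
    const char* second = reinterpret_cast<const char*>(&x[1]);
    return static_cast<std::size_t>(second - first);
}
```

In the AoS case the stride equals `sizeof(StructureAtom)`, so a vector load of several consecutive `x` values has to gather from separated locations. In the SoA case the stride is `sizeof(double)`.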


jeffhammond · 2015-10-29T23:36:12Z

I don't see any performance benefit associated with this optimization right now, so empirically it is not significant. However, SoA is more amenable to vectorization, because there will be contiguous memory streams associated with all five struct members. In the AoS version, accesses to any one member are strided rather than contiguous, and they are not aligned.
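To make the "contiguous memory streams" point concrete, here is a sketch of the same loop written against both layouts. It is an illustration only, not the kernel from this PR: the plain Lennard-Jones form `4*epsilon*((sigma/r)^12 - (sigma/r)^6)`, the probe position arguments, and the names `energy_aos`, `energy_soa`, and `StructureAtomsSoA` are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct StructureAtom { double x, y, z, epsilon, sigma; };

// Hypothetical SoA counterpart: one array per field, all of equal length.
struct StructureAtomsSoA { std::vector<double> x, y, z, epsilon, sigma; };

// AoS loop: each iteration reads five fields from one 40-byte struct,
// so consecutive iterations touch each field with a stride of one struct.
double energy_aos(const std::vector<StructureAtom>& atoms,
                  double px, double py, double pz) {
    double energy = 0.0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        const double dx = px - atoms[i].x;
        const double dy = py - atoms[i].y;
        const double dz = pz - atoms[i].z;
        const double r2 = dx * dx + dy * dy + dz * dz;
        const double s2 = atoms[i].sigma * atoms[i].sigma / r2;
        const double s6 = s2 * s2 * s2;
        energy += 4.0 * atoms[i].epsilon * s6 * (s6 - 1.0);
    }
    return energy;
}

// SoA loop: five unit-stride streams, one per field, which is the access
// pattern a vectorizing compiler can turn into plain vector loads.
double energy_soa(const StructureAtomsSoA& atoms,
                  double px, double py, double pz) {
    const std::size_t n = atoms.x.size();
    assert(atoms.y.size() == n && atoms.z.size() == n);
    assert(atoms.epsilon.size() == n && atoms.sigma.size() == n);
    const double* x = atoms.x.data();
    const double* y = atoms.y.data();
    const double* z = atoms.z.data();
    const double* eps = atoms.epsilon.data();
    const double* sig = atoms.sigma.data();
    double energy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = px - x[i];
        const double dy = py - y[i];
        const double dz = pz - z[i];
        const double r2 = dx * dx + dy * dy + dz * dz;
        const double s2 = sig[i] * sig[i] / r2;
        const double s6 = s2 * s2 * s2;
        energy += 4.0 * eps[i] * s6 * (s6 - 1.0);
    }
    return energy;
}
```

Both functions do the same arithmetic; only the memory access pattern differs. Whether the SoA version is actually faster depends on the compiler, flags, and hardware, so it needs to be measured.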

I suspect that the effects of SoA will be more visible on an architecture like Xeon Phi, but I have not measured this yet.

jeffhammond added 14 commits October 28, 2015 06:30

fix build system and rename files appropriately

7935deb

change run script to only do CPU

c47ae84

add option for C++ restrict

a049ac5

private data as much as possible

f43a404

whitespace cleanup

f67c970

line break after output

525d329

add OpenMP timing and print FOM

809112a

bug fix - copy and paste error dy->dz

6ec0d41

add optimizations to kernel

6915743

print thread count at top

67dc03b

nested OMP needs work

20adf17

allow user to use both inner and outer loop thread parallelism, but the default appears optimal

73f76a8

convert AoS to SoA

486644b

shared required when passing by reference, which is worth ~10% it seems

2d05533

jeffhammond added 6 commits October 30, 2015 14:01

merge in scripts that work

e006484

add figure

908615c

label figure correctly

3befd84

better label

bfb3d92

range 8000 from 9000

f9798ee

range 8000 from 9000 - fig

7234b5e
