Latest commit: 9681631 by Mihail Popov and pablooliveira, Nov 13, 2014

This branch is 1 commit ahead of benchmark-subsetting/NPB3.0-omp-C:master.

CG

Note: the routine conj_grad supplies three implementations of
the sparse matrix-vector multiply.  The default multiply is not
loop unrolled.  The two alternate implementations are unrolled
to a depth of 2 and to a depth of 8.  Please experiment with
these to find the fastest for your particular architecture.
When reporting timing results, any of the three may be used
without penalty.
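
Illustration of the three variants:
The sketch below shows what a plain, an unrolled-by-2 and an
unrolled-by-8 sparse matrix-vector multiply can look like.  It is
not the code from conj_grad.  It assumes a 0-based compressed
sparse row layout, and the names spmv_plain, spmv_unroll2,
spmv_unroll8, rowstr, colidx, a, x and y are hypothetical names
chosen for this example.

```c
/* Sketch only: y = A*x for an n-row sparse matrix in CSR form.
 * rowstr has n+1 entries; the nonzeros of row j are
 * a[rowstr[j]] .. a[rowstr[j+1]-1], with column indices in colidx.
 * All names here are assumptions made for this illustration. */

/* Default form: no loop unrolling. */
void spmv_plain(int n, const int *rowstr, const int *colidx,
                const double *a, const double *x, double *y)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int k = rowstr[j]; k < rowstr[j + 1]; k++)
            sum += a[k] * x[colidx[k]];
        y[j] = sum;
    }
}

/* Inner loop unrolled to a depth of 2, with two partial sums. */
void spmv_unroll2(int n, const int *rowstr, const int *colidx,
                  const double *a, const double *x, double *y)
{
    for (int j = 0; j < n; j++) {
        int k = rowstr[j];
        int end = rowstr[j + 1];
        double sum1 = 0.0, sum2 = 0.0;
        /* Peel one element when the row length is odd. */
        if ((end - k) % 2 == 1) {
            sum1 = a[k] * x[colidx[k]];
            k++;
        }
        for (; k < end; k += 2) {
            sum1 += a[k] * x[colidx[k]];
            sum2 += a[k + 1] * x[colidx[k + 1]];
        }
        y[j] = sum1 + sum2;
    }
}

/* Inner loop unrolled to a depth of 8, with one running sum. */
void spmv_unroll8(int n, const int *rowstr, const int *colidx,
                  const double *a, const double *x, double *y)
{
    for (int j = 0; j < n; j++) {
        int k = rowstr[j];
        int end = rowstr[j + 1];
        int residue = (end - k) % 8;
        double sum = 0.0;
        /* Handle the leftover elements first. */
        for (int r = 0; r < residue; r++, k++)
            sum += a[k] * x[colidx[k]];
        /* The remaining length is a multiple of 8. */
        for (; k < end; k += 8) {
            sum += a[k]     * x[colidx[k]]
                 + a[k + 1] * x[colidx[k + 1]]
                 + a[k + 2] * x[colidx[k + 2]]
                 + a[k + 3] * x[colidx[k + 3]]
                 + a[k + 4] * x[colidx[k + 4]]
                 + a[k + 5] * x[colidx[k + 5]]
                 + a[k + 6] * x[colidx[k + 6]]
                 + a[k + 7] * x[colidx[k + 7]];
        }
        y[j] = sum;
    }
}
```

The unrolled forms add the products in a different order than the
plain form, so with general floating-point data the results may
differ in the last few bits.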

Performance examples:
The non-unrolled version of the multiply is slightly (perhaps
5%) faster than the unrolled-by-2 version on the sp2-66MHz-WN
on 16 nodes.  On the Cray T3D the reverse is true: the
unrolled-by-2 version is some 10% faster.  The unrolled-by-8
version is significantly faster on the Cray T3D, where the
overall speed of the code is 1.5 times faster.