Global Array and Intel parallel research kernels #206

Open: wants to merge 41 commits into base: master
Commits (41)
9746918
add directory for intel kernels
nelsonje Jul 17, 2014
cf83315
first pass at global array dereference
nelsonje Aug 19, 2014
2eac2f1
First pass at GlobalArray distributions
nelsonje Aug 19, 2014
7d4b59e
need to fix dereference so it actually computes correct element
nelsonje Aug 19, 2014
a59044f
Improve allocation
nelsonje Aug 19, 2014
b9b6914
Add iteration
nelsonje Aug 19, 2014
5f7f2e4
split out distributions
nelsonje Aug 19, 2014
881bbb4
whew, something that works
nelsonje Sep 11, 2014
54215d5
add GlobalArray test
nelsonje Sep 11, 2014
b946908
add Intel synch_p2p kernel
nelsonje Sep 11, 2014
ee2f742
global array updates for multi-node
nelsonje Sep 11, 2014
3ae0c68
whoops, thinking about distribution backwards
nelsonje Sep 12, 2014
c5086ee
a little closer to working
nelsonje Sep 16, 2014
fe6ca1b
adjust iteration to match indexing
nelsonje Sep 16, 2014
52e2d91
rename dimension flags, add prototype of additional pattern for stenc…
nelsonje Sep 16, 2014
c5c598d
fixed last-core unbalance problem
nelsonje Sep 16, 2014
f120034
some cleanup
nelsonje Sep 16, 2014
c5dd712
sort of working
nelsonje Sep 18, 2014
e376c13
Modify Linear addresses to support creation on a different core than …
nelsonje Sep 18, 2014
0d14b0f
MUCH simpler way to deal with addressing using Linear addresses
nelsonje Sep 18, 2014
fe5eef2
blocking and feed forward
nelsonje Sep 20, 2014
5c1c812
parameterize index type
nelsonje Sep 21, 2014
dba8a50
some updates
nelsonje Sep 21, 2014
a3ea8cb
a mpi-like version
nelsonje Sep 21, 2014
57f6876
try semaphore-based forward
nelsonje Oct 1, 2014
97b15b5
a couple hacks
nelsonje Oct 2, 2014
1982cd3
a little cleanup
nelsonje Oct 23, 2014
c074048
Merge branch 'master' into nelson+GlobalArray
nelsonje Feb 6, 2015
3f3916b
use intel license instead for p2p kernels
nelsonje Mar 13, 2015
af6dc9f
Make n represent row count and m column count; fix some row- vs. colu…
nelsonje Mar 13, 2015
1dd7f9c
fix cores == columns case
nelsonje Mar 16, 2015
6367649
disable route graph output
nelsonje Mar 18, 2015
9e4f293
enable local assignment of global array cell
nelsonje Mar 18, 2015
547d732
Support symmetric allocation of non-multiple-of-64-byte data without …
nelsonje Mar 20, 2015
722dbea
some examples of symmetric allocation
nelsonje Mar 20, 2015
292d9e0
refine examples a bit
nelsonje Mar 20, 2015
159e395
make Grappa namespace explicit in examples
nelsonje Mar 20, 2015
6890595
use remote blocking readFF
nelsonje Mar 20, 2015
f3e06e8
minor p2p-border changes
nelsonje Mar 25, 2015
60e92c5
Merge master
nelsonje Mar 25, 2015
6b70bc7
Merge branch 'master', remote branch 'origin' into nelson+GlobalArray
nelsonje Mar 25, 2015
1 change: 1 addition & 0 deletions applications/CMakeLists.txt
@@ -8,3 +8,4 @@ add_subdirectory(join)
add_subdirectory(isopath)
add_subdirectory(graphlab)
add_subdirectory(util)
add_subdirectory(intelParRes)
12 changes: 12 additions & 0 deletions applications/intelParRes/CMakeLists.txt
@@ -0,0 +1,12 @@

file(GLOB PARRES
  "*.cpp"
  "*/*.cpp"
)

# make a separate build target for each parallel research kernel source file
foreach(file ${PARRES})
  get_filename_component(base ${file} NAME_WE)
  add_grappa_exe(parres-${base} ${base}.exe ${file})
  # use the target name defined on the previous line (${name} was never set)
  set_property(TARGET parres-${base} PROPERTY FOLDER "Applications")
endforeach()
222 changes: 222 additions & 0 deletions applications/intelParRes/synch_p2p/p2p-border.cpp
@@ -0,0 +1,222 @@
////////////////////////////////////////////////////////////////////////
// Copyright (c) 2013, Intel Corporation
// Copyright (c) 2014, Jacob Nelson
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
// * Neither the name of Intel Corporation nor the names of its
// contributors may be used to endorse or promote products
// derived from this software without specific prior written
// permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
////////////////////////////////////////////////////////////////////////

#include <Grappa.hpp>
#include <FullEmpty.hpp>
#include <array/GlobalArray.hpp>

using namespace Grappa;

DEFINE_uint64(n, 4, "number of rows in array");
DEFINE_uint64(m, 4, "number of columns in array");
DEFINE_uint64(iterations, 1, "number of iterations");
DEFINE_string(pattern, "border", "what pattern of kernel should we run?");

#define ARRAY(i,j) (local[(j)+((i)*dim2_percore)])

//#define ARRAY(i,j) (local[1+j+i*(dim2_percore)])

double * local = NULL;
int dim1_size = 0;
int dim2_size = 0;
int dim2_percore = 0;

GlobalArray< double, int, Distribution::Local, Distribution::Block > ga;
FullEmpty<double> * lefts = NULL;

double iter_time = 0.0;

int main( int argc, char * argv[] ) {
  init( &argc, &argv );

  run([]{
    LOG(INFO) << "Grappa pipeline stencil execution on 2D ("
              << FLAGS_m << "x" << FLAGS_n
              << ") grid";

    ga.allocate( FLAGS_n, FLAGS_m );
    on_all_cores( [] {
      lefts = new FullEmpty<double>[FLAGS_n];
    } );

    double avgtime = 0.0;
    double mintime = 366.0*24.0*3600.0;
    double maxtime = 0.0;

    // initialize
    LOG(INFO) << "Initializing....";
    forall( ga, [] (int i, int j, double& d) {
      if( i == 0 ) {
        d = j;
      } else if ( j == 0 ) {
        d = i;
      } else {
        d = 0.0;
      }
      // LOG(INFO) << "Initx: ga(" << i << "," << j << ")=" << d;
    });
    on_all_cores( [] {
      for( int i = 0; i < FLAGS_n; ++i ) {
        if( (Grappa::mycore() == 0) ||
            (Grappa::mycore() == 1 && ga.dim2_percore == 1 ) ) {
          lefts[i].writeXF(i);
        } else {
          if( i == 0 ) {
            lefts[i].writeXF( ga.local_chunk[0] - 1 ); // one to the left of our first element
          } else {
            lefts[i].reset();
          }
        }
      }
    } );

    LOG(INFO) << "Running " << FLAGS_iterations << " iterations....";
    double total_start = Grappa::walltime();
    for( int iter = 0; iter < FLAGS_iterations; ++iter ) {
      on_all_cores( [] {
        for( int i = 1; i < FLAGS_n; ++i ) {
          if( ! (Grappa::mycore() == 0) ) {
            lefts[i].reset();
          }
        }
      } );

      // execute kernel
      VLOG(2) << "Starting iteration " << iter;
      double start = Grappa::walltime();

      if( FLAGS_pattern == "border" ) {

        finish( [] {
          on_all_cores( [] {
            local = ga.local_chunk;
            dim1_size = ga.dim1_size;
            dim2_size = ga.dim2_size;
            dim2_percore = ga.dim2_percore;
            int first_j = Grappa::mycore() * dim2_percore;

            iter_time = Grappa::walltime();
            for( int i = 1; i < dim1_size; ++i ) {

              // prepare to iterate over this segment
              double left = readFF( &lefts[i] );
              double diag = readFF( &lefts[i-1] );
              double up = 0.0;
              double current = i;

              for( int j = 0; j < dim2_percore; ++j ) {
                int actual_j = j + first_j;
                if( actual_j > 0 ) {
                  // compute this cell's value
                  up = local[ (i-1)*dim2_percore + j ];
                  current = up + left - diag;

                  // update for next iteration
                  diag = up;
                  left = current;

                  // write value
                  local[ (i)*dim2_percore + j ] = current;
                }
              }

              // if we're at the end of a segment, write to corresponding full bit
              if( Grappa::mycore()+1 < Grappa::cores() ) {
                delegate::call<async>( Grappa::mycore()+1,
                                       [=] () {
                                         writeXF( &lefts[i], current );
                                       } );
              }

            }
            iter_time = Grappa::walltime() - iter_time;
            iter_time = reduce<double,collective_min<double>>( &iter_time );

          } );
        } );

        // forall( ga, [] (int i, int j, double& d) {
        //     LOG(INFO) << "Done: ga(" << i << "," << j << ")=" << ARRAY(i,j);
        //   } );

      } else {
        LOG(FATAL) << "unknown kernel pattern " << FLAGS_pattern << "!";
      }
      double end = Grappa::walltime();

      if( iter > 0 || FLAGS_iterations == 1 ) { // skip the first iteration
        //double time = end - start;
        double time = iter_time;
        avgtime += time;
        mintime = std::min( mintime, time );
        maxtime = std::max( maxtime, time );
      }

      on_all_cores( [] {
        VLOG(2) << "done with this iteration";
      } );

      // copy negated bottom-right corner value to top-left corner to create dependency
      VLOG(2) << "Adding end-to-end dependence for iteration " << iter;
      double val = delegate::read( &ga[ FLAGS_n-1 ][ FLAGS_m-1 ] );
      delegate::write( &ga[0][0], -1.0 * val );
      delegate::call( 0, [val] { lefts[0].writeXF( -1.0 * val ); } );
      if( ga.dim2_percore == 1 ) delegate::call( 1, [val] { lefts[0].writeXF( -1.0 * val ); } );
      VLOG(2) << "Done with iteration " << iter;
    }
    double total_time = Grappa::walltime() - total_start;

    avgtime /= (double) std::max( FLAGS_iterations-1, static_cast<google::uint64>(1) );
    LOG(INFO) << "Rate (MFlops/s): " << 1.0E-06 * 2 * ((double)(FLAGS_m-1)*(FLAGS_n-1)) / mintime
              << ", Avg time (s): " << avgtime
              << ", Min time (s): " << mintime
              << ", Max time (s): " << maxtime;
    LOG(INFO) << "Total time: " << total_time;
    LOG(INFO) << "Overall rate (MFlops/s): " << 1.0E-06 * 2 * ((double)(FLAGS_m-1)*(FLAGS_n-1)) / (total_time / FLAGS_iterations);

    // verify result
    VLOG(2) << "Verifying result";
    double expected_corner_val = (double) FLAGS_iterations * ( FLAGS_m + FLAGS_n - 2 );
    double actual_corner_val = delegate::read( &ga[ FLAGS_n-1 ][ FLAGS_m-1 ] );
    CHECK_DOUBLE_EQ( actual_corner_val, expected_corner_val );

    on_all_cores( [] {
      if( lefts ) delete [] lefts;
    } );
    ga.deallocate( );

    LOG(INFO) << "Done.";
  });
  finalize();
  return 0;
}