
(Updated) Benchmark of fgemv for Givaro::Integer over an RNS representation on a multicore server

ZHG2017 edited this page Jun 25, 2019 · 8 revisions

Note: the method column p encodes the parallel strategy used:
- 0: sequential
- 1: <RNSModulus, grain>
- 2: <RNSModulus, threads>
- 3: ParSeqHelper::Compose<ParSeqHelper::Parallel<FFLAS::CuttingStrategy::RNSModulus, grain>, ParSeqHelper::Parallel<rec, StrategyParameter::TwoDAdaptive>>
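As a rough illustration of the two parameters of the RNSModulus cutting strategy (a sketch under my own interpretation, not FFLAS-FFPACK code: the function names `cut_by_grain` and `cut_by_threads` are hypothetical), the same loop over RNS moduli can be split either into chunks of a fixed size ("grain") or into a fixed number of chunks ("threads"):

```python
def cut_by_grain(moduli, grain):
    """Split the moduli into chunks of at most `grain` elements each."""
    return [moduli[i:i + grain] for i in range(0, len(moduli), grain)]

def cut_by_threads(moduli, nthreads):
    """Split the moduli into `nthreads` chunks of near-equal size."""
    q, r = divmod(len(moduli), nthreads)
    chunks, start = [], 0
    for t in range(nthreads):
        size = q + (1 if t < r else 0)  # first r chunks get one extra modulus
        chunks.append(moduli[start:start + size])
        start += size
    return [c for c in chunks if c]  # drop empty chunks if nthreads > len(moduli)
```

With 12 moduli (the count reported in the logs below), a grain of 5 yields chunks of sizes 5, 5, 2, while 8 threads yield sizes 2, 2, 2, 2, 1, 1, 1, 1.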

Benchmark using OpenMP

OMP_NUM_THREADS=1 for 4000x4000 and 100 bits

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
3.76299 0 4000 4000 3
FGEMM_MP: InfNorm compute bound on the output 166ms
--------------------------------------------
rns_double::init  FOR1D loop: 658ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 1156ms
--------------------------------------------
rns_double::init  RNS freduce : 281ms
==========================================
Pointwise fgemm : 1.4144 (12) moduli 
==========================================
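The phases reported in the log above (InfNorm bound on the output, RNS init of the inputs, one pointwise fgemm per modulus, reconstruction) can be sketched in plain Python. This is a minimal stand-in with naive loops and nonnegative entries, not the FFLAS-FFPACK implementation; the function names are mine:

```python
def matmul_mod(A, B, p):
    """Naive modular matrix product: the per-modulus 'pointwise fgemm'."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % p
             for j in range(n)] for i in range(m)]

def crt_pair(r1, m1, r2, m2):
    """Lift residues r1 mod m1 and r2 mod m2 to the residue mod m1*m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t

def rns_matmul(A, B, moduli):
    """Integer matrix product via RNS: reduce, multiply per modulus, CRT.

    The caller supplies pairwise-coprime moduli whose product exceeds the
    bound k * max|A| * max|B| on the output entries (the 'InfNorm compute
    bound' phase in the log), so the reconstruction is exact."""
    # 'rns_double::init': reduce each input into its residue system.
    A_rns = [[[a % p for a in row] for row in A] for p in moduli]
    B_rns = [[[b % p for b in row] for row in B] for p in moduli]
    # Pointwise fgemm: one independent modular product per modulus
    # (these are the products that the cutting strategies parallelize).
    C_rns = [matmul_mod(Ai, Bi, p)
             for Ai, Bi, p in zip(A_rns, B_rns, moduli)]
    # CRT reconstruction of the integer result, entry by entry.
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            r, mod = C_rns[0][i][j], moduli[0]
            for idx in range(1, len(moduli)):
                r = crt_pair(r, mod, C_rns[idx][i][j], moduli[idx])
                mod *= moduli[idx]
            C[i][j] = r
    return C
```

Because each modular product is independent, the pointwise phase parallelizes trivially over moduli, which is what the <RNSModulus, …> strategies exploit.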

OMP_NUM_THREADS=8 for 4000x4000 and 100 bits

Using Parallel<Block,Threads> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.75352 1 4000 4000 3
1.82197 2 4000 4000 3
1.84784 3 4000 4000 3

Using Parallel<Recursive,ThreeDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.5914 1 4000 4000 3
2.61315 2 4000 4000 3
2.59683 3 4000 4000 3

Using Parallel<Recursive,TwoD> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.98691 1 4000 4000 3
1.75898 2 4000 4000 3
1.74864 3 4000 4000 3

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.68973 1 4000 4000 3
1.85503 2 4000 4000 3
1.66372 3 4000 4000 3

Using Parallel<Recursive,ThreeDInPlace> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.4082 1 4000 4000 3
2.58037 2 4000 4000 3
2.53939 3 4000 4000 3

Using Parallel<Recursive,ThreeDInPlace> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.66199 1 4000 4000 3
2.65415 2 4000 4000 3
2.69702 3 4000 4000 3

Detailed timing using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 182ms
--------------------------------------------
rns_double::init  FOR1D loop: 128ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 434ms
--------------------------------------------
rns_double::init  RNS freduce : 484ms
==========================================
Pointwise fgemm : 0.487823 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 213ms
--------------------------------------------
rns_double::init  FOR1D loop: 151ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 423ms
--------------------------------------------
rns_double::init  RNS freduce : 333ms
==========================================
Pointwise fgemm : 0.423272 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 197ms
--------------------------------------------
rns_double::init  FOR1D loop: 157ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 379ms
--------------------------------------------
rns_double::init  RNS freduce : 489ms
==========================================
Pointwise fgemm : 0.47559 (12) moduli 
==========================================

Detailed timing with further parallelized InfNorm(), using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 141ms
--------------------------------------------
rns_double::init  FOR1D loop: 165ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 325ms
--------------------------------------------
rns_double::init  RNS freduce : 363ms
==========================================
Pointwise fgemm : 0.481474 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 96ms
--------------------------------------------
rns_double::init  FOR1D loop: 176ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 479ms
--------------------------------------------
rns_double::init  RNS freduce : 548ms
==========================================
Pointwise fgemm : 0.47297 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 82ms
--------------------------------------------
rns_double::init  FOR1D loop: 170ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 456ms
--------------------------------------------
rns_double::init  RNS freduce : 450ms
==========================================
Pointwise fgemm : 0.480714 (12) moduli 
==========================================

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T with further parallelized InfNorm()

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.71206 1 4000 4000 10
1.59091 2 4000 4000 10
1.63789 3 4000 4000 10

Detailed timing with further parallelized InfNorm() and further parallelized freduce, using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 112ms
--------------------------------------------
rns_double::init  FOR1D loop: 162ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 406ms
--------------------------------------------
rns_double::init  RNS freduce : 296ms
==========================================
Pointwise fgemm : 0.452811 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 118ms
--------------------------------------------
rns_double::init  FOR1D loop: 163ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 430ms
--------------------------------------------
rns_double::init  RNS freduce : 432ms
==========================================
Pointwise fgemm : 0.457114 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 84ms
--------------------------------------------
rns_double::init  FOR1D loop: 139ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 382ms
--------------------------------------------
rns_double::init  RNS freduce : 306ms
==========================================
Pointwise fgemm : 0.466784 (12) moduli 
==========================================

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T with further parallelized InfNorm() and further parallelized freduce

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.57949 1 4000 4000 10
1.56238 2 4000 4000 10
1.48942 3 4000 4000 10
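Putting the headline numbers side by side: the sequential run above took 3.76299 s, and the best fully parallelized run (method 3, with further parallelized InfNorm() and freduce) took 1.48942 s with 8 OpenMP threads. Assuming the reported times are comparable per-call figures despite the differing repetition counts (3 vs 10), the speedup works out as:

```python
# Speedup of the best 8-thread configuration over the sequential baseline,
# using the times reported in the tables above.
seq = 3.76299        # OMP_NUM_THREADS=1, method 0
best_parallel = 1.48942  # OMP_NUM_THREADS=8, method 3, InfNorm + freduce parallelized
speedup = seq / best_parallel
efficiency = speedup / 8  # fraction of ideal 8x scaling
print(f"speedup {speedup:.2f}x, parallel efficiency {efficiency:.0%}")
```

So the end-to-end gain is roughly 2.5x on 8 threads, i.e. about 32% parallel efficiency, consistent with the init and reduction phases still taking a sizable share of the total time.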