
(Updated) Benchmark of fgemv for Givaro::Integer over an RNS representation on a multicore server

ZHG2017 edited this page Jun 25, 2019 · 8 revisions

Note: the method column p encodes the parallel strategy used:
- 0: sequential
- 1: <RNSModulus, grain>
- 2: <RNSModulus, threads>
- 3: ParSeqHelper::Compose<ParSeqHelper::Parallel<FFLAS::CuttingStrategy::RNSModulus, grain>, ParSeqHelper::Parallel<rec, StrategyParameter::TwoDAdaptive>>
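As a rough illustration of the two parameters of the RNSModulus cutting strategy (a sketch under my own interpretation, not FFLAS-FFPACK code: the function names `cut_by_grain` and `cut_by_threads` are hypothetical), the same loop over RNS moduli can be split either into chunks of a fixed size ("grain") or into a fixed number of chunks ("threads"):

```python
def cut_by_grain(moduli, grain):
    """Split the moduli into chunks of at most `grain` elements each."""
    return [moduli[i:i + grain] for i in range(0, len(moduli), grain)]

def cut_by_threads(moduli, nthreads):
    """Split the moduli into `nthreads` chunks of near-equal size."""
    q, r = divmod(len(moduli), nthreads)
    chunks, start = [], 0
    for t in range(nthreads):
        size = q + (1 if t < r else 0)  # first r chunks get one extra modulus
        chunks.append(moduli[start:start + size])
        start += size
    return [c for c in chunks if c]  # drop empty chunks if nthreads > len(moduli)
```

With 12 moduli (the count reported in the logs below), a grain of 5 yields chunks of sizes 5, 5, 2, while 8 threads yield sizes 2, 2, 2, 2, 1, 1, 1, 1.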

Benchmark using OpenMP

OMP_NUM_THREADS=1 for 4000x4000 and 100 bits

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
3.76299 0 4000 4000 3
FGEMM_MP: InfNorm compute bound on the output 166ms
--------------------------------------------
rns_double::init  FOR1D loop: 658ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 1156ms
--------------------------------------------
rns_double::init  RNS freduce : 281ms
==========================================
Pointwise fgemm : 1.4144 (12) moduli 
==========================================
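The phases reported in the log above (InfNorm bound on the output, RNS init of the inputs, one pointwise fgemm per modulus, reconstruction) can be sketched in plain Python. This is a minimal stand-in with naive loops and nonnegative entries, not the FFLAS-FFPACK implementation; the function names are mine:

```python
def matmul_mod(A, B, p):
    """Naive modular matrix product: the per-modulus 'pointwise fgemm'."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % p
             for j in range(n)] for i in range(m)]

def crt_pair(r1, m1, r2, m2):
    """Lift residues r1 mod m1 and r2 mod m2 to the residue mod m1*m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t

def rns_matmul(A, B, moduli):
    """Integer matrix product via RNS: reduce, multiply per modulus, CRT.

    The caller supplies pairwise-coprime moduli whose product exceeds the
    bound k * max|A| * max|B| on the output entries (the 'InfNorm compute
    bound' phase in the log), so the reconstruction is exact."""
    # 'rns_double::init': reduce each input into its residue system.
    A_rns = [[[a % p for a in row] for row in A] for p in moduli]
    B_rns = [[[b % p for b in row] for row in B] for p in moduli]
    # Pointwise fgemm: one independent modular product per modulus
    # (these are the products that the cutting strategies parallelize).
    C_rns = [matmul_mod(Ai, Bi, p)
             for Ai, Bi, p in zip(A_rns, B_rns, moduli)]
    # CRT reconstruction of the integer result, entry by entry.
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            r, mod = C_rns[0][i][j], moduli[0]
            for idx in range(1, len(moduli)):
                r = crt_pair(r, mod, C_rns[idx][i][j], moduli[idx])
                mod *= moduli[idx]
            C[i][j] = r
    return C
```

Because each modular product is independent, the pointwise phase parallelizes trivially over moduli, which is what the <RNSModulus, …> strategies exploit.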

OMP_NUM_THREADS=8 for 4000x4000 and 100 bits

Using Parallel<Block,Threads> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.75352 1 4000 4000 3
1.82197 2 4000 4000 3
1.84784 3 4000 4000 3

Using Parallel<Recursive,ThreeDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.5914 1 4000 4000 3
2.61315 2 4000 4000 3
2.59683 3 4000 4000 3

Using Parallel<Recursive,TwoD> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.98691 1 4000 4000 3
1.75898 2 4000 4000 3
1.74864 3 4000 4000 3

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.68973 1 4000 4000 3
1.85503 2 4000 4000 3
1.66372 3 4000 4000 3

Using Parallel<Recursive,ThreeDInPlace> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.4082 1 4000 4000 3
2.58037 2 4000 4000 3
2.53939 3 4000 4000 3

Using Parallel<Recursive,ThreeDInPlace> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
2.66199 1 4000 4000 3
2.65415 2 4000 4000 3
2.69702 3 4000 4000 3

Detailed timing using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 182ms
--------------------------------------------
rns_double::init  FOR1D loop: 128ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 434ms
--------------------------------------------
rns_double::init  RNS freduce : 484ms
==========================================
Pointwise fgemm : 0.487823 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 213ms
--------------------------------------------
rns_double::init  FOR1D loop: 151ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 423ms
--------------------------------------------
rns_double::init  RNS freduce : 333ms
==========================================
Pointwise fgemm : 0.423272 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 197ms
--------------------------------------------
rns_double::init  FOR1D loop: 157ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 379ms
--------------------------------------------
rns_double::init  RNS freduce : 489ms
==========================================
Pointwise fgemm : 0.47559 (12) moduli 
==========================================

Detailed timing with further parallelized InfNorm(), using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 141ms
--------------------------------------------
rns_double::init  FOR1D loop: 165ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 325ms
--------------------------------------------
rns_double::init  RNS freduce : 363ms
==========================================
Pointwise fgemm : 0.481474 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 96ms
--------------------------------------------
rns_double::init  FOR1D loop: 176ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 479ms
--------------------------------------------
rns_double::init  RNS freduce : 548ms
==========================================
Pointwise fgemm : 0.47297 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 82ms
--------------------------------------------
rns_double::init  FOR1D loop: 170ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 456ms
--------------------------------------------
rns_double::init  RNS freduce : 450ms
==========================================
Pointwise fgemm : 0.480714 (12) moduli 
==========================================

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T with further parallelized InfNorm()

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.71206 1 4000 4000 10
1.59091 2 4000 4000 10
1.63789 3 4000 4000 10

Detailed timing with further parallelized InfNorm() and further parallelized freduce, using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T

Method 1

FGEMM_MP: InfNorm compute bound on the output 112ms
--------------------------------------------
rns_double::init  FOR1D loop: 162ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 406ms
--------------------------------------------
rns_double::init  RNS freduce : 296ms
==========================================
Pointwise fgemm : 0.452811 (12) moduli 
==========================================

Method 2

FGEMM_MP: InfNorm compute bound on the output 118ms
--------------------------------------------
rns_double::init  FOR1D loop: 163ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 430ms
--------------------------------------------
rns_double::init  RNS freduce : 432ms
==========================================
Pointwise fgemm : 0.457114 (12) moduli 
==========================================

Method 3

FGEMM_MP: InfNorm compute bound on the output 84ms
--------------------------------------------
rns_double::init  FOR1D loop: 139ms
--------------------------------------------
rns_double::init  Arns = _crt_in x A_beta^T or Arns =  A_beta x _crt_in^T : 382ms
--------------------------------------------
rns_double::init  RNS freduce : 306ms
==========================================
Pointwise fgemm : 0.466784 (12) moduli 
==========================================

Using Parallel<Recursive,TwoDAdaptive> for rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T with further parallelized InfNorm() and further parallelized freduce

Time  method  m (matrix row dimension)  k (matrix column dimension)  i (number of repetitions)
1.57949 1 4000 4000 10
1.56238 2 4000 4000 10
1.48942 3 4000 4000 10
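Putting the headline numbers side by side: the sequential run above took 3.76299 s, and the best fully parallelized run (method 3, with further parallelized InfNorm() and freduce) took 1.48942 s with 8 OpenMP threads. Assuming the reported times are comparable per-call figures despite the differing repetition counts (3 vs 10), the speedup works out as:

```python
# Speedup of the best 8-thread configuration over the sequential baseline,
# using the times reported in the tables above.
seq = 3.76299        # OMP_NUM_THREADS=1, method 0
best_parallel = 1.48942  # OMP_NUM_THREADS=8, method 3, InfNorm + freduce parallelized
speedup = seq / best_parallel
efficiency = speedup / 8  # fraction of ideal 8x scaling
print(f"speedup {speedup:.2f}x, parallel efficiency {efficiency:.0%}")
```

So the end-to-end gain is roughly 2.5x on 8 threads, i.e. about 32% parallel efficiency, consistent with the init and reduction phases still taking a sizable share of the total time.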