# (Updated) Benchmark of fgemv for Givaro::Integer in the field of RNS on a multicore server
ZHG2017 edited this page Jun 25, 2019
Note: `p` (the `method` column in the tables below) selects the parallelization strategy:

- `0`: sequential
- `1`: `Parallel<RNSModulus, grain>`
- `2`: `Parallel<RNSModulus, threads>`
- `3`: `ParSeqHelper::Compose<ParSeqHelper::Parallel<FFLAS::CuttingStrategy::RNSModulus, grain>, ParSeqHelper::Parallel<rec, StrategyParameter::TwoDAdaptive>>`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
3.76299 | 0 | 4000 | 4000 | 3 |
Detailed timing for the sequential run:

```
FGEMM_MP: InfNorm compute bound on the output 166ms
--------------------------------------------
rns_double::init FOR1D loop: 658ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 1156ms
--------------------------------------------
rns_double::init RNS freduce : 281ms
==========================================
Pointwise fgemm : 1.4144 (12) moduli
==========================================
```
## Using `Parallel<Block,Threads>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
1.75352 | 1 | 4000 | 4000 | 3 |
1.82197 | 2 | 4000 | 4000 | 3 |
1.84784 | 3 | 4000 | 4000 | 3 |
## Using `Parallel<Recursive,ThreeDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
2.5914 | 1 | 4000 | 4000 | 3 |
2.61315 | 2 | 4000 | 4000 | 3 |
2.59683 | 3 | 4000 | 4000 | 3 |
## Using `Parallel<Recursive,TwoD>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
1.98691 | 1 | 4000 | 4000 | 3 |
1.75898 | 2 | 4000 | 4000 | 3 |
1.74864 | 3 | 4000 | 4000 | 3 |
## Using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
1.68973 | 1 | 4000 | 4000 | 3 |
1.85503 | 2 | 4000 | 4000 | 3 |
1.66372 | 3 | 4000 | 4000 | 3 |
## Using `Parallel<Recursive,ThreeDInPlace>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
2.4082 | 1 | 4000 | 4000 | 3 |
2.58037 | 2 | 4000 | 4000 | 3 |
2.53939 | 3 | 4000 | 4000 | 3 |
## Using `Parallel<Recursive,ThreeDInPlace>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
2.66199 | 1 | 4000 | 4000 | 3 |
2.65415 | 2 | 4000 | 4000 | 3 |
2.69702 | 3 | 4000 | 4000 | 3 |
## Detailed timing using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
```
FGEMM_MP: InfNorm compute bound on the output 182ms
--------------------------------------------
rns_double::init FOR1D loop: 128ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 434ms
--------------------------------------------
rns_double::init RNS freduce : 484ms
==========================================
Pointwise fgemm : 0.487823 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 213ms
--------------------------------------------
rns_double::init FOR1D loop: 151ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 423ms
--------------------------------------------
rns_double::init RNS freduce : 333ms
==========================================
Pointwise fgemm : 0.423272 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 197ms
--------------------------------------------
rns_double::init FOR1D loop: 157ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 379ms
--------------------------------------------
rns_double::init RNS freduce : 489ms
==========================================
Pointwise fgemm : 0.47559 (12) moduli
==========================================
```
## Detailed timing with further parallelized `InfNorm()`, using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
```
FGEMM_MP: InfNorm compute bound on the output 141ms
--------------------------------------------
rns_double::init FOR1D loop: 165ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 325ms
--------------------------------------------
rns_double::init RNS freduce : 363ms
==========================================
Pointwise fgemm : 0.481474 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 96ms
--------------------------------------------
rns_double::init FOR1D loop: 176ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 479ms
--------------------------------------------
rns_double::init RNS freduce : 548ms
==========================================
Pointwise fgemm : 0.47297 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 82ms
--------------------------------------------
rns_double::init FOR1D loop: 170ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 456ms
--------------------------------------------
rns_double::init RNS freduce : 450ms
==========================================
Pointwise fgemm : 0.480714 (12) moduli
==========================================
```
## Using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T` with further parallelized `InfNorm()`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
1.71206 | 1 | 4000 | 4000 | 10 |
1.59091 | 2 | 4000 | 4000 | 10 |
1.63789 | 3 | 4000 | 4000 | 10 |
## Detailed timing with further parallelized `InfNorm()` and further parallelized `freduce`, using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T`
```
FGEMM_MP: InfNorm compute bound on the output 112ms
--------------------------------------------
rns_double::init FOR1D loop: 162ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 406ms
--------------------------------------------
rns_double::init RNS freduce : 296ms
==========================================
Pointwise fgemm : 0.452811 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 118ms
--------------------------------------------
rns_double::init FOR1D loop: 163ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 430ms
--------------------------------------------
rns_double::init RNS freduce : 432ms
==========================================
Pointwise fgemm : 0.457114 (12) moduli
==========================================
FGEMM_MP: InfNorm compute bound on the output 84ms
--------------------------------------------
rns_double::init FOR1D loop: 139ms
--------------------------------------------
rns_double::init Arns = _crt_in x A_beta^T or Arns = A_beta x _crt_in^T : 382ms
--------------------------------------------
rns_double::init RNS freduce : 306ms
==========================================
Pointwise fgemm : 0.466784 (12) moduli
==========================================
```
## Using `Parallel<Recursive,TwoDAdaptive>` for `rns_double::init Arns = _crt_in x A_beta^T` or `Arns = A_beta x _crt_in^T` with further parallelized `InfNorm()` and further parallelized `freduce`
Time (s) | method | m (matrix dim.) | k (matrix dim.) | i (repetitions)
---|---|---|---|---
1.57949 | 1 | 4000 | 4000 | 10 |
1.56238 | 2 | 4000 | 4000 | 10 |
1.48942 | 3 | 4000 | 4000 | 10 |