- Test types
- Factor (OpenBLAS/CUDA 11.5)
- Factor (Intel-MKL)
- Solve (OpenBLAS/CUDA 11.5, nRHS = 1, 2, 10)
- Solve (Intel-MKL, nRHS = 1, 2, 10)
- Analysis (OpenBLAS/CUDA 11.5)
The tests were run on a ThinkStation P720 equipped with:
- RAM: 128 GB
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- GPU: Quadro RTX 5000
Random sparse matrices are generated from structured random graphs: there is one parameter per vertex, and a non-empty block for each edge connecting two vertices (similar to the block-sparse matrix arising from a factor graph). The graph types we consider are:
- FLAT: any two vertices are connected with probability 'fill'
- FLAT+SCHUR: like FLAT, plus a set of 'schursize' parameters that can be eliminated independently, each having probability 'schurfill' of being connected to any existing vertex
- GRID: a 2-dimensional grid is generated, and vertices up to distance 'conn' apart are connected with probability 'fill'
- MERI: a number of "tracks" are generated, each a sequence of 'size' parameters, where every parameter has probability 'fill' of being connected to each "neighboring" parameter up to distance 'band' within its track. The tracks are joined into a non-trivial topology as follows: 'n' tracks connect two poles, and each pole has an additional set of 'hairs' tracks departing from it.
For each topology and parameter setting, 5 problems are generated at random and tested.
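As an illustration of the FLAT and GRID descriptions above, here is a minimal Python sketch of how such edge sets could be sampled. It is not the benchmark's actual generator: the function names `flat_graph` and `grid_graph`, the `seed` argument, and the choice of Chebyshev distance for 'conn' are assumptions made for this example only.

```python
import random

def flat_graph(size, fill, seed=0):
    """FLAT (sketch): keep each unordered vertex pair with probability `fill`."""
    rng = random.Random(seed)
    return {(i, j)
            for i in range(size)
            for j in range(i + 1, size)
            if rng.random() < fill}

def grid_graph(rows, cols, conn, fill, seed=0):
    """GRID (sketch): connect grid vertices up to distance `conn` with
    probability `fill`. Chebyshev distance is an assumption of this example."""
    rng = random.Random(seed)
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for dr in range(-conn, conn + 1):
                for dc in range(-conn, conn + 1):
                    # Visit each unordered pair exactly once.
                    if (dr, dc) <= (0, 0):
                        continue
                    r2, c2 = r + dr, c + dc
                    if 0 <= r2 < rows and 0 <= c2 < cols and rng.random() < fill:
                        edges.add((r * cols + c, r2 * cols + c2))
    return edges
```

Each returned pair `(i, j)` with `i < j` would correspond to one non-empty off-diagonal block of the block-sparse matrix, whose block dimensions would be given by the 'bsize' setting.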
Command: cmake --build build -v -- -j16 && build/bench -B 1_CHOLMOD -O factor
Problem type: 10_FLAT_size=1000_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.436s, 0.429s, 0.419s, 0.432s, 0.449s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.368s (-15.56%), 0.378s (-11.82%), 0.364s (-13.27%), 0.355s (-17.75%), 0.379s (-15.63%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
63.7ms (-85.39%), 53.3ms (-87.58%), 52.3ms (-87.53%), 52.1ms (-87.93%), 54.3ms (-87.91%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
50.0ms (-88.53%), 48.8ms (-88.64%), 46.7ms (-88.85%), 41.3ms (-90.44%), 47.7ms (-89.40%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
40.0ms (-90.82%), 40.1ms (-90.66%), 39.9ms (-90.48%), 39.3ms (-90.91%), 42.0ms (-90.65%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
42.2ms (-90.32%), 43.1ms (-89.95%), 44.5ms (-89.39%), 43.4ms (-89.94%), 47.3ms (-89.47%)
Problem type: 11_FLAT_size=4000_fill=0.01_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
13.320s, 13.072s, 13.133s, 13.098s, 13.076s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
12.496s (-6.19%), 12.319s (-5.76%), 12.433s (-5.34%), 12.298s (-6.11%), 12.615s (-3.52%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
1.352s (-89.85%), 1.351s (-89.66%), 1.377s (-89.51%), 1.370s (-89.54%), 1.388s (-89.38%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
1.455s (-89.08%), 1.432s (-89.05%), 1.475s (-88.77%), 1.442s (-88.99%), 1.504s (-88.50%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
1.599s (-87.99%), 1.575s (-87.95%), 1.617s (-87.69%), 1.580s (-87.94%), 1.656s (-87.33%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
1.680s (-87.39%), 1.651s (-87.37%), 1.695s (-87.10%), 1.656s (-87.36%), 1.754s (-86.59%)
Problem type: 12_FLAT_size=2000_fill=0.03_bsize=2-5
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
3.416s, 3.457s, 3.370s, 3.619s, 3.379s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
3.086s (-9.65%), 3.154s (-8.78%), 3.098s (-8.08%), 3.210s (-11.29%), 3.147s (-6.89%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.395s (-88.44%), 0.421s (-87.82%), 0.406s (-87.97%), 0.411s (-88.65%), 0.419s (-87.60%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.365s (-89.33%), 0.369s (-89.33%), 0.370s (-89.01%), 0.383s (-89.41%), 0.377s (-88.84%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
0.387s (-88.67%), 0.394s (-88.60%), 0.394s (-88.30%), 0.410s (-88.66%), 0.406s (-88.00%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.421s (-87.66%), 0.432s (-87.52%), 0.433s (-87.14%), 0.448s (-87.63%), 0.448s (-86.75%)
Problem type: 20_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=50000_schurfill=0.02
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.913s, 1.908s, 1.909s, 1.907s, 1.920s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.549s (-71.30%), 0.547s (-71.33%), 0.550s (-71.21%), 0.545s (-71.40%), 0.550s (-71.37%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.117s (-93.87%), 0.117s (-93.86%), 0.117s (-93.86%), 0.117s (-93.85%), 0.115s (-94.00%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
99.5ms (-94.80%), 97.9ms (-94.87%), 98.3ms (-94.85%), 98.8ms (-94.82%), 0.100s (-94.77%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
98.9ms (-94.83%), 99.2ms (-94.80%), 98.7ms (-94.83%), 99.3ms (-94.79%), 98.6ms (-94.86%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.103s (-94.63%), 0.102s (-94.63%), 0.102s (-94.65%), 0.102s (-94.64%), 0.102s (-94.69%)
Problem type: 21_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=5000_schurfill=0.2
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.426s, 0.427s, 0.420s, 0.429s, 0.419s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.360s (-15.57%), 0.359s (-15.91%), 0.352s (-16.27%), 0.367s (-14.56%), 0.354s (-15.62%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
52.5ms (-87.68%), 53.6ms (-87.46%), 52.7ms (-87.47%), 56.3ms (-86.89%), 53.5ms (-87.24%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
48.2ms (-88.70%), 48.7ms (-88.60%), 44.1ms (-89.52%), 49.6ms (-88.44%), 46.3ms (-88.96%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
39.1ms (-90.83%), 39.6ms (-90.72%), 38.8ms (-90.77%), 41.1ms (-90.42%), 39.1ms (-90.66%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
42.0ms (-90.14%), 43.5ms (-89.81%), 41.6ms (-90.10%), 44.5ms (-89.62%), 41.0ms (-90.21%)
Problem type: 30_GRID_size=100x100_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.390s, 0.397s, 0.393s, 0.393s, 0.393s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.369s (-5.55%), 0.383s (-3.60%), 0.376s (-4.50%), 0.366s (-6.94%), 0.367s (-6.60%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
91.2ms (-76.64%), 91.3ms (-77.00%), 91.3ms (-76.79%), 92.1ms (-76.56%), 91.2ms (-76.81%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
46.1ms (-88.18%), 50.5ms (-87.29%), 52.2ms (-86.72%), 46.1ms (-88.27%), 52.0ms (-86.79%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
41.0ms (-89.50%), 38.0ms (-90.42%), 40.3ms (-89.75%), 40.4ms (-89.72%), 38.1ms (-90.31%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
38.0ms (-90.26%), 37.5ms (-90.56%), 38.2ms (-90.28%), 37.8ms (-90.38%), 37.1ms (-90.58%)
Problem type: 31_GRID_size=150x150_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.960s, 1.987s, 1.991s, 2.002s, 2.040s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
1.106s (-43.57%), 1.118s (-43.75%), 1.089s (-45.30%), 1.094s (-45.37%), 1.097s (-46.24%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.242s (-87.67%), 0.234s (-88.23%), 0.243s (-87.82%), 0.229s (-88.58%), 0.224s (-89.01%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.128s (-93.46%), 0.130s (-93.46%), 0.129s (-93.54%), 0.130s (-93.52%), 0.130s (-93.64%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
0.115s (-94.14%), 0.115s (-94.22%), 0.115s (-94.23%), 0.115s (-94.25%), 0.115s (-94.37%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.115s (-94.13%), 0.114s (-94.24%), 0.115s (-94.22%), 0.115s (-94.27%), 0.115s (-94.38%)
Problem type: 32_GRID_size=200x200_fill=0.25_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
10.625s, 8.614s, 10.769s, 9.346s, 10.025s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
2.821s (-73.45%), 2.679s (-68.90%), 2.791s (-74.09%), 2.743s (-70.65%), 2.681s (-73.25%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.476s (-95.52%), 0.465s (-94.60%), 0.475s (-95.59%), 0.484s (-94.83%), 0.471s (-95.30%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.299s (-97.18%), 0.293s (-96.60%), 0.304s (-97.18%), 0.295s (-96.85%), 0.289s (-97.11%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
0.277s (-97.39%), 0.272s (-96.84%), 0.282s (-97.39%), 0.273s (-97.08%), 0.267s (-97.34%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.286s (-97.31%), 0.278s (-96.77%), 0.290s (-97.30%), 0.281s (-96.99%), 0.274s (-97.27%)
Problem type: 33_GRID_size=200x200_fill=0.05_conn=3_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.964s, 1.094s, 1.058s, 0.960s, 1.088s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.476s (-50.58%), 0.470s (-57.00%), 0.537s (-49.23%), 0.469s (-51.14%), 0.548s (-49.69%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.148s (-84.68%), 0.138s (-87.39%), 0.158s (-85.09%), 0.142s (-85.22%), 0.157s (-85.56%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
63.2ms (-93.44%), 63.3ms (-94.21%), 70.8ms (-93.31%), 62.7ms (-93.47%), 69.3ms (-93.63%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
52.3ms (-94.57%), 53.4ms (-95.12%), 61.1ms (-94.22%), 54.1ms (-94.36%), 60.9ms (-94.41%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
47.7ms (-95.05%), 47.7ms (-95.64%), 55.7ms (-94.74%), 48.9ms (-94.90%), 55.5ms (-94.90%)
Problem type: 40_MERI_size=1500_n=4_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.654s, 0.639s, 0.644s, 0.660s, 0.634s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.537s (-17.91%), 0.538s (-15.77%), 0.525s (-18.36%), 0.543s (-17.85%), 0.517s (-18.43%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.189s (-71.16%), 0.179s (-71.91%), 0.193s (-70.08%), 0.187s (-71.63%), 0.194s (-69.32%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
80.5ms (-87.69%), 78.6ms (-87.69%), 78.6ms (-87.79%), 81.3ms (-87.68%), 78.6ms (-87.60%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
64.8ms (-90.10%), 61.7ms (-90.33%), 61.7ms (-90.41%), 64.0ms (-90.31%), 62.0ms (-90.22%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
53.6ms (-91.81%), 51.9ms (-91.88%), 53.6ms (-91.66%), 55.6ms (-91.58%), 53.0ms (-91.64%)
Problem type: 41_MERI_size=1500_n=7_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.009s, 1.014s, 1.001s, 1.015s, 1.004s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.821s (-18.65%), 0.840s (-17.13%), 0.813s (-18.77%), 0.837s (-17.57%), 0.823s (-17.97%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.274s (-72.81%), 0.289s (-71.45%), 0.284s (-71.68%), 0.289s (-71.50%), 0.290s (-71.06%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.128s (-87.32%), 0.130s (-87.16%), 0.131s (-86.88%), 0.130s (-87.20%), 0.131s (-86.90%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
96.3ms (-90.46%), 95.5ms (-90.58%), 95.6ms (-90.46%), 98.4ms (-90.30%), 95.4ms (-90.50%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
82.1ms (-91.87%), 82.1ms (-91.90%), 80.8ms (-91.93%), 85.3ms (-91.59%), 81.9ms (-91.84%)
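The percentages in parentheses in the results above appear to be the relative change in run time with respect to the 1_CHOLMOD baseline for the same problem instance (negative means faster). The short Python sketch below shows that reading of the numbers and how to turn a percentage into a speedup factor. The helper names `parse_time`, `relative_change` and `speedup` are hypothetical and not part of the bench tool.

```python
def parse_time(text):
    """Convert a timing as printed in the log ('63.7ms' or '0.436s') to seconds."""
    text = text.strip()
    if text.endswith("ms"):      # check 'ms' before the plain 's' suffix
        return float(text[:-2]) * 1e-3
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized time format: {text!r}")

def relative_change(t, baseline):
    """Relative change of t versus baseline, in percent (negative = faster)."""
    return 100.0 * (t - baseline) / baseline

def speedup(percent):
    """Speedup factor implied by a relative change given in percent."""
    return 1.0 / (1.0 + percent / 100.0)

# First 3_BaSpaCho_CUDA sample of 10_FLAT versus its 1_CHOLMOD baseline.
change = relative_change(parse_time("63.7ms"), parse_time("0.436s"))
factor = speedup(change)
```

With these inputs, `change` lands close to the reported -85.39%, which corresponds to a speedup of about 6.8x. Values recomputed from the printed timings can differ slightly in the last digits, presumably because the log computes percentages from unrounded times.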
Command: cmake --build build -v -- -j16 && build/bench -B 1_CHOLMOD -O factor -S ^2
Problem type: 10_FLAT_size=1000_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.200s, 67.0ms, 63.1ms, 62.0ms, 63.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
66.1ms (-67.00%), 50.2ms (-24.97%), 66.4ms (+5.24%), 49.3ms (-20.54%), 53.3ms (-16.10%)
Problem type: 11_FLAT_size=4000_fill=0.01_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.311s, 1.288s, 1.276s, 1.243s, 1.311s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
1.575s (+20.19%), 1.588s (+23.26%), 1.560s (+22.21%), 1.243s (-0.04%), 1.600s (+22.04%)
Problem type: 12_FLAT_size=2000_fill=0.03_bsize=2-5
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.411s, 0.414s, 0.392s, 0.410s, 0.395s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.471s (+14.75%), 0.475s (+14.82%), 0.462s (+17.96%), 0.501s (+22.44%), 0.391s (-1.11%)
Problem type: 20_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=50000_schurfill=0.02
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.709s, 1.816s, 1.744s, 1.729s, 1.669s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.209s (-87.75%), 0.207s (-88.58%), 0.232s (-86.72%), 0.205s (-88.12%), 0.204s (-87.80%)
Problem type: 21_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=5000_schurfill=0.2
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
82.6ms, 76.9ms, 74.9ms, 76.0ms, 77.8ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
46.4ms (-43.85%), 46.8ms (-39.22%), 49.4ms (-33.99%), 50.2ms (-33.91%), 47.8ms (-38.52%)
Problem type: 30_GRID_size=100x100_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.237s, 0.235s, 0.244s, 0.235s, 0.238s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.195s (-17.43%), 0.204s (-13.13%), 0.185s (-24.11%), 0.203s (-13.29%), 0.203s (-14.74%)
Problem type: 31_GRID_size=150x150_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.724s, 0.736s, 0.735s, 0.722s, 0.786s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.551s (-23.85%), 0.551s (-25.18%), 0.565s (-23.11%), 0.558s (-22.70%), 0.585s (-25.58%)
Problem type: 32_GRID_size=200x200_fill=0.25_conn=2_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
1.798s, 1.746s, 2.060s, 1.839s, 1.908s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
1.247s (-30.65%), 1.238s (-29.06%), 1.185s (-42.46%), 1.260s (-31.49%), 1.296s (-32.05%)
Problem type: 33_GRID_size=200x200_fill=0.05_conn=3_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.373s, 0.449s, 0.381s, 0.366s, 0.380s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.180s (-51.81%), 0.165s (-63.25%), 0.177s (-53.49%), 0.156s (-57.25%), 0.179s (-52.80%)
Problem type: 40_MERI_size=1500_n=4_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.335s, 0.364s, 0.404s, 0.332s, 0.340s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.148s (-55.66%), 0.145s (-60.21%), 0.160s (-60.48%), 0.149s (-55.20%), 0.147s (-56.63%)
Problem type: 41_MERI_size=1500_n=7_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: factor
- 1_CHOLMOD (basis for comparison):
0.604s, 0.573s, 0.549s, 0.531s, 0.531s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.236s (-60.98%), 0.221s (-61.45%), 0.214s (-60.95%), 0.215s (-59.61%), 0.213s (-59.81%)
Command: cmake --build build -v -- -j16 && build/bench -B 1_CHOLMOD -O solve-1,solve-2,solve-10
Problem type: 10_FLAT_size=1000_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
8.3ms, 8.0ms, 7.4ms, 7.3ms, 7.4ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
18.4ms (+122.51%), 19.8ms (+147.03%), 16.3ms (+120.15%), 15.8ms (+115.65%), 15.4ms (+107.82%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
2.6ms (-68.77%), 2.1ms (-73.59%), 2.3ms (-69.22%), 2.1ms (-71.02%), 2.3ms (-68.80%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
4.4ms (-47.34%), 4.1ms (-49.31%), 3.9ms (-47.36%), 4.0ms (-45.27%), 4.2ms (-42.92%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
2.3ms (-72.11%), 2.1ms (-73.15%), 2.1ms (-72.20%), 2.0ms (-73.39%), 2.3ms (-69.26%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
1.3ms (-84.72%), 1.2ms (-85.19%), 1.1ms (-84.74%), 1.1ms (-84.72%), 1.3ms (-82.95%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
23.9ms, 25.2ms, 23.7ms, 24.1ms, 25.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
25.7ms (+7.33%), 24.3ms (-3.46%), 24.1ms (+1.43%), 24.0ms (-0.58%), 24.7ms (-2.43%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
15.4ms (-35.63%), 11.7ms (-53.39%), 13.7ms (-42.20%), 11.9ms (-50.58%), 12.8ms (-49.56%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
14.5ms (-39.37%), 14.0ms (-44.24%), 13.6ms (-42.73%), 13.2ms (-45.11%), 15.5ms (-38.61%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
8.6ms (-63.90%), 8.3ms (-67.12%), 7.9ms (-66.74%), 7.8ms (-67.87%), 9.0ms (-64.63%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
6.0ms (-74.87%), 5.7ms (-77.35%), 5.5ms (-76.78%), 5.4ms (-77.62%), 6.1ms (-75.94%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
18.7ms, 19.9ms, 18.8ms, 18.5ms, 18.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
19.7ms (+5.34%), 17.8ms (-10.59%), 17.3ms (-7.92%), 17.4ms (-5.92%), 17.2ms (-7.41%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
8.1ms (-56.67%), 6.5ms (-67.39%), 5.8ms (-69.01%), 5.7ms (-69.04%), 6.9ms (-62.95%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
14.4ms (-23.00%), 14.2ms (-28.62%), 13.2ms (-30.01%), 12.6ms (-31.82%), 15.3ms (-17.84%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
8.5ms (-54.39%), 8.2ms (-58.93%), 7.6ms (-59.60%), 7.3ms (-60.18%), 8.8ms (-52.66%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
5.8ms (-69.06%), 5.6ms (-72.14%), 5.2ms (-72.12%), 5.1ms (-72.13%), 6.0ms (-67.57%)
Problem type: 11_FLAT_size=4000_fill=0.01_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
94.1ms, 78.0ms, 88.7ms, 87.4ms, 78.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.201s (+113.82%), 0.200s (+155.97%), 0.204s (+130.15%), 0.204s (+133.57%), 0.200s (+155.76%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
11.4ms (-87.94%), 11.3ms (-85.52%), 11.9ms (-86.63%), 11.5ms (-86.89%), 11.1ms (-85.80%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
15.4ms (-83.63%), 16.3ms (-79.06%), 16.2ms (-81.75%), 16.0ms (-81.72%), 15.4ms (-80.29%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
9.2ms (-90.21%), 9.9ms (-87.30%), 9.1ms (-89.69%), 9.1ms (-89.65%), 9.2ms (-88.24%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
6.0ms (-93.62%), 5.9ms (-92.42%), 5.8ms (-93.41%), 5.8ms (-93.32%), 6.0ms (-92.32%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.306s, 0.274s, 0.308s, 0.301s, 0.279s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.295s (-3.69%), 0.294s (+7.35%), 0.298s (-3.28%), 0.296s (-1.46%), 0.292s (+4.48%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
80.1ms (-73.84%), 80.3ms (-70.68%), 78.1ms (-74.65%), 78.1ms (-74.03%), 76.7ms (-72.54%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.112s (-63.40%), 0.115s (-57.97%), 0.116s (-62.25%), 0.115s (-61.67%), 0.113s (-59.62%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
77.9ms (-74.56%), 82.0ms (-70.03%), 78.0ms (-74.68%), 76.5ms (-74.56%), 77.1ms (-72.41%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
57.8ms (-81.13%), 56.6ms (-79.33%), 56.9ms (-81.54%), 56.1ms (-81.34%), 59.1ms (-78.85%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.252s, 0.224s, 0.257s, 0.249s, 0.228s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.222s (-11.91%), 0.222s (-0.56%), 0.225s (-12.29%), 0.225s (-9.59%), 0.223s (-2.16%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
53.5ms (-78.81%), 53.1ms (-76.27%), 52.2ms (-79.69%), 52.3ms (-78.99%), 51.2ms (-77.50%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.109s (-56.86%), 0.112s (-50.05%), 0.114s (-55.69%), 0.112s (-54.92%), 0.110s (-51.68%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
76.7ms (-69.62%), 79.0ms (-64.66%), 74.2ms (-71.12%), 73.6ms (-70.47%), 76.2ms (-66.52%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
54.0ms (-78.61%), 53.3ms (-76.18%), 53.1ms (-79.33%), 52.5ms (-78.93%), 54.0ms (-76.27%)
Problem type: 12_FLAT_size=2000_fill=0.03_bsize=2-5
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
37.9ms, 37.4ms, 37.9ms, 37.7ms, 38.8ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
79.0ms (+108.63%), 80.6ms (+115.66%), 80.6ms (+112.46%), 82.9ms (+119.87%), 78.5ms (+102.27%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
7.6ms (-80.05%), 7.9ms (-78.83%), 7.3ms (-80.86%), 7.4ms (-80.24%), 7.6ms (-80.47%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
9.3ms (-75.52%), 9.3ms (-75.24%), 9.5ms (-74.94%), 9.4ms (-75.19%), 9.4ms (-75.76%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
5.2ms (-86.38%), 5.1ms (-86.24%), 5.2ms (-86.30%), 5.2ms (-86.09%), 5.4ms (-85.97%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
3.1ms (-91.93%), 3.2ms (-91.57%), 3.1ms (-91.85%), 3.1ms (-91.68%), 3.2ms (-91.81%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.115s, 0.107s, 0.118s, 0.120s, 0.121s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.117s (+1.93%), 0.121s (+13.25%), 0.120s (+2.14%), 0.122s (+2.27%), 0.124s (+2.48%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
46.4ms (-59.59%), 49.2ms (-53.99%), 46.8ms (-60.23%), 49.2ms (-58.85%), 48.6ms (-59.98%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
49.9ms (-56.59%), 50.1ms (-53.19%), 52.6ms (-55.26%), 51.3ms (-57.09%), 55.4ms (-54.37%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
32.0ms (-72.14%), 31.8ms (-70.26%), 33.3ms (-71.71%), 34.1ms (-71.46%), 35.4ms (-70.81%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
23.6ms (-79.43%), 23.9ms (-77.68%), 24.2ms (-79.44%), 24.4ms (-79.59%), 26.1ms (-78.52%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
95.0ms, 84.2ms, 94.0ms, 97.6ms, 98.8ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
88.5ms (-6.89%), 91.6ms (+8.82%), 89.4ms (-4.83%), 92.1ms (-5.55%), 88.0ms (-10.98%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
25.0ms (-73.67%), 25.3ms (-69.90%), 25.1ms (-73.32%), 25.6ms (-73.75%), 25.5ms (-74.18%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
46.9ms (-50.68%), 47.1ms (-44.07%), 50.0ms (-46.78%), 47.9ms (-50.87%), 51.0ms (-48.45%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
30.0ms (-68.46%), 30.2ms (-64.19%), 31.4ms (-66.59%), 33.0ms (-66.16%), 33.1ms (-66.52%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
22.1ms (-76.79%), 23.4ms (-72.17%), 22.7ms (-75.84%), 22.7ms (-76.76%), 24.7ms (-74.97%)
Problem type: 20_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=50000_schurfill=0.02
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
54.2ms, 56.0ms, 54.1ms, 54.5ms, 54.1ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
50.8ms (-6.34%), 49.3ms (-12.03%), 52.0ms (-3.91%), 49.6ms (-9.03%), 51.0ms (-5.71%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
4.8ms (-91.06%), 4.8ms (-91.40%), 5.2ms (-90.35%), 4.7ms (-91.41%), 5.6ms (-89.68%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
6.1ms (-88.84%), 5.8ms (-89.63%), 5.8ms (-89.25%), 6.3ms (-88.38%), 6.1ms (-88.80%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
4.5ms (-91.79%), 4.6ms (-91.82%), 4.6ms (-91.59%), 4.5ms (-91.82%), 4.5ms (-91.65%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
4.6ms (-91.43%), 4.5ms (-91.91%), 4.9ms (-90.96%), 4.6ms (-91.51%), 4.5ms (-91.67%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.252s, 0.261s, 0.251s, 0.255s, 0.247s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.114s (-54.67%), 0.109s (-58.46%), 0.115s (-54.00%), 0.112s (-56.00%), 0.112s (-54.58%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
33.1ms (-86.84%), 31.8ms (-87.83%), 37.2ms (-85.17%), 32.2ms (-87.37%), 33.5ms (-86.42%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
52.7ms (-79.09%), 47.7ms (-81.75%), 48.2ms (-80.77%), 50.8ms (-80.08%), 49.0ms (-80.14%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
93.1ms (-63.04%), 88.7ms (-66.04%), 88.5ms (-64.70%), 88.6ms (-65.24%), 88.9ms (-63.97%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.156s (-38.02%), 0.154s (-41.19%), 0.163s (-35.06%), 0.153s (-39.94%), 0.153s (-38.03%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.119s, 0.120s, 0.116s, 0.116s, 0.120s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
58.4ms (-50.76%), 60.3ms (-49.74%), 60.1ms (-48.25%), 55.6ms (-52.17%), 59.2ms (-50.77%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
9.5ms (-92.00%), 10.1ms (-91.62%), 9.8ms (-91.58%), 9.1ms (-92.18%), 9.1ms (-92.41%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
13.4ms (-88.69%), 13.2ms (-89.03%), 13.1ms (-88.75%), 14.0ms (-87.92%), 13.4ms (-88.86%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
13.0ms (-89.06%), 12.2ms (-89.84%), 12.3ms (-89.41%), 12.5ms (-89.28%), 12.4ms (-89.68%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
17.5ms (-85.20%), 16.7ms (-86.04%), 18.0ms (-84.51%), 17.0ms (-85.43%), 16.9ms (-85.94%)
Problem type: 21_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=5000_schurfill=0.2
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
8.7ms, 8.5ms, 8.8ms, 8.7ms, 8.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
16.2ms (+85.52%), 16.0ms (+88.12%), 16.3ms (+85.32%), 16.0ms (+83.63%), 16.5ms (+91.82%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
2.2ms (-74.75%), 2.1ms (-74.96%), 2.9ms (-67.00%), 2.4ms (-72.23%), 2.5ms (-71.06%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
3.9ms (-54.87%), 4.0ms (-52.74%), 4.3ms (-50.68%), 3.9ms (-55.16%), 4.0ms (-53.38%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
2.0ms (-76.79%), 2.1ms (-75.20%), 2.3ms (-73.99%), 2.2ms (-74.56%), 2.4ms (-72.10%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
1.3ms (-85.35%), 1.2ms (-85.64%), 1.3ms (-84.95%), 1.2ms (-85.80%), 1.2ms (-86.04%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
27.5ms, 27.4ms, 28.1ms, 29.5ms, 29.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
25.9ms (-5.67%), 25.6ms (-6.29%), 25.6ms (-8.88%), 25.6ms (-13.20%), 26.0ms (-11.90%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
12.2ms (-55.79%), 12.3ms (-54.90%), 14.4ms (-48.72%), 13.7ms (-53.51%), 14.7ms (-49.98%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
14.0ms (-49.11%), 13.7ms (-49.86%), 15.3ms (-45.45%), 15.0ms (-49.26%), 14.6ms (-50.59%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
8.1ms (-70.41%), 8.4ms (-69.37%), 9.1ms (-67.68%), 8.8ms (-70.28%), 8.6ms (-70.86%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
5.8ms (-78.76%), 6.1ms (-77.54%), 6.4ms (-77.11%), 6.1ms (-79.45%), 6.0ms (-79.57%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
19.6ms, 19.2ms, 20.8ms, 20.2ms, 20.4ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
18.3ms (-6.69%), 18.3ms (-4.85%), 18.0ms (-13.33%), 17.9ms (-11.43%), 18.2ms (-10.42%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
6.1ms (-68.85%), 6.3ms (-67.42%), 7.6ms (-63.33%), 6.6ms (-67.49%), 7.3ms (-64.27%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
13.0ms (-33.65%), 13.2ms (-31.36%), 14.6ms (-29.43%), 14.3ms (-29.31%), 14.2ms (-30.22%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
7.8ms (-60.44%), 7.9ms (-58.96%), 8.6ms (-58.54%), 8.5ms (-57.71%), 8.4ms (-58.88%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
5.5ms (-72.09%), 5.7ms (-70.27%), 5.9ms (-71.78%), 5.7ms (-71.80%), 5.7ms (-71.94%)
Problem type: 30_GRID_size=100x100_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
21.9ms, 22.2ms, 22.2ms, 21.8ms, 21.8ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
36.9ms (+68.57%), 36.5ms (+64.78%), 36.6ms (+65.30%), 37.4ms (+71.45%), 35.3ms (+61.59%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
31.3ms (+43.00%), 32.7ms (+47.60%), 29.1ms (+31.24%), 31.5ms (+44.34%), 31.2ms (+42.96%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
17.3ms (-20.96%), 17.3ms (-21.82%), 17.3ms (-21.91%), 17.3ms (-20.49%), 17.3ms (-20.49%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
8.8ms (-59.88%), 8.7ms (-60.95%), 8.7ms (-60.95%), 8.7ms (-60.25%), 8.7ms (-59.94%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
4.6ms (-79.21%), 4.5ms (-79.53%), 4.5ms (-79.71%), 4.5ms (-79.35%), 4.5ms (-79.34%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
56.9ms, 58.5ms, 58.1ms, 56.4ms, 56.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
87.4ms (+53.68%), 87.1ms (+48.96%), 85.4ms (+46.82%), 87.1ms (+54.37%), 85.8ms (+51.86%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.118s (+108.19%), 0.120s (+105.89%), 0.119s (+104.31%), 0.119s (+111.38%), 0.119s (+110.68%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
47.5ms (-16.57%), 46.9ms (-19.81%), 46.7ms (-19.64%), 46.6ms (-17.31%), 46.6ms (-17.44%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
26.8ms (-52.89%), 26.4ms (-54.82%), 26.2ms (-54.88%), 26.1ms (-53.69%), 26.1ms (-53.81%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
20.7ms (-63.65%), 20.7ms (-64.54%), 20.5ms (-64.66%), 20.7ms (-63.27%), 20.5ms (-63.76%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
30.4ms, 31.0ms, 30.8ms, 31.2ms, 31.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
43.6ms (+43.51%), 42.7ms (+37.61%), 43.2ms (+40.31%), 43.8ms (+40.48%), 43.7ms (+41.17%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
47.6ms (+56.43%), 45.1ms (+45.61%), 45.7ms (+48.30%), 45.0ms (+44.34%), 46.4ms (+49.72%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
30.3ms (-0.18%), 30.5ms (-1.62%), 30.1ms (-2.37%), 30.0ms (-3.87%), 31.0ms (+0.03%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
16.1ms (-46.90%), 16.0ms (-48.43%), 16.1ms (-47.75%), 16.3ms (-47.58%), 16.3ms (-47.20%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
9.8ms (-67.90%), 9.8ms (-68.43%), 9.8ms (-68.23%), 9.9ms (-68.24%), 9.9ms (-67.89%)
Problem type: 31_GRID_size=150x150_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
56.1ms, 0.192s, 57.1ms, 56.6ms, 56.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
85.4ms (+52.18%), 85.5ms (-55.40%), 82.1ms (+43.78%), 83.0ms (+46.64%), 83.6ms (+47.81%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
51.4ms (-8.45%), 52.2ms (-72.77%), 49.9ms (-12.62%), 50.1ms (-11.48%), 51.2ms (-9.36%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
31.7ms (-43.45%), 31.6ms (-83.50%), 34.7ms (-39.31%), 33.3ms (-41.12%), 33.2ms (-41.29%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
16.2ms (-71.15%), 16.9ms (-91.16%), 17.0ms (-70.24%), 16.9ms (-70.11%), 16.7ms (-70.42%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
9.0ms (-83.99%), 9.0ms (-95.32%), 9.0ms (-84.24%), 9.1ms (-83.93%), 9.2ms (-83.72%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.155s, 0.139s, 0.155s, 0.151s, 0.152s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.197s (+27.08%), 0.191s (+37.67%), 0.190s (+22.55%), 0.194s (+28.55%), 0.202s (+33.16%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.226s (+45.83%), 0.221s (+58.93%), 0.219s (+41.60%), 0.221s (+46.60%), 0.223s (+47.11%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
95.1ms (-38.62%), 98.3ms (-29.20%), 99.1ms (-35.99%), 99.0ms (-34.46%), 97.9ms (-35.37%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
61.5ms (-60.26%), 64.5ms (-53.54%), 64.9ms (-58.06%), 63.8ms (-57.73%), 61.6ms (-59.35%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
64.9ms (-58.08%), 64.9ms (-53.29%), 65.0ms (-58.00%), 65.1ms (-56.91%), 65.0ms (-57.13%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
81.7ms, 71.0ms, 81.5ms, 80.2ms, 81.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.104s (+27.83%), 0.102s (+43.24%), 0.102s (+24.98%), 0.102s (+27.09%), 0.101s (+24.58%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
82.7ms (+1.19%), 82.2ms (+15.75%), 83.7ms (+2.66%), 87.3ms (+8.89%), 82.5ms (+1.91%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
59.2ms (-27.53%), 58.9ms (-17.07%), 61.8ms (-24.15%), 61.8ms (-22.98%), 61.4ms (-24.12%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
33.3ms (-59.22%), 33.6ms (-52.68%), 35.0ms (-57.11%), 35.0ms (-56.30%), 34.6ms (-57.22%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
22.7ms (-72.20%), 22.6ms (-68.16%), 23.0ms (-71.76%), 22.5ms (-71.89%), 22.5ms (-72.23%)
Problem type: 32_GRID_size=200x200_fill=0.25_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
0.111s, 93.7ms, 97.1ms, 98.1ms, 94.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.161s (+45.01%), 0.162s (+72.28%), 0.166s (+70.44%), 0.179s (+82.85%), 0.156s (+65.12%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.116s (+4.24%), 0.123s (+31.20%), 0.121s (+25.08%), 0.120s (+22.63%), 0.112s (+18.48%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
54.1ms (-51.41%), 56.2ms (-40.02%), 56.0ms (-42.33%), 55.4ms (-43.50%), 56.9ms (-39.66%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
29.7ms (-73.26%), 30.5ms (-67.51%), 28.7ms (-70.45%), 28.2ms (-71.23%), 27.9ms (-70.41%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
16.1ms (-85.53%), 16.4ms (-82.47%), 16.2ms (-83.33%), 17.1ms (-82.52%), 16.9ms (-82.11%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.341s, 0.299s, 0.319s, 0.322s, 0.310s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.351s (+2.92%), 0.344s (+15.16%), 0.438s (+37.37%), 0.374s (+16.32%), 0.346s (+11.40%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.527s (+54.58%), 0.558s (+86.67%), 0.560s (+75.59%), 0.542s (+68.49%), 0.517s (+66.72%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.201s (-40.89%), 0.212s (-29.10%), 0.209s (-34.65%), 0.206s (-36.13%), 0.208s (-32.81%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
0.142s (-58.34%), 0.144s (-52.01%), 0.137s (-57.07%), 0.136s (-57.80%), 0.131s (-57.88%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
0.127s (-62.76%), 0.129s (-56.78%), 0.132s (-58.59%), 0.135s (-58.10%), 0.126s (-59.43%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.200s, 0.160s, 0.166s, 0.157s, 0.151s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.199s (-0.46%), 0.199s (+24.27%), 0.220s (+32.69%), 0.224s (+42.52%), 0.198s (+31.36%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.180s (-10.05%), 0.190s (+18.81%), 0.190s (+14.61%), 0.187s (+18.78%), 0.178s (+18.11%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
0.106s (-46.79%), 0.109s (-31.90%), 0.108s (-34.53%), 0.108s (-31.41%), 0.110s (-27.01%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
65.7ms (-67.11%), 66.2ms (-58.57%), 62.8ms (-62.09%), 62.6ms (-60.21%), 60.5ms (-59.91%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
44.8ms (-77.55%), 45.0ms (-71.86%), 45.0ms (-72.82%), 47.1ms (-70.09%), 44.0ms (-70.87%)
Problem type: 33_GRID_size=200x200_fill=0.05_conn=3_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
27.0ms, 27.6ms, 28.1ms, 27.2ms, 30.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
36.9ms (+36.45%), 37.1ms (+34.42%), 40.3ms (+43.34%), 37.9ms (+39.32%), 39.7ms (+29.55%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
25.8ms (-4.61%), 28.3ms (+2.45%), 27.5ms (-1.95%), 28.0ms (+2.97%), 29.7ms (-3.23%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
20.5ms (-24.17%), 21.9ms (-20.76%), 20.7ms (-26.30%), 21.1ms (-22.48%), 20.9ms (-31.90%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
10.5ms (-61.13%), 11.1ms (-59.86%), 10.7ms (-62.07%), 10.3ms (-62.17%), 10.6ms (-65.41%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
5.7ms (-78.98%), 5.7ms (-79.29%), 5.9ms (-79.05%), 5.6ms (-79.47%), 6.0ms (-80.43%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
85.6ms, 87.7ms, 88.0ms, 92.0ms, 91.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
91.0ms (+6.39%), 92.5ms (+5.51%), 96.8ms (+10.00%), 92.4ms (+0.46%), 97.2ms (+6.58%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.115s (+34.38%), 0.129s (+47.22%), 0.115s (+30.59%), 0.119s (+29.14%), 0.120s (+31.92%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
53.8ms (-37.08%), 58.5ms (-33.26%), 55.7ms (-36.72%), 55.6ms (-39.57%), 56.2ms (-38.33%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
33.7ms (-60.63%), 35.5ms (-59.55%), 34.7ms (-60.56%), 33.1ms (-64.06%), 34.3ms (-62.35%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
27.5ms (-67.87%), 28.0ms (-68.04%), 29.7ms (-66.26%), 27.0ms (-70.64%), 30.3ms (-66.81%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
42.5ms, 42.8ms, 44.5ms, 42.2ms, 46.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
46.2ms (+8.65%), 46.5ms (+8.59%), 50.6ms (+13.80%), 47.3ms (+11.97%), 49.2ms (+6.96%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
46.9ms (+10.36%), 50.5ms (+17.95%), 51.4ms (+15.56%), 47.4ms (+12.26%), 50.0ms (+8.67%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
38.4ms (-9.69%), 40.0ms (-6.60%), 39.7ms (-10.81%), 38.1ms (-9.78%), 38.8ms (-15.58%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
21.4ms (-49.70%), 22.0ms (-48.66%), 22.7ms (-49.05%), 20.9ms (-50.45%), 21.8ms (-52.72%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
13.8ms (-67.48%), 13.8ms (-67.65%), 15.6ms (-64.93%), 13.5ms (-67.94%), 15.6ms (-66.04%)
Problem type: 40_MERI_size=1500_n=4_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
27.4ms, 24.6ms, 24.9ms, 25.5ms, 25.1ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
37.7ms (+37.66%), 37.1ms (+51.01%), 36.3ms (+45.93%), 37.3ms (+46.05%), 35.8ms (+42.54%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
19.3ms (-29.49%), 17.4ms (-29.19%), 17.3ms (-30.52%), 17.0ms (-33.29%), 15.4ms (-38.86%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
30.9ms (+13.06%), 30.2ms (+23.11%), 30.3ms (+21.73%), 32.0ms (+25.39%), 31.8ms (+26.44%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
16.5ms (-39.53%), 16.5ms (-33.00%), 15.6ms (-37.15%), 16.7ms (-34.40%), 15.5ms (-38.24%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
8.7ms (-68.15%), 8.3ms (-66.31%), 8.3ms (-66.76%), 8.4ms (-67.07%), 8.2ms (-67.56%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
77.1ms, 72.9ms, 72.1ms, 74.5ms, 73.9ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
91.7ms (+18.97%), 90.6ms (+24.26%), 88.6ms (+22.95%), 90.8ms (+21.97%), 87.5ms (+18.42%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
82.2ms (+6.67%), 78.1ms (+7.06%), 78.3ms (+8.70%), 82.5ms (+10.73%), 79.3ms (+7.36%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
59.0ms (-23.44%), 58.0ms (-20.47%), 58.0ms (-19.55%), 61.8ms (-17.05%), 60.1ms (-18.67%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
34.4ms (-55.39%), 34.4ms (-52.77%), 32.2ms (-55.28%), 33.5ms (-55.05%), 32.3ms (-56.32%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
21.5ms (-72.06%), 21.0ms (-71.19%), 20.9ms (-70.95%), 22.0ms (-70.48%), 20.8ms (-71.84%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
42.4ms, 39.2ms, 39.3ms, 39.8ms, 39.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
43.5ms (+2.42%), 42.7ms (+9.13%), 41.8ms (+6.23%), 43.6ms (+9.77%), 41.1ms (+3.74%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
44.2ms (+4.09%), 40.4ms (+3.06%), 39.2ms (-0.30%), 42.4ms (+6.62%), 43.2ms (+9.16%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
55.8ms (+31.49%), 61.7ms (+57.44%), 54.9ms (+39.54%), 59.1ms (+48.69%), 58.1ms (+46.74%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
32.6ms (-23.28%), 32.3ms (-17.58%), 30.5ms (-22.52%), 32.4ms (-18.55%), 30.6ms (-22.67%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
19.7ms (-53.48%), 18.6ms (-52.44%), 18.7ms (-52.56%), 19.2ms (-51.59%), 18.5ms (-53.32%)
Problem type: 41_MERI_size=1500_n=7_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
39.4ms, 39.2ms, 39.4ms, 42.6ms, 43.4ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
55.2ms (+40.11%), 55.8ms (+42.18%), 55.7ms (+41.44%), 55.9ms (+31.14%), 56.8ms (+31.07%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
24.8ms (-37.12%), 24.5ms (-37.44%), 24.2ms (-38.47%), 25.5ms (-40.13%), 24.3ms (-44.05%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
46.8ms (+18.82%), 47.1ms (+20.03%), 47.0ms (+19.46%), 46.5ms (+9.07%), 54.3ms (+25.15%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
25.1ms (-36.23%), 25.1ms (-36.05%), 24.4ms (-38.12%), 23.9ms (-43.99%), 23.8ms (-45.02%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
12.7ms (-67.75%), 13.1ms (-66.72%), 12.7ms (-67.80%), 12.9ms (-69.64%), 12.5ms (-71.27%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.117s, 0.112s, 0.116s, 0.119s, 0.123s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.135s (+15.30%), 0.136s (+21.47%), 0.135s (+16.72%), 0.133s (+11.59%), 0.138s (+12.51%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.116s (-1.24%), 0.114s (+2.40%), 0.117s (+0.98%), 0.121s (+1.35%), 0.120s (-2.14%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
89.5ms (-23.60%), 88.4ms (-20.80%), 88.8ms (-23.19%), 89.8ms (-24.74%), 89.7ms (-26.94%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
53.5ms (-54.32%), 52.7ms (-52.78%), 50.7ms (-56.11%), 53.2ms (-55.43%), 51.6ms (-57.99%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
32.9ms (-71.89%), 33.1ms (-70.32%), 32.5ms (-71.90%), 33.7ms (-71.74%), 32.9ms (-73.20%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
62.7ms, 60.9ms, 62.2ms, 66.0ms, 65.7ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
63.4ms (+1.04%), 63.9ms (+4.98%), 63.7ms (+2.31%), 66.0ms (+0.05%), 65.7ms (-0.04%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
63.2ms (+0.68%), 61.8ms (+1.39%), 60.6ms (-2.59%), 62.4ms (-5.39%), 63.1ms (-3.92%)
- 4_BaSpaCho_CUDA_batchsize=4 (vs. 1_CHOLMOD):
86.3ms (+37.52%), 85.0ms (+39.57%), 86.5ms (+39.05%), 86.6ms (+31.33%), 85.0ms (+29.43%)
- 5_BaSpaCho_CUDA_batchsize=8 (vs. 1_CHOLMOD):
49.9ms (-20.53%), 49.5ms (-18.67%), 47.9ms (-22.98%), 47.6ms (-27.83%), 48.1ms (-26.81%)
- 6_BaSpaCho_CUDA_batchsize=16 (vs. 1_CHOLMOD):
29.0ms (-53.71%), 29.1ms (-52.24%), 28.8ms (-53.65%), 29.6ms (-55.12%), 28.7ms (-56.33%)
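The percentages in parentheses read as the relative change in run time against the matching 1_CHOLMOD sample, with negative values meaning faster than the baseline. The sketch below recomputes one of them from the printed timings. The helpers `parse_seconds` and `relative_change` are hypothetical names introduced here for illustration and are not part of the benchmark tool. The result is only expected to approximate the logged figure, on the assumption that the tool computes percentages from unrounded timings.

```python
import re


def parse_seconds(text):
    """Convert a printed timing such as '81.7ms' or '0.104s' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized timing: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


def relative_change(candidate, baseline):
    """Percentage change of candidate vs. baseline (negative = faster)."""
    return 100.0 * (parse_seconds(candidate) / parse_seconds(baseline) - 1.0)


# First sample of the solve-2 block directly above
# (6_BaSpaCho_CUDA_batchsize=16 vs. 1_CHOLMOD); the log reports -53.71%.
print(f"{relative_change('29.0ms', '62.7ms'):+.2f}%")
```

The same two helpers work for rows that mix `ms` and `s` units, since both are converted to seconds before the ratio is taken.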
Command: cmake --build build -v -- -j16 && build/bench -B 1_CHOLMOD -O solve-1,solve-2,solve-10 -S ^2
Problem type: 10_FLAT_size=1000_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
3.4ms, 3.7ms, 4.0ms, 4.1ms, 4.1ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
3.6ms (+5.08%), 3.3ms (-9.34%), 3.8ms (-5.03%), 3.5ms (-14.14%), 3.3ms (-18.98%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
8.3ms, 8.6ms, 7.4ms, 8.2ms, 7.4ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
7.1ms (-14.87%), 6.2ms (-28.01%), 6.9ms (-7.34%), 6.5ms (-20.34%), 6.4ms (-13.00%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
12.8ms, 13.2ms, 11.9ms, 12.0ms, 12.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
7.9ms (-38.49%), 6.6ms (-50.11%), 6.9ms (-42.00%), 6.5ms (-45.85%), 6.4ms (-48.71%)
Problem type: 11_FLAT_size=4000_fill=0.01_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
32.8ms, 33.0ms, 33.1ms, 31.6ms, 32.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
28.0ms (-14.60%), 30.5ms (-7.67%), 32.5ms (-1.60%), 29.3ms (-7.21%), 29.3ms (-9.93%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
51.6ms, 50.0ms, 50.9ms, 51.1ms, 49.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
48.5ms (-6.00%), 55.8ms (+11.61%), 53.4ms (+5.05%), 48.4ms (-5.30%), 51.5ms (+4.07%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.164s, 0.161s, 0.163s, 0.257s, 0.162s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
46.0ms (-71.91%), 50.9ms (-68.39%), 49.3ms (-69.77%), 47.0ms (-81.73%), 48.3ms (-70.19%)
Problem type: 12_FLAT_size=2000_fill=0.03_bsize=2-5
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
16.5ms, 14.5ms, 14.9ms, 14.9ms, 15.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
13.7ms (-17.04%), 13.8ms (-5.15%), 13.3ms (-11.23%), 14.6ms (-1.60%), 14.4ms (-3.86%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
27.8ms, 22.4ms, 24.2ms, 23.9ms, 25.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
26.9ms (-3.10%), 26.2ms (+16.70%), 27.1ms (+11.91%), 29.1ms (+21.71%), 29.1ms (+13.61%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
64.7ms, 64.2ms, 62.8ms, 65.7ms, 66.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
21.0ms (-67.53%), 21.8ms (-66.09%), 22.1ms (-64.83%), 23.8ms (-63.80%), 29.9ms (-54.64%)
Problem type: 20_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=50000_schurfill=0.02
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
86.3ms, 64.7ms, 64.5ms, 85.3ms, 64.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
34.8ms (-59.74%), 36.6ms (-43.50%), 36.4ms (-43.57%), 36.9ms (-56.69%), 37.6ms (-41.70%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.348s, 0.347s, 0.358s, 0.356s, 0.351s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.101s (-71.01%), 0.103s (-70.34%), 96.8ms (-73.01%), 0.102s (-71.41%), 0.101s (-71.36%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.175s, 0.174s, 0.177s, 0.177s, 0.175s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
43.5ms (-75.17%), 44.9ms (-74.28%), 45.4ms (-74.34%), 46.1ms (-73.95%), 45.2ms (-74.20%)
Problem type: 21_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=5000_schurfill=0.2
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
6.0ms, 5.9ms, 6.4ms, 6.3ms, 5.9ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
4.1ms (-31.97%), 4.1ms (-30.14%), 4.3ms (-32.43%), 4.1ms (-34.76%), 4.0ms (-31.13%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
17.9ms, 18.0ms, 18.0ms, 17.8ms, 18.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
8.8ms (-50.99%), 8.2ms (-54.53%), 8.2ms (-54.32%), 8.3ms (-53.15%), 8.2ms (-54.56%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
19.9ms, 19.7ms, 21.4ms, 20.2ms, 20.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
7.6ms (-61.78%), 7.5ms (-61.76%), 7.6ms (-64.56%), 7.5ms (-62.89%), 7.1ms (-64.47%)
Problem type: 30_GRID_size=100x100_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
17.0ms, 18.9ms, 20.6ms, 16.8ms, 21.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
23.9ms (+40.75%), 26.8ms (+42.15%), 26.4ms (+28.41%), 25.3ms (+50.05%), 25.1ms (+17.83%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
56.8ms, 56.6ms, 70.0ms, 56.7ms, 56.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
60.8ms (+6.94%), 61.7ms (+8.89%), 64.2ms (-8.21%), 60.7ms (+6.92%), 60.9ms (+7.54%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
27.0ms, 26.8ms, 34.3ms, 26.5ms, 26.6ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
30.4ms (+12.69%), 33.7ms (+25.79%), 36.0ms (+5.03%), 33.0ms (+24.25%), 33.5ms (+26.04%)
Problem type: 31_GRID_size=150x150_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
41.2ms, 40.7ms, 41.0ms, 40.5ms, 40.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
62.0ms (+50.23%), 61.0ms (+50.03%), 65.8ms (+60.55%), 61.0ms (+50.55%), 58.6ms (+44.47%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.136s, 0.131s, 0.131s, 0.132s, 0.130s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.132s (-2.78%), 0.131s (+0.34%), 0.138s (+5.44%), 0.130s (-1.11%), 0.131s (+0.20%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
85.6ms, 62.5ms, 64.3ms, 62.2ms, 62.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
71.3ms (-16.72%), 71.4ms (+14.26%), 72.0ms (+11.89%), 70.8ms (+13.90%), 69.2ms (+11.36%)
Problem type: 32_GRID_size=200x200_fill=0.25_conn=2_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
70.5ms, 72.3ms, 70.5ms, 71.7ms, 70.5ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
99.0ms (+40.33%), 0.101s (+39.23%), 0.102s (+44.12%), 0.104s (+44.96%), 0.109s (+54.28%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.229s, 0.232s, 0.229s, 0.293s, 0.238s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.210s (-8.56%), 0.223s (-4.15%), 0.211s (-7.86%), 0.224s (-23.63%), 0.231s (-3.25%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
0.126s, 0.124s, 0.127s, 0.135s, 0.132s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.123s (-2.60%), 0.115s (-6.73%), 0.122s (-4.25%), 0.125s (-7.97%), 0.127s (-4.04%)
Problem type: 33_GRID_size=200x200_fill=0.05_conn=3_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
24.7ms, 26.6ms, 25.5ms, 24.9ms, 25.1ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
25.7ms (+3.95%), 27.8ms (+4.61%), 24.4ms (-4.25%), 23.0ms (-7.78%), 26.3ms (+4.61%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
90.3ms, 0.103s, 88.8ms, 88.7ms, 94.1ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
61.2ms (-32.19%), 63.2ms (-38.50%), 60.3ms (-32.11%), 58.6ms (-33.97%), 65.7ms (-30.21%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
52.4ms, 54.9ms, 52.4ms, 51.6ms, 52.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
31.6ms (-39.70%), 34.1ms (-37.83%), 30.8ms (-41.11%), 29.9ms (-42.08%), 34.2ms (-34.25%)
Problem type: 40_MERI_size=1500_n=4_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
22.2ms, 21.7ms, 21.6ms, 21.9ms, 22.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
19.6ms (-11.82%), 20.1ms (-7.57%), 19.7ms (-8.92%), 19.0ms (-13.34%), 19.5ms (-12.41%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
69.6ms, 65.9ms, 63.3ms, 63.4ms, 62.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
51.9ms (-25.47%), 52.7ms (-20.03%), 50.8ms (-19.82%), 51.6ms (-18.70%), 49.7ms (-20.13%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
31.3ms, 30.8ms, 31.1ms, 31.2ms, 30.7ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
25.0ms (-19.94%), 25.3ms (-18.04%), 26.3ms (-15.27%), 25.1ms (-19.47%), 24.6ms (-19.64%)
Problem type: 41_MERI_size=1500_n=7_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: solve-1
- 1_CHOLMOD (basis for comparison):
34.4ms, 34.2ms, 34.5ms, 36.1ms, 36.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
29.9ms (-12.88%), 31.1ms (-8.81%), 30.9ms (-10.33%), 31.1ms (-13.73%), 30.1ms (-17.11%)
Operation: solve-10
- 1_CHOLMOD (basis for comparison):
0.114s, 98.5ms, 0.103s, 0.102s, 0.103s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
78.4ms (-31.45%), 81.5ms (-17.30%), 80.1ms (-22.49%), 80.8ms (-20.57%), 80.7ms (-21.56%)
Operation: solve-2
- 1_CHOLMOD (basis for comparison):
48.7ms, 48.2ms, 47.8ms, 50.0ms, 49.9ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
39.0ms (-19.81%), 38.5ms (-20.06%), 37.9ms (-20.82%), 39.1ms (-21.71%), 38.6ms (-22.65%)
Command: cmake --build build -v -- -j16 && build/bench -B 1_CHOLMOD -O analysis -S ^[23]
Problem type: 10_FLAT_size=1000_fill=0.1_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
29.8ms, 25.6ms, 29.4ms, 26.7ms, 27.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
21.6ms (-27.45%), 20.0ms (-21.90%), 21.2ms (-27.63%), 20.1ms (-24.68%), 20.5ms (-24.13%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
22.8ms (-23.69%), 26.3ms (+2.66%), 23.4ms (-20.31%), 22.6ms (-15.31%), 21.2ms (-21.62%)
Problem type: 11_FLAT_size=4000_fill=0.01_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
72.0ms, 64.4ms, 69.9ms, 70.9ms, 65.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.256s (+256.07%), 0.265s (+311.09%), 0.284s (+306.39%), 0.264s (+271.75%), 0.275s (+321.19%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.278s (+286.11%), 0.267s (+314.92%), 0.291s (+316.09%), 0.281s (+295.57%), 0.300s (+360.02%)
Problem type: 12_FLAT_size=2000_fill=0.03_bsize=2-5
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
57.0ms, 55.8ms, 51.0ms, 52.8ms, 50.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
62.6ms (+9.70%), 66.9ms (+20.07%), 61.8ms (+21.32%), 62.8ms (+18.94%), 62.0ms (+23.55%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
63.3ms (+10.95%), 65.5ms (+17.41%), 63.4ms (+24.47%), 64.4ms (+21.87%), 63.6ms (+26.80%)
Problem type: 20_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=50000_schurfill=0.02
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
9.740s, 9.918s, 9.770s, 10.006s, 9.862s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
3.464s (-64.43%), 3.483s (-64.88%), 3.504s (-64.14%), 3.451s (-65.51%), 3.413s (-65.39%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
3.495s (-64.12%), 3.414s (-65.58%), 3.506s (-64.12%), 3.463s (-65.39%), 3.408s (-65.44%)
Problem type: 21_FLAT+SCHUR_size=1000_fill=0.1_bsize=3_schursize=5000_schurfill=0.2
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
27.6ms, 27.0ms, 27.5ms, 27.2ms, 27.2ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
22.2ms (-19.58%), 23.1ms (-14.52%), 21.8ms (-20.79%), 22.0ms (-19.02%), 21.0ms (-22.89%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
24.2ms (-12.42%), 22.9ms (-15.31%), 24.0ms (-12.82%), 23.9ms (-12.14%), 23.9ms (-12.45%)
Problem type: 30_GRID_size=100x100_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
55.1ms, 56.0ms, 55.6ms, 55.9ms, 56.0ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
66.1ms (+19.94%), 65.8ms (+17.52%), 66.2ms (+18.99%), 67.3ms (+20.43%), 66.4ms (+18.46%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
55.9ms (+1.47%), 57.2ms (+2.18%), 57.9ms (+4.10%), 56.2ms (+0.55%), 58.8ms (+5.00%)
Problem type: 31_GRID_size=150x150_fill=1.0_conn=2_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
0.142s, 0.140s, 0.141s, 0.143s, 0.140s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.190s (+33.63%), 0.188s (+34.14%), 0.186s (+32.43%), 0.188s (+31.41%), 0.183s (+30.50%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.183s (+28.66%), 0.183s (+30.03%), 0.185s (+31.29%), 0.184s (+28.82%), 0.183s (+30.61%)
Problem type: 32_GRID_size=200x200_fill=0.25_conn=2_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
0.215s, 0.225s, 0.217s, 0.214s, 0.232s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.376s (+75.15%), 0.422s (+87.69%), 0.420s (+93.29%), 0.413s (+92.69%), 0.404s (+74.53%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.414s (+92.64%), 0.396s (+76.09%), 0.415s (+91.14%), 0.393s (+83.41%), 0.387s (+66.83%)
Problem type: 33_GRID_size=200x200_fill=0.05_conn=3_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
65.6ms, 62.8ms, 63.4ms, 63.0ms, 63.3ms
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
79.1ms (+20.66%), 75.9ms (+20.81%), 80.8ms (+27.54%), 76.1ms (+20.82%), 79.4ms (+25.47%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
78.4ms (+19.64%), 77.3ms (+23.06%), 82.9ms (+30.84%), 77.7ms (+23.38%), 84.0ms (+32.77%)
Problem type: 40_MERI_size=1500_n=4_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
0.272s, 0.276s, 0.277s, 0.279s, 0.283s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
98.7ms (-63.76%), 98.7ms (-64.27%), 97.0ms (-64.91%), 0.100s (-64.02%), 97.5ms (-65.51%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.101s (-62.87%), 97.7ms (-64.64%), 97.8ms (-64.65%), 0.114s (-59.28%), 0.100s (-64.56%)
Problem type: 41_MERI_size=1500_n=7_hairlen=600_hairs=2_band=120_fill=0.1_bsize=3
.....(5/5, done!)
Operation: analysis
- 1_CHOLMOD (basis for comparison):
0.462s, 0.460s, 0.463s, 0.475s, 0.465s
- 2_BaSpaCho_BLAS_numthreads=16 (vs. 1_CHOLMOD):
0.153s (-66.78%), 0.154s (-66.49%), 0.155s (-66.48%), 0.155s (-67.43%), 0.166s (-64.40%)
- 3_BaSpaCho_CUDA (vs. 1_CHOLMOD):
0.158s (-65.88%), 0.155s (-66.30%), 0.155s (-66.43%), 0.156s (-67.19%), 0.154s (-66.96%)
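Judging only from which solvers appear in each run, the `-S` arguments in the commands above (`^2` and `^[23]`) seem to act as regular-expression filters on solver names, with the `-B` baseline always included. This is an inference from the log rather than documented behaviour. A minimal sketch of that selection follows; `select_solvers` is a hypothetical helper and search-style matching is an assumption.

```python
import re

# Solver names exactly as they appear in the log.
SOLVERS = [
    "1_CHOLMOD",
    "2_BaSpaCho_BLAS_numthreads=16",
    "3_BaSpaCho_CUDA",
    "4_BaSpaCho_CUDA_batchsize=4",
    "5_BaSpaCho_CUDA_batchsize=8",
    "6_BaSpaCho_CUDA_batchsize=16",
]


def select_solvers(pattern, baseline="1_CHOLMOD"):
    """Keep the baseline plus every solver whose name matches the pattern.

    Assumption: the pattern is applied as a regex search, so '^2' matches
    any name that starts with '2'.
    """
    regex = re.compile(pattern)
    return [name for name in SOLVERS if name == baseline or regex.search(name)]


print(select_solvers("^2"))     # baseline plus the BLAS solver
print(select_solvers("^[23]"))  # baseline plus the BLAS and plain CUDA solvers
```

Under these assumptions the selections line up with the solvers listed in the CPU-only solve run and in the analysis run respectively.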