Add a matrix type using an Unboxed vector underneath. #56

OlivierSohn · 2018-03-19T19:14:48Z

Following up on #54, I implemented an unboxed matrix.

My intent was to have a minimal working version that suits my project. Still, I am opening this PR in case someone has the same need or wants to continue the work.

What (probably) needs to be done before a merge:

- ~~Port multStd__, multStd2, strassenMixed and multStrassenMixed from the boxed implementations (I removed them here because I couldn't figure out how to port them).~~ (Done)
- Restore the Functor, Applicative and Foldable instances, which I removed for lack of time and of knowledge on how to implement them. Regarding Functor, I remember that adding the Unbox constraint didn't work (fmap has to work for every element type, so the instance has no way to require Unbox). As a replacement, I introduced mapMat, which does what fmap would do.
- ~~Also, I removed the rules that trigger warnings (see #37, "GHC 8 warnings about REWRITE rules not firing"). They should be reintroduced once #37 is fixed.~~ (see discussion below)
- Testing: we cannot reuse the tests of the boxed version because they rely on Integer, which cannot be unboxed. We would have to use Int tests instead.
- ~~Benchmarking: on my project I saw better performance, but it would be nice to document it, with criterion for example.~~ (Done)
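For readers new to the idea, here is a minimal sketch of what "a matrix type using an Unboxed vector underneath" can look like, assuming a row-major layout over `Data.Vector.Unboxed` from the `vector` package. Only `mapMat` is named in the PR; `UMatrix`, `uMatrix`, `getElemU` and `multDefU` are hypothetical names for illustration, not the PR's actual code. The comment on `mapMat` spells out why a `Functor` instance with an `Unbox` constraint can't be written.

```haskell
import qualified Data.Vector.Unboxed as U

-- | Hypothetical unboxed matrix: dimensions plus a flat, row-major
-- unboxed vector (field names are illustrative, not the PR's).
data UMatrix a = UMatrix
  { uRows :: !Int
  , uCols :: !Int
  , uData :: !(U.Vector a)
  } deriving (Eq, Show)

-- | Build an r x c matrix from a function of 1-based (row, column).
uMatrix :: U.Unbox a => Int -> Int -> ((Int, Int) -> a) -> UMatrix a
uMatrix r c f = UMatrix r c (U.generate (r * c) cell)
  where
    cell k = let (i, j) = k `divMod` c in f (i + 1, j + 1)

-- | 1-based element access. i and j are not checked against the
-- dimensions; only the flat index is bounds-checked by U.!.
getElemU :: U.Unbox a => Int -> Int -> UMatrix a -> a
getElemU i j (UMatrix _ c v) = v U.! ((i - 1) * c + (j - 1))

-- | What fmap would do. A Functor instance can't express this:
-- fmap :: (a -> b) -> f a -> f b must work for *every* a and b,
-- so there is nowhere to demand U.Unbox a and U.Unbox b.
mapMat :: (U.Unbox a, U.Unbox b) => (a -> b) -> UMatrix a -> UMatrix b
mapMat f (UMatrix r c v) = UMatrix r c (U.map f v)

-- | Textbook multiplication reading straight from the flat vectors.
multDefU :: (Num a, U.Unbox a) => UMatrix a -> UMatrix a -> UMatrix a
multDefU a b = uMatrix (uRows a) (uCols b) $ \(i, j) ->
  sum [getElemU i k a * getElemU k j b | k <- [1 .. uCols a]]
```

The elements live in one contiguous unboxed buffer instead of a vector of pointers to boxed values, which is where the speedup of such a type would be expected to come from.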

Daniel-Diaz · 2018-03-20T06:37:27Z

This looks quite good. Thanks for the work! I do wonder whether maintaining two versions of the same code will become a burden.

I would say the tests are one of the most fundamental parts to implement before merging. Benchmarks would be a nice addition too, especially if we could compare it to the boxed version. That would give this variant a good reason to exist.

I'd say don't worry about all the multiplication algorithms. If the benchmarks show that this version multiplies faster (and it probably will), I'm happy to accept it as is. We can always improve on that particular point later if we want to.

As for the rules, they can probably stay. I don't see any harm besides the warnings: they say the rules might not fire, but when they do, it's a performance win. Ideally we should try to fix #37, but I don't think that should stop this PR from merging.
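Assuming the #37 warnings are GHC's `-Winline-rule-shadowing` "rule may never fire because ... might inline first" warnings, the usual fix is a phase annotation on the function the rule matches. A sketch on a hypothetical list-based `mapEntries` (standing in for something like `mapMat`), not the library's actual rules:

```haskell
-- | Hypothetical element-wise map over a list-of-rows matrix,
-- standing in for something like mapMat.
mapEntries :: (a -> b) -> [[a]] -> [[b]]
mapEntries f = map (map f)

-- Without a phase annotation GHC may inline mapEntries before the rule
-- below gets a chance to match (and warns that the rule may never
-- fire). NOINLINE [1] keeps calls intact until phase 1, so the rule
-- can fire in the earlier phase; after that, mapEntries may be inlined
-- as usual.
{-# NOINLINE [1] mapEntries #-}

-- Fuse two consecutive maps into a single traversal.
{-# RULES
"mapEntries/mapEntries" forall f g m.
  mapEntries f (mapEntries g m) = mapEntries (f . g) m
  #-}
```

Rules only fire when compiling with optimization, and the rewrite must not change meaning; here both sides compute the same matrix, so a rule that never fires costs nothing but speed.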

OlivierSohn · 2018-03-20T09:02:27Z

@Daniel-Diaz after reading your comment, I put the rules back in place and added benchmarks for the unboxed version: overall, unboxed multiplications are 2-4x faster!

In the process I also added some SPECIALIZE pragmas that were (imho) missing from the boxed version's Strassen multiplication, which improved performance there by approx. 10%.
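To illustrate what a SPECIALIZE pragma does, here is a sketch on a hypothetical list-based `multDef` (not the library's code): GHC compiles an extra monomorphic copy for each listed type and, with optimization on, rewrites calls at those types to use it, so `(+)` and `(*)` no longer go through a `Num` dictionary passed at runtime.

```haskell
import Data.List (transpose)

-- | Textbook multiplication over lists, polymorphic in the element
-- type. Unspecialized, every (+) and (*) is looked up in the Num
-- dictionary passed at runtime.
multDef :: Num a => [[a]] -> [[a]] -> [[a]]
multDef a b = [[sum (zipWith (*) row col) | col <- cols] | row <- a]
  where
    cols = transpose b

-- Ask GHC to also generate monomorphic copies for common element
-- types; optimized call sites at these types then use the copies.
{-# SPECIALIZE multDef :: [[Int]] -> [[Int]] -> [[Int]] #-}
{-# SPECIALIZE multDef :: [[Double]] -> [[Double]] -> [[Double]] #-}
```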

Commit and detailed report will follow...

OlivierSohn · 2018-03-20T09:19:33Z

(For an updated benchmark, see below. This one is kept here to show the Strassen and StrassenU times, which led to opening #57 and which are not in the updated benchmark.)

Running 1 benchmarks...
Benchmark matrix-mult: RUNNING...
benchmarking mult10/Definition
time                 8.683 μs   (8.449 μs .. 8.937 μs)
                     0.991 R²   (0.988 R² .. 0.994 R²)
mean                 9.054 μs   (8.753 μs .. 9.482 μs)
std dev              1.272 μs   (877.6 ns .. 1.874 μs)
variance introduced by outliers: 93% (severely inflated)
             
benchmarking mult10/DefinitionU
time                 3.947 μs   (3.840 μs .. 4.086 μs)
                     0.991 R²   (0.985 R² .. 0.996 R²)
mean                 3.982 μs   (3.869 μs .. 4.115 μs)
std dev              418.5 ns   (351.6 ns .. 529.4 ns)
variance introduced by outliers: 88% (severely inflated)
             
benchmarking mult10/Definition 2
time                 12.74 μs   (12.34 μs .. 13.15 μs)
                     0.990 R²   (0.982 R² .. 0.995 R²)
mean                 12.56 μs   (12.22 μs .. 13.17 μs)
std dev              1.459 μs   (1.018 μs .. 2.388 μs)
variance introduced by outliers: 89% (severely inflated)
             
benchmarking mult10/Strassen
time                 4.471 ms   (4.313 ms .. 4.628 ms)
                     0.985 R²   (0.975 R² .. 0.992 R²)
mean                 4.639 ms   (4.468 ms .. 4.892 ms)
std dev              659.6 μs   (469.2 μs .. 1.011 ms)
variance introduced by outliers: 78% (severely inflated)
             
benchmarking mult10/StrassenU
time                 2.191 ms   (2.096 ms .. 2.284 ms)
                     0.980 R²   (0.969 R² .. 0.990 R²)
mean                 2.199 ms   (2.108 ms .. 2.340 ms)
std dev              362.0 μs   (224.6 μs .. 525.8 μs)
variance introduced by outliers: 87% (severely inflated)
             
benchmarking mult10/Strassen mixed
time                 13.60 μs   (13.09 μs .. 14.09 μs)
                     0.987 R²   (0.980 R² .. 0.992 R²)
mean                 13.69 μs   (13.20 μs .. 14.28 μs)
std dev              1.764 μs   (1.407 μs .. 2.180 μs)
variance introduced by outliers: 91% (severely inflated)
             
benchmarking mult25/Definition
time                 127.4 μs   (122.2 μs .. 133.1 μs)
                     0.985 R²   (0.978 R² .. 0.992 R²)
mean                 133.4 μs   (128.5 μs .. 138.3 μs)
std dev              15.91 μs   (12.88 μs .. 21.34 μs)
variance introduced by outliers: 86% (severely inflated)
             
benchmarking mult25/DefinitionU
time                 51.29 μs   (49.26 μs .. 53.62 μs)
                     0.980 R²   (0.971 R² .. 0.988 R²)
mean                 53.78 μs   (51.62 μs .. 56.32 μs)
std dev              7.926 μs   (6.468 μs .. 11.38 μs)
variance introduced by outliers: 92% (severely inflated)
             
benchmarking mult25/Definition 2
time                 156.4 μs   (149.0 μs .. 162.4 μs)
                     0.977 R²   (0.964 R² .. 0.987 R²)
mean                 165.9 μs   (157.7 μs .. 177.2 μs)
std dev              34.07 μs   (26.76 μs .. 46.15 μs)
variance introduced by outliers: 95% (severely inflated)
             
benchmarking mult25/Strassen
time                 38.92 ms   (36.21 ms .. 41.42 ms)
                     0.982 R²   (0.958 R² .. 0.996 R²)
mean                 42.10 ms   (40.56 ms .. 44.16 ms)
std dev              3.348 ms   (2.540 ms .. 4.359 ms)
variance introduced by outliers: 25% (moderately inflated)
             
benchmarking mult25/StrassenU
time                 17.05 ms   (15.92 ms .. 18.17 ms)
                     0.982 R²   (0.968 R² .. 0.992 R²)
mean                 15.94 ms   (15.56 ms .. 16.48 ms)
std dev              1.133 ms   (903.2 μs .. 1.396 ms)
variance introduced by outliers: 32% (moderately inflated)
             
benchmarking mult25/Strassen mixed
time                 167.9 μs   (161.7 μs .. 173.8 μs)
                     0.986 R²   (0.980 R² .. 0.991 R²)
mean                 172.6 μs   (166.6 μs .. 178.9 μs)
std dev              20.58 μs   (17.22 μs .. 25.33 μs)
variance introduced by outliers: 85% (severely inflated)
             
benchmarking mult100/Definition
time                 10.46 ms   (9.535 ms .. 11.20 ms)
                     0.963 R²   (0.931 R² .. 0.986 R²)
mean                 9.617 ms   (9.218 ms .. 10.01 ms)
std dev              1.033 ms   (860.2 μs .. 1.282 ms)
variance introduced by outliers: 58% (severely inflated)
             
benchmarking mult100/DefinitionU
time                 2.827 ms   (2.743 ms .. 2.935 ms)
                     0.987 R²   (0.979 R² .. 0.993 R²)
mean                 2.923 ms   (2.816 ms .. 3.099 ms)
std dev              456.4 μs   (261.6 μs .. 689.0 μs)
variance introduced by outliers: 83% (severely inflated)
             
benchmarking mult100/Definition 2
time                 8.699 ms   (8.246 ms .. 8.999 ms)
                     0.985 R²   (0.964 R² .. 0.996 R²)
mean                 8.137 ms   (7.863 ms .. 8.360 ms)
std dev              657.7 μs   (513.9 μs .. 863.9 μs)
variance introduced by outliers: 47% (moderately inflated)
             
benchmarking mult100/Strassen
time                 2.404 s    (NaN s .. 2.680 s)
                     0.999 R²   (0.995 R² .. 1.000 R²)
mean                 2.412 s    (2.365 s .. 2.447 s)
std dev              53.23 ms   (0.0 s .. 60.55 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult100/StrassenU
time                 841.0 ms   (805.9 ms .. 913.9 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 864.6 ms   (848.2 ms .. 877.2 ms)
std dev              19.47 ms   (0.0 s .. 21.87 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult100/Strassen mixed
time                 8.105 ms   (7.877 ms .. 8.367 ms)
                     0.990 R²   (0.981 R² .. 0.995 R²)
mean                 8.220 ms   (7.994 ms .. 8.533 ms)
std dev              689.3 μs   (507.4 μs .. 1.082 ms)
variance introduced by outliers: 47% (moderately inflated)
             
benchmarking mult150/Definition
time                 36.63 ms   (33.81 ms .. 39.36 ms)
                     0.981 R²   (0.961 R² .. 0.993 R²)
mean                 34.36 ms   (32.88 ms .. 35.85 ms)
std dev              3.088 ms   (2.376 ms .. 4.520 ms)
variance introduced by outliers: 36% (moderately inflated)
             
benchmarking mult150/DefinitionU
time                 9.663 ms   (8.957 ms .. 10.19 ms)
                     0.975 R²   (0.958 R² .. 0.986 R²)
mean                 9.124 ms   (8.828 ms .. 9.446 ms)
std dev              883.0 μs   (737.9 μs .. 1.120 ms)
variance introduced by outliers: 53% (severely inflated)
             
benchmarking mult150/Definition 2
time                 28.68 ms   (27.30 ms .. 30.55 ms)
                     0.986 R²   (0.974 R² .. 0.995 R²)
mean                 26.10 ms   (24.94 ms .. 27.15 ms)
std dev              2.355 ms   (1.818 ms .. 3.080 ms)
variance introduced by outliers: 37% (moderately inflated)
             
benchmarking mult150/Strassen
time                 17.11 s    (14.83 s .. 20.24 s)
                     0.995 R²   (0.994 R² .. 1.000 R²)
mean                 16.72 s    (16.38 s .. 17.05 s)
std dev              562.1 ms   (0.0 s .. 572.2 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult150/StrassenU
time                 5.961 s    (5.796 s .. 6.129 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.668 s    (5.575 s .. 5.710 s)
std dev              81.23 ms   (0.0 s .. 87.14 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult150/Strassen mixed
time                 22.15 ms   (20.66 ms .. 23.80 ms)
                     0.978 R²   (0.957 R² .. 0.993 R²)
mean                 23.31 ms   (22.60 ms .. 24.19 ms)
std dev              1.804 ms   (1.357 ms .. 2.609 ms)
variance introduced by outliers: 33% (moderately inflated)
             
benchmarking mult250/Definition
time                 177.4 ms   (161.0 ms .. 203.1 ms)
                     0.975 R²   (0.898 R² .. 0.999 R²)
mean                 168.7 ms   (155.7 ms .. 182.3 ms)
std dev              19.32 ms   (11.88 ms .. 27.36 ms)
variance introduced by outliers: 27% (moderately inflated)
             
benchmarking mult250/DefinitionU
time                 45.56 ms   (42.18 ms .. 49.68 ms)
                     0.986 R²   (0.970 R² .. 0.998 R²)
mean                 44.99 ms   (43.82 ms .. 46.53 ms)
std dev              2.580 ms   (1.816 ms .. 3.460 ms)
variance introduced by outliers: 20% (moderately inflated)
             
benchmarking mult250/Definition 2
time                 107.3 ms   (96.78 ms .. 117.7 ms)
                     0.990 R²   (0.978 R² .. 0.998 R²)
mean                 104.5 ms   (100.1 ms .. 110.3 ms)
std dev              7.664 ms   (4.986 ms .. 10.91 ms)
variance introduced by outliers: 21% (moderately inflated)
             
benchmarking mult250/Strassen
time                 16.70 s    (16.09 s .. 18.08 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 16.81 s    (16.60 s .. 16.93 s)
std dev              189.5 ms   (0.0 s .. 213.3 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult250/StrassenU
time                 5.069 s    (4.858 s .. 5.374 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.420 s    (5.337 s .. 5.577 s)
std dev              135.8 ms   (1.088 fs .. 138.4 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult250/Strassen mixed
time                 112.5 ms   (87.89 ms .. 135.4 ms)
                     0.939 R²   (0.889 R² .. 0.999 R²)
mean                 104.6 ms   (97.39 ms .. 115.9 ms)
std dev              13.34 ms   (3.166 ms .. 16.50 ms)
variance introduced by outliers: 42% (moderately inflated)
             
benchmarking mult400/Definition
time                 564.8 ms   (368.5 ms .. 925.5 ms)
                     0.956 R²   (0.915 R² .. 1.000 R²)
mean                 975.0 ms   (885.5 ms .. 1.061 s)
std dev              146.4 ms   (0.0 s .. 149.2 ms)
variance introduced by outliers: 46% (moderately inflated)
             
benchmarking mult400/DefinitionU
time                 182.0 ms   (179.0 ms .. 184.5 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 181.5 ms   (180.5 ms .. 182.6 ms)
std dev              1.248 ms   (815.8 μs .. 1.605 ms)
variance introduced by outliers: 14% (moderately inflated)
             
benchmarking mult400/Definition 2
time                 392.5 ms   (120.9 ms .. 647.1 ms)
                     0.917 R²   (0.873 R² .. 1.000 R²)
mean                 594.6 ms   (561.4 ms .. 620.6 ms)
std dev              40.37 ms   (0.0 s .. 45.11 ms)
variance introduced by outliers: 20% (moderately inflated)


OlivierSohn · 2018-03-20T09:28:27Z

(Edited)

Reflecting on these results, I wonder whether Strassen multiplication is of any use to users of the library: it is far slower than the other algorithms (e.g. 2.404 s vs 10.46 ms for Definition at size 100, boxed), and it blows up memory usage for square matrices of size 512 or so (4 GB during the benchmark; this may also be due to criterion running tests in parallel, but still...).

Strassen *mixed* multiplication seems fine, though.
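A plausible reason the mixed variant behaves so differently: it applies Strassen's seven-product recursion only while blocks are large and switches to standard multiplication below a cutoff, whereas plain Strassen presumably pays the recursion and intermediate-allocation cost all the way down. Below is a sketch on square list matrices. The `cutoff` parameter and all names are hypothetical; this is not `multStrassenMixed`'s actual implementation.

```haskell
import Data.List (transpose)

type Mat = [[Int]]

-- | Standard O(n^3) multiplication, used below the cutoff.
multNaive :: Mat -> Mat -> Mat
multNaive a b = [[sum (zipWith (*) r c) | c <- transpose b] | r <- a]

addM, subM :: Mat -> Mat -> Mat
addM = zipWith (zipWith (+))
subM = zipWith (zipWith (-))

-- | Split an n x n matrix (n even) into its four quadrants.
quads :: Mat -> (Mat, Mat, Mat, Mat)
quads m = (map (take h) top, map (drop h) top, map (take h) bot, map (drop h) bot)
  where
    h = length m `div` 2
    (top, bot) = splitAt h m

-- | Reassemble four quadrants into one matrix.
joinQ :: Mat -> Mat -> Mat -> Mat -> Mat
joinQ c11 c12 c21 c22 = zipWith (++) c11 c12 ++ zipWith (++) c21 c22

-- | Strassen with a fallback: at or below the cutoff, or when the size
-- is odd, use standard multiplication instead of recursing.
-- Both arguments are assumed to be square and of the same size.
strassenCutoff :: Int -> Mat -> Mat -> Mat
strassenCutoff cutoff a b
  | n <= cutoff || odd n = multNaive a b
  | otherwise = joinQ c11 c12 c21 c22
  where
    n = length a
    go = strassenCutoff cutoff
    (a11, a12, a21, a22) = quads a
    (b11, b12, b21, b22) = quads b
    -- Seven recursive products instead of eight.
    m1 = go (addM a11 a22) (addM b11 b22)
    m2 = go (addM a21 a22) b11
    m3 = go a11 (subM b12 b22)
    m4 = go a22 (subM b21 b11)
    m5 = go (addM a11 a12) b22
    m6 = go (subM a21 a11) (addM b11 b12)
    m7 = go (subM a12 a22) (addM b21 b22)
    c11 = m1 `addM` m4 `subM` m5 `addM` m7
    c12 = addM m3 m5
    c21 = addM m2 m4
    c22 = m1 `subM` m2 `addM` m3 `addM` m6
```

With a cutoff of 1 this degenerates into (nearly) pure Strassen, allocating many tiny intermediate matrices; a larger cutoff hands small blocks to the cheap standard loop.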

OlivierSohn · 2018-03-20T11:22:42Z

Updated benchmarks, with the new multiplication functions (the boxed and unboxed versions now have the same functionality) and without the Strassen benchmark (cf. #57):


Benchmark matrix-mult: RUNNING...
benchmarking mult10/Definition
time                 10.35 μs   (9.965 μs .. 10.71 μs)
                     0.990 R²   (0.986 R² .. 0.994 R²)
mean                 10.50 μs   (10.19 μs .. 10.84 μs)
std dev              1.102 μs   (883.5 ns .. 1.458 μs)
variance introduced by outliers: 87% (severely inflated)
             
benchmarking mult10/Definition U
time                 4.152 μs   (3.988 μs .. 4.304 μs)
                     0.988 R²   (0.983 R² .. 0.992 R²)
mean                 4.317 μs   (4.160 μs .. 4.473 μs)
std dev              542.1 ns   (463.5 ns .. 688.0 ns)
variance introduced by outliers: 92% (severely inflated)
             
benchmarking mult10/Definition 2
time                 12.60 μs   (12.20 μs .. 13.02 μs)
                     0.992 R²   (0.988 R² .. 0.996 R²)
mean                 12.45 μs   (12.21 μs .. 12.80 μs)
std dev              959.7 ns   (745.5 ns .. 1.419 μs)
variance introduced by outliers: 78% (severely inflated)
             
benchmarking mult10/Definition 2 U
time                 6.202 μs   (5.955 μs .. 6.439 μs)
                     0.992 R²   (0.989 R² .. 0.996 R²)
mean                 6.089 μs   (5.940 μs .. 6.252 μs)
std dev              509.1 ns   (416.7 ns .. 627.7 ns)
variance introduced by outliers: 82% (severely inflated)
             
benchmarking mult10/Strassen mixed
time                 11.43 μs   (11.21 μs .. 11.63 μs)
                     0.996 R²   (0.995 R² .. 0.998 R²)
mean                 11.34 μs   (11.10 μs .. 11.62 μs)
std dev              853.0 ns   (713.8 ns .. 1.166 μs)
variance introduced by outliers: 77% (severely inflated)
             
benchmarking mult10/Strassen mixed U
time                 5.617 μs   (5.447 μs .. 5.783 μs)
                     0.994 R²   (0.991 R² .. 0.997 R²)
mean                 5.541 μs   (5.394 μs .. 5.689 μs)
std dev              499.4 ns   (422.3 ns .. 623.6 ns)
variance introduced by outliers: 84% (severely inflated)
             
benchmarking mult25/Definition
time                 119.6 μs   (115.8 μs .. 123.6 μs)
                     0.991 R²   (0.984 R² .. 0.996 R²)
mean                 118.9 μs   (115.8 μs .. 122.3 μs)
std dev              10.52 μs   (8.329 μs .. 13.77 μs)
variance introduced by outliers: 77% (severely inflated)
             
benchmarking mult25/Definition U
time                 42.60 μs   (41.46 μs .. 43.90 μs)
                     0.993 R²   (0.988 R² .. 0.996 R²)
mean                 43.79 μs   (42.73 μs .. 44.87 μs)
std dev              3.820 μs   (2.998 μs .. 4.977 μs)
variance introduced by outliers: 80% (severely inflated)
             
benchmarking mult25/Definition 2
time                 138.6 μs   (135.4 μs .. 142.2 μs)
                     0.991 R²   (0.984 R² .. 0.995 R²)
mean                 141.2 μs   (137.0 μs .. 148.8 μs)
std dev              18.66 μs   (12.56 μs .. 33.16 μs)
variance introduced by outliers: 88% (severely inflated)
             
benchmarking mult25/Definition 2 U
time                 46.94 μs   (45.57 μs .. 48.63 μs)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 48.59 μs   (46.91 μs .. 50.79 μs)
std dev              6.704 μs   (4.839 μs .. 11.58 μs)
variance introduced by outliers: 90% (severely inflated)
             
benchmarking mult25/Strassen mixed
time                 137.3 μs   (133.1 μs .. 141.1 μs)
                     0.993 R²   (0.989 R² .. 0.996 R²)
mean                 136.4 μs   (133.0 μs .. 140.6 μs)
std dev              13.16 μs   (10.67 μs .. 16.62 μs)
variance introduced by outliers: 80% (severely inflated)
             
benchmarking mult25/Strassen mixed U
time                 44.84 μs   (43.81 μs .. 45.93 μs)
                     0.992 R²   (0.988 R² .. 0.996 R²)
mean                 45.21 μs   (43.96 μs .. 46.68 μs)
std dev              4.649 μs   (3.558 μs .. 6.531 μs)
variance introduced by outliers: 84% (severely inflated)
             
benchmarking mult100/Definition
time                 8.493 ms   (8.198 ms .. 8.779 ms)
                     0.990 R²   (0.983 R² .. 0.996 R²)
mean                 8.402 ms   (8.170 ms .. 8.672 ms)
std dev              713.1 μs   (544.1 μs .. 991.9 μs)
variance introduced by outliers: 47% (moderately inflated)
             
benchmarking mult100/Definition U
time                 2.282 ms   (2.197 ms .. 2.368 ms)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 2.312 ms   (2.257 ms .. 2.371 ms)
std dev              188.1 μs   (157.0 μs .. 235.7 μs)
variance introduced by outliers: 58% (severely inflated)
             
benchmarking mult100/Definition 2
time                 7.430 ms   (7.193 ms .. 7.661 ms)
                     0.988 R²   (0.977 R² .. 0.995 R²)
mean                 7.463 ms   (7.294 ms .. 7.688 ms)
std dev              533.3 μs   (412.2 μs .. 801.6 μs)
variance introduced by outliers: 41% (moderately inflated)
             
benchmarking mult100/Definition 2 U
time                 2.086 ms   (2.017 ms .. 2.148 ms)
                     0.991 R²   (0.987 R² .. 0.996 R²)
mean                 2.087 ms   (2.046 ms .. 2.132 ms)
std dev              151.2 μs   (126.0 μs .. 200.8 μs)
variance introduced by outliers: 53% (severely inflated)
             
benchmarking mult100/Strassen mixed
time                 6.802 ms   (6.578 ms .. 7.045 ms)
                     0.988 R²   (0.979 R² .. 0.996 R²)
mean                 7.103 ms   (6.929 ms .. 7.354 ms)
std dev              625.7 μs   (416.6 μs .. 973.0 μs)
variance introduced by outliers: 50% (moderately inflated)
             
benchmarking mult100/Strassen mixed U
time                 2.064 ms   (1.994 ms .. 2.145 ms)
                     0.983 R²   (0.969 R² .. 0.992 R²)
mean                 2.087 ms   (2.020 ms .. 2.184 ms)
std dev              282.1 μs   (211.7 μs .. 416.4 μs)
variance introduced by outliers: 80% (severely inflated)
             
benchmarking mult150/Definition
time                 28.55 ms   (26.45 ms .. 30.11 ms)
                     0.989 R²   (0.979 R² .. 0.997 R²)
mean                 30.38 ms   (29.14 ms .. 33.59 ms)
std dev              3.900 ms   (1.355 ms .. 6.454 ms)
variance introduced by outliers: 51% (severely inflated)
             
benchmarking mult150/Definition U
time                 7.540 ms   (7.257 ms .. 7.810 ms)
                     0.992 R²   (0.988 R² .. 0.996 R²)
mean                 8.016 ms   (7.788 ms .. 8.455 ms)
std dev              890.5 μs   (560.7 μs .. 1.521 ms)
variance introduced by outliers: 63% (severely inflated)
             
benchmarking mult150/Definition 2
time                 23.45 ms   (22.57 ms .. 24.38 ms)
                     0.992 R²   (0.984 R² .. 0.996 R²)
mean                 24.72 ms   (23.97 ms .. 25.73 ms)
std dev              2.049 ms   (1.462 ms .. 2.810 ms)
variance introduced by outliers: 35% (moderately inflated)
             
benchmarking mult150/Definition 2 U
time                 5.905 ms   (5.748 ms .. 6.038 ms)
                     0.994 R²   (0.989 R² .. 0.997 R²)
mean                 5.942 ms   (5.838 ms .. 6.076 ms)
std dev              359.4 μs   (257.5 μs .. 492.4 μs)
variance introduced by outliers: 35% (moderately inflated)
             
benchmarking mult150/Strassen mixed
time                 23.86 ms   (22.94 ms .. 25.05 ms)
                     0.986 R²   (0.967 R² .. 0.997 R²)
mean                 23.18 ms   (22.65 ms .. 24.10 ms)
std dev              1.550 ms   (1.083 ms .. 2.494 ms)
variance introduced by outliers: 28% (moderately inflated)
             
benchmarking mult150/Strassen mixed U
time                 5.906 ms   (5.661 ms .. 6.171 ms)
                     0.985 R²   (0.972 R² .. 0.994 R²)
mean                 5.908 ms   (5.789 ms .. 6.080 ms)
std dev              427.5 μs   (316.4 μs .. 569.1 μs)
variance introduced by outliers: 43% (moderately inflated)
             
benchmarking mult250/Definition
time                 151.8 ms   (136.4 ms .. 166.9 ms)
                     0.990 R²   (0.978 R² .. 0.999 R²)
mean                 152.7 ms   (148.6 ms .. 157.1 ms)
std dev              5.811 ms   (4.437 ms .. 7.218 ms)
variance introduced by outliers: 12% (moderately inflated)
             
benchmarking mult250/Definition U
time                 44.24 ms   (42.15 ms .. 46.50 ms)
                     0.992 R²   (0.981 R² .. 0.998 R²)
mean                 43.94 ms   (42.84 ms .. 45.76 ms)
std dev              2.660 ms   (1.545 ms .. 4.232 ms)
variance introduced by outliers: 20% (moderately inflated)
             
benchmarking mult250/Definition 2
time                 97.75 ms   (93.72 ms .. 101.0 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 109.3 ms   (105.1 ms .. 115.4 ms)
std dev              7.608 ms   (5.114 ms .. 10.40 ms)
variance introduced by outliers: 20% (moderately inflated)
             
benchmarking mult250/Definition 2 U
time                 25.68 ms   (23.91 ms .. 27.46 ms)
                     0.982 R²   (0.970 R² .. 0.993 R²)
mean                 25.66 ms   (24.81 ms .. 27.00 ms)
std dev              2.271 ms   (1.555 ms .. 3.308 ms)
variance introduced by outliers: 35% (moderately inflated)
             
benchmarking mult250/Strassen mixed
time                 94.98 ms   (87.75 ms .. 100.5 ms)
                     0.994 R²   (0.987 R² .. 0.999 R²)
mean                 101.8 ms   (98.55 ms .. 107.2 ms)
std dev              6.323 ms   (3.278 ms .. 9.698 ms)
variance introduced by outliers: 20% (moderately inflated)
             
benchmarking mult250/Strassen mixed U
time                 24.21 ms   (22.97 ms .. 25.53 ms)
                     0.992 R²   (0.987 R² .. 0.997 R²)
mean                 24.62 ms   (23.94 ms .. 25.74 ms)
std dev              1.894 ms   (1.125 ms .. 3.224 ms)
variance introduced by outliers: 30% (moderately inflated)
             
benchmarking mult400/Definition
time                 669.4 ms   (116.9 ms .. NaN s)
                     0.868 R²   (0.734 R² .. 1.000 R²)
mean                 1.142 s    (924.0 ms .. 1.326 s)
std dev              294.8 ms   (136.0 as .. 319.3 ms)
variance introduced by outliers: 72% (severely inflated)
             
benchmarking mult400/Definition U
time                 213.7 ms   (202.1 ms .. 227.9 ms)
                     0.997 R²   (0.983 R² .. 1.000 R²)
mean                 227.7 ms   (221.8 ms .. 238.3 ms)
std dev              9.651 ms   (4.562 ms .. 13.13 ms)
variance introduced by outliers: 14% (moderately inflated)
             
benchmarking mult400/Definition 2
time                 444.5 ms   (129.8 ms .. 697.6 ms)
                     0.946 R²   (0.809 R² .. 1.000 R²)
mean                 629.4 ms   (597.4 ms .. 648.9 ms)
std dev              29.66 ms   (0.0 s .. 33.78 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult400/Definition 2 U
time                 119.5 ms   (113.2 ms .. 128.0 ms)
                     0.994 R²   (0.981 R² .. 1.000 R²)
mean                 113.6 ms   (106.3 ms .. 117.7 ms)
std dev              7.606 ms   (4.117 ms .. 10.95 ms)
variance introduced by outliers: 12% (moderately inflated)
             
benchmarking mult400/Strassen mixed
time                 675.9 ms   (591.4 ms .. NaN s)
                     0.997 R²   (0.990 R² .. 1.000 R²)
mean                 742.1 ms   (712.8 ms .. 767.8 ms)
std dev              41.68 ms   (136.0 as .. 44.47 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult400/Strassen mixed U
time                 105.2 ms   (99.46 ms .. 109.2 ms)
                     0.994 R²   (0.979 R² .. 0.999 R²)
mean                 112.1 ms   (107.8 ms .. 118.3 ms)
std dev              8.083 ms   (4.399 ms .. 12.80 ms)
variance introduced by outliers: 21% (moderately inflated)
             
benchmarking mult500/Definition
time                 2.379 s    (1.673 s .. 3.803 s)
                     0.956 R²   (0.934 R² .. 1.000 R²)
mean                 3.081 s    (2.667 s .. 3.384 s)
std dev              460.9 ms   (0.0 s .. 525.7 ms)
variance introduced by outliers: 46% (moderately inflated)
             
benchmarking mult500/Definition U
time                 408.3 ms   (329.1 ms .. 478.7 ms)
                     0.995 R²   (0.983 R² .. 1.000 R²)
mean                 408.9 ms   (394.0 ms .. 419.7 ms)
std dev              16.37 ms   (0.0 s .. 18.73 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult500/Definition 2
time                 1.010 s    (2.313 ms .. 1.623 s)
                     0.878 R²   (0.668 R² .. 1.000 R²)
mean                 1.302 s    (1.264 s .. 1.335 s)
std dev              54.27 ms   (0.0 s .. 57.81 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult500/Definition 2 U
time                 204.4 ms   (182.8 ms .. 232.4 ms)
                     0.991 R²   (0.984 R² .. 1.000 R²)
mean                 199.2 ms   (192.2 ms .. 206.5 ms)
std dev              9.200 ms   (7.203 ms .. 10.39 ms)
variance introduced by outliers: 14% (moderately inflated)
             
benchmarking mult500/Strassen mixed
time                 1.523 s    (1.291 s .. 1.805 s)
                     0.996 R²   (0.986 R² .. 1.000 R²)
mean                 1.455 s    (1.403 s .. 1.488 s)
std dev              50.40 ms   (0.0 s .. 58.07 ms)
variance introduced by outliers: 19% (moderately inflated)
             
benchmarking mult500/Strassen mixed U
time                 203.2 ms   (187.2 ms .. 221.0 ms)
                     0.996 R²   (0.990 R² .. 1.000 R²)
mean                 187.6 ms   (179.6 ms .. 195.5 ms)
std dev              10.78 ms   (5.760 ms .. 15.27 ms)
variance introduced by outliers: 14% (moderately inflated)
             
Benchmark matrix-mult: FINISH

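So the unboxed version is at least 2x faster for small matrices (e.g. 4.152 μs vs 10.35 μs for Definition at size 10) and up to roughly 8x faster for bigger ones (e.g. 203.2 ms vs 1.523 s for Strassen mixed at size 500), which is what we could reasonably expect.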
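The ratios behind that summary can be recomputed from the "time" estimates in the updated benchmark; a small sketch, where the selection of rows is arbitrary and all times are converted to milliseconds:

```haskell
-- | Speedup of the unboxed variant: boxed time / unboxed time.
speedup :: Double -> Double -> Double
speedup boxed unboxed = boxed / unboxed

-- | (benchmark, boxed time, unboxed time), in milliseconds, copied
-- from the "time" lines of the updated benchmark.
reported :: [(String, Double, Double)]
reported =
  [ ("mult10/Definition",      0.01035, 0.004152)
  , ("mult10/Strassen mixed",  0.01143, 0.005617)
  , ("mult100/Definition",     8.493,   2.282)
  , ("mult500/Definition",     2379,    408.3)
  , ("mult500/Strassen mixed", 1523,    203.2)
  ]

speedups :: [(String, Double)]
speedups = [(name, speedup b u) | (name, b, u) <- reported]
```

In GHCi, `mapM_ print speedups` gives roughly 2.5x, 2.0x, 3.7x, 5.8x and 7.5x. The boxed timings at sizes 400 and 500 have very wide confidence intervals (e.g. mult500/Definition 2: 2.313 ms .. 1.623 s, R² 0.878), so the largest ratios should be read as rough.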

OlivierSohn · 2018-03-20T11:29:36Z

So I guess what remains to be done now is adding some tests. As I mentioned in the edited first message, we can't use Integer because it is not unboxable (vector provides no Unbox instance for it).
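A possible starting point for such tests, using only `vector` and base: compare a flat unboxed multiplication over `Int` against a plain list definition on a few deterministic inputs. `multFlat`, `multList` and `agrees` are hypothetical helpers; a real test suite would more likely compare the unboxed type against the existing boxed `Matrix Int`, possibly with QuickCheck.

```haskell
import Data.List (transpose)
import qualified Data.Vector.Unboxed as U

-- Note: vector has no U.Unbox instance for Integer, so something like
--   U.fromList [1, 2, 3 :: Integer]
-- does not compile; these checks use Int instead.

-- | Hypothetical flat, row-major product of an (n x m) and an (m x p)
-- matrix stored in unboxed vectors.
multFlat :: (Num a, U.Unbox a)
         => Int -> Int -> Int -> U.Vector a -> U.Vector a -> U.Vector a
multFlat n m p a b = U.generate (n * p) entry
  where
    entry k =
      let (i, j) = k `divMod` p
      in sum [a U.! (i * m + l) * b U.! (l * p + j) | l <- [0 .. m - 1]]

-- | Reference definition on lists, used as the test oracle.
multList :: Num a => [[a]] -> [[a]] -> [[a]]
multList x y = [[sum (zipWith (*) r c) | c <- transpose y] | r <- x]

-- | One test case: the unboxed product must match the oracle.
agrees :: [[Int]] -> [[Int]] -> Bool
agrees x y = U.toList (multFlat n m p (flat x) (flat y)) == concat (multList x y)
  where
    flat = U.fromList . concat
    n = length x
    m = length y
    p = case y of
      (row : _) -> length row
      [] -> 0
```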

I probably won't have time to work on it in the near future, so contributions on that side are welcome :)


Magalame · 2019-05-18T04:48:45Z

@Daniel-Diaz this PR would be very interesting performance-wise. Any update? Thanks!

Add a matrix type using an Unboxed vector underneath.

806e343

Daniel-Diaz self-assigned this Mar 20, 2018

Daniel-Diaz self-requested a review March 20, 2018 06:26

Daniel-Diaz removed their assignment Mar 20, 2018

Add benchmarks for the unboxed version, Add SPECIALIZE pragmas in both unboxed and boxed versions to improve performances.

f87a85d

Commented out the strassen benchmarks.

4f491dd

OlivierSohn mentioned this pull request Mar 20, 2018

Remove (or fix memory usage of) strassen multiplication #57

Open

Add missing multiplication functions.

6ed4200

Add doc, use multStrassenMixed for (*) operator of Num instance in Unboxed version.

08e3217

Daniel-Diaz self-assigned this Mar 20, 2018
