Vectorize the cmath matrix operations #116
Conversation
So far I have only optimized the addition and constant multiplication.
VTune analyzer result with the detray integration test (RK propagation & constant B-field), after optimization: [profiler screenshot]
Is this single or double precision? You may not see that much of a speedup in double precision.
It's double. I will check single later.
Are you compiling with support for vector extensions enabled?
Maybe not. I thought a Release build would do that. Do you know how to enable it? Maybe this?
You can also ask the compiler to tell you about the vectorization optimizations it did.
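For example, with GCC or Clang something like the following enables the host CPU's vector extensions and prints remarks about which loops were vectorized (the flags shown are the common ones; `matrix_ops.cpp` is a placeholder file name, not a file in this repo):

```shell
# GCC: enable the host's vector ISA and report the loops it vectorized
g++ -O3 -march=native -fopt-info-vec-optimized -c matrix_ops.cpp

# Clang: equivalent vectorization remarks
clang++ -O3 -march=native -Rpass=loop-vectorize -c matrix_ops.cpp
```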
Another thing to consider is data alignment, maybe.
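As a sketch of the alignment point: a column of 4 doubles is exactly 32 bytes, so forcing 32-byte alignment allows the compiler to emit aligned AVX loads and stores. The `col4` type here is made up for illustration, not an algebra-plugins type:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical example: over-align the column storage to one AVX
// register (32 bytes) so vector loads need no unaligned-access path.
struct alignas(32) col4 {
  std::array<double, 4> v;  // 4 doubles = 32 bytes = one AVX register
};

static_assert(alignof(col4) == 32, "column storage should be 32-byte aligned");
```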
Intel Advisor report: [screenshot] As expected, it reports a bunch of non-vectorized operations in algebra-plugins/math/cmath/include/algebra/math/algorithms/matrix/inverse/partial_pivot_lud.hpp (line 67 at 9e684de).
Partial pivot LU decomposition is written in a totally anti-vectorized manner: each sub-vector of the matrix represents a column of the matrix, yet we do the Gaussian elimination row-wise. We will need to rewrite the partial pivot LUD later with column-wise Gaussian elimination.
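For reference, Gaussian elimination can be organized column-wise so that every inner loop walks down a contiguous column. A minimal sketch, assuming a plain array-of-columns layout and no pivoting (the real partial-pivot LUD additionally needs the row-interchange bookkeeping); the names `col_t`, `mat_t`, and `lu_columnwise` are illustrative, not the actual cmath code:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using col_t = std::array<double, 4>;
using mat_t = std::array<col_t, 4>;  // mat[j] is column j (column-major)

// In-place LU factorization without pivoting, written column-wise:
// both inner loops run down contiguous columns and can be vectorized.
// Afterwards the strict lower triangle holds L (unit diagonal implied)
// and the upper triangle holds U.
void lu_columnwise(mat_t& m) {
  for (std::size_t k = 0; k < 4; ++k) {
    const double inv_pivot = 1.0 / m[k][k];
    // Scale the sub-column below the pivot: contiguous access.
    for (std::size_t i = k + 1; i < 4; ++i) m[k][i] *= inv_pivot;
    // Rank-1 update of the trailing submatrix, one column at a time.
    for (std::size_t j = k + 1; j < 4; ++j) {
      const double m_kj = m[j][k];  // element U(k, j)
      for (std::size_t i = k + 1; i < 4; ++i)
        m[j][i] -= m[k][i] * m_kj;  // contiguous, vectorizable
    }
  }
}
```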
Force-pushed from de06258 to 536b045
I will skip the partial pivot LUD optimization in this PR as it is going to require a significant amount of time.
Force-pushed from 536b045 to e2f4951
Force-pushed from e2f4951 to 2360abe
Free lunch time
What's changed?
Just to help with understanding what's going on in the new matrix multiplication.
It should be noted that the cmath matrix is made of multiple arrays, each of which represents one column of the matrix.
Imagine that we multiply the 4x4 matrices A and B to obtain the matrix C. If we do the matrix multiplication the usual way, as in the following figure, we have to pick out the rows of A, which cannot be vectorized:
So I came up with the idea of doing the matrix multiplication column-wise, by multiplying a column of A by an element of B:
The following figure shows how a column of C is calculated in the new matrix multiplication.
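The column-wise scheme can be sketched as follows, assuming a plain array-of-columns layout (the names `col_t`, `mat_t`, and `multiply`, and the fixed 4x4 size, are illustrative, not the actual cmath types): column j of C is the sum over k of A's column k scaled by the scalar B(k, j).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using col_t = std::array<double, 4>;
using mat_t = std::array<col_t, 4>;  // mat[j] is column j (column-major)

// Column-wise multiplication: C(:, j) = sum_k A(:, k) * B(k, j).
// The inner loop runs down one contiguous column of A and C, so the
// compiler can vectorize it; no row of A is ever gathered.
mat_t multiply(const mat_t& A, const mat_t& B) {
  mat_t C{};  // zero-initialized
  for (std::size_t j = 0; j < 4; ++j) {
    for (std::size_t k = 0; k < 4; ++k) {
      const double b_kj = B[j][k];  // element B(k, j)
      for (std::size_t i = 0; i < 4; ++i)
        C[j][i] += A[k][i] * b_kj;  // contiguous, vectorizable
    }
  }
  return C;
}
```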