-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tiled_cholesky w/o openmp #27
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, I put some comments around coding styles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions:
-
Is this code taken/modified from lapack or blas examples or written from scratch. I see a few C style code and
_WIN32
relics that seem to be coming from somewhere else? -
Maybe I am not in the loop but I thought we were implementing a
stdexec
/omp
version of the cholesky? From the last meeting, I remember you had a design ready for at least 3x3 matrices that you were implementing? -
I don't get the point of using lapack or blas to build cholesky as those libraries already contain methods to do that directly: https://nalgebra.org/docs/user_guide/decompositions_and_lapack/
-
Does this code run in serial or parallel fashion? If serial, what's the advantage of doing this over the serial version we already have?
-
Generally, I am not in favor of adding so many external dependencies (lapack/blas) in this repository unless they are absolutely needed for functionality of our code.
Minor:
-
I think this PR has a conflict with main branch that will need to be resolved before merging.
-
I agree with @weilewei comments and want to emphasize that there needs to be documentation of what this code is doing, why and how is this different from the serial version?
Hi @mhaseeb123 Thanks for asking:
This code modified a lot mainly from Intel open source project and other online sources.
Yes, we were trying to implement a So according to its pseudo code in the paper, first, we have to have the tiled_cholesky decomposition algrithm for constructing task-based graph(for openmp). Once we have this tiled_cholesky decomposition algorithm works, then we can implementing 'omp' version by adding openmp syntax. And last time meeting, I had a design ready for at least 3x3 matrices works for this tiled_cholesky decomposition, i was working on verifying the correctness and generalize the code as much as possible.
As I mention in Question2, this implementation is for tiled_cholesky using functions in lapack or blas. As we have seen, the nvdia's cholesky implementation using their advanced library #include <experimental/linalg>, this is also a block cholesky algorithm which with fix 4 sub_blocks. I cannot find a way to do (task_based )'omp' if we only have previous implemented code.
currently, I am running the code with gpu.
Totally understand. I also worry about this external dependencies issue at the very beginning. Of course, we can discuss wether we need to continue 'omp' implementation based on this PR and using external dependencies (lapack/blas) as well or not. I appreciate any suggestions on task-based implementation for this algorithm |
Hi @hcq9102, thank you for taking the time to respond to questions.
|
Hi @mhaseeb123
Weile and I have discussed the possibility of using <experimental/linalg/..> in task-graph imp one month ago, but there is one function missing in <experimental/linalg/..>, so have to turn to lapack. And its not that good if mixed <experimental/linalg/..> and lapack.
I will try to do senders/receivers if possible, but there is external lib involved issue.
I have verified the correctness with lapack function which calculate cholesky decomposition directly. |
apps/choleskey/cholesky_tiled.cpp
Outdated
reference the implementation from Intel open source project, hetero-streams which will no longer be maintained by Intel. | ||
https://github.com/intel/hetero-streams/tree/master/ref_code/cholesky | ||
|
||
4. Additionally, include openblas library when build: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to install/setup OpenBLAS with CPM during CMake? Personally I don't want to add these extra manual steps of installing and export OPENBLAS_DIR=..
as they would make general usage as well as CI/CD really tedious. @weilewei
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhaseeb123 agreed not introducing extra flags. I just tried with CPMadd, and building openblas takes a while for the first-time building it.
@hcq9102 try the following in the CMakeLists files. Additionally, I request to add a compilation flag something like USE_OPENBLAS
to cmake build to separate out if we should use cpm to add openblas.
in main CmakeLists.txt
if (USE_OPENBLAS)
cpmaddpackage(NAME openblas GITHUB_REPOSITORY OpenMathLib/OpenBLAS GIT_TAG
v0.3.25)
endif()
in your application CMakeLists.txt:
target_link_libraries(myapp openblas)
General question: I noticed that we are accessing matrices via C-style indexing at several locations. For example: |
I believe the openblas library is C-style and cannot figure out mdspan. |
tiled_cholesky implementation.
BUILD NOTE: build openblas library as needed. build OpenBlas script
include openblas library when build or $ export OPENBLAS_DIR=/openblas/path
TODO: