CCSD_T2_8 DGEMM w/ CUBLAS #1027
…/nwchem into ccsd_t2_dgemm_cublas
Please don't modify nwchem.nw
Sorry, I thought I had already reset it. I'll do it now.
I am assuming all these changes are going to be rebased and squashed, right?
By the way, I think it would have been cleaner (for later reading and analysis) to keep all your efforts at prettifying the computer-generated TCE code in a separate set of commits.
I know it would be cleaner to do that, but I end up doing the pointless beautification in the middle of the serious changes because I am trying to make the code more readable for myself. At some point we should settle on a linting strategy and just run it on all of TCE, but I am still looking for a Fortran linter that's worth using.
And the reason ccsd_t2_7.F is included is that there is a second PR coming with GPU support in T2_7, but it isn't done yet. The T2_8 part is done, and I don't want the code to diverge too much before I have to merge again.
FYI: performance of alg 2 (best CPU) versus alg 8 (GPU) for 4(H2O) with cc-pVQZ, running on my desktop with an AMD 7950X (16x Zen4) and an RTX 4090. GPUs with better FP64 performance will obviously do better. These numbers are for the original CCSD code without NTS, but I will reproduce the changes in the ICSD version later.
Do I need to revert all the changes to comment syntax before merging? Otherwise, I think this is ready.
Yes, I agree.
This is part 1 of TCE CCSD GPU support.
This merely takes the no-transpose, DGEMM-only version of CCSD_T2_8 and runs the DGEMM on the GPU with CUBLAS. I use double-buffering to overlap communication and computation.
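The double-buffering idea can be illustrated with a small, sequential C sketch. It is not the PR's code: the tile sizes, `fetch_tile` (a hypothetical stand-in for fetching input tiles from distributed memory) and `dgemm_nn` (a naive column-major no-transpose DGEMM standing in for the cuBLAS call) are all assumptions for illustration. It only shows the ping-pong buffer structure: while tile `t` is multiplied out of one buffer, tile `t+1` is fetched into the other. In a real GPU implementation the fetch would be nonblocking (and host-to-device copies issued asynchronously on a separate stream) so that the two steps actually overlap.

```c
#include <assert.h>

/* Hypothetical tile sizes, kept tiny for illustration. */
#define NTILES 4
#define M 2
#define N 2
#define K 2

/* Hypothetical stand-in for fetching one pair of input tiles from
   distributed memory; here it just fills them deterministically. */
static void fetch_tile(int t, double *a, double *b)
{
    assert(t >= 0 && t < NTILES);
    for (int idx = 0; idx < M * K; idx++) a[idx] = (double)(t + 1);
    for (int idx = 0; idx < K * N; idx++) b[idx] = (double)(idx + 1);
}

/* Naive column-major C += A*B with no transposes; stands in for the
   DGEMM that would run on the GPU. */
static void dgemm_nn(const double *a, const double *b, double *c)
{
    for (int j = 0; j < N; j++)
        for (int p = 0; p < K; p++)
            for (int i = 0; i < M; i++)
                c[i + j * M] += a[i + p * M] * b[p + j * K];
}

/* Double-buffered loop: buffer `cur` holds the tile being computed on,
   buffer `nxt` receives the prefetched next tile. Both steps run one
   after the other here; with asynchronous transfers they would overlap. */
void double_buffered_contract(double c[M * N])
{
    double a[2][M * K], b[2][K * N];
    int cur = 0;

    fetch_tile(0, a[cur], b[cur]);             /* prologue: fill first buffer */
    for (int t = 0; t < NTILES; t++) {
        int nxt = 1 - cur;
        if (t + 1 < NTILES)
            fetch_tile(t + 1, a[nxt], b[nxt]); /* prefetch next tile */
        dgemm_nn(a[cur], b[cur], c);           /* compute on current tile */
        cur = nxt;                             /* swap buffers */
    }
}
```

The design point is that the only extra cost over a single-buffered loop is one additional tile of memory per operand, while the fetch latency can be hidden behind the DGEMM as long as computing a tile takes at least as long as fetching the next one.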
This is faster than CPU-only even on a GeForce desktop system (RTX 4090 vs AMD 7950X, both of which are the fastest of their kind).
I changed the comment syntax from `c` to `!` because it makes reading much easier in some IDEs.
Please squash on merge.