
CCSD_T2_8 DGEMM w/ CUBLAS #1027

Merged: 43 commits, merged Oct 16, 2024

@jeffhammond (Collaborator) commented Oct 15, 2024

This is part 1 of TCE CCSD GPU support.

This merely takes the no-transpose, DGEMM-only version of CCSD_T2_8 and runs the DGEMM on the GPU with CUBLAS. I use double-buffering to overlap communication and computation.
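The PR text describes the double-buffering pattern but does not show the code. As a rough illustration only, here is a minimal Python sketch: while the no-transpose DGEMM runs on tile k, the fetch of tile k+1 is already in flight. `fetch_tile`, `dgemm_nn_accumulate` and `contract_double_buffered` are hypothetical names, not NWChem functions; they stand in for the communication step and for `cublasDgemm`, and a one-worker thread pool stands in for the asynchronous transfers a real GPU code would issue.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tile(k, n):
    """Hypothetical stand-in for the communication step (fetching one block
    of amplitudes/integrals). Returns two n x n matrices A_k and B_k."""
    a = [[float(k + i + j) for j in range(n)] for i in range(n)]
    b = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    return a, b


def dgemm_nn_accumulate(c, a, b):
    """C += A * B with no transposes; a plain-Python stand-in for cublasDgemm."""
    n = len(a)
    for i in range(n):
        for p in range(n):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]


def contract_double_buffered(num_tiles, n):
    """Accumulate sum_k A_k * B_k while tile k+1 is fetched in the background."""
    c = [[0.0] * n for _ in range(n)]
    if num_tiles == 0:
        return c
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(fetch_tile, 0, n)  # fill the first buffer
        for k in range(num_tiles):
            a, b = pending.result()  # block until the current buffer is ready
            if k + 1 < num_tiles:
                # Start filling the other buffer before computing on this one.
                pending = comm.submit(fetch_tile, k + 1, n)
            dgemm_nn_accumulate(c, a, b)  # runs while tile k+1 is being fetched
    return c
```

For example, `contract_double_buffered(3, 2)` sums three tiles against an identity B and returns `[[3.0, 6.0], [6.0, 9.0]]`. Because both helpers are pure Python, the GIL prevents real overlap in this sketch; it only shows the control flow (prefetch the next buffer, wait for the current one, compute).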

This is faster than CPU-only even on a GeForce desktop system (RTX 4090 vs AMD 7950X, both of which are the fastest of their kind).

I changed the comment syntax from c to ! because it makes reading much easier in some IDEs.
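For readers less used to fixed-form Fortran: a `c`, `C` or `*` in column 1 marks a comment line, while `!` is accepted as a comment in both fixed-form and free-form source (Fortran 90 and later). The PR does not say how the conversion was done; the following is a hypothetical Python sketch of such a rewrite, assuming the input is fixed-form source, where statements never begin in column 1. `c_to_bang` and `convert_source` are illustrative names only.

```python
# Column-1 characters that mark a comment line in fixed-form Fortran.
FIXED_FORM_COMMENT_CHARS = ("c", "C", "*")


def c_to_bang(line):
    """Rewrite a fixed-form comment line so it starts with '!'.

    Assumes fixed-form input: a 'c', 'C' or '*' in column 1 makes the whole
    line a comment, so only that first character is replaced.
    """
    if line[:1] in FIXED_FORM_COMMENT_CHARS:
        return "!" + line[1:]
    return line


def convert_source(text):
    # Process line by line, keeping the original line endings intact.
    return "".join(c_to_bang(line) for line in text.splitlines(keepends=True))
```

Statement lines (which start in column 7 or later) and blank lines pass through unchanged.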

Please squash on merge.

@edoapra (Collaborator) left a comment

Please don't modify nwchem.nw

@jeffhammond (Collaborator, Author)

> Please don't modify nwchem.nw

Sorry, I thought I had reset it already. I'll do it now.

@edoapra (Collaborator) left a comment

I am assuming all these changes are going to be rebased and squashed, right?
By the way, I think it would have been cleaner (for later reading and analysis) to have in a separate set of commits all your efforts of prettifying the computer generated TCE code.

@jeffhammond (Collaborator, Author)

I know it would be cleaner to do that, but I end up doing the pointless beautification in the middle of the serious changes, because I am trying to make it more readable for myself.

At some point, we should figure out a linting strategy and just run it on all of TCE, but I am still looking for a Fortran linter that's worth using.

@jeffhammond (Collaborator, Author)

And the reason ccsd_t2_7.F is added is that there is a second PR coming with GPU support in T2_7, but it isn't done yet. The T2_8 part is done, and I don't want the code to diverge too much before I have to merge again.

@jeffhammond (Collaborator, Author)

FYI: performance of alg 2 (best CPU) versus alg 8 (GPU) for 4(H2O) with cc-pVQZ, running on my desktop with an AMD 7950X (16x Zen4) and an RTX 4090.

Performance on GPUs with better FP64 throughput should be better still.

These runs use the original CCSD code without NTS; I will reproduce the changes in the ICSD version later.

alg 2 (best CPU), times in seconds:

 CCSD iterations
 -----------------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall    V2*C2
 -----------------------------------------------------------------
    1   0.2963584119518  -1.1503632883695   222.2   222.2   126.8
    2   0.0616536970595  -1.1400056630314   222.2   222.2   126.9
    3   0.0200706083171  -1.1564661687722   221.5   221.5   126.3
    4   0.0077536091160  -1.1573100785270   221.8   221.8   126.6
 -----------------------------------------------------------------
 Iterations converged
 CCSD correlation energy / hartree =        -1.157310078526961
 CCSD total energy / hartree       =      -305.445420807308778

alg 8 (GPU), times in seconds:

 CCSD iterations
 -----------------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall    V2*C2
 -----------------------------------------------------------------
    1   0.2963584119518  -1.1503632883695   173.1   173.1    77.1
    2   0.0616536970595  -1.1400056630314   171.3   171.3    76.1
    3   0.0200706083171  -1.1564661687722   171.5   171.5    76.2
    4   0.0077536091160  -1.1573100785270   171.5   171.5    76.2
 -----------------------------------------------------------------
 Iterations converged
 CCSD correlation energy / hartree =        -1.157310078526962
 CCSD total energy / hartree       =      -305.445420807308778
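To put the two runs above in numbers, this short Python snippet (a worked check, not part of the PR) averages the four iterations of each run and compares them. The timings are copied from the output; the dictionary and function names are just illustrative labels.

```python
from statistics import mean

# Per-iteration timings (seconds) copied from the two CCSD runs:
# alg 2 (best CPU) first, alg 8 (GPU) second.
ALG2_CPU = {"wall": [222.2, 222.2, 221.5, 221.8], "v2c2": [126.8, 126.9, 126.3, 126.6]}
ALG8_GPU = {"wall": [173.1, 171.3, 171.5, 171.5], "v2c2": [77.1, 76.1, 76.2, 76.2]}


def speedup(before, after):
    """Ratio of mean times; a value > 1 means 'after' is faster."""
    return mean(before) / mean(after)


def rest_of_iteration(run):
    """Mean wall time per iteration that is not spent in the V2*C2 term."""
    return mean(run["wall"]) - mean(run["v2c2"])


if __name__ == "__main__":
    print(f"wall-time speedup: {speedup(ALG2_CPU['wall'], ALG8_GPU['wall']):.2f}x")
    print(f"V2*C2 speedup:     {speedup(ALG2_CPU['v2c2'], ALG8_GPU['v2c2']):.2f}x")
    print(f"rest of iteration: {rest_of_iteration(ALG2_CPU):.0f} s vs "
          f"{rest_of_iteration(ALG8_GPU):.0f} s")
```

This gives about 1.29x per iteration in wall time and about 1.66x for the V2*C2 column, while the remaining time per iteration is roughly 95 s in both runs, which suggests the whole gain comes from the V2*C2 term.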

@jeffhammond (Collaborator, Author)

> I am assuming all these changes are going to be rebased and squashed, right? By the way, I think it would have been cleaner (for later reading and analysis) to have in a separate set of commits all your efforts of prettifying the computer generated TCE code.

Do I need to remove all the changes to comment syntax before merge or not? Otherwise, I think this is ready.

@edoapra (Collaborator) commented Oct 16, 2024


> Do I need to remove all the changes to comment syntax before merge or not? Otherwise, I think this is ready.

Yes, I agree

@edoapra merged commit 0b49280 into nwchemgit:master on Oct 16, 2024
62 of 63 checks passed
@jeffhammond deleted the ccsd_t2_dgemm_cublas branch on October 16, 2024, 17:37