CCSD_T2_8 DGEMM w/ CUBLAS #1027

jeffhammond · 2024-10-15T07:56:11Z

This is part 1 of TCE CCSD GPU support.

This merely takes the no-transpose, DGEMM-only version of CCSD_T2_8 and runs the DGEMM on the GPU with CUBLAS. I use double-buffering to overlap communication and computation.
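For readers unfamiliar with the pattern, here is a minimal sketch of double-buffering in Python. It is not the code in this PR (which is Fortran calling CUBLAS); `fetch_tile`, `matmul_accumulate`, and `contract_double_buffered` are hypothetical names, the fetch stands in for the communication step, and the pure-Python multiply stands in for DGEMM. The point is only the loop structure: the fetch of tile `i + 1` is issued before the multiply on tile `i` begins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tile(tiles, i):
    """Hypothetical stand-in for communication: copy tile i into a fresh buffer."""
    a, b = tiles[i]
    return [row[:] for row in a], [row[:] for row in b]


def matmul_accumulate(c, a, b):
    """Hypothetical stand-in for DGEMM: C += A * B on lists of lists."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


def contract_double_buffered(tiles, m, n):
    """Accumulate sum_i A_i * B_i, prefetching tile i + 1 while tile i is multiplied."""
    c = [[0.0] * n for _ in range(m)]
    if not tiles:
        return c
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_tile, tiles, 0)  # fill the first buffer
        for i in range(len(tiles)):
            a, b = pending.result()  # wait until the current buffer is ready
            if i + 1 < len(tiles):
                # start filling the other buffer before computing
                pending = pool.submit(fetch_tile, tiles, i + 1)
            matmul_accumulate(c, a, b)  # runs while the next fetch is in flight
    return c


tiles = [
    ([[1, 2], [3, 4]], [[5, 6], [7, 8]]),
    ([[1, 0], [0, 1]], [[1, 1], [1, 1]]),
]
result = contract_double_buffered(tiles, 2, 2)
```

In this sketch only two buffers are ever live at once: the one being multiplied and the one being filled. The worker thread here only illustrates the ordering of the two steps and says nothing about how the overlap is implemented on the GPU side.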

This is faster than CPU-only even on a GeForce desktop system (RTX 4090 vs AMD 7950X, both of which are the fastest of their kind).

I changed the comment syntax from `c` to `!` because it makes the code much easier to read in some IDEs.

Please squash on merge.

edoapra

Please don't modify nwchem.nw

jeffhammond · 2024-10-15T16:55:09Z

> Please don't modify nwchem.nw

Sorry, I thought I had reset it already. I'll do it now.

edoapra

I am assuming all these changes are going to be rebased and squashed, right?
By the way, I think it would have been cleaner (for later reading and analysis) to put all your efforts at prettifying the computer-generated TCE code in a separate set of commits.

jeffhammond · 2024-10-15T17:04:59Z

I know it would be cleaner to do that, but I end up doing the pointless beautification in the middle of the serious changes, because I am trying to make it more readable for myself.

At some point, we should figure out a linting strategy and just run it on all of TCE, but I am still looking for a Fortran linter that's worth using.

jeffhammond · 2024-10-15T17:06:15Z

And the reason ccsd_t2_7.F is added is that there is a second PR coming with GPU support in T2_7, but it isn't done yet. The T2_8 part is done and I don't want the code to diverge too much before I have to merge again.

jeffhammond · 2024-10-15T18:34:44Z

FYI, here is the performance of alg 2 (best CPU) versus alg 8 (GPU) for 4(H2O) with cc-pVQZ, running on my desktop with an AMD 7950X (16x Zen4) and an RTX 4090.

The performance on GPUs with better FP64 performance will obviously be better.

These results are from the original CCSD code without NTS, but I will reproduce the changes in the ICSD version later.

 CCSD iterations
 -----------------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall    V2*C2
 -----------------------------------------------------------------
    1   0.2963584119518  -1.1503632883695   222.2   222.2   126.8
    2   0.0616536970595  -1.1400056630314   222.2   222.2   126.9
    3   0.0200706083171  -1.1564661687722   221.5   221.5   126.3
    4   0.0077536091160  -1.1573100785270   221.8   221.8   126.6
 -----------------------------------------------------------------
 Iterations converged
 CCSD correlation energy / hartree =        -1.157310078526961
 CCSD total energy / hartree       =      -305.445420807308778

 CCSD iterations
 -----------------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall    V2*C2
 -----------------------------------------------------------------
    1   0.2963584119518  -1.1503632883695   173.1   173.1    77.1
    2   0.0616536970595  -1.1400056630314   171.3   171.3    76.1
    3   0.0200706083171  -1.1564661687722   171.5   171.5    76.2
    4   0.0077536091160  -1.1573100785270   171.5   171.5    76.2
 -----------------------------------------------------------------
 Iterations converged
 CCSD correlation energy / hartree =        -1.157310078526962
 CCSD total energy / hartree       =      -305.445420807308778
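As a rough check on the numbers above, the sketch below averages the Wall and V2*C2 columns and takes their ratios. It assumes the first table is alg 2 (CPU) and the second is alg 8 (GPU), following the order in which they were introduced; the variable names are mine, and the timings are copied from the tables.

```python
# Per-iteration timings in seconds, copied from the two tables above.
# Assumption: first table = alg 2 (CPU), second table = alg 8 (GPU).
first_wall = [222.2, 222.2, 221.5, 221.8]
second_wall = [173.1, 171.3, 171.5, 171.5]
first_v2c2 = [126.8, 126.9, 126.3, 126.6]
second_v2c2 = [77.1, 76.1, 76.2, 76.2]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Ratio of mean times: values above 1 mean the second run is faster.
wall_speedup = mean(first_wall) / mean(second_wall)
v2c2_speedup = mean(first_v2c2) / mean(second_v2c2)

# Time per iteration spent outside the V2*C2 column in each run.
first_rest = mean(first_wall) - mean(first_v2c2)
second_rest = mean(second_wall) - mean(second_v2c2)

print(f"wall speedup:  {wall_speedup:.2f}x")
print(f"V2*C2 speedup: {v2c2_speedup:.2f}x")
print(f"time outside V2*C2: {first_rest:.1f} s vs {second_rest:.1f} s")
```

Under that assumption the whole iteration is about 1.29x faster and the V2*C2 column about 1.66x faster, while the time outside V2*C2 stays at roughly 95 s per iteration in both runs, so the difference between the two runs sits almost entirely in that column.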

jeffhammond · 2024-10-16T09:31:28Z

> I am assuming all these changes are going to be rebased and squashed, right? By the way, I think it would have been cleaner (for later reading and analysis) to have in a separate set of commits all your efforts of prettifying the computer generated TCE code.

Do I need to remove all the changes to comment syntax before merge or not? Otherwise, I think this is ready.

edoapra · 2024-10-16T17:14:08Z

> > I am assuming all these changes are going to be rebased and squashed, right? By the way, I think it would have been cleaner (for later reading and analysis) to have in a separate set of commits all your efforts of prettifying the computer generated TCE code.
>
> Do I need to remove all the changes to comment syntax before merge or not? Otherwise, I think this is ready.

Yes, I agree

jeffhammond added 30 commits October 5, 2023 21:10

this works

43ebd7f

add DGEMM version too

2fa9876

straight DGEMM works

8062b6d

remove the loops - DGEMM will always be better

e80bac5

removing loops

fc17d91

cleanup

b3ce4c1

do the pure DGEMM T2_8 in ICSD/NTS too

567fd44

move makefile include to the top so we can use its vars

0a1b133

still debugging

6de9fbe

so far, so good

e78fb8e

so far, so good

9f12e90

okay, it works correctly now

409ba37

okay, it works correctly now

360e2e7

now time for double buffering

355a2c8

clean up

74ab406

arrays are column major. wow.

34242be

n stream version using n=1

9335011

n stream version using n=1

2b18f0c

2 phase version is correct

12f47c9

comment syntax

7cf47de

move T2_7 into separate file

aec8780

fix non-F90 case

5d3d3da

move makefile include to the top so we can use its vars

d18f3c8

still debugging

00eed8c

so far, so good

3258902

so far, so good

59475ec

okay, it works correctly now

cc742c8

okay, it works correctly now

8889f36

now time for double buffering

4feff45

clean up

5805b06

jeffhammond added 10 commits October 15, 2024 10:40

arrays are column major. wow.

7113377

n stream version using n=1

a357f09

n stream version using n=1

c3ec460

2 phase version is correct

4faf45c

comment syntax

e38ca4a

move T2_7 into separate file

2abce96

reset generic input file

6affb37

allow to pass 64_to_32 CI check

319b548

Merge branch 'ccsd_t2_dgemm_cublas' of https://github.com/jeffhammond…

ced49c8

…/nwchem into ccsd_t2_dgemm_cublas

fix 64_to_32 check

ae928c7

edoapra requested changes Oct 15, 2024

jeffhammond added 2 commits October 15, 2024 19:55

reset

07beece

Merge branch 'ccsd_t2_dgemm_cublas' of https://github.com/jeffhammond…

707d974

…/nwchem into ccsd_t2_dgemm_cublas

edoapra requested changes Oct 15, 2024

fix 64_to_32 check again

b1af9f9

edoapra merged commit 0b49280 into nwchemgit:master Oct 16, 2024
62 of 63 checks passed

jeffhammond deleted the ccsd_t2_dgemm_cublas branch October 16, 2024 17:37
