port trczdf to GPU #3
base: dev_gpu
#ifdef _OPENACC
      subroutine myalloc_ZDF_gpu()
         allocate(zwd(jpk, dimen_jvzdf))
         zwd = huge(zwd(1,1))
         allocate(zws(jpk, dimen_jvzdf))
         zws = huge(zws(1,1))
         allocate(zwi(jpk, dimen_jvzdf))
         zwi = huge(zwi(1,1))
         allocate(zwx(jpk, dimen_jvzdf))
         zwx = huge(zwx(1,1))
         allocate(zwy(jpk, dimen_jvzdf))
         zwy = huge(zwy(1,1))
         allocate(zwz(jpk, dimen_jvzdf))
         zwz = huge(zwz(1,1))
         allocate(zwt(jpk, dimen_jvzdf))
         zwt = huge(zwt(1,1))

         !$acc enter data create(zwd,zwi,zwx,zws,zwz,zwy,zwt)
         !$acc update device(zwd,zwi,zwx,zws,zwz,zwy,zwt)
      END subroutine myalloc_ZDF_gpu
#endif
This adds a new subroutine that is called once in trczdf, after the value of dimen_jvzdf is known.
We could probably do the same for the CPU version to avoid duplication. The memory counter might also need to be adapted.
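A minimal sketch of what a single allocation routine shared by the CPU and GPU builds could look like. The module name `zdf_work`, the routine name `myalloc_zdf`, the `wp` kind, and passing `jpk` and `dimen_jvzdf` as arguments are assumptions made for illustration, not what the PR does. Only two of the seven work arrays are shown, and the memory counter is left out. Because `!$acc` lines are plain comments for a compiler without OpenACC, the same source builds either way, and `enter data copyin` stands in for the `create` plus `update device` pair.

```fortran
module zdf_work
   implicit none
   integer, parameter :: wp = kind(1.0d0)        ! assumed working precision
   real(wp), allocatable :: zwd(:,:), zws(:,:)   ! two of the work arrays
contains
   subroutine myalloc_zdf(jpk, dimen_jvzdf)
      ! Hypothetical unified routine: called once, after dimen_jvzdf is known.
      integer, intent(in) :: jpk, dimen_jvzdf
      allocate(zwd(jpk, dimen_jvzdf), zws(jpk, dimen_jvzdf))
      zwd = huge(zwd(1,1))
      zws = huge(zws(1,1))
      ! Ignored on CPU-only builds; on GPU builds allocates device copies
      ! and transfers the initial values in one step.
      !$acc enter data copyin(zwd, zws)
   end subroutine myalloc_zdf
end module zdf_work

program demo
   use zdf_work
   implicit none
   call myalloc_zdf(jpk=4, dimen_jvzdf=10)
   print *, 'zwd shape:', shape(zwd)
   !$acc exit data delete(zwd, zws)
   deallocate(zwd, zws)
end program demo
```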
!$acc enter data create( e1t(1:jpj,1:jpi), e2t(1:jpj,1:jpi), e3t(1:jpk,1:jpj,1:jpi) ) if(use_gpu)
!$acc enter data create( e1u(1:jpj,1:jpi), e2u(1:jpj,1:jpi), e3u(1:jpk,1:jpj,1:jpi) ) if(use_gpu)
!$acc enter data create( e1v(1:jpj,1:jpi), e2v(1:jpj,1:jpi), e3v(1:jpk,1:jpj,1:jpi) ) if(use_gpu)
!$acc enter data create( e3w(1:jpk,1:jpj,1:jpi) ) if(use_gpu)
!$acc enter data create( un(1:jpk,1:jpj,1:jpi), vn(1:jpk,1:jpj,1:jpi), wn(1:jpk,1:jpj,1:jpi) ) if(use_gpu)
It's not a good idea to declare these arrays here:
- they are allocated and deallocated later, which is a waste of time
- GPU allocation should be moved close to the CPU allocate as the port progresses
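To illustrate the second point, here is a small sketch of keeping the device allocation next to the host `allocate`, and the device release next to the host `deallocate`. The array sizes and the standalone program are assumptions for the example; only `e1t`, `jpi`, `jpj`, and `use_gpu` come from the diff.

```fortran
program pair_alloc
   implicit none
   integer, parameter :: jpi = 4, jpj = 3   ! illustrative sizes only
   logical :: use_gpu
   double precision, allocatable :: e1t(:,:)

   use_gpu = .false.                        ! set at run time in the real code

   ! Host allocation, immediately followed by the matching device allocation.
   allocate(e1t(jpj, jpi))
   !$acc enter data create(e1t) if(use_gpu)

   ! Fill on the host, then push the values to the device once.
   e1t = 1.0d0
   !$acc update device(e1t) if(use_gpu)

   print *, 'sum(e1t) =', sum(e1t)

   ! Device release sits right before the host deallocation.
   !$acc exit data delete(e1t) if(use_gpu)
   deallocate(e1t)
end program pair_alloc
```

With this pairing, each array has one device lifetime that matches its host lifetime, so there is no separate block of `enter data` directives to keep in sync.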
! NOTE: kernel is too big, should be split
!$acc parallel loop gang vector default(present) async vector_length(32)
We might want to think about clever ways to generate this kernel, as it seems quite big. The best performance on an A100 was obtained with a vector length of 32, which isn't very high.
DO jv = 1, dimen_jvzdf

   ji = jarr_zdf(2,jv)
   jj = jarr_zdf(1,jv)
   Aij = e1t(jj,ji) * e2t(jj,ji)

#ifdef _OPENACC
   ntx=jv
For the GPU version, we parallelize over the jv loop (dimen_jvzdf iterations).
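A reduced sketch of that pattern, assuming a toy grid and a simplified loop body: each `jv` iteration looks up its `(jj, ji)` pair in `jarr_zdf` and works independently, so the loop can be distributed across gangs and vector lanes. The array contents, the `area` output array, and the explicit data clauses are illustrative assumptions. The real kernel relies on `default(present)` and does far more work per iteration.

```fortran
program column_loop
   implicit none
   integer, parameter :: jpi = 4, jpj = 3          ! toy grid, assumed sizes
   integer, parameter :: dimen_jvzdf = jpi * jpj
   integer :: jarr_zdf(2, dimen_jvzdf)
   double precision :: e1t(jpj, jpi), e2t(jpj, jpi), area(dimen_jvzdf)
   integer :: ji, jj, jv

   e1t = 2.0d0
   e2t = 3.0d0

   ! Build the index table: row 1 holds jj, row 2 holds ji, as in the diff.
   jv = 0
   do ji = 1, jpi
      do jj = 1, jpj
         jv = jv + 1
         jarr_zdf(1, jv) = jj
         jarr_zdf(2, jv) = ji
      end do
   end do

   ! One independent iteration per jv; ji and jj are private to each one.
   !$acc parallel loop gang vector copyin(jarr_zdf, e1t, e2t) copyout(area) private(ji, jj)
   do jv = 1, dimen_jvzdf
      ji = jarr_zdf(2, jv)
      jj = jarr_zdf(1, jv)
      area(jv) = e1t(jj, ji) * e2t(jj, ji)
   end do

   print *, 'all areas equal 6:', all(area == 6.0d0)
end program column_loop
```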
No description provided.