Global interpolation without GSLIB #1542
base: develop
…tin/feature/global_interpolation_no_gslib
…ture/global_interpolation_no_gslib
@MartinKarp related to the bug with object init, we should probably have an issue where we list "offenders", so we can refactor them to an init routine.
I think this is more or less ready on the CPU. This is truly a "develop" PR in the sense that it is far from perfect. It does remove the gslib dependency, though. I think we can wait with merging this until I add functionality such as fast GPU support.
…ture/global_interpolation_no_gslib
!Isnt this a datarace?
!How do things like this ever get into the code...
No, this is not a race, since there's no parallelism in the loop (do concurrent != omp parallel do).
Maybe I understood incorrectly then, but to me there is a data dependency on u(sp(i)), or? I thought do concurrent specified that there are no dependencies.
Yes, that would be true, but we're working with the gathered data, so there are no duplicates. So do concurrent is needed to force vectorisation.
But there might be several values to unpack from one rank to the same u(sp(i)) right?
No, we are only communicating unique, gathered dofs, so from a single neighbour there will only be one place to update. The accumulation is needed to account for shared dofs received from other ranks, but for host-based MPI each neighbour has its own buffer, so there are no data dependencies in each unpack loop. (What can't be done is a multithreaded unpack without critical sections around the updates.)
The shorter array with received data is later scattered, which has the one-to-many update pattern.
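For reference, a minimal standalone sketch of the unpack and scatter pattern described above. Everything in it is hypothetical and not taken from the Neko sources: the names (sp1, sp2, buf1, buf2, dof_map), the data, and the assumption of two neighbours that each have their own receive buffer holding unique gathered indices.

```fortran
program unpack_sketch
  implicit none
  integer, parameter :: n_gathered = 4, n_local = 6
  ! Hypothetical index lists: within ONE neighbour's list every gathered
  ! index appears at most once, but index 2 is shared by both neighbours.
  integer :: sp1(3) = [1, 2, 4]
  integer :: sp2(2) = [2, 3]
  ! One receive buffer per neighbour (assumed, as for host-based MPI).
  real :: buf1(3) = [1.0, 2.0, 3.0]
  real :: buf2(2) = [10.0, 20.0]
  ! Hypothetical scatter map: local dof -> gathered dof (one-to-many).
  integer :: dof_map(n_local) = [1, 1, 2, 3, 4, 4]
  real :: u(n_gathered), v(n_local)
  integer :: i

  u = 0.0

  ! Unpack neighbour 1: indices are unique inside this loop, so no two
  ! iterations touch the same u(sp1(i)) and do concurrent is valid.
  do concurrent (i = 1:size(sp1))
     u(sp1(i)) = u(sp1(i)) + buf1(i)
  end do

  ! Unpack neighbour 2 in a separate loop, executed after the first.
  ! The shared index 2 is accumulated across loops, not within one.
  do concurrent (i = 1:size(sp2))
     u(sp2(i)) = u(sp2(i)) + buf2(i)
  end do

  ! Scatter: each gathered value is copied to all local dofs mapping to it.
  do concurrent (i = 1:n_local)
     v(i) = u(dof_map(i))
  end do

  ! u = [1, 12, 20, 3] and v = [1, 1, 12, 20, 3, 3]
  print '(4f6.1)', u
  print '(6f6.1)', v
end program unpack_sketch
```

If the two unpack loops were instead run by different threads at the same time, the update of u(2) would need protection, which matches the remark about critical sections.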
do concurrent (j = 1:this%send_dof(dst)%size())
do j = 1,this%send_dof(dst)%size()
Why would this need to be changed? It also prevents optimisation of the loop filling the buffer.
associate(send_data => this%send_buf(i)%data)
call MPI_Isend(send_data, size(send_data), &
!associate(send_data => this%send_buf(i)%data)
call MPI_Isend(this%send_buf(i)%data, this%send_dof(dst)%size(), &
Can we please keep this associate block, so that the code still works with NAG Fortran?
Yeah, this was just me trying to find why Isend was so slow for many messages on Dardel.
…ture/global_interpolation_no_gslib
…ture/global_interpolation_no_gslib
…ture/global_interpolation_no_gslib
….com:ExtremeFLOW/neko into martin/feature/global_interpolation_no_gslib
…ture/global_interpolation_no_gslib
…ture/global_interpolation_no_gslib
Global interpolation without gslib. Same functionality as previously, plus GPU support, but no need to build with gslib any more. Removes gslib entirely from the codebase. I am now happy enough with it that I think we can start trying to get it into develop. It will continue to be optimized during the spring.
Features
Todo before merge