New Velocity Potential and Stream Function Calculations #1072
base: develop
Conversation
@WenMeng-NOAA We have updated the computation of streamfunction and velocity potential for compatibility with UPP parallelization. The subroutine now uses a Poisson solver with a convergence test to reduce runtimes. A comparison of spectral and numerical results is here.
The Poisson solver works but is substantially slower than the gather --> stptranf --> scatter operation that solves the equation spectrally. The gather is much cheaper than it sounds because it is only TWO fields (U and V) at one level, not the several hundred such fields that comprise the state plus derivative variables. A gather of two fields takes about 0.06 seconds on Hera at GFS (high) resolution and is hard to even measure at CFS resolution. The relaxations take about 10 seconds on Hera for chi and psi together, and the spectral solver takes less than a second. For the GFS (high) resolution case the spectral solver takes 27 seconds while the Poisson solver takes 161 seconds; for the CFS case the spectral solver takes less than a second while the Poisson solver takes about three seconds.
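For readers skimming the thread, here is a minimal serial sketch of the relaxation approach being discussed (a five-point Jacobi sweep with a max-change convergence test). The subroutine name, array names, and tolerance are illustrative only, not the actual UPP_PHYSICS.f code, which also handles MPI subdomains and halo exchange:

```fortran
! Sketch only: Jacobi relaxation for del^2(psi) = f on one domain,
! with the grid-spacing factor assumed folded into rhs.
subroutine relax_poisson(psi, rhs, im, jm, maxiter, tol)
  implicit none
  integer, intent(in)    :: im, jm, maxiter
  real,    intent(in)    :: rhs(im,jm), tol
  real,    intent(inout) :: psi(im,jm)
  real    :: pold(im,jm), edif
  integer :: i, j, it

  do it = 1, maxiter
     pold = psi                       ! keep previous iterate
     edif = 0.
     do j = 2, jm-1
        do i = 2, im-1
           ! five-point stencil: average of neighbours minus scaled RHS
           psi(i,j) = 0.25*(pold(i-1,j)+pold(i+1,j)+pold(i,j-1)+pold(i,j+1)) - rhs(i,j)
           edif = max(edif, abs(psi(i,j)-pold(i,j)))
        enddo
     enddo
     if (edif < tol) exit             ! stop once the largest change is small
  enddo
end subroutine relax_poisson
```

The slow convergence noted in this thread is characteristic of simple point-relaxation schemes on large grids, which is why the spectral transform remains much faster.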
sorc/ncep_post.fd/COLLECT_LOC.f
Outdated
@@ -81,6 +84,9 @@ SUBROUTINE COLLECT_LOC ( A, B )
      deallocate(buff)
      deallocate(rbufs)

      tb=mpi_wtime()
@KarinaAsmar-NOAA Clean up the debugging code.
@GeorgeVandenberghe-NOAA Would you please clean up the debugging part of COLLECT_LOC.f? Let me know when done and I'll push it to this branch.
sorc/ncep_post.fd/COLLECT_LOC.f
Outdated
@@ -104,6 +110,8 @@ SUBROUTINE COLLECT_ALL ( A, B )
      real, dimension(im,jm), intent(out) :: b
      integer ierr,n
      real, allocatable :: rbufs(:)
      real*8 tb,ta
@KarinaAsmar-NOAA Clean up the debugging code in this routine.
Force-pushed from ad772e3 to f88bf42
@KarinaAsmar-NOAA Please update the Fortran code in the subroutine to the following format:
@DusanJovic-NOAA @junwang-noaa For your reference, I ran UFS-WM RT 'cpld_control_sfs_intel' on Hera:
The allreduce is numerically critical. What it is doing is checking the maximum difference between the previous value of psi/chi and the result at the end of the next iteration. If the allreduce is left out, this difference is only evaluated on the subdomain associated with MPI rank 0. The allreduce does a max operation on this error across ALL of the ranks. Removing the allreduce will result in a faster exit, since convergence is only checked on rank 0 and a "bad" rank's larger errors will not be removed by further iterations. The ROOT problem with this whole Poisson solver is the extremely slow convergence of all methods tried to date. Note that if we do get rid of the allreduce, rank 0 is one of the best ranks to test on because it is on a corner domain, which is likely where convergence IS slowest.
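For clarity, a sketch of the global convergence check being described; the subroutine name, variable names, and communicator are illustrative assumptions, not the exact UPP call:

```fortran
! Sketch only: the local maximum change on each rank is reduced with MPI_MAX
! so every rank sees the same global error and all ranks decide together
! whether another relaxation sweep is needed.
subroutine check_converged(edif_local, tol, converged)
  use mpi
  implicit none
  real,    intent(in)  :: edif_local   ! max |new - old| on this rank's subdomain
  real,    intent(in)  :: tol
  logical, intent(out) :: converged
  real    :: edif_global
  integer :: ierr

  call mpi_allreduce(edif_local, edif_global, 1, MPI_REAL, MPI_MAX, &
                     MPI_COMM_WORLD, ierr)
  converged = (edif_global < tol)      ! identical decision on every rank
end subroutine check_converged
```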
@GeorgeVandenberghe-NOAA It seems that the mpi_allreduce call causes the inline post runs to crash (@WenMeng-NOAA, please confirm whether the inline post keeps crashing after removal of the allreduce). Removing it does not seem to affect the SFS results (see pptx).
A crash from an allreduce of a single-word variable on all tasks just shouldn't happen. If it does, there is something else wrong, perhaps a fundamental error in the MPI communicator specified in the allreduce, or a problem (again fundamental) in one of the other arguments. A real*8 variable with mpi_real specified as the type is one possibility I will check. But we're moving to a fixed number of iterations anyway and not testing for convergence, so the allreduce is then not needed.
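If the real*8/mpi_real mismatch mentioned above turns out to be the culprit, the corrected call would look something like this sketch (variable and communicator names are illustrative, not the actual UPP code):

```fortran
! Sketch only: an 8-byte buffer must be reduced with a matching MPI datatype.
! Passing MPI_REAL (4-byte) for a real*8 variable is a type mismatch that can
! crash or silently corrupt the reduction on some MPI implementations.
      real*8 edif_local, edif_global
      integer ierr

      call mpi_allreduce(edif_local, edif_global, 1, MPI_REAL8, MPI_MAX, &
                         MPI_COMM_WORLD, ierr)
! (MPI_DOUBLE_PRECISION is the equivalent choice for 'double precision' buffers)
```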
Latest comparisons of SFS and GFS with 100,000 iterations and no 'mpi_allreduce' are here:
sorc/ncep_post.fd/UPP_PHYSICS.f
Outdated
      do i=ista,iend
        if (j>1 .and. j<jm) then
          chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1))-dtmp(i,j)
          edif=psi(i,j)-pval
@KarinaAsmar-NOAA @JesseMeng-NOAA What is the 'pval' used for? I don't see it initialized or calculated?
should be
edif=chi(i,j)-ptmp(i,j)
@WenMeng-NOAA The 'pval' is used to evaluate errors across iterations. I missed adding it when restoring the allreduce; it is added now.
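For reference, a sketch of how that per-point error term would sit inside the chi update loop. Variable names follow the snippet above, but this is illustrative rather than the final committed code, and `emax` is a hypothetical accumulator for the sweep-wide maximum:

```fortran
! Sketch only: keep the previous-iteration value, update the point, then
! record the change; the sweep-wide maximum feeds the convergence test
! (and the allreduce discussed above).
          pval     = chi(i,j)          ! value from the previous iteration
          chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1)) - dtmp(i,j)
          edif     = chi(i,j) - pval   ! change at this grid point
          emax     = max(emax, abs(edif))   ! largest change this sweep
```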
Here are the runtime tests for the offline post:
Here are the runtime tests for the inline post with the UFS RT 'cpld_control_sfs' on Hera:
@DusanJovic-NOAA and @junwang-noaa Please let me know your comments on this PR.
sorc/ncep_post.fd/UPP_PHYSICS.f
Outdated
          exit
        endif
      enddo ! end of jjk loop for chi
      tc=mpi_wtime()
@KarinaAsmar-NOAA Comment out the debug code from line 4967 to 4969.
sorc/ncep_post.fd/UPP_PHYSICS.f
Outdated
      enddo ! end of jjk loop for psi
!
      chi=0.
      tb=mpi_wtime()
@KarinaAsmar-NOAA Comment out this line.
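In practice the change is simply to disable the timer line shown in the diff above, e.g.:

```fortran
!     tb=mpi_wtime()      ! debug timer, commented out per review
```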
sorc/ncep_post.fd/UPP_PHYSICS.f
Outdated
!
!     poisson solver for psi and chi
      psi=0.
      ta=mpi_wtime()
@KarinaAsmar-NOAA Comment out this line.
This PR adds the CPC-requested streamfunction and velocity potential at 200 mb to SFS. It is meant to resolve Issue #902 and address the runtime issues from the previous PR #951. Job scripts used for testing are on WCOSS2: /lfs/h2/emc/vpppg/noscrub/karina.asmar/vpot_strm/UPP (submit_run_gfsv16_wcoss2.sh and submit_run_sfs_wcoss2.sh).