
New Velocity Potential and Stream Function Calculations #1072

Open
wants to merge 38 commits into base: develop
Conversation

KarinaAsmar-NOAA (Contributor)

This PR adds the CPC-requested streamfunction and velocity potential at 200 mb to SFS. It is meant to resolve Issue #902 and to address the runtime issues from the previous PR #951. Job scripts used for testing are on WCOSS2: /lfs/h2/emc/vpppg/noscrub/karina.asmar/vpot_strm/UPP (submit_run_gfsv16_wcoss2.sh and submit_run_sfs_wcoss2.sh).

@KarinaAsmar-NOAA (Contributor, Author)

@WenMeng-NOAA We have updated the computation of streamfunction and velocity potential for compatibility with UPP parallelization. The subroutine now uses a Poisson solver with a convergence tester to reduce runtimes. A comparison of spectral and numerical results is here.
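A minimal sketch of the kind of relaxation-with-convergence-tester being described, assuming placeholder names (psi, rhs, tol, maxit) rather than the PR's actual variables, and assuming rhs already carries the 0.25*dx**2 scaling of the vorticity/divergence source term:

      ! Jacobi-style sweep for del**2(psi) = source, exiting early once the
      ! largest point-wise update falls below the tolerance.
      subroutine relax_poisson(psi, rhs, im, jm, maxit, tol)
        implicit none
        integer, intent(in)    :: im, jm, maxit
        real,    intent(in)    :: tol, rhs(im,jm)
        real,    intent(inout) :: psi(im,jm)
        real                   :: ptmp(im,jm), maxdif
        integer                :: i, j, it
        ptmp = psi
        do it = 1, maxit
          maxdif = 0.
          do j = 2, jm-1
            do i = 2, im-1
              psi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1)) - rhs(i,j)
              maxdif   = max(maxdif, abs(psi(i,j)-ptmp(i,j)))
            end do
          end do
          if (maxdif < tol) exit   ! convergence tester: stop once updates are small enough
          ptmp = psi               ! carry this iterate into the next sweep
        end do
      end subroutine relax_poisson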

@GeorgeVandenberghe-NOAA (Contributor)

The Poisson solver works but is substantially slower than the gather --> stptranf --> scatter approach of solving the equation spectrally. The gather is much cheaper than it sounds because it is TWO fields (U and V) at one level, not the several hundred such fields that comprise the state plus derived variables. A gather of two fields takes about 0.06 seconds on Hera at GFS (high) resolution and is hard to even measure at CFS resolution. The relaxations take about 10 seconds on Hera for chi and psi together, versus less than a second for the spectral solver. The spectral solver takes 27 seconds for the GFS (high) resolution case while the Poisson solver takes 161 seconds; the spectral solver takes less than a second for the CFS case while the Poisson solver takes about three seconds.
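For comparison, the gather --> spectral solve --> scatter path being described would look roughly like the fragment below. This is illustrative only: u200, v200 and me are placeholder names, spectral_psichi is a hypothetical wrapper around the NCEPLIBS-sp transforms, and the final scatter is left as a comment; only collect_loc is taken from the PR itself.

      ! Gather the two 200 mb wind fields, solve spectrally on one rank,
      ! then scatter the results back to the distributed subdomains.
      real, allocatable :: ufull(:,:), vfull(:,:), psifull(:,:), chifull(:,:)
      allocate(ufull(im,jm), vfull(im,jm), psifull(im,jm), chifull(im,jm))
      call collect_loc(u200, ufull)      ! cheap: one field at one level
      call collect_loc(v200, vfull)
      if (me == 0) then
        call spectral_psichi(ufull, vfull, psifull, chifull)   ! hypothetical sp-based solver
      end if
      ! ...scatter psifull/chifull back to each rank's local subdomain...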

@@ -81,6 +84,9 @@ SUBROUTINE COLLECT_LOC ( A, B )
deallocate(buff)
deallocate(rbufs)

tb=mpi_wtime()
Collaborator

@KarinaAsmar-NOAA Clean up the debugging code.

Contributor Author

@GeorgeVandenberghe-NOAA Would you please clean up the debugging part of COLLECT_LOC.f? Let me know when done and I'll push it to this branch.

@@ -104,6 +110,8 @@ SUBROUTINE COLLECT_ALL ( A, B )
real, dimension(im,jm), intent(out) :: b
integer ierr,n
real, allocatable :: rbufs(:)
real*8 tb,ta
Collaborator

@KarinaAsmar-NOAA Clean up the debugging code in this routine.

@WenMeng-NOAA (Collaborator)

@KarinaAsmar-NOAA Please update the Fortran code in the subroutine to the following format (a short example follows this list):

  • all lowercase
  • indentation for loops, conditionals and blocks
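For instance, using the variable names from the PR's own loop (loop bounds here are placeholders), the requested style would look like:

      do j = jsta, jend
        do i = ista, iend
          if (j > 1 .and. j < jm) then
            chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1)) - dtmp(i,j)
          end if
        end do
      end do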

@WenMeng-NOAA (Collaborator)

@DusanJovic-NOAA @junwang-noaa For your reference, I ran the UFS-WM RT 'cpld_control_sfs_intel' on Hera:

  1. develop branch
    run directory: /home/Wen.Meng/stmp2/FV3_RT/rt_449231/cpld_control_sfs_intel
    inline post runtime: 5 seconds

  2. develop branch with UPP replaced by PR #1072, including velocity potential and stream function generation
    run directory: /home/Wen.Meng/stmp2/FV3_RT/rt_700773/cpld_control_sfs_intel
    inline post runtime: 8.7 seconds

@GeorgeVandenberghe-NOAA (Contributor)

GeorgeVandenberghe-NOAA commented Dec 17, 2024

The allreduce is numerically critical. What it is doing is checking for the maximum difference between the previous value of psi/chi and the result at the end of the next iteration. If the allreduce is left out, this difference is only evaluated on the subdomain associated with MPI rank 0. The allreduce does a max operation on this error on ALL of the ranks. Removing the allreduce will result in faster exit since convergence is only checked on rank 0 and a "bad" rank's larger errors will not be removed by further iterations. The ROOT problem with this whole poisson solver is the extremely slow convergence for all methods tried to date. Note, if we do get rid of the allreduce, rank 0 is one of the best ranks to test on because it's on a corner domain and likely where convergence IS slowest.
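In code terms, the global check being described amounts to something like the fragment below; the names are placeholders and mpi_comm_world stands in for whatever communicator the routine actually uses:

      ! Inside the iteration loop, after each rank finishes its local sweep:
      call mpi_allreduce(edif_local, edif_global, 1, mpi_real, mpi_max, &
                         mpi_comm_world, ierr)
      if (edif_global < tol) exit   ! every rank sees the same worst-case error and exits together
      ! Without the allreduce, the exit test would use only this rank's local error,
      ! so a rank that converges early could stop while other subdomains still carry
      ! large errors that further iterations would have removed.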

@KarinaAsmar-NOAA (Contributor, Author)


@GeorgeVandenberghe-NOAA It seems that the mpi_allreduce call causes the inline post runs to crash (@WenMeng-NOAA, please confirm whether the inline post keeps crashing after removal of the allreduce). Removing it does not seem to affect the SFS results (see pptx).


@KarinaAsmar-NOAA (Contributor, Author)

@WenMeng-NOAA Would you please run your tests with the latest changes? Let me know if anything fails and/or if runtimes are not acceptable.

> @KarinaAsmar-NOAA Your latest commit removed mpi_allreduce, which significantly improved the runtime of the offline post. May I know the reason for this change?

@WenMeng-NOAA I removed it thinking it could be the issue causing the inline post to crash, since it is a collective operation that requires the reduced values to be computed and distributed across all processes. We were using it to help the Poisson loops converge. After removing it, the addition of the OpenMP directives and the reduced number of loops kept the accuracy of the results and reduced runtimes. You can see the SFS result comparison in STRM_VPOT_Poisson_Results_v5.pptx. I haven't been able to test GFS on Hera with the spectral method, but once WCOSS2 is back up I can generate some results. If the inline post works for this test, then it was probably the mpi_allreduce that was causing the runtime/memory issue.

Latest comparisons of SFS and GFS with 100,000 iterations and no 'mpi_allreduce' here:
STRM_VPOT_Poisson_Results_v5.pptx
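In sketch form, the OpenMP directives mentioned above would look something like the sweep below (loop bounds and halo/boundary handling are placeholders; edif should be reset to zero before each sweep). Because each iteration reads only the previous iterate ptmp, the grid points are independent and the loop can be threaded with a max reduction on the local error:

!$omp parallel do private(i,j) reduction(max:edif)
      do j = jsta, jend
        do i = ista, iend
          chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1)) - dtmp(i,j)
          edif     = max(edif, abs(chi(i,j) - ptmp(i,j)))
        end do
      end do
!$omp end parallel do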

do i=ista,iend
if (j>1 .and. j<jm) then
chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1))-dtmp(i,j)
edif=psi(i,j)-pval
Collaborator

@KarinaAsmar-NOAA @JesseMeng-NOAA What is the 'pval' used for? I don't see it initialized or calculated?

@JesseMeng-NOAA (Contributor), Dec 27, 2024

should be
edif=chi(i,j)-ptmp(i,j)

Contributor Author

@WenMeng-NOAA The 'pval' is for evaluating errors across iterations. I missed adding it when restoring the allreduce; it is added now.
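Presumably the intended usage is along these lines (a sketch of the reply's description, not the committed code): pval saves the point's value from the previous iteration just before it is overwritten, so edif measures the size of the update fed to the convergence test.

      pval     = chi(i,j)         ! previous-iteration value, saved before the update
      chi(i,j) = 0.25*(ptmp(i-1,j)+ptmp(i+1,j)+ptmp(i,j-1)+ptmp(i,j+1)) - dtmp(i,j)
      edif     = chi(i,j) - pval  ! change produced by this sweep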

@WenMeng-NOAA (Collaborator)

Here are the runtime tests for the offline post:

GFS (C768), 120 tasks || develop: 142 seconds  || PR #1072: 188 seconds
GEFS (C384), 48 tasks || develop: 52 seconds   || PR #1072: 77 seconds
SFS (C96), 48 tasks   || develop: 3.86 seconds || PR #1072: 5.4 seconds

@WenMeng-NOAA (Collaborator)

Here are the runtime tests for the inline post with the UFS RT 'cpld_control_sfs' on Hera:

Without  velocity potential and stream function generation: 5 seconds
   run directory: /scratch1/NCEPDEV/stmp2/Wen.Meng/FV3_RT/rt_449231-develop/cpld_control_sfs_intel
With velocity potential and stream function generation: 9 seconds
   run directory: /scratch1/NCEPDEV/stmp2/Wen.Meng/FV3_RT/rt_2703606-sfs/cpld_control_sfs_intel

@DusanJovic-NOAA and @junwang-noaa Please let me know your comments on this PR.

parm/post_avblflds.xml — outdated review comments, resolved
exit
endif
enddo ! end of jjk loop for chi
tc=mpi_wtime()
Collaborator

@KarinaAsmar-NOAA Comment out the debug code from line 4967 to 4969.

enddo ! end of jjk loop for psi
!
chi=0.
tb=mpi_wtime()
Collaborator

@KarinaAsmar-NOAA Comment out this line.

!
! poisson solver for psi and chi
psi=0.
ta=mpi_wtime()
Collaborator

@KarinaAsmar-NOAA Comment out this line.

Labels: enhancement, Ready for commit queue, Ready for Review, SFSV1
Development

Successfully merging this pull request may close these issues.

Adapt the calculations of velocity potential and streamfunction in UPP
5 participants