Enable global-workflow to run C768C384 GSI on Gaea-C5 #2990

Draft
wants to merge 16 commits into
base: develop
Changes from 9 commits
48 changes: 45 additions & 3 deletions env/GAEA.env
@@ -15,8 +15,11 @@ export mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"
 export OMP_STACKSIZE=2048000
 export NTHSTACK=1024000000

-ulimit -s unlimited
-ulimit -a
+# Setting stacksize to unlimited on login nodes is prohibited
+if [[ -n "${SLURM_JOB_ID:-}" ]]; then
+    ulimit -s unlimited
+    ulimit -a
+fi

 # Calculate common variables
 # Check first if the dependent variables are set
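
The guard in this hunk keys on `SLURM_JOB_ID`, which Slurm sets inside a job and which is normally absent in a login shell. A minimal sketch of the same pattern follows; the helper name `raise_stack_limit_in_job` and the job ID value are illustrative and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): raise the stack limit only inside a Slurm job.
raise_stack_limit_in_job() {
    if [[ -n "${SLURM_JOB_ID:-}" ]]; then
        # Inside a job: try to lift the soft stack limit; report if the system refuses.
        ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit" >&2
        echo "in-job"
    else
        # Login node: leave the limits untouched.
        echo "login"
    fi
}

# Run each case in a subshell so neither the variable nor the limit leaks out.
login_result=$(unset SLURM_JOB_ID; raise_stack_limit_in_job)
job_result=$(SLURM_JOB_ID=12345; raise_stack_limit_in_job)
echo "${login_result} ${job_result}"
```

The `${SLURM_JOB_ID:-}` form keeps the test safe even when the script runs under `set -u`.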
@@ -71,7 +74,28 @@ elif [[ "${step}" = "sfcanl" ]]; then
     export NTHREADS_CYCLE=${threads_per_task:-14}
     export APRUN_CYCLE="${APRUN_default} --cpus-per-task=${NTHREADS_CYCLE}"

-elif [[ "${step}" = "fcst" ]]; then
+elif [[ "${step}" = "eobs" ]]; then
+
+    export MKL_NUM_THREADS=4
+    export MKL_CBWR=AUTO
+
+    export NTHREADS_GSI=${NTHREADSmax}
+    export APRUN_GSI="${APRUN_default} --cpus-per-task=${NTHREADS_GSI}"
+
+    export CFP_MP=${CFP_MP:-"YES"}
+    export USE_CFP=${USE_CFP:-"YES"}
+    export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"
+
+elif [[ "${step}" = "eupd" ]]; then
+
+    export NTHREADS_ENKF=${NTHREADSmax}
+    export APRUN_ENKF="${launcher} -n ${ntasks_enkf:-${ntasks}} --cpus-per-task=${NTHREADS_ENKF}"
+
+    export CFP_MP=${CFP_MP:-"YES"}
+    export USE_CFP=${USE_CFP:-"YES"}
+    export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"
+
+elif [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then

     (( nnodes = (ntasks+tasks_per_node-1)/tasks_per_node ))
     (( ufs_ntasks = nnodes*tasks_per_node ))
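
The two arithmetic lines that close this hunk round the task count up to a whole number of nodes using integer ceiling division. A small worked sketch, with made-up values for `ntasks` and `tasks_per_node` (the numbers are illustrative and not taken from the PR):

```shell
#!/usr/bin/env bash
# Illustrative values only; the real ones come from the workflow configuration.
ntasks=250
tasks_per_node=64

# Integer ceiling division: adding (divisor - 1) before dividing rounds up.
(( nnodes = (ntasks + tasks_per_node - 1) / tasks_per_node ))
# Total task slots when every allocated node is fully populated.
(( ufs_ntasks = nnodes * tasks_per_node ))

echo "nnodes=${nnodes} ufs_ntasks=${ufs_ntasks}"
# prints: nnodes=4 ufs_ntasks=256
```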
@@ -93,6 +117,24 @@ elif [[ "${step}" = "oceanice_products" ]]; then
     export NTHREADS_OCNICEPOST=${NTHREADS1}
     export APRUN_OCNICEPOST="${launcher} -n 1 --cpus-per-task=${NTHREADS_OCNICEPOST}"

+elif [[ "${step}" = "ecen" ]]; then
+
+    export NTHREADS_ECEN=${NTHREADSmax}
+    export APRUN_ECEN="${APRUN_default} --cpus-per-task=${NTHREADS_ECEN}"
+
+    export NTHREADS_CHGRES=${threads_per_task_chgres:-12}
+    [[ ${NTHREADS_CHGRES} -gt ${max_tasks_per_node} ]] && export NTHREADS_CHGRES=${max_tasks_per_node}
+    export APRUN_CHGRES="time"
+
+    export NTHREADS_CALCINC=${threads_per_task_calcinc:-1}
+    [[ ${NTHREADS_CALCINC} -gt ${max_threads_per_task} ]] && export NTHREADS_CALCINC=${max_threads_per_task}
+    export APRUN_CALCINC="${APRUN_default} --cpus-per-task=${NTHREADS_CALCINC}"
+
+elif [[ "${step}" = "epos" ]]; then
+
+    export NTHREADS_EPOS=${NTHREADSmax}
+    export APRUN_EPOS="${APRUN_default} --cpus-per-task=${NTHREADS_EPOS}"
+
 elif [[ "${step}" = "fit2obs" ]]; then

     export NTHREADS_FIT2OBS=${NTHREADS1}
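
The `ecen` block in this hunk takes a requested thread count, falls back to a default when none is set, and caps the result at a limit. A minimal sketch of that default-then-clamp pattern, using a hypothetical helper `clamp_threads` and an assumed limit of 8 (neither is part of the PR):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): cap a requested thread count at a limit.
clamp_threads() {
    local requested=$1 limit=$2
    if (( requested > limit )); then
        echo "${limit}"
    else
        echo "${requested}"
    fi
}

# Assumed limit for illustration; the real value is set elsewhere in the workflow.
max_threads_per_task=8

# A request above the limit is capped.
threads_per_task_calcinc=16
NTHREADS_CAPPED=$(clamp_threads "${threads_per_task_calcinc:-1}" "${max_threads_per_task}")

# With no request set, the default of 1 passes through unchanged.
unset threads_per_task_calcinc
NTHREADS_DEFAULT=$(clamp_threads "${threads_per_task_calcinc:-1}" "${max_threads_per_task}")

echo "capped=${NTHREADS_CAPPED} default=${NTHREADS_DEFAULT}"
# prints: capped=8 default=1
```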
60 changes: 59 additions & 1 deletion parm/config/gfs/config.resources.GAEA
@@ -4,7 +4,6 @@

 case ${step} in
   "prep")
-    # Run on two nodes (requires ~400GB total)
     tasks_per_node=7
     ;;

@@ -21,6 +20,65 @@ case ${step} in
     esac
     ;;

+  "eupd")
+    # update ntasks to 80 and threads_per_task to 20
+    case ${CASE} in
+      "C768")
+        export ntasks=80
+        export threads_per_task=20
+        ;;
+      *)
+        ;;
+    esac
+    export tasks_per_node=$(( max_tasks_per_node / threads_per_task ))
+    ;;
+
+  "analcalc")
+    # decrease tasks_per_node 127 to 64
+    case ${CASE} in
+      "C768")
+        export tasks_per_node=64
+        ;;
+      *)
+        ;;
+    esac
+    ;;
+
+  "upp")
+    # decrease tasks_per_node 120 to 60
+    case ${CASE} in
+      "C768")
+        export tasks_per_node=60
+        ;;
+      *)
+        ;;
+    esac
+    ;;
+
+  "fcst")
+    # increase WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_{GDAS,GFS}
+    case ${CASE} in
+      "C768")
+        export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=20
+        export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GFS=25
+        (( WRTTASK_PER_GROUP_PER_THREAD_GDAS = WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS * 6 ))
+        (( WRTTASK_PER_GROUP_PER_THREAD_GFS = WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GFS * 6 ))
+        export WRTTASK_PER_GROUP_PER_THREAD_GDAS
+        export WRTTASK_PER_GROUP_PER_THREAD_GFS
DavidHuber-NOAA marked this conversation as resolved.
+        (( ntasks_quilt_gdas = WRITE_GROUP_GDAS * WRTTASK_PER_GROUP_PER_THREAD_GDAS ))
+        (( ntasks_quilt_gfs = WRITE_GROUP_GFS * WRTTASK_PER_GROUP_PER_THREAD_GFS ))
+        export ntasks_quilt_gdas
+        export ntasks_quilt_gfs
Comment on lines +64 to +71
DavidHuber-NOAA (Contributor), Oct 10, 2024

@aerorahul Since this is duplicating calculations from config.ufs, I wonder if these should be moved from config.ufs to config.resources OR if we should allow the basic variables (i.e. WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_${RUN}) to be predefined in config.ufs. For instance, changing

export WRITE_GROUP_GDAS=2
export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=15
export WRITE_GROUP_GFS=4
export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GFS=20 #Note this should be 10 for WCOSS2

to

          export WRITE_GROUP_GDAS=${WRITE_GROUP_GDAS:-2}
          export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=${WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS:-15}
          export WRITE_GROUP_GFS=${WRITE_GROUP_GFS:-4}
          export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GFS=${WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GFS:-20}

Then, instead of re-sourcing config.resources, Alex would re-source config.ufs, then config.resources, while only defining WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_${RUN}. What do you think?
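
The second option relies on the shell's `${VAR:-default}` expansion: a value set before config.ufs is sourced wins, otherwise the default applies. A minimal sketch of that behavior, assuming a hypothetical function `set_write_tasks` that stands in for the relevant config.ufs lines; the factor of 6 and the derived variables follow the arithmetic shown in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the config.ufs lines quoted above (GDAS only).
set_write_tasks() {
    export WRITE_GROUP_GDAS=${WRITE_GROUP_GDAS:-2}
    export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=${WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS:-15}
    # Derived values: per-tile count times 6, then times the number of write groups.
    (( WRTTASK_PER_GROUP_PER_THREAD_GDAS = WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS * 6 ))
    (( ntasks_quilt_gdas = WRITE_GROUP_GDAS * WRTTASK_PER_GROUP_PER_THREAD_GDAS ))
    export WRTTASK_PER_GROUP_PER_THREAD_GDAS ntasks_quilt_gdas
}

# No override: the defaults apply (15 per tile, 2 write groups).
default_quilt=$(
    unset WRITE_GROUP_GDAS WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS
    set_write_tasks
    echo "${ntasks_quilt_gdas}"
)

# Machine-specific override set beforehand, as config.resources.GAEA would do.
override_quilt=$(
    unset WRITE_GROUP_GDAS
    WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=20
    set_write_tasks
    echo "${ntasks_quilt_gdas}"
)

echo "default=${default_quilt} override=${override_quilt}"
# prints: default=180 override=240
```

With this pattern the machine file only needs to define the per-tile value; the derived totals are computed in one place.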

Contributor Author

Morning @WalterKolczynski-NOAA @aerorahul. C5 is down, probably through tomorrow (Wed), but I just wanted to check whether you have had a chance to look at this. I think the less we have to calculate in config.resources.$MACH, the better. Thanks.

Contributor

I favor the second approach of allowing overrides, but want to hear from @aerorahul

Contributor

@DavidHuber-NOAA's recommendation makes sense.

Contributor Author

Thanks @WalterKolczynski-NOAA and @aerorahul. Hello @DavidHuber-NOAA. My hands are tied with the C5 OS upgrade, but I was looking through this and writing out all the steps. Everything looks good for your second option until the re-sourcing of config.ufs. In config.fcst, config.ufs is sourced via source "${EXPDIR}/config.ufs" ${string}, where string is configured based on which model components are used (ocn, ice, wave, etc.). To get the string option into config.ufs, I believe I should re-source config.fcst, which will in turn re-source config.ufs. Let me know your thoughts. Thanks.

Contributor

@DavidBurrows-NCO Yes, you are correct about sourcing config.fcst -- that's the way to go about it rather than config.ufs.

+        if [[ "${gaea_sourced_resources:-false}" == false ]]; then
+          export gaea_sourced_resources=true
+          source "${EXPDIR}/config.resources" "${step}"
+        fi
+        ;;
+      *)
+        ;;
+    esac
+    ;;
+
   *)
     ;;
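
The `gaea_sourced_resources` flag in the `fcst` case lets the machine file re-source config.resources once without looping forever. A minimal sketch of that guard, assuming config.resources itself sources the machine-specific file; the mock file contents and the `source_count` counter are invented for illustration.

```shell
#!/usr/bin/env bash
# Mock experiment directory; file names mirror the PR, contents are invented.
EXPDIR=$(mktemp -d)

cat > "${EXPDIR}/config.resources" << 'EOF'
# Count how often this file is sourced, then hand off to the machine file.
(( source_count += 1 ))
source "${EXPDIR}/config.resources.GAEA"
EOF

cat > "${EXPDIR}/config.resources.GAEA" << 'EOF'
# Re-source the parent exactly once; the flag stops the recursion.
if [[ "${gaea_sourced_resources:-false}" == false ]]; then
    export gaea_sourced_resources=true
    source "${EXPDIR}/config.resources"
fi
EOF

source_count=0
unset gaea_sourced_resources
source "${EXPDIR}/config.resources"
echo "config.resources sourced ${source_count} times"
# prints: config.resources sourced 2 times

rm -rf "${EXPDIR}"
```

Because the flag is exported, any child process that sources the same files would also skip the second pass.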
