-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce file size via 2-stream model initialization and DA cycling #5
Conversation
@junmeiban Thanks for detailed description in PR message! |
Hi @liujake, |
@junmeiban I think my comment about re-alignment was wrong. |
I have no strong opinion, but could changing the new namelist 'config_use_static' to 'config_use_2streams_ic' (initialization with two input streams) be better? |
The indentation changes are shown on github but not in the file, which was bothering me. I see now that the new lines you have added have leading spaces while the original lines have leading tabs. The usage of tabs versus spaces is not consistent in the |
I will change back to use tab key. |
I agree. “config_use_2streams_ic” is better than “config_use_static” which only emphasize “static” stream. |
Yes. I see indentation difference on github, not in file, and thought that is a github issue. So it is actually an issue of space vs. tab. |
@liujake Added. |
@junmeiban, thank you for the very helpful PR description. It's a great place for me to start to understand the changes and how to use them. I think this comment needs to be changed though:
The commit that is linked to is not actually in the v6 release, or at least it is not merged to either the Additionally, I would like for a similar PR to be issued to Did you build the bundle and run the ctests? Do we need any changes in other bundle repos, such as ctest yamls, streams.atmosphere, stream_list.atmosphere.*, etc..., to make this work? |
@jjguerrette I just built mpas-bundle with this feature branch successfully and will test hofx next. Junmei is testing it in cycling mode. Suggest we merge this merge once we confirm there is no effect to current restart cycling and we get comparable cycling results with 2-stream. ctest case in mpas-jedi repo can be added later after this model change is merged in. PR to the main MPAS model repo can also be done later (it is unlikely these changes will be able to be reviewed/merged quickly). We need to meet our DA needs first by merging this asap to this repo so everyone can benefit from it. |
I also made some edits in PR message. |
I want to make sure we are on the same page for the ctests, but in general agree. Ctests do not need to use the 2-stream setup, but they NEED to keep passing in order to be useful. Eventually, one or more ctests should use the 2-stream code just to make sure we are testing it. That does not need to be part of this PR. |
Tested on Cheyenne. All ctest passed |
All existing 48 ctests passed. |
… with "config_use_2streams_ic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment on grammar for a comment in the Registry. Does this new functionality work with restart input as well? If not, should include that caveat in the description of config_use_2streams_ic
.
src/core_atmosphere/Registry.xml
Outdated
|
||
<nml_option name="config_use_2streams_ic" type="logical" default_value="false" in_defaults="false" | ||
units="-" | ||
description="Whether use separate_init_stream or not" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change to:
description="Whether to use separate init and static streams. Otherwise a single init stream is used for all fields."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about change to :
“Whether to use static and input streams.”
I prefer to use “input” instead of “init”, because the stream name is “input”.
Input stream (or you call it init stream) does not include time-invariant fields (the mesh, terrain variables), we should use input stream with static stream.
This new functionality does not work with restart input. In skamaroc/MPAS-Model@096b5d3, restart stream does not include the time-invariant variables, but in this PR, I have not touched restart stream. This is why this PR has no impact on restart mode run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with input
. Thank you for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sorry to be very particular about this. I misunderstood your comment to only include the text, Whether to use static and input streams
. I think that is too short and not clear. For example, the descriptions of config_do_restart
and config_do_DAcycling
are much more clear because they speak to what is different about those initialization procedures. Maybe one sentence about the static
stream would suffice. In the PR description, you have Includes the mesh, some of sfc_input variables(landmask, shdmin, albedo12m, etc) and parameters for gravity wave drag over orography.
That information would be useful here as long as the terminology and variable names used in the description are consistent with the remainder of Registry.xml
.
This figure shows 6-h forecast from restart and 2-stream workflow cycling run verified against GFSANA and the results are comparable. Please see here fore more results :https://drive.google.com/drive/folders/1PqJZaxtUlQBtJQbfuKGYUWAe1og2oHj- |
The logic in mpas_atm_core.F and mpas_atm_core_interface.F would make more sense to me if it was like the following:
|
From the PR descriptions, my impression was that config_do_restart should be false for config_use_2streams_ic = true.
In mpas_atm_core_interface.F:
|
@jamiebresch, I haven't followed all the logic around. Does this mean we do not need the changes in |
Thank you @liujake for the explanation. I keep forgetting that an @junmeiban, thank you for your insight that the configuration option for 2-stream IC is not necessary. It took me getting away from this review to understand why this is the best option going forward. Following on to @jamiebresch's comment, the original commit (skamaroc@096b5d3) forced the use of the static stream for both restart and non-restart runs. It removed all static fields from both restart and init files. That gives us some indication of where the model developers are headed, but we will want to discuss more with them. Eliminating entirely the single-stream restart option, as that other commit did, can only happen after everyone in our group migrates to 2-stream (including me). Possibly we can set a goal for a follow-on PR to eliminate the single-stream I/O after our entire group has switched over. After the outstanding question about the land-use initialization is handled, I am ready to approve and move on. |
@jjguerrette As explained in the section 2 of PR message, landuse change is needed to keep mpasout-cycling DA's results comparable to restart-type cycling results for lower level T and Q. I was not aware of (skamaroc/MPAS-Model@096b5d3) branch also exclude static fields from restart stream, I think we do not need to worry about this now. |
Will one line change in line 256 of mpas_atmphys_landuse.F to |
My fundamental question is that what is the difference/impact on cycling between do_restart=true and do_restart=false. |
@jamiebresch, do you mean |
Ah @jjguerrette yes. |
@YaliWu0219 may see if that is a better way to achieve the same logic. I did not carefully look at the logic. |
I agree it sounds like a simpler fix, but I am not familiar with other 'landuse' variables involved after line 256. I was intended to keep |
@junmeiban @jjguerrette @byoung-joo At line 3169, 3173, 3174 of /glade/scratch/jban/pandac/JB01_VarBC_2stream_update/DA/2018041500/da.log, |
I think the most important thing is to make sure we have a consistent logic with the one previously extensively tested and ensure we can get the same results. Current change in 'landuse' code is a kind of 'hack' anyway to address the issue of worse lower level T/Q issue, may need some 'permanent' fix from model developer at some point. I think not worth time to revise the logic and make more tests. |
I think it's not a 2-stream issue. If we use restart, it also shows zero. But not sure what caused the zero value. |
I think we have a bug in the print method. It can not handle "integer" variable... |
Disregard my previous comment! I deleted it. Restart workflow requires no changes! |
I made a comparison with the logic suggested by @jamiebresch, which also requires adding the related surface emissivity variables to the mpasin and mpasout files in the Registry (see feature/use2stream_moreLandVars). There is a VERY small (if negligible) improvement for most surface variable backgrounds compared to GFS analyses (see here). Note that although the png files are labeled "0hr", they are actually 6hr forecasts. The differences are small, but in general adding the extra surface variables to the input/output streams (experiment labeled as I can extend the comparison to a longer period if desired. The above result is not surprising given the previous determination by @YaliWu0219 that the updated albedo values from the MPAS LSM that are included in the restart file ought to be included in the da-cycling files. The surface emissivity variables that are added in the |
Results look good to me. Then new logic is fine with me. |
Merging now. |
This PR consists of 4 changes: **1. Two-stream model initialization:** Currently, we use the restart file for MPAS-JEDI cycling. While we move to higher-resolution cycling experiments, we face the challenge of both memory usage and disk storage. This PR addresses the disk storage issue. The idea is: instead of using restart files, we move to use the file (init.nc type background/analysis file) that keeps only the necessary fields. Also split out static fields into a separate file. These v7-based changes are based off of a v6-based branch (skamaroc@096b5d3). **2. Allow comparable two-stream 'mpasout' cycling result with that of one-stream 'restart' cycling:** This PR also fixes the large T2m/Q2m and low-level T/Q error growth issue that shows up in 120-km cycling experiments with this 2-stream workflow, which is caused by the different initialization of land use fields when config_do_restart is true and false. After this change in mpas_atmphys_landuse.F, using this two-stream code gives comparable cycling results to the original restart cycling. **3. Add 4 more variables (pressure_base, pressure_p, u/v at cell center) in core_init_atmosphere/Registry.xml to ease init.nc file to be used for the model-space verification purpose.** **4. Commented out one line in mpas_init_atm_cases.F to fix the over-specification issue of sea ice.** Files changed: M src/core_atmosphere/Registry.xml M src/core_atmosphere/mpas_atm_core.F M src/core_atmosphere/mpas_atm_core_interface.F M src/core_atmosphere/physics/mpas_atmphys_landuse.F M src/core_init_atmosphere/Registry.xml M src/core_init_atmosphere/mpas_init_atm_cases.F **Details for two-stream model initialization and DA cycling:** - static stream: Includes the mesh, some of sfc_input variables(landmask, shdmin, albedo12m, etc) and parameters for gravity wave drag over orography. - da_state stream: Fields are specified in the MPAS-Atmosphere Registry. Includes fields needed for DA purposes (either analysis variables or fixed input needed for CRTM or other obs operators) - For cold start FC, both the static stream file and the input stream file should be set to the “init.nc” file produced by the init_atmosphere core; - For cycling run (FC), the input stream file should be the new da_state stream file that was previously written by the model and modified/updated in the DA cycle. (Both static and da_state stream need to be specified in the streams.atmosphere file) - For cycling run (DA): use static.nc to read in mesh fields (specified in the streams.atmosphere file) and set: config_do_restart = false - After implementing the capability, the size of the 120-km init.nc type background/analysis file is about 430M (double precision); ~50 variables (the size of original restart file is about 2GB). Memory usage also decreased when we separated the static stream: <img width="1256" alt="memory_usage" src="https://user-images.githubusercontent.com/23242703/90163085-dba06480-dd52-11ea-9d90-02ea7ffa5863.png"> - If users would like to use restart files in cycling DA, the only difference from before is in prepare “1stCycle_background” step, please see example: (pay attention to streams.atmosphere, namelist.atmosphere and run_fc2018041418.csh) script: /glade/work/jban/pandac/JB01_restart_1stCycle_background/run_fc2018041418.csh results:/glade/scratch/jban/pandac/JB01_restart_1stCycle_background - The modification has no impact on restart mode run. The following figure shows 6-h forecast from restart cycling run (current code: old; this PR :new) verified against GFSANA and there is no difference between them. ![JB01_restart-expmgfs_day0p25_NXTro_surface_pressure_RMS](https://user-images.githubusercontent.com/23242703/98064199-f690eb00-1e0e-11eb-8009-14070edeae57.png) - Tested on Cheyenne. All ctests passed.
This PR consists of 4 changes:
1. Two-stream model initialization:
Currently, we use the restart file for MPAS-JEDI cycling. While we move to higher-resolution cycling experiments, we face the challenge of both memory usage and disk storage. This PR addresses the disk storage issue.
The idea is: instead of using restart files, we move to use the file (init.nc type background/analysis file) that keeps only the necessary fields. Also split out static fields into a separate file. These v7-based changes are based off of a v6-based branch (skamaroc@096b5d3).
2. Allow comparable two-stream 'mpasout' cycling result with that of one-stream 'restart' cycling:
This PR also fixes the large T2m/Q2m and low-level T/Q error growth issue that shows up in 120-km cycling experiments with this 2-stream workflow, which is caused by the different initialization of land use fields when config_do_restart is true and false. After this change in mpas_atmphys_landuse.F, using this two-stream code gives comparable cycling results to the original restart cycling.
3. Add 4 more variables (pressure_base, pressure_p, u/v at cell center) in core_init_atmosphere/Registry.xml to ease init.nc file to be used for the model-space verification purpose.
4. Commented out one line in mpas_init_atm_cases.F to fix the over-specification issue of sea ice.
Files changed:
M src/core_atmosphere/Registry.xml
M src/core_atmosphere/mpas_atm_core.F
M src/core_atmosphere/mpas_atm_core_interface.F
M src/core_atmosphere/physics/mpas_atmphys_landuse.F
M src/core_init_atmosphere/Registry.xml
M src/core_init_atmosphere/mpas_init_atm_cases.F
Details for two-stream model initialization and DA cycling:
static stream: Includes the mesh, some of sfc_input variables(landmask, shdmin, albedo12m, etc) and parameters for gravity wave drag over orography.
da_state stream: Fields are specified in the MPAS-Atmosphere Registry. Includes fields needed for DA purposes (either analysis variables or fixed input needed for CRTM or other obs operators)
For cold start FC, both the static stream file and the input stream file should be set to the “init.nc” file produced by the init_atmosphere core;
For cycling run (FC), the input stream file should be the new da_state stream file that was previously written by the model and modified/updated in the DA cycle. (Both static and da_state stream need to be specified in the streams.atmosphere file)
For cycling run (DA): use static.nc to read in mesh fields (specified in the streams.atmosphere file) and set:
config_do_restart = false
After implementing the capability, the size of the 120-km init.nc type background/analysis file is about 430M (double precision); ~50 variables (the size of original restart file is about 2GB). Memory usage also decreased when we separated the static stream:
If users would like to use restart files in cycling DA, the only difference from before is in prepare “1stCycle_background” step, please see example: (pay attention to streams.atmosphere, namelist.atmosphere and run_fc2018041418.csh)
script: /glade/work/jban/pandac/JB01_restart_1stCycle_background/run_fc2018041418.csh
results:/glade/scratch/jban/pandac/JB01_restart_1stCycle_background
The modification has no impact on restart mode run. The following figure shows 6-h forecast from restart cycling run (current code: old; this PR :new) verified against GFSANA and there is no difference between them.
Tested on Cheyenne. All ctests passed.