Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDF/MOSAiC forcing options #500

Draft
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

davidclemenssewall
Copy link
Contributor

@davidclemenssewall davidclemenssewall commented Sep 24, 2024

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

  • Short (1 sentence) summary of your PR:
    Enable the ingestion of forcing data from the MOSAiC Expedition and provide more flexibility for ice and ocean initialization.
  • Developer(s):
    David Clemens-Sewall
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Icepack base_suite with baseline comparison to main branch:
    215 measured results of 215 total results
    215 of 215 tests PASSED
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on CICE or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
      • But we probably should add a standard case for the MOSAiC Expedition, to do so we'll need to add the forcing data to the Icepack standard
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
    The overall goal of these options added to the Icepack driver are to make it easier to use Icepack as a standalone tool forced with observations. These changes have been implemented specifically for the MOSAiC Expedition. But my hope is that they will be readily extendable to other field data (or reanalysis, etc), so that new datasets can simply be used in Icepack, rather than requiring code changes for every new forcing dataset.

Observational data presents the following complications:

  1. Missing values. Real observational datasets almost always have missing values (power loss, radiometer dome iced over, etc...). Furthermore, future versions of the observational datasets may fill some of these missing values (e.g., by reprocessing data that previously failed qc).
  2. Variable frequencies. Different data streams may be measured at different sampling frequencies. Data might be measured more often than the model's timestep (thus requiring aggregation) or less often (thus requiring interpolation).
  3. Different start and end time points. Different data streams may have different start or end times, and updated versions of datasets may extend the time domain.
  4. Merged observational datasets are a work-in-progress. The 3 challenges above all touch on the fact that for sea ice field campaigns, it can take years from the end of the campaign until a truly finalized observational dataset is available to force Icepack with (e.g., MOSAiC is 4 years and counting, SHEBA still does not have such a dataset). We still want to be able to use campaign data to support Icepack development during this time, without having to change the Icepack driver every time the forcing dataset is updated.

Additionally, for comparing with field data it is very useful to be able to easily adjust:

  1. The initial state of the sea ice and ocean mixed layer.
  2. Fixed values for the oceanic heat flux convergence (qdp) and sea surface salinity (sss).

To address these challenges the PR provides the following features:

  1. Precalculate the forcing values at each timestep on initialization (in init_forcing). The current paradigm for forcing data in Icepack is that when init_forcing is called, a subroutine for the specific dataset (e.g., atm_CFS) stores the forcing data in the *_data in essentially the same format that the raw data is present in, without timestamp information. It also hardcodes the maximum size that a forcing dataset can be to 8760. Then, at each timestep the get_forcing subroutine has a code block for each forcing dataset that contains the timestamp information and interpolates the forcing data to the given timestep. Under this paradigm, I could not figure out a graceful way to handle missing values, nor forcing data with higher sampling frequency than the Icepack timestep, nor changing start times of the forcing dataset. This PR adds a new option, precalc_forc, for use with the MOSAiC forcing. When precalc_forc = .true., the forcing data is aggregated/interpolated to the Icepack timestep in the call to init_forcing and stored the the *_data arrays. Then, the get_forcing subroutine merely indexes into the *_data arrays.
  2. Read forcing data from MDF-formatted netCDF files (Uttal et al. 2024; https://doi.org/10.5194/gmd-17-5225-2024, https://gitlab.com/mdf-makers/mdf-toolkit). I chose this format because its framework seemed well-aligned, but it is still in development so we can add variables to it that we need for the ice model (e.g., mixed layer depth).
  3. To handle variable frequencies and missing data, forcing data are first temporally-averaged for each timestep and then interpolated. For a give data variable (var_data), The MOSAiC_average subroutine takes the average of all forcing datapoints within each timestep +- 0.5 dt excluding missing values and stores the results in var_data. If there is no valid data within 0.5 dt of a given timestep (e.g., most timesteps if the model timestep is much smaller than the sampling period) then a missing value is placed in var_data(timestep). Then, the MOSAiC_interpolate subroutine linearly interpolates missing values in var_data.
  4. The option to explicitly set in the namelist the initial ice and snow thicknesses for the slab and ITD grid cells and the SST for the 3 ice grid cells.
  5. The option to explicitly set in the namelist fixed ocean mixed layer conditions qdp_fixed, sss_fixed, and hmix_fixed. These are overwritten by forcing data.

Outstanding issues:

  • Automatically detect cadence of the inputs and adjust accordingly (currently the code requires that the atmospheric forcing is at 1 minute cadence and that the oceanic forcing is at daily cadence)
  • It is not obvious to a user that setting the atmospheric forcing (atm_data_type = 'clim') can overwrite the oceanic heat flux convergence.
  • The atm_MOSAiC and ocn_MOSAiC subroutines are nearly identical except for the data variables. It would probably be cleaner to combine them into a single subroutine.
  • Perhaps input checking should be added to prevent users from setting precalc_forc = .true. with non MOSAiC forcings.
  • Update documentation to include:
    • How driver handles MDF formatted forcing
    • MOSAiC forcing
    • Initial ice and SST conditions in namelist
    • Fixed ocean mixed layer conditions in namelist
  • Add warning in log for if MOSAiC time domain includes leg 4-5 transition (observatory relocation)
  • Add MOSAiC test case
  • Add link to MOSAiC forcing data to wiki

davidclemenssewall and others added 28 commits May 8, 2023 13:55
Just updating for_igs to consortium main. Only conflict was in icedrv_init use section
@davidclemenssewall
Copy link
Contributor Author

@apcraig When I try running the base_suite on Windows Subsystem for Linux (-m conda -e linux), I fail the run stage for the conda_linux_smoke_col_1x1_alt03_debug_run1year and conda_linux_smoke_col_1x1_alt04_debug_run1year tests due to a floating-point exception. I've copied the relevant sections from the runlog below. Strangely, this error does not arise in the GH macos-testing. It also does not occur when I run these tests with the main branch locally on WSL. So it seems to be some combination of the platform and the code changes, although I cannot see how these code changes in the driver cause this error in the atmospheric stability. I was hoping you might have a suggestion on how I can fix this. Thanks!

Tail from the icepack runlog:

  Initial forcing data year =         2015
  Final   forcing data year =         2015
 Reading /root/icepack-dirs/input/Icepack_data/forcing/SHEBA/open_clos_lindsay.dat                                      


Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f05dd5e951f in ???
#1  0x557a0eb656c2 in compute_stability_parameter
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:992
#2  0x557a0eb67de2 in atmo_boundary_layer
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:276
#3  0x557a0eb65d98 in __icepack_atmo_MOD_icepack_atm_boundary
        at /root/code/CICE/Icepack/columnphysics/icepack_atmo.F90:954
#4  0x557a0eaf4c59 in __icedrv_step_MOD_ocean_mixed_layer
        at /root/code/CICE/Icepack/configuration/driver/icedrv_step.F90:1251
#5  0x557a0ea69c25 in coupling_prep
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:303
#6  0x557a0ea6d57b in __icedrv_runmod_MOD_ice_step
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:221
#7  0x557a0ea6d6dc in __icedrv_runmod_MOD_icedrv_run
        at /root/code/CICE/Icepack/configuration/driver/icedrv_RunMod.F90:57
#8  0x557a0ea6944d in icedrv
        at /root/code/CICE/Icepack/configuration/driver/icedrv_MAIN.F90:51
#9  0x557a0ea69753 in main
        at /root/code/CICE/Icepack/configuration/driver/icedrv_MAIN.F90:28

@davidclemenssewall
Copy link
Contributor Author

Fixed the trailing whitespace, thanks for the shell command!

I agree that we should have some tests with this. I think at minimum there should be a run with the 'atmmosaic' and 'ocnmosaic' initialized in November 2019 with namelist options to set the initial snow and ice thicknesses and run through July of 2020. I'm not sure if it makes sense to include some tests of different physics options with this forcing as well? I don't think any of the other forcing options provide daily updates to qdp, sss, and hmix so perhaps having the new congelation growth option would be nice to test.

The forcing datasets are currently archived at ADC: https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2GX44W6J. We expect to be updating the MOSAiC forcing and evaluation (or comparison) datasets for the next couple years at least. So I'm not sure if it makes sense to add it to the existing tarball, or add it as another row in the table on this page: https://github.com/CICE-Consortium/Icepack/wiki/Icepack-Input-Data. It is also a little bit different from the existing forcing datasets in that users would be encouraged to use it for scientific results as well as code testing. Perhaps we should have a meeting to discuss?

@apcraig
Copy link
Contributor

apcraig commented Sep 27, 2024

Thanks @davidclemenssewall.

If we are going to have some mosaic tests in the test suite, then we'll need to have a reasonably stable forcing dataset that we can install on all the test machines. We'll want to put it on zenodo and have it be part of the Icepack inputdata. Happy to have a broader discussion about how to best proceed.

@davidclemenssewall, if you could propose one (or a few) test cases that we could add to the test suite, that would be great. You can use multiple "options" including 'atmmosaic' and 'ocnmosaic' as needed and turn on the appropriate physics. You can create some set_nml if needed and add them to the base_suite.ts if you'd like. For our testing, we only really need enough test cases to cover the code productively. We do not need to cover all the various physics combinations that a user might find reasonable. We need to be somewhat clever to cover "everything" while still limiting the number of tests as much as possible.

@eclare108213
Copy link
Contributor

Outstanding issues:

  • Automatically detect cadence of the inputs and adjust accordingly (currently the code requires that the atmospheric forcing is at 1 minute cadence and that the oceanic forcing is at daily cadence)

If this is easy to implement, let's do it. Otherwise I'd rather merge the PR earlier and improve it later.

  • It is not obvious to a user that setting the atmospheric forcing (atm_data_type = 'clim') can overwrite the oceanic heat flux convergence.

This could be noted in the documentation and also as a warning in the diagnostics log file when namelist values are written out.

  • The atm_MOSAiC and ocn_MOSAiC subroutines are nearly identical except for the data variables. It would probably be cleaner to combine them into a single subroutine.

Your choice.

  • Perhaps input checking should be added to prevent users from setting precalc_forc = .true. with non MOSAiC forcings.

Definitely.

I also recommend adding the forcing to our datasets and creating a few options for testing, including one that represents a 'production' configuration. I will still recommend that our codes be used in coupled configurations to capture important feedbacks, but we can adjust our stern warnings in the Icepack docs to acknowledge the MOSAiC configuration as a tool to be used thoughtfully.

@dabail10
Copy link
Contributor

dabail10 commented Oct 9, 2024

HI all. Let's discuss this at the meeting tomorrow. I can understand that it would be nice to have this as a part of the Icepack forcing dataset on zenodo. However, the National Science Foundation requires the datasets be put somewhere like the Arctic Data Center where they do reside currently. This would mean duplicating the data on zenodo. Is there a way to have a "link" from zenodo to the ADC?

Copy link

@anton-seaice anton-seaice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this David! This will be super useful

As Elizabeth noted, it needs some documentation.

And when adding the link to the forcing data to the wiki, I think the DOI link is best:

e.g. https://doi.org/10.18739/A2GX44W6J

character(char_len), public :: &
atm_data_format, & ! 'bin'=binary or 'nc'=netcdf
ocn_data_format, & ! 'bin'=binary or 'nc'=netcdf
bgc_data_format, & ! 'bin'=binary or 'nc'=netcdf
atm_data_type, & ! 'default', 'clim', 'CFS'
ocn_data_type, & ! 'default', 'SHEBA'
atm_data_type, & ! 'default', 'clim', 'CFS', 'MOSAiC'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use a more general name than MOSAIC for the new data type - its supports any 'MDF' file right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions Anton! @eclare108213 and @apcraig, what would you think about making the atm_data_type and ocn_data_type for this 'MDF' instead of 'MOSAiC'? Then I would also propose that we make the default path for these forcing data: /icepack-dirs/input/Icepack_data/forcing/MDF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MDF works for me. But how will you distinguish between MOSAiC MDF data and some other kind of MDF data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only place where it matters to the code that this is MOSAiC MDF data is when we check if the simulation time includes the MOSAiC leg 4 to leg 5 ship repositioning. For that check, we can check whether the forcing filename contains the string 'MOSAiC'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah - thats an annoying quirk. I don't have a straightforward answer to that. This sounds like a different measurement site though? So should the experiment start again at that point with different inititial conditions ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have two different mosaic options. MOSAIC123 and MOSAIC45 for instance or some other name. These would be two separate forcing/ic options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea. I have not worked much with the leg 5 (August-September near the North Pole) data but I will inquire about how we could have an appropriate initial condition.

Based on this discussion, I think it would be cleanest if the data archive contained separate files for legs 1-4 (October-July along transpolar drift) and leg 5 (August-September near the North Pole). I will inquire within the MOSAiC Model Forcing Dataset working group to see whether others agree and if we can update the data archive.

Comment on lines +190 to +195
! fixed ocean mixed layer properties (are overwritten by forcing data)
real (kind=dbl_kind), public :: &
sss_fixed , & ! Sea surface salinity (PSU)
qdp_fixed , & ! Deep ocean heat flux (negative upward, W/m^2)
hmix_fixed ! Mixed layer depth (m)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to do this in a different PR, so its clear what the scope of the change is (i.e. its not related to MOSAIC forcing data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree that this would be best practice. But I admit that I'm loath to do so because the branch I originally made these modifications on is a bit stale and it doesn't feel like a productive use of time to sort through the merge conflicts and do all of the testing if the ultimate outcome for the code is the same. If others feel strongly about it I will try to get to it sometime next week. @apcraig @eclare108213 @dabail10 What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth the extra work at this point, to make a separate PR.

Comment on lines 1216 to 1218
call icedrv_system_abort(string=subname//&
' ERROR: only NetCDF input implemented for atm_MOSAiC', &
file=__FILE__,line=__LINE__)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clearer if you put this at line 1087, if:

if (atm_data_format != 'nc') then ... abort

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This will be fixed in the next push.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@davidclemenssewall
Copy link
Contributor Author

davidclemenssewall commented Oct 11, 2024

Outstanding issues:

  • Automatically detect cadence of the inputs and adjust accordingly (currently the code requires that the atmospheric forcing is at 1 minute cadence and that the oceanic forcing is at daily cadence)

If this is easy to implement, let's do it. Otherwise I'd rather merge the PR earlier and improve it later.

  • It is not obvious to a user that setting the atmospheric forcing (atm_data_type = 'clim') can overwrite the oceanic heat flux convergence.

This could be noted in the documentation and also as a warning in the diagnostics log file when namelist values are written out.

  • The atm_MOSAiC and ocn_MOSAiC subroutines are nearly identical except for the data variables. It would probably be cleaner to combine them into a single subroutine.

Your choice.

  • Perhaps input checking should be added to prevent users from setting precalc_forc = .true. with non MOSAiC forcings.

Definitely.

I also recommend adding the forcing to our datasets and creating a few options for testing, including one that represents a 'production' configuration. I will still recommend that our codes be used in coupled configurations to capture important feedbacks, but we can adjust our stern warnings in the Icepack docs to acknowledge the MOSAiC configuration as a tool to be used thoughtfully.

Latest commit addresses these with the exception of the warnings, documentation updates, and MOSAiC testing. Working on those currently...

@apcraig
Copy link
Contributor

apcraig commented Oct 13, 2024

Looks like this is going forward nicely. I am mostly comfortable with the proposals thus far. But I think it's a mistake to have a feature that works if "MOSAIC" is in the filename. Please don't implement something like that. We should create a new namelist option if we need to. For now, we can use MDF. If something comes along later that's MDF but not Mosaic, we'll create a new option for it called "BLAHBLAH" or "BBMDF" or something. Or if what we have now is specific to mosaic even though it's MDF, lets call it MOSAIC or MosaicMDF or something. If the implementation is general for MDF, lets use MDF. How general or specific is the current implementation, that should drive the name we pick for the namelist setting.

@davidclemenssewall
Copy link
Contributor Author

Looks like this is going forward nicely. I am mostly comfortable with the proposals thus far. But I think it's a mistake to have a feature that works if "MOSAIC" is in the filename. Please don't implement something like that. We should create a new namelist option if we need to. For now, we can use MDF. If something comes along later that's MDF but not Mosaic, we'll create a new option for it called "BLAHBLAH" or "BBMDF" or something. Or if what we have now is specific to mosaic even though it's MDF, lets call it MOSAIC or MosaicMDF or something. If the implementation is general for MDF, lets use MDF. How general or specific is the current implementation, that should drive the name we pick for the namelist setting.

I think that the right approach here is to break the MOSAiC forcing data into separate MDF files for legs 1-4 and leg 5. Then the forcing code can remain totally general for MDF, and we have two different set_env options for initializing the different legs as you suggested.

@davidclemenssewall davidclemenssewall changed the title Mosaic forcing clean MDF/MOSAiC forcing options Nov 11, 2024
@xcwang2019
Copy link

@davidclemenssewall @eclare108213 @dabail10 Sorry to reply the PR so late. Our group has constructed atmopheric and oceanic forcing files for Icepack using MOSAiC observations. The hourly forcing file is separated into two periods. The format is txt. Should I attach these files here or should I put it on zenodo and provide a link here? These forcing files were used on Icepack 1.1.0 and there is no code change involved. Thanks.

@eclare108213
Copy link
Contributor

@xcwang2019 Let's move discussion of your forcing data to issue #488.
@davidclemenssewall is this PR ready for review or are you still working on it?

@davidclemenssewall
Copy link
Contributor Author

@eclare108213 apologies for the delay here. I am waiting on some additional processed precipitation data that I would like to include in the next version of the MOSAiC forcing data. Once I have the updated dataset released I will request review here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants