
CUPiD Meeting: 2024‐01‐10

Katie Dagon edited this page Jan 10, 2024 · 7 revisions

January 10, 2024

Agenda

  1. Deep-dive into the current code base

  2. Discussion

    • Planned starting point for each component
    • How do time series generation, CMORization, & remapping fit in the CUPiD framework?
  3. Upcoming events:

Slides

Notes

  • Deep-dive into code: config.yml and run.py

    • What information do we need for the config file and how should it be organized?
    • Could take some information from previous CESM postprocessing tools
    • ploomber is the task manager
    • near future: rename notebooks to be clear e.g., atmosphere, ocean
    • config file has some information about the Jupyter Book component
    • but you are not limited to notebooks
    • what happens when ploomber gets interrupted? It may have some features for resuming a run
  • Planned starting point for each component

    • Update cupid_analysis.yml if you need additional Python packages (environment yml file)
    • Update config.yml
    • Add notebook templates and/or python scripts
    • Land: finding resources for getting started is a hurdle
    • Sea ice: started with the ADF yml file, updated it for sea ice variables and pointed it to the right directories, and got it to work through the first 2 steps (single variable timeseries and climatologies). TO DO: Python scripts to make the plots; updates here
    • Shared plotting routines: GeoCAT viz?
    • How to avoid duplicating effort especially when starting from very little (e.g., contour plots that apply to multiple components)
    • Atm/land share the same grid, could try to use ADF to plot land variables (right now ADF is specifying variables)
    • Land ice: start from ADF or use something else for netCDF file manipulation? Suggest modifying ADF code to generate timeseries first
    • TO DO: pull timeseries generation out of ADF so you don't need to run ADF to create it
    • Moving towards a single plotting method is not as essential if we have different ways of running things under a single framework (e.g., different notebooks) - could be a feature for advanced users
    • Ideal: a single CUPiD kernel
  • Time series generation, CMORization, remapping

    • Remapping: CAM-SE, CTSM grids, regionally-refined
    • Time series generation has two types: 1) single variable timeseries; 2) globally integrated timeseries - here we are referring to 1) but 2) is important for timeseries plots
    • Future meeting: does ADF timeseries generation work for other cases? Brian D./Nan have been talking to CISL about another method. Jesse: ADF is a Python wrapper around ncrcat. Brian D: Can you handle high res? Jesse: Have tested with 0.25deg, ~50 year runs. Parallelization is multiple ncrcat calls at once.
    • Can also consider time dimension subsets for ncrcat: could that be faster?
    • File format questions too: zarr, nczarr, kerchunk
    • Does CUPiD become "the tool" or "a tool" for timeseries generation? "The tool" seems more maintainable in the long term
    • Relevant to the scientific Python ecosystem: pulling in different packages
    • CESM: standalone repos for the components, community identifies CESM as a whole
    • For the community: can separate repos but consider feasibility of use
    • Different needs for timeseries generation for different runs (e.g., large ensembles, CMIP production runs)
    • Need the tooling to be offline for existing datasets
    • Would be nice to fold CMORization under this framework as well, not relying on single person
    • Question: Do you create a timeseries for every variable or just the ones you plot?
    • Thoughts on a GUI tool? Something was used previously in CMIP (Cylc or ecFlow? General workflow managers). Most cases will likely be run from the command line. Want to make it accessible to many users. During the CMIP run cycle, workflow managers will likely see more use. Sheri M. did some work here.
  • Plans for automated testing / CI/CD for CUPiD? When would this be set up?

    • Setting up GitHub Actions makes sense ASAP
    • yml linting, trailing whitespace checks
    • Have GitHub Actions run on every PR?
  • Project board is now set up - open to comments
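
As a rough illustration of the timeseries discussion above (a Python wrapper around ncrcat, parallelized as multiple ncrcat calls at once), here is a minimal sketch. It is not ADF's actual code: the function names, the output file naming, and the example history file names are all hypothetical, and it assumes NCO's ncrcat is available on PATH.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_ncrcat_command(variable, history_files, output_dir):
    """Return the ncrcat argument list for one single-variable timeseries.

    The output file name pattern is an assumption made for this sketch.
    """
    output_file = Path(output_dir) / f"{variable}.timeseries.nc"
    # -O overwrites an existing output file; -v restricts the output to the
    # named variable (ncrcat also keeps its associated coordinates).
    return ["ncrcat", "-O", "-v", variable,
            *[str(f) for f in history_files], str(output_file)]


def generate_timeseries(variables, history_files, output_dir, max_workers=4):
    """Run one ncrcat call per variable, several at a time."""
    if shutil.which("ncrcat") is None:
        raise RuntimeError("ncrcat (NCO) was not found on PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    commands = [build_ncrcat_command(v, history_files, output_dir)
                for v in variables]
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True),
                             commands))


if __name__ == "__main__":
    # Hypothetical history file names, used only to show the command layout.
    files = ["case.h0.0001-01.nc", "case.h0.0001-02.nc"]
    print(build_ncrcat_command("TS", files, "tseries"))
```

Keeping the command construction separate from the execution means the command layout can be inspected or tested on a machine without NCO installed, and a time-subset option could later be added in one place.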

Next steps

  • Figure out how to specify the config.yml file
  • Help all the component models get started (see: collaborative work time at ESDS event on 1/18)
  • Figure out what plot types are needed
  • Test ADF timeseries generation for other cases (e.g., high res)
  • Building momentum and excitement for contributing to CUPiD (e.g., getting this to work with a CESM3 test run)
  • Demonstrating the value of this approach through a "dirty plot" and making it work