CUPiD Kickoff Meeting
Welcome to CUPiD Kickoff Meeting Notes!
- All the CGD sections, CESM, ESDS, ESMF, GeoCAT
- Unified CESM diagnostics!
- Minimal diagnostics generation for CESM3 development
- Diagnostics output from all working groups
- Feature-complete API: run inside CIME and outside CESM
- Feature requests, github issues
- Port NCL code (GeoCAT)
- Time series generation
- Data compression
- Subset of CMORization
- Climatology generation
- Extensibility: Include outside packages
- CESM3 paper for PI control: publication quality figures, turns into paper introducing CESM 3 (PI control + historical)
- Diagnostics for model forcing
- Initialized prediction, large ensembles
- Zulip: #CESM-diagnostics for project discussion
- Google group: announcements and meeting invites
- Bi-weekly meetings starting in January 2024
- Google drive
- Github repository for code management
- Examples using NBscuid
- Projects containing the milestones/goals
- Issues defining tasks for each project, especially for bare bones deployments
- NBscuid: the infrastructure that is already running; an engine that runs a collection of notebooks or scripts, with inputs such as case details, etc.
- Demo is running ADF and MOM6 diagnostics
- `config.yml` file: specifies where to get the notebooks, which notebooks to run, and the Jupyter Book config. Template notebooks live in a specified directory; the engine makes a copy of each notebook, inserts parameters (e.g., case name), and runs it (see the hedged sketch after these demo notes).
- `computed_notebooks` directory: engine-created notebooks for ADF and surface (MOM6); Python scripts can also be configured to run.
- Sharing the notebooks uses Jupyter Book via HTML files; these can then be copied to CGD machines for online browsing (similar to ADF).
- Is it possible to generate images only for the HTML (no code)? Yes: could generate only PNG images and then load them into a separate Jupyter Book with some Markdown text for explanation. Could also hide notebook cells in the Jupyter Book presentation.
- What does ADF do? Raw HTML.
- Would it be problematic to include all the code used to generate the diagnostics?
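A minimal sketch of the notebook-running workflow described in the `config.yml` and `computed_notebooks` notes above, assuming papermill-style parameterized execution; the config keys, directory names, and the `run_notebooks` helper are illustrative assumptions, not NBscuid's actual schema or API.

```python
# Illustrative sketch only: config keys and layout are assumptions, not NBscuid's schema.
import pathlib
import yaml
import papermill as pm  # parameterized notebook execution

def run_notebooks(config_path="config.yml"):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    template_dir = pathlib.Path(config["template_dir"])      # where template notebooks live
    output_dir = pathlib.Path(config["computed_notebooks"])  # where executed copies go
    output_dir.mkdir(parents=True, exist_ok=True)

    for nb_name, params in config["notebooks"].items():
        # Copy the template, inject parameters (e.g., case name), and run it.
        pm.execute_notebook(
            template_dir / f"{nb_name}.ipynb",
            output_dir / f"{nb_name}.ipynb",
            parameters=params,  # e.g., {"case_name": "my_cesm_case"}
        )

if __name__ == "__main__":
    run_notebooks()
```

The executed copies in `computed_notebooks` can then be built into HTML with Jupyter Book for sharing, as noted above.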
- Component-specific goals for Milestone #1
- Become familiar with the code/framework
- Proposed next meeting: Jan 10 - deep dive into code
- What is the starting point for your component? For example, using the CESM tutorial diagnostics as the launching point. And/or existing NCL-based diagnostics packages.
- Each component develops tools in github repo to process output
- 102 years of coupled model output is available for testing (see slides)
- Are we missing anything?
- Does the vision make sense?
- Does the proposed path to get there make sense?
- Orhan: the unstructured-grids component of CESM (CAM) would bring Project Raijin into scope, in addition to GeoCAT-comp and GeoCAT-viz. Project Raijin is a VAST/CISL project dedicated to unstructured grids, with the Python package UXarray; Orhan and Brian M. are co-PIs
- Dave L.: Frequently want to compare more than 1 model case (ideally N cases); Jesse: ADF has that functionality
- Dave B.: What does ADF do already? Timeseries generation is a key piece (compression, simplified output).
- Dave B.: Had previously felt that the goal of diagnostics packages was not to create publication-quality plots but rather to diagnose the model output; Dave L.: publication quality can mean a good, easy-to-read plot, which would be helpful for model development, not just publications
- Do we point to CUPiD repo to support open data/code?
- Anna: What is the scope here? Diagnosing a run like a quick overview with good plots, or something broader? Related to defining the starting point.
- Jesse: For ADF, they decided on classes of plots (e.g., zonal means). Then also some specialized plots (e.g., QBO), but you can turn on/off.
- Justin: To get started, each section can talk about what is important to them.
- Jesse: Technical specs of this package? e.g., resolution of figures. Generation of plots is slow from ADF experience (can we make this faster?)
- Katelyn: Saved intermediate processed data can help with going back to make publication quality figures
- Dave B.: Saving netCDF files along the way - is this part of the process? Could also be in another format - some sort of intermediate file.
- Brian D.: Separated out variables or spatial subsetting could help with I/O.
- Kate: ADF already computes climatologies; the example showed a potentially repetitive process, with separate components creating intermediate files independently; there are several different ways to create climatologies - what's the best path forward? Thinking about CISM diagnostics
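As a rough illustration of the climatology step being discussed, a minimal xarray sketch is below; the file and variable names are assumptions, and ADF's actual climatology code may differ.

```python
# Hedged sketch of a monthly climatology; names are illustrative assumptions.
import xarray as xr

def monthly_climatology(timeseries_path, varname):
    ds = xr.open_dataset(timeseries_path)
    # Average each calendar month over all years to get a 12-step climatology.
    return ds[varname].groupby("time.month").mean("time")

# Example (hypothetical file and variable):
# clim = monthly_climatology("case.TS.timeseries.nc", "TS")
```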
- Brian D. is "working" on a single timeseries-generation process that everyone can use; let Brian know if you want to be involved; the goal is an offline tool with conversion based on metadata from netCDF files; option to read history OR timeseries files? (see the sketch after this discussion)
- Dave B.: single variable timeseries is different from generating a specific plot via climatology generation and/or integrated timeseries
- Dave L.: ADF already has some of this functionality, needs to be abstracted out so that the other components can use it - seems like a high priority for this project
- Jesse: ADF is modular, could use the upcoming timeseries generation tool
- Lev: could see the data processing as a first step / being called first, and then passing it on; Dave L.: would need to know which variables a priori
- Katelyn: can abstract under the hood
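A hedged sketch of the kind of history-to-single-variable-timeseries conversion discussed above (including compression), assuming xarray and netCDF output; the file pattern, variable name, and `history_to_timeseries` helper are illustrative, not the tool Brian D. is developing.

```python
# Illustrative sketch: convert per-time history files into one compressed,
# single-variable timeseries file. Paths and settings are assumptions.
import xarray as xr

def history_to_timeseries(hist_glob, varname, out_path):
    # Concatenate all history files along time, keeping only what is needed.
    ds = xr.open_mfdataset(hist_glob, combine="by_coords", data_vars="minimal")
    # Write a compressed single-variable timeseries file.
    ds[[varname]].to_netcdf(
        out_path,
        encoding={varname: {"zlib": True, "complevel": 1}},
    )

# Example (hypothetical paths and variable):
# history_to_timeseries("case.cam.h0.*.nc", "TS", "case.cam.h0.TS.nc")
```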
- Isla: from co-chairs meeting - sampling uncertainty with short runs, uniform way across components?; Justin: could do it in ADF
- Katelyn: start the conversations in a public forum (github issues? zulip channel?), abstract it later
- Jesse: ADF could probably leverage GeoCAT functionality; Katelyn: could open an issue in a repo, but also set up follow-up meetings; Dave B.: timeline and resources? e.g., starting point for sea ice is 25 NCL scripts
- Justin: ADF does run CVDP in the interim (NCL)
- Dave L.: Can we improve on the scripts during the transition from NCL to python?; Brian M: anecdotal - 50K lines of NCL to 5K lines of Python
- Orhan: this is related to essential goals of GeoCAT; the challenge is prioritizing functions that are needed by the community. GeoCAT has limited resources and encourages an open development framework
- Jesse: engineering concerns - conda via NPL is ideal; how should computing resources be allocated, especially when using Dask in multiple notebooks at once? Mike: can pass cluster objects around; Lev: switched to spinning up clusters individually, with some questions about memory usage
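A minimal sketch of the two Dask patterns mentioned above: sharing one cluster across notebooks by passing its scheduler address around, versus each notebook spinning up its own cluster. Worker counts, memory limits, and the idea of passing the address as a notebook parameter are assumptions for illustration.

```python
# Hedged sketch of the two Dask patterns discussed; settings are illustrative.
from dask.distributed import Client, LocalCluster

# Pattern 1: a driver creates one cluster and passes its scheduler address to
# each notebook (e.g., as a parameter), so all notebooks share the same workers.
cluster = LocalCluster(n_workers=4, memory_limit="8GB")
shared_address = cluster.scheduler_address  # e.g., "tcp://127.0.0.1:8786"
client = Client(shared_address)  # inside a notebook: connect to the shared cluster

# Pattern 2: each notebook spins up (and later closes) its own small cluster,
# which isolates memory usage but repeats the startup cost per notebook.
per_notebook_client = Client(n_workers=2, memory_limit="4GB")
per_notebook_client.close()

client.close()
cluster.close()
```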