CUPiD Kickoff Meeting
Welcome to CUPiD Kickoff Meeting Notes!
- All the CGD sections, CESM, ESDS, ESMF, GeoCAT
- Unified CESM diagnostics!
- Minimal diagnostics generation for CESM3 development
- Diagnostics output from all working groups
- Feature-complete API: run inside CIME and outside CESM
- Feature requests, github issues
- Port NCL code (GeoCAT)
- Time series generation
- Data compression
- Subset of CMORization
- Climatology generation
- Extensibility: Include outside packages
- CESM3 paper for PI control: publication quality figures, turns into paper introducing CESM 3 (PI control + historical)
- Diagnostics for model forcing
- Initialized prediction, large ensembles
- Zulip: #CESM-diagnostics for project discussion
- Google group: announcements and meeting invites
- Bi-weekly meetings starting in January 2024
- Google drive
- Github repository for code management
- Examples using NBscuid
- Projects containing the milestones/goals
- Issues defining tasks for each project, especially for bare bones deployments
- NBscuid: the infrastructure that is already running; an engine that runs a collection of notebooks or scripts, with inputs such as case details, etc.
- Demo is running ADF and MOM6 diagnostics
- `config.yml` file: specifies where to get the notebooks, which notebooks to run, and the Jupyter Book config. Template notebooks live in a specified directory; the engine makes a copy of each notebook, inserts parameters (e.g., case name), and runs it (see the hedged sketch after these demo notes).
- `computed_notebooks` directory: engine-created notebooks for ADF and surface (MOM6); Python scripts can also be configured to run.
- Sharing the notebooks uses Jupyter Book via HTML files; these can then be copied to CGD machines for online browsing (similar to ADF).
- Is it possible to generate images only for the HTML (no code)? Yes: could generate only PNG images and then load them into a separate Jupyter Book with some Markdown text for explanation. Could also hide notebook cells in the Jupyter Book presentation.
- What does ADF do? Raw HTML.
- Would it be problematic to include all the code used to generate the diagnostics?
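A minimal sketch of the notebook-running workflow described in the `config.yml` and `computed_notebooks` notes above, assuming papermill-style parameterized execution; the config keys, directory names, and the `run_notebooks` helper are illustrative assumptions, not NBscuid's actual schema or API.

```python
# Illustrative sketch only: config keys and layout are assumptions, not NBscuid's schema.
import pathlib
import yaml
import papermill as pm  # parameterized notebook execution

def run_notebooks(config_path="config.yml"):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    template_dir = pathlib.Path(config["template_dir"])      # where template notebooks live
    output_dir = pathlib.Path(config["computed_notebooks"])  # where executed copies go
    output_dir.mkdir(parents=True, exist_ok=True)

    for nb_name, params in config["notebooks"].items():
        # Copy the template, inject parameters (e.g., case name), and run it.
        pm.execute_notebook(
            template_dir / f"{nb_name}.ipynb",
            output_dir / f"{nb_name}.ipynb",
            parameters=params,  # e.g., {"case_name": "my_cesm_case"}
        )

if __name__ == "__main__":
    run_notebooks()
```

The executed copies in `computed_notebooks` can then be built into HTML with Jupyter Book for sharing, as noted above.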
- Component-specific goals for Milestone #1
- Become familiar with the code/framework
- Proposed next meeting: Jan 10 - deep dive into code
- What is the starting point for your component? For example, using the CESM tutorial diagnostics as the launching point. And/or existing NCL-based diagnostics packages.
- Each component develops tools in github repo to process output
- 102 years of coupled model output is available for testing (see slides)
- Are we missing anything?
- Does the vision make sense?
- Does the proposed path to get there make sense?
- Orhan: the unstructured-grids component of CESM (CAM) would bring Project Raijin into scope, in addition to GeoCAT-comp and GeoCAT-viz. Project Raijin is a VAST/CISL project dedicated to unstructured grids, with the Python package UXarray; Orhan and Brian M. are co-PIs
- Dave L.: Frequently want to compare more than 1 model case (ideally N cases); Jesse: ADF has that functionality
- Dave B.: What does ADF do already? Timeseries generation is a key piece (compression, simplified output).
- Dave B.: Had previously felt that the goal of diagnostics packages was not to create publication-quality plots but rather to diagnose the model output; Dave L.: publication quality can mean a good, easy-to-read plot, which would be helpful for model development, not just publications
- Do we point to CUPiD repo to support open data/code?
- Anna: What is the scope here? Diagnosing a run like a quick overview with good plots, or something broader? Related to defining the starting point.
- Jesse: For ADF, they decided on classes of plots (e.g., zonal means). Then also some specialized plots (e.g., QBO), but you can turn on/off.
- Justin: To get started, each section can talk about what is important to them.
- Jesse: Technical specs of this package? e.g., resolution of figures. Generation of plots is slow from ADF experience (can we make this faster?)
- Katelyn: Saved intermediate processed data can help with going back to make publication quality figures
- Dave B.: Saving netCDF files along the way - is this part of the process? Could also be in another format - some sort of intermediate file.
- Brian D.: Separated out variables or spatial subsetting could help with I/O.
- Kate: ADF already computes climatologies; the example showed a potentially repetitive process, with separate components creating intermediate files independently; there are several different ways to create climatologies - what's the best path forward? Thinking about CISM diagnostics
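As a rough illustration of the climatology step being discussed, a minimal xarray sketch is below; the file and variable names are assumptions, and ADF's actual climatology code may differ.

```python
# Hedged sketch of a monthly climatology; names are illustrative assumptions.
import xarray as xr

def monthly_climatology(timeseries_path, varname):
    ds = xr.open_dataset(timeseries_path)
    # Average each calendar month over all years to get a 12-step climatology.
    return ds[varname].groupby("time.month").mean("time")

# Example (hypothetical file and variable):
# clim = monthly_climatology("case.TS.timeseries.nc", "TS")
```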
- Brian D. is "working" on a single timeseries-generation process that everyone can use; let Brian know if you want to be involved; the goal is an offline tool with conversion based on metadata from netCDF files; option to read history OR timeseries files? (see the sketch after this discussion)
- Dave B.: single variable timeseries is different from generating a specific plot via climatology generation and/or integrated timeseries
- Dave L.: ADF already has some of this functionality, needs to be abstracted out so that the other components can use it - seems like a high priority for this project
- Jesse: ADF is modular, could use the upcoming timeseries generation tool
- Lev: could see the data processing as a first step / being called first, and then passing it on; Dave L.: would need to know which variables a priori
- Katelyn: can abstract under the hood
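A hedged sketch of the kind of history-to-single-variable-timeseries conversion discussed above (including compression), assuming xarray and netCDF output; the file pattern, variable name, and `history_to_timeseries` helper are illustrative, not the tool Brian D. is developing.

```python
# Illustrative sketch: convert per-time history files into one compressed,
# single-variable timeseries file. Paths and settings are assumptions.
import xarray as xr

def history_to_timeseries(hist_glob, varname, out_path):
    # Concatenate all history files along time, keeping only what is needed.
    ds = xr.open_mfdataset(hist_glob, combine="by_coords", data_vars="minimal")
    # Write a compressed single-variable timeseries file.
    ds[[varname]].to_netcdf(
        out_path,
        encoding={varname: {"zlib": True, "complevel": 1}},
    )

# Example (hypothetical paths and variable):
# history_to_timeseries("case.cam.h0.*.nc", "TS", "case.cam.h0.TS.nc")
```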
- Isla: from co-chairs meeting - sampling uncertainty with short runs, uniform way across components?; Justin: could do it in ADF
- Katelyn: start the conversations in a public forum (github issues? zulip channel?), abstract it later
- Jesse: ADF could probably leverage GeoCAT functionality; Katelyn: could open an issue in a repo, but also set up follow-up meetings; Dave B.: timeline and resources? e.g., starting point for sea ice is 25 NCL scripts
- Justin: ADF does run CVDP in the interim (NCL)
- Dave L.: Can we improve on the scripts during the transition from NCL to python?; Brian M: anecdotal - 50K lines of NCL to 5K lines of Python
- Orhan: this is related to essential goals of GeoCAT; the challenge is prioritizing functions that are needed by the community. GeoCAT has limited resources and encourages an open development framework
- Jesse: engineering concerns - conda via NPL is ideal; how should computing resources be allocated, especially when using Dask in multiple notebooks at once? Mike: can pass cluster objects around; Lev: switched to spinning up clusters individually, with some questions about memory usage
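A minimal sketch of the two Dask patterns mentioned above: sharing one cluster across notebooks by passing its scheduler address around, versus each notebook spinning up its own cluster. Worker counts, memory limits, and the idea of passing the address as a notebook parameter are assumptions for illustration.

```python
# Hedged sketch of the two Dask patterns discussed; settings are illustrative.
from dask.distributed import Client, LocalCluster

# Pattern 1: a driver creates one cluster and passes its scheduler address to
# each notebook (e.g., as a parameter), so all notebooks share the same workers.
cluster = LocalCluster(n_workers=4, memory_limit="8GB")
shared_address = cluster.scheduler_address  # e.g., "tcp://127.0.0.1:8786"
client = Client(shared_address)  # inside a notebook: connect to the shared cluster

# Pattern 2: each notebook spins up (and later closes) its own small cluster,
# which isolates memory usage but repeats the startup cost per notebook.
per_notebook_client = Client(n_workers=2, memory_limit="4GB")
per_notebook_client.close()

client.close()
cluster.close()
```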