2024‐07‐03

July 3, 2024

Agenda:

x4c presentation
Observational Dataset Location

Notes

x4c: Xarray for CESM

Typical Earth system data analysis tasks may involve regridding, projection, seasonalization, data selection, derived variables, and geospatial averaging

Xarray is a very useful tool, but using it takes nontrivial programming skill and effort; this is the motivation for x4c

x4c emphasizes intuitive coding for CESM purposes and frees scientists from technical details so they can focus on scientific thinking

da.x.plot() creates publication-ready plots (compare with da.plot() in xarray)

x4c includes regridding capabilities which leverage xESMF

annualization can take a list of months over which to annualize; the default is the calendar year
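For reference, a minimal standard-library sketch of what annualizing over a list of months can look like. This is not x4c's implementation: the function name `annualize`, the `(year, month, value)` input format, the unweighted monthly mean, and the choice to label a wrapped season by the year of its later months are all assumptions made for illustration.

```python
from collections import defaultdict


def annualize(samples, months=None):
    """Average monthly values into one value per year (hypothetical helper).

    samples: iterable of (year, month, value) tuples.
    months:  ordered list of months to average, e.g. [12, 1, 2];
             None means the full calendar year.
    """
    months = list(range(1, 13)) if months is None else list(months)
    # Months listed before the wrap (12 in [12, 1, 2]) come from the
    # previous calendar year, so the selected months stay consecutive.
    wrap = next((i for i in range(1, len(months)) if months[i] < months[i - 1]), 0)
    prev_year_months = set(months[:wrap])

    groups = defaultdict(list)
    for year, month, value in samples:
        if month not in months:
            continue
        # Assumption: a wrapped season is labeled by the year of its later months.
        label = year + 1 if month in prev_year_months else year
        groups[label].append(value)

    # Keep only complete groups; plain mean, not weighted by month length.
    return {
        year: sum(values) / len(values)
        for year, values in sorted(groups.items())
        if len(values) == len(months)
    }


# Two years of monthly data whose value is the month index since Jan 2000.
samples = [(y, m, float((y - 2000) * 12 + m)) for y in (2000, 2001) for m in range(1, 13)]
calendar_year = annualize(samples)
djf = annualize(samples, months=[12, 1, 2])
```

Here `djf` contains a single entry built from Dec 2000, Jan 2001 and Feb 2001; the incomplete seasons at either end of the record are dropped.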

geospatial averaging (geo_mean) supports global means or a box defined by a lat/lon range
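As an illustration of the arithmetic behind a geospatial mean, here is a small sketch of a cosine-of-latitude weighted average on a regular lat/lon grid, optionally restricted to a lat/lon box. It is not the x4c `geo_mean` code; the signature, the nested-list input, and the cos(lat) weighting are assumptions chosen to keep the example self-contained.

```python
import math


def geo_mean(field, lats, lons, lat_range=(-90.0, 90.0), lon_range=(0.0, 360.0)):
    """Area-weighted mean of field[i][j] at (lats[i], lons[j]) (hypothetical helper).

    Cells on a regular lat/lon grid shrink toward the poles, so each row is
    weighted by cos(latitude). Only cells inside the given ranges are used.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if not lon_range[0] <= lon <= lon_range[1]:
                continue
            total += weight * field[i][j]
            weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid cells fall inside the requested range")
    return total / weight_sum


lats = [-60.0, 0.0, 60.0]
lons = [0.0, 90.0, 180.0, 270.0]
field = [[1.0] * 4, [4.0] * 4, [1.0] * 4]

global_mean = geo_mean(field, lats, lons)
tropics_mean = geo_mean(field, lats, lons, lat_range=(-10.0, 10.0))
```

The unweighted mean of this field is 2.0, while the weighted global mean is larger because the equatorial row represents more area.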

GeoCAT is leveraged within x4c for vertical interpolation with ds.x.get_plev()

The CESM "timeseries case" system assumes the default timeseries directory structure and helps users load particular variables more easily from that structure

The CESM diagnostic 'spell' system can be a simplified way of generating plots from CESM output, e.g. 'TS:ann:gm' describes annual surface temperature data with a global mean.
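To make the structure of a spell such as 'TS:ann:gm' concrete, the sketch below splits it into a variable name, a time operation, and a spatial operation. The `Spell` class, the `parse_spell` function, and the field names are hypothetical, and the three-part layout is inferred only from the single example above; x4c's actual parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spell:
    """Hypothetical container for the parts of a diagnostic spell."""
    variable: str                    # e.g. "TS" (surface temperature)
    time_op: Optional[str] = None    # e.g. "ann" (annual)
    space_op: Optional[str] = None   # e.g. "gm" (global mean)


def parse_spell(spell: str) -> Spell:
    """Split a colon-separated spell into its parts (assumed layout)."""
    parts = [part.strip() or None for part in spell.split(":")]
    if len(parts) > 3 or parts[0] is None:
        raise ValueError(f"cannot parse spell: {spell!r}")
    parts += [None] * (3 - len(parts))  # pad missing trailing parts
    return Spell(*parts)


parsed = parse_spell("TS:ann:gm")
```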

Xarray extension fundamental features: regrid, get_plev, annualize, geo_mean, plot

Advanced features: load, get_paths, check_timespan

High-level workflows: gen_climo

Slides from Feng Zhu were emailed out

Discussion

DB: Are mapping files online? FZ: Weighting files can be hosted on GitHub

SL/WW: UXarray lets regridding happen in a way that is hidden from the user's perspective. Can UXarray be used here to facilitate regridding and avoid relying on mapping files? JN: UXarray may still need a mapping file? BM: UXarray needs connectivity files (e.g. a SCRIP file or an ESMF mesh file). It currently has two remapping methods (nearest neighbor and a sort of inverse-distance-weighted method), so it is not doing interpolation and thus differs from the ESMF methods.

ML/BM: If you select [12, 1, 2], does this select the consecutive December (the one directly preceding that January and February)? FZ: Yes.

CESM Observational Dataset Locations

CESM input data repository may be where this ends up being located

We'd want to find space on disk. There is a limited amount of space, so what are we willing to host? At least key metric-related data should be available.

IS: preference for doing computations within notebooks vs saving a small file that contains a variable processed to fit the CESM grid.

ML: From CUPiD standpoint, it may make sense to pre-process data

DB: Providing a link to a dataset will result in numerous users having copies of the data, whereas if the data is saved in a particular directory, it can be accessed locally instead of having everyone download it.

ML: CUPiD will eventually bring in CIME as an external and will have access to the machines file in order to facilitate bringing in inputdata

BM: Highly processed files have the advantage of being small and also distributable; be aware of what we make publicly available

JN: could we link to all of the datasets from somewhere on campaign? It would be good to know how much data we're working with before trying to make a new place to store data.

DB: RDA directories link to e.g. ERA5; we should be aware of this as a potential duplication

Summary (strategy): keep gathering info on where data is stored, then soft link from campaign, get a sense of directory sizes, and go from there
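One possible way to carry out the soft-link and directory-size steps is sketched below using only the Python standard library. The function names and the `sources` mapping are hypothetical, and no real dataset paths are assumed; actual locations would come from the info-gathering step.

```python
import os


def directory_size_bytes(path):
    """Total size of regular files under path (symlinked files are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def link_and_report(sources, link_dir):
    """Soft link each dataset directory into link_dir and return its size.

    sources: dict mapping a short dataset name to its existing directory.
    """
    os.makedirs(link_dir, exist_ok=True)
    report = {}
    for name, src in sources.items():
        link = os.path.join(link_dir, name)
        if not os.path.lexists(link):  # leave existing links untouched
            os.symlink(src, link)
        report[name] = directory_size_bytes(src)
    return report
```

Calling `link_and_report` with a dict of dataset names and directories returns the size of each in bytes, which gives a first estimate of how much data is involved before any new storage is set up.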
