2024‐07‐03
Teagan King edited this page Jul 11, 2024
- X4C presentation
- Observational Dataset Location
- typical Earth System Data Analysis tasks may involve regridding, projection, seasonalization, data selection, deduced variables, geospatial averaging
- Xarray is very useful but requires nontrivial programming skill and effort, which is the motivation for x4c
- x4c emphasizes intuitive coding for CESM purposes and frees scientists from technical details so they can focus on scientific thinking
- da.x.plot() creates publication-ready plots (compared with da.plot() in xarray)
- x4c includes regridding capabilities which leverage xESMF
- annualization defaults to the calendar year and can also take a list of months over which to annualize
- geospatial averaging (geo_mean) supports global means or a rectangular lat/lon range
- GeoCAT is leveraged within x4c for vertical interpolation with ds.x.get_plev()
- the CESM "timeseries case" system assumes the default timeseries directory structure and helps users load particular variables by name from that structure
- the CESM diagnostic 'spell' system can simplify generating plots from CESM output, e.g. 'TS:ann:gm' describes annualized surface temperature with a global mean
- Xarray extension fundamental features: regrid, get_plev, annualize, geo_mean, plot
- Advanced features: load, get_paths, check_timespan
- High-level workflows: gen_climo
- Slides from Feng Zhu were emailed out
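
The geospatial averaging mentioned in the notes above (global mean or a lat/lon box) can be illustrated without x4c. This is a minimal sketch in plain Python, assuming a regular lat/lon grid and cosine-latitude area weights; the function `area_weighted_mean` and its arguments are hypothetical and are not the x4c `geo_mean` implementation or signature.

```python
import math


def area_weighted_mean(values, lats, lons,
                       lat_range=(-90.0, 90.0), lon_range=(-180.0, 360.0)):
    """Area-weighted mean of values[i][j] over a rectangular lat/lon box.

    values: nested list indexed [lat][lon].
    lats, lons: cell-centre coordinates in degrees.
    Weights are cos(latitude), the usual approximation on a regular grid.
    The default ranges select every cell, giving a global mean.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not (lat_min <= lat <= lat_max):
            continue
        w = math.cos(math.radians(lat))  # cells shrink toward the poles
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                total += w * values[i][j]
                weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no grid cells fall inside the requested box")
    return total / weight_sum


# Toy 3x3 field (made-up numbers): only the equatorial row is non-zero.
lats = [-60.0, 0.0, 60.0]
lons = [0.0, 120.0, 240.0]
field = [[0.0, 0.0, 0.0],
         [3.0, 3.0, 3.0],
         [0.0, 0.0, 0.0]]

global_mean = area_weighted_mean(field, lats, lons)
tropics_mean = area_weighted_mean(field, lats, lons, lat_range=(-10.0, 10.0))
print(global_mean, tropics_mean)
```

In this toy field the weighted global mean is larger than the unweighted mean of 1.0, because the equatorial row represents more area than the two high-latitude rows.
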
- Discussion
- DB: Are mapping files online? FZ: Weighting files can be hosted on github
- SL/WW: UXarray lets regridding happen behind the scenes from the user's perspective. Can UXarray be used here to facilitate regridding and avoid relying on mapping files? JN: UXarray may still need a mapping file? BM: UXarray needs connectivity files (e.g. a SCRIP file or ESMF mesh file); it currently has two methods (nearest neighbor and an inverse-distance-weighted method) rather than interpolation, so it differs from the ESMF methods.
- ML/BM: if you select [12, 1, 2], does this use the December that directly precedes January and February, so that the months are consecutive? FZ: Yes.
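
The [12, 1, 2] question above comes down to which December is paired with January and February. This is a minimal sketch in plain Python of that behaviour, not x4c code: the function `annualize_sketch`, its input layout, and the choice to label each season by the year in which it ends are all assumptions made for illustration.

```python
def annualize_sketch(monthly, months=None):
    """Average monthly values into one value per year.

    monthly: dict mapping (year, month) -> value.
    months:  ordered list of months to average, e.g. [12, 1, 2];
             None means the full calendar year.
    Months listed before the point where the month number drops (the 12 in
    [12, 1, 2]) are taken from the previous calendar year, so each season is
    built from consecutive months. Months are weighted equally; month length
    is ignored in this sketch.
    """
    months = list(months) if months is not None else list(range(1, 13))
    # Index at which the list wraps past December (0 if it never wraps).
    wrap = next((k for k in range(1, len(months))
                 if months[k] < months[k - 1]), 0)
    offsets = [-1 if k < wrap else 0 for k in range(len(months))]

    result = {}
    for year in sorted({y for (y, _m) in monthly}):
        keys = [(year + off, m) for m, off in zip(months, offsets)]
        if all(k in monthly for k in keys):  # skip incomplete seasons
            result[year] = sum(monthly[k] for k in keys) / len(keys)
    return result


# Made-up data: each month's value is its month number, except a
# distinctive December 2000 so it is clear which December is used.
monthly = {(y, m): float(m) for y in (2000, 2001) for m in range(1, 13)}
monthly[(2000, 12)] = 100.0

djf = annualize_sketch(monthly, months=[12, 1, 2])
calendar = annualize_sketch(monthly)
print(djf)
print(calendar)
```

The 2000 season is dropped because December 1999 is not in the data, and the 2001 value includes December 2000 rather than December 2001.
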
- CESM Observational Dataset Locations
- CESM input data repository may be where this ends up being located
- We'd want to find space on disk. Space is limited, so what are we willing to host? At least key-metric-related data should be available.
- IS: preference for doing computations within notebooks vs saving a small file that contains a variable processed to fit the CESM grid.
- ML: From CUPiD standpoint, it may make sense to pre-process data
- DB: Providing a link to a dataset will result in numerous users having copies of the data, whereas if the data is saved in a particular directory, it can be accessed locally instead of having everyone download it.
- ML: CUPiD will eventually bring in CIME as an external and will have access to the machines file in order to facilitate bringing in inputdata
- BM: Highly processed files have the advantage of being small and also distributable; be aware of what we make publicly available
- JN: could we link to all of the datasets from somewhere on campaign? It would be good to know how much data we're working with before trying to make a new place to store data.
- DB: RDA directories link to e.g. ERA5; we should be aware of this as a potential duplication
- Summary/strategy: keep gathering info on where data is stored, then soft link from campaign, get a sense of directory sizes, and go from there
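
The soft-link and directory-size steps in the summary could be scripted roughly as follows. This is a sketch using only the Python standard library; the function names, the `obs_dataset` directory, and the `sample.nc` file are hypothetical placeholders, and the demo runs on a temporary directory rather than any real campaign path.

```python
import os
import tempfile


def directory_size_bytes(path):
    """Total size of regular files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path, followlinks=False):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def link_and_report(sources, link_root):
    """Soft link each source directory under link_root.

    sources: dict mapping a short name -> existing directory path.
    Returns {name: size in bytes} so the totals can be reviewed before
    deciding where the data should be hosted.
    """
    os.makedirs(link_root, exist_ok=True)
    report = {}
    for name, src in sources.items():
        link = os.path.join(link_root, name)
        if not os.path.lexists(link):  # leave existing links alone
            os.symlink(src, link, target_is_directory=True)
        report[name] = directory_size_bytes(src)
    return report


# Demo on throwaway directories; real paths would come from the survey
# of where each observational dataset currently lives.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "obs_dataset")
    os.makedirs(src)
    with open(os.path.join(src, "sample.nc"), "wb") as f:
        f.write(b"\0" * 1024)
    report = link_and_report({"obs_dataset": src}, os.path.join(tmp, "links"))
    link_is_symlink = os.path.islink(os.path.join(tmp, "links", "obs_dataset"))

print(report)
```

Because the links only point at the existing directories, nothing is copied, which fits the concern about duplicating data that already exists elsewhere.
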