
2024‐07‐03

Teagan King edited this page Jul 11, 2024 · 4 revisions

July 3, 2024

Agenda:

  1. X4C presentation
  2. Observational Dataset Location

Notes

  • typical Earth system data analysis tasks may involve regridding, projection, seasonalization, data selection, derived variables, and geospatial averaging
  • Xarray is a very useful tool, but it requires nontrivial programming skill and effort; this is the motivation for x4c
  • x4c emphasizes intuitive coding for CESM purposes and frees scientists from technical details so they can focus on scientific thinking
  • da.x.plot() creates publication-ready plots (compared with da.plot() in Xarray)
  • x4c includes regridding capabilities that leverage xESMF
  • annualization can average over a user-specified list of months as well as over the default calendar year
  • geospatial averaging (geo_mean) supports global means or means over a rectangular lat/lon box
  • GeoCAT is leveraged within x4c for vertical interpolation with ds.x.get_plev()
  • the CESM "timeseries case" system assumes the default timeseries directory structure and makes it easier for users to load particular variables from that structure
  • the CESM diagnostic 'spell' system offers a simplified way of generating plots from CESM output, e.g., 'TS:ann:gm' describes annual surface temperature data averaged globally (global mean)
  • fundamental Xarray extension features: regrid, get_plev, annualize, geo_mean, plot
  • Advanced features: load, get_paths, check_timespan
  • High-level workflows: gen_climo
  • Slides from Feng Zhu were emailed out
  • Discussion
  • DB: Are mapping files online? FZ: Weighting files can be hosted on GitHub
  • SL/WW: UXarray lets regridding happen in a way that is hidden from the user. Can UXarray be used here to facilitate regridding and avoid relying on mapping files? JN: Does UXarray perhaps need a mapping file? BM: UXarray needs connectivity files (e.g., a SCRIP file or an ESMF mesh file); it currently has two methods (nearest neighbor and an inverse-distance-weighted method), so it is not doing interpolation and thus differs from the ESMF methods.
  • ML/BM: If you select [12, 1, 2], does this select the December of the previous year, so that the months are consecutive? FZ: Yes.
  • CESM Observational Dataset Locations
  • CESM input data repository may be where this ends up being located
  • We'd want to find space on disk. Space is limited, so what are we willing to host? At a minimum, data related to key metrics should be available.
  • IS: preference for doing computations within notebooks vs saving a small file that contains a variable processed to fit the CESM grid.
  • ML: From CUPiD standpoint, it may make sense to pre-process data
  • DB: Providing a link to a dataset will result in many users keeping their own copies of the data, whereas data saved in a shared directory can be accessed locally without everyone downloading it.
  • ML: CUPiD will eventually bring in CIME as an external and will have access to the machines file in order to facilitate bringing in inputdata
  • BM: Highly processed files have the advantage of being small and also distributable; be aware of what we make publicly available
  • JN: could we link to all of the datasets from somewhere on campaign? It would be good to know how much data we're working with before trying to make a new place to store data.
  • DB: RDA directories already link to datasets such as ERA5; we should be aware of this as a potential source of duplication
  • Summary of strategy: keep gathering information on where data are stored, soft link them from campaign, get a sense of directory sizes, and then go from there
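The annualization notes above (averaging over a list of months, and FZ's answer that [12, 1, 2] uses the consecutive December from the previous year) can be illustrated with a small plain-Python sketch. This is not x4c's annualize implementation; the function name, the (year, month, value) input format, and the rule of dropping incomplete seasons are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def annualize(monthly, months=tuple(range(1, 13))):
    """Hypothetical helper: average monthly values into one value per season-year.

    monthly: list of (year, month, value) tuples.
    months:  ordered months to include, e.g. (12, 1, 2) for DJF.
    If the month list wraps past December (a later entry is smaller than an
    earlier one), the months before the wrap are credited to the following
    year, so (12, 1, 2) pairs December of year Y-1 with Jan and Feb of year Y.
    """
    months = list(months)
    # Position of the first month after a wrap (e.g. the 1 in [12, 1, 2]), if any.
    wrap = next((i for i in range(1, len(months)) if months[i] < months[i - 1]), None)
    shifted = set(months[:wrap]) if wrap is not None else set()

    groups = defaultdict(list)
    for year, month, value in monthly:
        if month not in months:
            continue
        label = year + 1 if month in shifted else year
        groups[label].append(value)

    # Assumption: keep only complete seasons so partial edge years are not averaged.
    return {y: mean(v) for y, v in sorted(groups.items()) if len(v) == len(months)}


# Two years of synthetic data whose value equals the month number.
monthly = [(y, m, float(m)) for y in (2000, 2001) for m in range(1, 13)]
print(annualize(monthly))                     # {2000: 6.5, 2001: 6.5}
print(annualize(monthly, months=(12, 1, 2)))  # {2001: 5.0}
```

With DJF, the only complete season is Dec 2000 + Jan/Feb 2001, labeled 2001; the partial seasons at either end of the record are dropped under the assumption stated above.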
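Geospatial averaging over the whole globe or over a lat/lon box, as described for geo_mean in the notes above, normally weights each grid cell by cos(latitude) on a regular grid. The sketch below shows that idea in plain Python. It is an assumption-based illustration, not x4c's geo_mean: the function name, argument layout, and inclusive box bounds are hypothetical, and longitude ranges that cross the 0/360 seam are not handled.

```python
import math


def geo_mean(field, lats, lons, lat_range=(-90.0, 90.0), lon_range=(0.0, 360.0)):
    """Hypothetical helper: area-weighted mean of field[lat_index][lon_index].

    Each point is weighted by cos(latitude), which approximates relative cell
    area on a regular lat/lon grid. Only points inside the inclusive
    lat_range x lon_range box contribute. The box must not wrap across 0/360.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        w = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_range[0] <= lon <= lon_range[1]:
                total += w * field[i][j]
                weight_sum += w
    if weight_sum == 0:
        raise ValueError("no grid points fall inside the requested box")
    return total / weight_sum


# Toy 3x3 grid: value 1 along the equator, 0 at +/-60 degrees.
lats = [-60.0, 0.0, 60.0]
lons = [0.0, 120.0, 240.0]
field = [[0.0] * 3, [1.0] * 3, [0.0] * 3]
print(geo_mean(field, lats, lons))                         # approximately 0.5
print(geo_mean(field, lats, lons, lat_range=(-10, 10)))    # 1.0 (equator only)
```

An unweighted mean of the toy field would be 1/3; the cos(latitude) weights give the equatorial row more influence, which is why the global result is about 0.5.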
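To make the 'spell' example from the notes more concrete, here is a hypothetical parser that splits a spell such as 'TS:ann:gm' into its parts. The three-field layout (variable, time aggregation, spatial aggregation) is an assumption inferred from that single example, and only the codes mentioned in the notes ('ann' for annual, 'gm' for global mean) are translated. The real CESM diagnostic spell system may support other forms and codes.

```python
from dataclasses import dataclass

# Only the codes mentioned in the meeting notes are described; any other code
# is passed through unchanged (an assumption for this sketch).
TIME_CODES = {"ann": "annual"}
SPACE_CODES = {"gm": "global mean"}


@dataclass
class Spell:
    variable: str
    time: str
    space: str


def parse_spell(spell: str) -> Spell:
    """Hypothetical parser for a 'VAR:time:space' spell string."""
    parts = spell.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'VAR:time:space', got {spell!r}")
    variable, time, space = parts
    return Spell(variable, TIME_CODES.get(time, time), SPACE_CODES.get(space, space))


print(parse_spell("TS:ann:gm"))
# Spell(variable='TS', time='annual', space='global mean')
```

Keeping the lookup tables separate from the parsing logic means additional codes could be added without touching the parser, should the assumed three-field format hold.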