Change directory structure to better accommodate different data sources and drivers #17

Open
AndyMcAliley opened this issue Mar 24, 2022 · 2 comments

Following up on this discussion, adding new data sources and drivers will be cumbersome with the current layout. The folders created when the pipeline runs could be fully organized by data source and driver type, but currently they are not.

For example, after the pipeline is run, the directory structure in 1_fetch looks like this:

├─out
├───dynamic_mntoha/
├───obs_mntoha/
├───lake_metadata.csv
├─tmp
├───dynamic_mntoha/
└───obs_mntoha/

Some issues with this organization system:

  1. No per-data-source subfolders in tmp/ or out/. At best, future data sources must be identified based on a suffix (e.g. _mntoha). At worst, there is no suffix (as with lake_metadata.csv), so the situation is ripe for file collisions that result in a file being overwritten or used for the wrong data source.
  2. There's no distinction between NLDAS drivers and GCM drivers.
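To make the collision risk in item 1 concrete, here is a minimal Python sketch. The helper names `flat_path` and `nested_path` and the data source name `other_source` are hypothetical placeholders for illustration only; they are not part of the pipeline.

```python
from pathlib import PurePosixPath


def flat_path(stage_dir, data_source, filename):
    """Flat layout: an unsuffixed file ignores the data source entirely."""
    # data_source is deliberately unused, mirroring a file like lake_metadata.csv
    return PurePosixPath(stage_dir) / "out" / filename


def nested_path(stage_dir, data_source, filename):
    """Nested layout: every file lives under its data source's folder."""
    return PurePosixPath(stage_dir) / "out" / data_source / filename


# "other_source" is a made-up name standing in for any future data source.
sources = ["mntoha", "other_source"]

flat = {str(flat_path("1_fetch", s, "lake_metadata.csv")) for s in sources}
nested = {str(nested_path("1_fetch", s, "lake_metadata.csv")) for s in sources}

print(sorted(flat))    # one path shared by both sources: a collision
print(sorted(nested))  # one distinct path per source
```

With the flat layout both sources resolve to the same file, so whichever is written last wins; with a folder per data source the paths stay distinct.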

AndyMcAliley commented Mar 24, 2022

A better way to organize the files might be this:

├─out
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├─tmp
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
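As a rough sketch of how the layout above could be generated, the Python snippet below creates the nested folders for one pipeline stage. The function name `build_layout` and the constants are assumptions made for illustration, not existing pipeline code. Only the two GCM folders named above are included, since the other four are not named in this issue.

```python
from pathlib import Path
import tempfile

DATA_SOURCES = ["mntoha", "large_midwest_footprint"]

# Only gcm_access and gcm_gfdl are named in the proposal; the remaining
# four GCM types are not listed here, so they are left out of this sketch.
SUBFOLDERS = [
    "nldas",
    "gcm_access",
    "gcm_gfdl",
    "clarity",
    "ice_flags",
    "temperature_observations",
]


def build_layout(stage_dir):
    """Create out/ and tmp/ trees organized by data source, then by subfolder."""
    stage_dir = Path(stage_dir)
    created = []
    for top in ("out", "tmp"):
        for source in DATA_SOURCES:
            for sub in SUBFOLDERS:
                path = stage_dir / top / source / sub
                path.mkdir(parents=True, exist_ok=True)
                created.append(path)
    return created


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    dirs = build_layout(Path(root) / "1_fetch")
    print(len(dirs))  # 2 top-level folders * 2 data sources * 6 subfolders = 24
```

In this arrangement, lake_metadata.csv would sit directly under out/<data source>/, so each data source keeps its own copy.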

@AndyMcAliley AndyMcAliley self-assigned this Mar 24, 2022
@lindsayplatt

I have been using the suffix/prefix approach over in lake-temperature-out. I'm not super satisfied with it because you end up having to scroll through a lot of files, so I like the idea of a nested approach!
