This repository contains a collection of one or more tsdat pipelines (found under the pipelines/ folder). Keeping related pipelines in one repository makes them easier to maintain and run together. New pipelines can be added via the cookiecutter template mechanism described below.
The repository is made up of the following core pieces:
- runner.py: Main entry point for running a pipeline.
- pipelines/*: Collection of custom data pipelines using tsdat.
- pipelines/example_ingest: An out-of-the-box example tsdat pipeline.
- templates/*: Template(s) used to generate new pipelines.
- shared/*: Shared configuration files that may be used across multiple pipelines.
- utils/*: Utility scripts.
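Because every pipeline lives in its own sub-folder of pipelines/ (for example pipelines/example_ingest), the pipelines in a checkout can be discovered by listing that folder. The helper below is a hypothetical sketch, not part of this repository; it assumes that any sub-folder whose name does not start with "_" or "." (such as __pycache__) is a pipeline.

```python
from pathlib import Path
from typing import List


def list_pipelines(repo_root: Path) -> List[str]:
    """Return the names of pipeline folders under <repo_root>/pipelines.

    Hypothetical convention: every sub-directory of pipelines/ is treated as a
    pipeline, except names starting with "_" or "." (e.g. __pycache__).
    """
    pipelines_dir = Path(repo_root) / "pipelines"
    if not pipelines_dir.is_dir():
        return []  # not a pipeline repository root, or no pipelines yet
    return sorted(
        entry.name
        for entry in pipelines_dir.iterdir()
        if entry.is_dir() and not entry.name.startswith(("_", "."))
    )


if __name__ == "__main__":
    # Run from the repository root (the folder that contains runner.py).
    for name in list_pipelines(Path.cwd()):
        print(name)
```

In a fresh checkout the output would be expected to include example_ingest; pipelines added later with cookiecutter show up alongside it.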
The following are required to develop a tsdat pipeline:
- A GitHub account. Sign up on GitHub if you don't have one already.
- An Anaconda environment. We strongly recommend developing in an Anaconda Python environment to avoid library dependency issues. See the Anaconda documentation for instructions on installing Anaconda on your computer.
  Windows users: you can install Anaconda directly on Windows, OR run it in a Linux environment using the Windows Subsystem for Linux (WSL). Follow a WSL tutorial to set up a WSL environment and attach VS Code to it.
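Before starting the setup steps below, it can help to confirm that the command-line tools they use are on your PATH: git (for cloning), conda (from Anaconda), and make (only needed for the make cookies shortcut). The script below is a hypothetical sketch using only the standard library; the tool list is an assumption drawn from the commands in this README.

```python
import shutil
from typing import Callable, Dict, Optional, Sequence

# Tools used by the commands in this README; "make" is only needed for the
# optional `make cookies` shortcut (cookiecutter can be run directly instead).
REQUIRED_TOOLS = ("git", "conda", "make")


def check_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Map each tool name to whether an executable was found on the PATH."""
    return {tool: which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool:6} {'found' if found else 'MISSING'}")
```

On Windows, run it from the Anaconda Prompt (or a WSL terminal), since conda is typically only on the PATH there.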
You can create a new repository based on the tsdat pipeline-template repository on GitHub:
- Click the 'Use this template' button on the pipeline-template repository page and follow the steps to copy the template repository into your account.
  NOTE: If you want an older version of the template, select the 'Include all branches' checkbox, then set the branch you are interested in as the default branch of your new repository.
- On GitHub, click the 'Code' button to copy a link to your repository, then run the following from a terminal in the folder where you want to work on the code:
    git clone <the link you copied>
- Open an appropriate terminal shell on your computer:
  - On Linux or Mac, open a regular terminal.
  - On Windows, start the Anaconda Prompt if you installed Anaconda directly on Windows, OR open a WSL terminal if you installed Anaconda via WSL.
- Run the following commands to create and activate your conda environment, where $REPOSITORY_ROOT is the folder where you cloned your pipeline repository:
    cd $REPOSITORY_ROOT
    conda env create --file=conda-environment.yaml
    conda activate tsdat-pipelines
  If you get the following warning message when running tsdat commands in your shell:
    UserWarning: pyproj unable to set database path.
  then run the following additional commands to remove this warning permanently:
    conda remove --force pyproj
    pip install pyproj
- Open the cloned repository in VS Code. (This repository contains default VS Code settings that make it much easier to get started quickly.)
- Install the recommended extensions (VS Code should show a pop-up with the recommendations).
- Tell VS Code to use your new environment:
  - Press F1 to bring up the Command Palette in VS Code.
  - Type Python: Select Interpreter and select it.
  - Select the newly created tsdat-pipelines conda environment from the drop-down list. You may need to refresh the list (cycle icon in the top right) to see it.
  - Bring up the Command Palette again and type Developer: Reload Window to reload VS Code and ensure the settings changes propagate correctly.
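Once the steps above are complete, a quick sanity check is to confirm that the interpreter VS Code uses belongs to the tsdat-pipelines environment and whether importing pyproj still emits the database-path warning. The script below is a hypothetical sketch: it relies on conda activate setting the CONDA_DEFAULT_ENV variable, and it assumes the pyproj warning is emitted at import time (if it only appears when tsdat commands run, this check will not see it).

```python
import importlib
import os
import sys
import warnings
from typing import List

EXPECTED_ENV = "tsdat-pipelines"  # the environment activated with `conda activate`


def active_conda_env() -> str:
    """Return the active conda environment name, or "" if none is active.

    `conda activate` records the environment name in CONDA_DEFAULT_ENV.
    """
    return os.environ.get("CONDA_DEFAULT_ENV", "")


def import_warnings(module_name: str) -> List[str]:
    """Import a module and return the messages of any warnings it raised.

    Only meaningful in a fresh interpreter: an already-imported module is
    returned from the cache and does not re-emit its import-time warnings.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        importlib.import_module(module_name)
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    env = active_conda_env()
    print(f"Interpreter: {sys.executable}")
    print(f"Conda env:   {env or '(none)'}")
    if env != EXPECTED_ENV:
        print(f"Expected '{EXPECTED_ENV}'; run: conda activate {EXPECTED_ENV}")
    try:
        # Assumption: the pyproj database-path warning fires when it is imported.
        for message in import_warnings("pyproj"):
            print(f"pyproj warning: {message}")
    except ImportError:
        print("pyproj is not installed in this environment")
```

Running it from the VS Code integrated terminal (or with the interpreter you selected) should print an interpreter path inside the tsdat-pipelines environment.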
To add a new pipeline to this repository:
- Ensure your development environment is set up according to the instructions above.
- Use a cookiecutter template to generate a new pipeline folder. From your top-level repository folder, run:
    make cookies
  The make cookies command is a memorable shortcut for cookiecutter templates/ingest -o pipelines. Cookiecutter will prompt you for several values; more information on these prompts can be found in the template README.md.
- Once cookiecutter is done, your new pipeline folder will appear inside pipelines/. See the README.md file inside that folder for more information on how to configure, run, test, and debug your pipeline.
This repository supports adding as many pipelines as you want - just rinse and repeat the steps above.
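Where make is not available (for example, in a plain Windows Anaconda Prompt), the same cookiecutter invocation that make cookies runs can be launched from Python. The helper below is a hypothetical sketch, not part of this repository; it only builds and runs the command line cookiecutter templates/ingest -o pipelines, so cookiecutter must already be installed in the active environment.

```python
import subprocess
from typing import List, Optional

# Same defaults as the `make cookies` shortcut.
DEFAULT_TEMPLATE = "templates/ingest"
DEFAULT_OUTPUT_DIR = "pipelines"


def cookiecutter_command(
    template: str = DEFAULT_TEMPLATE, output_dir: str = DEFAULT_OUTPUT_DIR
) -> List[str]:
    """Build the command line `cookiecutter <template> -o <output_dir>`."""
    return ["cookiecutter", template, "-o", output_dir]


def new_pipeline(repo_root: Optional[str] = None) -> int:
    """Run cookiecutter interactively from the repository root.

    The prompts are answered in the terminal; returns cookiecutter's exit code.
    """
    completed = subprocess.run(cookiecutter_command(), cwd=repo_root)
    return completed.returncode
```

Calling new_pipeline() once per pipeline mirrors the repeat-the-steps workflow. Passing the command as a list to subprocess.run avoids shell quoting differences between Linux, Mac, and Windows.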
- Learn more about tsdat:
  - GitHub: https://github.com/tsdat/tsdat
  - Documentation: https://tsdat.readthedocs.io
  - Data standards: https://github.com/tsdat/data_standards
- Learn more about xarray:
  - GitHub: https://github.com/pydata/xarray
  - Documentation: https://xarray.pydata.org
- Learn more about pydantic:
  - GitHub: https://github.com/samuelcolvin/pydantic/
  - Documentation: https://pydantic-docs.helpmanual.io
- Other useful tools:
  - VS Code: https://code.visualstudio.com/docs
  - Docker: https://docs.docker.com/get-started/
  - pytest: https://github.com/pytest-dev/pytest
  - black: https://github.com/psf/black
  - matplotlib guide: https://realpython.com/python-matplotlib-guide/
The implementation of the mbari_wec pipeline was extracted from:
- Sandia National Laboratories. (2021). MBARI WEC 2021 deployment [data set]. Retrieved from https://dx.doi.org/10.15473/1825670.