Repository Structure
The `admin` folder is a catch-all for admin- and devops-related things. For now it contains two subfolders.
The first contains all files related to maintaining our run environments, including:
- Python requirements. We define our requirements in `requirements.in` and compile them with pip-tools to resolve pinned versions in `requirements.txt`. We also generate a `constraints.txt`, which is `requirements.txt` slightly reformatted so that pip can use it as a constraints file. This constraints file ensures that all of our Docker images, regardless of which packages are actually installed on each image, use the same version of every Python package we depend on. The script that compiles these files lives in `admin/ops`.
- A `docker` subfolder containing bash scripts, Dockerfiles, and more that we use to manage our various Docker images. These images serve as our dev container for development and are also used by most of our GitHub Actions. See more about our docker images here.
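As a rough illustration of the requirements-to-constraints step, here is a minimal sketch. It assumes the main "reformatting" is stripping extras (pip's resolver rejects constraints such as `sqlalchemy[postgresql]==2.0.20`) and dropping the `# via` comments that pip-compile emits; the actual script in `admin/ops` may differ. The function name `requirements_to_constraints` is hypothetical.

```python
import re

# Hypothetical helper, not the actual script in admin/ops.
# Matches a leading package name followed by an extras block, e.g. "sqlalchemy[postgresql]".
EXTRAS = re.compile(r"^([A-Za-z0-9._-]+)\[[^\]]*\]")


def requirements_to_constraints(requirements_text: str) -> str:
    """Turn pip-compile output into a pip-compatible constraints file (sketch)."""
    lines = []
    for raw in requirements_text.splitlines():
        # Drop comments, including pip-compile's indented "# via ..." annotations.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Constraints files cannot carry extras, so keep only "name==version".
        lines.append(EXTRAS.sub(r"\1", line))
    return "".join(f"{line}\n" for line in lines)
```

The result can be passed to pip with `pip install -c constraints.txt <packages>`; a constraints file pins the versions of any packages that do get installed without forcing every listed package to be installed, which is what lets differently provisioned images share one set of versions.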
The second subfolder contains various narrow-scoped scripts that we use for devops-related tasks.
Our apps folder contains any apps that we produce. For now there is just one: our QA Streamlit app, which is deployed on DigitalOcean. Each app folder should contain all the code necessary to run and deploy it (outside of GitHub Actions).
This folder contains bash utilities used across product builds. We're increasingly moving away from these bash utilities in favor of managing the control flow of our processes in Python, but for now they're still used across the codebase.
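Moving control flow from bash into Python can be as simple as running a product's SQL files in an explicit order and stopping at the first failure. The sketch below is hypothetical: it uses the standard-library `sqlite3` module as a stand-in for our Postgres build engine so that it runs anywhere, and `run_sql_steps` is not an existing `dcpy` function.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_sql_steps(conn: sqlite3.Connection, sql_dir: Path, order: list[str]) -> list[str]:
    """Run SQL files from sql_dir in the given order; return the files that completed.

    Hypothetical helper. sqlite3 stands in for Postgres here.
    """
    completed = []
    for filename in order:
        script = (sql_dir / filename).read_text()
        # executescript raises sqlite3.Error on failure, so later steps never run.
        conn.executescript(script)
        completed.append(filename)
    return completed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sql_dir = Path(tmp)
        (sql_dir / "01_create.sql").write_text("CREATE TABLE t (x INTEGER);")
        (sql_dir / "02_load.sql").write_text("INSERT INTO t VALUES (1), (2);")
        conn = sqlite3.connect(":memory:")
        print(run_sql_steps(conn, sql_dir, ["01_create.sql", "02_load.sql"]))
        print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])
```

Keeping the order as an explicit list in Python, rather than implied by a bash script, makes it easy to log progress, retry from a given step, or later move the list into declarative YAML.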
`dcpy` is our internal Python package. Python is increasingly our language of choice for various parts of our product lifecycle, and `dcpy` contains numerous submodules for things like utilities, connectors to third parties, and our lifecycle orchestration code. For more info, see `dcpy`.
This folder contains various code-generated documentation.
The products folder contains one folder for each of our data products (plus an extra one, "template", which is our sandbox data product for testing out new workflows and technology).
Each of these folders contains all the information and code needed to build a product. The goal is for each folder to really consist of two things:
- A recipe file. This is a YAML file used by `dcpy.lifecycle.builds` to resolve versions of source datasets and load them into our build engine database.
- Transformation logic. We're moving toward this being SQL files (Postgres) run by dbt, but at the moment our products use a variety of structures and approaches.
In addition, every product still has some amount of product-specific bash scripting, whether for running specific transformation steps (many products use it to specify the order of their SQL files) or for generating export files. This logic will likely move to `dcpy` eventually as well, so that our product definitions can really be just two things: declarative metadata/instructions in YAML, and the actual transformation logic in SQL.
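To illustrate what resolving versions of source datasets from a recipe might involve, here is a hypothetical sketch. The real recipe schema and the `dcpy.lifecycle.builds` API are not described here, so `InputDataset`, `resolve_versions`, and the catalog of available versions are all assumptions; the recipe is modeled as Python objects rather than parsed YAML to keep the example standard-library only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDataset:
    """Hypothetical recipe entry: a source dataset and the version requested."""
    name: str
    version: str = "latest"  # "latest" or a pinned version string


def resolve_versions(
    datasets: list[InputDataset], available: dict[str, list[str]]
) -> dict[str, str]:
    """Map each dataset name to a concrete version (sketch, not dcpy's API)."""
    resolved = {}
    for ds in datasets:
        versions = available.get(ds.name)
        if not versions:
            raise KeyError(f"no versions available for {ds.name!r}")
        if ds.version == "latest":
            # Assumes versions sort correctly as strings (e.g. "20240115").
            resolved[ds.name] = max(versions)
        elif ds.version in versions:
            resolved[ds.name] = ds.version
        else:
            raise ValueError(f"{ds.name}: version {ds.version!r} not found")
    return resolved
```

Here "latest" picks the greatest version string, which only works when versions sort lexically, such as zero-padded dates; a real resolver would need a version-aware comparison. Pinning the resolved versions up front is what makes a build reproducible before any data is loaded.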