PyTorch - an open-source deep learning framework primarily developed by Facebook's AI Research lab (FAIR). It provides a flexible, dynamic computational graph model, making it popular among researchers and developers for building and training deep neural networks.
PyTorch Lightning - a lightweight PyTorch wrapper that simplifies the process of building, training, and deploying complex deep learning models. It provides high-level interfaces and abstractions that remove boilerplate code, making it easier for researchers and practitioners to focus on experimenting with and improving their models rather than dealing with low-level implementation details.
Hydra - a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
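As a sketch of that composition (the group and option names below are hypothetical, loosely modeled on this repo's configs folder), a main config such as train.yaml selects one option from each config group via a defaults list, and any resolved value can then be overridden on the command line:

```yaml
# Hypothetical train.yaml sketch -- group/option names are illustrative.
defaults:
  - data: maniac_mini      # composes configs/data/maniac_mini.yaml
  - model: default         # composes configs/model/default.yaml
  - trainer: default       # composes configs/trainer/default.yaml

seed: 42                   # plain values live alongside composed groups
```

A command-line override such as `python sri_maper/src/train.py trainer=default seed=7` (keys hypothetical) would then replace composed values without editing any config file.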
The directory structure looks like this:
├── data <- Project data
│ └── raster_libraries <- Folder holding sets of individual rasters per CMA
│     ├── maniac_mini_raster_library <- Raster Library for maniac_mini example
│     └── ...
├── docker <- Docker scripts to build images / run containers
│
├── logs <- Logs generated by hydra and lightning loggers
├── sri_maper <- Primary source code folder for MAPER
│ ├── ckpts <- Optional folder to hold pretrained models (if not in logs)
│ │
│ ├── configs <- Hydra configs
│ │ ├── callbacks <- Callbacks configs
│ │ ├── data <- Data configs
│ │ ├── debug <- Debugging configs
│ │ ├── experiment <- Experiment configs
│ │ ├── extras <- Extra utilities configs
│ │ ├── hparams_search <- Hyperparameter search configs
│ │ ├── hydra <- Hydra configs
│ │ ├── logger <- Logger configs
│ │ ├── model <- Model configs
│ │ ├── paths <- Project paths configs
│ │ ├── preprocess <- Preprocessing configs
│ │ ├── trainer <- Trainer configs
│ │ │
│ │ ├── __init__.py <- python module __init__
│ │ ├── test.yaml <- Main config for testing
│ │ └── train.yaml <- Main config for training
│ │
│ ├── notebooks <- Jupyter notebooks
│ │
│ ├── src <- Source code
│ │ ├── data <- Data code
│ │ ├── models <- Model code
│ │ ├── utils <- Utility code
│ │ │
│ │ ├── __init__.py <- python module __init__
│ │ ├── map.py <- Run mapping via CLI
│ │ ├── pretrain.py <- Run pretraining via CLI
│ │ ├── test.py <- Run testing via CLI
│ │ └── train.py <- Run training via CLI
│ │
│ ├── __init__.py <- python module __init__
│
├── .gitignore <- List of files ignored by git
├── LICENSE.txt <- License for code repo
├── project_vars.sh <- Project variables for infrastructure
├── setup.py <- File for installing project as a package
└── README.md
This repo is compatible with running locally, on docker locally, or on docker in a Kubernetes cluster. Please follow the corresponding instructions exactly and carefully so the install is smooth. Once you are familiar with the structure, you can make changes. NOTE - the repo is currently DEPENDENT on having live CDR and StatMagic instances to receive the inputs necessary to run as a server.
This setup presents the easiest installation but is more brittle than using docker containers. Please make a virtual environment of your choosing, source the environment, clone the repo, and install the code using setup.py. Below are example commands to do so.
# creates and activates virtual environment
conda create -n [VIRTUAL_ENV_NAME] python=3.10
conda activate [VIRTUAL_ENV_NAME]
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# sets environment variables
source project_vars.sh
# installs from source code
python3 -m pip install -e .
If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.
This setup is slightly more involved but provides more robustness across physical devices by using docker. We've written convenience bash scripts to make building and running the docker container much easier. First, clone the repo locally.
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
Next, edit the variables in project_vars.sh relevant to your use case. Typically, one needs to edit JOB_TAG, REPO_HOST, DUSER, WANDB_API_KEY, CDR_TOKEN, CDR_HOST, and NGROK_AUTHTOKEN. After editing project_vars.sh, build and run the docker image. Below are example commands to do so using the convenience scripts.
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh
# runs docker image
bash docker/run_docker_local.sh
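For reference, the variables mentioned above could be set in project_vars.sh along the following lines. This is a hypothetical sketch: every value is a placeholder to substitute with your own, and the per-variable comments are assumptions about each variable's role, not documented behavior.

```shell
# Hypothetical sketch of the variables typically edited in project_vars.sh;
# all values below are placeholders -- substitute your own.
export JOB_TAG="my-experiment"               # tag identifying your job/run (assumed)
export REPO_HOST="docker.example.com"        # docker registry host (assumed)
export DUSER="eXXXXX"                        # your user name for docker (assumed)
export WANDB_API_KEY="<your-wandb-key>"      # Weights & Biases API key
export CDR_TOKEN="<your-cdr-token>"          # token for the CDR instance
export CDR_HOST="https://cdr.example.com"    # CDR instance URL (placeholder)
export NGROK_AUTHTOKEN="<your-ngrok-token>"  # ngrok tunnel auth token
```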
Optionally, if you would like to override the default logs and data folders within this repo, which are empty, with existing ones (e.g. on the datalake) that already contain logs and data, simply mount (or overwrite) the corresponding datalake folders onto the empty logs and data folders within this repo. Below are example commands to do so.
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/logs ./logs
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/data ./data
If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.
This setup is slightly more involved but provides more scalability by using docker and Kubernetes to access more compute. First we'll need to prepare some folders on the datalake to contain your data, code, and logs. Under the criticalmaas-ta3 folder (namespace) within the vt-open datalake, make the following directory structure for YOUR use using your employee ID number (i.e. eXXXXX). NOTE, you only need to make the folders with the comment CREATE in them; the others should exist already. Be careful not to corrupt the folders of other users or namespaces.
vt-open
├── ... # other folders for other namespaces - avoid
├── criticalmaas-ta3 # top-level of criticalmaas-ta3 namespace
│ └── k8s # contains criticalmaas-ta3 code & logs for ALL users - (k8s READ & WRITE)
│ ├── eXXXXX # folder you should CREATE to contain your code & logs
│ │ ├── code # folder you should CREATE to contain your code
│ │ ├── data # folder you should CREATE to contain your data
│ │ └── logs # folder you should CREATE to contain your logs
│ └── ... # other folders for other users - avoid
└── ... # other folders for other namespaces - avoid
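The three folders marked CREATE above can be made in one step. The commands below are a sketch: the DATALAKE mount point is an assumption (adjust it to wherever vt-open is mounted for you), and eXXXXX should be replaced with your employee ID.

```shell
# DATALAKE is an assumed mount point for the vt-open datalake; adjust to yours.
DATALAKE="${DATALAKE:-/datalake/path/to/vt-open}"

# create the per-user code, data, and logs folders described above
mkdir -p "${DATALAKE}/criticalmaas-ta3/k8s/eXXXXX/code"
mkdir -p "${DATALAKE}/criticalmaas-ta3/k8s/eXXXXX/data"
mkdir -p "${DATALAKE}/criticalmaas-ta3/k8s/eXXXXX/logs"
```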
Next you will need to mount the code folder above locally. By mounting the datalake's code folder locally, your local edits to source code will be reflected on the datalake, and therefore, on the Kubernetes cluster.
# makes a local code folder
mkdir k8s-code
# mount the datalake folder that hosts the code (Kubernetes will have access)
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/vt-open/criticalmaas-ta3/k8s/${USER}/code ./k8s-code
Last, we'll install the repo. We've written convenience bash scripts to make building and running the docker container much easier. Start by cloning the repo locally.
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
Next, edit the variables in project_vars.sh relevant to your use case. Typically, one needs to edit JOB_TAG, REPO_HOST, DUSER, WANDB_API_KEY, CDR_TOKEN, CDR_HOST, and NGROK_AUTHTOKEN. After editing project_vars.sh, build and run the docker image. Below are example commands to do so using the convenience scripts.
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh
# runs docker image
bash docker/run_docker_local.sh
If installation succeeded without errors, you should be able to run the SRI TA3 server. Skip to SRI TA3 server.
Assuming the installation above succeeded, you should now be in a bash terminal from which you can run the SRI TA3 server. Run the following to start the server:
python sri_maper/src/server.py
If the server runs successfully, it will register with the configured CDR instance and then wait for mineral assessment job requests to be made via the StatMagic instance. The output should be similar to the following:
Registering with CDR
Starting TA3 server
INFO: Started server process [9378]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:80 (Press CTRL+C to quit)
You can now start mineral assessments by interacting with the StatMagic GUI at https://statmagic.mtri.org/
Below is a video demonstrating how the SRI TA3 server processes a mineral assessment job initiated from the StatMagic GUI: