Data is sourced from: https://github.com/wouterboomsma/cath_datasets?tab=readme-ov-file
- Run
uv venv
to initialize the venv
- Activate the venv with one of the following CLI commands:
- macOS: Run
source .venv/bin/activate
- Windows (PowerShell): Run
.venv\Scripts\activate.ps1
- Run
uv sync
to download the dependencies
- Run
mlflow ui
- Open the localhost address printed by mlflow ui (http://127.0.0.1:5000 by default) in your browser to view the runs that MLflow tracks
- In a separate terminal with the venv activated (mlflow ui keeps running in the first one), run
python -m src.main
- Four models will be trained, evaluated, and logged (training times were measured on an M1 MacBook Pro):
- NN - 1 Hidden Layer, 4 Neurons (~6s)
- NN - 2 Hidden Layers, 64 Neurons Each (~10s)
- CNN (<21min)
- Simplified CNN w/ Early Stopping (<10min)
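The early stopping used by the simplified CNN can be illustrated with a small framework-independent sketch. This is not the code in src.models: the function name early_stopping_epoch, the patience and min_delta defaults, and the loss values are all assumptions made for illustration. In Keras the same idea is available through keras.callbacks.EarlyStopping.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Return (stop_epoch, best_loss) for a stream of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 1-indexed.
    Hypothetical helper for illustration only.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                break
    return epoch, best_loss


# Made-up validation losses; the late drop to 0.40 is never reached.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.40]
stop_epoch, best = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best)  # 6 0.65
```

Stopping at epoch 6 instead of running every epoch is what would shorten training relative to the full CNN, at the risk of missing a later improvement when patience is set too low.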
src.models
- Experimental deep learning models built with the Keras and MLflow APIs
src.utils
- Interface for downloading and interacting with the CATH protein dataset
- Utilities for data pre- and post-processing
src.main
- Main entry point for experiment tracking