Deep Learning Project Template

This template offers a lightweight yet functional project structure for various deep learning projects. It assumes PyTorch as the deep learning framework, but it can easily be adapted to projects implemented with other frameworks.

Getting Started

You can fork this repo and select it as the template when creating a new repo on GitHub.

Or use the template directly from your forked template repo.

Alternatively, you can simply download this repo as a ZIP archive and get started.

Next, install all the dependencies by running the following commands in the project root:

conda create -n project_name python=3.8
conda activate project_name
conda install poetry  # or 'pip install poetry'
poetry install

Finally, wrap up the setup by manually installing or updating any other packages you'd like. Please refer to the Extra Packages section for some awesome packages.

Template Layout

dl-project-template
.
│
├── LICENSE.md
├── README.md
├── makefile            # makefile for various commands (install, train, pytest, mypy, lint, etc.)
├── mypy.ini            # MyPy type checking configurations
├── pylint.rc           # Pylint code quality checking configurations
├── pyproject.toml      # Poetry project and environment configurations
│
├── data
│   ├── ...             # data reference files (index, readme, etc.)
│   ├── raw             # untreated data directly downloaded from source
│   ├── interim         # intermediate data processing results
│   └── processed       # processed data (features and targets) ready for learning
│
├── notebooks           # Jupyter Notebooks (mostly for data processing and visualization)
├── src
│   ├── data            # data processing classes, functions, and scripts
│   ├── evaluations     # evaluation classes and functions (metrics, visualization, etc.)
│   ├── experiments     # experiment configuration files
│   ├── modules         # activations, layers, modules, and networks (subclass of torch.nn.Module)
│   └── utilities       # other useful functions and classes
├── tests               # unit tests module for ./src
│
├── docs                # documentation files (*.txt, *.doc, *.jpeg, etc.)
├── logs                # logs for deep learning experiments
└── models              # saved models with optimizer states
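If you start from an empty folder instead of forking the repo, the layout above can also be bootstrapped with a small script. The following is a minimal sketch using only the standard library; the function name `scaffold` and the choice of which folders become Python packages (get an `__init__.py`) are illustrative assumptions, not part of the template itself.

```python
"""Hypothetical helper that creates the template's folder skeleton.

A sketch only: folder names mirror the layout above, but which folders
become Python packages (get an __init__.py) is an assumption.
"""
from pathlib import Path
from typing import List

# Folders from the template layout, relative to the project root.
LAYOUT_DIRS = (
    "data/raw",
    "data/interim",
    "data/processed",
    "notebooks",
    "src/data",
    "src/evaluations",
    "src/experiments",
    "src/modules",
    "src/utilities",
    "tests",
    "docs",
    "logs",
    "models",
)

# Assumed importable packages, e.g. `from src.modules import ...` in tests.
PACKAGES = (
    "src",
    "src/data",
    "src/evaluations",
    "src/modules",
    "src/utilities",
    "tests",
)


def scaffold(root: Path) -> List[Path]:
    """Create the skeleton under `root`; return the layout folders that were new."""
    created: List[Path] = []
    for rel in LAYOUT_DIRS:
        path = root / rel
        if not path.exists():
            created.append(path)
        # parents=True also creates data/ and src/; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    for rel in PACKAGES:
        # touch(exist_ok=True) never overwrites an existing __init__.py.
        (root / rel / "__init__.py").touch(exist_ok=True)
    return created
```

For example, `scaffold(Path("my-project"))` creates the skeleton under `my-project/` and returns only the folders it actually created, so rerunning it on an existing project is harmless. Root files such as `pyproject.toml`, `mypy.ini`, `pylint.rc`, and `makefile` still need to be copied from the template.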

Extra Packages

Data Analysis, Augmentation, Validation and Cleaning

Great Expectations: data validation, documentation, and profiling
Cerberus: lightweight data validation functionality
PyJanitor: Pandas extension for data cleaning
PyDQC: automatic data quality checking
Feature-engine: transformer library for feature preparation and engineering
pydantic: data parsing and validation using Python type hints
Dora: exploratory data analysis toolkit for Python
datacleaner: automatically cleans data sets and readies them for analysis
whale: a lightweight data discovery, documentation, and quality engine for data warehouses
bamboolib: a tool for fast and easy data exploration & transformation of pandas DataFrames
pandas-summary: an extension to the pandas DataFrame describe function
AugLy: a data augmentations library for audio, image, text, and video

Performance and Caching

Numba: JIT compiler that translates Python and NumPy to fast machine code
CuPy: NumPy-like API accelerated with CUDA
Dask: parallel computing library
Ray: framework for distributed applications
Modin: parallelized Pandas with Dask or Ray
Vaex: lazy memory-mapping dataframe for big data
Joblib: disk-caching and parallelization
RAPIDS: GPU acceleration for data science
Polars: a blazingly fast DataFrames library implemented in Rust & Python

Data Version Control and Workflow

DVC: data version control system
Pachyderm: data pipelining (versioning, lineage/tracking, and parallelization)
d6tflow: effective data workflow
Metaflow: end-to-end independent workflow
Dolt: relational database with version control
Airflow: platform to programmatically author, schedule and monitor workflows
Luigi: dependency resolution, workflow management, visualization, etc.

Visualization and Presentation

Seaborn: data visualization based on Matplotlib
HiPlot: interactive high-dimensional visualization for correlation and pattern discovery
Plotly.py: interactive browser-based graphing library
Altair: declarative visualization based on Vega and Vega-Lite
TabPy: Tableau visualizations with Python
Chartify: easy and flexible charts
Pandas-Profiling: HTML profiling reports for Pandas DataFrames
missingno: toolset of flexible and easy-to-use missing data visualizations and utilities
Yellowbrick: Scikit-Learn visualization for model selection and hyperparameter tuning
FlashTorch: visualization toolkit for neural networks in PyTorch
Streamlit: turn data scripts into sharable web apps in minutes
python-tabulate: pretty-print tabular data in Python, a library and a command-line utility
Lux: Python API for intelligent visual data discovery
bokeh: interactive data visualization in the browser, from Python

Project Lifecycles and Hyperparameter Optimization

NNI: automate ML/DL lifecycle (feature engineering, neural architecture search, model compression and hyperparameter tuning)
Comet.ml: self-hosted and cloud-based meta machine learning platform for tracking, comparing, explaining and optimizing experiments and models
MLflow: platform for the ML lifecycle, including experimentation, reproducibility and deployment
Optuna: automatic hyperparameter optimization framework
Hyperopt: serial and parallel optimization
Tune: scalable experiment execution and hyperparameter tuning
Determined: deep learning training platform
Aim: a super-easy way to record, search and compare 1000s of ML training runs
TPOT: a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

Distribution, Pipelining, and Sharding

torchgpipe: a scalable pipeline parallelism library, which allows efficient training of large, memory-consuming models
PipeDream: generalized pipeline parallelism for deep neural network training
DeepSpeed: a deep learning optimization library that makes distributed training easy, efficient, and effective
Horovod: a distributed deep learning training framework
RaySGD: lightweight wrappers for distributed deep learning
AdaptDL: a resource-adaptive deep learning training and scheduling framework

Other PyTorch Extensions

Ignite: high-level library based on PyTorch
PyTorch Lightning: lightweight wrapper for less boilerplate
fastai: out-of-the-box tools and models for vision, text, and other data
Skorch: Scikit-Learn interface for PyTorch models
Pyro: deep universal probabilistic programming with PyTorch
Kornia: differentiable computer vision library
DGL: package for deep learning on graphs
PyGeometric: geometric deep learning extension library for PyTorch
PyTorch-BigGraph: a distributed system for learning graph embeddings for large graphs
Torchmeta: datasets and models for few-shot-learning/meta-learning
PyTorch3D: library for deep learning with 3D data
learn2learn: meta-learning model implementations
higher: higher-order (unrolled first-order) optimization
Captum: model interpretability and understanding
PyTorch summary: Keras-style summary for PyTorch models
Catalyst: PyTorch framework for Deep Learning research and development
Poutyne: a simplified framework for PyTorch that handles much of the boilerplate code needed to train neural networks

Miscellaneous

Awesome-Pytorch-list: a comprehensive list of PyTorch-related content on GitHub, such as models, implementations, helper libraries, tutorials, etc.
DoWhy: causal inference combining causal graphical models and potential outcomes
CausalML: a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research
NetworkX: creation, manipulation, and study of complex networks/graphs
Gym: toolkit for developing and comparing reinforcement learning algorithms
Polygames: a platform of zero learning with a library of games
Mlxtend: extensions and helper modules for data analysis and machine learning
NLTK: a leading platform for building Python programs to work with human language data
PyCaret: low-code machine learning library
dabl: baseline library for data analysis
OGB: benchmark datasets, data loaders and evaluators for graph machine learning
AI Explainability 360: a toolkit for interpretability and explainability of datasets and machine learning models
SDV: synthetic data generation for tabular, relational, and time series data
SHAP: game theoretic approach to explain the output of any machine learning model
TextBlob: a Python (2 and 3) library for processing textual data

Resources

Datasets:

Google Datasets: high-demand public datasets
Google Dataset Search: a search engine for freely-available online data
OpenML: online platform for sharing data, ML algorithms and experiments
DoltHub: data collaboration with Dolt
OpenBlender: live-streamed open data sources
Data Portal: a comprehensive list of open data portals from around the world
Activeloop: unstructured dataset management for TensorFlow/PyTorch

Libraries:

Best-of Machine Learning with Python: a ranked list of awesome machine learning Python libraries

Readings:

Machine Learning Systems Design by Chip Huyen
Rules of Machine Learning: Best Practices for ML Engineering by Martin Zinkevich
Awesome Data Science: an awesome data science repository to learn and apply for real world problems

Other ML/DL Templates:

Cookiecutter Data Science: a logical, reasonably standardized, but flexible project structure
PyTorch Template Project: PyTorch deep learning project template

Authors

Xiaotian Duan (Email: xduan7 at gmail.com)

License

This project is licensed under the MIT License - see the LICENSE.md file for more details.
