This repository is a lightweight yet functional template for deep learning projects. It assumes PyTorch as the deep learning framework, but it can easily be adapted to projects built with other frameworks.
You can fork this repo and use it as a template when creating a new repo on GitHub, or use the template directly from the forked template repo.
Alternatively, you can simply download this repo as a zip archive and get started.
Next, you can set up the environment and install Poetry by typing the following commands in the project root:
conda create -n project_name python=3.8
conda activate project_name # so that the following commands run inside the new environment
conda install poetry # or 'pip install poetry'
poetry new project_name
Finally, you can wrap up the setup by manually installing and updating any packages you'd like. Please refer to the Extra Packages section for some awesome packages.
dl-project-template
.
│
├── LICENSE.md
├── README.md
├── makefile            # makefile for various commands (install, train, pytest, mypy, lint, etc.)
├── mypy.ini            # MyPy type checking configurations
├── pylint.rc           # Pylint code quality checking configurations
├── pyproject.toml      # Poetry project and environment configurations
│
├── data
│   ├── ...             # data reference files (index, readme, etc.)
│   ├── raw             # untreated data directly downloaded from source
│   ├── interim         # intermediate data processing results
│   └── processed       # processed data (features and targets) ready for learning
│
├── notebooks           # Jupyter Notebooks (mostly for data processing and visualization)
├── src
│   ├── data            # data processing classes, functions, and scripts
│   ├── evaluations     # evaluation classes and functions (metrics, visualization, etc.)
│   ├── experiments     # experiment configuration files
│   ├── modules         # activations, layers, modules, and networks (subclass of torch.nn.Module)
│   └── utilities       # other useful functions and classes
├── tests               # unit tests module for ./src
│
├── docs                # documentation files (*.txt, *.doc, *.jpeg, etc.)
├── logs                # logs for deep learning experiments
└── models              # saved models with optimizer states
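As a rough illustration of how code under ./src might refer to this layout, the sketch below collects the data, log, and model directories in one place using only the standard library. The helper names `project_paths` and `ensure_dirs` are hypothetical and not part of the template; the directory names are taken from the tree above.

```python
from pathlib import Path
from typing import Dict


def project_paths(root: Path) -> Dict[str, Path]:
    """Map short names to directories from the project tree.

    `root` is assumed to be the project root (the folder that holds
    pyproject.toml); this helper is illustrative, not part of the template.
    """
    data = root / "data"
    return {
        "raw": data / "raw",              # untreated data from the source
        "interim": data / "interim",      # intermediate processing results
        "processed": data / "processed",  # features and targets ready for learning
        "logs": root / "logs",            # experiment logs
        "models": root / "models",        # saved models with optimizer states
    }


def ensure_dirs(paths: Dict[str, Path]) -> None:
    """Create every directory in `paths` if it does not exist yet."""
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
```

A script could call `ensure_dirs(project_paths(Path.cwd()))` once at startup when it is run from the project root, so that later code can write to `logs` or `models` without checking for the folders first.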
- Great Expectations: data validation, documenting, and profiling
- Cerberus: lightweight data validation functionality
- PyJanitor: Pandas extension for data cleaning
- PyDQC: automatic data quality checking
- Feature-engine: transformer library for feature preparation and engineering
- pydantic: data parsing and validation using Python type hints
- Dora: exploratory data analysis toolkit for Python
- datacleaner: automatically cleans data sets and readies them for analysis
- whale: a lightweight data discovery, documentation, and quality engine for data warehouses
- bamboolib: a tool for fast and easy data exploration & transformation of pandas DataFrames
- pandas-summary: an extension to the pandas DataFrame describe function
- AugLy: a data augmentation library for audio, image, text, and video
- Numba: JIT compiler that translates Python and NumPy to fast machine code
- CuPy: NumPy-like API accelerated with CUDA
- Dask: parallel computing library
- Ray: framework for distributed applications
- Modin: parallelized Pandas with Dask or Ray
- Vaex: lazy memory-mapping dataframe for big data
- Joblib: disk-caching and parallelization
- RAPIDS: GPU acceleration for data science
- Polars: a blazingly fast DataFrames library implemented in Rust & Python
- DVC: data version control system
- Pachyderm: data pipelining (versioning, lineage/tracking, and parallelization)
- d6tflow: effective data workflow
- Metaflow: end-to-end independent workflow
- Dolt: relational database with version control
- Airflow: platform to programmatically author, schedule and monitor workflows
- Luigi: dependency resolution, workflow management, visualization, etc.
- Seaborn: data visualization based on Matplotlib
- HiPlot: interactive high-dimensional visualization for correlation and pattern discovery
- Plotly.py: interactive browser-based graphing library
- Altair: declarative visualization based on Vega and Vega-Lite
- TabPy: Tableau visualizations with Python
- Chartify: easy and flexible charts
- Pandas-Profiling: HTML profiling reports for Pandas DataFrames
- missingno: toolset of flexible and easy-to-use missing data visualizations and utilities
- Yellowbrick: Scikit-Learn visualization for model selection and hyperparameter tuning
- FlashTorch: visualization toolkit for neural networks in PyTorch
- Streamlit: turn data scripts into sharable web apps in minutes
- python-tabulate: pretty-print tabular data in Python, a library and a command-line utility
- Lux: Python API for intelligent visual data discovery
- bokeh: interactive data visualization in the browser, from Python
- NNI: automate ML/DL lifecycle (feature engineering, neural architecture search, model compression and hyperparameter tuning)
- Comet.ml: self-hosted and cloud-based meta machine learning platform for tracking, comparing, explaining and optimizing experiments and models
- MLflow: platform for ML lifecycle, including experimentation, reproducibility and deployment
- Optuna: automatic hyperparameter optimization framework
- Hyperopt: serial and parallel optimization
- Tune: scalable experiment execution and hyperparameter tuning
- Determined: deep learning training platform
- Aim: a super-easy way to record, search and compare 1000s of ML training runs
- TPOT: a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
- torchgpipe: a scalable pipeline parallelism library, which allows efficient training of large, memory-consuming models
- PipeDream: generalized pipeline parallelism for deep neural network training
- DeepSpeed: a deep learning optimization library that makes distributed training easy, efficient, and effective
- Horovod: a distributed deep learning training framework
- RaySGD: lightweight wrappers for distributed deep learning
- AdaptDL: a resource-adaptive deep learning training and scheduling framework
- Ignite: high-level library based on PyTorch
- PyTorch Lightning: lightweight wrapper for less boilerplate
- fastai: out-of-the-box tools and models for vision, text, and other data
- Skorch: Scikit-Learn interface for PyTorch models
- Pyro: deep universal probabilistic programming with PyTorch
- Kornia: differentiable computer vision library
- DGL: package for deep learning on graphs
- PyTorch Geometric: geometric deep learning extension library for PyTorch
- PyTorch-BigGraph: a distributed system for learning graph embeddings for large graphs
- Torchmeta: datasets and models for few-shot-learning/meta-learning
- PyTorch3D: library for deep learning with 3D data
- learn2learn: meta-learning model implementations
- higher: higher-order (unrolled first-order) optimization
- Captum: model interpretability and understanding
- PyTorch summary: Keras style summary for PyTorch models
- Catalyst: PyTorch framework for Deep Learning research and development
- Poutyne: a simplified framework for PyTorch that handles much of the boilerplate code needed to train neural networks
- Awesome-Pytorch-list: a comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
- DoWhy: causal inference combining causal graphical models and potential outcomes
- CausalML: a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research
- NetworkX: creation, manipulation, and study of complex networks/graphs
- Gym: toolkit for developing and comparing reinforcement learning algorithms
- Polygames: a platform of zero learning with a library of games
- Mlxtend: extensions and helper modules for data analysis and machine learning
- NLTK: a leading platform for building Python programs to work with human language data
- PyCaret: low-code machine learning library
- dabl: baseline library for data analysis
- OGB: benchmark datasets, data loaders and evaluators for graph machine learning
- AI Explainability 360: a toolkit for interpretability and explainability of datasets and machine learning models
- SDV: synthetic data generation for tabular, relational, and time series data
- SHAP: game theoretic approach to explain the output of any machine learning model
- TextBlob: a Python (2 and 3) library for processing textual data
- Google Datasets: high-demand public datasets
- Google Dataset Search: a search engine for freely-available online data
- OpenML: online platform for sharing data, ML algorithms and experiments
- DoltHub: data collaboration with Dolt
- OpenBlender: live-streamed open data sources
- Data Portal: a comprehensive list of open data portals from around the world
- Activeloop: unstructured dataset management for TensorFlow/PyTorch
- Best-of Machine Learning with Python: a ranked list of awesome machine learning Python libraries
- Machine Learning Systems Design by Chip Huyen
- Rules of Machine Learning: Best Practices for ML Engineering by Martin Zinkevich
- Awesome Data Science: an awesome data science repository to learn and apply for real world problems
- Cookiecutter Data Science: a logical, reasonably standardized, but flexible project structure
- PyTorch Template Project: PyTorch deep learning project template
- Xiaotian Duan (Email: xduan7 at gmail.com)
This project is licensed under the MIT License - see the LICENSE.md file for more details.