DGL Operator

The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes, in either distributed or non-distributed mode. See the DGL documentation for an introduction to DGL and its distributed training philosophy.

🛠 Prerequisites

  • Kubernetes >= 1.16
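To confirm that a cluster meets this requirement, a version comparison along the following lines may help. This is a minimal sketch: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. The commented usage additionally assumes `jq` is installed.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on `sort -V` ordering the two versions; the smaller one comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage against a live cluster (requires kubectl, jq, and cluster access):
#   server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
#   version_ge "${server#v}" 1.16 && echo "Kubernetes version is sufficient"
```

The server version reported by `kubectl version` carries a leading `v`, which the usage example strips before comparing.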

🚀 Installation

You can deploy the operator with default settings by running the following commands:

git clone https://github.com/Qihoo360/dgl-operator
cd dgl-operator
kubectl create -f deploy/v1alpha1/dgl-operator.yaml

You can check whether the DGLJob custom resource definition (CRD) is installed via:

kubectl get crd

The output should include dgljobs.qihoo.net like the following:

NAME                                       AGE
...
dgljobs.qihoo.net                          1m
...
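Instead of scanning the full list by eye, the check can be scripted. The sketch below assumes the CRD name `dgljobs.qihoo.net` shown in the output above. `has_dgljob_crd` is a hypothetical helper name, not part of the operator.

```shell
# Hypothetical helper: reads `kubectl get crd` output on stdin and
# succeeds if some line starts with the DGLJob CRD name.
has_dgljob_crd() {
  grep -q '^dgljobs\.qihoo\.net[[:space:]]'
}

# Usage against a live cluster (requires kubectl and cluster access):
#   kubectl get crd | has_dgljob_crd && echo "DGLJob CRD installed"
# Or query the CRD by name directly; this fails if it is absent:
#   kubectl get crd dgljobs.qihoo.net
```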

🔬 Creating a DGL Job

You can create a DGL job by defining a DGLJob config file. See the GraphSAGE.yaml and GraphSAGE_dist.yaml example config files for launching a single-node or multi-node GraphSAGE training job, respectively. You may change the config file to suit your requirements.

# standalone GraphSAGE
cat examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
cat examples/v1alpha1/GraphSAGE_dist.yaml

Deploy the DGLJob resource to start training:

# standalone GraphSAGE
kubectl create -f examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
kubectl create -f examples/v1alpha1/GraphSAGE_dist.yaml
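After creating the resource, you will probably want to confirm that the job and its pods exist. The following is a sketch, assuming the resource is addressable as `dgljobs.qihoo.net` (taken from the CRD name above) and that the job was created in the current namespace. `show_dgl_status` and the `KUBECTL` override are hypothetical names introduced for illustration.

```shell
# KUBECTL can be overridden (for example with `echo`) to dry-run the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: list DGLJob resources and pods in the current namespace.
show_dgl_status() {
  "$KUBECTL" get dgljobs.qihoo.net
  "$KUBECTL" get pods
}

# Usage against a live cluster (requires kubectl and cluster access):
#   show_dgl_status
```

Pod names and status columns depend on the job definition, so no sample output is shown here.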

💭 Reference

Please check out these previous works that helped inspire the creation of DGL Operator