Neuron SDK Release 2.19.0 - Release Notes
1 parent 78169c6, commit 215b421
Showing 103 changed files with 2,682 additions and 596 deletions.
@@ -1,5 +1,10 @@
Containers - Kubernetes - Getting Started
=========================================

The Neuron device plugin is a DaemonSet run on all Inferentia and Trainium nodes that enables the containers in your Kubernetes cluster to request and use Neuron cores or devices.
The Neuron scheduler extension is required for containers in your Kubernetes cluster that request multiple Neuron resources.
It helps find optimal sets of Neuron resources to minimize inter-resource communication costs.
Below are directions for installing and using the Neuron device plugin and scheduler extension.


.. include:: /containers/kubernetes-getting-started.txt
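
To illustrate what "requesting Neuron resources" can look like from a container's point of view, the sketch below builds a minimal pod manifest in Python and prints it as JSON, which ``kubectl apply -f -`` accepts. The resource name ``aws.amazon.com/neuroncore``, the pod name, and the image are assumptions made for illustration and do not come from this page; check the device plugin documentation for the resource names your cluster actually advertises.

```python
import json

# Assumed extended-resource name; not stated in the text above.
NEURON_CORE_RESOURCE = "aws.amazon.com/neuroncore"


def neuron_pod_manifest(name, image, neuron_cores):
    """Build a minimal pod manifest that requests `neuron_cores` Neuron cores."""
    if neuron_cores < 1:
        raise ValueError("neuron_cores must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested under `limits`.
                    "resources": {"limits": {NEURON_CORE_RESOURCE: neuron_cores}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = neuron_pod_manifest("neuron-demo", "example.com/my-neuron-image:latest", 2)
    print(json.dumps(manifest, indent=2))
```

A pod that asks for more than one Neuron resource, as in this sketch, is the case where the text says the scheduler extension is required.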
@@ -0,0 +1,59 @@
.. _k8s-neuron-monitor:

Neuron monitor Container
========================

Neuron monitor is the primary observability tool for Neuron devices. For details, refer to the `neuron monitor guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_. This tutorial describes deploying Neuron monitor as a DaemonSet on a Kubernetes cluster.

* Download the Neuron monitor YAML file: :download:`k8s-neuron-monitor-daemonset.yml </src/k8/k8s-neuron-monitor-daemonset.yml>`
* Apply the Neuron monitor YAML to create a DaemonSet on the cluster with the following command:

.. code:: bash

    kubectl apply -f k8s-neuron-monitor-daemonset.yml

* Verify that the Neuron monitor DaemonSet is running:

.. code:: bash

    kubectl get ds neuron-monitor --namespace neuron-monitor

Expected result (with 2 nodes in cluster):

.. code:: bash

    NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    neuron-monitor   2         2         2       2            2           <none>          27h
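
The readiness check above can also be scripted. The following is a minimal sketch, assuming the default tabular output of ``kubectl get ds`` as shown in the expected result; the helper name ``daemonset_ready`` is hypothetical and not part of any Neuron or Kubernetes tooling.

```python
def daemonset_ready(kubectl_output, name):
    """Return True if DaemonSet `name` reports READY == DESIRED and DESIRED > 0.

    `kubectl_output` is the plain-text table printed by `kubectl get ds`.
    """
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    if not lines:
        return False
    header = lines[0].split()
    # DESIRED and READY appear before the two-word "NODE SELECTOR" column,
    # so whitespace splitting keeps their positions aligned with the rows.
    desired_idx = header.index("DESIRED")
    ready_idx = header.index("READY")
    for line in lines[1:]:
        fields = line.split()
        if fields[0] == name:
            desired = int(fields[desired_idx])
            ready = int(fields[ready_idx])
            return desired > 0 and ready == desired
    return False


if __name__ == "__main__":
    sample = """\
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-monitor   2         2         2       2            2           <none>          27h
"""
    print(daemonset_ready(sample, "neuron-monitor"))
```

In practice the text passed in would come from capturing the command's standard output, for example with ``subprocess.run(..., capture_output=True, text=True)``.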

* Get the neuron-monitor pod names:

.. code:: bash

    kubectl get pods --namespace neuron-monitor

Expected result:

.. code:: bash

    NAME                   READY   STATUS    RESTARTS   AGE
    neuron-monitor-slsxf   1/1     Running   0          17m
    neuron-monitor-wc4f5   1/1     Running   0          17m
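
If the pod names are needed in a script (for example, to feed into a later ``kubectl exec``), they can be extracted from the table above. This is a rough sketch assuming the default ``kubectl get pods`` column layout; ``running_pods`` is a hypothetical helper name.

```python
def running_pods(kubectl_output, prefix):
    """Return names of pods whose name starts with `prefix` and whose STATUS is Running."""
    rows = [ln.split() for ln in kubectl_output.strip().splitlines() if ln.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    return [
        row[name_idx]
        for row in body
        if row[name_idx].startswith(prefix) and row[status_idx] == "Running"
    ]


if __name__ == "__main__":
    sample = """\
NAME                   READY   STATUS    RESTARTS   AGE
neuron-monitor-slsxf   1/1     Running   0          17m
neuron-monitor-wc4f5   1/1     Running   0          17m
"""
    for pod in running_pods(sample, "neuron-monitor"):
        print(pod)
```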

* Verify that the Prometheus endpoint is available:

.. code:: bash

    kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- wget -q --output-document - http://127.0.0.1:8000

Expected result:

.. code:: bash

    # HELP python_gc_objects_collected_total Objects collected during gc
    # TYPE python_gc_objects_collected_total counter
    python_gc_objects_collected_total{generation="0"} 362.0
    python_gc_objects_collected_total{generation="1"} 0.0
    python_gc_objects_collected_total{generation="2"} 0.0
    # HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
    # TYPE python_gc_objects_uncollectable_total counter
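
The endpoint returns metrics in the Prometheus text exposition format shown above. As a rough illustration of how that text can be inspected programmatically, the sketch below parses sample lines into ``(name, labels, value)`` tuples. It is a simplified parser written for this example (it ignores timestamps and does not unescape label values), not a replacement for a real Prometheus client; ``parse_metrics`` is a hypothetical name.

```python
import re

# metric_name{label="value",...} value
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus exposition text into a list of (name, labels, value)."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, label_str, value = match.groups()
        labels = dict(_LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    sample = """\
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 362.0
python_gc_objects_collected_total{generation="1"} 0.0
"""
    for name, labels, value in parse_metrics(sample):
        print(name, labels, value)
```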