[Documentation] sagemaker-debugger open source documentation pre-launch #506
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# .readthedocs.yml | ||
# Read the Docs configuration file | ||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details | ||
|
||
# Required | ||
version: 2 | ||
|
||
# Build documentation in the docs/ directory with Sphinx | ||
sphinx: | ||
configuration: docs/conf.py | ||
fail_on_warning: false | ||
|
||
# Build documentation with MkDocs | ||
#mkdocs: | ||
# configuration: mkdocs.yml | ||
|
||
# Optionally build your docs in additional formats such as PDF | ||
#formats: | ||
|
||
conda: | ||
environment: docs/environment.yml | ||
|
||
# Optionally set the version of Python and requirements required to build your docs | ||
python: | ||
version: 3.6 | ||
install: | ||
- method: setuptools | ||
path: . |
|
@@ -57,33 +57,45 @@ pip install smdebug | |
For a complete overview of Amazon SageMaker Debugger to learn how it works, go to the [Use Debugger in AWS Containers](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-container.html) developer guide. | ||
|
||
### AWS Deep Learning Containers with zero code change | ||
Debugger is installed by default in AWS Deep Learning Containers with TensorFlow, PyTorch, MXNet, and XGBoost. The following framework containers enable you to use Debugger with no changes to your training script, by automatically adding [SageMaker Debugger's Hook](docs/api.md#glossary). | ||
|
||
The following frameworks are available AWS Deep Learning Containers with the deep learning frameworks for the zero script change experience. | ||
Debugger is installed by default in AWS Deep Learning Containers | ||
(TensorFlow, PyTorch, MXNet) and the SageMaker XGBoost containers. The | ||
training containers are bundled and tested for integration with the | ||
SMDebug library and the entire SageMaker platform. | ||
|
||
| Framework | Version | | ||
| --- | --- | | ||
| [TensorFlow](docs/tensorflow.md) | 1.15, 2.1.0, 2.2.0, 2.3.0, 2.3.1 | | ||
Review comment: 2.4 and 2.5 are also supported. Reply: incorporated. |
||
| [MXNet](docs/mxnet.md) | 1.6, 1.7 | | ||
| [PyTorch](docs/pytorch.md) | 1.4, 1.5, 1.6 | | ||
| [XGBoost](docs/xgboost.md) | 0.90-2, 1.0-1 ([As a built-in algorithm](docs/xgboost.md#use-xgboost-as-a-built-in-algorithm))| | ||
Review comment on lines 67 to 69: Smdebug is supported on the latest versions of all available DLCs. See page. Reply: incorporated. |
||
To find a complete list of available Deep Learning Containers, see | ||
[General Framework Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#general-framework-containers) in the AWS Deep Learning Container | ||
repository. | ||
|
||
**Note**: Debugger with zero script change is partially available for TensorFlow v2.1.0. The `inputs`, `outputs`, `gradients`, and `layers` built-in collections are currently not available for these TensorFlow versions. | ||
This enables you to use Debugger with no changes to your training | ||
script by automatically adding the Debugger hook (`hook-api`). | ||
|
||
### AWS training containers with script mode | ||
The following framework versions are available in AWS Deep Learning | ||
Containers for the zero script change experience. | ||
|
||
Review comment: Are we explaining what the 'zero script change experience' is in the doc? If yes, can we link it here? |
||
The `smdebug` library supports frameworks other than the ones listed above while using AWS containers with script mode. If you want to use SageMaker Debugger with one of the following framework versions, you need to make minimal changes to your training script. | ||
### Frameworks supported by the SMDebug library | ||
|
||
| Framework | Versions | | ||
| --- | --- | | ||
| [TensorFlow](docs/tensorflow.md) | 1.13, 1.14, 1.15, 2.1.0, 2.2.0, 2.3.0, 2.3.1 | | ||
| Keras (with TensorFlow backend) | 2.3 | | ||
| [MXNet](docs/mxnet.md) | 1.4, 1.5, 1.6, 1.7 | | ||
| [PyTorch](docs/pytorch.md) | 1.2, 1.3, 1.4, 1.5, 1.6 | | ||
| [XGBoost](docs/xgboost.md) | 0.90-2, 1.0-1 (As a framework)| | ||
The SMDebug library supports machine learning frameworks for SageMaker | ||
training jobs with script mode and custom training containers. If you | ||
want to use SageMaker Debugger with one of the following framework | ||
versions, you need to make minimal changes to your training script using | ||
the SMDebug library. | ||
|
||
| Framework | Versions | | ||
|---------------------------------|------------------------------------------------------------| | ||
| `tensorflow` | 1.13, 1.14, 1.15, 2.1.0, 2.2.0, 2.3.0, 2.3.1, 2.4.1, 2.5.0 | | ||
| Keras (with TensorFlow backend) | 2.3 | | ||
| `mxnet` | 1.4, 1.5, 1.6, 1.7, 1.8 | | ||
| `pytorch` | 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 1.9 | | ||
| `xgboost` | 0.90-2, 1.0-1, 1.2-1 (As a framework) | | ||
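
The script-mode workflow described above comes down to registering a hook that snapshots tensors at a configured interval during training. The snippet below is a toy illustration of that pattern in plain Python; `DebugHook`, `record`, and the fake training loop are hypothetical stand-ins for this sketch, not the actual `smdebug` API:

```python
class DebugHook:
    """Toy stand-in for a debugger hook: records tensors every N steps."""

    def __init__(self, out_dir, save_interval=2):
        self.out_dir = out_dir          # where a real hook would write tensor files
        self.save_interval = save_interval
        self.saved = {}                 # step -> {tensor_name: value}

    def record(self, step, tensors):
        """Called once per training step with the tensors to monitor."""
        if step % self.save_interval == 0:
            self.saved[step] = dict(tensors)


# Minimal "training loop": the hook sees the loss at every step but
# persists it only on the configured save interval.
hook = DebugHook(out_dir="/tmp/debug-output", save_interval=2)
for step in range(6):
    loss = 1.0 / (step + 1)            # pretend the loss is decreasing
    hook.record(step, {"loss": loss})

print(sorted(hook.saved))              # → [0, 2, 4]
```

The real library attaches this kind of callback to the framework for you in the zero script change containers; in script mode you create and register the hook yourself.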
|
||
### Debugger on custom containers or local machines | ||
You can also fully use the Debugger features in custom containers with the SageMaker Python SDK. Furthermore, `smdebug` is an open source library, so you can install it on your local machine for any advanced use cases that cannot be run in the SageMaker environment and for constructing `smdebug` custom hooks and rules. | ||
|
||
You can also fully use the Debugger features in custom containers with | ||
the SageMaker Python SDK. Furthermore, `smdebug` is an open source | ||
library, so you can install it on your local machine for any advanced | ||
use cases that cannot be run in the SageMaker environment and for | ||
constructing `smdebug` custom hooks and rules. | ||
|
||
--- | ||
|
||
|
@@ -110,10 +122,10 @@ To see a complete list of built-in rules and their functionalities, see [List of | |
You can use Debugger with your training script on your own container making only a minimal modification to your training script to add Debugger's `Hook`. | ||
For an example template of code to use Debugger on your own container in TensorFlow 2.x frameworks, see [Run Debugger in custom container](#Run-Debugger-in-custom-container). | ||
See the following instruction pages to set up Debugger in your preferred framework. | ||
- [TensorFlow](docs/tensorflow.md) | ||
- [MXNet](docs/mxnet.md) | ||
- [PyTorch](docs/pytorch.md) | ||
- [XGBoost](docs/xgboost.md) | ||
- [TensorFlow](tensorflow.md) | ||
- [MXNet](mxnet.md) | ||
- [PyTorch](pytorch.md) | ||
- [XGBoost](xgboost.md) | ||
|
||
#### Using SageMaker Debugger on custom containers | ||
|
||
|
@@ -177,7 +189,7 @@ When you run the `sagemaker_simple_estimator.fit()` API, | |
SageMaker will automatically monitor your training job for you with the Rules specified and create a `CloudWatch` event that tracks the status of the Rule, | ||
so you can take any action based on them. | ||
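
Conceptually, a built-in rule is a predicate evaluated against the tensors the hook saved, and its result drives the event status. The sketch below mimics a `loss_not_decreasing`-style check in plain Python; the function names and the status dict are hypothetical illustrations, not the SageMaker SDK or CloudWatch API:

```python
def loss_not_decreasing(losses, patience=3):
    """Fire if the loss fails to improve for `patience` consecutive steps."""
    stale = 0
    best = float("inf")
    for loss in losses:
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


def evaluate_rule(losses):
    """Mimic the rule -> event flow: return a CloudWatch-style status."""
    triggered = loss_not_decreasing(losses)
    return {"RuleStatus": "IssuesFound" if triggered else "NoIssuesFound"}


print(evaluate_rule([0.9, 0.8, 0.7, 0.6]))   # → {'RuleStatus': 'NoIssuesFound'}
print(evaluate_rule([0.9, 0.9, 0.9, 0.9]))   # → {'RuleStatus': 'IssuesFound'}
```

In the managed flow, this evaluation runs on a separate rule container while the training job writes tensors, so the check adds no overhead to training itself.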
|
||
If you want additional configuration and control, see [Running SageMaker jobs with Debugger](docs/sagemaker.md) for more information. | ||
If you want additional configuration and control, see [Running SageMaker jobs with Debugger](sagemaker.md) for more information. | ||
|
||
#### Run Debugger in custom container | ||
|
||
|
@@ -235,23 +247,23 @@ print(f"Loss values during evaluation were {trial.tensor('CrossEntropyLoss:0').v | |
## SageMaker Debugger in Action | ||
- Through the model pruning process using Debugger and `smdebug`, you can iteratively identify the importance of weights and cut neurons below a threshold you define. This process allows you to train the model with significantly fewer neurons, which means a lighter, more efficient, faster, and cheaper model without compromising accuracy. The following accuracy versus number of parameters graph is produced in Studio. It shows that the model accuracy started from about 0.9 with 12 million parameters (the data point moves from right to left along with the pruning process), improved during the first few pruning iterations, kept the quality of accuracy until it cut the number of parameters down to 6 million, and started sacrificing accuracy afterwards. | ||
|
||
![Debugger Iterative Model Pruning using ResNet](docs/resources/results_resnet.png?raw=true) | ||
![Debugger Iterative Model Pruning using ResNet](resources/results_resnet.png?raw=true) | ||
Debugger provides tools to access the training process and gives you complete control over your model. See the [Using SageMaker Debugger and SageMaker Experiments for iterative model pruning](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-debugger/pytorch_iterative_model_pruning/iterative_model_pruning_resnet.ipynb) notebook for the full example and more information. | ||
|
||
- Use Debugger with XGBoost in SageMaker Studio to save feature importance values and plot them in a notebook during training. ![Debugger XGBoost Visualization Example](docs/resources/xgboost_feature_importance.png?raw=true) | ||
- Use Debugger with XGBoost in SageMaker Studio to save feature importance values and plot them in a notebook during training. ![Debugger XGBoost Visualization Example](resources/xgboost_feature_importance.png?raw=true) | ||
|
||
- Use Debugger with TensorFlow in SageMaker Studio to run built-in rules and visualize the loss. ![Debugger TensorFlow Visualization Example](docs/resources/tensorflow_rules_loss.png?raw=true) | ||
- Use Debugger with TensorFlow in SageMaker Studio to run built-in rules and visualize the loss. ![Debugger TensorFlow Visualization Example](resources/tensorflow_rules_loss.png?raw=true) | ||
|
||
--- | ||
|
||
## Further Documentation and References | ||
|
||
| Section | Description | | ||
| --- | --- | | ||
| [SageMaker Training](docs/sagemaker.md) | SageMaker users, we recommend you start with this page on how to run SageMaker training jobs with SageMaker Debugger | | ||
| Frameworks <ul><li>[TensorFlow](docs/tensorflow.md)</li><li>[PyTorch](docs/pytorch.md)</li><li>[MXNet](docs/mxnet.md)</li><li>[XGBoost](docs/xgboost.md)</li></ul> | See the frameworks pages for details on what's supported and how to modify your training script if applicable | | ||
| [APIs for Saving Tensors](docs/api.md) | Full description of our APIs on saving tensors | | ||
| [Programming Model for Analysis](docs/analysis.md) | For description of the programming model provided by the APIs that enable you to perform interactive exploration of tensors saved, as well as to write your own Rules monitoring your training jobs. | | ||
| [SageMaker Training](sagemaker.md) | For SageMaker users, we recommend starting with this page to learn how to run SageMaker training jobs with SageMaker Debugger | ||
| Frameworks <ul><li>[TensorFlow](tensorflow.md)</li><li>[PyTorch](pytorch.md)</li><li>[MXNet](mxnet.md)</li><li>[XGBoost](xgboost.md)</li></ul> | See the frameworks pages for details on what's supported and how to modify your training script if applicable | | ||
| [APIs for Saving Tensors](api.md) | Full description of our APIs on saving tensors | | ||
| [Programming Model for Analysis](analysis.md) | For description of the programming model provided by the APIs that enable you to perform interactive exploration of tensors saved, as well as to write your own Rules monitoring your training jobs. | | ||
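
The analysis programming model linked above reads saved tensors back by name and step, as in the `trial.tensor('CrossEntropyLoss:0')` call shown earlier in the diff. The toy classes below sketch that access pattern in plain Python; `Trial` and `TensorView` here are hypothetical stand-ins, not the `smdebug.trials` API:

```python
class TensorView:
    """Toy view of one saved tensor across training steps."""

    def __init__(self, steps):
        self._steps = steps  # {step: value}

    def steps(self):
        return sorted(self._steps)

    def value(self, step):
        return self._steps[step]


class Trial:
    """Toy trial: tensors saved during training, keyed by name then step."""

    def __init__(self, data):
        self._data = data    # {tensor_name: {step: value}}

    def tensor_names(self):
        return sorted(self._data)

    def tensor(self, name):
        return TensorView(self._data[name])


# Pretend these values were written by the hook during training.
trial = Trial({"CrossEntropyLoss:0": {0: 2.30, 100: 1.12, 200: 0.41}})
loss = trial.tensor("CrossEntropyLoss:0")
print(loss.steps())          # → [0, 100, 200]
print(loss.value(200))       # → 0.41
```

The real analysis API works the same way against a local or S3 output directory, which is what makes interactive exploration in a notebook possible while the job is still running.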
|
||
|
||
## License | ||
|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Review comment: nit: why Python 3.6? can we use Python 3.9?