Releases: awslabs/sagemaker-debugger
v0.9.5
v0.9.4
v0.9.3
v0.9.1
- Extends full support for TF 2.2.0.
- Users can now save biases, weights, gradients, optimizer variables, labels, predictions, and model inputs.
- Introduces the hook.save_tensor API, a generic cross-framework API that lets users save any tensor to a collection at runtime (a usage sketch follows below).
- Extends tensor logging support for the Keras Estimator API.
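A minimal sketch of the hook.save_tensor call described above, assuming a PyTorch script-mode hook; the out_dir path, tensor name, and collection name are illustrative placeholders, and the keyword names follow the wording of this note rather than a verified signature.

```python
import numpy as np
import smdebug.pytorch as smd

# Placeholder output location; in SageMaker this is usually provided for you.
hook = smd.Hook(out_dir="/tmp/smdebug_save_tensor_demo")

# Any array-like value computed during the training loop can be written to a
# collection at runtime through the generic save_tensor API.
attention_scores = np.random.rand(4, 8)  # stand-in for a tensor from the model
hook.save_tensor("attention_scores", attention_scores,
                 collections_to_write="custom_tensors")
```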
v0.8.1
This release includes the following bug fixes:
v0.8.0
Includes support for saving optimizer variables for TF 2.x, plus bug fixes:
1) Support saving optimizer variables and GradientTape for TF 2.x
2) Support saving optimizer variables with the Keras fit() API in eager mode for TF 2.x (see the sketch after this list)
3) Fix for the metadata.json file being written repeatedly
4) Handle exceptions gracefully in MXNet
5) Fix for a name clash when an operator is called multiple times during the forward pass
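A short sketch of opting into optimizer-variable saving for a TF 2.x Keras fit() script, as mentioned in items 1 and 2 above; the out_dir and the "optimizer_variables" collection name are assumptions drawn from this note rather than a verified configuration.

```python
import smdebug.tensorflow as smd

# Hypothetical hook configuration that opts into the optimizer-variables
# collection for a TF 2.x Keras script; the collection name is assumed from
# this note, and out_dir is a placeholder.
hook = smd.KerasHook(
    out_dir="/tmp/smdebug_tf2_optimizer",
    include_collections=["weights", "gradients", "optimizer_variables"],
)
# The hook is then attached to training, e.g. model.fit(..., callbacks=[hook]).
```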
v0.7.2
- Experimental support for TF 2.x GradientTape - Introduces experimental support for TF 2.x training scripts that use GradientTape. With this change, weights, biases, loss, metrics, and gradients are captured by SageMaker Debugger. These changes work with the vanilla version of TensorFlow 2.x, not with the zero-code-change version (#186). See the sketch after this list.
  Note: Training scripts that use GradientTape for higher-order gradients or multiple tapes are not supported. Distributed training scripts that use GradientTape are not supported at this time.
- Support SyncOnReadVariable in MirroredStrategy - Fixes a bug caused by the SyncOnRead distributed variable not being supported by smdebug, and enables smdebug with training scripts that use TF 2.x MirroredStrategy with the fit() API (#190).
- Turn off the hook and write only from one worker for unsupported distributed training techniques - PyTorch users saw a crash when distributed training was implemented with the generic multiprocessing library, which is not a method supported by smdebug. This fix handles that case and ensures that tensors are saved (#167).
- Bug fix: PyTorch: register only if tensors require gradients - Users saw a crash when training with pretrained embeddings that do not need gradient updates. This fix checks whether a gradient update is required and registers a backward hook only in those cases (#193).
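A sketch of the experimental GradientTape path described in the first item above, assuming the KerasHook and the tape-wrapping helper seen in later smdebug documentation; the model, data, and out_dir are placeholders, and whether this exact wrapper was available in this release is an assumption.

```python
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug_gradtape")  # placeholder path

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
opt = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x = tf.random.normal((32, 20))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int64)

# Wrapping the tape lets the hook observe weights, biases, loss, and gradients
# computed inside the tape's context.
with hook.wrap_tape(tf.GradientTape()) as tape:
    logits = model(x, training=True)
    loss = loss_fn(y, logits)

grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
```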
v0.7.1
v0.7.0
This release includes the following changes -
- Introduces experimental support for TF 2.x keras.fit() in eager and non-eager mode. With this change, losses, metrics, weights, and biases can be saved in TF 2.x eager mode; at present, gradients, inputs, and outputs cannot be saved when TF 2.x eager mode is used (#150). See the sketch after this list.
- Raise an error for an invalid collection config - an exception is raised if a collection is incorrectly configured, for example a collection is created but no tensors/regex are specified, or the collection name is misspelled in the hook (#162)
- Fix a crash that occurred with PyTorch's DataParallel API, which lets users run training on multiple GPUs in PyTorch (#165)
- Bug fix to allow users to read from data sources that contain more than 1000 files (#168)
- Update the save_scalar() method to accept and store a timestamp along with the scalar value (#170)
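A brief sketch combining the experimental keras.fit() eager-mode support with the updated save_scalar() behaviour noted above; the timestamp keyword, out_dir, and the scalar name used here are assumptions based on the wording of these notes.

```python
import time
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug_v070")  # placeholder path

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((64, 4))
y = tf.random.normal((64, 1))

# Passing the hook as a Keras callback captures losses, metrics, weights,
# and biases in TF 2.x eager mode.
model.fit(x, y, epochs=1, callbacks=[hook])

# Scalars written through the hook can now carry an explicit timestamp
# (the timestamp keyword is assumed from this release note).
hook.save_scalar("data_prep_seconds", 12.5, timestamp=time.time())
```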
v0.6.0
This release includes the following significant changes -
- Fix scalar write to event file & TF TensorBoard write fix (#145)
- Fix bug in tensor_names() call (#159)
- Fixes: SMSimulator fix; listing local files should ignore tmp files (#137)
- Fix for bug in has_passed_step (#136)
- Bug fix in trial.py has_passed_step (#140)
- Speed up S3 upload (#122)
- Skip logging the input tensors to the loss block (#86)
- CI/CD and test suite changes.