Releases: awslabs/sagemaker-debugger

v0.9.5

06 Nov 22:01 · 4510e0c

Bug fixes:

  • Return a list instead of dict keys (#376)
  • Add support for mixed precision training (#378); see the sketch after this list
  • Fix: Debugger no longer breaks if should_save_tensor is called before collections are prepared (#372)
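
As referenced above, a minimal sketch of running a mixed-precision Keras model with an smdebug hook attached, assuming the TF 2.3-era experimental mixed-precision API; the output path, model, and data are placeholders, and the hook registration shown is the standard KerasHook callback pattern rather than anything specific to this fix.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

# TF 2.3-era experimental API for mixed precision (moved out of
# `experimental` in later TF releases).
policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
tf.keras.mixed_precision.experimental.set_policy(policy)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    # Keep the output layer in float32 so the loss is computed in full precision.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# KerasHook is a Keras callback; /tmp/smdebug is a placeholder path.
hook = smd.KerasHook(out_dir="/tmp/smdebug", include_collections=["weights", "losses"])

x = np.random.rand(32, 20).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model.fit(x, y, epochs=1, callbacks=[hook])
```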

v0.9.4

08 Oct 19:56 · bb8f4b9

Bug fixes:

  • Pass all arguments to the underlying layer in the input/output wrapper (#366)
  • Add support for the add_for_mode API in graph mode (#353); see the sketch after this list
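
A hedged sketch of attaching a graph-mode tensor to a collection per mode, loosely based on the add_for_mode note above; the exact add_for_mode signature, the collection name, and the output path are assumptions.

```python
import tensorflow.compat.v1 as tf
import smdebug.tensorflow as smd

tf.disable_eager_execution()  # graph mode, which is what #353 targets

# Placeholder output path and collection name.
hook = smd.SessionHook(out_dir="/tmp/smdebug", include_collections=["custom"])

x = tf.placeholder(tf.float32, shape=(None, 10))
w = tf.Variable(tf.random.normal((10, 1)), name="w")
pred = tf.matmul(x, w, name="prediction")

# Assumed usage: register the tensor with the collection for TRAIN mode only.
# The add_for_mode signature shown here is inferred from the release note.
hook.get_collection("custom").add_for_mode(pred, smd.modes.TRAIN)
```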

v0.9.3

11 Sep 17:58 · 6f8e757

  • Extends full support for TF 2.3.0.
    Users can now save biases, weights, gradients, optimizer variables, labels, predictions, and model inputs (see the sketch after this list)
  • Address an issue with the model.save API (#333)
  • New functions to determine the default hook configuration in AWS TensorFlow, PyTorch, and MXNet
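
As referenced above, a minimal sketch of saving weights, biases, and gradients with the Keras fit() API; the output path and toy model are placeholders, and wrap_optimizer is used here because it is how smdebug captures gradients in TF 2.x Keras scripts.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(
    out_dir="/tmp/smdebug",  # placeholder path
    include_collections=["weights", "biases", "gradients", "losses"],
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
opt = hook.wrap_optimizer(opt)  # lets smdebug see gradients in TF 2.x
model.compile(optimizer=opt, loss="mse")

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1, callbacks=[hook])
```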

v0.9.1

03 Aug 19:05 · e4d0843

  • Extends full support for TF 2.2.0.
    • Users can now save biases, weights, gradients, optimizer variables, labels, predictions, and model inputs
  • Introduces the hook.save_tensor API, a generic cross-framework API that lets users save any tensor to a collection at runtime (see the sketch after this list)
  • Extends tensor logging support for the Keras Estimator API
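
As referenced above, a minimal sketch of the hook.save_tensor API; the tensor name, value, and target collection are illustrative, and in practice the call would sit inside a training step.

```python
import numpy as np
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug")  # placeholder path

# Save an arbitrary value to a named collection at runtime; the name
# "custom_metric" and the "metrics" collection are illustrative choices.
hook.save_tensor(
    tensor_name="custom_metric",
    tensor_value=np.array([0.95]),
    collections_to_write="metrics",
)
```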

v0.8.1

30 May 01:54 · 6ccf776

This release includes the following bug fixes:

  • Correct the number of tensors saved when running TF MirroredStrategy on GPUs (#257)
  • Enable saving scalars with MirroredStrategy on TF 2.x (#259)
  • Prevent collection files from being generated when smdebug is not supported by the training script (#263)

v0.8.0

19 May 17:46 · 4cac480

This release adds support for saving optimizer variables for TF 2.x, along with bug fixes:

  • Support saving optimizer variables and GradientTape for TF 2.x (see the sketch after this list)
  • Support saving optimizer variables with the Keras fit API in eager mode for TF 2.x
  • Fix the metadata.json file being rewritten repeatedly
  • Handle exceptions gracefully in MXNet
  • Fix a name clash when an operator is called multiple times during the forward pass
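
As referenced in the list above, a minimal sketch of saving optimizer variables with the Keras fit() API in eager mode; the output path and toy model are placeholders, and "optimizer_variables" is smdebug's built-in collection name for these tensors.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(
    out_dir="/tmp/smdebug",  # placeholder path
    include_collections=["optimizer_variables", "losses"],
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, callbacks=[hook])  # fit() runs eagerly by default in TF 2.x
```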

v0.7.2

04 Apr 00:50 · b594676

  • Experimental support for TF 2.x GradientTape - Introducing experimental support for TF 2.x training scripts that use GradientTape. With this change, weights, biases, loss, metrics, and gradients are captured by SageMaker Debugger. These changes work with the vanilla version of TensorFlow 2.x, not with the zero-code-change version (#186); see the sketch after this list.

    Note: Training scripts using GradientTape for higher-order gradients or multiple tapes are not supported.
    Distributed training scripts that use GradientTape are not supported at this time.

  • Support SyncOnReadVariable in mirrored strategy - Fixes a bug caused by the SyncOnRead distributed variable not being supported by smdebug. Also enables the use of smdebug with training scripts that use TF 2.x MirroredStrategy with the fit() API (#190)

  • Turn off the hook and write only from one worker for unsupported distributed training techniques - PyTorch users observed a crash when distributed training was implemented using the generic multiprocessing library, which is not a method supported by smdebug. This fix handles that case and ensures that tensors are still saved (#167)

  • Bug fix: PyTorch: Register only tensors that require gradients - Users observed a crash when training with pretrained embeddings, which do not need gradient updates. This fix checks whether a gradient update is required and registers a backward hook only in those cases (#193)
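
As referenced in the first item above, a minimal sketch of the GradientTape support; hook.wrap_tape is the smdebug entry point for tape-based scripts per the library's docs, and the model, data, and output path are placeholders.

```python
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug")  # placeholder path

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(20,))])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

data = tf.random.normal((32, 20))
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int64)

# Wrapping the tape lets smdebug capture weights, loss, and gradients.
with hook.wrap_tape(tf.GradientTape()) as tape:
    logits = model(data, training=True)
    loss = loss_fn(labels, logits)
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
```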

v0.7.1

14 Mar 05:47 · b1b3bac

  • Fix a case in the ZCC (zero code change) scenario where a training script written in eager mode crashes when gradients and optimizer variables are saved (#178)

v0.7.0

11 Mar 02:13 · 01c96b2

This release includes the following changes:

  • Introduces experimental support for TF 2.x keras.fit() in eager and non-eager mode. With this change, losses, metrics, weights, and biases can be saved in TF 2.x eager mode (at present, gradients, inputs, and outputs cannot be saved when TF 2.x eager mode is used) (#150)
  • Raise an error for an invalid collection config - An exception is raised if a collection is incorrectly configured, e.g. a collection is created but no tensors/regex are specified, or the collection name is misspelled in the hook (#162)
  • Fix a crash that occurred with PyTorch's DataParallel API (which enables users to run training on multiple GPUs) (#165)
  • Bug fix to allow users to read from data sources that contain more than 1000 files (#168)
  • Update the save_scalar() method to accept and store a timestamp along with the scalar value (#170); see the sketch after this list
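
As referenced in the last item above, a minimal sketch of save_scalar; the hook type, scalar name, and output path are placeholders, and the explicit timestamp keyword is an assumption based on the release note.

```python
import time
import smdebug.pytorch as smd

hook = smd.Hook(out_dir="/tmp/smdebug")  # placeholder path

# Store a scalar; per this release, a timestamp is recorded alongside the
# value. Passing it explicitly as `timestamp=` is an assumption.
hook.save_scalar("learning_rate", 0.001, sm_metric=False, timestamp=time.time())
```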

v0.6.0

14 Feb 01:36 · fa766a4

This release includes the following significant changes:

  • Fix scalar writes to the event file and fix TF TensorBoard writes (#145)
  • Fix a bug in the tensor_names() call (#159)
  • Fixes: SMSimulator fix; listing local files should ignore tmp files (#137)
  • Fix a bug in has_passed_step (#136)
  • Bug fix in trial.py has_passed_step (#140)
  • Speed up S3 uploads (#122)
  • Skip logging the input tensors to the loss block (#86)
  • CI/CD and test suite changes