Releases: awslabs/sagemaker-debugger

v0.9.5

06 Nov 22:01 · 4510e0c

Bug fixes:

  • Return a list instead of dict keys (#376)
  • Add support for mixed precision training (#378); see the sketch after this list
  • Fix: Debugger no longer breaks if should_save_tensor is called before collections are prepared (#372)
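
As referenced above, a minimal sketch of running a mixed-precision Keras model with an smdebug hook attached, assuming the TF 2.3-era experimental mixed-precision API; the output path, model, and data are placeholders, and the hook registration shown is the standard KerasHook callback pattern rather than anything specific to this fix.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

# TF 2.3-era experimental API for mixed precision (moved out of
# `experimental` in later TF releases).
policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
tf.keras.mixed_precision.experimental.set_policy(policy)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    # Keep the output layer in float32 so the loss is computed in full precision.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# KerasHook is a Keras callback; /tmp/smdebug is a placeholder path.
hook = smd.KerasHook(out_dir="/tmp/smdebug", include_collections=["weights", "losses"])

x = np.random.rand(32, 20).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model.fit(x, y, epochs=1, callbacks=[hook])
```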

v0.9.4

08 Oct 19:56 · bb8f4b9

Bug fixes:

  • Pass all arguments to the underlying layer in the input/output wrapper (#366)
  • Add support for the add_for_mode API in graph mode (#353); see the sketch after this list
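
A hedged sketch of attaching a graph-mode tensor to a collection per mode, loosely based on the add_for_mode note above; the exact add_for_mode signature, the collection name, and the output path are assumptions.

```python
import tensorflow.compat.v1 as tf
import smdebug.tensorflow as smd

tf.disable_eager_execution()  # graph mode, which is what #353 targets

# Placeholder output path and collection name.
hook = smd.SessionHook(out_dir="/tmp/smdebug", include_collections=["custom"])

x = tf.placeholder(tf.float32, shape=(None, 10))
w = tf.Variable(tf.random.normal((10, 1)), name="w")
pred = tf.matmul(x, w, name="prediction")

# Assumed usage: register the tensor with the collection for TRAIN mode only.
# The add_for_mode signature shown here is inferred from the release note.
hook.get_collection("custom").add_for_mode(pred, smd.modes.TRAIN)
```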

v0.9.3

11 Sep 17:58 · 6f8e757

  • Extends full support for TF 2.3.0.
    Users can now save biases, weights, gradients, optimizer variables, labels, predictions, and model inputs (see the sketch after this list)
  • Address an issue with the model.save API (#333)
  • New functions to determine the default hook configuration in AWS TensorFlow, PyTorch, and MXNet
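
As referenced above, a minimal sketch of saving weights, biases, and gradients with the Keras fit() API; the output path and toy model are placeholders, and wrap_optimizer is used here because it is how smdebug captures gradients in TF 2.x Keras scripts.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(
    out_dir="/tmp/smdebug",  # placeholder path
    include_collections=["weights", "biases", "gradients", "losses"],
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
opt = hook.wrap_optimizer(opt)  # lets smdebug see gradients in TF 2.x
model.compile(optimizer=opt, loss="mse")

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1, callbacks=[hook])
```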

v0.9.1

03 Aug 19:05 · e4d0843

  • Extends full support for TF 2.2.0.
    • Users can now save biases, weights, gradients, optimizer variables, labels, predictions, and model inputs
  • Introduces the hook.save_tensor API, a generic cross-framework API that lets users save any tensor to a collection at runtime (see the sketch after this list)
  • Extends tensor logging support for the Keras Estimator API
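
As referenced above, a minimal sketch of the hook.save_tensor API; the tensor name, value, and target collection are illustrative, and in practice the call would sit inside a training step.

```python
import numpy as np
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug")  # placeholder path

# Save an arbitrary value to a named collection at runtime; the name
# "custom_metric" and the "metrics" collection are illustrative choices.
hook.save_tensor(
    tensor_name="custom_metric",
    tensor_value=np.array([0.95]),
    collections_to_write="metrics",
)
```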

v0.8.1

30 May 01:54 · 6ccf776

This release includes the following bug fixes:

  • Correct the number of tensors saved when running TF MirroredStrategy on GPUs (#257)
  • Enable saving scalars with MirroredStrategy on TF 2.x (#259)
  • Prevent collection files from being generated when smdebug is not supported by the training script (#263)

v0.8.0

19 May 17:46 · 4cac480

This release adds support for saving optimizer variables for TF 2.x, along with bug fixes:

  • Support saving optimizer variables and GradientTape for TF 2.x (see the sketch after this list)
  • Support saving optimizer variables with the Keras fit API in eager mode for TF 2.x
  • Fix the metadata.json file being rewritten repeatedly
  • Handle exceptions gracefully in MXNet
  • Fix a name clash when an operator is called multiple times during the forward pass
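
As referenced in the list above, a minimal sketch of saving optimizer variables with the Keras fit() API in eager mode; the output path and toy model are placeholders, and "optimizer_variables" is smdebug's built-in collection name for these tensors.

```python
import numpy as np
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(
    out_dir="/tmp/smdebug",  # placeholder path
    include_collections=["optimizer_variables", "losses"],
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, callbacks=[hook])  # fit() runs eagerly by default in TF 2.x
```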

v0.7.2

04 Apr 00:50 · b594676

  • Experimental support for TF 2.x GradientTape - Introducing experimental support for TF 2.x training scripts that use GradientTape. With this change, weights, biases, loss, metrics, and gradients are captured by SageMaker Debugger. These changes work with the vanilla version of TensorFlow 2.x, not with the zero-code-change version (#186); see the sketch after this list.

    Note: Training scripts using GradientTape for higher-order gradients or multiple tapes are not supported.
    Distributed training scripts that use GradientTape are not supported at this time.

  • Support SyncOnReadVariable in mirrored strategy - Fixes a bug caused by the SyncOnRead distributed variable not being supported by smdebug. Also enables the use of smdebug with training scripts that use TF 2.x MirroredStrategy with the fit() API (#190)

  • Turn off the hook and write only from one worker for unsupported distributed training techniques - PyTorch users observed a crash when distributed training was implemented using the generic multiprocessing library, which is not a method supported by smdebug. This fix handles that case and ensures that tensors are still saved (#167)

  • Bug fix: PyTorch: Register only tensors that require gradients - Users observed a crash when training with pretrained embeddings, which do not need gradient updates. This fix checks whether a gradient update is required and registers a backward hook only in those cases (#193)
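
As referenced in the first item above, a minimal sketch of the GradientTape support; hook.wrap_tape is the smdebug entry point for tape-based scripts per the library's docs, and the model, data, and output path are placeholders.

```python
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook(out_dir="/tmp/smdebug")  # placeholder path

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(20,))])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

data = tf.random.normal((32, 20))
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int64)

# Wrapping the tape lets smdebug capture weights, loss, and gradients.
with hook.wrap_tape(tf.GradientTape()) as tape:
    logits = model(data, training=True)
    loss = loss_fn(labels, logits)
grads = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grads, model.trainable_variables))
```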

v0.7.1

14 Mar 05:47 · b1b3bac

  • Fix a case in the ZCC (zero code change) scenario where a training script written in eager mode crashes when gradients and optimizer variables are saved (#178)

v0.7.0

11 Mar 02:13 · 01c96b2

This release includes the following changes:

  • Introduces experimental support for TF 2.x keras.fit() in eager and non-eager mode. With this change, losses, metrics, weights, and biases can be saved in TF 2.x eager mode (at present, gradients, inputs, and outputs cannot be saved when TF 2.x eager mode is used) (#150)
  • Raise an error for an invalid collection config - An exception is raised if a collection is incorrectly configured, e.g. a collection is created but no tensors/regex are specified, or the collection name is misspelled in the hook (#162)
  • Fix a crash that occurred with PyTorch's DataParallel API (which enables users to run training on multiple GPUs) (#165)
  • Bug fix to allow users to read from data sources that contain more than 1000 files (#168)
  • Update the save_scalar() method to accept and store a timestamp along with the scalar value (#170); see the sketch after this list
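
As referenced in the last item above, a minimal sketch of save_scalar; the hook type, scalar name, and output path are placeholders, and the explicit timestamp keyword is an assumption based on the release note.

```python
import time
import smdebug.pytorch as smd

hook = smd.Hook(out_dir="/tmp/smdebug")  # placeholder path

# Store a scalar; per this release, a timestamp is recorded alongside the
# value. Passing it explicitly as `timestamp=` is an assumption.
hook.save_scalar("learning_rate", 0.001, sm_metric=False, timestamp=time.time())
```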

v0.6.0

14 Feb 01:36 · fa766a4

This release includes the following significant changes:

  • Fix scalar writes to the event file and fix TF TensorBoard writes (#145)
  • Fix a bug in the tensor_names() call (#159)
  • Fixes: SMSimulator fix; listing local files should ignore tmp files (#137)
  • Fix a bug in has_passed_step (#136)
  • Bug fix in trial.py has_passed_step (#140)
  • Speed up S3 uploads (#122)
  • Skip logging the input tensors to the loss block (#86)
  • CI/CD and test suite changes