# Distributed Training with TensorFlow

In this lab we will use images of handwritten digits from the MNIST Database to demonstrate how to perform distributed training with SageMaker. Using a convolutional neural network model based on the TensorFlow MNIST Example, we will show how to use a Jupyter notebook and the SageMaker Python SDK to create your own script to pre-process data, train a model, create a SageMaker hosted endpoint, and make predictions against that endpoint. The model will predict which handwritten digit appears in the image presented for prediction. Besides demonstrating a "bring your own script" use case for TensorFlow, the example also showcases how easy it is to set up a cluster of multiple instances for model training in SageMaker.
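
The centerpiece of that workflow is the SageMaker Python SDK's `TensorFlow` estimator, which packages your training script, runs it on a cluster of training instances, and can deploy the trained model to a hosted endpoint. The sketch below illustrates that flow using SageMaker Python SDK v2 naming; the entry point script, S3 data location, framework version, instance settings, and distribution strategy are placeholders for illustration only, and the notebook defines the actual values used in this lab.

```python
# A minimal sketch of the workflow the notebook walks through, using SageMaker
# Python SDK v2 naming. The entry point script, S3 data location, framework
# version, instance types, and distribution strategy below are placeholders;
# the notebook defines the actual values for this lab.
import numpy as np
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

# "Bring your own script": the training code lives in a local file passed as entry_point.
estimator = TensorFlow(
    entry_point="mnist.py",                 # placeholder training script
    role=role,
    instance_count=2,                       # more than one instance = a training cluster
    instance_type="ml.c5.xlarge",           # placeholder instance type
    framework_version="2.1",                # placeholder TensorFlow version
    py_version="py3",
    distribution={"parameter_server": {"enabled": True}},  # placeholder strategy
)

# Train on pre-processed MNIST data previously uploaded to S3.
estimator.fit("s3://your-bucket/tensorflow-distributed/mnist")  # placeholder S3 URI

# Create a SageMaker hosted endpoint and make a prediction against it.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
test_images = np.zeros((1, 28, 28, 1), dtype=np.float32)        # placeholder digit batch
prediction = predictor.predict(test_images)

predictor.delete_endpoint()  # clean up to avoid ongoing charges
```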

  1. In your notebook instance, click the New button on the right and select Folder.
  2. Click the checkbox next to your new folder, click the Rename button above in the menu bar, and give the folder a name such as 'tensorflow-distributed'.
  3. Click the folder to enter it.
  4. To upload the notebook, click the Upload button on the right. Then in the file selection popup, select the file 'TensorFlow_Distributed_MNIST.ipynb' from the notebooks subdirectory in the folder on your computer where you downloaded this GitHub repository. Click the blue Upload button that appears to the right of the notebook's file name.
  5. You are now ready to begin the notebook: click the notebook's file name to open it, then follow the directions in the notebook.

NOTE: Training the model for this example typically takes about 8 minutes.