From 4361683c7b79ddf0e7115be73c78fca909269a70 Mon Sep 17 00:00:00 2001
From: Jiaming Song
Date: Tue, 26 Mar 2019 14:47:51 +0800
Subject: [PATCH] Add files via upload

---
 keras/3.5-classifying-movie-reviews.ipynb | 61 +++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/keras/3.5-classifying-movie-reviews.ipynb b/keras/3.5-classifying-movie-reviews.ipynb
index 9c38c15..2e87dc0 100644
--- a/keras/3.5-classifying-movie-reviews.ipynb
+++ b/keras/3.5-classifying-movie-reviews.ipynb
@@ -31,6 +31,13 @@
     "sc = init_nncontext(init_spark_conf().setMaster(\"local[4]\"))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that you need to allocate 32g of memory to `SPARK_DRIVER_MEMORY` in order to run the whole notebook as written. If your machine does not have that much memory available, see the memory-saving approach at [this link would be added after merge]()."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1653,6 +1660,60 @@
     "* As they get better on their training data, neural networks eventually start _overfitting_ and end up obtaining increasingly worse results on data \n",
     "never-seen-before. Make sure to always monitor performance on data that is outside of the training set.\n"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## \\* Memory saving\n",
+    "Running this notebook with the code above requires 32g of `SPARK_DRIVER_MEMORY`, which is expensive. The following is a viable memory-saving approach that reduces the required `SPARK_DRIVER_MEMORY` to 12g.\n",
+    "\n",
+    "Recall the point at which you have compiled the model and prepared the `ndarray` datasets. In the original code above, the next step is to call `fit`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.fit(partial_x_train,\n",
+    "          partial_y_train,\n",
+    "          nb_epoch=20,\n",
+    "          batch_size=512,\n",
+    "          validation_data=(x_val, y_val))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Hold on here! Instead of calling `fit` on the `ndarray`s directly, use the following code for training to save memory:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from bigdl.util.common import to_sample_rdd\n",
+    "\n",
+    "train = to_sample_rdd(partial_x_train, partial_y_train)\n",
+    "val = to_sample_rdd(x_val, y_val)\n",
+    "\n",
+    "model.fit(train, None,\n",
+    "          nb_epoch=20,\n",
+    "          batch_size=512,\n",
+    "          validation_data=val)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This code zips the training data and labels into RDDs. It works because every time `fit` takes an `ndarray` as input, it converts the `ndarray` into an RDD, and this conversion caches data in memory. Since this notebook feeds the same datasets into `fit` repeatedly, performing the conversion only once and reusing the RDDs afterwards avoids all the subsequent conversions and their memory cost."
+   ]
   }
  ],
  "metadata": {