
Commit

Add files via upload
Jiaming Song authored Mar 26, 2019
1 parent d55c0b7 commit 4361683
Showing 1 changed file with 61 additions and 0 deletions.
61 changes: 61 additions & 0 deletions keras/3.5-classifying-movie-reviews.ipynb
@@ -31,6 +31,13 @@
"sc = init_nncontext(init_spark_conf().setMaster(\"local[4]\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that you need to allocate 32g of memory to `SPARK_DRIVER_MEMORY` to run the entire contents of this notebook. If your machine does not have that much memory available, see the memory-saving approach at [this link would be added after merge]()."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1653,6 +1660,60 @@
"* As they get better on their training data, neural networks eventually start _overfitting_ and end up obtaining increasingly worse results on data \n",
"never-seen-before. Make sure to always monitor performance on data that is outside of the training set.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## \\* Memory saving\n",
"To run this notebook with the code above, you need 32g of `SPARK_DRIVER_MEMORY`, which is rather expensive. The following is a viable memory-saving approach that reduces the required `SPARK_DRIVER_MEMORY` to 12g.\n",
"\n",
"Recall the point at which you have compiled the model and prepared the datasets as `ndarray`s. In the code above, the next step would be to call `fit`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(partial_x_train,\n",
" partial_y_train,\n",
" nb_epoch=20,\n",
" batch_size=512,\n",
" validation_data=(x_val, y_val))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hold on here! Before you call `fit` on the `ndarray`s directly, use the following code instead to train while saving memory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from bigdl.util.common import to_sample_rdd\n",
"\n",
"train = to_sample_rdd(partial_x_train, partial_y_train)\n",
"val = to_sample_rdd(x_val, y_val)\n",
"\n",
"model.fit(train, None,\n",
" nb_epoch=20,\n",
" batch_size=512,\n",
" validation_data=val)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code zips the training data and labels into RDDs. It works because whenever the `fit` method takes an `ndarray` as input, it converts the `ndarray` to an RDD, and this conversion caches data in memory. In this notebook we feed the same datasets as input repeatedly, so by performing the conversion only once and reusing the RDDs afterwards, all of the subsequent memory cost is avoided."
]
}
],
"metadata": {
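The convert-once-and-reuse idea behind this trick can be sketched outside of Spark. Note that `to_distributed` and `fit` below are hypothetical stand-ins that merely count conversions, not the BigDL APIs; the point is to show why converting the training data once up front avoids repeated caching:

```python
# Illustrative sketch (not the BigDL API): `to_distributed` stands in for
# `to_sample_rdd`, and `fit` mimics a model's fit method that converts
# raw arrays on every call but uses a pre-converted dataset as-is.

conversion_count = 0  # how many ndarray->RDD-style conversions have happened

def to_distributed(features, labels):
    # Stand-in for `to_sample_rdd`: pair each feature row with its label.
    # In BigDL, each such call materializes (and caches) a new RDD.
    global conversion_count
    conversion_count += 1
    return list(zip(features, labels))

def fit(data, labels=None):
    # Toy `fit`: raw arrays trigger a conversion on every call, while an
    # already-converted dataset is used directly,
    # mirroring `model.fit(train, None, ...)`.
    if labels is not None:
        return to_distributed(data, labels)
    return data

x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 1, 0, 1]

# Raw-array input: every call converts (and would cache) the data again.
for _ in range(3):
    fit(x, y)
assert conversion_count == 3

# Convert once up front, then reuse the converted dataset.
conversion_count = 0
train = to_distributed(x, y)
for _ in range(3):
    fit(train)
assert conversion_count == 1  # only the single up-front conversion ran
```

Each extra conversion in the first loop corresponds to another cached copy of the dataset in driver memory, which is exactly the cost the one-time `to_sample_rdd` call eliminates.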
