From 64cba48eba12f6199927beb38c3039cfa8b74de2 Mon Sep 17 00:00:00 2001 From: Jiaming Song Date: Thu, 21 Mar 2019 08:56:43 +0800 Subject: [PATCH 1/3] Add files via upload --- keras/3.7-regression.ipynb | 797 +++++++++++++++++++++++++++++++++++++ 1 file changed, 797 insertions(+) create mode 100644 keras/3.7-regression.ipynb diff --git a/keras/3.7-regression.ipynb b/keras/3.7-regression.ipynb new file mode 100644 index 0000000..0f37622 --- /dev/null +++ b/keras/3.7-regression.ipynb @@ -0,0 +1,797 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First of all, set environment variables and initialize spark context:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "env: SPARK_DRIVER_MEMORY=8g\n", + "env: PYSPARK_PYTHON=/usr/bin/python3.5\n", + "env: PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n" + ] + } + ], + "source": [ + "%env SPARK_DRIVER_MEMORY=8g\n", + "%env PYSPARK_PYTHON=/usr/bin/python3.5\n", + "%env PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n", + "\n", + "from zoo.common.nncontext import *\n", + "sc = init_nncontext(init_spark_conf().setMaster(\"local[4]\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Predicting house prices: a regression example\n", + "\n", + "\n", + "----\n", + "\n", + "\n", + "In our two previous examples, we were considering classification problems, where the goal was to predict a single discrete label of an \n", + "input data point. Another common type of machine learning problem is \"regression\", which consists of predicting a continuous value instead \n", + "of a discrete label. 
Examples include predicting tomorrow's temperature given meteorological data, or predicting how long a \n", + "software project will take to complete given its specifications.\n", + "\n", + "Do not mix up \"regression\" with the algorithm \"logistic regression\": confusingly, \"logistic regression\" is not a regression algorithm; \n", + "it is a classification algorithm." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Boston Housing Price dataset\n", + "\n", + "\n", + "We will be attempting to predict the median price of homes in a given Boston suburb in the mid-1970s, given a few data points about the \n", + "suburb at the time, such as the crime rate, the local property tax rate, etc.\n", + "\n", + "The dataset we will be using has another interesting difference from our two previous examples: it has very few data points, only 506 in \n", + "total, split between 404 training samples and 102 test samples, and each \"feature\" in the input data (e.g. the crime rate is a feature) has \n", + "a different scale. 
For instance, some values are proportions, which take values between 0 and 1, others take values between 1 and 12, \n", + "others between 0 and 100...\n", + "\n", + "Let's take a look at the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from zoo.pipeline.api.keras.datasets import boston_housing\n", + "(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(404, 13)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(102, 13)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, we have 404 training samples and 102 test samples, each with 13 features. The 13 features in the input data are as \n", + "follows:\n", + "\n", + "1. Per capita crime rate.\n", + "2. Proportion of residential land zoned for lots over 25,000 square feet.\n", + "3. Proportion of non-retail business acres per town.\n", + "4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).\n", + "5. Nitric oxides concentration (parts per 10 million).\n", + "6. Average number of rooms per dwelling.\n", + "7. Proportion of owner-occupied units built prior to 1940.\n", + "8. Weighted distances to five Boston employment centres.\n", + "9. Index of accessibility to radial highways.\n", + "10. Full-value property-tax rate per $10,000.\n", + "11. Pupil-teacher ratio by town.\n", + "12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.\n", + "13. 
% lower status of the population.\n", + "\n", + "The targets are the median values of owner-occupied homes, in thousands of dollars:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([22.6, 50. , 23. , 8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6, 8.8,\n", + " 17.2, 14.9, 10.5, 50. , 29. , 23. , 33.3, 29.4, 21. , 23.8, 19.1,\n", + " 20.4, 29.1, 19.3, 23.1, 19.6, 19.4, 38.7, 18.7, 14.6, 20. , 20.5,\n", + " 20.1, 23.6, 16.8, 5.6, 50. , 14.5, 13.3, 23.9, 20. , 19.8, 13.8,\n", + " 16.5, 21.6, 20.3, 17. , 11.8, 27.5, 15.6, 23.1, 24.3, 42.8, 15.6,\n", + " 21.7, 17.1, 17.2, 15. , 21.7, 18.6, 21. , 33.1, 31.5, 20.1, 29.8,\n", + " 15.2, 15. , 27.5, 22.6, 20. , 21.4, 23.5, 31.2, 23.7, 7.4, 48.3,\n", + " 24.4, 22.6, 18.3, 23.3, 17.1, 27.9, 44.8, 50. , 23. , 21.4, 10.2,\n", + " 23.3, 23.2, 18.9, 13.4, 21.9, 24.8, 11.9, 24.3, 13.8, 24.7, 14.1,\n", + " 18.7, 28.1, 19.8, 26.7, 21.7, 22. , 22.9, 10.4, 21.9, 20.6, 26.4,\n", + " 41.3, 17.2, 27.1, 20.4, 16.5, 24.4, 8.4, 23. , 9.7, 50. , 30.5,\n", + " 12.3, 19.4, 21.2, 20.3, 18.8, 33.4, 18.5, 19.6, 33.2, 13.1, 7.5,\n", + " 13.6, 17.4, 8.4, 35.4, 24. , 13.4, 26.2, 7.2, 13.1, 24.5, 37.2,\n", + " 25. , 24.1, 16.6, 32.9, 36.2, 11. , 7.2, 22.8, 28.7, 14.4, 24.4,\n", + " 18.1, 22.5, 20.5, 15.2, 17.4, 13.6, 8.7, 18.2, 35.4, 31.7, 33. ,\n", + " 22.2, 20.4, 23.9, 25. , 12.7, 29.1, 12. , 17.7, 27. , 20.6, 10.2,\n", + " 17.5, 19.7, 29.8, 20.5, 14.9, 10.9, 19.5, 22.7, 19.5, 24.6, 25. ,\n", + " 24.5, 50. , 14.3, 11.8, 31. , 28.7, 16.2, 43.5, 25. , 22. , 19.9,\n", + " 22.1, 46. , 22.9, 20.2, 43.1, 34.6, 13.8, 24.3, 21.5, 24.4, 21.2,\n", + " 23.8, 26.6, 25.1, 9.6, 19.4, 19.4, 9.5, 14. , 26.5, 13.8, 34.7,\n", + " 16.3, 21.7, 17.5, 15.6, 20.9, 21.7, 12.7, 18.5, 23.7, 19.3, 12.7,\n", + " 21.6, 23.2, 29.6, 21.2, 23.8, 17.1, 22. , 36.5, 18.8, 21.9, 23.1,\n", + " 20.2, 17.4, 37. , 24.1, 36.2, 15.7, 32.2, 13.5, 17.9, 13.3, 11.7,\n", + " 41.7, 18.4, 13.1, 25. , 21.2, 16. 
, 34.9, 25.2, 24.8, 21.5, 23.4,\n", + " 18.9, 10.8, 21. , 27.5, 17.5, 13.5, 28.7, 14.8, 19.1, 28.6, 13.1,\n", + " 19. , 11.3, 13.3, 22.4, 20.1, 18.2, 22.9, 20.6, 25. , 12.8, 34.9,\n", + " 23.7, 50. , 29. , 30.1, 22. , 15.6, 23.3, 30.1, 14.3, 22.8, 50. ,\n", + " 20.8, 6.3, 34.9, 32.4, 19.9, 20.3, 17.8, 23.1, 20.4, 23.2, 7. ,\n", + " 16.8, 46.7, 50. , 22.9, 23.9, 21.4, 21.7, 15.4, 15.3, 23.1, 23.9,\n", + " 19.4, 11.9, 17.8, 31.5, 33.8, 20.8, 19.8, 22.4, 5. , 24.5, 19.4,\n", + " 15.1, 18.2, 19.3, 27.1, 20.7, 37.6, 11.7, 33.4, 30.1, 21.4, 45.4,\n", + " 20.1, 20.8, 26.4, 10.4, 21.8, 32. , 21.7, 18.4, 37.9, 17.8, 28. ,\n", + " 28.2, 36. , 18.9, 15. , 22.5, 30.7, 20. , 19.1, 23.3, 26.6, 21.1,\n", + " 19.7, 20. , 12.1, 7.2, 14.2, 17.3, 27.5, 22.2, 10.9, 19.2, 32. ,\n", + " 14.5, 24.7, 12.6, 24. , 24.1, 50. , 16.1, 43.8, 26.6, 36.1, 21.8,\n", + " 29.9, 50. , 44. , 20.6, 19.6, 28.4, 19.1, 22.3, 20.9, 28.4, 14.4,\n", + " 32.7, 13.8, 8.5, 22.5, 35.1, 31.6, 17.8, 15.6])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_targets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The prices are typically between \\$10,000 and \\$50,000. If that sounds cheap, remember this was the mid-1970s, and these prices are not \n", + "inflation-adjusted." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparing the data\n", + "\n", + "\n", + "It would be problematic to feed into a neural network values that all take wildly different ranges. The network might be able to \n", + "automatically adapt to such heterogeneous data, but it would definitely make learning more difficult. 
A widespread best practice for dealing \n", + "with such data is feature-wise normalization: for each feature in the input data (a column in the input data matrix), we \n", + "subtract the mean of the feature and divide by the standard deviation, so that the feature is centered around 0 and has a \n", + "unit standard deviation. This is easily done in NumPy:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "mean = train_data.mean(axis=0)\n", + "train_data -= mean\n", + "std = train_data.std(axis=0)\n", + "train_data /= std\n", + "\n", + "test_data -= mean\n", + "test_data /= std" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the quantities that we use for normalizing the test data have been computed using the training data. We should never use any \n", + "quantity computed on the test data in our workflow, even for something as simple as data normalization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building our network\n", + "\n", + "\n", + "Because so few samples are available, we will be using a very small network with two \n", + "hidden layers, each with 64 units. In general, the less training data you have, the worse overfitting will be, and using \n", + "a small network is one way to mitigate overfitting."
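+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To show how small this network is, here is a rough sketch that counts the trainable parameters of the architecture defined next (13 input features, two hidden layers of 64 units, one output unit) by hand. It assumes the usual `Dense` layout of one weight matrix plus one bias vector per layer. `dense_param_count` is a hypothetical helper written for this illustration and is not part of Analytics Zoo:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical helper (illustration only): weights + biases of a stack of Dense layers\n",
+ "def dense_param_count(layer_sizes):\n",
+ "    total = 0\n",
+ "    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):\n",
+ "        total += n_in * n_out + n_out  # weight matrix + bias vector\n",
+ "    return total\n",
+ "\n",
+ "# 13 input features -> 64 -> 64 -> 1 output unit\n",
+ "dense_param_count([13, 64, 64, 1])  # 896 + 4160 + 65 = 5121"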
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from zoo.pipeline.api.keras import models\n", + "from zoo.pipeline.api.keras import layers\n", + "\n", + "def build_model():\n", + "    # Because we will need to instantiate\n", + "    # the same model multiple times,\n", + "    # we use a function to construct it.\n", + "    model = models.Sequential()\n", + "    model.add(layers.Dense(64, activation='relu',\n", + "                           input_shape=(train_data.shape[1],)))\n", + "    model.add(layers.Dense(64, activation='relu'))\n", + "    model.add(layers.Dense(1))\n", + "    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])\n", + "    return model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our network ends with a single unit and no activation (i.e. it is a linear layer). \n", + "This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). \n", + "Applying an activation function would constrain the range that the output can take; for instance, if \n", + "we applied a `sigmoid` activation function to our last layer, the network could only learn to predict values between 0 and 1. Here, because \n", + "the last layer is purely linear, the network is free to learn to predict values in any range.\n", + "\n", + "Note that we are compiling the network with the `mse` loss function -- Mean Squared Error, the average of the squared differences between the \n", + "predictions and the targets, a widely used loss function for regression problems.\n", + "\n", + "We are also monitoring a new metric during training: `mae`. This stands for Mean Absolute Error, the average of the absolute differences \n", + "between the predictions and the targets. Because the targets are in thousands of dollars, an MAE of 0.5 on this problem would mean that our predictions are off by \n", + "\\$500 on average."
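+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following minimal NumPy sketch makes the two quantities concrete. It computes MSE and MAE on a handful of illustrative target values with made-up predictions, not outputs of this model. Both results are in thousands of dollars, so an MAE of 1.0 would mean the predictions are off by \\$1,000 on average:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Illustrative targets and made-up predictions, in thousands of dollars\n",
+ "y_true = np.array([22.6, 50.0, 23.0, 8.3])\n",
+ "y_pred = np.array([23.1, 48.0, 22.5, 9.3])\n",
+ "\n",
+ "errors = y_pred - y_true\n",
+ "mse = np.mean(errors ** 2)     # mean squared error: the loss being minimized\n",
+ "mae = np.mean(np.abs(errors))  # mean absolute error: the monitored metric\n",
+ "print(mse, mae)  # roughly 1.375 and 1.0"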
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Validating our approach using K-fold validation\n", + "\n", + "\n", + "To evaluate our network while we keep adjusting its hyperparameters (such as the number of epochs used for training), we could simply split the \n", + "data into a training set and a validation set, as we were doing in our previous examples. However, because we have so few data points, the \n", + "validation set would end up being very small (e.g. about 100 examples). A consequence is that our validation scores may change a lot \n", + "depending on _which_ data points we choose to use for validation and which we choose for training, i.e. the validation scores may have a \n", + "high _variance_ with regard to the validation split. This would prevent us from reliably evaluating our model.\n", + "\n", + "The best practice in such situations is to use K-fold cross-validation. It consists of splitting the available data into K partitions \n", + "(typically K=4 or 5), then instantiating K identical models, and training each one on K-1 partitions while evaluating on the remaining \n", + "partition. The validation score for the model used would then be the average of the K validation scores obtained."
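+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the partitioning concrete before running the real loop, this minimal sketch reproduces only its index arithmetic. It works on a plain NumPy range standing in for the 404 training samples, with K=4, and trains no model. `fold_size` plays the role of `num_val_samples` in the loop; the name is chosen here for illustration. Each fold holds out 101 consecutive samples and keeps the remaining 303 for training:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Index arithmetic only: 404 samples split into k = 4 consecutive folds\n",
+ "n_samples, k = 404, 4\n",
+ "fold_size = n_samples // k  # 101 samples per fold\n",
+ "indices = np.arange(n_samples)\n",
+ "for i in range(k):\n",
+ "    val_idx = indices[i * fold_size: (i + 1) * fold_size]\n",
+ "    train_idx = np.concatenate([indices[:i * fold_size],\n",
+ "                                indices[(i + 1) * fold_size:]])\n",
+ "    print('fold', i, 'validation:', val_idx[0], '-', val_idx[-1],\n",
+ "          'training samples:', len(train_idx))"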
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then let's start our training:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processing fold # 0\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 1\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 2\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 3\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "k = 4\n", + "num_val_samples = len(train_data) // k\n", + "num_nb_epoch = 50\n", + "all_scores = []\n", + "for i in range(k):\n", + "    print('processing fold #', i)\n", + "    # Prepare the validation data: data from partition # i\n", + "    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", + "    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", + "\n", + "    # Prepare the training data: data from all other partitions\n", + "    partial_train_data = np.concatenate(\n", + "        [train_data[:i * num_val_samples],\n", + "         train_data[(i + 1) * num_val_samples:]],\n", + "        axis=0)\n", + "    partial_train_targets = np.concatenate(\n", + "        [train_targets[:i * num_val_samples],\n", + "         train_targets[(i + 1) * num_val_samples:]],\n", + "        axis=0)\n", + "\n", + "    # Build the model (already compiled)\n", + "    model = build_model()\n", + "    # Train the model (the commented-out call is a silent-mode variant with verbose=0)\n", + "    #model.fit(partial_train_data, partial_train_targets,\n", + "    #          nb_epoch=num_nb_epoch, batch_size=1, verbose=0)\n", + "    model.fit(partial_train_data, partial_train_targets,\n", + "              nb_epoch=num_nb_epoch, batch_size=16)\n", + "\n", + "    # Evaluate the model on the validation data\n", + "    #val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)\n", + "    val_mse, val_mae = model.evaluate(val_data, val_targets)\n", + "    all_scores.append(val_mae)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "_INFO - Trained 16 records in 0.011235845 seconds. Throughput is 1424.0139 records/second. Loss is 8.708786._\n", + "\n", + "_INFO - Trained 16 records in 0.009535034 seconds. Throughput is 1678.0223 records/second. Loss is 5.3613434._\n", + "\n", + "_INFO - Trained 16 records in 0.008636178 seconds. Throughput is 1852.6713 records/second. Loss is 18.106756._\n", + "\n", + "_INFO - Trained 16 records in 0.009207628 seconds. Throughput is 1737.6897 records/second. 
Loss is 7.0931993._" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3.291872501373291, 2.496018171310425, 2.221175193786621, 2.6994853019714355]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.677137792110443" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(all_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the different runs do indeed show rather different validation scores, from 2.2 to 3.3. Their average (about 2.7) is a much more \n", + "reliable metric than any single one of these scores -- that's the entire point of K-fold cross-validation. In this case, we are off by about \\$2,700 on \n", + "average, which is still significant considering that the prices range from \\$10,000 to \\$50,000. \n", + "\n", + "Let's try training the network for a bit longer: 500 epochs. 
To keep a record of how well the model did at each epoch, we will modify our training loop \n", + "to save the per-epoch validation score log:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processing fold # 0\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 1\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 2\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 3\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "import time\n", + "\n", + "num_epochs = 500\n", + "all_mae_histories = []\n", + "for i in range(k):\n", + "    print('processing fold #', i)\n", + "    # Prepare the validation data: data from partition # i\n", + "    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", + "    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", + "\n", + "    # Prepare the training data: data from all other partitions\n", + "    partial_train_data = np.concatenate(\n", + "        [train_data[:i * num_val_samples],\n", + "         train_data[(i + 1) * num_val_samples:]],\n", + "        axis=0)\n", + "    partial_train_targets = np.concatenate(\n", + "        [train_targets[:i * num_val_samples],\n", + "         train_targets[(i + 1) * num_val_samples:]],\n", + "        axis=0)\n", + "\n", + "    # Build the model (already compiled)\n", + "    model = build_model()\n", + "    # Enable summaries for this fold in a timestamped directory (read back below)\n", + "    dir_name = '3-7 ' + str(time.ctime())\n", + "    model.set_tensorboard('./', dir_name)\n", + "    # Train the model, validating on the held-out fold after each epoch\n", + "    history = model.fit(partial_train_data, partial_train_targets,\n", + "                        validation_data=(val_data, val_targets),\n", + "                        nb_epoch=num_epochs, batch_size=16)\n", + "\n", + "    #mae_history = history.history['val_mean_absolute_error']\n", + "    # Note: the Loss summary is the validation loss, i.e. the MSE the model was\n", + "    # compiled with, not the MAE (the variable names follow the Keras version)\n", + "    mae_history = model.get_validation_summary(\"Loss\")\n", + "    all_mae_histories.append(mae_history)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then compute the average of the per-epoch validation scores for all folds. Note that these are validation loss (MSE) values rather than MAE, even though the variable is still called `all_mae_histories`:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[1.90000000e+01, 4.05375427e+02, 1.55307042e+09],\n", + " [3.80000000e+01, 2.64351837e+02, 1.55307042e+09],\n", + " [5.70000000e+01, 1.50977859e+02, 1.55307042e+09],\n", + " ...,\n", + " [9.46200000e+03, 2.07635689e+01, 1.55307053e+09],\n", + " [9.48100000e+03, 2.02473850e+01, 1.55307053e+09],\n", + " [9.50000000e+03, 2.02105141e+01, 1.55307053e+09]],\n", + "\n", + " [[1.90000000e+01, 4.76980957e+02, 1.55307053e+09],\n", + " [3.80000000e+01, 3.29584198e+02, 1.55307053e+09],\n", + " [5.70000000e+01, 1.80655548e+02, 1.55307053e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.73588219e+01, 1.55307064e+09],\n", + " [9.48100000e+03, 1.78555279e+01, 1.55307064e+09],\n", + " [9.50000000e+03, 1.73744106e+01, 1.55307064e+09]],\n", + "\n", + " [[1.90000000e+01, 4.62182434e+02, 1.55307064e+09],\n", + " [3.80000000e+01, 3.34037567e+02, 1.55307064e+09],\n", + " 
[5.70000000e+01, 2.06141006e+02, 1.55307064e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.72124062e+01, 1.55307075e+09],\n", + " [9.48100000e+03, 1.75751667e+01, 1.55307075e+09],\n", + " [9.50000000e+03, 1.74055386e+01, 1.55307075e+09]],\n", + "\n", + " [[1.90000000e+01, 5.21177673e+02, 1.55307075e+09],\n", + " [3.80000000e+01, 3.99685974e+02, 1.55307075e+09],\n", + " [5.70000000e+01, 2.67611786e+02, 1.55307075e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.75390892e+01, 1.55307085e+09],\n", + " [9.48100000e+03, 1.76337471e+01, 1.55307085e+09],\n", + " [9.50000000e+03, 1.91227703e+01, 1.55307085e+09]]])" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_mae_histories = np.array(all_mae_histories)\n", + "all_mae_histories" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, `all_mae_histories` is a 3-d array made of four 2-d arrays, one per fold, and each row of a 2-d array is a 3-element record. The first element is the training iteration at which validation was run. Each fold trains on 303 samples with a batch size of 16, which gives 19 iterations per epoch, so the 500 epochs end at iteration 9,500. These iteration values are identical across the four folds. The second element is the validation loss (MSE) for that epoch, and the third element is a timestamp. You do not need to worry about the first and third elements: we will simply average the array along its first axis, and what we actually care about is the second element of each record, the per-epoch validation loss. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1.90000000e+01, 4.66429123e+02, 1.55307058e+09],\n", + " [3.80000000e+01, 3.31914894e+02, 1.55307058e+09],\n", + " [5.70000000e+01, 2.01346550e+02, 1.55307058e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.82184715e+01, 1.55307069e+09],\n", + " [9.48100000e+03, 1.83279567e+01, 1.55307069e+09],\n", + " [9.50000000e+03, 1.85283084e+01, 1.55307069e+09]])" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "average_mae_history = np.mean(all_mae_histories, axis=0)\n", + "average_mae_history" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, averaging along the first axis leaves the first elements unchanged, because the iteration values are the same in every fold. The averaged third elements (timestamps) carry no meaning, so we can ignore them.\n", + "\n", + "Let's plot this:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzsnXmYHFW5/79vVXfP9MxklmSy7wsk7EtCWIPBQEBRQUWW64LKFS8uV/TqFfW64YaiICo/ERFUVFwAQUVEQCAsAUwgQIAkZN+3SSazT3dXnd8fVaf61KmleybT05Pp9/M880x3dS2nejnveXcSQoBhGIapXIxyD4BhGIYpLywIGIZhKhwWBAzDMBUOCwKGYZgKhwUBwzBMhcOCgGEYpsIpmSAgoslE9BgRvUZErxLRp9ztI4noYSJ6w/3fVKoxMAzDMIWhUuURENF4AOOFEC8Q0QgAywFcCOCDAPYJIa4jomsANAkhPl+SQTAMwzAFKZlGIITYIYR4wX3cDuB1ABMBXADgV+5uv4IjHBiGYZgyUTKNwHcRomkAlgA4GsBmIUSju50A7JfPtWOuBHAlANTW1s6dM2dOScb22o42NKaTmNCYLsn5GYZhysXy5cv3CiFGF9qv5IKAiOoAPAHgW0KIe4moVZ34iWi/ECLWTzBv3jyxbNmykoxv3jcfwTlHjsV33nVMSc7PMAxTLohouRBiXqH9Sho1RERJAPcA+K0Q4l538y7XfyD9CLtLOYZCJE2CZdvlHALDMExZKWXUEAH4BYDXhRA3KC/9BcDl7uPLAdxfqjEUg2kQchYX3mMYpnJJlPDcpwN4P4BXiGiFu+2LAK4D8EciugLAJgAXl3AMBUmaBrI2CwKGYSqXkgkCIcRTACji5UWlum5fSRhsGmIYprKp+Mxi0yBk2TTEMEwFU/GCIGkayFmsETAMU7lUvCBImIQc+wgYhqlgWBBw1BDDMBUOCwLDgMUaAcMwFQwLApOQ5aghhmEqGBYEbBpiGKbCYUFgGuwsZhimomFBYBCHjzIMU9GwIGCNgGGYCqfiBUHSIOTYWcwwTJnZ1tpdtgjGihcEXH2UYZhyc6A7i7OufxwPrtxRlutXvCBImAbXGmIYpqy092SRsWzsausty/UrXhBwYxqGYfrCqp1t6MlaA3pOaZUY6PMWS8ULAjYNMQxTLK1dGZz3wyfxubtfHtDzSj8lC4IykeSoIYZhiqSjNwcAWL5x34CeV5qnuzMsCMqCyVFDDMP0EacT78DhmYZyLAjKQtJtTPPAyzs4sYxhmFiE0J8LdGVyB33ejDv3dGfKMwdVvCBImM5b8PHfvYBbn1xf5tEwDHMoIBWCnz+5Hkd+5SHsbus5qPPJRShrBGXCNPIq3q4DB/dhMgwzvJEagRQEf3vZifvf0ce545+v7sS21m7vufRT9rCPoDwkzbwgGGi7H8MwwwvblQSE4uaKZ9e3BExHQgh8/Hcv4K7nNnvbsqwRlJeEUfFvAcP0mY7eHHrLNGmVE0sKAk0OhMUdbm/txqW3PhsINc1YNrKW8L1/OY4aKi8Jk7UAhukrR3/1Ibz7p8+UexiDjqwFVMys0d7jaAJrdrb7tvdkndW/WtEgn0fAzuKyoGoEbBlimOJZua2t3EMYdDxBoE0WYVOHQLj20OsmjakF5rKcWVxeEgbP/gzDFEdfNAKZnmRokkCu+tX8pXJnFifKctUhBJuGGIYplrxG4N8u1/ZCCPz+31tw2Jg6VCdNd1//zt3uZK+ahrzMYhYE5UHmETAMwxQip5mG1Cn+6399FUeMq8cX7n0FAHD/x08P7APkV/1qAmu+6Nww8xEQ0e1EtJuIVirbjiOipUT0ChH9lYjqS3X9YlFNQ8WGhDEMc+jSlcnh50vW96sJjK2lFgtl+x1Pb8T/3pOPEJKrez0w0RMEdtBZ3J21IPT05UGglMvhXwI4T9t2G4BrhBDHAPgzgM+V8PpFwT4Chhn
arNjSis7egy/jIPnRo2vxrb+/jvtXbOvzsXLlrs8adohQ6XCjhgI+gpztOxfgNxP15gZfKyiZIBBCLAGgl+g7HMAS9/HDAN5dqusXS9LkqCGGGaq09WRx4c1P45N3vThg55Qr8n2dmT4fa2t5BHLKyITUKet0E8l0H0FeI8gfk1WOL4fDeLAN5K8CuMB9/B4Akwf5+gFM1ggYZsgiSy68vPXAgJ2zJuU4cfuTvJWPGvLPG5mQVbwsWS33PNCVxV9e2u5N9L48Ap8gGEYaQQQfBvAxIloOYASASJFMRFcS0TIiWrZnz56SDUiNGmKRwDClRwiB6x5chde2F85DkFPlQGrraTeapz8ROlFRQ2HmnLxpyHn+ibtewH/f9SLe2NXhOxfgFwrliBwaVEEghFglhFgshJgL4C4A62L2vVUIMU8IMW/06NElG1OSo4YYZlDpzlq45Yl1eM8thTOTpSmmv4r7HU9vwL0vbPVtS7saQdfBaASaJAgTBNKvIX0EW/c7ReakpqCag1QzUTlMQ4MaPkpEY4QQu4nIAPB/AG4ZzOuHoZqG2EfAMKVHBsVki2gRm3fO9u/H+fW/vgYAeNeJk7xtchLvz4Sbi0go6w05V0ev5V7PeS6FmhQavqih4aoRENFdAJYCmE1EW4noCgCXEdEaAKsAbAdwR6muXyxJX4kJlgQMU2pk4bZiOgNGmWIOBrkS78+EqzuLJaGmod6suy/5js14UUOqszgvCIaVRiCEuCzipZtKdc3+oGoE5YjfZZhKw3InvWLC+KWwKEYOSN/DxSdNxszRdZG/52xOdgMLn3DX7GrHhMY06qqC02OuDz6CTlcjkFOMlHuyLHVYHgFQGVFDQw61HwF3qmSY0mP1YcElV8rFaOvbD/TgZ0vW40N3/BsA0NYTnnsgNYLOkBaTQggsvnEJLr/9+dBj7YioobCS3PmoIcKn/7DCa0Qjt0flEVRC1NCQQy0xYXETe4YpOX3J6PV8BEWoBKa7k5yUW7vCgxIz7jnbQwSFHNvyTfvDxxOlEYRM3nLCNwzgzy9uC2zPqkXnlFVoOXoSsCBQTEO5fqScM0ylcbAm1D4JAmkaKiAIdrf3eAJACo+whLE1u9pxx9MbAITH/oclhqnYUc7imKghXXvoDNEIcrZAVcKZjsvRpYyLzimmIb2OCMMwQQ52vdQ3QVBc1ND8bz2Kppqk75jWrmxgv7f96Clvss+GTPphwiFsPLpkijMN6deRvgN/HoGNEdVJ9Hb0skZQDtTGNLkiwtkYptI52AVTXwSBnESLySPY70788vz7XdOQXGkD/hV/mAWgkCDwWlVq28OjhnK+8ejbs76oIRsjqhOR5yo1LAgM1VnMgoBhCnGwgqAvJthcEc5iPcrGcwa7E67sCxB1bhVVUNz82FrM/cbDvtctK9xUFeYj6PR8ASJ0u55HkE6aMKg8PgI2DZnsI2CYvtBfOZC1bCQM6pMgKaYj2IFuvwlIHiMjcaK0iUKmoesfWg3A8QsY7kmk7AhqBMHJW15fD0LJeePLb89YNpIJA+mkyeGj5SDpixpiQcAwheiPRtDWk8VhX3oQ/+/xdX0ywXqTZYwk0AWBnGgL/Z5DTUMhwqFdKYHtOYuLKDHhXSfiftXx9eZsVCcMVCdNdGctPLN2L9p6gj6OUlHxgsD0RQ1x+CjDFKI/66V9HY69/vf/3twnQSIna72mv4ouCCRZL+Io/NhincVtyvkDJSa8kNXouSPsOoBfQPTmbFQnTVQnTfz2uc34j9uew6+f2Rh5zoGm4gWB30dQxoEwzCHCwfgIhOibCVZOorGmoZDoICC6iYz+ukqoIFBW5pElJmLMOVE2fzWPoDdroSphoDqZn5IH02lc8T4CdbXACWUMUxhR5M9kza52WLbAEePrvQldiL6ZYIupNdQaoRHIJC1bCOw80INbnvAXOw6zAISZhtq686YhtQheW08WO9xs4bhJOyrDWb4XpkHozdmoSppeZVTAiS460JV
FfTpR8jpoFa8RqLCzmGEKE6YRPL56d2Ci/cbfXsPX/vIqgPxEaQvRr8zi/piGVF/Bz59cj19qppZiw0dVjUAtj3HujUuwu70XQLwg6IhpsymFUW/WcnwEibwgWL5pP4679p949PXdkccPFCwIFNhZzDCFCRMEH7zj37juwVW+be09OW+ClJEwfRUE2SK09GIEwfiG6sDrckW+q60Hb+xqB1DYRyCtBrYQ2HGgx9uuRw2pJp4wZG7DfW7piZ6cjaqk4dMIZP+CkXWp2HMNBCwIFFgjYBhnUnt9R3T3sGJ/Jj1ZK1CD3+63aShaI2iLchZ7xeUsPL46vMth1rLxpusfwzk3Oq3UQ01DimlHvqwLQz2PYGJjOnK8ADCi2smC/vw9r7jHW6hOmKhSNAJZIqO5tir2XAMBCwIFmwUBw+Ar972Kt9z0JHa19YS+XmytIb8gcDUCW/jMK4XOpcbi5ywbLR29uHXJOt9xUcXlVGfwU2v3hu9jC6/ap2WL0AifMI1A9zPrpqGRtc4qPpUIn2LTKf/23hCNwDvXIGgEFe8sVmGNgGGA5Zudyptt3VmMrQ+aVIrXCGyk3TlMrpgd05C/zINaCl5HOnzX7OrA2378FKaOqsFDr+7C02tbsGZXO5Z+YVFB01AcatXPPe29hX0E7su6ANNNQw3ujY+sSWFniEBV6yDlLNstOmeiWhMcVQkDtSHCYaBhjUCBfQQMU5hiw0e7s5Y3YcoVs4A/TLvoIm8AVu1s91bvT6zZ49noD3RnQ1feuSLiwdU+ANsPdPvGM7I25Sv5sKutB7e7lUv1uULXCCY1OaahKI1ALYEtj60O0Qia66oGpXMiCwIF1ggYJk/Ur6FYQdCTtbwJ02caUjSCgoJAs8HUaBNl1rJxoDuLUbVB80kxv+ftbvinfCwn5dNnjcJjn12IaaNq8djq3ViyZg9+vmS9t69+avU+TIPwP4sPx6cWHYa3Hzc+9LqfPvtwAEBzXcpzpFclzEBdpJEh91UKWBAocB4Bw0QnYP308XX498Z9RdUasm2B3pztCQ25kncidfL77enojT2PHutfk/Jbs7uzFg50Z9FUE5wwozJ6337cBHzv3ccCAC64+Wlv+84DPZ6GcNsHTkJDOol0ysSutl584PbnYSomrDjfhmkQRlQn8elzDkfandhlZVHJMZPqcdHcSUiZhk8jMLXCSKMGwT8AsCAAAHz8rJkA2DTEMCr6XPfdf6zCe25ZGtAI1CJpMuBCNleRPylPIxDCN7lv3NsZO4ZsAY2gO+MIgrCVc9TvOWfZoSab3pztrezl67WK4EkaxdUlMxVTzrTmWud8pv961QkTqYSBjGV7gqAqYQbe8/EN8dFHAwU7iwF87tw52La/Gyu2tJZ7KAxTduQ8JiKMQ/ocqCZMWULAAPkicQDVWew3LW1q6Yodi27n1wVBS0cGWUsEBIEQIiBEJBMa04GVN+BoEFnLhmmQ97pqs1cdwnHmMbVszfnHjMfO83swoTGNj/32BW97VdL0tIG8aSgonGaOro28zkDCgsDFNAz2ETAM8t3Aola9+iTY4YuzF0iajskGADbv68ItT6zzVr1Zy/bZ/Tfti9cI9N9kQosw2tnmJl1pguD1He2hJSTu+NBJOHXGKDy+Opit+8NH3gAAz5wDALVV+cf7u9SaQ9FjVk1IRIT/XDADti1w+qxReHptCwDHDJRKGMgqGkF10gwI3+nNgyMI2DTkYhrcoYxhVKIEgW4fVzUCOXGr5qLrHlzlhWDmbIEed+Jrqklie2t4rkL+fFotf+03us09fvQIf9LVTY+uCf09nzV7DKqTpq8zoY4azqr6JNR8hTiNwAyJ8jEMwp0fPtl7nnY1gkzO9grWhWkEE5vYNDSoNNWksK8r42tCwTCViJzH1NW4mmypy4duZdK33MlXr7i5bX8+OkdOfLVVidCGLpID3Vn85tnNvm26uUeeV40aInKycsPMPxJds1BJKdm9ar8Sv0YQ7ywOwzA
IBjnvX3XS8RHYAuhy36sqRRO5bP5kjBlRjcPHjIi8zkDCgsBl6qhaZHI2drT1FEwPZ5hKQJ38fUJBmwSzSuikXMHrXba2qoLA3b8mZcaGj/72uU2BbS9u2e97LsM/m+vyGsH05lq0dGYwMiSSSJI0ozUCtU6Q+h7s78pg4ezRSBhGbAmORJwAMhwHcXXS9MYgtaWqhIGGtFN64qgJDXjfKVMjzzPQsGnIZdqoGgDApgJRDAxTKfh66iomGt30rvbktTzTkH+nrfvzTmEpCNKpREAQCCG8yX1HiNnoxc2tMAi4+79OBaAIAsU0NLquynEixxjy4yZr1USjlsPY15nBxMY0RtYmYzWCOIuCtEhJHwEA7G5zQmgb0klcccZ0fPltR+LSkyZHnqMUsCBwmeo6ZTa0sCBgGMDvI4jSCIQQmkbgmoY0jaBTMRXJyT+dNAIZufe8sA2nXfcvLN+0Hy9vDY/iG1df7dnOpSBQTUNj6qtxoDuLHs08pcbyJ2I1gryJxvbVNHLyFQyi2PDRQhoBkA8fBYBt7j2MHlGFqoSJK86YHju+UsCCwGV8fTUSBvlUWIapRGRJA3Xytyx18s/vawt/4pYV4izWkX6BmhCN4CU3hPvlra3YcaAHl8ybjAWHNfv2SSYM1CSdSX37gR6MqE6gSjHnjHG1gz0dvb6GNn/75Bn5cyg+gi+99Qifs1nVCPRClI01SRgGhUYNzR7r2PPjNALTIKQSBgyDkHLHsL21GyOqEoGs4sGkZIKAiG4not1EtFLZdjwRPUtEK4hoGRHNL9X1+4phEJpqU9jfGV7JkGEqBTmNqZn2al8AdZVs2cJngonSCFQ8jSBlYuv+bm/yBxwHMgB09ubQnbWQTpmB6B/TIFQr1TubalK+SB05qe/rzHiT+pmHj8bUUflQTDVq6CNnzvBN/mopaL3Oj6MRhDuLj53U4J47TiMgr7Cc1Ai2H+gORD0NNqXUCH4J4Dxt2/cAfF0IcTyAr7jPhwyjalNoYUHAMAD8oZpWhGnI1kxDltJxKwopCGqSJjKW7SvzIM037b05dGcs1KRMn50ecDSSlJkvx9BUm/JN7DOU2HvpkNWDhPSKp+rkrTqLv/DWOb7XRtY6QidMEMhy0WZMaKppkLfyT5nO/+2tPT5ndzkomSAQQiwBsE/fDKDefdwAYHuprt8fRtamvGYQDFOpyEWwz0dgqZN/ft8Tv/GwzzRUlEZg5aOGomjtzCJnC6STZsAebwsBIvKOH1mThDr3HjmhHouPHOvci7tND+nUbfDqc1UjGDOiGjdecrz3vLEmCSLylZEGgFNnjPK0kjjzfkIRBFIY7evMoHnE4NQUimKwfQRXA7ieiLYA+D6AL0TtSERXuuajZXv2hHcXGmiaWBAwjIe6Elf9BWpCWVfG8gsCL48gOiy0N2uDyB83L8/ZlXGS03a4NfzTqaAgkM+lg7ipJuWb6A0izB7n2OvlkQFBoD1XTUt6m0m1LpF0FqssOKwZd115ineNWI3AJC9zWT3vuPryhqxHjpiI/ld5/B7ttW/383pXAfi0EGIygE8D+EXUjkKIW4UQ84QQ80aPHt3Py/WNUbUptBSohsgwlYLlCwtVfQT+/bIhJqSekEQxOcFmLBsmkc8u/7W/vIqcZXsCZM1Op4dwTSoR1AikIHDNKU21fkFgGoRGN4dAZj3rgkDO5c2uOUed2/UcgypNEKgvv/vESfjp++YCgCcgYnLVkDAM731QBcFbjxkXfdAgEKcRXKo81lfuuu2/WC4HcK/7+E8AhoyzGHBMQ209ucjytQxTCXiZxVa4RhBIKAszDWWCgkBW8szknMJuakXOXy3dhKXrWzyT0k5PIzBCTEPO/zrXsdxUk/St6ImcbUA+wklfxcuxXDZ/im+/sH1VU9GI6oTv9aMn1nvjkNvjyleYBnmakHr/c6c2RR4zGMQJAop4HPa8WLYDeJP7+M0A3ujneUqCVDX3d2WwfNN+7G6Pr4PCMIc67T1
ZfP+h1b4wzrCic34fQbQg0BvRqMiIIE8QaLV1crZAdybnhX8CQDoZ1AikyUp1Fvs0AqJAfwJdI2iqTWHFV87xGsTEoYamGoY/h0D1LciHVcnoaVX1Eaj3PxhdyOKIKzEhIh6HPQ9ARHcBWAigmYi2AvgqgI8AuImIEgB6AFzZp9GWmDo3YqGjJ4d3//QZTB6ZxpP/++Yyj4phSsfNj63DLU+sw/jGarz3ZKekQVitIb+PwH+OjE9zcIRCmEYgnbu9OctZGWuCYH9nBl0ZC001Kexud0y06ZQZKDwnTUNybm9IJ30TqWMaSvqOOW3mqMB4GhVhoVb91OdkfZxdiiM8qfom3Mdx+QAnTm3C2BHV7vid/ae6VQ3KSZwgOI6I2uCs/tPuY7jPgx2tNYQQl0W8NLdvQxw8pBNnj/sl3LKPk8sYP5YtcMPDq/Hh06d7NupDGTnp7esIBklIv0BLR6/3mwCCGkEuNKEsaF6VGkGvpxH4J8zP/PElAMBxkxu9bTUpM+CTkBoBeTZ53f6f9xFMbEzj3o+d5tMywlBvKUoQyCbyasayrokA8YLg2+88xnssQ2UvPWlK7NgGg0hBIIQoX5pbmUi7dsMtnF3MRPDU2r24+bF1WLe7E7e8f8iuaQA4pQvGjKiKLbAmJyO1lLSc2nK2QCZnY+43H/EdE3QWFxc+KjWCTM5GIkQj8PZTJtJ0MqgRSGFz3KQGPPzaLoxt8K9LTYO84m0dvTmMrS+4btXwSwIpJMa451HvLaPcuxQK1RH3pTN1VC2euebNGN/Q1/ENPH0KHyWiWiJ6HxE9UKoBlROpEWzeF981ialcvISpmPLJQ4HujIVFP3gcf35xW+x+I9xVepvSXEYuiS1b4NdLNwaO0TUC1b/glaGOEQS9ORsGBX0EErUrWDpl4toLjvZf3xUEVy2chfs+fjpOnOJ3tJrkCILxDdW49oKjQq8Rh64RTB5Zg2MnNeD6i5w+x+q9qSYwqaGkY/IjdCY0psvuHwCKKENNRCkA5wP4DwDnArgHwC0lHldZkIJgqysI0mWs/cEMTYpp3D4U6M5a6MnaPpNOGPJ22nuygddytsCtS9YHj9HehC5lMpQaQVhmsRc1ZNlIGhSIzpH4BEHSxFmzx2Djdedj3Z4OLPrBE55GYhqE4xUzkoTIeW3pFxaFnr+vVCdN/OUT+TpF6v2qgkBqRofivBEpCIhoMYDLACwG8BiAXwM4SQjxoUEa26Ajv4Bb3JK53VkLD726E+ceVd4YX4bpK9JuH1fqAciv5ttVjcCloycXmmCpl6FWnaff/vvreHHz/lCNQIZNZnI2qqoToa0kAf9EqmYfS1OSXnJCJ64hTRTqGQsdrRbUU+/T6z18CAqCONPQPwDMAHCGEOJ9Qoi/AhjWAfZSEKimoY/euRyAswq6+bG12MJmI+YQwGsZGdP4BcjbuFWNQE6Ee93kSn1i1U1D6qp4874u/GzJ+lBBIF0VB7qzqKtKRNb0b6pJes2hVMerjOfXK4IGrnOQppZCh6v3e/G8fN8A6SDXM5MPBeJMQyfCSSp7hIjWA/g9gENP1PUBuRLZ1RZUp3e19eL6h1bjzy9uwyOfcVIhWjp6kTAMNGihaszwZyjYdeOwYkw0KlIjUM0dcprd60YSjW+o9pVn1+fhrkwOSZN8GcZhEXfqBF2fTob2FL7zivk4dlIjLj9tGp5Z1+IXBMniNIKD/Wg+smBG7OvyvVryubMwRQn9lBpBdeLQmyYjRZcQYoUQ4hohxEw4OQDHA0gS0YNENKTi/weKONueXBWpJSjmfvMRHHftP0s+LobpK9JeHRbGGbZfWH/ilk7nuz6hwV8HR/cRdGes0MkvpUUrqcKzIZ3EjNG1+iFYcNhoNKSTmNRU41ttA3nTUCE/TX+EtLynX314vq9cdRg/uuwEnH3EWExo9Ef7eIJgmJmGPIQQzwghPglgEoAbAZxS0lGViSiVbseBblzys6UACv+wGGYoEFfzZ+m6Fky
75gHs7ej1NIKwXIAWqRFoE56uEXRmLFSHRMqMbfDH7qsmpvrqJOZOHYl7rjqt2FvyBMtbjh54n528pWQR/oW5U5tw2+XzAhVMpSBIp4aRaYiITox4aS+An5RmOOWFyKkMqNs3b3l8Hda7vYzjyusywx+5GtVXxUMNaabpDVm43LpkHQBgxeZWTxBkLYHenIXbntzgTWhS+x2vaQRhPoIwbXpGc53PRKTmM8g4/zlulVDAWWnHQUR4/kuLvGNLwcG0iOw+hE1DcT6CZQBWwpn4Ab8zXcCpFTTsSKeCgmCo24MZRidOI5AQ5ctDZCwbyzbux/UPrfZel32GdROILgi6MjmMGRFMipreXIsn1uzBxMY0zpozGhfPm4RbnnCEkJzM1aSydxw3oeB9hV1nIOlPxJHkc+fOwf6uLBYcPjjVkgeSOPH3GQBtALoB3AHg7UKIs9y/YSkEgLyf4PxjxgMI1hlhKhu5Jhjqi4Oc1yksqBHIaZwIPtNQWJ/hdNL01eQBgjb6rowValaVPoCGdBLfvPAYr8SEs815LFfgR02oDxxfDvTOZX1h1pg6/PGjp3rVSA8l4pzFPxRCnAHgkwAmA3iUiP5IRMdHHTMckCGkE5vS+PhZM5GzxUFHITDMYJOL0QjkRE4gL3zUMQ0FhUZjTTJg9tE1gt6c7XOQvu+UKZjUlEZ9td+E4/MRKOadB/77DPzuI2V2O7q3FFdCejhTUHQJIdYT0f0A0gDeD+BwACtKPbByITsX1VcnYNmOil3IHCzc1nkMM1SQoZlhwQ3e15ng9RvOWHZo2YzGmlSgpWRYGL+aDfyNC44GEeH+FU55C/nTULuCqXb+oyY0FLyfUuM5iw9CIziUietQNoOIvkhEzwH4OoCXABwhhPjjoI2uDIxzC0BVJ02vFkqmQKMajiSqHIayj7grk8NZ338cz61vie0LIB3dhPx3O2fZoWakxnQyUDsnLKFLdZB6VUHdiV8KAlUjGKrmE+MgfASHMnF60FoAF8PJMF4KYAqAq4joM0T0mcEYXDk4x2163ZuzPf9AVlOZN7gRRJLOTDA9nxne/GvVbizbuK/cw/DR0pHBhr2deHV7G7IhPgLLFrj9qQ1eOQmBfB7njMARAAAgAElEQVSBLfxJZZKm2qBpSJ77J/+Rj/KpShqYNaYO31CKvMkEMtnoRjW71A5RQVCpxH0a1yKvMdUNwliGBJfMm4xszsa7507C/Su2A0CgdeWFNz+Nl7662Hve1WtV0DvESC66ZSk2Xnd+uYfhIVf3Hb05rwqoqhEsXdeCa//2mvc8p/kF1FLUkoZ0KlDLf2+7k1+gJoylTMPLuJfo5lLV/N6XCp2DgaolVSJx/Qi+NojjGDIkTAMfPH06gHwruayWCn+g21+pMewHxAxPhrBlyPMLdPTmvKgh1Wy5YW+Hb/8bH17j++52hnyPG2uSgQY8Nz6yBoC/uFoyJLpOD8VUNQLd78CUl8p0kRdJVYSPYKaWGt/FpqGKQe+fO5TIegXkcvmoISUkdPWudt/+r+1o8xVYbA8RBE0xdbRUjSAsI9fwQm39zwGgJjm0TEPvcctZDIeuc/2BBUEM8ouuNt6YNqomoAF0hthWmeHJUM4o9pmGXEGQs4VXPuKNXR2RxwJO2Wkd2bVPNmVRURvLhHVB0x2vqqloqJmGPrZwJt741ltKmrU8lGFBEEPeNJQXBM11VdjflfVNCF1sGqoYClW+LCeeaagn6zNn7uvKYHd7T8CkqaMucKQ2LB3F75k3OdDtS022DDMNGZ6zOEhUd7JyQUSxLT2HO8V0KKsC8G4A09T9hRDXlm5YQwNZ/1wXBJmc7W9IMcTbFjIDx1CzDH30zmWoSpj40WUn+ExDltL05a03PYm9HRnMaA6vqpkyDWQs26cRLD5qHC6eNwmnzWz2tuk2f58gCDENmaTZhpghSzEi8H4AFwDIAehU/oY9+TyC/K9/VJ2Tbr+/K7+64jyCyqFQUxSdu5dvxeW3P1+
i0QAPvboLf3nJiW5TTUOqRiD7CoSVkACAOreBveojaOvOYsFho32Tv37vVUruQLhpyPnPYmDoU4zHZpIQ4rySj2QIkgrJI5DOpH0d+RZ+UT8wZvgR1VUris/+6aWDvuZDr+7EqTNHBUo26EjTkKMRBMfZ0plBTcoM5AscPrYOz67f54saagvpYaz7wlJFmoYGi4+eOcOrEsz0jWI0gmeI6JiSj2QIIp3FftOQoxHIph0AawSVxGBHDa3f04GP3rkc19zzcsF9s4pGkAsZZ2/ODnWGHjep0TtOFo9rC/En6OGlaqJZmEZwMJU8+8MX3noEfv6BeYN6zeFCMYLgDADLiWg1Eb1MRK8QUeFv5TAgrMSETKVXJ3/WCCqHwfYVt7jN48Pap+r4BEFEWZQwQXD0RKfWT0dPvpx0WDP7zt4YjSCkRo8ePsoMXYoxDb2l5KMYooSVmAgTDiwIKoe+moYOFmnGCUvAUtumAvnER8sWkUmOjVpewCXzJnuZwxnLxqSmNM44rBmXnjQ5cKyuEaiZwnERNywHhj4FNQIhxCYAjQDe7v41utuGPVIQqGn4XpJZjgVBJdLf8NG+Opkl3W6yYlgHsLnffMR7fPNja7F0XYv3fH9XJrA/ENQIrj7nMF9XrnTSxLffeQyOdc1FKm85xt8iUvUBhAkCr9w1qwRDnoKCgIg+BeC3AMa4f78hok8WcdztRLSbiFYq2/5ARCvcv41ENKTLWcvVvzrRy21/e3m7t419BMObnqyFL/75FezrzPQ7fLRYAdLS0YsbHl7jCY4ojUAXLNc/tBr3vLDVe97aFZ4z0Jj2N5mpSpi+DOGqiL7dALBw9hhfbSV1ei+2fPOnFh2GGy85rqh9mcGjGNPQFQBOFkJ0AgARfRdONdIfFzjul3B6G/9abhBCXCIfE9EPABzo43gHFTnpqzkDcuXz+Oo93rZy5BHYtsC3/v463nvyFMwYzRXvSsm9L2zD757bDJMIs8b07722bIGQRX2Ar9z/Kh54ZQfmTW3CmYeP9iJ1arRqnVk7fvERKQg001AqYSCZyE/ixfTb/eWHTsJdz2/2OYNDNYKQYz99zuEFz88MPsUIAgKgznQWijD7CSGWENG00BM6uuLFGOJ9j+VKSV18hWVE9mQtvPe2Z2EQ4c4rTh6Usb2xuwO/eGoDlq5rwd8/tWBQrlmpyAqepkH99hEUG20ktU9pjpSmoRpNihQ6X5RpqF4zDaVMw1cMTm9LGcbC2WOwcPYY3zb2ERzaFCMI7gDwHBH92X1+IYBfHOR1FwDYJYR4I2oHIroSwJUAMGXKlIO8XP9ImAYMKkYQ2Hh6bUtgeylRJyemtEh/UNKkfoePhoVzhiE/T3kdGalTrfcEsOLPV6xGkDTJZxqKKzIXR5hpaAhX42A0imlVeQMRPQ4njBQAPiSEePEgr3sZgLsKXPdWALcCwLx588r2lapKmD7TUCpk5VMOZ7FcMVYNsZotwxEZlplKGP2e3Ip1FuuCIN9EJn/8y1tbfcEKYbR2ZZA0KSAwdB8BEflMQ021hTUC//HOhB/uLBbePszQJlIQEFG9EKKNiEYC2Oj+yddGCiH61Z6JiBIA3gVgbn+OH2xSCcMnCMIm3rIIAtdBHefcYwYGWWIkaRpFOX33dWawvyuDmYrvplhnsScIhOwt4Kzs1bSAd/zk6YLn6cxYaKxJBjSD+nTwJ69O4iP7KAhMIuSEqOiCbcOBOI3gdwDeBmA5/H4fcp/P6Oc1zwawSgixteCeQwDdFBRmGgqr4y7Zsq8LrV1Z7O3oxYtbWnHlmTMGpF+rNA1VFeHcYw6OvGnIKMpHcN4Pl2B3e68vwqZYk5IUBDklOcw53m0pWeA8soAcEL5o0U1MgNOX2HvcR9OQ4dpOQ01D7n9iL8GQJ65D2dvc/9P7c2IiugvAQgDNRLQVwFeFEL8AcCkKmIWGEropKCAYTAO7DvREHn/xz5Zix4EeJAxCzhY4fnID3jxn7EGPS5qGwkx
VlcDKbQewq60Hi444+PeyEFllYi1G+9vdHswC7qsgkCHJMnxUagQy0ziKdMoEZZ3vh+oEllQnTDz7hUU45TuPetsSB6kRAOGmobH1Tpby/Okj+3ROZvAppgz1o0KIRYW26QghLovY/sE+jbDM6Ksq/QtfW2X6KpHqtLjF6aSzsJCTr1jkKrVSTUNv+/FTADAoPYOlIEiaRqBgm22LQAMWiSo0ihUECfdcsuud/JyXrm/Buj0doaUfVJKmgaRpoLej19c5b0JDNbYf6EEqYWBcQ3Xk8SOLiBpSkYIrTBBMb67FY59diCkja/p0TmbwiZxFiKja9Q80E1ETEY10/6YBmDhYAyw3ugZgap6v2gJmnrlTm3zP+5thqpM3DVWmIBhMVNOQPqHHxfOrk3bfNQLLd+3Xd7Rh0Q+ewPUPrYo9PmkSPrVoFgB/qXQ9D0Hno2c6lt5iwkdVpAyMSiib3lzLkW2HAHHfjo8CuBrABDh+AvlptsFJFKsI9IlWX/0Vsvfr2nmxYYSFkKYD9hGUHmlzFxCBVpU5SyDqK6CWcn5w5U5ctXBmwWtJuSI1D71fdqEwZVsIvP/UaUglDLR2ZfGdBx3BcfrMUVi7uwO1VeHfl8+fNwdXn314nzuHGTEaAXPoEOcjuAnATUT0SSFEoSziYYv+w0hogiCsGJiKHuY3UEXL5IqRNQI/3/3HKvz08XUDajKS5jzbFoHon2xElU/AX8r5u/9YhWMnNeD0Wc2R+wP5ib9b0wiKRfYkuOQkJ/dm9rgRGFVbhTnjR+DS+VMwqSncTGMY1K8+wnE+AubQoZg8gh8T0dEAjgRQrWz/dfRRw4eAaciINg2FacD6D3mg6tnLFWNYQ5BK5qePrxvwc2ZcM5wtgq0q43w+eo/gPSFO5MC1XEGgm4aKRdcg1AzgI8bXe4/vuepUxMiwopEaAecKHNoU4yz+KpzonyMB/B1OWeqnoNQQGs7oUTl616XaVP4tDFOre7Uf8sCZhrji6WChlnfWfTxhGoEM4WzTHLtx2oNETvxRpqFCFHImS+ZOHZhInvnTRuKBV3ZUbPTacKGYT+8iAIsA7BRCfAjAcQAaSjqqIUQhjaCmKr5LU8A0dBCCQG02Ik0Hg10fvxKRE7gtROD9zoVoBFURXb6KiRhTncN/fnGr7/tTaNWdMCi0j0Ap+f57jsNfP3FGnzOSmaFFMYKgWwhhA8gRUT2A3QAG99tWRlKaM1Y3/xRyFusaQX/r2d/7wlbM+tKD2NzSBQDodleMAxWFdKiiO28Lbe8P8jN0BIH/tbAVu0za0k1DfdEI1uzqwKf/8JLv/IV6Fv/5Y6fjuncfW/AaA0k6ZeKYSRWzLhy2FCMIlhFRI4Cfw4keegFOGeqKQHfG6k02VB9B2KSsTxT99RH89SWn/8Ebu9sBAF3ZvN1aRwgR2nx8OBL1dg5kb2E5gVt28Lxhk7v8hqzf01lwXwD42RPrsHZ3R+g+qlYR1mZSRa0ZxDB9oZgOZR8TQrQKIW4BcA6Ay10TUUVQKJxO1QjC7P8D5Sz20vXd33qXV3ogeL5fPbMRx37tn9iyr6tf1zqUiJpcB8oXo17DFsHw0U0twfdYCv/nNrSEblc50O2EeL7z/z2Nj965DBtburDgsGacOmOUe838vmF1glQ4cofpL3EJZSfqfwBGAki4jyuCQk4wNXw0bFIeMEGgtf3zqlIKge88+Dou+ukzsG2BVTvb8M/XdgEANrZ0hp5rOBE14Yf5Tixb4DfPbsKB7myfPgf5GYaFj67cFuytJPffur/bP1ZL4Orfv4g/Ltvibdvr9h1u78nhoVd3YW9HL2pSJt49d1LgvIVMQ+ywZfpL3BLjB+7/agDzALwER+s9FsAyAKeWdmhDg0Jx+qppSE4Sti3w3X+swn+cPGXATENyYpPKv7Q/2wL42RPrAQAzvvh33zEHaya/8eE12LK/CzdcfPzBnaiE5CI0grD3+e7lW/B/963E/923EpfNn4LvvOuYyPP
e8PAazBxdiwuOn+j5CKwQH8ErIYKgN2dj8sg0tuzzC4KerIX7VmzHfSu24+J5jpstLKQ0aRqoC0n8KiQIWCNg+kvkN0cIcZYQ4iwAOwCcKISYJ4SYC+AEANsGa4DlRhUEx0wMOsXU8FEhHCGwrbUbP1uyHo++vjswIfXXWSwPk+Gr0gcQd76DjSi66dE3cO8LQ/ujjtIIwgRBW3c+tPKu5zfj+Gv/Gbqi39eZwY8efQOf+r3TUlt1zKt+oAkN1d6KXr2uZQs0hZRqaO0O+m3CCtSlEgZqUsE1WmHTEPsImP5RzBJithDiFflECLESwBGlG9LQQtZoOWFKI/76yTMCr+s+hJwtvBrwatEvSVyUz4HuLHa3+yuZrtnVjo17O32NSYD8pBYXHVMJ8URh4ZtAcZpXa1cWtz25XtuWwYnfeNi3zRMEwi9cq5NmwEchzUJhjt1tmqkICNcIqhJGaCmIws5i1giY/lHMN+dlIrqNiBa6fz8H8HKpBzZUqHV9AN2Z8AQufRVmC+H1i+3oDR4T58Q847p/Yf63HvVtW3zjEiz8/uNeDRrLFsjkbC+PIHbCc1/qyuS8uvbDDTkR//2VHZh2zQPe9jBNSRemAGBqxaD0kE8hhBehZQnhy8atSpqB3ABZDDCsnPP21rwg+PJ9K7F6Z3uoIEiZRmgxwxGuaUgNXHv+S/kiwOwjYPpLMd+cDwF4FcCn3L/X3G0VgfxB6vkAkjCNQAqCW54IljuI0wjiGtzISSxj2WhXQkPj5IBcvZ78rUdx9Fcfit7xEEYK1nuW+/scxRQF9aHPnXrCYMayPWFr2/6ooeqkEQgGkM/DTEPbFEFw57Ob8OQbewKmJcD5TtWGmYaqnW2L5uTLRqg9B9hHwPSXYmoN9QC40f2rOKQgiNIIpo2q9T23LIH9Mc1DDtZHkLOEr3RBnB9ACok4AXOos3pnG5ImoV4zm+SKlAS6RqAfpn7utvBHDVUn8qahtp4s6quT3oIhzIyj9zLozdmh36vqpBmqEdSnk/jH1QswdWQtjvnaQ8jZAglFI+Vyz0x/iQsf/aP7/xUieln/G7whlhe5MusOqe3z+GcXYly9v8mHJUSgUc133nUM7v3YaUgljKIKfYXZ/T1BYNs+80WchjGQ2bVDlf/6zQs447uPBSbe/moEen8BdfK2bL8GVp00kLVsPPLaLhz7tX9i+ab9niAoptNXJmeH1owaUZ0IrWqbMA3MGVePdMr0KoUmQ7qQMUxfidMIPuX+f9tgDGSoIp12YT9Y06BAf4KcbXumIUl9dRInTmmCSeT1ntXpVFbtXRkrsCKUK/+sJXzZpkW4CHw8u74Fu9p6cMHxw6u3kK4RFN0sXssU153PXZpGoAreatdH8MArOwAA6/Z04OgJTmRZMb1/s5YdusCor06Ghi2rJdBrUwm09+RgGoQ/XHkKnl67t+D1GCaKuH4EO9z/mwZvOEOPuhgfgVTFX/zyOfjby9vx5ftfhWUHNQJp3zcNitQIVKfh7vZe/OLB1/G5xXOUczjkLNsXjRQ34YVpBJfe+iwADDtBUK217Cw2X0N33utRQLppKCxqaFebE+m1akc7/vduR1kuFOEDxGkEyUApE8AvCKTGkDQJJ88YhZPdTGSG6Q+RgoCI2hG+qCQAQghRH/LasCOuxZ/8YTbVplDlFhqzbIFWTSOQgsE0CLYQ2HGgG6d+51/45YdO8urFy8kEAH7z7Cb85tnNSJl584Cc1LO28LqTmQbFmn8GsszCUMfSVvJhgoAQnFzlinzt7nYsXb8Px2q5Ij6h6+YISKRpSH52D726EwCw4LBmnFLExJyxbO+zVBlRHf6dU53B6ZQJ06BQgcEwfSUuoWyEEKI+5G9EpQgBAKgLid6QqGYhKRQsW2Cf5iye4DYLNw1CzraxfNN+AMCfluUjXf6glB2Q2bJqRImcf7I52wtRrEmasbbwgSy8NtTJ6ol7IfceFj4qV/wX3vwMvnzfykAmeFdW1wjyr1U
lTNgCWOcWl9vW2o3JI9O484qTvQqkcfzh31vw2o62wHYpCE6Z4e8ZYGoaATuHmYGiaE8TEY0hoinyr5SDGkrURPR4BfyquvxRqgllAHDh8ROw6IixAJysYMtWm6Hnj3/yjb2Y2JgGkI/yUX0NnmnIzq8i0ykz1jRUTP37ctKbs3D7UxsiBdY/X92Jadc8gJ0HekJfV9F9L8VmVUuNQOZZ6D0EfKYhG1r4aPC7IT/DYogKSZb5AjddegLeM3eSZ2ZSI4TSqQSSLAiYAaKgICCidxDRGwA2AHgCwEYAD5Z4XEOGuNhsI0QQ2JpGcLRiajAN53U5AaiN57t6c5g6yuknK00NezsUQaA4i6VduSZlxpuGBqIXYQn56ePrcO3fXgvkAEh+/29HSwqr56OjO3mL1Yb08E09oUx14lta+GiYQ3dCHwRBFDJfYGx9Na5/z3HeddScgdqUiQTnDTADRME8AgDfAHAKgEeEECcQ0VkA3lfaYR0aJEJMQ50ZyxcJok4WCcNAzs0MBvLJaLYt0JmxMNYNRd3RKgWBahpyJqCclbdTp1OJ2Kgh3VyiIoSItC+/svUAHn59V/SJBwipOUXlOci6SsWs7nXtp1j/iO6sbdUc/fKzTCdNp9aQL3w0XiO47l3HIGsLfPm+lUWNRTJCKy4nL+nXCEzf949hDoZiBEFWCNFCRAYRGUKIx4johyUf2RDiGxcejdljRwS2mz6NwJnU/6VNoKpGYRjOpKYLAjnZjKmvAgDscE0h6upUrnhztpPpmjAIKZNiV75WjEaQsWyfRqLy9p885XseJzRKiXzriunCpieQSeEx4wsPYGJTGl97+1Ghx+nhm7pGIMNHa6sSIeGjwRX56BFV3uNL509BT9byBMGPLzsBG/d24gcPr/EdM3VUja+vgX5eKQfViX/h7DEFu+MxTLEUo1u2ElEdgCUAfktENwEY/oXuFd5/ylTMnx5s9q3GoMtJ60f/WuvbRy1B4eQRCM8hKV/rdCNTRrlJSHJyUssXyGMc05CNqoQBw41CiiJuVRxlnw6jVE7nQrJFCtpicgJ0jcArCyGALfu6ccWvloUeJyd6ORZdEDy73mkuU1tlwgopOgf4J2g9bNRQbvLtx00I7e178bzJ2Hjd+VhwWLM7Fv8bI81/qmnoHcdNwLUXHB16TwzTV4oRBBcA6AbwaQD/ALAOwNtLOahDhTCNQEfVCJw8AoFed6KXr3W6xeniYs+zniBwooaqkyYMihcEcc5ivUZOHMVMxG09WUy75oFIe38Y8rRRfo68aaiIMeoaQchB+qbalOmZhrzy3pogeHz1HlQnDSQMchrThGgEZowg0CN7wvwKctsvLj8JL39tceD1MNMQwwwkcSUmbiai04UQnUIISwiRE0L8SgjxIyFES9RxyvG3E9FuIlqpbf8kEa0ioleJ6HsDcRPlQl25qavCOePyZqQwQdClNZ6XDsk4QSAjhXJu7Hl10oRJVCB81P+imizVJ0FQxEy82TVt3PbUhqLPWwg5OXdncr48izB0Z3GYNqTb/yePrPHeVyNCIwCAmlTCywFRw0ulaS1OEOhm/KoQv4K8z1TCCG0+Y3saAQsCpjTEaQRrAHyfiDYS0feI6IQ+nvuXAM5TN7iO5gsAHCeEOArA9/t4ziGLagL4x9VnerVm1BWgQQRLCM8UJCdmKRhGVCcDE4dE9ijO2gI9OQtVCQNE8Y5UXSNQI2D6axq678VteDTEkSzH0Ze5qljT0OfveQUnf/vR2H0DeQQh78tuRZiMb6jG6bOavYldJpuFCYK0q33pOSKy7HOcINDNPNUhGkFU32VJ3kfAUUJMaYhLKLtJCHEqgDcBaAFwu7uS/yoRHV7oxEKIJQD2aZuvAnCdEKLX3Wd3/4c+tNDVdmnuUDWChOmYF2Sfgo0tnejszWH1rnYATjhoVEhgpysscpaN3qyNKndyivMD6A5UtSdBXzQC9TRX/2FFqL1dDsPoh1NZn7O37u/CtGsewPM
b/F8fdcLUBY4eKhtmGtqpCILrLzoONSmnRITjDHe2yy5i5x41Nn8tA54Zbp8S0pv0wjqjBYHO7HEjMGtMna8WUWFB4GoEbBpiSkTBJYYQYpMQ4rtCiBMAXAbgQgCv9/N6hwNYQETPEdETRHRS1I5EdCURLSOiZXv27Onn5QYPuSqUq0Q5DenO4pwtvJX5Q6/uwtk3POFFldRWFU4SylnC9REYMA3ysoyj9lXpryAopqSzpxH0w3yhZ/NKAaDW7wf8oZ667T3KWayyQ0lMMwznsxLC2Vd3FqvVQ23buV5P1vaFusqEQHUsevE7namjavHIZ97kO3+mQOIf+wiYUlNMQlmCiN5ORL+Fk0i2GsC7+nm9BICRcPISPgfgjxQRlyiEuNXtkzxv9OjR/bzc4CEjiNJa+WA1e1hG+agF5tTJqSZlFmw36NQaypuG1Am9NmVi1pg677muLXT0qKahaAGiU4yzWPTRNPTSllbc8fRGZyxaCGeUVqGGeupfG90fYmlNZABg8758iKZJ5L3XW/d3e4JECgI1R8AWAgbl8zq+/c5jsPG680NNQ2EJiB88bRru+JB/zaNqLKcWqkvEpiGmxMQVnTsHjgbwVgDPA/g9gCuFEAcTOroVwL3C+YU+T0Q2gGYAQ3/JXwA5GaTdCUTOQSlfQhnhuQ37IlfjdVWJgj/2bM5xFo+qc1aUqq2fiHxmCt1c0t5PjaAYZ7HUPoo1DV10yzPeY91fEXWKnkx+v6RBUCs66ULPEiK4TXluGuRN2gu//7i3Xb4vqiCwbAHDIK/RvFzNy+MLfWZfe0cwh0EO5XcfORmnzowXBDabhpgSE5eR8gUAvwPwP0KI/QN0vfsAnAXgMdfPkAIwLAqpe4IgJQWB8+NV+8gaRLETcDplehpE0qTQ8M+c7YaPJkxkLBtZ7XzqZKE7UAfCWRyFNO/o9f2jUE9Z7FhUjUAXOLqd3bJFZGN7wNHOUjETa9qnETj3JR3FzXV+QWAYwK3vn4stIc3po5DvaViUkI68i/74XximGOL6Ebz5YE5MRHcBWAigmYi2AvgqgNvhOJ1XAsgAuFwMkzZaXtkHqRG4230+Aq02kTrBnn3EWFQlTG8ir0klQiNYZEJZddJAzrYD9nU1nyHnOkIlUYLgt89tQjpp4l0nToq9tzikgCtmrtrV1uM7p26mispiVktC69FS+qRvCxHr2zCJYutIqdm9jmkoP6YxI5xSIFJoJwwDi48aF3muMOT4dVNiGPOmjcSSNXs4fJQpGSXLURdCXBbx0iFfp+jwsXVYs6vDt82rSSN/2O68pOcRSBKaIPjcubOd/d2JvDZlhgoCp/qohaqEia6MhV6lnv2oupTP2ZzTEqBk4hrgd9B+6c+Os/pgBIFckRezatVDQXu1mvxRZ1A1At1voWs/OSteI1BNQ2GompxjGnIen33EWEwe6dQTkmfvTzlo+Z4WU676p+89EZtauoral2H6A3uf+sFfPnEGXvqqPwN0jFtjZvGRY33b4wSBiuw45WkEEXVksjmB9p4cRlQnYBCh152A508fid/+58m+a+Qsf7XM3YqTWnfQxmELx/H640ffUM7tn7ylhlHIn9kZUmCu17Jh2wJ/eWm7L4JHp8fXG8D/2ktbWgNjjgutNQ2KdMwnTfKF8ap9CI6f3OBpLHIyL9Ycpo9PXqsQtVUJHDmhYlqAMGWAq1b1g+qkGVidTR1Vi+e/uMgrOhYVPiq36XOUbiKoDTEZNNdVYWdbD7qzFppqU9h+oNszybz/lKmY1FTj8xHIAnWSW55Y5z2WGkExlrmcLbCxpctXLK07a2GEMllmi3QWy6Y8Kr1ZG39avgWfv+cV7O/MoLmuKuRIoFtxFhcat2X7NZlUwvD5Z8wYH0HCMHyC+rAxdZ6gUz93WXL6igXTY8cSxjlHjsNdz29GbUzjI4YZLFgjGEDG1Fd7q8V8oTB/+CjgZBvr5hY5Icj5TW9eDwAzmmu9EMjGmqRv0lXt1ZKcFb0
qluaYsObpOpYtsGKLfwLX6/jLSbaQINivtfEEHB/BVtfRGva6d80YjSA4ZtVYGn4AABaBSURBVNvnQK7RBKtB5BPSAEIbwIxvqMZtl5/kaVA1ysTdkE5i43Xn4+J5k+MHE8K1FxyF5764KPRzZpjBhgVBibj1A/OwaM4YX/RJwhMEps+RSRQsPVwTslKc3lzrPW6qSfkmXekkTug+ggg7+Zpd7djd3uNV14zDsgVe2OQ3vXQFBIEs3uaYjRZ871/4+ys7AucKi4TqVZq4VyfNSCevz0dQQBLo/YX1lXeYj6DJzfZNmoYnQM+aMwYja1OeBpVODcxPJmkaXv8Jhik3vBwpEafPasbps5p926RGUJ30m4bSSTOvSbjbakNaZM4Y7RcEPp+D1AjU8FHLDk0GWzRnDB54ZQfufXFbUfkEli2wYa8/feSjdy7HQ58+U7lW3jTU2Wthy75ufP7ul/HWY8Zr5wperzeXb79ZnTAinbw9rvApxpy1rbXblx+gawRhUUNNtSlsbOlyKo0Kv/1falBpdtgywxDWCAYROanopYjVyUVOcgU1gtqkz6ma9DQC1W5vh66cT5vVjPaeXNFJZbYQ2Lq/y7dN1keSeMXbiLwm8T0h2cthGkFPxvJCSA0jutmO1AiKKUv9q6WbfM9157thBLOAm2qc/ICEQZ4wksI2rxHw2okZfrAgGEQSnkag2auVlb2nEYQ4i9XyEbppqCqkNn53xgr1EYzro0kikxOBuj8qu9t6cP1DqwE4piE52YcmxIUUWNu6vwvd7oq7N2tH+jXygqD4vAZJjfaeO87iCEFgGoHSz71Ky0qGGW6wIBhEVGexb7uyspdznG7KAPwaQWNN0he2KG3gqmmoozcXWoVTbacoiYuF39nWHdvkRgoBIGib11En+ROmNOLr7zgKnRkL63Y7eRk9WSvaR+CahvrTMS3UNJTw37P0ESTMfFVX030/ZdQQCwJmOMKCYBCRE7euEailiz3TkGLKkCWLiQi/+vB8XDJvMqoSpi9mf0S1KwiUCb0rQiNoqgmWNYiLZ9+4tyt0uxQy6hUeXbUbz2/Uq4/nUQVKyjRw9EQnPv61HW0AHHNSlI+gN2fh7uVbcfp1/4o8fxRhWphuGpJN42XvASD/mQ20s5hhhhJs8BxEpPlG1wjOVcoT5J3F+Y/mic+e5dXSf9Pho/Gmw51qrKppSO6vJkJ19OZCnbNhfXOTMZlgatVOAJjYmMa21m5kbRtVhhnIBP7vu170jeFAdxY5y8bUUbU+05BlC8xorvMd25MN92sAzqr8s396KXKcceiCzqSgaaha+XxkRM+kphoAeU2NfQTMcIS/1YPIaDdRSq2l859nTMfHFs7ynnt5BIopo6EmiYaQVbxfEDj7q6aL9p4clq4Lhoc2htTMjwv/39TS6Y2pM2N5yW8/e2I9PvnmWdEHAnjHj5/Cejfi6InPLfSVgjj7yLGo0aKjerIWshGmoULO7QWHNePJN8JrGOo5A6YZ1Aik1tBcV4V3nzgRjekkFh0xxrcPm4aY4QgLgkFkbIOzyjzQncVDV5+Jl7a04uKTwpORwqKGdKQVKGUaXv9caSKSfPn+VwPHJUwDDemkr5ZRXDmGTS1dGD2iCkIIdGYsz95+w8NrMG9qk6/hjc56Jex0b0cGOctGwiA8/6WzPRMVUV4A9mTtyNyHsPIUkvnTR+K6dx8baTbSJ30zJKFM7jOqNgUiwtlauRCABQEzPGGD5yAio3VauzKYPW5EqBCQoZdheQQ60vms7juiiLLGQNBPIAVBmHO5pTODSU1pyHJwqr29M2P5+vjGkTAcJ2zCJIx0J1siQnUif77eXLhfAwC2t/ob2P/f+Ufgv940E4AzsU9sTGPjdeeHHhvQCAwKmIukYJS9HsLQE/8YZjjA3+pBRNqdw6qKSvJRQ8VoBFIQ5PfVNYIo9PNL232UWWbaqFrPfKRG4Fi2HVsWQsU0CFnLDvgj1Mm1J2v7ooZGKPe2q90
vCFIJA9Oba7xzx6ELAiMkoUzex6iIWkdAdIlshjmUYdPQICI1gv1dhQWBXOVfNn9K5L5y7qtTBUGRtWv0la0tgC37uny9dFUuOH4Cnlnn2N9V80jWEtjXGX0/OjlLBDptqZO0rhG8afZoHDWhAfev2IZVO/1JbETkCcNC87PuGDYNgurmvueq0zCpKY0Neztx2UnB93zm6Fqs23MwzfkYZujCgmAQqU8n0FSTxGcWz47cR4aPViVMrPz6ubE2aWka8gmCIk1DYbXtF3zvMbz0lcUhewNnHjYahGBfZtkf4YozpmPFltbQ6qKSrOWs9hNmcHUu6claPh9ByjRw1cKZeHZ9S0AQGJTXBPx9g4Pd3fTVv0H+1f3cqU0AgJ9/YF7o2P/88dNxIEaAM8yhDJuGBhEiwotfWYz3nzI1ch85fSUMQl1VItbk0RfT0G//82Tfcz2EVdKVDXfIGgZ5GohqGurKWMjkbKQSRmjWsErOFshawtc8B8hrQYA0Dfl7CwNB0w7g3L98XRUm/xGiRemCQBUCY+ujTUGS+uokJo+sKbgfwxyKsEYwxJCTYjFdr2SyU10RgkA3BUV1u9qjNK/RkZOnqqV09OSQsWykTCOwCjfIXxcoa9nIWbaXrRuGnlkstQfdtAM4958XBPntX37bkTj7yLF4/y+ez58n4v3840dPxbRRPMEzlQ1rBIcwclGrloyQpqGUaeATZ83CKNfmr/cJiBIE24powK4mVbW6ju8qt4eyiu6QtmyBrC1ik9d6cpYvoSyhaQS1KdOL9iHKC0P1/hKmgQWHjfadNyraZ/70kRjD5aCZCocFwRBDOocbQxLIdNp7HDPORLdTFpDXCDKWjc+eOxuzx40AALT1+E0+URPjG7s7AtvkvnL+VjWC/W7oaEqp4Z8/zi9snD7CdsBZLBlZm0J3xvJpFnJfqRE01qS8nAmDyNM44jSoD5w6FZeEOIAZhnFgQTDE+O9Fs7D2W28pKnx0xwFn9T5BEQRy8v3Q6dMAAOcf6/QDGKMVmqtKhGsEa7Ty0gBwnlsCQzqLVR/Bv1btds8X7COgF3pzTEPCVyobyOdONNel0N6Ti9UIGtJJz79hGPlKpHGd0f7nnNmhPgaGYRzYRzDEIKLIFbPOjgNOXP34Rr9pY8N33urZ89978lScfcRYjK2vxjcuPBoz3eY2URPjG7v8GsE/rl7g1QOSi241ami361OoSpiBGkG6IMhJ05AZ7ixurqvCml0dvjaYh411NBrp7K1PJ9Da5QoCyjeQMWI0gjifBMMwLAgOaeTKWDUNAcGkJ5nIpkYrRa2g1+xu9zl554yrD5w3LKQ1lTB8PYIBv8AA8s5iPXxUIpvWt3bnE9TefuwE7/wAUFeVRFUybxqS14yb66McxQzDOLAgOIT58WUnYun6vf3qfRsxF0MIJ7N2b0cwekhOp2GO5lBBoO1n2cIpMaGHj7r/PUHQlcXhY+tw91WnecJECoL66kTeNKT4CMIEm+l2OysmAothKhk2nB7CjGuoxjtPmNSvY+XEOao2hWveMsf3WnNErR0514aZrsKcxbogkM5iPabfu+4I57qtXVlUJUzUK8lxKfeaVUnTEwoG5WsjhZV+uP/jp+NjC2d6gmf5/50del2GqXRYI6hQDM+HMAWHj/X3BJgxujaQxQvkJ9uw8M+qZNBZrJuGnt3Qghc2t2LhbH9op0Q1DY1r8Gs5cvKvShieRkCKjyBMthw9sQFHT2zwno+qq8I/rl6ArfsKh8gyTCXBgqBCkeYSWyAQxXPMxEaMb0jj2EkNvu1GAY3gzivm46JblnrbdI3g3he2OcdHmGpkz+CerB1wKEurU1XS8ExTplFc+KjKnHH1Pr8HwzAlNA0R0e1EtJuIVirbvkZE24hohfv31lJdn4lHTpyWCNrQZ42pw5ffdiQuOH6ib/sNFx+Ps48YgzluboJKKmFg3rSRuOHi47xtYX2XAaCz1/I9v3LBDADAeEUL0E09vTnnmKq
E6WkEQghMccs+HDmeJ3eG6S+l1Ah+CeAnAH6tbb9RCPH9El6XKQI5z9pKs/kTpzTiPfMmY9GcMaHHHD2xAbddfpI3KavIvAQ1LLXaFQSytaVEL8P9kTNn4CNnzsCutnyZ6ec3+Psey+bxjmnIOW/GsrH4sGb87ZNn4KgJLAgYpr+UTCMQQiwBEN3FnCkrsjSDLYRXGqI+ncRl86fExuQD4T4CKQDUmkDSNLT4KH+nr7ae8CqedTEltHuzeUEgryW3HT2xgfsEMMxBUA4fwSeI6AMAlgH4HyFEdN1ipmRMb651/9d5JR10X0EUYYJCmmtUjUCu3HWfQFRjnpqUiUvmTUZTbQpv1rSSvGko7yzOFKh2yjBMcQx2+OhPAcwEcDyAHQB+ELUjEV1JRMuIaNmePXsGa3wVw+KjxuGeq07FZfMne6Yh3UEbx+fOnY37P36691xOzmpJbLlIdxrI5I9t7wkvdU1E+O5Fx+Kat8zB/Okjfa9dccZ0TG+uxVuOGe85i3uzQRMVwzB9Z1A1AiHELvmYiH4O4G8x+94K4FYAmDdvXnRndabfzJ3qTLanzRyFiY1pfPysWUUfq+8rNQG1F7Kc+4UQSJgGMrn+r+BnjK7DY59dCCAvdHoO4nwMw+QZVEFAROOFEDvcp+8EsDJuf2ZwaKxJ4elr3nxQ55CCoLEmn4wmNQIhgKRBkIUjZEG8/nLCFKebmJ7/wDBM/yiZICCiuwAsBNBMRFsBfBXAQiI6Hk5VgY0APlqq6zODi1cmOq1qBPlcBRmi+pEF0/Gl8488qGudd/Q4LPncWZjCDWUYZkAomSAQQlwWsvkXpboeU15kIbmognJSEBRTXrsYWAgwzMDBmcXMQXHbB+bhry9vD33NMw1BeJFJtVXhSWYMw5QPFgTMQXH2kWNx9pFjQ1+Tsf1CwHMUD5RGwDDMwMHVR5mSoQajyph/1ggYZujByzNmwHnfKVPw2vY2JWooH/3bn94JDMOUFhYEzIDzzQuPAQD8a5WTNjJrTD7MU7a9ZBhm6MCmIaZkvHnOWNxz1al4n9Iic2x9VRlHxDBMGKwRMCVFZi9LuDgcwww9WBAwg8KNlxwHs8iidgzDDC4sCJhBob+9lRmGKT28RGMYhqlwWBAwDMNUOCwIGIZhKhwWBAzDMBUOCwKGYZgKhwUBwzBMhcOCgGEYpsJhQcAwDFPhsCBgGIapcFgQMAzDVDgsCBiGYSocFgQMwzAVDgsChmGYCocFAcMwTIXDgoBhGKbCYUHAMAxT4bAgYBiGqXBYEDAMw1Q4LAgYhmEqnJIJAiK6nYh2E9HKkNf+h4gEETWX6voMwzBMcZRSI/glgPP0jUQ0GcBiAJtLeG2GYRimSEomCIQQSwDsC3npRgD/C0CU6toMwzBM8SQG82JEdAGAbUKIl4io0L5XArjSfdpBRKv7edlmAHv7eexwgO+f75/vv3KZWsxOgyYIiKgGwBfhmIUKIoS4FcCtA3DdZUKIeQd7nkMVvn++f77/yr3/YhnMqKGZAKYDeImINgKYBOAFIho3iGNgGIZhNAZNIxBCvAJgjHzuCoN5QohKVtsYhmHKTinDR+8CsBTAbCLaSkRXlOpaBTho89IhDt9/ZcP3zxSEhODgHYZhmEqGM4sZhmEqHBYEDMMwFc6wFQREdB4RrSaitUR0TbnHM1AQ0WQieoyIXiOiV4noU+72kUT0MBG94f5vcrcTEf3IfR9eJqITlXNd7u7/BhFdXq576g9EZBLRi0T0N/f5dCJ6zr3PPxBRyt1e5T5f674+TTnHF9ztq4no3PLcSd8hokYiupuIVhHR60R0agV+/p92v/8rieguIqqupO/AgCOEGHZ/AEwA6wDMAJAC8BKAI8s9rgG6t/EATnQfjwCwBsCRAL4H4Bp3+zUAvus+fiuABwEQgFMAPOduHwlgvfu/yX3cVO7768P78BkAvwPwN/f5HwFc6j6+BcBV7uOPAbjFfXwpgD+4j490vxdVcMKa1wE
wy31fRd77rwD8p/s4BaCxkj5/ABMBbACQVj77D1bSd2Cg/4arRjAfwFohxHohRAbA7wFcUOYxDQhCiB1CiBfcx+0AXofzw7gAzgQB9/+F7uMLAPxaODwLoJGIxgM4F8DDQoh9Qoj9AB5GSG2ooQgRTQJwPoDb3OcE4M0A7nZ30e9fvi93A1jk7n8BgN8LIXqFEBsArIXzvRnSEFEDgDMB/AIAhBAZIUQrKujzd0kASBNRAkANgB2okO9AKRiugmAigC3K863utmGFq+KeAOA5AGOFEDvcl3YCGOs+jnovDuX36Idw6lXZ7vNRAFqFEDn3uXov3n26rx9w9z9U7386gD0A7nBNY7cRUS0q6PMXQmwD8H04hSt3wPlMl6NyvgMDznAVBMMeIqoDcA+Aq4UQbeprwtF7h2VcMBG9DcBuIcTyco+lTCQAnAjgp0KIEwB0wjEFeQznzx8AXP/HBXCE4gQAtTi0tJkhx3AVBNsATFaeT3K3DQuIKAlHCPxWCHGvu3mXq/LD/b/b3R71Xhyq79HpAN7hZqb/Ho454CY4Jg+ZKa/ei3ef7usNAFpw6N7/VgBbhRDPuc/vhiMYKuXzB4CzAWwQQuwRQmQB3Avne1Ep34EBZ7gKgn8DOMyNIkjBcRD9pcxjGhBc2+YvALwuhLhBeekvAGTkx+UA7le2f8CNHjkFwAHXhPAQgMVE1OSusBa724Y0QogvCCEmCSGmwflc/yWEeC+AxwBc5O6m3798Xy5y9xfu9kvdiJLpAA4D8Pwg3Ua/EULsBLCFiGa7mxYBeA0V8vm7bAZwChHVuL8H+R5UxHegJJTbW12qPzjREmvgRAJ8qdzjGcD7OgOO2v8ygBXu31vh2DwfBfAGgEcAjHT3JwA3u+/DK3DqO8lzfRiOg2wtgA+V+9768V4sRD5qaAacH/FaAH8CUOVur3afr3Vfn6Ec/yX3fVkN4C3lvp8+3PfxAJa534H74ET9VNTnD+DrAFYBWAngTjiRPxXzHRjoPy4xwTAMU+EMV9MQwzAMUyQsCBiGYSocFgQMwzAVDgsChmGYCocFAcMwTIXDgoBhFIjoS25Vy5eJaAURnUxEVxNRTbnHxjClgsNHGcaFiE4FcAOAhUKIXiJqhlPd8xlwf21mGMMaAcPkGQ9grxCiFwDcif8iOPVsHiOixwCAiBYT0VIieoGI/uTWfQIRbSSi7xHRK0T0PBHNcre/x62b/xIRLSnPrTFMNKwRMIyLO6E/Baes8SNw6tY/4dY1mieE2OtqCffCyULtJKLPw8lgvdbd7+dCiG8R0QcAXCyEeBsRvQLgPCHENiJqFE7ZaIYZMrBGwDAuQogOAHMBXAmn1PMfiOiD2m6nwGlo8jQRrYBTw2aq8vpdyv9T3cdPA/glEX0ETtMkhhlSJArvwjCVgxDCAvA4gMfdlbzewpHgNHS5LOoU+mMhxH8R0clwmuksJ6K5QoiWgR05w/Qf1ggYxoWIZhPRYcqm4wFsAtAOpy0oADwL4HTF/v//27ljE4RjIIrD71UWWugMgns4hbiAU4gLWDqEK7iBWOoQ7iAIZ5EDUyqI/+J+X5MmB0n1uARubHvR1ay69Zx75hFxiYidWqfRjz4GBkdHALxNJB1sTyU91aZVbiStJZ1s3yNimc9FR9ujrNuqTbqVpJntm6RH1knSPgPGahNCr3+5DfAhPouBH+k/lYc+C/ANnoYAoDg6AgAojo4AAIojCACgOIIAAIojCACgOIIAAIp7ASk4f2RpzNUVAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "plt.plot(average_mae_history[:,0],average_mae_history[:,1])\n", + "plt.xlabel('Steps')\n", + "plt.ylabel('Validation MAE')\n", + "plt.ylim((14, 20))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The plot above shows how the validation MAE evolves over training steps." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "According to this plot, it seems that validation MAE stops improving significantly after 150 epochs. Past that point, we start overfitting.\n", + "\n", + "Once we are done tuning other parameters of our model (besides the number of epochs, we could also adjust the size of the hidden layers), we \n", + "can train a final \"production\" model on all of the training data, with the best parameters, then look at its performance on the test data:" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "# Get a fresh, compiled model.\n", + "model = build_model()\n", + "# Train it on the entirety of the data.\n", + "model.fit(train_data, train_targets,\n", + " nb_epoch=150, batch_size=16)\n", + "test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.7991065979003906" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_mae_score" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, 
+ "source": [ + "We are still off by about \\$1,800." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Wrapping up\n", + "\n", + "\n", + "Here's what you should take away from this example:\n", + "\n", + "* Regression is done using different loss functions from classification; Mean Squared Error (MSE) is a commonly used loss function for \n", + "regression.\n", + "* Similarly, evaluation metrics to be used for regression differ from those used for classification; naturally the concept of \"accuracy\" \n", + "does not apply to regression. A common regression metric is Mean Absolute Error (MAE).\n", + "* When features in the input data have values in different ranges, each feature should be scaled independently as a preprocessing step.\n", + "* When there is little data available, using K-Fold validation is a great way to reliably evaluate a model.\n", + "* When little training data is available, it is preferable to use a small network with very few hidden layers (typically only one or two), \n", + "in order to avoid severe overfitting.\n", + "\n", + "This example concludes our series of three introductory practical examples. You are now able to handle common types of problems with vector data input:\n", + "\n", + "* Binary (2-class) classification.\n", + "* Multi-class, single-label classification.\n", + "* Scalar regression.\n", + "\n", + "In the next chapter, you will acquire a more formal understanding of some of the concepts you have encountered in these first examples, \n", + "such as data preprocessing, model evaluation, and overfitting."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 024733581dd9aed443b9131b9c5993d99e7250bd Mon Sep 17 00:00:00 2001 From: Jiaming Song Date: Thu, 21 Mar 2019 08:57:35 +0800 Subject: [PATCH 2/3] Add files via upload --- keras/3.7-predicting-house-prices.ipynb | 797 ++++++++++++++++++++++++ 1 file changed, 797 insertions(+) create mode 100644 keras/3.7-predicting-house-prices.ipynb diff --git a/keras/3.7-predicting-house-prices.ipynb b/keras/3.7-predicting-house-prices.ipynb new file mode 100644 index 0000000..0f37622 --- /dev/null +++ b/keras/3.7-predicting-house-prices.ipynb @@ -0,0 +1,797 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First of all, set environment variables and initialize spark context:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "env: SPARK_DRIVER_MEMORY=8g\n", + "env: PYSPARK_PYTHON=/usr/bin/python3.5\n", + "env: PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n" + ] + } + ], + "source": [ + "%env SPARK_DRIVER_MEMORY=8g\n", + "%env PYSPARK_PYTHON=/usr/bin/python3.5\n", + "%env PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n", + "\n", + "from zoo.common.nncontext import *\n", + "sc = init_nncontext(init_spark_conf().setMaster(\"local[4]\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Predicting house prices: a regression example\n", + "\n", + "\n", + "----\n", + "\n", + "\n", + "In our two previous examples, we were considering classification problems, where the goal was to predict a single discrete 
label of an \n", + "input data point. Another common type of machine learning problem is \"regression\", which consists of predicting a continuous value instead \n", + "of a discrete label. For instance, predicting the temperature tomorrow, given meteorological data, or predicting the time that a \n", + "software project will take to complete, given its specifications.\n", + "\n", + "Do not mix up \"regression\" with the algorithm \"logistic regression\": confusingly, \"logistic regression\" is not a regression algorithm, \n", + "it is a classification algorithm." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Boston Housing Price dataset\n", + "\n", + "\n", + "We will be attempting to predict the median price of homes in a given Boston suburb in the mid-1970s, given a few data points about the \n", + "suburb at the time, such as the crime rate, the local property tax rate, etc.\n", + "\n", + "The dataset we will be using has another interesting difference from our two previous examples: it has very few data points, only 506 in \n", + "total, split between 404 training samples and 102 test samples, and each \"feature\" in the input data (e.g. the crime rate is a feature) has \n", + "a different scale. 
For instance, some values are proportions, which take values between 0 and 1; others take values between 1 and 12, \n", + "others between 0 and 100...\n", + "\n", + "Let's take a look at the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from zoo.pipeline.api.keras.datasets import boston_housing\n", + "(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(404, 13)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(102, 13)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, we have 404 training samples and 102 test samples. The data comprises 13 features. The 13 features in the input data are as \n", + "follows:\n", + "\n", + "1. Per capita crime rate.\n", + "2. Proportion of residential land zoned for lots over 25,000 square feet.\n", + "3. Proportion of non-retail business acres per town.\n", + "4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).\n", + "5. Nitric oxides concentration (parts per 10 million).\n", + "6. Average number of rooms per dwelling.\n", + "7. Proportion of owner-occupied units built prior to 1940.\n", + "8. Weighted distances to five Boston employment centres.\n", + "9. Index of accessibility to radial highways.\n", + "10. Full-value property-tax rate per $10,000.\n", + "11. Pupil-teacher ratio by town.\n", + "12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.\n", + "13. 
% lower status of the population.\n", + "\n", + "The targets are the median values of owner-occupied homes, in thousands of dollars:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([22.6, 50. , 23. , 8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6, 8.8,\n", + " 17.2, 14.9, 10.5, 50. , 29. , 23. , 33.3, 29.4, 21. , 23.8, 19.1,\n", + " 20.4, 29.1, 19.3, 23.1, 19.6, 19.4, 38.7, 18.7, 14.6, 20. , 20.5,\n", + " 20.1, 23.6, 16.8, 5.6, 50. , 14.5, 13.3, 23.9, 20. , 19.8, 13.8,\n", + " 16.5, 21.6, 20.3, 17. , 11.8, 27.5, 15.6, 23.1, 24.3, 42.8, 15.6,\n", + " 21.7, 17.1, 17.2, 15. , 21.7, 18.6, 21. , 33.1, 31.5, 20.1, 29.8,\n", + " 15.2, 15. , 27.5, 22.6, 20. , 21.4, 23.5, 31.2, 23.7, 7.4, 48.3,\n", + " 24.4, 22.6, 18.3, 23.3, 17.1, 27.9, 44.8, 50. , 23. , 21.4, 10.2,\n", + " 23.3, 23.2, 18.9, 13.4, 21.9, 24.8, 11.9, 24.3, 13.8, 24.7, 14.1,\n", + " 18.7, 28.1, 19.8, 26.7, 21.7, 22. , 22.9, 10.4, 21.9, 20.6, 26.4,\n", + " 41.3, 17.2, 27.1, 20.4, 16.5, 24.4, 8.4, 23. , 9.7, 50. , 30.5,\n", + " 12.3, 19.4, 21.2, 20.3, 18.8, 33.4, 18.5, 19.6, 33.2, 13.1, 7.5,\n", + " 13.6, 17.4, 8.4, 35.4, 24. , 13.4, 26.2, 7.2, 13.1, 24.5, 37.2,\n", + " 25. , 24.1, 16.6, 32.9, 36.2, 11. , 7.2, 22.8, 28.7, 14.4, 24.4,\n", + " 18.1, 22.5, 20.5, 15.2, 17.4, 13.6, 8.7, 18.2, 35.4, 31.7, 33. ,\n", + " 22.2, 20.4, 23.9, 25. , 12.7, 29.1, 12. , 17.7, 27. , 20.6, 10.2,\n", + " 17.5, 19.7, 29.8, 20.5, 14.9, 10.9, 19.5, 22.7, 19.5, 24.6, 25. ,\n", + " 24.5, 50. , 14.3, 11.8, 31. , 28.7, 16.2, 43.5, 25. , 22. , 19.9,\n", + " 22.1, 46. , 22.9, 20.2, 43.1, 34.6, 13.8, 24.3, 21.5, 24.4, 21.2,\n", + " 23.8, 26.6, 25.1, 9.6, 19.4, 19.4, 9.5, 14. , 26.5, 13.8, 34.7,\n", + " 16.3, 21.7, 17.5, 15.6, 20.9, 21.7, 12.7, 18.5, 23.7, 19.3, 12.7,\n", + " 21.6, 23.2, 29.6, 21.2, 23.8, 17.1, 22. , 36.5, 18.8, 21.9, 23.1,\n", + " 20.2, 17.4, 37. , 24.1, 36.2, 15.7, 32.2, 13.5, 17.9, 13.3, 11.7,\n", + " 41.7, 18.4, 13.1, 25. , 21.2, 16. 
, 34.9, 25.2, 24.8, 21.5, 23.4,\n", + " 18.9, 10.8, 21. , 27.5, 17.5, 13.5, 28.7, 14.8, 19.1, 28.6, 13.1,\n", + " 19. , 11.3, 13.3, 22.4, 20.1, 18.2, 22.9, 20.6, 25. , 12.8, 34.9,\n", + " 23.7, 50. , 29. , 30.1, 22. , 15.6, 23.3, 30.1, 14.3, 22.8, 50. ,\n", + " 20.8, 6.3, 34.9, 32.4, 19.9, 20.3, 17.8, 23.1, 20.4, 23.2, 7. ,\n", + " 16.8, 46.7, 50. , 22.9, 23.9, 21.4, 21.7, 15.4, 15.3, 23.1, 23.9,\n", + " 19.4, 11.9, 17.8, 31.5, 33.8, 20.8, 19.8, 22.4, 5. , 24.5, 19.4,\n", + " 15.1, 18.2, 19.3, 27.1, 20.7, 37.6, 11.7, 33.4, 30.1, 21.4, 45.4,\n", + " 20.1, 20.8, 26.4, 10.4, 21.8, 32. , 21.7, 18.4, 37.9, 17.8, 28. ,\n", + " 28.2, 36. , 18.9, 15. , 22.5, 30.7, 20. , 19.1, 23.3, 26.6, 21.1,\n", + " 19.7, 20. , 12.1, 7.2, 14.2, 17.3, 27.5, 22.2, 10.9, 19.2, 32. ,\n", + " 14.5, 24.7, 12.6, 24. , 24.1, 50. , 16.1, 43.8, 26.6, 36.1, 21.8,\n", + " 29.9, 50. , 44. , 20.6, 19.6, 28.4, 19.1, 22.3, 20.9, 28.4, 14.4,\n", + " 32.7, 13.8, 8.5, 22.5, 35.1, 31.6, 17.8, 15.6])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_targets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The prices are typically between \\$10,000 and \\$50,000. If that sounds cheap, remember this was the mid-1970s, and these prices are not \n", + "inflation-adjusted." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparing the data\n", + "\n", + "\n", + "It would be problematic to feed into a neural network values that all take wildly different ranges. The network might be able to \n", + "automatically adapt to such heterogeneous data, but it would definitely make learning more difficult. 
A widespread best practice to deal \n", + "with such data is to do feature-wise normalization: for each feature in the input data (a column in the input data matrix), we \n", + "will subtract the mean of the feature and divide by the standard deviation, so that the feature will be centered around 0 and will have a \n", + "unit standard deviation. This is easily done in NumPy:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# Feature-wise statistics are computed on the training data only\n", + "mean = train_data.mean(axis=0)\n", + "train_data -= mean\n", + "std = train_data.std(axis=0)\n", + "train_data /= std\n", + "\n", + "# The test data is normalized with the training statistics\n", + "test_data -= mean\n", + "test_data /= std" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the quantities that we use for normalizing the test data have been computed using the training data. We should never use any quantity \n", + "computed on the test data in our workflow, even for something as simple as data normalization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building our network\n", + "\n", + "\n", + "Because so few samples are available, we will be using a very small network with two \n", + "hidden layers, each with 64 units. In general, the less training data you have, the worse overfitting will be, and using \n", + "a small network is one way to mitigate overfitting."
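+ ,"\n",
+ "\n",
+ "To put a number on how small this is, here is a back-of-the-envelope parameter count. It assumes that each `Dense` layer has a bias vector, which is the default in Keras-style APIs. With 13 input features, two hidden layers of 64 units and a single output unit, the network has\n",
+ "\n",
+ "$$(13 \\times 64 + 64) + (64 \\times 64 + 64) + (64 \\times 1 + 1) = 896 + 4160 + 65 = 5121$$\n",
+ "\n",
+ "trainable parameters, or about 12.7 parameters per training sample ($5121 / 404$)."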
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from zoo.pipeline.api.keras import models\n", + "from zoo.pipeline.api.keras import layers\n", + "\n", + "def build_model():\n", + " # Because we will need to instantiate\n", + " # the same model multiple times,\n", + " # we use a function to construct it.\n", + " model = models.Sequential()\n", + " model.add(layers.Dense(64, activation='relu',\n", + " input_shape=(train_data.shape[1],)))\n", + " model.add(layers.Dense(64, activation='relu'))\n", + " model.add(layers.Dense(1))\n", + " model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])\n", + " return model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our network ends with a single unit and no activation (i.e. it will be a linear layer). \n", + "This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). \n", + "Applying an activation function would constrain the range that the output can take; for instance, if \n", + "we applied a `sigmoid` activation function to our last layer, the network could only learn to predict values between 0 and 1. Here, because \n", + "the last layer is purely linear, the network is free to learn to predict values in any range.\n", + "\n", + "Note that we are compiling the network with the `mse` loss function -- Mean Squared Error, the mean of the squared differences between the \n", + "predictions and the targets, a widely used loss function for regression problems.\n", + "\n", + "We are also monitoring a new metric during training: `mae`. This stands for Mean Absolute Error. It is the mean of the absolute \n", + "differences between the predictions and the targets. For instance, an MAE of 0.5 on this problem would mean that our predictions are off by \n", + "\\$500 on average."
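+ ,"\n",
+ "\n",
+ "In symbols, using the standard definitions (we assume the `mse` and `mae` identifiers follow them), for $n$ samples with targets $y_i$ and predictions $\\hat{y}_i$:\n",
+ "\n",
+ "$$\\mathrm{MSE} = \\frac{1}{n}\\sum_{i=1}^{n} (\\hat{y}_i - y_i)^2, \\qquad \\mathrm{MAE} = \\frac{1}{n}\\sum_{i=1}^{n} \\lvert \\hat{y}_i - y_i \\rvert$$\n",
+ "\n",
+ "As a worked example with made-up errors: suppose three predictions are off by 0.2, 0.4 and 0.9 (in thousands of dollars). The MAE is $(0.2 + 0.4 + 0.9)/3 = 0.5$, i.e. \\$500. The MSE is $(0.04 + 0.16 + 0.81)/3 \\approx 0.34$, and because of the squaring, the largest error accounts for most of it."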
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Validating our approach using K-fold validation\n", + "\n", + "\n", + "To evaluate our network while we keep adjusting its parameters (such as the number of epochs used for training), we could simply split the \n", + "data into a training set and a validation set, as we were doing in our previous examples. However, because we have so few data points, the \n", + "validation set would end up being very small (e.g. about 100 examples). A consequence is that our validation scores may change a lot \n", + "depending on _which_ data points we choose to use for validation and which we choose for training, i.e. the validation scores may have a \n", + "high _variance_ with regard to the validation split. This would prevent us from reliably evaluating our model.\n", + "\n", + "The best practice in such situations is to use K-fold cross-validation. It consists of splitting the available data into K partitions \n", + "(typically K=4 or 5), then instantiating K identical models, and training each one on K-1 partitions while evaluating on the remaining \n", + "partition. The validation score for the model used would then be the average of the K validation scores obtained." 
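+ ,"\n",
+ "\n",
+ "For our data, with $K = 4$ as in the loop below, the arithmetic is a simple sanity check. Each fold holds $404 / 4 = 101$ validation samples, the remaining $404 - 101 = 303$ samples are used for training, and the final score is the mean of the four per-fold scores $s_1, \\dots, s_4$:\n",
+ "\n",
+ "$$\\bar{s} = \\frac{1}{K} \\sum_{k=1}^{K} s_k$$\n",
+ "\n",
+ "With `batch_size=16`, one epoch over 303 training samples takes $\\lceil 303 / 16 \\rceil = 19$ minibatches, assuming the final partial batch counts as one step."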
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then let's start our training:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processing fold # 0\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 1\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 2\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 3\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "k = 4\n", + "num_val_samples = len(train_data) // k\n", + "num_nb_epoch = 50\n", + "all_scores = []\n", + "for i in range(k):\n", + " print('processing fold #', i)\n", + " # Prepare the validation data: data from partition #i\n", + " val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", + " val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", + "\n", + " # Prepare the training data: data from all other partitions\n", + " partial_train_data = 
np.concatenate(\n", + " [train_data[:i * num_val_samples],\n", + " train_data[(i + 1) * num_val_samples:]],\n", + " axis=0)\n", + " partial_train_targets = np.concatenate(\n", + " [train_targets[:i * num_val_samples],\n", + " train_targets[(i + 1) * num_val_samples:]],\n", + " axis=0)\n", + "\n", + " # Build the model (already compiled)\n", + " model = build_model()\n", + " # Train the model with minibatches of 16 samples\n", + " #model.fit(partial_train_data, partial_train_targets,\n", + " # nb_epoch=num_nb_epoch, batch_size=1, verbose=0)\n", + " model.fit(partial_train_data, partial_train_targets,\n", + " nb_epoch=num_nb_epoch, batch_size=16)\n", + "\n", + " # Evaluate the model on the validation data\n", + " #val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)\n", + " val_mse, val_mae = model.evaluate(val_data, val_targets)\n", + " all_scores.append(val_mae)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sample lines from the training log:\n", + "\n", + "_INFO - Trained 16 records in 0.011235845 seconds. Throughput is 1424.0139 records/second. Loss is 8.708786._\n", + "\n", + "_INFO - Trained 16 records in 0.009535034 seconds. Throughput is 1678.0223 records/second. Loss is 5.3613434._\n", + "\n", + "_INFO - Trained 16 records in 0.008636178 seconds. Throughput is 1852.6713 records/second. Loss is 18.106756._\n", + "\n", + "_INFO - Trained 16 records in 0.009207628 seconds. Throughput is 1737.6897 records/second. 
Loss is 7.0931993._" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3.291872501373291, 2.496018171310425, 2.221175193786621, 2.6994853019714355]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.677137792110443" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(all_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, the different runs do indeed show rather different validation scores, from about 2.2 to 3.3. Their average (about 2.7) is a much more \n", + "reliable metric than any single one of these scores -- that's the entire point of K-fold cross-validation. In this case, we are off by about \\$2,700 on \n", + "average, which is still significant considering that the prices range from \\$10,000 to \\$50,000. \n", + "\n", + "Let's try training the network for a bit longer: 500 epochs. 
To keep a record of how well the model did at each epoch, we will modify our training loop \n", + "to save the per-epoch validation score log:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processing fold # 0\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 1\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 2\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n", + "processing fold # 3\n", + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "num_epochs = 500\n", + "all_mae_histories = []\n", + "for i in range(k):\n", + " print('processing fold #', i)\n", + " # Prepare the validation data: data from partition #i\n", + " val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", + " val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", + "\n", + " # Prepare the training data: data from all other partitions\n", + " partial_train_data = np.concatenate(\n", + " [train_data[:i * 
num_val_samples],\n", + " train_data[(i + 1) * num_val_samples:]],\n", + " axis=0)\n", + " partial_train_targets = np.concatenate(\n", + " [train_targets[:i * num_val_samples],\n", + " train_targets[(i + 1) * num_val_samples:]],\n", + " axis=0)\n", + "\n", + " # Build the model (already compiled)\n", + " model = build_model()\n", + " # Train the model, validating on the held-out fold after each epoch.\n", + " # set_tensorboard() enables the summaries that get_validation_summary() reads back.\n", + " import time\n", + " dir_name = '3-7 ' + str(time.ctime())\n", + " model.set_tensorboard('./', dir_name)\n", + " history = model.fit(partial_train_data, partial_train_targets,\n", + " validation_data=(val_data, val_targets),\n", + " nb_epoch=num_epochs, batch_size=16)\n", + " \n", + " #mae_history = history.history['val_mean_absolute_error']\n", + " # \"Loss\" is the validation loss, i.e. the MSE here (not the MAE);\n", + " # each row is [step, value, timestamp].\n", + " mae_history = model.get_validation_summary(\"Loss\")\n", + " all_mae_histories.append(mae_history)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then compute the average of the per-epoch validation scores for all folds:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[1.90000000e+01, 4.05375427e+02, 1.55307042e+09],\n", + " [3.80000000e+01, 2.64351837e+02, 1.55307042e+09],\n", + " [5.70000000e+01, 1.50977859e+02, 1.55307042e+09],\n", + " ...,\n", + " [9.46200000e+03, 2.07635689e+01, 1.55307053e+09],\n", + " [9.48100000e+03, 2.02473850e+01, 1.55307053e+09],\n", + " [9.50000000e+03, 2.02105141e+01, 1.55307053e+09]],\n", + "\n", + " [[1.90000000e+01, 4.76980957e+02, 1.55307053e+09],\n", + " [3.80000000e+01, 3.29584198e+02, 1.55307053e+09],\n", + " [5.70000000e+01, 1.80655548e+02, 1.55307053e+09],\n", + " ...,\n", + " [9.46200000e+03, 
1.73588219e+01, 1.55307064e+09],\n", + " [9.48100000e+03, 1.78555279e+01, 1.55307064e+09],\n", + " [9.50000000e+03, 1.73744106e+01, 1.55307064e+09]],\n", + "\n", + " [[1.90000000e+01, 4.62182434e+02, 1.55307064e+09],\n", + " [3.80000000e+01, 3.34037567e+02, 1.55307064e+09],\n", + " 
[5.70000000e+01, 2.06141006e+02, 1.55307064e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.72124062e+01, 1.55307075e+09],\n", + " [9.48100000e+03, 1.75751667e+01, 1.55307075e+09],\n", + " [9.50000000e+03, 1.74055386e+01, 1.55307075e+09]],\n", + "\n", + " [[1.90000000e+01, 5.21177673e+02, 1.55307075e+09],\n", + " [3.80000000e+01, 3.99685974e+02, 1.55307075e+09],\n", + " [5.70000000e+01, 2.67611786e+02, 1.55307075e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.75390892e+01, 1.55307085e+09],\n", + " [9.48100000e+03, 1.76337471e+01, 1.55307085e+09],\n", + " [9.50000000e+03, 1.91227703e+01, 1.55307085e+09]]])" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_mae_histories = np.array(all_mae_histories)\n", + "all_mae_histories" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, `all_mae_histories` is a 3-d array made of four 2-d arrays, one per fold. Each row holds three values: the training step, the recorded validation value, and a timestamp. The step column is identical across folds, and you do not need to worry about the timestamp. What we actually want is the second column. Because we asked for the `\"Loss\"` summary, this column holds the validation loss (the MSE), not the MAE, despite the variable names. Let's simply average over the first axis (the folds) of this 3-d array. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1.90000000e+01, 4.66429123e+02, 1.55307058e+09],\n", + " [3.80000000e+01, 3.31914894e+02, 1.55307058e+09],\n", + " [5.70000000e+01, 2.01346550e+02, 1.55307058e+09],\n", + " ...,\n", + " [9.46200000e+03, 1.82184715e+01, 1.55307069e+09],\n", + " [9.48100000e+03, 1.83279567e+01, 1.55307069e+09],\n", + " [9.50000000e+03, 1.85283084e+01, 1.55307069e+09]])" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "average_mae_history = np.mean(all_mae_histories, axis=0)\n", + "average_mae_history" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Averaging over the first axis leaves the step column intact, since it is identical for every fold. The averaged timestamps are meaningless, but we do not use them.\n", + "\n", + "Let's plot this:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "# Column 0 is the training step, column 1 the fold-averaged validation loss\n", + "plt.plot(average_mae_history[:,0],average_mae_history[:,1])\n", + "plt.xlabel('Steps')\n", + "plt.ylabel('Validation loss (MSE)')\n", + "plt.ylim((14, 20))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "According to this plot, the validation loss seems to stop improving significantly after about 150 epochs. The x-axis counts training steps (19 per epoch here), so this is around step 2,850. Past that point, we start overfitting.\n", + "\n", + "Once we are done tuning other parameters of our model (besides the number of epochs, we could also adjust the size of the hidden layers), we \n", + "can train a final \"production\" model on all of the training data, with the best parameters, then look at its performance on the test data:" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "creating: createZooKerasSequential\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createZooKerasDense\n", + "creating: createRMSprop\n", + "creating: createZooKerasMeanSquaredError\n", + "creating: createZooKerasMAE\n" + ] + } + ], + "source": [ + "# Get a fresh, compiled model.\n", + "model = build_model()\n", + "# Train it on all of the training data.\n", + "model.fit(train_data, train_targets,\n", + " nb_epoch=150, batch_size=16)\n", + "test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.7991065979003906" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"test_mae_score" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, 
+ "source": [ + "On the test set, we are still off by about \\$1,800 on average." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Wrapping up\n", + "\n", + "\n", + "Here's what you should take away from this example:\n", + "\n", + "* Regression is done using different loss functions from classification; Mean Squared Error (MSE) is a commonly used loss function for \n", + "regression.\n", + "* Similarly, evaluation metrics to be used for regression differ from those used for classification; naturally the concept of \"accuracy\" \n", + "does not apply to regression. A common regression metric is Mean Absolute Error (MAE).\n", + "* When features in the input data have values in different ranges, each feature should be scaled independently as a preprocessing step.\n", + "* When there is little data available, using K-fold validation is a great way to reliably evaluate a model.\n", + "* When little training data is available, it is preferable to use a small network with very few hidden layers (typically only one or two), \n", + "in order to avoid severe overfitting.\n", + "\n", + "This example concludes our series of three introductory practical examples. You are now able to handle common types of problems with vector data input:\n", + "\n", + "* Binary (2-class) classification.\n", + "* Multi-class, single-label classification.\n", + "* Scalar regression.\n", + "\n", + "In the next chapter, you will acquire a more formal understanding of some of the concepts you have encountered in these first examples, \n", + "such as data preprocessing, model evaluation, and overfitting."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 6ec5eb4acc5fddc2d2336f0cf1dbda65c5edb0b0 Mon Sep 17 00:00:00 2001 From: Jiaming Song Date: Thu, 21 Mar 2019 08:57:44 +0800 Subject: [PATCH 3/3] Delete 3.7-regression.ipynb --- keras/3.7-regression.ipynb | 797 ------------------------------------- 1 file changed, 797 deletions(-) delete mode 100644 keras/3.7-regression.ipynb diff --git a/keras/3.7-regression.ipynb b/keras/3.7-regression.ipynb deleted file mode 100644 index 0f37622..0000000 --- a/keras/3.7-regression.ipynb +++ /dev/null @@ -1,797 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First of all, set environment variables and initialize spark context:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "env: SPARK_DRIVER_MEMORY=8g\n", - "env: PYSPARK_PYTHON=/usr/bin/python3.5\n", - "env: PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n" - ] - } - ], - "source": [ - "%env SPARK_DRIVER_MEMORY=8g\n", - "%env PYSPARK_PYTHON=/usr/bin/python3.5\n", - "%env PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5\n", - "\n", - "from zoo.common.nncontext import *\n", - "sc = init_nncontext(init_spark_conf().setMaster(\"local[4]\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Predicting house prices: a regression example\n", - "\n", - "\n", - "----\n", - "\n", - "\n", - "In our two previous examples, we were considering classification problems, where the goal was to predict a single discrete label of an \n", - "input data point. 
Another common type of machine learning problem is \"regression\", which consists of predicting a continuous value instead \n", - "of a discrete label. For instance, predicting the temperature tomorrow, given meteorological data, or predicting the time that a \n", - "software project will take to complete, given its specifications.\n", - "\n", - "Do not mix up \"regression\" with the algorithm \"logistic regression\": confusingly, \"logistic regression\" is not a regression algorithm, \n", - "it is a classification algorithm." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## The Boston Housing Price dataset\n", - "\n", - "\n", - "We will be attempting to predict the median price of homes in a given Boston suburb in the mid-1970s, given a few data points about the \n", - "suburb at the time, such as the crime rate, the local property tax rate, etc.\n", - "\n", - "The dataset we will be using has another interesting difference from our two previous examples: it has very few data points, only 506 in \n", - "total, split between 404 training samples and 102 test samples, and each \"feature\" in the input data (e.g. the crime rate is a feature) has \n", - "a different scale. 
For instance some values are proportions, which take a values between 0 and 1, others take values between 1 and 12, \n", - "others between 0 and 100...\n", - "\n", - "Let's take a look at the data:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "from zoo.pipeline.api.keras.datasets import boston_housing\n", - "(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(404, 13)" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "train_data.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(102, 13)" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "test_data.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you can see, we have 404 training samples and 102 test samples. The data comprises 13 features. The 13 features in the input data are as \n", - "follow:\n", - "\n", - "1. Per capita crime rate.\n", - "2. Proportion of residential land zoned for lots over 25,000 square feet.\n", - "3. Proportion of non-retail business acres per town.\n", - "4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).\n", - "5. Nitric oxides concentration (parts per 10 million).\n", - "6. Average number of rooms per dwelling.\n", - "7. Proportion of owner-occupied units built prior to 1940.\n", - "8. Weighted distances to five Boston employment centres.\n", - "9. Index of accessibility to radial highways.\n", - "10. Full-value property-tax rate per $10,000.\n", - "11. Pupil-teacher ratio by town.\n", - "12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.\n", - "13. 
% lower status of the population.\n", - "\n", - "The targets are the median values of owner-occupied homes, in thousands of dollars:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([22.6, 50. , 23. , 8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6, 8.8,\n", - " 17.2, 14.9, 10.5, 50. , 29. , 23. , 33.3, 29.4, 21. , 23.8, 19.1,\n", - " 20.4, 29.1, 19.3, 23.1, 19.6, 19.4, 38.7, 18.7, 14.6, 20. , 20.5,\n", - " 20.1, 23.6, 16.8, 5.6, 50. , 14.5, 13.3, 23.9, 20. , 19.8, 13.8,\n", - " 16.5, 21.6, 20.3, 17. , 11.8, 27.5, 15.6, 23.1, 24.3, 42.8, 15.6,\n", - " 21.7, 17.1, 17.2, 15. , 21.7, 18.6, 21. , 33.1, 31.5, 20.1, 29.8,\n", - " 15.2, 15. , 27.5, 22.6, 20. , 21.4, 23.5, 31.2, 23.7, 7.4, 48.3,\n", - " 24.4, 22.6, 18.3, 23.3, 17.1, 27.9, 44.8, 50. , 23. , 21.4, 10.2,\n", - " 23.3, 23.2, 18.9, 13.4, 21.9, 24.8, 11.9, 24.3, 13.8, 24.7, 14.1,\n", - " 18.7, 28.1, 19.8, 26.7, 21.7, 22. , 22.9, 10.4, 21.9, 20.6, 26.4,\n", - " 41.3, 17.2, 27.1, 20.4, 16.5, 24.4, 8.4, 23. , 9.7, 50. , 30.5,\n", - " 12.3, 19.4, 21.2, 20.3, 18.8, 33.4, 18.5, 19.6, 33.2, 13.1, 7.5,\n", - " 13.6, 17.4, 8.4, 35.4, 24. , 13.4, 26.2, 7.2, 13.1, 24.5, 37.2,\n", - " 25. , 24.1, 16.6, 32.9, 36.2, 11. , 7.2, 22.8, 28.7, 14.4, 24.4,\n", - " 18.1, 22.5, 20.5, 15.2, 17.4, 13.6, 8.7, 18.2, 35.4, 31.7, 33. ,\n", - " 22.2, 20.4, 23.9, 25. , 12.7, 29.1, 12. , 17.7, 27. , 20.6, 10.2,\n", - " 17.5, 19.7, 29.8, 20.5, 14.9, 10.9, 19.5, 22.7, 19.5, 24.6, 25. ,\n", - " 24.5, 50. , 14.3, 11.8, 31. , 28.7, 16.2, 43.5, 25. , 22. , 19.9,\n", - " 22.1, 46. , 22.9, 20.2, 43.1, 34.6, 13.8, 24.3, 21.5, 24.4, 21.2,\n", - " 23.8, 26.6, 25.1, 9.6, 19.4, 19.4, 9.5, 14. , 26.5, 13.8, 34.7,\n", - " 16.3, 21.7, 17.5, 15.6, 20.9, 21.7, 12.7, 18.5, 23.7, 19.3, 12.7,\n", - " 21.6, 23.2, 29.6, 21.2, 23.8, 17.1, 22. , 36.5, 18.8, 21.9, 23.1,\n", - " 20.2, 17.4, 37. , 24.1, 36.2, 15.7, 32.2, 13.5, 17.9, 13.3, 11.7,\n", - " 41.7, 18.4, 13.1, 25. , 21.2, 16. 
, 34.9, 25.2, 24.8, 21.5, 23.4,\n", - " 18.9, 10.8, 21. , 27.5, 17.5, 13.5, 28.7, 14.8, 19.1, 28.6, 13.1,\n", - " 19. , 11.3, 13.3, 22.4, 20.1, 18.2, 22.9, 20.6, 25. , 12.8, 34.9,\n", - " 23.7, 50. , 29. , 30.1, 22. , 15.6, 23.3, 30.1, 14.3, 22.8, 50. ,\n", - " 20.8, 6.3, 34.9, 32.4, 19.9, 20.3, 17.8, 23.1, 20.4, 23.2, 7. ,\n", - " 16.8, 46.7, 50. , 22.9, 23.9, 21.4, 21.7, 15.4, 15.3, 23.1, 23.9,\n", - " 19.4, 11.9, 17.8, 31.5, 33.8, 20.8, 19.8, 22.4, 5. , 24.5, 19.4,\n", - " 15.1, 18.2, 19.3, 27.1, 20.7, 37.6, 11.7, 33.4, 30.1, 21.4, 45.4,\n", - " 20.1, 20.8, 26.4, 10.4, 21.8, 32. , 21.7, 18.4, 37.9, 17.8, 28. ,\n", - " 28.2, 36. , 18.9, 15. , 22.5, 30.7, 20. , 19.1, 23.3, 26.6, 21.1,\n", - " 19.7, 20. , 12.1, 7.2, 14.2, 17.3, 27.5, 22.2, 10.9, 19.2, 32. ,\n", - " 14.5, 24.7, 12.6, 24. , 24.1, 50. , 16.1, 43.8, 26.6, 36.1, 21.8,\n", - " 29.9, 50. , 44. , 20.6, 19.6, 28.4, 19.1, 22.3, 20.9, 28.4, 14.4,\n", - " 32.7, 13.8, 8.5, 22.5, 35.1, 31.6, 17.8, 15.6])" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "train_targets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The prices are typically between \\$10,000 and \\$50,000. If that sounds cheap, remember this was the mid-1970s, and these prices are not \n", - "inflation-adjusted." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Preparing the data\n", - "\n", - "\n", - "It would be problematic to feed into a neural network values that all take wildly different ranges. The network might be able to \n", - "automatically adapt to such heterogeneous data, but it would definitely make learning more difficult. 
A widespread best practice to deal \n", - "with such data is to do feature-wise normalization: for each feature in the input data (a column in the input data matrix), we \n", - "will subtract the mean of the feature and divide by the standard deviation, so that the feature will be centered around 0 and will have a \n", - "unit standard deviation. This is easily done in Numpy:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "mean = train_data.mean(axis=0)\n", - "train_data -= mean\n", - "std = train_data.std(axis=0)\n", - "train_data /= std\n", - "\n", - "test_data -= mean\n", - "test_data /= std" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the quantities that we use for normalizing the test data have been computed using the training data. We should never use in our \n", - "workflow any quantity computed on the test data, even for something as simple as data normalization." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Building our network\n", - "\n", - "\n", - "Because so few samples are available, we will be using a very small network with two \n", - "hidden layers, each with 64 units. In general, the less training data you have, the worse overfitting will be, and using \n", - "a small network is one way to mitigate overfitting." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "from zoo.pipeline.api.keras import models\n", - "from zoo.pipeline.api.keras import layers\n", - "\n", - "def build_model():\n", - " # Because we will need to instantiate\n", - " # the same model multiple times,\n", - " # we use a function to construct it.\n", - " model = models.Sequential()\n", - " model.add(layers.Dense(64, activation='relu',\n", - " input_shape=(train_data.shape[1],)))\n", - " model.add(layers.Dense(64, activation='relu'))\n", - " model.add(layers.Dense(1))\n", - " model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])\n", - " return model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Our network ends with a single unit, and no activation (i.e. it will be linear layer). \n", - "This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). \n", - "Applying an activation function would constrain the range that the output can take; for instance if \n", - "we applied a `sigmoid` activation function to our last layer, the network could only learn to predict values between 0 and 1. Here, because \n", - "the last layer is purely linear, the network is free to learn to predict values in any range.\n", - "\n", - "Note that we are compiling the network with the `mse` loss function -- Mean Squared Error, the square of the difference between the \n", - "predictions and the targets, a widely used loss function for regression problems.\n", - "\n", - "We are also monitoring a new metric during training: `mae`. This stands for Mean Absolute Error. It is simply the absolute value of the \n", - "difference between the predictions and the targets. For instance, a MAE of 0.5 on this problem would mean that our predictions are off by \n", - "\\$500 on average." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Validating our approach using K-fold validation\n", - "\n", - "\n", - "To evaluate our network while we keep adjusting its parameters (such as the number of epochs used for training), we could simply split the \n", - "data into a training set and a validation set, as we were doing in our previous examples. However, because we have so few data points, the \n", - "validation set would end up being very small (e.g. about 100 examples). A consequence is that our validation scores may change a lot \n", - "depending on _which_ data points we choose to use for validation and which we choose for training, i.e. the validation scores may have a \n", - "high _variance_ with regard to the validation split. This would prevent us from reliably evaluating our model.\n", - "\n", - "The best practice in such situations is to use K-fold cross-validation. It consists of splitting the available data into K partitions \n", - "(typically K=4 or 5), then instantiating K identical models, and training each one on K-1 partitions while evaluating on the remaining \n", - "partition. The validation score for the model used would then be the average of the K validation scores obtained." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then let's start our training:" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "processing fold # 0\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 1\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 2\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 3\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n" - ] - } - ], - "source": [ - "import numpy as np\n", - "\n", - "k = 4\n", - "num_val_samples = len(train_data) // k\n", - "num_nb_epoch = 50\n", - "all_scores = []\n", - "for i in range(k):\n", - " print('processing fold #', i)\n", - " # Prepare the validation data: data from partition # i\n", - " val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", - " val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", - "\n", - " # Prepare the training data: data from all other partitions\n", - " partial_train_data = 
np.concatenate(\n", - " [train_data[:i * num_val_samples],\n", - " train_data[(i + 1) * num_val_samples:]],\n", - " axis=0)\n", - " partial_train_targets = np.concatenate(\n", - " [train_targets[:i * num_val_samples],\n", - " train_targets[(i + 1) * num_val_samples:]],\n", - " axis=0)\n", - "\n", - " # Build the model (already compiled)\n", - " model = build_model()\n", - " # Train the model (the silent call with verbose=0 is kept commented out)\n", - " #model.fit(partial_train_data, partial_train_targets,\n", - " # nb_epoch=num_nb_epoch, batch_size=1, verbose=0)\n", - " model.fit(partial_train_data, partial_train_targets,\n", - " nb_epoch=num_nb_epoch, batch_size=16)\n", - "\n", - " # Evaluate the model on the validation data\n", - " #val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)\n", - " val_mse, val_mae = model.evaluate(val_data, val_targets)\n", - " all_scores.append(val_mae)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "_INFO - Trained 16 records in 0.011235845 seconds. Throughput is 1424.0139 records/second. Loss is 8.708786._\n", - "\n", - "_INFO - Trained 16 records in 0.009535034 seconds. Throughput is 1678.0223 records/second. Loss is 5.3613434._\n", - "\n", - "_INFO - Trained 16 records in 0.008636178 seconds. Throughput is 1852.6713 records/second. Loss is 18.106756._\n", - "\n", - "_INFO - Trained 16 records in 0.009207628 seconds. Throughput is 1737.6897 records/second.
Loss is 7.0931993._" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[3.291872501373291, 2.496018171310425, 2.221175193786621, 2.6994853019714355]" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_scores" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2.677137792110443" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.mean(all_scores)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you can see, the different runs do indeed show rather different validation scores, from 2.2 to 3.3. Their average (2.7) is a much more \n", - "reliable metric than any single one of these scores -- that's the entire point of K-fold cross-validation. In this case, we are off by \\\\$2,700 on \n", - "average, which is still significant considering that the prices range from \\\\$10,000 to \\\\$50,000. \n", - "\n", - "Let's try training the network for a bit longer: 500 epochs.
To keep a record of how well the model did at each epoch, we will modify our training loop \n", - "to save the per-epoch validation score log:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "processing fold # 0\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 1\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 2\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n", - "processing fold # 3\n", - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n" - ] - } - ], - "source": [ - "num_epochs = 500\n", - "all_mae_histories = []\n", - "for i in range(k):\n", - " print('processing fold #', i)\n", - " # Prepare the validation data: data from partition # i\n", - " val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]\n", - " val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]\n", - "\n", - " # Prepare the training data: data from all other partitions\n", - " partial_train_data = np.concatenate(\n", - " [train_data[:i * 
num_val_samples],\n", - " train_data[(i + 1) * num_val_samples:]],\n", - " axis=0)\n", - " partial_train_targets = np.concatenate(\n", - " [train_targets[:i * num_val_samples],\n", - " train_targets[(i + 1) * num_val_samples:]],\n", - " axis=0)\n", - "\n", - " # Build the model (already compiled)\n", - " model = build_model()\n", - " # Write training/validation summaries to ./dir_name so they can be read back after fit()\n", - " import time\n", - " dir_name = '3-7 ' + str(time.ctime())\n", - " model.set_tensorboard('./', dir_name)\n", - " history = model.fit(partial_train_data, partial_train_targets,\n", - " validation_data=(val_data, val_targets),\n", - " nb_epoch=num_epochs, batch_size=16)\n", - " \n", - " #mae_history = history.history['val_mean_absolute_error']\n", - " # The Loss tag yields one row per validation run: [step, validation loss (MSE here), timestamp]\n", - " mae_history = model.get_validation_summary(\"Loss\")\n", - " all_mae_histories.append(mae_history)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then gather the per-epoch validation histories of all folds into a single array:" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[1.90000000e+01, 4.05375427e+02, 1.55307042e+09],\n", - " [3.80000000e+01, 2.64351837e+02, 1.55307042e+09],\n", - " [5.70000000e+01, 1.50977859e+02, 1.55307042e+09],\n", - " ...,\n", - " [9.46200000e+03, 2.07635689e+01, 1.55307053e+09],\n", - " [9.48100000e+03, 2.02473850e+01, 1.55307053e+09],\n", - " [9.50000000e+03, 2.02105141e+01, 1.55307053e+09]],\n", - "\n", - " [[1.90000000e+01, 4.76980957e+02, 1.55307053e+09],\n", - " [3.80000000e+01, 3.29584198e+02, 1.55307053e+09],\n", - " [5.70000000e+01, 1.80655548e+02, 1.55307053e+09],\n", - " ...,\n", - " [9.46200000e+03, 1.73588219e+01, 1.55307064e+09],\n", - " [9.48100000e+03, 1.78555279e+01, 
1.55307064e+09],\n", - " [9.50000000e+03, 1.73744106e+01, 1.55307064e+09]],\n", - "\n", - " [[1.90000000e+01, 4.62182434e+02, 1.55307064e+09],\n", - " [3.80000000e+01, 3.34037567e+02, 1.55307064e+09],\n", - " 
[5.70000000e+01, 2.06141006e+02, 1.55307064e+09],\n", - " ...,\n", - " [9.46200000e+03, 1.72124062e+01, 1.55307075e+09],\n", - " [9.48100000e+03, 1.75751667e+01, 1.55307075e+09],\n", - " [9.50000000e+03, 1.74055386e+01, 1.55307075e+09]],\n", - "\n", - " [[1.90000000e+01, 5.21177673e+02, 1.55307075e+09],\n", - " [3.80000000e+01, 3.99685974e+02, 1.55307075e+09],\n", - " [5.70000000e+01, 2.67611786e+02, 1.55307075e+09],\n", - " ...,\n", - " [9.46200000e+03, 1.75390892e+01, 1.55307085e+09],\n", - " [9.48100000e+03, 1.76337471e+01, 1.55307085e+09],\n", - " [9.50000000e+03, 1.91227703e+01, 1.55307085e+09]]])" - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_mae_histories = np.array(all_mae_histories)\n", - "all_mae_histories" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you can see, `all_mae_histories` is a 3-d array of shape (4, 500, 3): one 2-d array per fold, with one row per epoch. Each row holds three values: the first is the training step (19 steps per epoch, identical across folds), the second is the recorded validation score, and the third is a timestamp. Note that because we requested the `Loss` summary, the second value is the validation loss, i.e. the MSE the model was compiled with, not the MAE. Only this second column matters here, and since the steps are equal across folds, we can simply average over the first axis (the folds). 
" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[1.90000000e+01, 4.66429123e+02, 1.55307058e+09],\n", - " [3.80000000e+01, 3.31914894e+02, 1.55307058e+09],\n", - " [5.70000000e+01, 2.01346550e+02, 1.55307058e+09],\n", - " ...,\n", - " [9.46200000e+03, 1.82184715e+01, 1.55307069e+09],\n", - " [9.48100000e+03, 1.83279567e+01, 1.55307069e+09],\n", - " [9.50000000e+03, 1.85283084e+01, 1.55307069e+09]])" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "average_mae_history = np.mean(all_mae_histories, axis=0)\n", - "average_mae_history" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Averaging over the first axis leaves the step column unchanged, since the steps are identical in every fold. The averaged timestamps in the third column are meaningless, but we do not use them.\n", - "\n", - "Let's plot the averaged validation loss against the training step:" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "plt.plot(average_mae_history[:,0],average_mae_history[:,1])\n", - "plt.xlabel('Steps')\n", - "plt.ylabel('Validation loss (MSE)')\n", - "plt.ylim((14, 20))\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "According to this plot, it seems that the validation loss stops improving significantly after 150 epochs (step 2,850 on the x-axis, at 19 steps per epoch). Past that point, we start overfitting.\n", - "\n", - "Once we are done tuning other parameters of our model (besides the number of epochs, we could also adjust the size of the hidden layers), we \n", - "can train a final \"production\" model on all of the training data, with the best parameters, then look at its performance on the test data:" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "creating: createZooKerasSequential\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createZooKerasDense\n", - "creating: createRMSprop\n", - "creating: createZooKerasMeanSquaredError\n", - "creating: createZooKerasMAE\n" - ] - } - ], - "source": [ - "# Get a fresh, compiled model.\n", - "model = build_model()\n", - "# Train it on the entirety of the data.\n", - "model.fit(train_data, train_targets,\n", - " nb_epoch=150, batch_size=16)\n", - "test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1.7991065979003906" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "test_mae_score" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, 
- "source": [ - "We are still off by about \\$1,800." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Wrapping up\n", - "\n", - "\n", - "Here's what you should take away from this example:\n", - "\n", - "* Regression is done using different loss functions from classification; Mean Squared Error (MSE) is a commonly used loss function for \n", - "regression.\n", - "* Similarly, evaluation metrics to be used for regression differ from those used for classification; naturally the concept of \"accuracy\" \n", - "does not apply for regression. A common regression metric is Mean Absolute Error (MAE).\n", - "* When features in the input data have values in different ranges, each feature should be scaled independently as a preprocessing step.\n", - "* When there is little data available, using K-Fold validation is a great way to reliably evaluate a model.\n", - "* When little training data is available, it is preferable to use a small network with very few hidden layers (typically only one or two), \n", - "in order to avoid severe overfitting.\n", - "\n", - "This example concludes our series of three introductory practical examples. You are now able to handle common types of problems with vector data input:\n", - "\n", - "* Binary (2-class) classification.\n", - "* Multi-class, single-label classification.\n", - "* Scalar regression.\n", - "\n", - "In the next chapter, you will acquire a more formal understanding of some of the concepts you have encountered in these first examples, \n", - "such as data preprocessing, model evaluation, and overfitting." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}