From 074bdc309f0c422ab566e1df9a37bfa38fd3124f Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 21:28:00 +0800 Subject: [PATCH 1/9] I have run all the code in the local --- .../deep-learning/autoencoder.ipynb | 582 ++++++++ .../deep-learning/autoencoder.md | 316 ----- .../deep-learning/cnn.ipynb | 1216 +++++++++++++++++ .../deep-learning/cnn.md | 1205 ---------------- 4 files changed, 1798 insertions(+), 1521 deletions(-) create mode 100644 open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb delete mode 100644 open-machine-learning-jupyter-book/deep-learning/autoencoder.md create mode 100644 open-machine-learning-jupyter-book/deep-learning/cnn.ipynb delete mode 100644 open-machine-learning-jupyter-book/deep-learning/cnn.md diff --git a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb new file mode 100644 index 0000000000..d8facd45fa --- /dev/null +++ b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb @@ -0,0 +1,582 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [ + "hide-cell" + ] + }, + "outputs": [], + "source": [ + "# Install the necessary dependencies\n", + "\n", + "import os\n", + "import sys \n", + "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [ + "remove-cell" + ] + }, + "source": [ + "---\n", + "license:\n", + " code: MIT\n", + " content: CC-BY-4.0\n", + "github: https://github.com/ocademy-ai/machine-learning\n", + "venue: By Ocademy\n", + "open_access: true\n", + "bibliography:\n", + " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Autoencoder\n", + "\n", + "## Overview\n", + 
"\n", + "An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.\n", + "\n", + "## Unsupervised Learning\n", + "\n", + "An autoencoder is an unsupervised learning method, which means it works with datasets without considering a target variable. Typical applications and goals of unsupervised learning include:\n", + "\n", + "- Finding hidden structures in data.\n", + "- Data compression.\n", + "- Clustering.\n", + "- Retrieving similar objects.\n", + "- Exploratory data analysis.\n", + "- Generating new examples.\n", + "\n", + "A classic unsupervised learning method is Principal Component Analysis (PCA), which works as follows:\n", + "\n", + "- Find directions of maximum variance\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/01_PCA1.png\n", + "---\n", + "name: Illustration of PCA\n", + ":::\n", + "\n", + "- Transform features onto directions of maximum variance\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/02_PCA2.png\n", + "---\n", + "name: Illustration of PCA\n", + ":::\n", + "\n", + "- Usually consider a subset of vectors of most variance (dimensionality reduction)\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/03_PCA3.png\n", + "---\n", + "name: Illustration of PCA\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fully-connected Autoencoder\n", + "\n", + "Here is an example of a basic fully-connected autoencoder:\n", + "\n", + ":::{figure} 
https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/04_simple.png\n", + "---\n", + "name: Illustration of Fully Connected autoencoder\n", + ":::\n", + "\n", + ":::{note}\n", + "If we don't use non-linear activation functions and minimize the MSE, this is very similar to PCA. However, the latent dimensions will not necessarily be orthogonal and will have roughly the same variance.\n", + ":::\n", + "\n", + "The loss function of this simple model is the squared reconstruction error between the input $x$ and its reconstruction $x'$:\n", + "$$L(x, x') = \\left\\lVert x - x' \\right\\rVert^2_2 = \\sum_i (x_i - x_i')^2$$\n", + "\n", + "\n", + "### Potential Autoencoder Applications\n", + "\n", + "Autoencoders have several potential applications, for example:\n", + "- After training, we can discard the decoder and use the embedding as input to classic machine learning methods (SVM, KNN, Random Forest, ...).\n", + "- Similar to transfer learning, we can train an autoencoder on a large image dataset, then fine-tune the encoder on our own, smaller dataset and/or add our own output (classification) layer.\n", + "- The latent space can also be used for visualization (EDA, clustering), but there are better methods for that.\n", + "\n", + "## Convolutional Autoencoder\n", + "\n", + "A convolutional autoencoder mainly uses transposed convolutions to construct the output. A transposed convolution (sometimes called \"deconvolution\") allows us to increase the size of the output feature map compared to the input feature map.\n", + "\n", + "The difference between regular convolution and transposed convolution can be seen from the following image.\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/05_diff_conv.png\n", + "---\n", + "name: Difference between regular and transposed convolution\n", + ":::\n", + "\n", + "In transposed convolutions, we stride over the output; hence, larger strides will result in larger outputs (opposite to regular convolutions); and we pad the 
output; hence, larger padding will result in smaller output maps.\n", + "\n", + "So, the whole model consists of two parts, an encoder and a decoder, which are built from regular convolutions and transposed convolutions, respectively.\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/06_convmodel.png\n", + "---\n", + "name: Structure of convoluted autoencoder\n", + ":::\n", + "\n", + ":::{note}\n", + "Here are some other tricks that can help training:\n", + "1. Add dropout layers to force networks to learn redundant features.\n", + "2. Add dropout after the input, or add noise to the input to learn to denoise images.\n", + "3. Add L1 penalty to the loss to learn sparse feature representations.\n", + ":::\n", + "\n", + "## Code\n", + "\n", + "Let's build a 2-layer autoencoder with TensorFlow to compress images to a lower-dimensional latent space and then reconstruct them. This project uses the MNIST dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "MNIST Dataset parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "num_features = 784 # data features (img shape: 28*28)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Training parameters." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "learning_rate = 0.01\n", + "training_steps = 20000\n", + "batch_size = 256\n", + "display_step = 1000" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Network Parameters" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "num_hidden_1 = 128 # 1st layer num features.\n", + "num_hidden_2 = 64 # 2nd layer num features (the latent dim)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Prepare MNIST data." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.keras.datasets import mnist\n", + "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", + "# Convert to float32.\n", + "x_train, x_test = x_train.astype(np.float32), x_test.astype(np.float32)\n", + "# Flatten images to 1-D vector of 784 features (28*28).\n", + "x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features])\n", + "# Normalize images value from [0, 255] to [0, 1].\n", + "x_train, x_test = x_train / 255., x_test / 255." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use tf.data API to shuffle and batch data." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n", + "train_data = train_data.repeat().shuffle(10000).batch(batch_size).prefetch(1)\n", + "\n", + "test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test))\n", + "test_data = test_data.repeat().batch(batch_size).prefetch(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Store layers weight & bias.\n", + "A random value generator to initialize weights." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "C:\\Users\\Victor\\anaconda3\\lib\\site-packages\\keras\\initializers\\initializers.py:120: UserWarning: The initializer RandomNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "random_normal = tf.initializers.RandomNormal()\n", + "\n", + "weights = {\n", + " 'encoder_h1': tf.Variable(random_normal([num_features, num_hidden_1])),\n", + " 'encoder_h2': tf.Variable(random_normal([num_hidden_1, num_hidden_2])),\n", + " 'decoder_h1': tf.Variable(random_normal([num_hidden_2, num_hidden_1])),\n", + " 'decoder_h2': tf.Variable(random_normal([num_hidden_1, num_features])),\n", + "}\n", + "biases = {\n", + " 'encoder_b1': tf.Variable(random_normal([num_hidden_1])),\n", + " 'encoder_b2': tf.Variable(random_normal([num_hidden_2])),\n", + " 'decoder_b1': tf.Variable(random_normal([num_hidden_1])),\n", + " 'decoder_b2': tf.Variable(random_normal([num_features])),\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Building the encoder." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def encoder(x):\n", + " # Encoder Hidden layer with sigmoid activation.\n", + " layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),\n", + " biases['encoder_b1']))\n", + " # Encoder Hidden layer with sigmoid activation.\n", + " layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),\n", + " biases['encoder_b2']))\n", + " return layer_2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Building the decoder." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def decoder(x):\n", + " # Decoder Hidden layer with sigmoid activation.\n", + " layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),\n", + " biases['decoder_b1']))\n", + " # Decoder Hidden layer with sigmoid activation.\n", + " layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),\n", + " biases['decoder_b2']))\n", + " return layer_2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Mean square loss between original images and reconstructed ones." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def mean_square(reconstructed, original):\n", + " return tf.reduce_mean(tf.pow(original - reconstructed, 2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adam optimizer." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "optimizer = tf.optimizers.Adam(learning_rate=learning_rate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Optimization process. " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "def run_optimization(x):\n", + " # Wrap computation inside a GradientTape for automatic differentiation.\n", + " with tf.GradientTape() as g:\n", + " reconstructed_image = decoder(encoder(x))\n", + " loss = mean_square(reconstructed_image, x)\n", + "\n", + " # Variables to update, i.e. 
trainable variables.\n", + " trainable_variables = list(weights.values()) + list(biases.values())\n", + " \n", + " # Compute gradients.\n", + " gradients = g.gradient(loss, trainable_variables)\n", + " \n", + " # Update W and b following gradients.\n", + " optimizer.apply_gradients(zip(gradients, trainable_variables))\n", + " \n", + " return loss" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run training for the given number of steps." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "step: 0, loss: 0.234978\n", + "step: 1000, loss: 0.016520\n", + "step: 2000, loss: 0.010679\n", + "step: 3000, loss: 0.008460\n", + "step: 4000, loss: 0.007236\n", + "step: 5000, loss: 0.006323\n", + "step: 6000, loss: 0.006220\n", + "step: 7000, loss: 0.005524\n", + "step: 8000, loss: 0.005355\n", + "step: 9000, loss: 0.005005\n", + "step: 10000, loss: 0.004884\n", + "step: 11000, loss: 0.004767\n", + "step: 12000, loss: 0.004663\n", + "step: 13000, loss: 0.004198\n", + "step: 14000, loss: 0.004016\n", + "step: 15000, loss: 0.003990\n", + "step: 16000, loss: 0.004066\n", + "step: 17000, loss: 0.004013\n", + "step: 18000, loss: 0.003900\n", + "step: 19000, loss: 0.003652\n", + "step: 20000, loss: 0.003604\n" + ] + } + ], + "source": [ + "for step, (batch_x, _) in enumerate(train_data.take(training_steps + 1)):\n", + " \n", + " # Run the optimization.\n", + " loss = run_optimization(batch_x)\n", + " \n", + " if step % display_step == 0:\n", + " print(\"step: %i, loss: %f\" % (step, loss))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Testing and Visualization." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Encode and decode images from test set and visualize their reconstruction." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Original Images\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Reconstructed Images\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "n = 4\n", + "canvas_orig = np.empty((28 * n, 28 * n))\n", + "canvas_recon = np.empty((28 * n, 28 * n))\n", + "for i, (batch_x, _) in enumerate(test_data.take(n)):\n", + " # Encode and decode the digit image.\n", + " reconstructed_images = decoder(encoder(batch_x))\n", + " # Display original images.\n", + " for j in range(n):\n", + " # Draw the generated digits.\n", + " img = batch_x[j].numpy().reshape([28, 28])\n", + " canvas_orig[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = img\n", + " # Display reconstructed images.\n", + " for j in range(n):\n", + " # Draw the generated digits.\n", + " reconstr_img = reconstructed_images[j].numpy().reshape([28, 28])\n", + " canvas_recon[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = reconstr_img\n", + "\n", + "print(\"Original Images\") \n", + "plt.figure(figsize=(n, n))\n", + "plt.imshow(canvas_orig, origin=\"upper\", cmap=\"gray\")\n", + "plt.show()\n", + "\n", + "print(\"Reconstructed Images\")\n", + "plt.figure(figsize=(n, n))\n", + "plt.imshow(canvas_recon, origin=\"upper\", cmap=\"gray\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Your turn! 🚀\n", + "\n", + "TBD.\n", + "\n", + "## Self study\n", + "\n", + "You can refer to this book chapter for further study:\n", + "\n", + "- [deeplearningbook](https://www.deeplearningbook.org/contents/autoencoders.html)\n", + "\n", + "## Acknowledgments\n", + "\n", + "Thanks to [Sebastian Raschka](https://github.com/rasbt) for creating the open-source project [stat453-deep-learning-ss20](https://github.com/rasbt/stat453-deep-learning-ss20) and [Aymeric Damien](https://github.com/aymericdamien) for creating the open-source project [TensorFlow-Examples](https://github.com/aymericdamien/TensorFlow-Examples/). 
They inspire the majority of the content in this chapter.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/open-machine-learning-jupyter-book/deep-learning/autoencoder.md b/open-machine-learning-jupyter-book/deep-learning/autoencoder.md deleted file mode 100644 index 5266b142d8..0000000000 --- a/open-machine-learning-jupyter-book/deep-learning/autoencoder.md +++ /dev/null @@ -1,316 +0,0 @@ ---- -jupytext: - cell_metadata_filter: -all - formats: md:myst - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.11.5 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -# Autoencoder - -## Overview - -An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction. - -## Unsupervised Learning - -Autoencoder is a kind of unsupervised learning, which means working with datasets without considering a target variable. There are some Applications and Goals for it: - -- Finding hidden structures in data. -- Data compression. -- Clustering. -- Retrieving similar objects. -- Exploratory data analysis. -- Generating new examples. 
- -And for unsupervised learning, its main Principal Component Analysis (PCA) is: - -- Find directions of maximum variance - -:::{figure-md} 01_PCA1 - - -Illustration of PCA -::: - -- Transform features onto directions of maximum variance - -:::{figure-md} 02_PCA2 - - -Illustration of PCA -::: - -- Usually consider a subset of vectors of most variance (dimensionality reduction) - -:::{figure-md} 03_PCA3 - - -Illustration of PCA -::: - -## Fully-connected Autoencoder - -Here is an example of a basic fully-connected autoencoder - -:::{figure-md} 04_simple - - -Illustration of Fully Connected autoencoder -::: - -```{note} -If we don't use non-linear activation functions and minimize the MSE, this is very similar to PCA. However, the latent dimensions will not necessarily be orthogonal and will have same variance. -``` - -The loss function of this simple model is $L(x, x^') = ||x - x^'||^2_2 = \sum_i (x_i - x_i^')^2$. - -### Potential Autoencoder Applications - -And there are some potential autoencoder applications, for example: -- After training, disregard the output part, we can use embedding as input to classic machine learning methods (SVM, KNN, Random Forest, ...). -- Similar to transfer learning, we can train autoencoder on large image dataset, then fine tune encoder part on your own, smaller dataset and/or provide your own output (classification) layer. -- Latent space can also be used for visualization (EDA, clustering), but there are better methods for that. - -## Convolutional Autoencoder - -For convolutional autoencoder, we mainly use transposed convolution construct the output, and transposed convolution (sometimes called "deconvolution") allows us to increase the size of the output feature map compared to the input feature map. - -The difference between regular convolution and transposed convolution can be seen from the following image. 
- -:::{figure-md} 05_diff_conv - - -Difference between regular and transposed convolution -::: - -In transposed convolutions, we stride over the output; hence, larger strides will result in larger outputs (opposite to regular convolutions); and we pad the output; hence, larger padding will result in smaller output maps. - -So, the whole model consists of two parts, encoder and decoder, and they are composed with regular convolution and transposed convolution respectively. - -:::{figure-md} 06_convmodel - - -Structure of convoluted autoencoder -::: - -```{note} -Here is some other tricks to help our training: -1. Add dropout layers to force networks to learn redundant features. -2. Add dropout after the input, or add noise to the input to learn to denoise images. -3. Add L1 penalty to the loss to learn sparse feature representations. -``` - -## Code - -Let's build a 2-layers auto-encoder with TensorFlow to compress images to a lower latent space and then reconstruct them. And this project will be done on MNIST dataste. - -```{code-cell} -import tensorflow as tf -import numpy as np -``` - -MNIST Dataset parameters. - -```{code-cell} -num_features = 784 # data features (img shape: 28*28). -``` - -Training parameters. - -```{code-cell} -learning_rate = 0.01 -training_steps = 20000 -batch_size = 256 -display_step = 1000 -``` - -Network Parameters - -```{code-cell} -num_hidden_1 = 128 # 1st layer num features. -num_hidden_2 = 64 # 2nd layer num features (the latent dim). -``` - -Prepare MNIST data. - -```{code-cell} -from tensorflow.keras.datasets import mnist -(x_train, y_train), (x_test, y_test) = mnist.load_data() -# Convert to float32. -x_train, x_test = x_train.astype(np.float32), x_test.astype(np.float32) -# Flatten images to 1-D vector of 784 features (28*28). -x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features]) -# Normalize images value from [0, 255] to [0, 1]. -x_train, x_test = x_train / 255., x_test / 255. 
-``` - -Store layers weight & bias. -A random value generator to initialize weights. - -```{code-cell} -random_normal = tf.initializers.RandomNormal() - -weights = { - 'encoder_h1': tf.Variable(random_normal([num_features, num_hidden_1])), - 'encoder_h2': tf.Variable(random_normal([num_hidden_1, num_hidden_2])), - 'decoder_h1': tf.Variable(random_normal([num_hidden_2, num_hidden_1])), - 'decoder_h2': tf.Variable(random_normal([num_hidden_1, num_features])), -} -biases = { - 'encoder_b1': tf.Variable(random_normal([num_hidden_1])), - 'encoder_b2': tf.Variable(random_normal([num_hidden_2])), - 'decoder_b1': tf.Variable(random_normal([num_hidden_1])), - 'decoder_b2': tf.Variable(random_normal([num_features])), -} -``` - -Building the encoder. - -```{code-cell} -def encoder(x): - # Encoder Hidden layer with sigmoid activation. - layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']), - biases['encoder_b1'])) - # Encoder Hidden layer with sigmoid activation. - layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']), - biases['encoder_b2'])) - return layer_2 -``` - -Building the decoder. - -```{code-cell} -def decoder(x): - # Decoder Hidden layer with sigmoid activation. - layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']), - biases['decoder_b1'])) - # Decoder Hidden layer with sigmoid activation. - layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']), - biases['decoder_b2'])) - return layer_2 -``` - -Mean square loss between original images and reconstructed ones. - -```{code-cell} -def mean_square(reconstructed, original): - return tf.reduce_mean(tf.pow(original - reconstructed, 2)) -``` - -Adam optimizer. - -```{code-cell} -optimizer = tf.optimizers.Adam(learning_rate=learning_rate) -``` - - -Optimization process. - -```{code-cell} -def run_optimization(x): - # Wrap computation inside a GradientTape for automatic differentiation. 
- with tf.GradientTape() as g: - reconstructed_image = decoder(encoder(x)) - loss = mean_square(reconstructed_image, x) - - # Variables to update, i.e. trainable variables. - trainable_variables = weights.values() + biases.values() - - # Compute gradients. - gradients = g.gradient(loss, trainable_variables) - - # Update W and b following gradients. - optimizer.apply_gradients(zip(gradients, trainable_variables)) - - return loss -``` - -Run training for the given number of steps. - -```{code-cell} -for step, (batch_x, _) in enumerate(train_data.take(training_steps + 1)): - - # Run the optimization. - loss = run_optimization(batch_x) - - if step % display_step == 0: - print("step: %i, loss: %f" % (step, loss)) -``` - -Testing and Visualization. - -```{code-cell} -import matplotlib.pyplot as plt -``` - -Encode and decode images from test set and visualize their reconstruction. - -```{code-cell} -n = 4 -canvas_orig = np.empty((28 * n, 28 * n)) -canvas_recon = np.empty((28 * n, 28 * n)) -for i, (batch_x, _) in enumerate(test_data.take(n)): - # Encode and decode the digit image. - reconstructed_images = decoder(encoder(batch_x)) - # Display original images. - for j in range(n): - # Draw the generated digits. - img = batch_x[j].numpy().reshape([28, 28]) - canvas_orig[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = img - # Display reconstructed images. - for j in range(n): - # Draw the generated digits. - reconstr_img = reconstructed_images[j].numpy().reshape([28, 28]) - canvas_recon[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = reconstr_img - -print("Original Images") -plt.figure(figsize=(n, n)) -plt.imshow(canvas_orig, origin="upper", cmap="gray") -plt.show() - -print("Reconstructed Images") -plt.figure(figsize=(n, n)) -plt.imshow(canvas_recon, origin="upper", cmap="gray") -plt.show() -``` - - - - - - -## Your turn! 🚀 - -TBD. 
- -## Self study - -You can refer to this book chapter for further study: - -- [deeplearningbook](https://www.deeplearningbook.org/contents/autoencoders.html) - -## Acknowledgments - -Thanks to [Sebastian Raschka](https://github.com/rasbt) for creating the open-source project [stat453-deep-learning-ss20](https://github.com/rasbt/stat453-deep-learning-ss20) and [Aymeric Damien](https://github.com/aymericdamien) for creating the open-source project [TensorFlow-Examples](https://github.com/aymericdamien/TensorFlow-Examples/). They inspire the majority of the content in this chapter. - ---- - -```{bibliography} -:filter: docname in docnames -``` \ No newline at end of file diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb new file mode 100644 index 0000000000..26a91bb947 --- /dev/null +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -0,0 +1,1216 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "hide-cell" + ] + }, + "outputs": [], + "source": [ + "# Install the necessary dependencies\n", + "\n", + "import os\n", + "import sys \n", + "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython imageio scikit-image requests\n", + "# Convolutional Neural Networks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [ + "remove-cell" + ] + }, + "source": [ + "---\n", + "license:\n", + " code: MIT\n", + " content: CC-BY-4.0\n", + "github: https://github.com/ocademy-ai/machine-learning\n", + "venue: By Ocademy\n", + "open_access: true\n", + "bibliography:\n", + " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", + "---" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Convolutional Neural Networks" + ] + }, + { + "attachments": {}, + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "Convolutional Neural Networks (CNNs) are responsible for the latest major breakthroughs in image recognition in the past few years.\n", + "\n", + "In mathematics, a convolution is a function that is applied over the output of another function. In our case, we will consider applying a matrix multiplication (filter) across an image. See the below diagram for an example of how this may work.\n", + "\n", + "" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import HTML\n", + "display(HTML(\"\"\"\n", + "

\n", + "\n", + "A demo of convolution function. [source]\n", + "

\n", + "\"\"\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "CNNs generally follow a structure. The main convolutional setup is (input array) -> (convolutional filter layer) -> (Pooling) -> (Activation layer). The above diagram depicts how a convolutional layer may create one feature. Generally, filters are multidimensional and end up creating many features. It is also common to have a completely separate filter-feature creator of different sizes acting on the same layer. After this convolutional filter, it is common to apply a pooling layer. This pooling may be a max-pooling or an average pooling or another aggregation. One of the key concepts here is that the pooling layer has no parameters while decreasing the layer size. See the below diagram for an example of max-pooling.\n", + "\n", + "\n", + "\n", + "After the max pooling, there is generally an activation layer. One of the more common activation layers is the ReLU (Rectified Linear Unit) {cite}`reluwiki`." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## MNIST handwritten digits\n", + "\n", + "Here we illustrate how to use a simple CNN with three convolutional units to predict the MNIST handwritten digits. \n", + "\n", + "```{note}\n", + "There is good reason why this dataset is used like the 'hello world' of image recognition, it is fairly compact while having a decent amount of training, test, and validation data. It only has one channel (black and white) and only ten possible outputs (0-9).\n", + "```\n", + "\n", + "When the script is done training the model, you should see similar output to the following graphs.\n", + "\n", + "\n", + "\n", + "Training and test loss (left) and test batch accuracy (right).\n", + "\n", + "\n", + "\n", + "A random set of 6 digits with actual and predicted labels. 
You can see a prediction failure in the lower right box.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import HTML\n", + "display(HTML(\"\"\"\n", + "

\n", + "\n", + "A demo of CNN. [source]\n", + "

\n", + "\"\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import HTML\n", + "display(HTML(\"\"\"\n", + "

\n", + "\n", + "A demo of CNN. [source]\n", + "

\n", + "\"\"\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from tensorflow.keras.datasets import mnist\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Load dataset\n", + "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", + "\n", + "# Data preprocessing\n", + "x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0\n", + "x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0\n", + "\n", + "# Build model\n", + "model = tf.keras.Sequential([\n", + " tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),\n", + " tf.keras.layers.MaxPooling2D((2, 2)),\n", + " tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),\n", + " tf.keras.layers.MaxPooling2D((2, 2)),\n", + " tf.keras.layers.Flatten(),\n", + " tf.keras.layers.Dense(64, activation='relu'),\n", + " tf.keras.layers.Dense(10, activation='softmax')\n", + "])\n", + "\n", + "# Compile model\n", + "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n", + "\n", + "# Train model\n", + "history = model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))\n", + "\n", + "# Test model\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)\n", + "print('Test accuracy:', test_acc)\n", + "\n", + "# Visualizing the training process\n", + "plt.plot(history.history['accuracy'], label='Training Accuracy')\n", + "plt.plot(history.history['val_accuracy'], label='Validation Accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "from 
tensorflow.keras.datasets import mnist\n", + "\n", + "# Load data\n", + "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", + "\n", + "# Normalize pixel values to the range [0, 1]\n", + "train_images = train_images / 255.0\n", + "test_images = test_images / 255.0\n", + "\n", + "# Convert images to 4D tensors (batch_size, height, width, channels)\n", + "train_images = np.expand_dims(train_images, axis=-1)\n", + "test_images = np.expand_dims(test_images, axis=-1)\n", + "\n", + "# Set model parameters\n", + "batch_size = 100\n", + "learning_rate = 0.005\n", + "evaluation_size = 500\n", + "image_width = train_images.shape[1]\n", + "image_height = train_images.shape[2]\n", + "target_size = np.max(train_labels) + 1\n", + "num_channels = 1 # greyscale = 1 channel\n", + "generations = 500\n", + "eval_every = 5\n", + "conv1_features = 25\n", + "conv2_features = 50\n", + "max_pool_size1 = 2 # NxN window for 1st max pool layer\n", + "max_pool_size2 = 2 # NxN window for 2nd max pool layer\n", + "fully_connected_size1 = 100\n", + "\n", + "# Define the model\n", + "model = tf.keras.Sequential([\n", + " tf.keras.layers.Conv2D(conv1_features, (4, 4), activation='relu', input_shape=(image_width, image_height, num_channels)),\n", + " tf.keras.layers.MaxPooling2D((max_pool_size1, max_pool_size1)),\n", + " tf.keras.layers.Conv2D(conv2_features, (4, 4), activation='relu'),\n", + " tf.keras.layers.MaxPooling2D((max_pool_size2, max_pool_size2)),\n", + " tf.keras.layers.Flatten(),\n", + " tf.keras.layers.Dense(fully_connected_size1, activation='relu'),\n", + " tf.keras.layers.Dense(target_size)\n", + "])\n", + "\n", + "# Compile the model\n", + "model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9),\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "# Train the model\n", + "train_loss = []\n", + "train_acc = []\n", + "test_acc = []\n", + "for i in 
range(generations):\n", + " rand_index = np.random.choice(len(train_images), size=batch_size)\n", + " rand_x = train_images[rand_index]\n", + " rand_y = train_labels[rand_index]\n", + " \n", + " history = model.train_on_batch(rand_x, rand_y)\n", + " temp_train_loss, temp_train_acc = history[0], history[1]\n", + " \n", + " if (i+1) % eval_every == 0:\n", + " eval_index = np.random.choice(len(test_images), size=evaluation_size)\n", + " eval_x = test_images[eval_index]\n", + " eval_y = test_labels[eval_index]\n", + " \n", + " test_loss, temp_test_acc = model.evaluate(eval_x, eval_y, verbose=0)\n", + " \n", + " # Record and print results\n", + " train_loss.append(temp_train_loss)\n", + " train_acc.append(temp_train_acc)\n", + " test_acc.append(temp_test_acc)\n", + " acc_and_loss = [(i+1), temp_train_loss, temp_train_acc * 100, temp_test_acc * 100]\n", + " acc_and_loss = [np.round(x, 2) for x in acc_and_loss]\n", + " print('Generation # {}. Train Loss: {:.2f}. Train Acc (Test Acc): {:.2f}% ({:.2f}%)'.format(*acc_and_loss))\n", + "\n", + "# Plot loss over time\n", + "plt.plot(range(0, generations, eval_every), train_loss, 'k-')\n", + "plt.title('Softmax Loss per Generation')\n", + "plt.xlabel('Generation')\n", + "plt.ylabel('Softmax Loss')\n", + "plt.show()\n", + "\n", + "# Plot train and test accuracy\n", + "plt.plot(range(0, generations, eval_every), train_acc, 'k-', label='Train Set Accuracy')\n", + "plt.plot(range(0, generations, eval_every), test_acc, 'r--', label='Test Set Accuracy')\n", + "plt.title('Train and Test Accuracy')\n", + "plt.xlabel('Generation')\n", + "plt.ylabel('Accuracy')\n", + "plt.legend(loc='lower right')\n", + "plt.show()\n", + "\n", + "# Plot some samples\n", + "# Show predictions for the first six training images:\n", + "predictions = model.predict(train_images[:6])\n", + "predictions = np.argmax(predictions, axis=1)\n", + "images = np.squeeze(train_images[:6])\n", + "\n", + "Nrows = 2\n", + "Ncols = 3\n", + "for i in range(6):\n", + " plt.subplot(Nrows, 
Ncols, i+1)\n", + " plt.imshow(np.reshape(images[i], [28, 28]), cmap='Greys_r')\n", + " plt.title('Pred: ' + str(predictions[i]), fontsize=10)\n", + " frame = plt.gca()\n", + " frame.axes.get_xaxis().set_visible(False)\n", + " frame.axes.get_yaxis().set_visible(False)\n", + "plt.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## CIFAR-10\n", + "\n", + "Here we will build a convolutional neural network to classify the `CIFAR-10` images.\n", + "\n", + "The script provided will download and unzip the `CIFAR-10` data. Then it will start training a CNN from scratch. At the end, you should see output similar to the following two graphs.\n", + "\n", + "\n", + "\n", + "Here we see the training loss (left) and the test batch accuracy (right)." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import tarfile\n", + "import tensorflow as tf\n", + "import matplotlib.pyplot as plt\n", + "import urllib.request\n", + "\n", + "# Set model parameters\n", + "batch_size = 128\n", + "data_dir = 'temp'\n", + "output_every = 50\n", + "generations = 200\n", + "eval_every = 500\n", + "image_height = 32\n", + "image_width = 32\n", + "crop_height = 24\n", + "crop_width = 24\n", + "num_channels = 3\n", + "num_targets = 10\n", + "extract_folder = 'cifar-10-batches-bin'\n", + "\n", + "# Load data\n", + "if not os.path.exists(data_dir):\n", + " os.makedirs(data_dir)\n", + "cifar10_url = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'\n", + "\n", + "# Check if file exists, otherwise download it\n", + "data_file = os.path.join(data_dir, 'cifar-10-binary.tar.gz')\n", + "if os.path.isfile(data_file):\n", + " pass\n", + "else:\n", + " # Download file\n", + " def progress(block_num, block_size, total_size):\n", + " progress_info = [cifar10_url, 
float(block_num * block_size) / float(total_size) * 100.0]\n", + " print('\\r Downloading {} - {:.2f}%'.format(*progress_info), end=\"\")\n", + " filepath, _ = urllib.request.urlretrieve(cifar10_url, data_file, progress)\n", + " # Extract file\n", + " tarfile.open(filepath, 'r:gz').extractall(data_dir)\n", + "\n", + "# Load CIFAR-10 dataset\n", + "(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()\n", + "\n", + "# Preprocess the data\n", + "train_images = train_images / 255.0\n", + "test_images = test_images / 255.0\n", + "\n", + "# Crop images\n", + "train_images = tf.image.crop_to_bounding_box(train_images, 4, 4, 24, 24)\n", + "test_images = tf.image.crop_to_bounding_box(test_images, 4, 4, 24, 24)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Flatten labels to 1-D integer arrays\n", + "train_labels = train_labels.flatten()\n", + "test_labels = test_labels.flatten()\n", + "\n", + "# Define the model architecture\n", + "model = tf.keras.Sequential([\n", + " tf.keras.layers.Conv2D(64, (5, 5), activation='relu', input_shape=(crop_height, crop_width, num_channels)),\n", + " tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),\n", + " tf.keras.layers.Conv2D(64, (5, 5), activation='relu'),\n", + " tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),\n", + " tf.keras.layers.Flatten(),\n", + " tf.keras.layers.Dense(384, activation='relu'),\n", + " tf.keras.layers.Dense(192, activation='relu'),\n", + " tf.keras.layers.Dense(num_targets)\n", + "])\n", + "\n", + "# Define loss function\n", + "loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n", + "\n", + "# Create accuracy metric\n", + "accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()\n", + "\n", + "# Create optimizer\n", + "optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)\n", + "\n", + "# Compile the model\n", + 
"model.compile(optimizer=optimizer, loss=loss_fn, metrics=[accuracy_metric])\n", + "\n", + "# Train the model\n", + "history = model.fit(train_images, train_labels, batch_size=batch_size, epochs=generations, \n", + " validation_data=(test_images, test_labels), verbose=1)\n", + "\n", + "# Evaluate the model\n", + "test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n", + "\n", + "# Print loss and accuracy\n", + "print('Test Loss:', test_loss)\n", + "print('Test Accuracy:', test_accuracy)\n", + "\n", + "# Plot training loss per epoch\n", + "plt.plot(history.history['loss'], 'k-')\n", + "plt.title('Softmax Loss per Epoch')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Softmax Loss')\n", + "plt.show()\n", + "\n", + "# Plot test (validation) accuracy per epoch\n", + "plt.plot(history.history['val_sparse_categorical_accuracy'], 'k-')\n", + "plt.title('Test Accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to fine-tune current CNN architectures?" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The script provided in this section downloads the CIFAR-10 data and sorts it into the folder structure needed to run it through the TensorFlow fine-tuning tutorial. 
The script should create the following folder structure.\n", + "\n", + "-train_dir\n", + " |--airplane\n", + " |--automobile\n", + " |--bird\n", + " |--cat\n", + " |--deer\n", + " |--dog\n", + " |--frog\n", + " |--horse\n", + " |--ship\n", + " |--truck\n", + "-validation_dir\n", + " |--airplane\n", + " |--automobile\n", + " |--bird\n", + " |--cat\n", + " |--deer\n", + " |--dog\n", + " |--frog\n", + " |--horse\n", + " |--ship\n", + " |--truck\n", + "\n", + " ### Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# In this script, we download the CIFAR-10 images and\n", + "# transform/save them in the Inception Retraining Format\n", + "#\n", + "# The end purpose of the files is for re-training the\n", + "# Google Inception tensorflow model to work on the CIFAR-10.\n", + "\n", + "import os\n", + "import tarfile\n", + "import pickle as cPickle\n", + "import numpy as np\n", + "import urllib.request\n", + "import imageio\n", + "from tensorflow.python.framework import ops\n", + "ops.reset_default_graph()\n", + "\n", + "cifar_link = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'\n", + "data_dir = 'temp'\n", + "if not os.path.isdir(data_dir):\n", + " os.makedirs(data_dir)\n", + "\n", + "# Download tar file\n", + "target_file = os.path.join(data_dir, 'cifar-10-python.tar.gz')\n", + "if not os.path.isfile(target_file):\n", + " print('CIFAR-10 file not found. 
Downloading CIFAR data (Size = 163MB)')\n", + " print('This may take a few minutes, please wait.')\n", + " filename, headers = urllib.request.urlretrieve(cifar_link, target_file)\n", + "\n", + "# Extract into memory\n", + "tar = tarfile.open(target_file)\n", + "tar.extractall(path=data_dir)\n", + "tar.close()\n", + "objects = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "\n", + "# Create train image folders\n", + "train_folder = 'train_dir'\n", + "if not os.path.isdir(os.path.join(data_dir, train_folder)):\n", + " for i in range(10):\n", + " folder = os.path.join(data_dir, train_folder, objects[i])\n", + " os.makedirs(folder)\n", + "# Create test image folders\n", + "test_folder = 'validation_dir'\n", + "if not os.path.isdir(os.path.join(data_dir, test_folder)):\n", + " for i in range(10):\n", + " folder = os.path.join(data_dir, test_folder, objects[i])\n", + " os.makedirs(folder)\n", + "\n", + "# Extract images accordingly\n", + "data_location = os.path.join(data_dir, 'cifar-10-batches-py')\n", + "train_names = ['data_batch_' + str(x) for x in range(1,6)]\n", + "test_names = ['test_batch']\n", + "\n", + "\n", + "def load_batch_from_file(file):\n", + " file_conn = open(file, 'rb')\n", + " image_dictionary = cPickle.load(file_conn, encoding='latin1')\n", + " file_conn.close()\n", + " return image_dictionary\n", + "\n", + "\n", + "def save_images_from_dict(image_dict, folder='data_dir'):\n", + " # image_dict.keys() = 'labels', 'filenames', 'data', 'batch_label'\n", + " for ix, label in enumerate(image_dict['labels']):\n", + " folder_path = os.path.join(data_dir, folder, objects[label])\n", + " filename = image_dict['filenames'][ix]\n", + " # Transform image data\n", + " image_array = image_dict['data'][ix]\n", + " image_array.resize([3, 32, 32])\n", + " # Save image using imageio\n", + " output_location = os.path.join(folder_path, filename)\n", + " # Ensure the pixel values are in the range [0, 255]\n", + " 
image_array = np.clip(image_array, 0, 255).astype(np.uint8)\n", + " imageio.imwrite(output_location, image_array.transpose(1, 2, 0))\n", + "\n", + "# Sort train images\n", + "for file in train_names:\n", + " print('Saving images from file: {}'.format(file))\n", + " file_location = os.path.join(data_dir, 'cifar-10-batches-py', file)\n", + " image_dict = load_batch_from_file(file_location)\n", + " save_images_from_dict(image_dict, folder=train_folder)\n", + "\n", + "# Sort test images\n", + "for file in test_names:\n", + " print('Saving images from file: {}'.format(file))\n", + " file_location = os.path.join(data_dir, 'cifar-10-batches-py', file)\n", + " image_dict = load_batch_from_file(file_location)\n", + " save_images_from_dict(image_dict, folder=test_folder)\n", + " \n", + "# Create labels file\n", + "cifar_labels_file = os.path.join(data_dir,'cifar10_labels.txt')\n", + "print('Writing labels file, {}'.format(cifar_labels_file))\n", + "with open(cifar_labels_file, 'w') as labels_file:\n", + " for item in objects:\n", + " labels_file.write(\"{}\\n\".format(item))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stylenet / Neural-Style\n", + "\n", + "The purpose of this script is to illustrate how to do stylenet in TensorFlow. 
We reference the following [paper](https://arxiv.org/abs/1508.06576) for this algorithm.\n", + "\n", + "There are some prerequisites:\n", + "\n", + "- Download the `VGG-verydeep-19.mat` file.\n", + "- Download two images, a style image and a content image, for the algorithm to blend.\n", + "\n", + "The style image is shown below.\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg\n", + "---\n", + "name: Style image: starry night\n", + ":::\n", + "\n", + "The content image is shown below.\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg\n", + "---\n", + "name: Content image: book cover\n", + ":::\n", + "\n", + "The final result looks like this:\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/05_stylenet_ex.png\n", + "---\n", + "name: stylenet final result\n", + ":::\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# We use two images, an original image and a style image\n", + "# and try to make the original image in the style of the style image.\n", + "#\n", + "# Reference paper:\n", + "# https://arxiv.org/abs/1508.06576\n", + "#\n", + "# Need to download the model 'imagenet-vgg-verydeep-19.mat' from:\n", + "# http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat\n", + "\n", + "import os\n", + "import scipy.io\n", + "import scipy.misc\n", + "import imageio\n", + "from skimage.transform import resize\n", + "from operator import mul\n", + "from functools import reduce\n", + "from PIL import Image\n", + "import numpy as np\n", + "import requests\n", + "import tensorflow.compat.v1 as tf\n", + "tf.disable_eager_execution()\n", + "from tensorflow.python.framework import ops\n", + 
"ops.reset_default_graph()\n", + "\n", + "# URLs\n", + "original_image_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg'\n", + "style_image_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg'\n", + "vgg_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/deep-learning/cnn/imagenet-vgg-verydeep-19.mat'\n", + "\n", + "# Local directories\n", + "data_dir = 'temp'\n", + "vgg_dir = os.path.join(data_dir, 'VGG')\n", + "if not os.path.exists(vgg_dir):\n", + " os.makedirs(vgg_dir)\n", + "\n", + "# Function to download and save a file\n", + "def download_file(url, directory):\n", + " response = requests.get(url)\n", + " filename = url.split('/')[-1]\n", + " filepath = os.path.join(directory, filename)\n", + " with open(filepath, 'wb') as f:\n", + " f.write(response.content)\n", + " return filepath\n", + "\n", + "# Download images and VGG Network\n", + "original_image_path = download_file(original_image_url, data_dir)\n", + "style_image_path = download_file(style_image_url, data_dir)\n", + "vgg_path = download_file(vgg_url, vgg_dir)\n", + "\n", + "# Load images using PIL and convert to NumPy arrays\n", + "original_image = Image.open(original_image_path)\n", + "style_image = Image.open(style_image_path)\n", + "original_image = np.array(original_image)\n", + "style_image = np.array(style_image)\n", + "\n", + "# Default Arguments\n", + "original_image_weight = 5.0\n", + "style_image_weight = 500.0\n", + "regularization_weight = 100\n", + "learning_rate = 10\n", + "generations = 100\n", + "output_generations = 25\n", + "beta1 = 0.9\n", + "beta2 = 0.999\n", + "\n", + "# Get shape of target and make the style image the same\n", + "target_shape = original_image.shape\n", + "style_image = resize(style_image, target_shape)\n", + "\n", + "# VGG-19 Layer Setup\n", + "# From paper\n", + "vgg_layers = ['conv1_1', 'relu1_1',\n", + " 'conv1_2', 'relu1_2', 'pool1',\n", + 
" 'conv2_1', 'relu2_1',\n", + " 'conv2_2', 'relu2_2', 'pool2',\n", + " 'conv3_1', 'relu3_1',\n", + " 'conv3_2', 'relu3_2',\n", + " 'conv3_3', 'relu3_3',\n", + " 'conv3_4', 'relu3_4', 'pool3',\n", + " 'conv4_1', 'relu4_1',\n", + " 'conv4_2', 'relu4_2',\n", + " 'conv4_3', 'relu4_3',\n", + " 'conv4_4', 'relu4_4', 'pool4',\n", + " 'conv5_1', 'relu5_1',\n", + " 'conv5_2', 'relu5_2',\n", + " 'conv5_3', 'relu5_3',\n", + " 'conv5_4', 'relu5_4']\n", + "\n", + "\n", + "# Extract weights and matrix means\n", + "def extract_net_info(path_to_params):\n", + " vgg_data = scipy.io.loadmat(path_to_params)\n", + " normalization_matrix = vgg_data['normalization'][0][0][0]\n", + " mat_mean = np.mean(normalization_matrix, axis=(0, 1))\n", + " network_weights = vgg_data['layers'][0]\n", + " return mat_mean, network_weights\n", + " \n", + "\n", + "# Create the VGG-19 Network\n", + "def vgg_network(network_weights, init_image):\n", + " network = {}\n", + " image = init_image\n", + "\n", + " for i, layer in enumerate(vgg_layers):\n", + " if layer[0] == 'c':\n", + " weights, bias = network_weights[i][0][0][0][0]\n", + " weights = np.transpose(weights, (1, 0, 2, 3))\n", + " bias = bias.reshape(-1)\n", + " conv_layer = tf.nn.conv2d(image, tf.constant(weights), (1, 1, 1, 1), 'SAME')\n", + " image = tf.nn.bias_add(conv_layer, bias)\n", + " elif layer[0] == 'r':\n", + " image = tf.nn.relu(image)\n", + " else: # pooling\n", + " image = tf.nn.max_pool(image, (1, 2, 2, 1), (1, 2, 2, 1), 'SAME')\n", + " network[layer] = image\n", + " return network\n", + "\n", + "# Here we define which layers apply to the original or style image\n", + "original_layers = ['relu4_2', 'relu5_2']\n", + "style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1']\n", + "\n", + "# Get network parameters\n", + "normalization_mean, network_weights = extract_net_info(vgg_path)\n", + "\n", + "shape = (1,) + original_image.shape\n", + "style_shape = (1,) + style_image.shape\n", + "original_features = {}\n", + 
"style_features = {}\n", + "\n", + "# Set style weights\n", + "style_weights = {l: 1./(len(style_layers)) for l in style_layers}\n", + "\n", + "# Computer feature layers with original image\n", + "g_original = tf.Graph()\n", + "with g_original.as_default(), tf.Session() as sess1:\n", + " image = tf.placeholder('float', shape=shape)\n", + " vgg_net = vgg_network(network_weights, image)\n", + " original_minus_mean = original_image - normalization_mean\n", + " original_norm = np.array([original_minus_mean])\n", + " for layer in original_layers:\n", + " original_features[layer] = vgg_net[layer].eval(feed_dict={image: original_norm})\n", + "\n", + "# Get style image network\n", + "g_style = tf.Graph()\n", + "with g_style.as_default(), tf.Session() as sess2:\n", + " image = tf.placeholder('float', shape=style_shape)\n", + " vgg_net = vgg_network(network_weights, image)\n", + " style_minus_mean = style_image - normalization_mean\n", + " style_norm = np.array([style_minus_mean])\n", + " for layer in style_layers:\n", + " features = vgg_net[layer].eval(feed_dict={image: style_norm})\n", + " features = np.reshape(features, (-1, features.shape[3]))\n", + " gram = np.matmul(features.T, features) / features.size\n", + " style_features[layer] = gram\n", + "\n", + "# Make Combined Image via loss function\n", + "with tf.Graph().as_default():\n", + " # Get network parameters\n", + " initial = tf.random_normal(shape) * 0.256\n", + " init_image = tf.Variable(initial)\n", + " vgg_net = vgg_network(network_weights, init_image)\n", + "\n", + " # Loss from Original Image\n", + " original_layers_w = {'relu4_2': 0.5, 'relu5_2': 0.5}\n", + " original_loss = 0\n", + " for o_layer in original_layers:\n", + " temp_original_loss = original_layers_w[o_layer] * original_image_weight *\\\n", + " (2 * tf.nn.l2_loss(vgg_net[o_layer] - original_features[o_layer]))\n", + " original_loss += (temp_original_loss / original_features[o_layer].size)\n", + "\n", + " # Loss from Style Image\n", + " style_loss 
= 0\n", + " style_losses = []\n", + " for style_layer in style_layers:\n", + " layer = vgg_net[style_layer]\n", + " feats, height, width, channels = layer.get_shape().as_list()\n", + " size = height * width * channels\n", + " features = tf.reshape(layer, (-1, channels))\n", + " style_gram_matrix = tf.matmul(tf.transpose(features), features) / size\n", + " style_expected = style_features[style_layer]\n", + " style_losses.append(style_weights[style_layer] * 2 *\n", + " tf.nn.l2_loss(style_gram_matrix - style_expected) /\n", + " style_expected.size)\n", + " style_loss += style_image_weight * tf.reduce_sum(style_losses)\n", + "\n", + " # To smooth the results, we add a total variation loss\n", + " total_var_x = reduce(mul, init_image[:, 1:, :, :].get_shape().as_list(), 1)\n", + " total_var_y = reduce(mul, init_image[:, :, 1:, :].get_shape().as_list(), 1)\n", + " first_term = regularization_weight * 2\n", + " second_term_numerator = tf.nn.l2_loss(init_image[:, 1:, :, :] - init_image[:, :shape[1]-1, :, :])\n", + " second_term = second_term_numerator / total_var_y\n", + " third_term = (tf.nn.l2_loss(init_image[:, :, 1:, :] - init_image[:, :, :shape[2]-1, :]) / total_var_x)\n", + " total_variation_loss = first_term * (second_term + third_term)\n", + "\n", + " # Combined Loss\n", + " loss = original_loss + style_loss + total_variation_loss\n", + "\n", + " # Declare Optimization Algorithm\n", + " optimizer = tf.train.AdamOptimizer(learning_rate, beta1, beta2)\n", + " train_step = optimizer.minimize(loss)\n", + "\n", + " # Initialize variables and start training\n", + " with tf.Session() as sess:\n", + " tf.global_variables_initializer().run()\n", + " for i in range(generations):\n", + "\n", + " train_step.run()\n", + "\n", + " # Print update and save temporary output\n", + " if (i+1) % output_generations == 0:\n", + " print('Generation {} out of {}, loss: {}'.format(i + 1, generations,sess.run(loss)))\n", + "\n", + " image_eval = init_image.eval(session=sess)\n", + 
" image_eval = image_eval.reshape(shape[1:]) # Make sure the shape is correct\n", + " image_eval += normalization_mean # Add the mean back\n", + " image_eval = np.clip(image_eval, 0, 255) # Keep values within 0 to 255\n", + " image_eval = image_eval.astype(np.uint8) # Convert to uint8\n", + "\n", + "# Save the image\n", + "output_file = 'temp_output_{}.jpg'.format(i)\n", + "imageio.imwrite(output_file, image_eval)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deepdream in TensorFlow\n", + "Note: There is no new code in this script. It originates from the TensorFlow tutorial located here. However, this code is modified slightly to run on Python 3. The code is also commented very heavily to explain, line-by-line, what occurs in the deepdream demo.\n", + "\n", + "Here are some potential outputs.\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/06_deepdream_ex.png\n", + "---\n", + "name: Deepdream outputs\n", + ":::" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Using TensorFlow for Deep Dream\n", + "#---------------------------------------\n", + "# From: Alexander Mordvintsev\n", + "# --https://www.tensorflow.org/tutorials/generative/deepdream\n", + "#\n", + "# Because this code uses TensorFlow 2.x, you may need to restart the kernel for it to run successfully." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "import matplotlib as mpl\n", + "\n", + "import IPython.display as display\n", + "import PIL.Image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Download an image and read it into a NumPy array.\n", + "def download(url, max_dim=None):\n", + " name = url.split('/')[-1]\n", + " image_path = tf.keras.utils.get_file(name, origin=url)\n", + " img = PIL.Image.open(image_path)\n", + " if max_dim:\n", + " img.thumbnail((max_dim, max_dim))\n", + " return np.array(img)\n", + "\n", + "# Normalize an image\n", + "def deprocess(img):\n", + " img = 255*(img + 1.0)/2.0\n", + " return tf.cast(img, tf.uint8)\n", + "\n", + "# Display an image\n", + "def show(img):\n", + " display.display(PIL.Image.fromarray(np.array(img)))\n", + "\n", + "\n", + "# Downsizing the image makes it easier to work with.\n", + "original_img = download(url, max_dim=500)\n", + "show(original_img)\n", + "display.display(display.HTML('Image cc-by: Von.grzanka'))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Maximize the activations of these layers\n", + "names = ['mixed3', 'mixed5']\n", + "layers = [base_model.get_layer(name).output for name in names]\n", + "\n", + "# 
Create the feature extraction model\n", + "dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def calc_loss(img, model):\n", + " # Pass forward the image through the model to retrieve the activations.\n", + " # Converts the image into a batch of size 1.\n", + " img_batch = tf.expand_dims(img, axis=0)\n", + " layer_activations = model(img_batch)\n", + " if len(layer_activations) == 1:\n", + " layer_activations = [layer_activations]\n", + "\n", + " losses = []\n", + " for act in layer_activations:\n", + " loss = tf.math.reduce_mean(act)\n", + " losses.append(loss)\n", + "\n", + " return tf.reduce_sum(losses)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class DeepDream(tf.Module):\n", + " def __init__(self, model):\n", + " self.model = model\n", + "\n", + " @tf.function(\n", + " input_signature=(\n", + " tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),\n", + " tf.TensorSpec(shape=[], dtype=tf.int32),\n", + " tf.TensorSpec(shape=[], dtype=tf.float32),)\n", + " )\n", + " def __call__(self, img, steps, step_size):\n", + " print(\"Tracing\")\n", + " loss = tf.constant(0.0)\n", + " for n in tf.range(steps):\n", + " with tf.GradientTape() as tape:\n", + " # This needs gradients relative to `img`\n", + " # `GradientTape` only watches `tf.Variable`s by default\n", + " tape.watch(img)\n", + " loss = calc_loss(img, self.model)\n", + "\n", + " # Calculate the gradient of the loss with respect to the pixels of the input image.\n", + " gradients = tape.gradient(loss, img)\n", + "\n", + " # Normalize the gradients.\n", + " gradients /= tf.math.reduce_std(gradients) + 1e-8 \n", + " \n", + " # In gradient ascent, the \"loss\" is maximized so that the input image increasingly \"excites\" the layers.\n", + " # You can update the image by directly adding the gradients (because 
they're the same shape!)\n", + " img = img + gradients*step_size\n", + " img = tf.clip_by_value(img, -1, 1)\n", + "\n", + " return loss, img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "deepdream = DeepDream(dream_model)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def run_deep_dream_simple(img, steps=100, step_size=0.01):\n", + " # Convert from uint8 to the range expected by the model.\n", + " img = tf.keras.applications.inception_v3.preprocess_input(img)\n", + " img = tf.convert_to_tensor(img)\n", + " step_size = tf.convert_to_tensor(step_size)\n", + " steps_remaining = steps\n", + " step = 0\n", + " while steps_remaining:\n", + " if steps_remaining>100:\n", + " run_steps = tf.constant(100)\n", + " else:\n", + " run_steps = tf.constant(steps_remaining)\n", + " steps_remaining -= run_steps\n", + " step += run_steps\n", + "\n", + " loss, img = deepdream(img, run_steps, tf.constant(step_size))\n", + " \n", + " display.clear_output(wait=True)\n", + " show(deprocess(img))\n", + " print (\"Step {}, loss {}\".format(step, loss))\n", + "\n", + "\n", + " result = deprocess(img)\n", + " display.clear_output(wait=True)\n", + " show(result)\n", + "\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dream_img = run_deep_dream_simple(img=original_img, \n", + " steps=100, step_size=0.01)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "start = time.time()\n", + "\n", + "OCTAVE_SCALE = 1.30\n", + "\n", + "img = tf.constant(np.array(original_img))\n", + "base_shape = tf.shape(img)[:-1]\n", + "float_base_shape = tf.cast(base_shape, tf.float32)\n", + "\n", + "for n in range(-2, 3):\n", + " new_shape = tf.cast(float_base_shape*(OCTAVE_SCALE**n), tf.int32)\n", + "\n", + " 
img = tf.image.resize(img, new_shape).numpy()\n", + "\n", + " img = run_deep_dream_simple(img=img, steps=50, step_size=0.01)\n", + "\n", + "display.clear_output(wait=True)\n", + "img = tf.image.resize(img, base_shape)\n", + "img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)\n", + "show(img)\n", + "\n", + "end = time.time()\n", + "end-start" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Your turn! 🚀\n", + "\n", + "TBD." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Self study\n", + "\n", + "You can refer to those YouTube videos for further study:\n", + "\n", + "- [Convolutional Neural Networks (CNNs) explained, by deeplizard](https://www.youtube.com/watch?v=YRhxdVk_sIs)\n", + "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Research trend\n", + "\n", + "State of the Art Convolutional Neural Networks (CNNs) Explained | Deep Learning in 2020:\n", + "\n", + "
\n", + " \n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Acknowledgments\n", + "\n", + "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). It inspires the majority of the content in this chapter.\n", + "\n", + "---\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.md b/open-machine-learning-jupyter-book/deep-learning/cnn.md deleted file mode 100644 index 8f996dc028..0000000000 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.md +++ /dev/null @@ -1,1205 +0,0 @@ ---- -jupytext: - cell_metadata_filter: -all - formats: md:myst - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.11.5 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - - -# Convolutional Neural Networks - -```{epigraph} -Thanks to Convolutional Neural Network, computer vision is working far better than just two years ago, and this is enabling numerous exciting applications ranging from safe autonomous driving, to accurate face recognition, to automatic reading of radiology images. - --- Andrew Ng -``` - -Convolutional Neural Networks (CNNs) are responsible for the latest major breakthroughs in image recognition in the past few years. - -In mathematics, a convolution is a function that is applied over the output of another function. In our case, we will consider applying a matrix multiplication (filter) across an image. 
See the diagram below for an example of how this may work. - -:::{figure-md} 01_intro_cnn-dl - - -Illustration of matrix multiplication (filter) in CNN {cite}`reluwiki` -::: - -
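To make the filter idea concrete, here is a minimal sketch in plain Python, assuming a single-channel image, stride 1, and no padding. The function name `convolve2d_valid` and the sample `image` and `kernel` values are illustrative and do not come from the original chapter. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the filter element-wise with the image patch, then sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 image and 2x2 filter, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # a 3x3 feature map
```

A 4x4 input and a 2x2 filter give a 3x3 output because the filter fits in three positions along each axis. The `padding='SAME'` option used in the scripts later in this chapter instead pads the input so that, at stride 1, the output keeps the input's height and width.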


- -CNNs generally follow a common structure. The main convolutional setup is (input array) -> (convolutional filter layer) -> (pooling) -> (activation layer). The above diagram depicts how a convolutional layer may create one feature. Generally, filters are multidimensional and end up creating many features. It is also common to have several separate filters of different sizes acting on the same layer. After the convolutional filter, it is common to apply a pooling layer. This pooling may be max pooling, average pooling, or another aggregation. A key point is that the pooling layer has no trainable parameters, yet it decreases the layer size. See the diagram below for an example of max pooling. - -:::{figure-md} 01_intro_cnn2-dl - - -Illustration of max pooling -::: - -After the max pooling, there is generally an activation layer. One of the more common activation layers is the ReLU (Rectified Linear Unit) {cite}`reluwiki`. - -## MNIST handwritten digits - -Here we illustrate how to use a simple CNN with two convolutional layers, followed by two fully connected layers, to predict the MNIST handwritten digits. - -```{note} -There is good reason why this dataset is used like the 'hello world' of image recognition: it is fairly compact while having a decent amount of training, test, and validation data. It only has one channel (grayscale) and only ten possible outputs (0-9). -``` - -When the script is done training the model, you should see output similar to the following graphs. - -:::{figure-md} 02_cnn1_loss_acc-dl - - -Train MNIST dataset with CNN: accuracy and loss -::: - -Training and test loss (left) and test batch accuracy (right). - -:::{figure-md} 02_cnn1_mnist_output-dl - - -Train MNIST dataset with CNN: prediction output -::: - -A random set of 6 digits with actual and predicted labels. You can see a prediction failure in the lower right box. - -
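The pooling and activation steps described earlier in this section can be sketched in plain Python. This is a minimal illustration, assuming non-overlapping windows (stride equal to the window size); the names `max_pool` and `relu` and the sample values are hypothetical and are not taken from the script that follows.

```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride equals size)."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window and keep the largest.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = [
    [1, -3, 2, 0],
    [-1, 5, -2, 4],
    [0, 2, -6, -1],
    [3, -4, -2, -5],
]

pooled = max_pool(feature_map, size=2)  # 4x4 -> 2x2, no parameters involved
activated = relu(pooled)
print(pooled)
print(activated)
```

Because both ReLU and the maximum are monotonic, applying ReLU before or after max pooling gives the same result. The MNIST script below uses the Conv-ReLU-MaxPool order.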


- -


- -### Code - -```{code-cell} -# In this example, we will download the MNIST handwritten -# digits and create a simple CNN to predict the -# digit category (0-9) -# Note: this script uses the TensorFlow 1.x API (tf.Session, tf.placeholder). - -import matplotlib.pyplot as plt -import numpy as np -import tensorflow as tf -from tensorflow.examples.tutorials.mnist import input_data -from tensorflow.python.framework import ops -ops.reset_default_graph() - -# Start a graph session -sess = tf.Session() - -# Load data -data_dir = 'temp' -mnist = input_data.read_data_sets(data_dir, one_hot=False) - -# Convert images into 28x28 (they are downloaded as 1x784) -train_xdata = np.array([np.reshape(x, (28, 28)) for x in mnist.train.images]) -test_xdata = np.array([np.reshape(x, (28, 28)) for x in mnist.test.images]) - -# Labels stay as integer class indices (one_hot=False above) -train_labels = mnist.train.labels -test_labels = mnist.test.labels - -# Set model parameters -batch_size = 100 -learning_rate = 0.005 -evaluation_size = 500 -image_width = train_xdata[0].shape[0] -image_height = train_xdata[0].shape[1] -target_size = np.max(train_labels) + 1 -num_channels = 1 # greyscale = 1 channel -generations = 500 -eval_every = 5 -conv1_features = 25 -conv2_features = 50 -max_pool_size1 = 2 # NxN window for 1st max pool layer -max_pool_size2 = 2 # NxN window for 2nd max pool layer -fully_connected_size1 = 100 - -# Declare model placeholders -x_input_shape = (batch_size, image_width, image_height, num_channels) -x_input = tf.placeholder(tf.float32, shape=x_input_shape) -y_target = tf.placeholder(tf.int32, shape=(batch_size)) -eval_input_shape = (evaluation_size, image_width, image_height, num_channels) -eval_input = tf.placeholder(tf.float32, shape=eval_input_shape) -eval_target = tf.placeholder(tf.int32, shape=(evaluation_size)) - -# Declare model parameters -conv1_weight = tf.Variable(tf.truncated_normal([4, 4, num_channels, conv1_features], - stddev=0.1, dtype=tf.float32)) -conv1_bias = tf.Variable(tf.zeros([conv1_features], dtype=tf.float32)) - -conv2_weight = 
tf.Variable(tf.truncated_normal([4, 4, conv1_features, conv2_features], - stddev=0.1, dtype=tf.float32)) -conv2_bias = tf.Variable(tf.zeros([conv2_features], dtype=tf.float32)) - -# fully connected variables -resulting_width = image_width // (max_pool_size1 * max_pool_size2) -resulting_height = image_height // (max_pool_size1 * max_pool_size2) -full1_input_size = resulting_width * resulting_height * conv2_features -full1_weight = tf.Variable(tf.truncated_normal([full1_input_size, fully_connected_size1], - stddev=0.1, dtype=tf.float32)) -full1_bias = tf.Variable(tf.truncated_normal([fully_connected_size1], stddev=0.1, dtype=tf.float32)) -full2_weight = tf.Variable(tf.truncated_normal([fully_connected_size1, target_size], - stddev=0.1, dtype=tf.float32)) -full2_bias = tf.Variable(tf.truncated_normal([target_size], stddev=0.1, dtype=tf.float32)) - - -# Initialize Model Operations -def my_conv_net(conv_input_data): - # First Conv-ReLU-MaxPool Layer - conv1 = tf.nn.conv2d(conv_input_data, conv1_weight, strides=[1, 1, 1, 1], padding='SAME') - relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_bias)) - max_pool1 = tf.nn.max_pool(relu1, ksize=[1, max_pool_size1, max_pool_size1, 1], - strides=[1, max_pool_size1, max_pool_size1, 1], padding='SAME') - - # Second Conv-ReLU-MaxPool Layer - conv2 = tf.nn.conv2d(max_pool1, conv2_weight, strides=[1, 1, 1, 1], padding='SAME') - relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_bias)) - max_pool2 = tf.nn.max_pool(relu2, ksize=[1, max_pool_size2, max_pool_size2, 1], - strides=[1, max_pool_size2, max_pool_size2, 1], padding='SAME') - - # Transform Output into a 1xN layer for next fully connected layer - final_conv_shape = max_pool2.get_shape().as_list() - final_shape = final_conv_shape[1] * final_conv_shape[2] * final_conv_shape[3] - flat_output = tf.reshape(max_pool2, [final_conv_shape[0], final_shape]) - - # First Fully Connected Layer - fully_connected1 = tf.nn.relu(tf.add(tf.matmul(flat_output, full1_weight), full1_bias)) - - # Second 
Fully Connected Layer - final_model_output = tf.add(tf.matmul(fully_connected1, full2_weight), full2_bias) - - return final_model_output - -model_output = my_conv_net(x_input) -test_model_output = my_conv_net(eval_input) - -# Declare Loss Function (softmax cross entropy) -loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=model_output, labels=y_target)) - -# Create a prediction function -prediction = tf.nn.softmax(model_output) -test_prediction = tf.nn.softmax(test_model_output) - - -# Create accuracy function -def get_accuracy(logits, targets): - batch_predictions = np.argmax(logits, axis=1) - num_correct = np.sum(np.equal(batch_predictions, targets)) - return 100. * num_correct/batch_predictions.shape[0] - -# Create an optimizer -my_optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9) -train_step = my_optimizer.minimize(loss) - -# Initialize Variables -init = tf.global_variables_initializer() -sess.run(init) - -# Start training loop -train_loss = [] -train_acc = [] -test_acc = [] -for i in range(generations): - rand_index = np.random.choice(len(train_xdata), size=batch_size) - rand_x = train_xdata[rand_index] - rand_x = np.expand_dims(rand_x, 3) - rand_y = train_labels[rand_index] - train_dict = {x_input: rand_x, y_target: rand_y} - - sess.run(train_step, feed_dict=train_dict) - temp_train_loss, temp_train_preds = sess.run([loss, prediction], feed_dict=train_dict) - temp_train_acc = get_accuracy(temp_train_preds, rand_y) - - if (i+1) % eval_every == 0: - eval_index = np.random.choice(len(test_xdata), size=evaluation_size) - eval_x = test_xdata[eval_index] - eval_x = np.expand_dims(eval_x, 3) - eval_y = test_labels[eval_index] - test_dict = {eval_input: eval_x, eval_target: eval_y} - test_preds = sess.run(test_prediction, feed_dict=test_dict) - temp_test_acc = get_accuracy(test_preds, eval_y) - - # Record and print results - train_loss.append(temp_train_loss) - train_acc.append(temp_train_acc) - test_acc.append(temp_test_acc) - 
acc_and_loss = [(i+1), temp_train_loss, temp_train_acc, temp_test_acc] - acc_and_loss = [np.round(x, 2) for x in acc_and_loss] - print('Generation # {}. Train Loss: {:.2f}. Train Acc (Test Acc): {:.2f} ({:.2f})'.format(*acc_and_loss)) - - -# Matplotlib code to plot the loss and accuracies -eval_indices = range(0, generations, eval_every) -# Plot loss over time -plt.plot(eval_indices, train_loss, 'k-') -plt.title('Softmax Loss per Generation') -plt.xlabel('Generation') -plt.ylabel('Softmax Loss') -plt.show() - -# Plot train and test accuracy -plt.plot(eval_indices, train_acc, 'k-', label='Train Set Accuracy') -plt.plot(eval_indices, test_acc, 'r--', label='Test Set Accuracy') -plt.title('Train and Test Accuracy') -plt.xlabel('Generation') -plt.ylabel('Accuracy') -plt.legend(loc='lower right') -plt.show() - -# Plot some samples -# Plot 6 samples from the last training batch: -actuals = rand_y[0:6] -predictions = np.argmax(temp_train_preds, axis=1)[0:6] -images = np.squeeze(rand_x[0:6]) - -Nrows = 2 -Ncols = 3 -for i in range(6): - plt.subplot(Nrows, Ncols, i+1) - plt.imshow(np.reshape(images[i], [28, 28]), cmap='Greys_r') - plt.title('Actual: ' + str(actuals[i]) + ' Pred: ' + str(predictions[i]), - fontsize=10) - frame = plt.gca() - frame.axes.get_xaxis().set_visible(False) - frame.axes.get_yaxis().set_visible(False) -``` - -## CIFAR-10 - -```{seealso} -Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. [CIFAR-10 and CIFAR-100 datasets](https://www.cs.toronto.edu/~kriz/cifar.html). -``` - -Here we will build a convolutional neural network to classify the `CIFAR-10` images. - -The script provided will download and unzip the `CIFAR-10` data. Then it will start training a CNN from scratch. When training finishes, you should see output similar to the following two graphs. - -:::{figure-md} 03_cnn2_loss_acc-dl - - -Train `CIFAR-10` dataset with CNN: accuracy and loss -::: - -Here we see the training loss (left) and the test batch accuracy (right). 
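Before reading the script, it may help to see how one record of the binary `CIFAR-10` files is laid out: one label byte followed by 3072 pixel bytes, stored as a full red plane, then green, then blue, each in row-major order. The sketch below is a plain-Python illustration of that layout using a synthetic record. The function `parse_record` and the constant names are hypothetical and do not appear in the script, which performs the same reshape and transpose with TensorFlow ops.

```python
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
NUM_CHANNELS = 3
IMAGE_BYTES = IMAGE_HEIGHT * IMAGE_WIDTH * NUM_CHANNELS  # 3072 pixel bytes
RECORD_BYTES = 1 + IMAGE_BYTES                            # plus 1 label byte


def parse_record(record):
    """Split one binary record into (label, image[row][col] = (r, g, b))."""
    if len(record) != RECORD_BYTES:
        raise ValueError('expected {} bytes, got {}'.format(RECORD_BYTES, len(record)))
    label = record[0]
    pixels = record[1:]
    plane = IMAGE_HEIGHT * IMAGE_WIDTH  # bytes per colour plane
    # Stored order is [channel, row, col]; rearrange to [row, col, channel].
    image = [
        [
            tuple(pixels[ch * plane + row * IMAGE_WIDTH + col]
                  for ch in range(NUM_CHANNELS))
            for col in range(IMAGE_WIDTH)
        ]
        for row in range(IMAGE_HEIGHT)
    ]
    return label, image


# Synthetic record (not real CIFAR data): label 7, constant colour planes.
fake_record = bytes([7]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, image = parse_record(fake_record)
print(label, image[0][0])
```

The rearrangement from channel-first to channel-last order corresponds to the `tf.transpose(image_extracted, [1, 2, 0])` call in the reader function of the script.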
- -### Code - -```{code-cell} -# In this example, we will download the CIFAR-10 images -# and build a CNN model with local response normalization -# -# CIFAR-10 is composed of 50k training and 10k test -# images that are 32x32. - -import os -import sys -import tarfile -import matplotlib.pyplot as plt -import numpy as np -import tensorflow as tf -from six.moves import urllib -from tensorflow.python.framework import ops -ops.reset_default_graph() - -# Change Directory -try: - abspath = os.path.abspath(__file__) -except NameError: - abspath = os.getcwd() -dname = os.path.dirname(abspath) -os.chdir(dname) - -# Start a graph session -sess = tf.Session() - -# Set model parameters -batch_size = 128 -data_dir = 'temp' -output_every = 50 -generations = 20000 -eval_every = 500 -image_height = 32 -image_width = 32 -crop_height = 24 -crop_width = 24 -num_channels = 3 -num_targets = 10 -extract_folder = 'cifar-10-batches-bin' - -# Exponential Learning Rate Decay Params -learning_rate = 0.1 -lr_decay = 0.1 -num_gens_to_wait = 250. 
- -# Extract model parameters -image_vec_length = image_height * image_width * num_channels -record_length = 1 + image_vec_length # ( + 1 for the 0-9 label) - -# Load data -data_dir = 'temp' -if not os.path.exists(data_dir): - os.makedirs(data_dir) -cifar10_url = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz' - -# Check if file exists, otherwise download it -data_file = os.path.join(data_dir, 'cifar-10-binary.tar.gz') -if os.path.isfile(data_file): - pass -else: - # Download file - def progress(block_num, block_size, total_size): - progress_info = [cifar10_url, float(block_num * block_size) / float(total_size) * 100.0] - print('\r Downloading {} - {:.2f}%'.format(*progress_info), end="") - filepath, _ = urllib.request.urlretrieve(cifar10_url, data_file, progress) - # Extract file - tarfile.open(filepath, 'r:gz').extractall(data_dir) - - -# Define CIFAR reader -def read_cifar_files(filename_queue, distort_images = True): - reader = tf.FixedLengthRecordReader(record_bytes=record_length) - key, record_string = reader.read(filename_queue) - record_bytes = tf.decode_raw(record_string, tf.uint8) - image_label = tf.cast(tf.slice(record_bytes, [0], [1]), tf.int32) - - # Extract image - image_extracted = tf.reshape(tf.slice(record_bytes, [1], [image_vec_length]), - [num_channels, image_height, image_width]) - - # Reshape image - image_uint8image = tf.transpose(image_extracted, [1, 2, 0]) - reshaped_image = tf.cast(image_uint8image, tf.float32) - # Randomly Crop image - final_image = tf.image.resize_image_with_crop_or_pad(reshaped_image, crop_width, crop_height) - - if distort_images: - # Randomly flip the image horizontally, change the brightness and contrast - final_image = tf.image.random_flip_left_right(final_image) - final_image = tf.image.random_brightness(final_image,max_delta=63) - final_image = tf.image.random_contrast(final_image,lower=0.2, upper=1.8) - - # Normalize whitening - final_image = tf.image.per_image_standardization(final_image) - return 
final_image, image_label - - -# Create a CIFAR image pipeline from reader -def input_pipeline(batch_size, train_logical=True): - if train_logical: - files = [os.path.join(data_dir, extract_folder, 'data_batch_{}.bin'.format(i)) for i in range(1,6)] - else: - files = [os.path.join(data_dir, extract_folder, 'test_batch.bin')] - filename_queue = tf.train.string_input_producer(files) - image, label = read_cifar_files(filename_queue) - - # min_after_dequeue defines how big a buffer we will randomly sample - # from -- bigger means better shuffling but slower start up and more - # memory used. - # capacity must be larger than min_after_dequeue and the amount larger - # determines the maximum we will prefetch. Recommendation: - # min_after_dequeue + (num_threads + a small safety margin) * batch_size - min_after_dequeue = 5000 - capacity = min_after_dequeue + 3 * batch_size - example_batch, label_batch = tf.train.shuffle_batch([image, label], - batch_size=batch_size, - capacity=capacity, - min_after_dequeue=min_after_dequeue) - - return example_batch, label_batch - - -# Define the model architecture, this will return logits from images -def cifar_cnn_model(input_images, batch_size, train_logical=True): - def truncated_normal_var(name, shape, dtype): - return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.05))) - def zero_var(name, shape, dtype): - return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0))) - - # First Convolutional Layer - with tf.variable_scope('conv1') as scope: - # Conv_kernel is 5x5 for all 3 colors and we will create 64 features - conv1_kernel = truncated_normal_var(name='conv_kernel1', shape=[5, 5, 3, 64], dtype=tf.float32) - # We convolve across the image with a stride size of 1 - conv1 = tf.nn.conv2d(input_images, conv1_kernel, [1, 1, 1, 1], padding='SAME') - # Initialize and add the bias term - conv1_bias = zero_var(name='conv_bias1', shape=[64], 
dtype=tf.float32) - conv1_add_bias = tf.nn.bias_add(conv1, conv1_bias) - # ReLU element wise - relu_conv1 = tf.nn.relu(conv1_add_bias) - - # Max Pooling - pool1 = tf.nn.max_pool(relu_conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],padding='SAME', name='pool_layer1') - - # Local Response Normalization (parameters from paper) - # paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks - norm1 = tf.nn.lrn(pool1, depth_radius=5, bias=2.0, alpha=1e-3, beta=0.75, name='norm1') - - # Second Convolutional Layer - with tf.variable_scope('conv2') as scope: - # Conv kernel is 5x5, across all prior 64 features and we create 64 more features - conv2_kernel = truncated_normal_var(name='conv_kernel2', shape=[5, 5, 64, 64], dtype=tf.float32) - # Convolve filter across prior output with stride size of 1 - conv2 = tf.nn.conv2d(norm1, conv2_kernel, [1, 1, 1, 1], padding='SAME') - # Initialize and add the bias - conv2_bias = zero_var(name='conv_bias2', shape=[64], dtype=tf.float32) - conv2_add_bias = tf.nn.bias_add(conv2, conv2_bias) - # ReLU element wise - relu_conv2 = tf.nn.relu(conv2_add_bias) - - # Max Pooling - pool2 = tf.nn.max_pool(relu_conv2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool_layer2') - - # Local Response Normalization (parameters from paper) - norm2 = tf.nn.lrn(pool2, depth_radius=5, bias=2.0, alpha=1e-3, beta=0.75, name='norm2') - - # Reshape output into a single matrix for multiplication for the fully connected layers - reshaped_output = tf.reshape(norm2, [batch_size, -1]) - reshaped_dim = reshaped_output.get_shape()[1].value - - # First Fully Connected Layer - with tf.variable_scope('full1') as scope: - # Fully connected layer will have 384 outputs. 
- full_weight1 = truncated_normal_var(name='full_mult1', shape=[reshaped_dim, 384], dtype=tf.float32) - full_bias1 = zero_var(name='full_bias1', shape=[384], dtype=tf.float32) - full_layer1 = tf.nn.relu(tf.add(tf.matmul(reshaped_output, full_weight1), full_bias1)) - - # Second Fully Connected Layer - with tf.variable_scope('full2') as scope: - # Second fully connected layer has 192 outputs. - full_weight2 = truncated_normal_var(name='full_mult2', shape=[384, 192], dtype=tf.float32) - full_bias2 = zero_var(name='full_bias2', shape=[192], dtype=tf.float32) - full_layer2 = tf.nn.relu(tf.add(tf.matmul(full_layer1, full_weight2), full_bias2)) - - # Final Fully Connected Layer -> 10 categories for output (num_targets) - with tf.variable_scope('full3') as scope: - # Final fully connected layer has 10 (num_targets) outputs. - full_weight3 = truncated_normal_var(name='full_mult3', shape=[192, num_targets], dtype=tf.float32) - full_bias3 = zero_var(name='full_bias3', shape=[num_targets], dtype=tf.float32) - final_output = tf.add(tf.matmul(full_layer2, full_weight3), full_bias3) - - return final_output - - -# Loss function -def cifar_loss(logits, targets): - # Get rid of extra dimensions and cast targets into integers - targets = tf.squeeze(tf.cast(targets, tf.int32)) - # Calculate cross entropy from logits and targets - cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=targets) - # Take the average loss across batch size - cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy') - return cross_entropy_mean - - -# Train step -def train_step(loss_value, generation_num): - # Our learning rate is an exponential decay after we wait a fair number of generations - model_learning_rate = tf.train.exponential_decay(learning_rate, generation_num, - num_gens_to_wait, lr_decay, staircase=True) - # Create optimizer - my_optimizer = tf.train.GradientDescentOptimizer(model_learning_rate) - # Initialize train step - train_step = 
my_optimizer.minimize(loss_value) - return train_step - - -# Accuracy function -def accuracy_of_batch(logits, targets): - # Make sure targets are integers and drop extra dimensions - targets = tf.squeeze(tf.cast(targets, tf.int32)) - # Get predicted values by finding which logit is the greatest - batch_predictions = tf.cast(tf.argmax(logits, 1), tf.int32) - # Check if they are equal across the batch - predicted_correctly = tf.equal(batch_predictions, targets) - # Average the 1's and 0's (True's and False's) across the batch size - accuracy = tf.reduce_mean(tf.cast(predicted_correctly, tf.float32)) - return accuracy - -# Get data -print('Getting/Transforming Data.') -# Initialize the data pipeline -images, targets = input_pipeline(batch_size, train_logical=True) -# Get batch test images and targets from pipline -test_images, test_targets = input_pipeline(batch_size, train_logical=False) - -# Declare Model -print('Creating the CIFAR10 Model.') -with tf.variable_scope('model_definition') as scope: - # Declare the training network model - model_output = cifar_cnn_model(images, batch_size) - # This is very important!!! We must set the scope to REUSE the variables, - # otherwise, when we set the test network model, it will create new random - # variables. Otherwise we get random evaluations on the test batches. 
- scope.reuse_variables() - test_output = cifar_cnn_model(test_images, batch_size) - -# Declare loss function -print('Declare Loss Function.') -loss = cifar_loss(model_output, targets) - -# Create accuracy function -accuracy = accuracy_of_batch(test_output, test_targets) - -# Create training operations -print('Creating the Training Operation.') -generation_num = tf.Variable(0, trainable=False) -train_op = train_step(loss, generation_num) - -# Initialize Variables -print('Initializing the Variables.') -init = tf.global_variables_initializer() -sess.run(init) - -# Initialize queue (This queue will feed into the model, so no placeholders necessary) -tf.train.start_queue_runners(sess=sess) - -# Train CIFAR Model -print('Starting Training') -train_loss = [] -test_accuracy = [] -for i in range(generations): - _, loss_value = sess.run([train_op, loss]) - - if (i+1) % output_every == 0: - train_loss.append(loss_value) - output = 'Generation {}: Loss = {:.5f}'.format((i+1), loss_value) - print(output) - - if (i+1) % eval_every == 0: - [temp_accuracy] = sess.run([accuracy]) - test_accuracy.append(temp_accuracy) - acc_output = ' --- Test Accuracy = {:.2f}%.'.format(100.*temp_accuracy) - print(acc_output) - -# Print loss and accuracy -# Matlotlib code to plot the loss and accuracies -eval_indices = range(0, generations, eval_every) -output_indices = range(0, generations, output_every) - -# Plot loss over time -plt.plot(output_indices, train_loss, 'k-') -plt.title('Softmax Loss per Generation') -plt.xlabel('Generation') -plt.ylabel('Softmax Loss') -plt.show() - -# Plot accuracy over time -plt.plot(eval_indices, test_accuracy, 'k-') -plt.title('Test Accuracy') -plt.xlabel('Generation') -plt.ylabel('Accuracy') -plt.show() -``` - -## How to fine-tune current CNN architectures? - -The purpose of the script provided in this section is to download the CIFAR-10 data and sort it out in the proper folder structure for running it through the TensorFlow fine-tuning tutorial. 
The script should create the following folder structure. - -```{code-cell} --train_dir - |--airplane - |--automobile - |--bird - |--cat - |--deer - |--dog - |--frog - |--horse - |--ship - |--truck --validation_dir - |--airplane - |--automobile - |--bird - |--cat - |--deer - |--dog - |--frog - |--horse - |--ship - |--truck -``` - -### Code - -```{code-cell} -# In this script, we download the CIFAR-10 images and -# transform/save them in the Inception Retraining Format -# -# The end purpose of the files is for re-training the -# Google Inception tensorflow model to work on the CIFAR-10. - -import os -import tarfile -import _pickle as cPickle -import numpy as np -import urllib.request -import scipy.misc -from tensorflow.python.framework import ops -ops.reset_default_graph() - -cifar_link = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz' -data_dir = 'temp' -if not os.path.isdir(data_dir): - os.makedirs(data_dir) - -# Download tar file -target_file = os.path.join(data_dir, 'cifar-10-python.tar.gz') -if not os.path.isfile(target_file): - print('CIFAR-10 file not found. 
Downloading CIFAR data (Size = 163MB)') - print('This may take a few minutes, please wait.') - filename, headers = urllib.request.urlretrieve(cifar_link, target_file) - -# Extract into memory -tar = tarfile.open(target_file) -tar.extractall(path=data_dir) -tar.close() -objects = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] - -# Create train image folders -train_folder = 'train_dir' -if not os.path.isdir(os.path.join(data_dir, train_folder)): - for i in range(10): - folder = os.path.join(data_dir, train_folder, objects[i]) - os.makedirs(folder) -# Create test image folders -test_folder = 'validation_dir' -if not os.path.isdir(os.path.join(data_dir, test_folder)): - for i in range(10): - folder = os.path.join(data_dir, test_folder, objects[i]) - os.makedirs(folder) - -# Extract images accordingly -data_location = os.path.join(data_dir, 'cifar-10-batches-py') -train_names = ['data_batch_' + str(x) for x in range(1,6)] -test_names = ['test_batch'] - - -def load_batch_from_file(file): - file_conn = open(file, 'rb') - image_dictionary = cPickle.load(file_conn, encoding='latin1') - file_conn.close() - return image_dictionary - - -def save_images_from_dict(image_dict, folder='data_dir'): - # image_dict.keys() = 'labels', 'filenames', 'data', 'batch_label' - for ix, label in enumerate(image_dict['labels']): - folder_path = os.path.join(data_dir, folder, objects[label]) - filename = image_dict['filenames'][ix] - #Transform image data - image_array = image_dict['data'][ix] - image_array.resize([3, 32, 32]) - # Save image - output_location = os.path.join(folder_path, filename) - scipy.misc.imsave(output_location,image_array.transpose()) - -# Sort train images -for file in train_names: - print('Saving images from file: {}'.format(file)) - file_location = os.path.join(data_dir, 'cifar-10-batches-py', file) - image_dict = load_batch_from_file(file_location) - save_images_from_dict(image_dict, folder=train_folder) - -# Sort test images 
-for file in test_names: - print('Saving images from file: {}'.format(file)) - file_location = os.path.join(data_dir, 'cifar-10-batches-py', file) - image_dict = load_batch_from_file(file_location) - save_images_from_dict(image_dict, folder=test_folder) - -# Create labels file -cifar_labels_file = os.path.join(data_dir,'cifar10_labels.txt') -print('Writing labels file, {}'.format(cifar_labels_file)) -with open(cifar_labels_file, 'w') as labels_file: - for item in objects: - labels_file.write("{}\n".format(item)) -``` - -## Stylenet / Neural-Style - -The purpose of this script is to illustrate how to implement stylenet in TensorFlow. We reference the following [paper](https://arxiv.org/abs/1508.06576) for this algorithm. - -There are some prerequisites: - -- Download the `imagenet-vgg-verydeep-19.mat` file. -- Download two images, a style image and a content image, for the algorithm to blend. - -The style image is below. - -:::{figure-md} starry_night-dl - - -Style image: starry night -::: - -The content image is below. - -:::{figure-md} book_cover-dl - - -Content image: book cover -::: - -The final result looks like this: - -:::{figure-md} 05_stylenet_ex-dl - - -stylenet final result -::: - -### Code - -```{code-cell} -# We use two images, an original image and a style image -# and try to make the original image in the style of the style image. 
-# -# Reference paper: -# https://arxiv.org/abs/1508.06576 -# -# Need to download the model 'imagenet-vgg-verydeep-19.mat' from: -# http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat - -import os -import scipy.io -import scipy.misc -import imageio -from skimage.transform import resize -from operator import mul -from functools import reduce -import numpy as np -import tensorflow as tf -from tensorflow.python.framework import ops -ops.reset_default_graph() - -# Image Files -original_image_file = 'images/book_cover.jpg' -style_image_file = 'images/starry_night.jpg' - -# Saved VGG Network path under the current project dir. -vgg_path = 'imagenet-vgg-verydeep-19.mat' - -# Default Arguments -original_image_weight = 5.0 -style_image_weight = 500.0 -regularization_weight = 100 -learning_rate = 10 -generations = 100 -output_generations = 25 -beta1 = 0.9 -beta2 = 0.999 - -# Read in images -original_image = imageio.imread(original_image_file) -style_image = imageio.imread(style_image_file) - -# Get shape of target and make the style image the same -target_shape = original_image.shape -style_image = resize(style_image, target_shape) - -# VGG-19 Layer Setup -# From paper -vgg_layers = ['conv1_1', 'relu1_1', - 'conv1_2', 'relu1_2', 'pool1', - 'conv2_1', 'relu2_1', - 'conv2_2', 'relu2_2', 'pool2', - 'conv3_1', 'relu3_1', - 'conv3_2', 'relu3_2', - 'conv3_3', 'relu3_3', - 'conv3_4', 'relu3_4', 'pool3', - 'conv4_1', 'relu4_1', - 'conv4_2', 'relu4_2', - 'conv4_3', 'relu4_3', - 'conv4_4', 'relu4_4', 'pool4', - 'conv5_1', 'relu5_1', - 'conv5_2', 'relu5_2', - 'conv5_3', 'relu5_3', - 'conv5_4', 'relu5_4'] - - -# Extract weights and matrix means -def extract_net_info(path_to_params): - vgg_data = scipy.io.loadmat(path_to_params) - normalization_matrix = vgg_data['normalization'][0][0][0] - mat_mean = np.mean(normalization_matrix, axis=(0,1)) - network_weights = vgg_data['layers'][0] - return mat_mean, network_weights - - -# Create the VGG-19 Network -def 
vgg_network(network_weights, init_image): - network = {} - image = init_image - - for i, layer in enumerate(vgg_layers): - if layer[0] == 'c': - weights, bias = network_weights[i][0][0][0][0] - weights = np.transpose(weights, (1, 0, 2, 3)) - bias = bias.reshape(-1) - conv_layer = tf.nn.conv2d(image, tf.constant(weights), (1, 1, 1, 1), 'SAME') - image = tf.nn.bias_add(conv_layer, bias) - elif layer[0] == 'r': - image = tf.nn.relu(image) - else: # pooling - image = tf.nn.max_pool(image, (1, 2, 2, 1), (1, 2, 2, 1), 'SAME') - network[layer] = image - return network - -# Here we define which layers apply to the original or style image -original_layers = ['relu4_2', 'relu5_2'] -style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1'] - -# Get network parameters -normalization_mean, network_weights = extract_net_info(vgg_path) - -shape = (1,) + original_image.shape -style_shape = (1,) + style_image.shape -original_features = {} -style_features = {} - -# Set style weights -style_weights = {l: 1./(len(style_layers)) for l in style_layers} - -# Computer feature layers with original image -g_original = tf.Graph() -with g_original.as_default(), tf.Session() as sess1: - image = tf.placeholder('float', shape=shape) - vgg_net = vgg_network(network_weights, image) - original_minus_mean = original_image - normalization_mean - original_norm = np.array([original_minus_mean]) - for layer in original_layers: - original_features[layer] = vgg_net[layer].eval(feed_dict={image: original_norm}) - -# Get style image network -g_style = tf.Graph() -with g_style.as_default(), tf.Session() as sess2: - image = tf.placeholder('float', shape=style_shape) - vgg_net = vgg_network(network_weights, image) - style_minus_mean = style_image - normalization_mean - style_norm = np.array([style_minus_mean]) - for layer in style_layers: - features = vgg_net[layer].eval(feed_dict={image: style_norm}) - features = np.reshape(features, (-1, features.shape[3])) - gram = np.matmul(features.T, 
features) / features.size - style_features[layer] = gram - -# Make Combined Image via loss function -with tf.Graph().as_default(): - # Get network parameters - initial = tf.random_normal(shape) * 0.256 - init_image = tf.Variable(initial) - vgg_net = vgg_network(network_weights, init_image) - - # Loss from Original Image - original_layers_w = {'relu4_2': 0.5, 'relu5_2': 0.5} - original_loss = 0 - for o_layer in original_layers: - temp_original_loss = original_layers_w[o_layer] * original_image_weight *\ - (2 * tf.nn.l2_loss(vgg_net[o_layer] - original_features[o_layer])) - original_loss += (temp_original_loss / original_features[o_layer].size) - - # Loss from Style Image - style_loss = 0 - style_losses = [] - for style_layer in style_layers: - layer = vgg_net[style_layer] - feats, height, width, channels = [x.value for x in layer.get_shape()] - size = height * width * channels - features = tf.reshape(layer, (-1, channels)) - style_gram_matrix = tf.matmul(tf.transpose(features), features) / size - style_expected = style_features[style_layer] - style_losses.append(style_weights[style_layer] * 2 * - tf.nn.l2_loss(style_gram_matrix - style_expected) / - style_expected.size) - style_loss += style_image_weight * tf.reduce_sum(style_losses) - - # To Smooth the results, we add in total variation loss - total_var_x = reduce(mul, init_image[:, 1:, :, :].get_shape().as_list(), 1) - total_var_y = reduce(mul, init_image[:, :, 1:, :].get_shape().as_list(), 1) - first_term = regularization_weight * 2 - second_term_numerator = tf.nn.l2_loss(init_image[:, 1:, :, :] - init_image[:, :shape[1]-1, :, :]) - second_term = second_term_numerator / total_var_y - third_term = (tf.nn.l2_loss(init_image[:, :, 1:, :] - init_image[:, :, :shape[2]-1, :]) / total_var_x) - total_variation_loss = first_term * (second_term + third_term) - - # Combined Loss - loss = original_loss + style_loss + total_variation_loss - - # Declare Optimization Algorithm - optimizer = tf.train.AdamOptimizer(learning_rate, 
beta1, beta2) - train_step = optimizer.minimize(loss) - - # Initialize variables and start training - with tf.Session() as sess: - tf.global_variables_initializer().run() - for i in range(generations): - - train_step.run() - - # Print update and save temporary output - if (i+1) % output_generations == 0: - print('Generation {} out of {}, loss: {}'.format(i + 1, generations,sess.run(loss))) - image_eval = init_image.eval() - best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean - output_file = 'temp_output_{}.jpg'.format(i) - imageio.imwrite(output_file, best_image_add_mean) - - - # Save final image - image_eval = init_image.eval() - best_image_add_mean = image_eval.reshape(shape[1:]) + normalization_mean - output_file = 'final_output.jpg' - scipy.misc.imsave(output_file, best_image_add_mean) -``` - -## Deepdream in TensorFlow -Note: There is no new code in this script. It originates from the TensorFlow tutorial located here. However, this code is modified slightly to run on Python 3. The code is also commented very heavily to explain, line-by-line, what occurs in the deepdream demo. - -Here are some potential outputs. - -:::{figure-md} 06_deepdream_ex-dl - - -Deepdream outputs -::: - -### Code - -```{code-cell} -# Using TensorFlow for Deep Dream -#--------------------------------------- -# From: Alexander Mordvintsev -# --https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/deepdream -# -# Make sure to download the deep dream model here: -# https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip -# -# Run: -# me@computer:~$ wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip -# me@computer:~$ unzip inception5h.zip -# -# More comments added inline. 
- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import matplotlib.pyplot as plt -import numpy as np -import PIL.Image -import tensorflow as tf -from io import BytesIO -from tensorflow.python.framework import ops -ops.reset_default_graph() - -# Start a graph session -graph = tf.Graph() -sess = tf.InteractiveSession(graph=graph) - -os.chdir('~/Documents/tensorflow/inception-v1-model/') - -# Model filename -model_fn = 'tensorflow_inception_graph.pb' - -# Load graph parameters -with tf.gfile.FastGFile(model_fn, 'rb') as f: - graph_def = tf.GraphDef() - graph_def.ParseFromString(f.read()) - -# Create placeholder for input -t_input = tf.placeholder(np.float32, name='input') - -# Imagenet average bias to subtract off images -imagenet_mean = 117.0 -t_preprocessed = tf.expand_dims(t_input-imagenet_mean, 0) -tf.import_graph_def(graph_def, {'input':t_preprocessed}) - -# Create a list of layers that we can refer to later -layers = [op.name for op in graph.get_operations() if op.type=='Conv2D' and 'import/' in op.name] - -# Count how many outputs for each layer -feature_nums = [int(graph.get_tensor_by_name(name+':0').get_shape()[-1]) for name in layers] - -# Print count of layers and outputs (features nodes) -print('Number of layers', len(layers)) -print('Total number of feature channels:', sum(feature_nums)) - -# Picking some internal layer. Note that we use outputs before applying the ReLU nonlinearity -# to have non-zero gradients for features with negative initial activations. 
-layer = 'mixed4d_3x3_bottleneck_pre_relu' -channel = 30 # picking some feature channel to visualize - -# start with a gray image with a little noise -img_noise = np.random.uniform(size=(224,224,3)) + 100.0 - -def showarray(a, fmt='jpeg'): - # First make sure everything is between 0 and 255 - a = np.uint8(np.clip(a, 0, 1)*255) - # Pick an in-memory format for image display - f = BytesIO() - # Create the in memory image - PIL.Image.fromarray(a).save(f, fmt) - # Show image - plt.imshow(a) - - -def T(layer): - '''Helper for getting layer output tensor''' - return graph.get_tensor_by_name("import/%s:0"%layer) - - -# The following function returns a function wrapper that will create the placeholder -# inputs of a specified dtype -def tffunc(*argtypes): - '''Helper that transforms TF-graph generating function into a regular one. - See "resize" function below. - ''' - placeholders = list(map(tf.placeholder, argtypes)) - def wrap(f): - out = f(*placeholders) - def wrapper(*args, **kw): - return out.eval(dict(zip(placeholders, args)), session=kw.get('session')) - return wrapper - return wrap - - -# Helper function that uses TF to resize an image -def resize(img, size): - img = tf.expand_dims(img, 0) - # Change 'img' size by linear interpolation - return tf.image.resize_bilinear(img, size)[0, :, :, :] - - -def calc_grad_tiled(img, t_grad, tile_size=512): - '''Compute the value of tensor t_grad over the image in a tiled way. 
- Random shifts are applied to the image to blur tile boundaries over - multiple iterations.''' - # Pick a subregion square size - sz = tile_size - # Get the image height and width - h, w = img.shape[:2] - # Get a random shift amount in the x and y direction - sx, sy = np.random.randint(sz, size=2) - # Randomly shift the image (roll image) in the x and y directions - img_shift = np.roll(np.roll(img, sx, 1), sy, 0) - # Initialize the while image gradient as zeros - grad = np.zeros_like(img) - # Now we loop through all the sub-tiles in the image - for y in range(0, max(h-sz//2, sz),sz): - for x in range(0, max(w-sz//2, sz),sz): - # Select the sub image tile - sub = img_shift[y:y+sz,x:x+sz] - # Calculate the gradient for the tile - g = sess.run(t_grad, {t_input:sub}) - # Apply the gradient of the tile to the whole image gradient - grad[y:y+sz,x:x+sz] = g - # Return the gradient, undoing the roll operation - return np.roll(np.roll(grad, -sx, 1), -sy, 0) - -def render_deepdream(t_obj, img0=img_noise, - iter_n=10, step=1.5, octave_n=4, octave_scale=1.4): - # defining the optimization objective, the objective is the mean of the feature - t_score = tf.reduce_mean(t_obj) - # Our gradients will be defined as changing the t_input to get closer to - # the values of t_score. Here, t_score is the mean of the feature we select, - # and t_input will be the image octave (starting with the last) - t_grad = tf.gradients(t_score, t_input)[0] # behold the power of automatic differentiation! - - # Store the image - img = img0 - # Initialize the octave list - octaves = [] - # Since we stored the image, we need to only calculate n-1 octaves - for i in range(octave_n-1): - # Extract the image shape - hw = img.shape[:2] - # Resize the image, scale by the octave_scale (resize by linear interpolation) - lo = resize(img, np.int32(np.float32(hw)/octave_scale)) - # Residual is hi. 
Where residual = image - (Resize lo to be hw-shape) - hi = img-resize(lo, hw) - # Save the lo image for re-iterating - img = lo - # Save the extracted hi-image - octaves.append(hi) - - # generate details octave by octave - for octave in range(octave_n): - if octave>0: - # Start with the last octave - hi = octaves[-octave] - # - img = resize(img, hi.shape[:2])+hi - for i in range(iter_n): - # Calculate gradient of the image. - g = calc_grad_tiled(img, t_grad) - # Ideally, we would just add the gradient, g, but - # we want do a forward step size of it ('step'), - # and divide it by the avg. norm of the gradient, so - # we are adding a gradient of a certain size each step. - # Also, to make sure we aren't dividing by zero, we add 1e-7. - img += g*(step / (np.abs(g).mean()+1e-7)) - print('.',end = ' ') - showarray(img/255.0) - -# Run Deep Dream -if __name__=="__main__": - # Create resize function that has a wrapper that creates specified placeholder types - resize = tffunc(np.float32, np.int32)(resize) - - # Open image - img0 = PIL.Image.open('book_cover.jpg') - img0 = np.float32(img0) - # Show Original Image - showarray(img0/255.0) - - # Create deep dream - render_deepdream(T(layer)[:, :, :, channel], img0, iter_n=15) - - sess.close() -``` - -## Your turn! 🚀 - -TBD. - -## Self study - -You can refer to those YouTube videos for further study: - -- [Convolutional Neural Networks (CNNs) explained, by deeplizard](https://www.youtube.com/watch?v=YRhxdVk_sIs) -- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM) - -### Research trend - -State of the Art Convolutional Neural Networks (CNNs) Explained | Deep Learning in 2020: - -
- -
- -## Acknowledgments - -Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). It inspires the majority of the content in this chapter. - ---- - -```{bibliography} -:filter: docname in docnames -``` From 7bfc187a6427a8b112f5ce647bf52d1c445a6990 Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 21:31:03 +0800 Subject: [PATCH 2/9] Add new acknowledgment --- open-machine-learning-jupyter-book/deep-learning/cnn.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index 26a91bb947..0d19684a5a 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -1186,7 +1186,7 @@ "source": [ "## Acknowledgments\n", "\n", - "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). It inspires the majority of the content in this chapter.\n", + "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). 
And thanks to [TensorFlow](https://www.tensorflow.org/) for creating the open source project [DeepDream](https://www.tensorflow.org/tutorials/generative/deepdream) It inspires the majority of the content in this chapter.\n", "\n", "---\n" ] From c00138fdd7e76d017dbfd1d6810f8611a3c8bb97 Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 21:55:08 +0800 Subject: [PATCH 3/9] Fix figure problem --- .../deep-learning/autoencoder.ipynb | 12 ++++++++++++ .../deep-learning/cnn.ipynb | 8 ++++++++ 2 files changed, 20 insertions(+) diff --git a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb index d8facd45fa..00d8a8725a 100644 --- a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb @@ -65,6 +65,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/01_PCA1.png\n", "---\n", "name: Illustration of PCA\n", + "---\n", + "PCA 1\n", ":::\n", "\n", "- Transform features onto directions of maximum variance\n", @@ -72,6 +74,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/02_PCA2.png\n", "---\n", "name: Illustration of PCA\n", + "---\n", + "PCA 2\n", ":::\n", "\n", "- Usually consider a subset of vectors of most variance (dimensionality reduction)\n", @@ -79,6 +83,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/03_PCA3.png\n", "---\n", "name: Illustration of PCA\n", + "---\n", + "PCA 3\n", ":::" ] }, @@ -93,6 +99,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/04_simple.png\n", "---\n", "name: Illustration of Fully Connected autoencoder\n", + "---\n", + "Simple\n", ":::\n", "\n", ":::{note}\n", @@ -119,6 +127,8 @@ ":::{figure} 
https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/05_diff_conv.png\n", "---\n", "name: Difference between regular and transposed convolution\n", + "---\n", + "diff\n", ":::\n", "\n", "In transposed convolutions, we stride over the output; hence, larger strides will result in larger outputs (opposite to regular convolutions); and we pad the output; hence, larger padding will result in smaller output maps.\n", @@ -128,6 +138,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/06_convmodel.png\n", "---\n", "name: Structure of convoluted autoencoder\n", + "---\n", + "Structure\n", ":::\n", "\n", ":::{note}\n", diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index 0d19684a5a..2cba0d26b6 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -617,6 +617,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg\n", "---\n", "name: Style image: starry night\n", + "---\n", + "starry night\n", ":::\n", "\n", "The context image is below.\n", @@ -624,6 +626,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg\n", "---\n", "name: Content image: book cover\n", + "---\n", + "book cover\n", ":::\n", "\n", "The final result looks like\n", @@ -631,6 +635,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/05_stylenet_ex.png\n", "---\n", "name: stylenet final result\n", + "---\n", + "stylenet\n", ":::\n" ] }, @@ -884,6 +890,8 @@ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/06_deepdream_ex.png\n", "---\n", "name: Deepdream outputs\n", + "---\n", + "deepdream\n", ":::" ] }, From 08f671ab623f456772904602f2b0ea923829051f 
Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 22:10:22 +0800 Subject: [PATCH 4/9] delete : --- open-machine-learning-jupyter-book/deep-learning/cnn.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index 2cba0d26b6..0a981fb9e0 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -616,7 +616,7 @@ "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg\n", "---\n", - "name: Style image: starry night\n", + "name: Style image starry night\n", "---\n", "starry night\n", ":::\n", @@ -625,7 +625,7 @@ "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg\n", "---\n", - "name: Content image: book cover\n", + "name: Content image book cover\n", "---\n", "book cover\n", ":::\n", From d0b95b481eac2f0851bdfb155414beb4bee1910e Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 22:23:19 +0800 Subject: [PATCH 5/9] delete --- --- open-machine-learning-jupyter-book/deep-learning/cnn.ipynb | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index 0a981fb9e0..c7f686ace0 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -1194,9 +1194,7 @@ "source": [ "## Acknowledgments\n", "\n", - "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). 
And thanks to [TensorFlow](https://www.tensorflow.org/) for creating the open source project [DeepDream](https://www.tensorflow.org/tutorials/generative/deepdream) It inspires the majority of the content in this chapter.\n", - "\n", - "---\n" + "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). And thanks to [TensorFlow](https://www.tensorflow.org/) for creating the open source project [DeepDream](https://www.tensorflow.org/tutorials/generative/deepdream) It inspires the majority of the content in this chapter.\n" ] } ], From 71c774162b5ca02123219ab089ae749089dde7db Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 22:39:51 +0800 Subject: [PATCH 6/9] Modify toc.yml --- open-machine-learning-jupyter-book/_toc.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/_toc.yml b/open-machine-learning-jupyter-book/_toc.yml index f5539a076a..cac0eebdbd 100644 --- a/open-machine-learning-jupyter-book/_toc.yml +++ b/open-machine-learning-jupyter-book/_toc.yml @@ -92,7 +92,7 @@ parts: - file: deep-learning/cnn - file: deep-learning/gan.md - file: deep-learning/rnn.md - - file: deep-learning/autoencoder.md + - file: deep-learning/autoencoder.ipynb - file: deep-learning/lstm.ipynb - file: deep-learning/time-series.ipynb - file: deep-learning/dqn.ipynb From ec45ba9fe2e3028aedfed61732bbef99207915f7 Mon Sep 17 00:00:00 2001 From: Xu Senbo <86239038+bestfw@users.noreply.github.com> Date: Sun, 19 Nov 2023 22:44:32 +0800 Subject: [PATCH 7/9] Update _toc.yml --- open-machine-learning-jupyter-book/_toc.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/_toc.yml b/open-machine-learning-jupyter-book/_toc.yml index 622058792f..3bc0b95cda 100644 --- a/open-machine-learning-jupyter-book/_toc.yml +++ b/open-machine-learning-jupyter-book/_toc.yml @@ 
-91,7 +91,7 @@ parts: - file: deep-learning/dl-overview - file: deep-learning/cnn - file: deep-learning/gan.md - - file: deep-learning/rnn.md + - file: deep-learning/rnn.ipynb - file: deep-learning/autoencoder.ipynb - file: deep-learning/lstm.ipynb - file: deep-learning/time-series.ipynb @@ -237,4 +237,4 @@ parts: - file: slides/ml-advanced/kernel-method - file: slides/ml-advanced/model-selection - file: slides/deep-learning/cnn - - file: slides/deep-learning/gan \ No newline at end of file + - file: slides/deep-learning/gan From 19d991b9c63f890c6d6ed899c706e706eada5780 Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 23:14:42 +0800 Subject: [PATCH 8/9] Fix error --- .../deep-learning/autoencoder.ipynb | 6 +++--- open-machine-learning-jupyter-book/deep-learning/cnn.ipynb | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb index 00d8a8725a..5158c8fed7 100644 --- a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb @@ -64,7 +64,7 @@ "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/01_PCA1.png\n", "---\n", - "name: Illustration of PCA\n", + "name: Illustration of PCA 1\n", "---\n", "PCA 1\n", ":::\n", @@ -73,7 +73,7 @@ "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/02_PCA2.png\n", "---\n", - "name: Illustration of PCA\n", + "name: Illustration of PCA 2\n", "---\n", "PCA 2\n", ":::\n", @@ -82,7 +82,7 @@ "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/03_PCA3.png\n", "---\n", - "name: Illustration of PCA\n", + "name: Illustration of PCA 3\n", "---\n", "PCA 3\n", ":::" diff --git 
a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index c7f686ace0..52677dc7f1 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -84,7 +84,7 @@ "\n", "\n", "\n", - "After the max pooling, there is generally an activation layer. One of the more common activation layers is the ReLU (Rectified Linear Unit) {cite}`reluwiki`." + "After the max pooling, there is generally an activation layer. One of the more common activation layers is the ReLU (Rectified Linear Unit)." ] }, { From 5b404f1952be3b9b6e54b45ce3b9ae6541bfea76 Mon Sep 17 00:00:00 2001 From: Xu Senbo <1170676717@qq.com> Date: Sun, 19 Nov 2023 23:26:35 +0800 Subject: [PATCH 9/9] Fix error --- .../deep-learning/autoencoder.ipynb | 30 ++++--------------- .../deep-learning/cnn.ipynb | 12 -------- 2 files changed, 6 insertions(+), 36 deletions(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb index 5158c8fed7..77d09293b6 100644 --- a/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/autoencoder.ipynb @@ -63,28 +63,19 @@ "- Find directions of maximum variance\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/01_PCA1.png\n", - "---\n", - "name: Illustration of PCA 1\n", - "---\n", - "PCA 1\n", + "Illustration of PCA\n", ":::\n", "\n", "- Transform features onto directions of maximum variance\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/02_PCA2.png\n", - "---\n", - "name: Illustration of PCA 2\n", - "---\n", - "PCA 2\n", + "Illustration of PCA\n", ":::\n", "\n", "- Usually consider a subset of vectors of most variance (dimensionality reduction)\n", "\n", 
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/03_PCA3.png\n", - "---\n", - "name: Illustration of PCA 3\n", - "---\n", - "PCA 3\n", + "Illustration of PCA\n", ":::" ] }, @@ -97,10 +88,7 @@ "Here is an example of a basic fully-connected autoencoder\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/04_simple.png\n", - "---\n", - "name: Illustration of Fully Connected autoencoder\n", - "---\n", - "Simple\n", + "Illustration of Fully Connected autoencoder\n", ":::\n", "\n", ":::{note}\n", @@ -125,10 +113,7 @@ "The difference between regular convolution and transposed convolution can be seen from the following image.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/05_diff_conv.png\n", - "---\n", - "name: Difference between regular and transposed convolution\n", - "---\n", - "diff\n", + "Difference between regular and transposed convolution\n", ":::\n", "\n", "In transposed convolutions, we stride over the output; hence, larger strides will result in larger outputs (opposite to regular convolutions); and we pad the output; hence, larger padding will result in smaller output maps.\n", @@ -136,10 +121,7 @@ "So, the whole model consists of two parts, encoder and decoder, and they are composed with regular convolution and transposed convolution respectively.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/autoencoder/06_convmodel.png\n", - "---\n", - "name: Structure of convoluted autoencoder\n", - "---\n", - "Structure\n", + "Structure of convoluted autoencoder\n", ":::\n", "\n", ":::{note}\n", diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb index 52677dc7f1..b56a91363c 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb +++ 
b/open-machine-learning-jupyter-book/deep-learning/cnn.ipynb @@ -615,28 +615,19 @@ "The style image is\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg\n", - "---\n", "name: Style image starry night\n", - "---\n", - "starry night\n", ":::\n", "\n", "The context image is below.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg\n", - "---\n", "name: Content image book cover\n", - "---\n", - "book cover\n", ":::\n", "\n", "The final result looks like\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/05_stylenet_ex.png\n", - "---\n", "name: stylenet final result\n", - "---\n", - "stylenet\n", ":::\n" ] }, @@ -888,10 +879,7 @@ "Here are some potential outputs.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/06_deepdream_ex.png\n", - "---\n", "name: Deepdream outputs\n", - "---\n", - "deepdream\n", ":::" ] },