Days 36-39 100DaysOfMLCode
sayalaruano committed Nov 15, 2021
1 parent 2d32912 commit 5cb102a
Showing 12 changed files with 229 additions and 102,619 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4,456 changes: 0 additions & 4,456 deletions Decision_trees_Ensemble_learning/Data/CreditScoring.csv

This file was deleted.

49,081 changes: 0 additions & 49,081 deletions Decision_trees_Ensemble_learning/Data/New_York_City_Airbnb_Open_Data.csv

This file was deleted.

59 changes: 59 additions & 0 deletions Neural_Networks/Notes/NotesDay37_mlzoomcamp_10thweek_1.md
@@ -0,0 +1,59 @@
# NotesDay37 - Machine Learning Zoomcamp tenth week

## 8.1 Description of the problem: Fashion classification

This session is the first one in which the input data is not tabular; instead, we will work with images. The project of this session is a clothing classifier that distinguishes among ten types of clothes, so this is a multi-class classification task.

The fifth week of Machine Learning Zoomcamp is about deployment of ML models. We will deploy the churn prediction model developed in the previous weeks as a web service. In general, we need to save the model from the Jupyter notebook and load it into a web service, for which we will use the [Flask Python library](https://flask.palletsprojects.com/en/2.0.x/). Also, we will use [Pyenv](https://github.com/pyenv/pyenv) to create a Python environment to manage software dependencies, and [Docker](https://www.docker.com/products/docker-desktop) to create a container for handling system dependencies. Ultimately, we will deploy the container in the cloud with AWS EB.

The Jupyter notebook with code of the churning prediction model to be deployed is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-05-deployment/05-deploy.ipynb).

## 8.2 TensorFlow and Keras

We used the [Pickle library](https://docs.python.org/3/library/pickle.html) to save and load a machine learning model. In general, pickle allows us to save Python objects. When we load a model, we need to guarantee that all the required libraries are installed in the working Python environment.

For training a machine learning model and making predictions multiple times, it is advisable to convert Jupyter notebooks into Python scripts.

**Libraries, classes and methods:**

* `open(x, 'yb')` - open a binary file named by the string x with permission y, which can be 'w' for writing or 'r' for reading. Writing permission is used for creating files, and reading is used for loading them. A binary file contains bytes instead of text.
* `pickle.dump(x, y)` - Pickle function to save a Python object x into an opened file y.
* `x.close()` - close the file x. This is important to guarantee that the file actually contains the object saved with pickle.
* `with open(x, 'yb') as y:` - same as `open(x, 'yb')`, but it guarantees that the file will be closed.
* `pickle.load(x)` - Pickle function to load a Python object from an opened file x. A sketch combining these calls follows this list.
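
A minimal sketch of saving and loading a model with pickle, assuming a scikit-learn model and the hypothetical file name `model.bin`:

```python
import pickle

from sklearn.linear_model import LogisticRegression

# A stand-in model; any picklable Python object works the same way
model = LogisticRegression()

# 'wb' = write binary: create the file and save the object into it
with open('model.bin', 'wb') as f_out:
    pickle.dump(model, f_out)

# 'rb' = read binary: load the object back; the required libraries
# (here scikit-learn) must be installed in the loading environment
with open('model.bin', 'rb') as f_in:
    model = pickle.load(f_in)
```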

The python scripts for training our model and making predictions are available [here](https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp/05-deployment/code).

## 8.3 Pre-trained convolutional neural networks

A web service is a method for communication between devices over a network. In general, users make requests with some information, and they receive a result processed by the web service.

We created a simple web service that receives requests at the /ping address and replies with a PONG message. Then, we queried the ping/pong web service with `curl` and a browser.

**Libraries, classes and methods:**

* `Flask('x')` - create a Flask application object named x.
* `@app.route('/x', methods=['y'])` - decorator that registers the /x address of the application and the HTTP method y (i.e. GET, POST, etc.) used to access it.
* `app.run(debug=True, host=x, port=y)` - run the Flask application in debug mode on host x and port y. This code should be inside a main block.
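
A minimal sketch of the ping/pong service combining these calls, assuming port 9696; it can be queried with `curl http://localhost:9696/ping`:

```python
from flask import Flask

app = Flask('ping')

@app.route('/ping', methods=['GET'])
def ping():
    return 'PONG'

if __name__ == '__main__':
    # Debug mode reloads the app on code changes; port 9696 is an assumed choice
    app.run(debug=True, host='0.0.0.0', port=9696)
```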

The python script of ping/pong web service is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp/05-deployment/code).

## 8.4 Serving the churn model with Flask

We created a churn service with our machine learning model, which will be available at the `/predict` address. In this way, other services can send requests with information about customers, and receive responses with churn predictions from the web service.

The method associated with the web service was `POST` because we need to send information about customers, which is not easy to do with the `GET` method. Requests and responses are sent as JSON, which is quite similar to Python dictionaries.

The `gunicorn` library helps us to prepare a model to be launched in production. This library is not supported on Windows because it needs some Unix dependencies, so the alternative on Windows is the `waitress` library.

**Libraries, classes and methods:**

* `request.get_json()` - Flask utility to obtain the body of a request as a Python dictionary.
* `jsonify()` - Flask utility to convert a Python dictionary into a JSON response.
* `requests.post(url, json=x)` - method from the `requests` library to perform a POST to a web service, sending x as the JSON body. A 200 status code means that the request was successful.
* `requests.post(url, json=x).json()` - same POST request, but also parse the JSON response into a Python dictionary (see the client sketch below).
* `gunicorn --bind x y:z` - run a model in a production-grade server; x is the host:port address to bind, y is the name of the Python file, and z is the name of the Flask app object to be launched.
* `waitress-serve --listen=x y:z` - Windows alternative to gunicorn, with the same meaning for x, y, and z.
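
A minimal client sketch, assuming the churn service runs locally on port 9696; the URL and the customer's feature names are placeholders that depend on the model's training data:

```python
import requests

# Hypothetical service address and customer record
url = 'http://localhost:9696/predict'
customer = {'contract': 'two_year', 'tenure': 12, 'monthlycharges': 19.7}

response = requests.post(url, json=customer)
print(response.status_code)   # 200 means the request was successful
print(response.json())        # churn prediction as a Python dictionary
```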

The python script of the churning web service is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp/05-deployment/code).
61 changes: 61 additions & 0 deletions Neural_Networks/Notes/NotesDay38_mlzoomcamp_10thweek_2.md
@@ -0,0 +1,61 @@
# NotesDay38 - Machine Learning Zoomcamp tenth week

## 8.4 Convolutional neural networks (CNN)

CNNs are neural networks that are mostly used to work with images. In general, these networks are composed of three types of layers:

* **Convolutional layers:** consist of filters, which are small images containing simple shapes. Each filter is slid across the input image, measuring how similar the filter is to each part of the image. All the similarity values between a filter and the parts of an image are saved in a **feature map**, in which higher values indicate higher similarity. Each filter has its own feature map, which can itself be treated as an image. So, each convolutional layer applies its filters to its input, outputs feature maps, and passes them on to the next convolutional layer. The filters of later convolutional layers are more complex than those of the starting layers because they combine the filters of previous layers. Each convolutional layer can therefore detect progressively more complex features, and the more layers we have, the more complex the features that can be captured. The output of the convolutional layers is a vector representation of the image, which contains the information of the features extracted with all the filters. Consequently, the aim of these layers is to extract a vector representation of images.
* **Dense layers:** convert the vector representation of images, obtained from the convolutional layers, into predictions. For binary classification tasks, we can apply the sigmoid activation function and obtain the probabilities of the two classes. If we are dealing with multi-class classification tasks, we can apply the softmax activation function, which is the generalization of the sigmoid function to multiple classes, obtain probabilities for all of the classes, and choose the highest one. These layers are called dense because each element of the input is connected to each element of the output, so the layer amounts to a matrix multiplication between the input X and the weight matrix W. It is possible to have multiple dense layers.
* **Pooling layers:** convert a feature map into a smaller matrix. This is useful for making the neural network smaller and forcing it to have fewer parameters.

## 8.5 Transfer learning

In transfer learning, we take a pre-trained neural network whose convolutional layers are generic and do not need to change, while its dense layers are specific to the original training dataset and should be replaced. So, we keep the convolutional layers and train new dense layers on top of the pre-trained model. In this way, the most difficult part can be reused, and we transfer this knowledge to a new model.

In multi-class classification tasks, the target variable is represented with one-hot encoding.

In this project, the base model used to extract feature representations of images was Xception, and we added some dense layers on top of it to classify the 10 categories of clothes, creating our custom model.

To obtain the vector representation of an image, we sliced the 3D representation matrix, computed the average of each slice, and put all these values into a vector. This process is called 2D average pooling. We used the functional style to create the neural network, in which components of the model are called as functions with their proper parameters.

To train neural network models, we need some ingredients, including an optimizer, a learning rate, and an objective (loss) function. The optimizer allows us to find the best weights for the model, using the objective function to know when the algorithm reaches the optimum.

The raw values of the dense layer before applying the softmax activation function are called logits. It is preferred to keep these values as the output of the dense layer for numerical stability.

An epoch is one iteration of training a model over the entire dataset.

**Classes and methods:**

* `ImageDataGenerator(preprocessing_function=x)` - tensorflow.keras.preprocessing.image class to create a generator that reads images and applies the x preprocessing function to them.
* `ImageDataGenerator().flow_from_directory('x', target_size=(y,z), batch_size=w)` - ImageDataGenerator method to read images of height y and width z from the x directory, in batches of size w.
* `ImageDataGenerator().flow_from_directory().class_indices` - attribute of the generator returned by flow_from_directory that lists the categories of the images.
* `Xception(weights='imagenet', include_top=False, input_shape=(150, 150, 3))` - tensorflow.keras.applications.xception class to create an Xception neural network pre-trained on the imagenet dataset, without its dense layers, that receives images of size (150, 150, 3) as input.
* `Xception().trainable = False` - attribute to specify that the convolutional layers should be frozen during training of the model.
* `base_model(keras.Input(shape=(150, 150, 3)), training=False)` - apply the pre-trained base model to an input of the given image size, keeping it in inference mode.
* `keras.Model(inputs, outputs)` - keras class to build a model from its inputs and outputs.
* `keras.layers.GlobalAveragePooling2D()(base)` - add a pooling layer that converts the 3D representation produced by the base model into a vector representation of the images.
* `keras.layers.Dense(10)(vectors)` - keras method to create a dense layer that transforms the vector representation of images into predictions, one per class.
* `keras.optimizers.Adam(learning_rate=x)` - keras method to create an Adam optimizer with an x learning rate.
* `keras.losses.CategoricalCrossentropy(from_logits=True)` - keras method to create a CategoricalCrossentropy loss for a multi-class classification problem. It is recommended to set the from_logits parameter to True because the calculations are then numerically stable.
* `model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])` - Keras Model method to configure the optimizer, loss function, and performance metrics of a model before training it.
* `model.fit(train_ds, epochs=10, validation_data=val_ds)` - Keras Model method to train the model on train_ds for 10 epochs, evaluating it on val_ds after each epoch. A sketch combining these calls follows this list.
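
A sketch putting these pieces together, assuming hypothetical `clothing-dataset/train` and `clothing-dataset/validation` directories; the batch size of 32 and learning rate of 0.001 are assumed values:

```python
from tensorflow import keras
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generators that read and preprocess the images (paths are placeholders)
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_ds = train_gen.flow_from_directory('clothing-dataset/train',
                                         target_size=(150, 150), batch_size=32)
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_ds = val_gen.flow_from_directory('clothing-dataset/validation',
                                     target_size=(150, 150), batch_size=32)

# Pre-trained convolutional base without its dense layers, kept frozen
base_model = Xception(weights='imagenet', include_top=False,
                      input_shape=(150, 150, 3))
base_model.trainable = False

# Functional style: each component is called as a function on the previous one
inputs = keras.Input(shape=(150, 150, 3))
base = base_model(inputs, training=False)
vectors = keras.layers.GlobalAveragePooling2D()(base)   # 2D average pooling
outputs = keras.layers.Dense(10)(vectors)               # logits for 10 classes
model = keras.Model(inputs, outputs)

optimizer = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
model.fit(train_ds, epochs=10, validation_data=val_ds)
```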

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.6 Adjusting the learning rate

The learning rate is the speed at which a model learns. If this value is high, the model learns superficially and is prone to overfitting, while if it is low, the learning process is slow and the model tends to underfit. So, it is important to find the right balance when tuning this parameter.

To fine-tune the learning rate, we trained models with different values of this parameter and plotted their performance on the training and validation datasets using learning curves, as sketched below.
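
A sketch of the tuning loop, assuming a hypothetical `make_model(learning_rate)` helper that rebuilds the network from section 8.5 with the given learning rate, and the `train_ds`/`val_ds` generators from the earlier sketch:

```python
import matplotlib.pyplot as plt

scores = {}
for lr in [0.0001, 0.001, 0.01, 0.1]:
    model = make_model(learning_rate=lr)  # hypothetical helper, see above
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[lr] = history

# Learning curves: validation accuracy per epoch for each learning rate
for lr, history in scores.items():
    plt.plot(history.history['val_accuracy'], label=f'lr={lr}')
plt.xlabel('epoch')
plt.ylabel('validation accuracy')
plt.legend()
plt.show()
```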

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.7 Checkpointing

Checkpointing is a way of saving a model after each iteration of training, or when a model reaches certain conditions. In Keras, checkpointing is implemented using callbacks.

**Classes and methods:**

* `keras.callbacks.ModelCheckpoint('xception_v1_{epoch:02d}_{val_accuracy:.3f}.h5', save_best_only=True, monitor='val_accuracy', mode='max')` - keras class to save the best model across all epochs during training, maximizing the performance metric (see the sketch below).
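
A sketch of checkpointing during training, assuming the `model`, `train_ds`, and `val_ds` from the earlier sketches:

```python
from tensorflow import keras

# Save a model file whenever validation accuracy improves; the file name
# template is filled in with the epoch number and the metric value
checkpoint = keras.callbacks.ModelCheckpoint(
    'xception_v1_{epoch:02d}_{val_accuracy:.3f}.h5',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max',
)

model.fit(train_ds, epochs=10, validation_data=val_ds,
          callbacks=[checkpoint])
```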

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).
70 changes: 70 additions & 0 deletions Neural_Networks/Notes/NotesDay39_mlzoomcamp_10thweek_3.md
@@ -0,0 +1,70 @@
## 8.8 Adding more dense layers

It is possible to have more than one dense layer in a convolutional neural network. In this way, there is some intermediate processing of the vector representation of images before it reaches the output, which can be useful for improving the predictions.

In the neural network layers, we can apply an activation function that transforms raw scores into probabilities. For the output layer, sigmoid and softmax are the most common choices, while for intermediate layers we can apply other activation functions such as ReLU. The ReLU activation function outputs 0 if its input is less than or equal to 0, and the input value itself if it is greater than 0.

In this session, the addition of an extra dense layer did not improve the model performance, so we did not add it to the neural network architecture.

**Classes and methods:**

* `keras.layers.Dense(x, activation='relu')(vectors)` - add an intermediate dense layer of size x with the ReLU activation function to the neural network model (see the sketch below).
* `watch nvidia-smi` - check the usage of the GPU.
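
A sketch of the architecture with an intermediate ReLU layer, assuming the frozen `base_model` from section 8.5; the inner size of 100 is an assumed value:

```python
from tensorflow import keras

inputs = keras.Input(shape=(150, 150, 3))
base = base_model(inputs, training=False)              # frozen Xception base
vectors = keras.layers.GlobalAveragePooling2D()(base)
inner = keras.layers.Dense(100, activation='relu')(vectors)  # extra layer
outputs = keras.layers.Dense(10)(inner)                # logits for 10 classes
model = keras.Model(inputs, outputs)
```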

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.9 Regularization and dropout

We want our neural network models to focus on the overall shape of the clothes instead of details like logos. One way to approach this problem is to hide parts of the images at each epoch, so that at each iteration the model sees a slightly different version of the same image. Formally, this process is called dropout, and it refers to randomly hiding (freezing) a part of the input of layers, particularly parts of the inner layers.

Thus, we regularized the inner layer with dropout, which means that we added some restrictions to our model to avoid overfitting. The dropout rate corresponds to the fraction of the network that we freeze.

**Classes and methods:**

* `keras.layers.Dropout(x)(inner)` - add regularization by dropout to the inner layer of a neural network with an x dropout rate (see the sketch below).
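
A sketch of the regularized inner layer, assuming the `vectors` tensor from the previous sketch; the dropout rate of 0.5 is an assumed value:

```python
from tensorflow import keras

inner = keras.layers.Dense(100, activation='relu')(vectors)
drop = keras.layers.Dropout(0.5)(inner)   # freeze half of the units per step
outputs = keras.layers.Dense(10)(drop)
```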

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.10 Data augmentation

Data augmentation is the process of generating new data from the available dataset. In the case of images, we can apply transformations such as horizontal or vertical flipping, clockwise or counterclockwise rotation, horizontal or vertical shifting, shearing (pulling only one side of the image), zooming in or out, adding a black patch on the images, or a combination of the previous techniques.

To choose the augmentation techniques, we can use our own judgement, look at the dataset to identify the variations present in it, and tune these techniques as hyperparameters of the model.

We applied augmentation techniques to the training dataset, but not to the validation one, because we need to compare our model's performance against models that did not use augmentation.

**Classes and methods:**

* `ImageDataGenerator(preprocessing_function=preprocess_input, vertical_flip=True, ...)` - keras class to load images with the vertical flip data augmentation technique; many more transformations of the images can be added (see the sketch below).
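
A sketch of an augmented training generator; the chosen transformations and their ranges are assumed values, to be tuned like any other hyperparameter:

```python
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applied to the training set only
train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    vertical_flip=True,
    rotation_range=30,
    width_shift_range=10.0,
    height_shift_range=10.0,
    shear_range=10.0,
    zoom_range=0.1,
)

# No augmentation for validation: only preprocessing
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
```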

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.11 Training a larger model

In this lesson, we trained a neural network with larger images (299x299). Because of the increased image size, the model ran slower than the previous one.

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.12 Using the model

In this lesson, we loaded the best model trained in the previous lesson, evaluated its performance, and used it to obtain predictions on new images.

**Classes and methods:**

* `keras.models.load_model('x')` - keras method to load a model from the file x.
* `model.evaluate(x)` - keras method to evaluate a model on an x test dataset (see the sketch below).
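
A sketch of loading and using the model, assuming hypothetical file names and a `test_ds` generator built like the validation one:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.preprocessing.image import load_img

model = keras.models.load_model('xception_v4_large_08_0.894.h5')  # placeholder

# Evaluate on the test dataset (reports loss and accuracy)
model.evaluate(test_ds)

# Predict on a single new image: load, resize, preprocess, add batch dimension
img = load_img('pants.jpg', target_size=(299, 299))  # placeholder image
x = np.array(img)
X = np.array([x])
X = preprocess_input(X)
pred = model.predict(X)  # one logit per clothing class; the highest one wins
```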

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).

## 8.13 Summary

* We can use pre-trained models for general image classification.
* Convolutional layers let us turn an image into a vector.
* Dense layers use the vector to make the predictions.
* Instead of training a model from scratch, we can use transfer learning and re-use already trained convolutional layers.
* First, train a small model (150x150) before training a big one (299x299).
* Learning rate - how fast the model trains. Fast learners aren't always the best ones.
* We can save the best model using callbacks and checkpointing.
* To avoid overfitting, use dropout and augmentation.

The Jupyter notebook with code for this session is available [here](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb).