In the first class we learned how the k-nearest neighbors algorithm (k-NN) works. k-NN is a non-parametric classification algorithm that can be used in machine learning to classify data based on how similar it is to already known data.
The algorithm calculates the distance between the points in the space, takes the k closest ones (k = number of neighbours, a previously defined parameter), and assigns the new point the most common classification among those neighbours. The distance between the points can be measured with the Manhattan or the Euclidean metric.
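As a minimal sketch of the idea, using scikit-learn's KNeighborsClassifier and a made-up toy dataset (not data from the course):

```python
# A minimal sketch of how k-NN classifies a new point; the tiny
# dataset below is an invented toy example.
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two features per point, two known classes (0 and 1).
X_train = [[1.0, 1.2], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9]]
y_train = [0, 0, 1, 1]

# k = 3 neighbours; the metric could also be "manhattan".
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# The new point gets the majority class among its 3 nearest neighbours.
print(knn.predict([[2.8, 3.0]]))  # -> [1]
```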
In the second week of the course, we learned about the importance of EDA (Exploratory Data Analysis) for data science and machine learning. In statistics, exploratory data analysis is an approach to analyzing data sets in order to summarize their main characteristics, often using statistical graphics and other data visualization methods.
We're going to be using EDA in almost every data science and machine learning project, so it is very important to have statistical knowledge and the ability to plot good graphics.
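For instance, a first EDA pass with pandas and matplotlib might look like the sketch below ("data.csv" is just a placeholder path, not a file from the course):

```python
# A quick EDA sketch: summary statistics plus simple distribution plots.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")  # hypothetical dataset

print(df.shape)       # number of rows and columns
df.info()             # column types and missing values
print(df.describe())  # summary statistics (mean, std, quartiles...)

# Histograms give a first look at the distribution of each numeric column.
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()
```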
In the third class, we learned about the uses of the Naive Bayes classifier in machine learning. In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with a strong (naive) independence assumption between the features. We used it to estimate the chances of a message being spam or not, using the famous "SMS Spam Collection" dataset. Working with this dataset, we also learned a little about natural language processing.
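A minimal sketch of the idea with scikit-learn's CountVectorizer and MultinomialNB (the example messages are invented stand-ins, not taken from the real dataset):

```python
# A minimal Naive Bayes spam-detection sketch with made-up messages.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize call now",      # spam
    "free entry text to win now",     # spam
    "see you at lunch today",         # ham
    "are we still meeting tomorrow",  # ham
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features: a count of each word per message.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# MultinomialNB applies Bayes' theorem assuming word counts are
# independent given the class.
model = MultinomialNB()
model.fit(X, labels)

new = vectorizer.transform(["free prize waiting for you"])
print(model.predict(new))        # -> ['spam']
print(model.predict_proba(new))  # probability of each class
```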
Finally, we reviewed the previous classes and learned a little about categorical encoders, like the One Hot Encoder.
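As a small illustration, one-hot encoding with scikit-learn's OneHotEncoder could look like this (the "color" values are a made-up example):

```python
# One-hot encoding turns each category into its own binary column.
from sklearn.preprocessing import OneHotEncoder

colors = [["red"], ["green"], ["blue"], ["green"]]

encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()  # dense for readability

print(encoder.categories_)  # categories found: ['blue', 'green', 'red']
print(encoded)              # one binary column per category
```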