# Advanced Models
## Genetic Programming for Symbolic Regression
This technique is inspired by Darwin's theory of evolution.
+ 1960s: evolution strategies are introduced by I. Rechenberg in his work "Evolution Strategies"
+ 1975: Genetic Algorithms (GAs) are invented by J. Holland and published in his book "Adaptation in Natural and Artificial Systems"
+ 1992: J. Koza uses genetic algorithms to evolve programs that perform certain tasks, and calls his method "genetic programming" (GP)
Another reference for GP: Langdon WB, Poli R (2002) Foundations of Genetic Programming. Springer.

- Depending on the function set used and the function to be minimised, GP can generate almost any type of curve



In R, we can use the "rgp" package. Note that rgp has since been archived on CRAN, so it may need to be installed from the CRAN archive; the following chunks are therefore not evaluated.
## Genetic Programming Example
### Load Data
```{r loadData, eval = FALSE}
library(foreign)
# Read the Telecom1 data (CSV)
telecom1 <- read.table("./datasets/effortEstimation/Telecom1.csv", sep = ",",
                       header = TRUE, stringsAsFactors = FALSE, dec = ".")
size_telecom1 <- telecom1$size
effort_telecom1 <- telecom1$effort
# Read the China data (ARFF), already split into training and test sets
chinaTrain <- read.arff("./datasets/effortEstimation/china3AttSelectedAFPTrain.arff")
china_train_size <- chinaTrain$AFP
china_train_effort <- chinaTrain$Effort
chinaTest <- read.arff("./datasets/effortEstimation/china3AttSelectedAFPTest.arff")
china_size_test <- chinaTest$AFP
actualEffort <- chinaTest$Effort
```
### Genetic Programming for Symbolic Regression: China dataset.
```{r geneticProgramingExample, message=FALSE, warning=FALSE, eval = FALSE}
library("rgp")
options(digits = 5)
stepsGenerations <- 100    # number of evolution steps (generations)
initialPopulation <- 100   # population size
y <- china_train_effort
x <- china_train_size
data2 <- data.frame(y, x)  # data frame with effort and size
newFuncSet <- mathFunctionSet
# Alternatives to mathFunctionSet:
# newFuncSet <- expLogFunctionSet          # "sqrt", "exp" and "ln"
# newFuncSet <- trigonometricFunctionSet
# newFuncSet <- arithmeticFunctionSet
# newFuncSet <- functionSet("+", "-", "*", "/", "sqrt", "log", "exp")
gpresult <- symbolicRegression(y ~ x,
                               data = data2, functionSet = newFuncSet,
                               populationSize = initialPopulation,
                               stopCondition = makeStepsStopCondition(stepsGenerations))
# Best and worst individuals of the final population according to the fitness function
bf <- gpresult$population[[which.min(sapply(gpresult$population, gpresult$fitnessFunction))]]
wf <- gpresult$population[[which.max(sapply(gpresult$population, gpresult$fitnessFunction))]]
bf1 <- gpresult$population[[which.min(gpresult$fitnessValues)]]
plot(x, y)
lines(x, bf(x), type = "l", col = "blue", lwd = 3)
lines(x, wf(x), type = "l", col = "red", lwd = 2)
# Evaluate the best individual on the test set
x_test <- china_size_test
estim_by_gp <- bf(x_test)
ae_gp <- abs(actualEffort - estim_by_gp)
mean(ae_gp)
```
### Genetic Programming for Symbolic Regression: Telecom1 dataset.
- For illustration purposes only; all data points are used (no training/test split).
```{r, eval = FALSE}
y <- effort_telecom1   # all data points
x <- size_telecom1
data2 <- data.frame(y, x)  # data frame with effort and size
newFuncSet <- expLogFunctionSet  # "sqrt", "exp" and "ln"
# Alternatives:
# newFuncSet <- mathFunctionSet
# newFuncSet <- trigonometricFunctionSet
# newFuncSet <- arithmeticFunctionSet
# newFuncSet <- functionSet("+", "-", "*", "/", "sqrt", "log", "exp")
gpresult <- symbolicRegression(y ~ x,
                               data = data2, functionSet = newFuncSet,
                               populationSize = initialPopulation,
                               stopCondition = makeStepsStopCondition(stepsGenerations))
bf <- gpresult$population[[which.min(sapply(gpresult$population, gpresult$fitnessFunction))]]
wf <- gpresult$population[[which.max(sapply(gpresult$population, gpresult$fitnessFunction))]]
bf1 <- gpresult$population[[which.min(gpresult$fitnessValues)]]
plot(x, y)
lines(x, bf(x), type = "l", col = "blue", lwd = 3)
lines(x, wf(x), type = "l", col = "red", lwd = 2)
```
## Neural Networks
A neural network (NN) simulates some of the learning functions of the human brain.
It can recognize patterns and "learn": through a trial-and-error process the system gradually becomes an "expert" in the field.
A NN is composed of a set of nodes (units, neurons, processing elements):
+ Each node has inputs and outputs
+ Each node performs a simple computation defined by its node function
Nodes are linked by weighted connections:
+ The connectivity gives the structure/architecture of the net
+ What a NN can compute is primarily determined by the connections and their weights


There are several packages in R to work with NNs, for example:
+ [neuralnet](https://cran.r-project.org/web/packages/neuralnet/index.html)
+ [nnet](https://cran.r-project.org/web/packages/nnet/index.html)
+ [RSNNS](https://cran.r-project.org/web/packages/RSNNS/index.html)
The following is an example with the neuralnet package. Neural networks need the variables to be scaled to work properly; the network is therefore trained on normalized data, and its predictions must be denormalized back to the original effort scale (a possible completion is sketched after the chunk).
```{r NeuralNetExample, message=FALSE, warning=FALSE}
library(foreign)
library(neuralnet)
chinaTrain <- read.arff("datasets/effortEstimation/china3AttSelectedAFPTrain.arff")
afpsize <- chinaTrain$AFP
effort_china <- chinaTrain$Effort
chinaTest <- read.arff("datasets/effortEstimation/china3AttSelectedAFPTest.arff")
AFPTest <- chinaTest$AFP
actualEffort <- chinaTest$Effort
trainingdata <- cbind(afpsize, effort_china)
colnames(trainingdata) <- c("Input", "Output")
testingdata <- cbind(AFPTest, actualEffort)
colnames(testingdata) <- c("Input", "Output")
# Normalize the data to [0, 1]
norm.fun <- function(x){(x - min(x)) / (max(x) - min(x))}
data.norm <- apply(trainingdata, 2, norm.fun)
# data.norm
# The test inputs must be normalized with the *training* minimum and maximum
testinput.norm <- (testingdata[, "Input"] - min(afpsize)) / (max(afpsize) - min(afpsize))
# Train the neural network with one hidden layer of 10 neurons.
# threshold is a numeric value specifying the threshold for the partial
# derivatives of the error function as stopping criterion.
# net_eff <- neuralnet(Output ~ Input, data.norm, hidden = 5, threshold = 0.25)
net_eff <- neuralnet(Output ~ Input, data.norm, hidden = 10, threshold = 0.01)
# Print the network
# print(net_eff)
# Plot the neural network
plot(net_eff)
# Predictions on the test set and their denormalization are sketched in the next chunk
```
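The chunk above stops after training and plotting the network. A possible completion is sketched below (not evaluated here): the normalized test inputs are passed through the network with `compute()`, and the predictions are mapped back to the original effort scale by inverting `norm.fun` with the training minimum and maximum of the effort. The names `estim_by_nn` and `ae_nn` are ours, chosen to mirror the genetic programming example.
```{r NeuralNetPredictionSketch, eval = FALSE}
# Run the normalized test inputs through the trained network
net.results <- compute(net_eff, data.frame(Input = testinput.norm))
# Denormalize: invert norm.fun with the minimum and maximum of the training effort
estim_by_nn <- net.results$net.result * (max(effort_china) - min(effort_china)) + min(effort_china)
# Compare the estimates with the actual effort of the test set
cleanoutput <- cbind(AFPTest, actualEffort, estim_by_nn)
colnames(cleanoutput) <- c("AFP", "Actual Effort", "Neural Net Estimate")
head(cleanoutput)
ae_nn <- abs(actualEffort - estim_by_nn)
mean(ae_nn)   # mean absolute error
```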
## Support Vector Machines
Support Vector Machines (SVMs) can be used both for classification and, as Support Vector Regression (SVR), for numeric prediction. In R they are available, for instance, in the e1071 and kernlab packages; a minimal sketch with e1071 follows.
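As an illustrative sketch only (not evaluated here, and not part of the original example), support vector regression can be applied to the China effort data loaded in the neural network example above; `svr_model` and `estim_by_svr` are our names. By default, `svm()` fits epsilon-regression with a radial kernel for a numeric response and scales the variables internally.
```{r svmSketch, eval = FALSE}
library(e1071)
# Epsilon-SVR with a radial (RBF) kernel, effort as a function of size (AFP)
svr_model <- svm(effort_china ~ afpsize)
# Estimate the effort of the test projects and compute the mean absolute error
estim_by_svr <- predict(svr_model, data.frame(afpsize = AFPTest))
ae_svr <- abs(actualEffort - estim_by_svr)
mean(ae_svr)
```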
## Ensembles
Ensembles, or meta-learners, combine multiple models to obtain better predictions, i.e., this technique consists of combining single classifiers (sometimes also called weak classifiers).
A problem with ensembles is that their models are difficult to interpret (they behave as black boxes) in comparison with decision trees or rules, which provide an explanation of their decision-making process.
Ensemble techniques are typically classified as Bagging, Boosting and Stacking (stacked generalization).
### Bagging
Bagging (also known as bootstrap aggregating) is an ensemble technique in which a base learner is applied to multiple datasets of equal size created from the original data by bootstrapping; predictions are obtained by voting over the individual predictions. An advantage of bagging is that it does not require any modification of the learning algorithm, and it exploits the instability of the base classifier to create diversity among the ensemble members, so that individual members perform well in different regions of the data. Bagging does not work well with classifiers whose output is robust to perturbations of the data, such as nearest-neighbour (NN) classifiers. A minimal sketch of the bootstrap-and-vote idea follows.
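This sketch is illustrative only: `bag_trees` and `predict_bagging` are our names rather than a package API, the base learner is an rpart tree, and the chunk is not evaluated; the ipred package offers a ready-made `bagging()` implementation.
```{r baggingSketch, eval = FALSE}
library(rpart)

# Grow n_models trees, each on a bootstrap replicate of the same size as the data
bag_trees <- function(formula, data, n_models = 25) {
  lapply(seq_len(n_models), function(i) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    rpart(formula, data = boot, method = "class")   # unstable base learner
  })
}

# Combine the individual predictions by majority vote
predict_bagging <- function(models, newdata) {
  votes <- sapply(models, function(m) as.character(predict(m, newdata, type = "class")))
  factor(apply(votes, 1, function(v) names(which.max(table(v)))))
}

# For example, with the KC1 training/test split created in the boosting example below:
# ensemble <- bag_trees(Defective ~ ., kc1.train)
# table(predict_bagging(ensemble, kc1.test), kc1.test$Defective)
```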
### Boosting
Boosting techniques generate multiple models that complement each other inducing models that improve regions of the data where previous induced models preformed poorly. This is achieved by increasing the weights of instances wrongly classified, so new learners focus on those instances. Finally, classification is based on a weighted voted among all members of the ensemble.
In particular, AdaBoost.M1 [15] is a popular boosting algorithm for classification. The set of training examples is assigned an equal weight at the beginning and the weight of instances is either increased or
decreased depending on whether the learner classified that instance incorrectly or not. The following iterations focus on those instances with higher weights. AdaBoost.M1 can be applied to any base learner.
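The sketch uses rpart decision stumps as the weak learner; `adaboost_m1_sketch` and `predict_adaboost` are our names rather than a package API, and the chunk is not evaluated. Packages such as ada and adabag provide complete implementations.
```{r adaboostSketch, eval = FALSE}
library(rpart)

adaboost_m1_sketch <- function(data, response, n_rounds = 10) {
  fml <- as.formula(paste(response, "~ ."))  # built here so rpart finds the local weights
  y <- data[[response]]
  n <- nrow(data)
  w <- rep(1 / n, n)                         # start with equal instance weights
  models <- list(); alphas <- numeric(0)
  for (m in seq_len(n_rounds)) {
    fit <- rpart(fml, data = data, weights = w, method = "class",
                 control = rpart.control(maxdepth = 1))     # decision stump
    miss <- as.numeric(predict(fit, data, type = "class") != y)
    err <- sum(w * miss)                     # weighted training error (weights sum to 1)
    if (err == 0 || err >= 0.5) break        # AdaBoost.M1 stopping conditions
    alpha <- log((1 - err) / err)            # vote weight of this model
    w <- w * exp(alpha * miss)               # increase the weights of misclassified instances
    w <- w / sum(w)
    models[[m]] <- fit; alphas[m] <- alpha
  }
  list(models = models, alphas = alphas, classes = levels(y))
}

# Final classification: alpha-weighted vote among the ensemble members
predict_adaboost <- function(ens, newdata) {
  scores <- sapply(ens$classes, function(cl)
    rowSums(mapply(function(m, a) a * (predict(m, newdata, type = "class") == cl),
                   ens$models, ens$alphas)))
  factor(ens$classes[max.col(scores)], levels = ens$classes)
}

# For example: boosted <- adaboost_m1_sketch(kc1.train, "Defective", n_rounds = 20)
#              table(predict_adaboost(boosted, kc1.test), kc1.test$Defective)
```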
### Rotation Forests
Rotation Forests [40] combine randomly chosen subsets of attributes (random subspaces) and bagging with principal-component feature generation to construct an ensemble of decision trees. Principal Component Analysis is applied, as a feature extraction technique, to subsets of attributes of a bootstrapped sample of the training data, and the base classifier is trained on the rotated features. A rough sketch of the idea follows.
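The sketch is illustrative only (our function names, not a package API) and is not evaluated; it assumes numeric predictors and more bootstrap rows than features per feature subset.
```{r rotationForestSketch, eval = FALSE}
library(rpart)

# One rotated tree: split the features into random subsets, run PCA on a bootstrap
# sample of each subset, assemble a block-diagonal rotation matrix R and train a
# decision tree on the rotated data X %*% R.
rotation_tree <- function(X, y, n_subsets = 3) {
  X <- as.matrix(X)
  p <- ncol(X)
  subsets <- split(sample(seq_len(p)), rep(seq_len(n_subsets), length.out = p))
  R <- matrix(0, p, p)
  boot <- sample(nrow(X), replace = TRUE)
  for (s in subsets) {
    pca <- prcomp(X[boot, s, drop = FALSE])   # principal axes of this feature subset
    R[s, s] <- pca$rotation
  }
  d <- data.frame(X %*% R, y = y)
  list(tree = rpart(y ~ ., data = d, method = "class"), R = R)
}

rotation_forest <- function(X, y, n_trees = 10) {
  lapply(seq_len(n_trees), function(i) rotation_tree(X, y))
}

# Majority vote over the rotated trees
predict_rotation_forest <- function(forest, Xnew) {
  Xnew <- as.matrix(Xnew)
  votes <- sapply(forest, function(m)
    as.character(predict(m$tree, data.frame(Xnew %*% m$R), type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# For example, with the numeric metrics of KC1 (all columns except Defective):
# rf <- rotation_forest(kc1.train[, -22], kc1.train$Defective)
# table(predict_rotation_forest(rf, kc1.test[, -22]), kc1.test$Defective)
```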
### Boosting in R
In R, there are several packages for boosting, e.g. the gbm, ada and mboost packages. Below is an example of gbm used through the caret package.
```{r}
# load libraries
library(caret)
library(pROC)
#################################################
# model it
#################################################
# Get names of caret supported models (just a few - head)
head(names(getModelInfo()))
# Show model info and find out what type of model it is
getModelInfo()$gbm$tags
getModelInfo()$gbm$type
```
Example: fitting a gbm model to the KC1 defect prediction dataset.
```{r}
library(foreign)
library(caret)
library(pROC)
kc1 <- read.arff("./datasets/defectPred/D1/KC1.arff")
# Split data into training and test datasets
# TODO: improve this with createDataPartition from caret
set.seed(1234)
ind <- sample(2, nrow(kc1), replace = TRUE, prob = c(0.7, 0.3))
kc1.train <- kc1[ind==1, ]
kc1.test <- kc1[ind==2, ]
# create caret trainControl object to control the number of cross-validations performed
objControl <- trainControl(method='cv', number=3, returnResamp='none', summaryFunction = twoClassSummary, classProbs = TRUE)
# run model
objModel <- train(Defective ~ .,
                  data = kc1.train,
                  method = 'gbm',
                  trControl = objControl,
                  metric = "ROC" #,
                  #preProc = c("center", "scale")
                  )
# Find out variable importance
summary(objModel)
# find out model details
objModel
```
Evaluate model
```{r}
#################################################
# evaluate model
#################################################
# get predictions on your testing data
# class prediction
predictions <- predict(object=objModel, kc1.test[,-22], type='raw')
head(predictions)
postResample(pred=predictions, obs=as.factor(kc1.test[,22]))
# probabilities
predictions <- predict(object=objModel, kc1.test[,-22], type='prob')
head(predictions)
postResample(pred=predictions[[2]], obs=ifelse(kc1.test[,22]=='Y',1,0))  # use the positive class label 'Y' consistently
auc <- roc(ifelse(kc1.test[,22]=="Y",1,0), predictions[[2]])
print(auc$auc)
```