%% Exercise 4 - Classifying the BM-Dataset
% Submitted by *Prasannjeet Singh*
%
% <html>
% <link rel="stylesheet" type="text/css" href="../Data/layout.css">
% </html>
%
%% 1. Plotting the Dataset
% Note that I have used only the first 10,000 data points to reduce the
% time taken to run the project. However, I have also run the same
% procedure on the complete dataset and saved the results; those are
% shown at the bottom.
% Loading the dataset
load BM.mat
% Cropping the dataset to the first ten thousand points
X = batmanX(1:10000,:);
Y = num2cell(num2str(batmany(1:10000,:)));
hFig = figure(1);
gscatter(X(:,1),X(:,2),Y);
title('Scatter Diagram of Batman (Partial Data)');
axis tight;
snapnow;
close(hFig);
%% 2. Running *fitcsvm()* with chosen hyperparameters
% The following hyperparameters were chosen to make the decision boundary
% reasonably precise without taking too much time:
%
% # KernelFunction: gaussian
% # BoxConstraint: 1
% # KernelScale: 1.5
classes = unique(Y);
rng(1); % For reproducibility
SVMModels = fitcsvm(X,Y,'KernelFunction','gaussian','BoxConstraint',1,'KernelScale',1.5);
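%%
% As a sanity check on these hyperparameters (an addition, not part of
% the original analysis), a 5-fold cross-validated misclassification
% rate can be estimated with the toolbox's crossval() and kfoldLoss():
cvModel = crossval(SVMModels,'KFold',5); % retrains the SVM on 5 folds, so this step is slow
cvLoss = kfoldLoss(cvModel) % out-of-fold misclassification rate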
%% 3. Plotting Decision Boundary
% Creating a mesh grid covering the range of the data in both dimensions,
% with a step size of 0.02:
d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),min(X(:,2)):d:max(X(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
% Predicting a class label for every point in the mesh grid
gridLabels = predict(SVMModels,xGrid);
% Plotting the Batman decision boundary:
f1 = figure(3);
gscatter(xGrid(:,1),xGrid(:,2),gridLabels,[0.1 0.5 0.5; 0.5 0.1 0.5]);
title('The Batman Decision Boundary');
axis tight;
snapnow;
close(f1);
%%
% *Training Error Rate*
trainingError = sum(str2num(cell2mat(predict(SVMModels,X))) ~= batmany(1:10000))
trainingErrorRate = trainingError/size(X,1)*100
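%%
% For a per-class breakdown of these errors (an addition, not in the
% original script), a confusion matrix can be computed with the toolbox's
% confusionmat():
predicted = str2num(cell2mat(predict(SVMModels,X))); %#ok<ST2NM> % predicted labels as numbers
confusionMatrix = confusionmat(batmany(1:10000),predicted) % rows: true class, columns: predicted class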
%%
% Therefore, the current model makes only 174 errors on the 10,000
% training points, i.e. a training error rate of 1.74 percent.
%% 4. Observations with the Complete Data
% The following hyperparameters produce a nicer decision boundary, but at
% a large run-time cost. Note that to obtain a sharp decision boundary we
% are essentially trying to *overfit* the model, which means imposing a
% heavy penalty whenever a point crosses the margin. That penalty is
% raised by increasing the *BoxConstraint* parameter; however, a larger
% BoxConstraint also increases the run time of the algorithm (a timing
% sketch follows the list below).
%
% # KernelFunction: gaussian
% # BoxConstraint: 20
% # KernelScale: 0.2
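%
% A minimal sketch (not part of the original run) of how the run-time
% cost of larger BoxConstraint values could be measured on the
% 10,000-point subset; the candidate values below are illustrative:
%
%   for c = [1 5 20]
%       tic;
%       fitcsvm(X,Y,'KernelFunction','gaussian','BoxConstraint',c,'KernelScale',0.2);
%       fprintf('BoxConstraint = %2d trained in %.1f s\n',c,toc);
%   end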
%
% Loading the saved data to showcase the results:
%
% The file contains both the fitted fitcsvm model and the precomputed
% predictions used to draw the decision boundary. The model itself is not
% re-run here, since the predictions were saved in advance to save time.
load Data/batmodel.mat;
hFig = figure(4);
set(hFig, 'Position', [0 0 1500 500]);
subplot(1,2,1);
gscatter(batmanX(:,1), batmanX(:,2), batmany);
title('Scatter for complete Batman data');
axis tight;
subplot(1,2,2);
gscatter(xGrid(:,1),xGrid(:,2),batscore,[0.1 0.5 0.5; 0.5 0.1 0.5]);
title('Decision boundary with improved hyperparameters');
axis tight;
snapnow;
close(hFig);
%%
% *Training Error Rate*
%
% Since we have tried to overfit the data, we expect a lower training
% error rate than the one obtained above. The training error rate can be
% calculated via the following code:
% trainingError = sum(str2num(cell2mat(predict(BatModel,batmanX))) ~= batmany)
% trainingErrorRate = trainingError/size(batmanX,1)*100
%%
% However, since this takes a long time, it was calculated in advance: on
% the full training set of 100,000 points the model makes only 107
% errors, a training error rate of *0.107%*, which is noticeably lower
% than our previous result.
%%