<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Softmax</title>
<meta name="description" content="Converting numbers to probability distributions">
<style>
body {
background-color: beige;
color: rgb(41, 45, 58);
}
pre {
color: rgb(38, 61, 106);
background-color: rgb(218, 218, 183);
}
b {
color: rgb(165, 122, 13);
}
@media (prefers-color-scheme: dark) {
body {
background-color: rgb(19, 22, 32);
color: beige;
}
pre {
background-color: rgb(29, 32, 38);
color: rgb(186, 186, 169);
}
b {
color: rgb(238, 204, 11);
}
}
body {
line-height: 1.4;
margin-inline: 0;
margin-block: 2rem 3rem;
}
p, pre {
margin-block: 1.5em;
margin-inline: 2em;
border-radius: 1px;
}
p {
padding: 1ch 2ch;
max-width: 32rem;
}
p + p {
padding-block-start: 0;
}
pre {
padding: 2.25ch 2.75ch;
}
@media (width > 35em) {
pre {
max-width: 32rem;
}
}
@media (width < 35em) {
p, pre {
margin-inline: 0;
}
pre {
overflow-y: auto;
}
}
</style>
</head>
<body>
<p>
<b>Softmax</b> converts an arbitrary set of numbers into a probability distribution:
it exponentiates each number and divides by the sum of all the exponentials, so
every output lies between 0 and 1 and the outputs add up to 1.
</p>
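<pre><code>const sm = (xs) => {
  // denominator: the sum of e^x over all inputs
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  // each e^x divided by that sum, so the outputs add up to 1
  return xs.map((x) => Math.exp(x) / s);
};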
</code></pre>
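<p>
A quick sanity check of those two properties, as a sketch (sm is repeated so the
snippet runs on its own). Since every e^x is positive, no output can be negative,
and since the outputs are non-negative and sum to 1, none of them can exceed 1
either. The 1e-12 tolerance is an arbitrary choice to absorb floating-point
rounding in the sum.
</p>
<pre><code>const sm = (xs) => {
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  return xs.map((x) => Math.exp(x) / s);
};

const ps = sm([1, 0, 3, -10, 0.07]);
const total = ps.reduce((a, p) => a + p, 0);

// no output is negative...
console.log(ps.every((p) => p >= 0)); // true
// ...and the outputs add up to 1, up to floating-point rounding
console.log(Math.abs(total - 1) > 1e-12 ? "off" : "sums to 1"); // sums to 1
</code></pre>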
<p>
It has some nice characteristics. For example, compared to other ways of
normalizing (such as dividing each number by the total), it copes with
negative, small, large and zero inputs: usually anything you can throw at it.
</p>
<pre><code>// helper to show probabilities as rounded percentages
const per = (xs) => xs.map((x) => Math.round(x * 100));
> per(sm([1, 0, 3, -10, 0.07]))
[ 11, 4, 81, 0, 4 ]
</code></pre>
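<p>
One caveat to "anything you can throw at it": Math.exp overflows to Infinity for
arguments above roughly 709.78 and underflows to 0 for sufficiently negative ones,
so the sm above returns [NaN, NaN] for both [1000, 1001] and [-1000, -1001]. A
common fix, sketched below as a hypothetical smStable, is to subtract the largest
input from every input before exponentiating. That does not change the result,
because the common factor e^-max cancels between numerator and denominator, but it
makes the largest term exactly e^0 = 1, so nothing overflows and the sum is never 0.
</p>
<pre><code>// the plain version from above
const sm = (xs) => {
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  return xs.map((x) => Math.exp(x) / s);
};

// hypothetical max-shifted variant
const smStable = (xs) => {
  const m = Math.max(...xs); // shift so the largest input becomes 0
  const es = xs.map((x) => Math.exp(x - m)); // every term is at most 1
  const s = es.reduce((a, e) => a + e, 0); // s >= 1, since one term is exactly 1
  return es.map((e) => e / s);
};

const per = (xs) => xs.map((x) => Math.round(x * 100));

console.log(sm([1000, 1001])); // [ NaN, NaN ]
console.log(per(smStable([1000, 1001]))); // [ 27, 73 ]
console.log(per(smStable([-1000, -1001]))); // [ 73, 27 ]
</code></pre>
<p>
Math.max(...xs) passes the whole array as arguments, which is fine for short lists
like these; engines cap the number of arguments, so for very long arrays a reduce
would be the safer way to find the maximum.
</p>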
<p>It has some less nice ones too. For example, it is not scale invariant: multiplying every input by the same factor changes the distribution.</p>
<pre><code>> per(sm([1, 2]))
[ 27, 73 ]
> per(sm([2, 4]))
[ 12, 88 ]
> per(sm([4, 8]))
[ 2, 98 ]
</code></pre>
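<p>
One way to see why: with two inputs, dividing the top and bottom of
e^a / (e^a + e^b) by e^a gives 1 / (1 + e^(b - a)), the logistic (sigmoid)
function of the gap between the inputs. Only the difference b - a matters, so
doubling both inputs doubles the gap (1, then 2, then 4 in the examples above)
and pushes the result further toward 0 and 1. The sketch below, using a
hypothetical helper sm2, reproduces those numbers from the gap alone.
</p>
<pre><code>// hypothetical two-input softmax, written purely in terms of the gap b - a
const sm2 = (a, b) => [1 / (1 + Math.exp(b - a)), 1 / (1 + Math.exp(a - b))];
const per = (xs) => xs.map((x) => Math.round(x * 100));

console.log(per(sm2(1, 2))); // [ 27, 73 ]  (gap 1)
console.log(per(sm2(2, 4))); // [ 12, 88 ]  (gap 2)
console.log(per(sm2(4, 8))); // [ 2, 98 ]   (gap 4)
</code></pre>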
<p>
Intuitively, I would expect the exponential to amplify differences, and indeed
it does:
</p>
<pre><code>> per(sm([1, 2, 4, 8]))
[ 0, 0, 2, 98 ]
</code></pre>
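<p>
The amplification follows a simple rule: all outputs share the same denominator,
so the ratio of two outputs is e raised to the difference of their inputs. For
[1, 2, 4, 8] the gaps between neighbours are 1, 2 and 4, so each output is about
2.72, 7.39 and 54.6 times the one before it, which is how 8 ends up with almost
all of the mass. A small sketch checking those ratios:
</p>
<pre><code>const sm = (xs) => {
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  return xs.map((x) => Math.exp(x) / s);
};

const ps = sm([1, 2, 4, 8]);
// each output divided by the one before it; should be e^(gap between the inputs)
const ratios = ps.slice(1).map((p, i) => p / ps[i]);
console.log(ratios.map((r) => r.toFixed(2))); // [ '2.72', '7.39', '54.60' ]
</code></pre>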
<p>
But not to the extent I expected on first contact. On the contrary, when the
numbers are close together, it seems to "compress" them even further:
</p>
<pre><code>> per(sm([0.1, 0.2, 0.3, 0.4]))
[ 21, 24, 26, 29 ]
</code></pre>
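<p>
For contrast, here is the plain "divide by the total" normalization, as a
hypothetical helper byTotal. It keeps the proportions of the inputs, so 0.4 stays
four times 0.1. Softmax instead depends only on the gaps between inputs: the ratio
of two outputs is e raised to the difference of their inputs, so the largest gap
here, 0.3, only separates the biggest and smallest outputs by a factor of
e^0.3, about 1.35. Small gaps between inputs therefore give outputs close to
uniform.
</p>
<pre><code>const sm = (xs) => {
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  return xs.map((x) => Math.exp(x) / s);
};
// hypothetical helper: divide each number by the total
const byTotal = (xs) => {
  const s = xs.reduce((a, x) => a + x, 0);
  return xs.map((x) => x / s);
};
const per = (xs) => xs.map((x) => Math.round(x * 100));

const xs = [0.1, 0.2, 0.3, 0.4];
console.log(per(byTotal(xs))); // [ 10, 20, 30, 40 ]
console.log(per(sm(xs))); // [ 21, 24, 26, 29 ]
</code></pre>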
<p>
Again, depending on the task at hand, this may or may not be the behaviour I
want.
</p>
<p>
It seems to work great in ML settings, in particular for converting the outputs
of a network's last layer into a probability distribution, for reasons that
seem to be tied to how backpropagation works.
</p>
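<p>
One reason that is commonly given: softmax is usually paired with a cross-entropy
loss, -log p[k] where k is the correct class, and the gradient of that loss with
respect to the last layer's outputs works out to the softmax output minus a one-hot
vector for k (p[i] - 1 at the correct class, p[i] everywhere else). Every component
stays between -1 and 1, and it is cheap to compute. The sketch below uses
hypothetical helpers xent and xentGrad and checks the formula against a numerical
derivative; it illustrates the math only, not how any particular ML library
implements it.
</p>
<pre><code>const sm = (xs) => {
  const s = xs.map(Math.exp).reduce((a, x) => a + x, 0);
  return xs.map((x) => Math.exp(x) / s);
};

// hypothetical helper: cross-entropy loss when the correct class is k
const xent = (zs, k) => -Math.log(sm(zs)[k]);
// hypothetical helper: gradient of xent with respect to zs,
// i.e. softmax output minus the one-hot vector for class k
const xentGrad = (zs, k) => sm(zs).map((p, i) => (i === k ? p - 1 : p));

// compare with a central finite difference, nudging one input at a time
const zs = [1, 0, 3, -10, 0.07];
const k = 2;
const h = 1e-6;
const numeric = zs.map((_, i) => {
  const up = zs.map((z, j) => (j === i ? z + h : z));
  const down = zs.map((z, j) => (j === i ? z - h : z));
  return (xent(up, k) - xent(down, k)) / (2 * h);
});
const analytic = xentGrad(zs, k);
const maxDiff = Math.max(...analytic.map((g, i) => Math.abs(g - numeric[i])));
console.log(maxDiff); // a tiny number: the two gradients agree
</code></pre>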
</body>
</html>