#
# File: content-mit-8371x-subtitles/Q4L12b.txt
#
# Captions for 8.421x module
#
# This file has 76 caption lines.
#
# Do not add or delete any lines. If there is text missing at the end, please add it to the last line.
#
#----------------------------------------
A fundamental mathematical tool in classical communication
theory is entropy.
We define entropy using a scenario where
we have a message source-- call it x-- which
emits characters from an alphabet labeled,
let's say, 1 through x bar, where x bar
is the size of the alphabet.
On average, for long messages, symbol i
occurs with probability p sub i.
Suppose our source generates a message consisting of symbols
taken from this alphabet.
Shannon realized, in 1948, that what
we want to capture as information
is the surprise that comes from each subsequent symbol
from the source.
And this is, mathematically, H of p.
Shannon stipulated that H of p should have certain desired
properties-- first, that H of p should be a continuous function
of the probabilities.
Second, it should be non-negative and, in fact,
zero only when there is no randomness.
Third, H of p should be bounded above by a constant
that depends only on the size of the alphabet.
Fourth, H of a joint probability distribution
should be equal to the sum of the H of the two individual
distributions when the two distributions are independent.
It turns out that these four desired properties are uniquely
satisfied by a single function of p given by the sum, over i,
of p sub i log p sub i, all negated.
This is the definition of Shannon entropy.
Note that we will use, here, log base 2.
So our measurement of entropy will be in terms of bits.
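For concreteness, here is a minimal Python sketch of this definition;
the function name shannon_entropy is just an illustrative choice,
not something taken from the lecture materials.

import math

def shannon_entropy(probs):
    # Shannon entropy in bits: H(p) = -sum over i of p_i * log2(p_i),
    # using the convention that terms with p_i = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)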
Let us look at some examples of entropy.
First, let's look at a binary variable where
0 occurs with probability p.
We have seen this entropy function before.
It is the binary entropy function.
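That is, H2(p) = -p log2 p - (1 - p) log2(1 - p). As a quick numerical
check, here is a short Python sketch of it; the sample values of p
below are chosen arbitrarily.

import math

def h2(p):
    # Binary entropy in bits, with 0 * log 0 taken to be 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, h2(p))  # maximal (1 bit) at p = 0.5, zero at p = 0 or 1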
Consider, next, an expanded alphabet
where we have 1, 2, so forth up to x bar
and each one of the symbols occurs
with uniform probability.
Here, the entropy is simply the log
of the size of the alphabet.
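This is easy to verify numerically with the same formula;
the alphabet size of 8 below is an arbitrary example.

import math

size = 8                            # arbitrary example alphabet size
uniform = [1 / size] * size
entropy = -sum(p * math.log2(p) for p in uniform)
print(entropy, math.log2(size))     # both give 3.0 bits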
As a third example, consider an alphabet
with four possible symbols, a, b, c, and d.
Here, say, a occurs with probability 1/2,
b with probability 1/4, and c and d each with probability 1/8.
By direct calculation, the entropy is 7/4 bits.
Here is the explicit calculation which
you may use to check yourself.
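Written out, it is H = (1/2) log2 2 + (1/4) log2 4 + 2 x (1/8) log2 8
= 1/2 + 1/2 + 3/4 = 7/4 bits; here is a small Python check of the same sum.

import math

probs = [1/2, 1/4, 1/8, 1/8]                     # a, b, c, d
entropy = -sum(p * math.log2(p) for p in probs)
print(entropy)                                   # 1.75 bits, i.e. 7/4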
This means that, on average, 1.75 yes-or-no questions
are needed to determine which symbol
is coming next in the message.
As a fourth example, consider the famous question,
the entropy of English.
Here, suppose we have 26 letters and one space as our alphabet.
The entropy is clearly less than log
of 27, which is 4.76 bits per letter.
That is because, for example, the probability of e
is around 13%, whereas the probabilities of q and z
are very low, around 0.1%, for typical samples
of English literature.
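To see the effect numerically, one can compare the uniform bound with a
skewed toy distribution; the frequencies below are illustrative
placeholders, not measured English statistics.

import math

print(math.log2(27))   # approximately 4.75 bits per letter (uniform bound)

# A toy skewed distribution over 27 symbols: one very common symbol
# (like 'e' at about 13%), two rare ones (like 'q' and 'z' at about 0.1%),
# and the remaining 24 symbols sharing the rest equally.
probs = [0.13, 0.001, 0.001] + [(1 - 0.132) / 24] * 24
entropy = -sum(p * math.log2(p) for p in probs)
print(entropy)         # about 4.56 bits -- below the uniform bound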
Of course, there are atypical examples
such as the story Gadsby by Ernest Vincent Wright.
This story is 267 pages long and yet contains no words
with the letter "E."
Empirically, one finds that accounting for two-letter pairs
drops the rate to 4.03 bits per letter,
and accounting for four-letter combinations
drops it to 2.8 bits per letter.
Shannon studied this experimentally
with a gambling experiment asking listeners
to predict the next letter as words were spelled out.
He observed a rate of 1.3 bits per letter.
This provides a concrete and interesting example of the meaning of entropy in communication.