<!-- Appendix B - Utility Functions -->
<style>
.visionbox{
border-radius: 15px;
border: 2px solid #3585d4;
background-color: #ebf3fb;
text-align: left;
padding: 10px;
}
</style>
<style>
.visionboxlegend{
border-bottom-style: solid;
border-bottom-color: #3585d4;
border-bottom-width: 0px;
margin-left: -12px;
margin-right: -12px; margin-top: -13px;
padding: 0.01em 1em; color: #ffffff;
background-color: #3585d4;
border-radius: 15px 15px 0px 0px}
</style>
<h1 id="utility-and-utility-functions">B.1 Utility and Utility
Functions</h1>
<h2 id="fundamentals">B.1.1 Fundamentals</h2>
<p><strong>A utility function is a mathematical representation of
preferences.</strong> A utility function, <span
class="math inline"><em>u</em></span>, takes inputs like goods or
situations and outputs a value called <em>utility</em>. Utility is a
measure of how much an agent prefers goods and situations relative to
other goods and situations.<p>
Suppose we offer Alice some apples, bananas, and cherries. She might
have the following utility function for fruits:<p>
<span
class="math display"><em>u</em>(fruits) = 12<em>a</em> + 10<em>b</em> + 2<em>c</em>,</span>
where <span class="math inline"><em>a</em></span> is the number of
apples, <span class="math inline"><em>b</em></span> is the number of
bananas, and <span class="math inline"><em>c</em></span> is the number
of cherries that she consumes. Suppose Alice consumes no apples, one
banana, and five cherries. The amount of utility she gains from her
consumption is calculated as <span
class="math display"><em>u</em>(0 apples,1 banana,5 cherries) = (12⋅0) + (10⋅1) + (2⋅5) = 20.</span>
The output of this function is read as “20 units of utility” for short.
These units are arbitrary and reflect the level of Alice’s utility. We
can use utility functions to quantitatively represent preferences over
different combinations of goods and situations. For example, we can rank
Alice’s preferences over fruits as <span
class="math display">apple ≻ banana ≻ cherry,</span> where <span
class="math inline">≻</span> represents <em>preference</em>, such that
what comes before the symbol is preferred to what comes after it. This
follows from the fact that Alice gains 12 units from an apple, 10 units
from a banana, and 2 units from a cherry. The advantage of having a
utility function as opposed to just an explicit ranking of goods is that
we can directly infer information about more complex goods. For example,
we know <span class="math display"><em>u</em>(1 banana,5
cherries) = 20 > <em>u</em>(1 apple) = 12 > <em>u</em>(1
banana) = 10.</span> <p><strong>Utility functions, if accurate, reveal what
options agents would prefer and choose.</strong> If told to choose only
one of the three fruits, Alice would pick the apple, since it gives her
the most utility. Her preference follows from <em>rational choice
theory</em>, which proposes that individuals, acting in their own
self-interest, make decisions that maximize their utility. This
view is only an approximation to human behavior. In this chapter we will
discuss how rational choice theory is an imperfect but useful way to
model choices. We will also refer to individuals who behave in coherent
ways that help maximize utility as <em>agents</em>.</p>
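<p>As a concrete illustration (added here, not part of the original text), the short Python sketch below encodes Alice’s fruit utility function, recomputes the 20 utils from her bundle, and shows that a utility maximizer offered a single fruit picks the apple.</p>
<pre><code>
# Alice's Bernoulli utility function over fruit bundles:
# u(apples, bananas, cherries) = 12a + 10b + 2c.
def utility(apples, bananas, cherries):
    return 12 * apples + 10 * bananas + 2 * cherries

# The bundle from the text: no apples, one banana, five cherries.
print(utility(0, 1, 5))  # 20 "units of utility"

# Offered exactly one fruit, a utility maximizer picks the apple.
single_fruits = {
    "apple": utility(1, 0, 0),
    "banana": utility(0, 1, 0),
    "cherry": utility(0, 0, 1),
}
print(max(single_fruits, key=single_fruits.get))  # apple
</code></pre>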
<p><strong>We explore concepts about utility functions that are useful
for thinking about AIs, humans, and organizations like companies and
states.</strong> First, we introduce <em>Bernoulli utility
functions</em>, which are conventional utility functions that define
preferences over certain outcomes like the example above. We later
discuss <em>von Neumann-Morgenstern utility functions</em>, which extend
preferences to probabilistic situations, in which we cannot be sure
which outcome will occur. <em>Expected utility theory</em> suggests that
rationality is the ability to maximize preferences. We consider the
relevance of utility functions to <em>AI corrigibility</em>—the property
of being receptive to corrections—and see how this might be a source of
tail risk. Much of this chapter focuses on how utility functions help
understand and model agents’ <em>attitudes toward risk</em>. Finally, we
examine <em>non-expected utility theories</em>, which seek to rectify
some shortcomings of conventional expected utility theory when modeling
real-life behavior.</p>
<h2 id="motivations-for-learning-about-utility-functions">B.1.2 Motivations
for Learning About Utility Functions</h2>
<p><strong>Utility functions are a central concept in economics and
decision theory.</strong> Utility functions can be applied to a wide
range of problems and agents, from rats finding cheese in a maze to
humans making investment decisions to countries stockpiling nuclear
weapons. Conventional economic theory assumes that people are rational
and well-informed, and make decisions that maximize their self-interest,
as represented by their utility function. The view that individuals will
choose options that are likely to maximize their utility functions,
referred to as <em>expected utility theory</em>, has been the major
paradigm in real-world decision making since the Second World War <span
class="citation" data-cites="schoemaker1982expected">[1]</span>. It is
useful for modeling, predicting, and encouraging desired behavior in a
wide range of situations. However, as we will discuss, this view does
not perfectly capture reality, because individuals can often be
irrational, lack relevant knowledge, and frequently make mistakes.</p>
<p><strong>The objective of maximizing a utility function can cause
intelligence.</strong> The <em>reward hypothesis</em> suggests that the
objective of maximizing some reward is sufficient to drive behavior that
exhibits intelligent traits like learning, knowledge, perception, social
awareness, language, generalization, and more <span class="citation"
data-cites="silver2021reward">[2]</span>. The reward hypothesis implies
that artificial agents in rich environments with simple rewards could
develop sophisticated general intelligence. For example, an artificial
agent deployed with the goal of maximizing the number of successful food
deliveries may develop relevant geographical knowledge, an understanding
of how to move between destinations efficiently, and the ability to
perceive potential dangers. Therefore, the construction and properties
of the utility function that agents maximize are central to guiding
intelligent behavior.</p>
<p><strong>Certain artificial agents may be approximated as expected
utility maximizers.</strong> Some artificial intelligences are
agent-like. They are programmed to consider the potential outcomes of
different actions and to choose the option that is most likely to lead
to the optimal result. It is a reasonable approximation to say that many
artificial agents make choices that they predict will give them the
highest utility. For instance, in reinforcement learning, artificial agents explore their environment and
are rewarded for desirable behavior. These agents are explicitly
constructed to maximize reward functions, which strongly shape an
agent’s internal utility function, should it exist, and its
dispositions. This view of AI has implications for how we design and
evaluate these systems—we need to ensure that their value functions
promote human values. Utility functions can help us reason about the
behavior of AIs, as well as the behavior of powerful actors that direct
AIs, such as corporations or governments.</p>
<p><strong>Utility functions are a key concept in AI safety.</strong>
Utility functions come up explicitly and implicitly at various times
throughout this book, and are useful for understanding the behavior of
reward-maximizing agents, as well as humans and organizations involved
in the AI ecosystem. They will also come up in our chapter on Machine Ethics, when we
consider that some advanced AIs may have utility functions that make up the
social welfare function they seek to increase. In the Collective Action Problems chapter, we will
continue our discussion of rational agents that seek to maximize their
own utility.</p>
<h1 id="properties-of-utility-functions">B.2 Properties of Utility
Functions</h1>
<p><strong>Overview.</strong> In this section, we will formalize our
understanding of utility functions. First, we will introduce
<em>Bernoulli utility functions</em>, which are simple utility functions
that allow an agent to select between different choices with known
outcomes. Then we will discuss <em>von Neumann-Morgenstern utility
functions</em>, which model how rational agents select between choices
with probabilistic outcomes based on the concept of <em>expected
utility</em>, making these tools applicable to choices under
uncertainty. Finally, we will describe a solution to a
famous puzzle applying expected utility—the <em>St. Petersburg
Paradox</em>—to see why expected utility is a useful tool for decision
making.<p>
Establishing these mathematical foundations will help us understand how
to apply utility functions to various actors and situations.</p>
<h2 id="bernoulli-utility-functions">B.2.1 Bernoulli Utility Functions</h2>
<p><strong>Bernoulli utility functions represent an individual’s
preferences over potential outcomes.</strong> Suppose we give people the
choice between an apple, a banana, and a cherry. If we already know each
person’s utility function, we can deduce, predict, and compare their
preferences. In the introduction, we met Alice, whose preferences are
represented by the utility function over fruits:<p>
<span
class="math display"><em>u</em>(<em>f</em>) = 12<em>a</em> + 10<em>b</em> + 2<em>c</em>.</span>
This is a Bernoulli utility function.</p>
<p><strong>Bernoulli utility functions can be used to convey the
strength of preferences across opportunities.</strong> In their most
basic form, Bernoulli utility functions express ordinal preferences by
ranking options in order of desirability. For more information, we can
consider cardinal representations of preferences. With cardinal utility
functions, numbers matter: while the units are still arbitrary, the
relative differences are informative.<p>
To illustrate the difference between ordinal and cardinal comparisons,
consider how we talk about temperature. When we want to precisely convey
information about temperature, we use a cardinal measure like Celsius or
Fahrenheit: “Today is five degrees warmer than yesterday.” We could have
also accurately, but less descriptively, used an ordinal descriptor:
“Today is warmer than yesterday.” Similarly, if we interpret Alice’s
utility function as cardinal, we can conclude that she feels more
strongly about the difference between a banana and a cherry (8 units of
utility) than she does about the difference between an apple and a
banana (2 units). We can gauge the relative strength of Alice’s
preferences from a utility function.</p>
<h2 id="von-neumann-morgenstern-utility-functions">B.2.2 Von
Neumann-Morgenstern Utility Functions</h2>
<p><strong>Von Neumann-Morgenstern utility functions help us understand
what people prefer when outcomes are uncertain.</strong> We do not yet
know how Alice values an uncertain situation, such as a coin flip. If
the coin lands on heads, Alice gets both a banana and an apple. But if
it lands on tails, she gets nothing. Now let’s say we give Alice a
choice between getting an apple, getting a banana, or flipping the coin.
Since we know her fruit Bernoulli utility function, we know her
preferences between apples and bananas, but we do not know how she
compares each fruit to the coin flip. We’d like to convert the possible
outcomes of the coin flip into a number that represents the utility of
each outcome, which can then be compared directly against the utility of
receiving the fruits with certainty. The von Neumann-Morgenstern (vNM)
utility functions help us do this <span class="citation"
data-cites="vonneumann1947theory">[3]</span>. They are extensions of
Bernoulli utility functions, and work specifically for situations with
uncertainty, represented as <em>lotteries</em> (denoted <strong><span
class="math inline"><em>L</em></span></strong>), like this coin flip.
First, we work through some definitions and assumptions that allow us to
construct utility functions over potential outcomes, and then we explore
the relation between von Neumann-Morgenstern utility functions and
expected utility.</p>
<p><strong>A lottery assigns a probability to each possible
outcome.</strong> Formally, a lottery <span
class="math inline"><em>L</em></span> is any set of possible outcomes,
denoted <span
class="math inline"><em>o</em><sub><em>i</em></sub></span>, and their
associated probabilities, denoted <span
class="math inline"><em>p</em><sub><em>i</em></sub></span>. Consider a
simple lottery: a coin flip where Alice receives an apple on heads, and
a banana on tails. This lottery has possible outcomes <span
class="math inline"><em>a</em><em>p</em><em>p</em><em>l</em><em>e</em></span>
and <span
class="math inline"><em>b</em><em>a</em><em>n</em><em>a</em><em>n</em><em>a</em></span>,
each with probability <span class="math inline">0.5</span>. If a
different lottery offers a cherry with certainty, it would have only the
possible outcome <span
class="math inline"><em>c</em><em>h</em><em>e</em><em>r</em><em>r</em><em>y</em></span>
with probability <span class="math inline">1</span>. Objective
probabilities are used when the probabilities are known, such as when
calculating the probability of winning in casino games like roulette. In
other cases where objective probabilities are not known, like predicting
the outcome of an election, an individual’s subjective best-guess could
be used instead. So, both uncertain and certain outcomes can be
represented by lotteries.<p>
</p>
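<p>As a purely illustrative aside, a lottery can be represented in code as a list of (outcome, probability) pairs; the small helper below simply checks that the probabilities are non-negative and sum to one.</p>
<pre><code>
import math

# A lottery pairs each possible outcome with its probability.
coin_flip = [("apple", 0.5), ("banana", 0.5)]  # Alice's fair coin flip
certain_cherry = [("cherry", 1.0)]             # a certain outcome

def is_valid_lottery(lottery):
    """Probabilities must be non-negative and sum to one."""
    probs = [p for _, p in lottery]
    return min(probs) >= 0 and math.isclose(sum(probs), 1.0)

print(is_valid_lottery(coin_flip), is_valid_lottery(certain_cherry))  # True True
</code></pre>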
<br>
<div class="visionbox">
<legend class="visionboxlegend">
<p><span><b>A Note on Expected Value vs. Expected Utility</b></span></p>
</legend>
<p>An essential distinction in this chapter is that between expected
value and expected utility.</p>
<p><strong>Expected value is the average outcome of a random
event.</strong> While most lottery tickets have negative expected value,
in rare circumstances they have positive expected value. Suppose a
lottery has a jackpot of 1 billion dollars. Let the probability of
winning the jackpot be 1 in 300 million, and let the price of a lottery
ticket be $2. Then the expected value is calculated by weighting
each possible outcome by its probability of occurrence and adding them together. The two outcomes
are (1) that we win a billion dollars, minus the cost of $2 to play the
lottery, which happens with probability one in 300 million, and (2) that
we are $2 in debt. We can calculate the expected value with the formula:
<span class="math display">$$\frac{1}{300 \text{ million}} \cdot
\left(\$ 1 \text{ billion}-\$ 2\right)+\left(1-\frac{1}{300 \text{
million}}\right) \cdot \left(-\$ 2\right)\approx \$ 1.33.$$</span> The
expected value of the lottery ticket is positive, meaning that, on
average, buying the lottery ticket would result in us receiving <span
class="math inline">$</span>1.33.<p>
Generally, we can calculate expected value by multiplying each outcome
value, <span class="math inline"><em>o</em><sub><em>i</em></sub></span>, with its
probability <span class="math inline"><em>p</em><sub><em>i</em></sub>,</span> and summing
over all <span class="math inline"><em>n</em></span>
possibilities: <span
class="math display"><em>E</em>[<em>L</em>] = <em>o</em><sub>1</sub> ⋅ <em>p</em><sub>1</sub> + <em>o</em><sub>2</sub> ⋅ <em>p</em><sub>2</sub> + ⋯ + <em>o</em><sub><em>n</em></sub> ⋅ <em>p</em><sub><em>n</em></sub>.</span>
<strong>Expected utility is the average utility of a random
event.</strong> Although the lottery has positive expected value, buying
a lottery ticket may still not increase the buyer’s expected utility. Expected
utility is distinct from expected value: instead of summing over the
monetary outcomes (weighing each outcome by its probability), we sum
over the utility the agent receives from each outcome (weighing each
outcome by its probability).<p>
If the agent’s utility function indicates that one “util” is just as
valuable as one dollar, that is <span
class="math inline"><em>u</em>($<em>x</em>) = <em>x</em></span>, then
expected utility and expected value would be the same. But suppose the
agent’s utility function were a different function, such as <span
class="math inline"><em>u</em>($<em>x</em>) = <em>x</em><sup>1/3</sup></span>.
This utility function means that the agent values each additional dollar
less and less as they have more and more money.<p>
For example, if an agent with this utility function already has <span
class="math inline">$</span>500, an extra dollar would increase their
utility by about 0.005, but if they already have <span
class="math inline">$</span>200,000, an extra dollar would increase
their utility by only 0.0001. With this utility function, the expected
utility of this lottery example is negative: <span
class="math display">$$\frac{1}{300 \text{ million}} \cdot \left(1
\text{ billion}-2\right)^{1/3}+\left(1-\frac{1}{300 \text{
million}}\right) \cdot \left(-2\right)^{1/3}\approx -1.26.$$</span>
Consequently, expected value can be positive while expected utility can
be negative, so the two concepts are distinct.<p>
Generally, expected utility is calculated as: <span
class="math display"><em>E</em>[<em>u</em>(<em>L</em>)] = <em>u</em>(<em>o</em><sub>1</sub>) ⋅ <em>p</em><sub>1</sub> + <em>u</em>(<em>o</em><sub>2</sub>) ⋅ <em>p</em><sub>2</sub> + ⋯ + <em>u</em>(<em>o</em><sub><em>n</em></sub>) ⋅ <em>p</em><sub><em>n</em></sub>.</span></p>
</div>
<br>
<p><strong>According to expected utility theory, rational agents make
decisions that maximize expected utility.</strong> Von Neumann and
Morgenstern proposed a set of basic propositions called <em>axioms</em>
that define an agent with rational preferences. When an agent satisfies
these axioms, their preferences can be represented by a von
Neumann-Morgenstern utility function, which is equivalent to using
expected utility to make decisions. While expected utility theory is
often used to model human behavior, it is important to note that it is
an imperfect approximation. In the final section of this chapter, we
present some criticisms of expected utility theory and the vNM
rationality axioms as they apply to humans. However, artificial agents
might be designed along these lines, resulting in an explicit expected
utility maximizer, or something approximating an expected utility
maximizer. The von Neumann-Morgenstern rationality axioms are listed
below with mathematically precise notation for sake of completeness, but
a technical understanding of them is not necessary to proceed with the
chapter.</p>
<p><strong>Von Neumann-Morgenstern Rationality Axioms.</strong> When the
following axioms are satisfied, we can assume a utility function of an
expected utility form, where agents prefer lotteries that have higher
expected utility <span class="citation"
data-cites="vonneumann1947theory">[3]</span>. <span
class="math inline"><em>L</em></span> is a lottery. <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>B</em></sub></span>
means that the agent prefers lottery A to lottery B, whereas <span
class="math inline"><em>L</em><sub><em>A</em></sub> ∼ <em>L</em><sub><em>B</em></sub></span>
means that the agent is indifferent between lottery A and lottery B.
These axioms, and the conclusions that can be derived from them, are
contentious, as we will see later on in this chapter. There are six such
axioms, which we can split into two groups.<p>
The classic four axioms are:</p>
<ol>
<li><p>Completeness: The agent can rank their preferences over all
lotteries. For any two lotteries, it must be that <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>B</em></sub></span>
or <span
class="math inline"><em>L</em><sub><em>B</em></sub> ≽ <em>L</em><sub><em>A</em></sub></span>.</p></li>
<li><p>Transitivity: If <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>B</em></sub></span>
and <span
class="math inline"><em>L</em><sub><em>B</em></sub> ≽ <em>L</em><sub><em>C</em></sub></span>,
then <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>C</em></sub></span>.</p></li>
<li><p>Continuity: For any three lotteries, <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>B</em></sub> ≽ <em>L</em><sub><em>C</em></sub></span>,
there exists a probability <span
class="math inline"><em>p</em> ∈ [0,1]</span> such that <span
class="math inline"><em>p</em><em>L</em><sub><em>A</em></sub> + (1−<em>p</em>)<em>L</em><sub><em>C</em></sub> ∼ <em>L</em><sub><em>B</em></sub></span>.
This means that the agent is indifferent between <span
class="math inline"><em>L</em><sub><em>B</em></sub></span> and some
combination of the worse lottery <span
class="math inline"><em>L</em><sub><em>C</em></sub></span> and the
better lottery <span
class="math inline"><em>L</em><sub><em>A</em></sub></span>. In practice,
this means that agents’ preferences change smoothly and predictably with
changes in options.</p></li>
<li><p>Independence: The preference between two lotteries is not
impacted by the addition of equal probabilities of a third, independent
lottery to each lottery. That is, <span
class="math inline"><em>L</em><sub><em>A</em></sub> ≽ <em>L</em><sub><em>B</em></sub></span>
is equivalent to <span
class="math inline"><em>p</em><em>L</em><sub><em>A</em></sub> + (1−<em>p</em>)<em>L</em><sub><em>C</em></sub> ≽ <em>p</em><em>L</em><sub><em>B</em></sub> + (1−<em>p</em>)<em>L</em><sub><em>C</em></sub></span>
for any <span
class="math inline"><em>L</em><sub><em>C</em></sub></span>.
A numerical sketch illustrating this axiom appears after the list of axioms below.</p></li>
</ol>
<p>The final two axioms represent relatively obvious characteristics of rational
decision-making, although actual decision-making processes sometimes deviate from these. These axioms are
relatively “weak” and are implied by the previous four.</p>
<ol>
<li><p><span>Monotonicity</span>: Agents prefer higher probabilities of
preferred outcomes.</p></li>
<li><p><span>Decomposability</span>: The agent is indifferent between
two lotteries that share the same probabilities for all the same
outcomes, even if they are described differently.</p></li>
</ol>
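<p>As promised above, here is a purely numerical illustration of the independence axiom. The sketch below uses made-up lotteries over Bernoulli utilities and checks that mixing two lotteries with the same third lottery, at the same probability, never changes which one an expected-utility agent prefers.</p>
<pre><code>
import random

# Hypothetical lotteries, written as (Bernoulli utility, probability) pairs.
L_A = [(10, 0.5), (0, 0.5)]
L_B = [(4, 1.0)]
L_C = [(7, 0.3), (1, 0.7)]

def expected_utility(lottery):
    return sum(u * p for u, p in lottery)

def mix(p, first, second):
    """The compound lottery p * first + (1 - p) * second."""
    return [(u, p * q) for u, q in first] + [(u, (1 - p) * q) for u, q in second]

# Independence: L_A is preferred to L_B if and only if the p-mixtures with L_C
# preserve that preference, for any mixing probability p.
for p in [random.random() for _ in range(5)]:
    prefers_A = expected_utility(L_A) >= expected_utility(L_B)
    prefers_A_mixed = expected_utility(mix(p, L_A, L_C)) >= expected_utility(mix(p, L_B, L_C))
    assert prefers_A == prefers_A_mixed
print("independence holds for all sampled mixing probabilities")
</code></pre>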
<p><strong>Form of von Neumann-Morgenstern utility functions.</strong>
If an agent’s preferences are consistent with the above axioms, their
preferences can be represented by a vNM utility function. This utility
function, denoted by a capital <span
class="math inline"><em>U</em></span>, is simply the expected Bernoulli
utility of a lottery. That is, a vNM utility function takes the
Bernoulli utility of each outcome, multiplies each with its
corresponding probability of occurrence, and then adds everything up.
Formally, an agent’s expected utility for a lottery <span
class="math inline"><em>L</em></span> is calculated as: <span
class="math display"><em>U</em>(<em>L</em>) = <em>u</em>(<em>o</em><sub>1</sub>) ⋅ <em>p</em><sub>1</sub> + <em>u</em>(<em>o</em><sub>2</sub>) ⋅ <em>p</em><sub>2</sub> + ⋯ + <em>u</em>(<em>o</em><sub><em>n</em></sub>) ⋅ <em>p</em><sub><em>n</em></sub>,</span>
so expected utility can be thought of as a weighted average of the
utilities of different outcomes.<p>
This is identical to the expected utility formula we discussed above—we
sum over the utilities of all the possible outcomes, each multiplied by
its probability of occurrence. With Bernoulli utility functions, an
agent prefers <span class="math inline"><em>a</em></span> to <span
class="math inline"><em>b</em></span> if and only if their utility from
receiving <span class="math inline"><em>a</em></span> is greater than
their utility from receiving <span
class="math inline"><em>b</em></span>. With expected utility, an agent
prefers lottery <span
class="math inline"><em>L</em><sub><em>A</em></sub></span> to lottery
<span class="math inline"><em>L</em><sub><em>B</em></sub></span> if and
only if their expected utility from lottery <span
class="math inline"><em>L</em><sub><em>A</em></sub></span> is greater
than from lottery <span
class="math inline"><em>L</em><sub><em>B</em></sub></span>. That is:
<span
class="math display"><em>L</em><sub><em>A</em></sub> ≻ <em>L</em><sub><em>B</em></sub> ⇔ <em>U</em>(<em>L</em><sub><em>A</em></sub>) > <em>U</em>(<em>L</em><sub><em>B</em></sub>).</span>
where the symbol <span class="math inline">≻</span> indicates
preference. The von Neumann-Morgenstern utility function models the
decision making of an agent considering two lotteries as simply
calculating the expected utility of each and choosing whichever is
larger.<p>
</p>
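<p>Returning to Alice’s coin flip from earlier (heads: an apple and a banana; tails: nothing), the sketch below illustrates how a vNM agent compares the flip against each certain fruit, assuming her Bernoulli utilities are additive as in her fruit utility function.</p>
<pre><code>
# Bernoulli utilities for the relevant outcomes (apple 12, banana 10, so the
# bundle of both is worth 22 under Alice's additive utility function).
bernoulli_u = {"apple": 12, "banana": 10, "apple and banana": 22, "nothing": 0}

def U(lottery):
    """vNM utility: the expected Bernoulli utility of a lottery."""
    return sum(bernoulli_u[outcome] * p for outcome, p in lottery)

coin_flip = [("apple and banana", 0.5), ("nothing", 0.5)]

print(U(coin_flip))            # 11.0
print(U([("apple", 1.0)]))     # 12.0 -> Alice prefers the certain apple to the flip
print(U([("banana", 1.0)]))    # 10.0 -> but prefers the flip to the certain banana
</code></pre>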
<br>
<div class="visionbox">
<legend class="visionboxlegend">
<p><span><b>A Note on Logarithms</b></span></p>
</legend>
<p><strong>Logarithmic functions are commonly used as utility
functions.</strong> A logarithm is a mathematical function that
expresses the power to which a given number (referred to as the base)
must be raised in order to produce a value. The logarithm of a number
<span class="math inline"><em>x</em></span> with respect to base <span
class="math inline"><em>b</em></span> is denoted as <span
class="math inline">log<sub><em>b</em></sub><em>x</em></span>, and is
the exponent to which <span class="math inline"><em>b</em></span> must
be raised to produce the value <span
class="math inline"><em>x</em></span>. For example, <span
class="math inline">log<sub>2</sub>8 = 3</span>, because <span
class="math inline">2<sup>3</sup> = 8</span>.<p>
One special case of the logarithmic function, the natural logarithm, has
a base of <span class="math inline"><em>e</em></span> (Euler’s
number, roughly 2.718); in this chapter, it is referred to simply as
<span class="math inline">log </span>. Logarithms have the following
properties, independent of base: <span
class="math inline">log 0 → − ∞</span>, <span
class="math inline">log 1 = 0,</span> <span
class="math inline">log<sub><em>b</em></sub><em>b</em> = 1,</span> and
<span
class="math inline">log<sub><em>b</em></sub><em>b</em><sup><em>a</em></sup> = <em>a</em></span>.<p>
Logarithms have a concave, downward-bending shape, meaning the output grows
more slowly than the input. This shape resembles how humans value resources:
we generally value a good less if we already have more of it.
Under a logarithmic utility function, the marginal value of an additional
unit is inversely proportional to how much of the resource we already have.<p>
</p>
<figure id="fig:logarithms">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/logarithms.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure B.1: Logarithmic functions share several properties, such as being concave and crossing
the x-axis at one. </p>
</figure>
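<p>These properties, and the diminishing-returns shape, can be checked numerically; the brief snippet below is purely illustrative.</p>
<pre><code>
import math

# math.log is the natural logarithm (base e), the "log" used in this chapter.
print(math.log(1))           # 0.0  (log 1 = 0)
print(math.log(math.e))      # 1.0  (log_b b = 1)
print(math.log(2 ** 5, 2))   # ~5.0 (log_b b^a = a)
print(math.log(1e-12))       # very negative (log x tends to -infinity as x tends to 0)

# Concavity: equal additions to the input add less and less to the output.
print(math.log(2) - math.log(1))  # ~0.69
print(math.log(3) - math.log(2))  # ~0.41
print(math.log(4) - math.log(3))  # ~0.29
</code></pre>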
</div>
<br>
<h2 id="st.-petersburg-paradox">B.2.3 St. Petersburg Paradox</h2>
<p>An old man on the streets of St. Petersburg offers gamblers the
following game: he will flip a fair coin repeatedly until it lands on
tails. If the first flip lands tails, the game ends and the gambler
gets $2. If the coin first lands on heads and then lands on tails, the
game ends and the gambler gets $4. The amount of money (the “return”)
will double for each consecutive flip landing heads before the coin
ultimately lands tails. The game concludes when the coin first lands
tails, and the gambler receives the appropriate returns. Now, the
question is, how much should a gambler be willing to pay to play this
game <span class="citation"
data-cites="peterson2019paradox">[4]</span>?<p>
With probability <span class="math inline">$\frac{1}{2}$</span>, the
first toss will land on tails, in which case the gambler wins two
dollars. With probability <span
class="math inline">$$\frac{1}{4}$$</span>, the first toss lands heads and
the second lands tails, and the gambler wins four dollars.
Extrapolating, this game offers a payout of: <span
class="math display">$$\$ 2^{n} = \$ \overbrace{2 \cdot 2 \cdot 2\cdots
2 \cdot 2 \cdot 2}^{n \text{ times}},$$</span> where <span
class="math inline"><em>n</em></span> is the number of flips until and
including when the coin lands on tails. As offered, though, there is no
limit to the size of <span class="math inline"><em>n</em></span>, since
the old man promises to keep flipping the coin until it lands on tails.
The expected payout of this game is therefore: <span
class="math display">$$E\left[L\right] =\frac{1}{2} \cdot \$
2+\frac{1}{4} \cdot \$ 4+\frac{1}{8} \cdot \$ 8+\cdots = \$ 1+\$ 1+\$
1+\cdots = \$ \infty.$$</span> Bernoulli described this situation as a
paradox because he believed that, despite it having infinite expected
value, anyone would take a large but finite amount of money over the
chance to play the game. While paying <span
class="math inline">$</span>10,000,000 to play this game would not be
inconsistent with its expected value, we would think it highly
irresponsible! The paradox reveals a disparity between expected value
calculations and reasonable human behavior.<p>
</p>
<figure id="fig:stpetersburg">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/stpetersburg.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure B.2: Winnings from the St. Petersburg Paradox double after each coin toss, offering small
likelihoods of big prizes.</p>
</figure>
<p><strong>Logarithmic utility functions can represent decreasing
marginal utility.</strong> A number of ways have been proposed to
resolve the St. Petersburg paradox. We will focus on the most popular:
representing the player with a utility function instead of merely
calculating expected value. As we discussed in the previous section, a
logarithmic utility function seems to resemble how humans think about
wealth. As a person becomes richer, each additional dollar gives them
less satisfaction than before. This concept, called decreasing marginal
utility, makes sense intuitively: a billionaire would not be as
satisfied winning $1000 as someone with significantly less money.
Wealth, and many other resources like food, have such diminishing
returns. While a first slice of pizza is incredibly satisfying, a second
one is slightly less so, and few people would continue eating to enjoy a
tenth slice of pizza.<p>
Assuming an agent with a utility function <span
class="math inline"><em>u</em>($<em>x</em>) = log<sub>2</sub>(<em>x</em>)</span>
over <span class="math inline"><em>x</em></span> dollars, we can
calculate the expected utility of playing the St. Petersburg game as:
<span class="math display">$$E\left[U\left(L\right)\right] =\frac{1}{2}
\cdot \log_{2}(2)+\frac{1}{4} \cdot \log_{2}(4)+\frac{1}{8} \cdot
\log_{2}(8)+\cdots = 2.$$</span> That is, the expected utility of the
game is 2. From the logarithmic utility function over wealth, we know
that: <span
class="math display">2 = log<sub>2</sub><em>x</em> ⇒ <em>x</em> = 4,</span>
which implies that the player is indifferent between playing this game
and having $4: the level of wealth that gives them the same utility as
what they expect playing the lottery.</p>
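<p>A short numerical sketch (added for illustration) makes the contrast explicit: the truncated expected value of the game grows without bound as more flips are allowed, while the expected log<sub>2</sub> utility converges to 2, i.e. a certainty equivalent of $4.</p>
<pre><code>
import math

def st_petersburg(max_flips):
    """Truncated expected value and expected log2 utility of the game."""
    ev, eu = 0.0, 0.0
    for n in range(1, max_flips + 1):
        prob, payout = 0.5 ** n, 2 ** n   # tails first occurs on flip n
        ev += prob * payout               # each term adds exactly $1
        eu += prob * math.log2(payout)    # utility terms shrink geometrically
    return ev, eu

for flips in (10, 20, 40):
    print(flips, st_petersburg(flips))
# Expected value grows by $1 per allowed flip (diverging to infinity), while
# expected utility approaches 2, so the certainty equivalent is 2**2 = $4.
</code></pre>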
<p><strong>Expected utility is more reasonable than expected
value.</strong> The previous calculation explains why an agent with
<span
class="math inline"><em>u</em>($<em>x</em>) = log<sub>2</sub><em>x</em></span>
should not pay large amounts of money to play the St. Petersburg game.
The log utility function implies that the player receives diminishing
returns to wealth, and cares less about situations with small chances of
winning huge sums of money. Figure B.3 shows how the large payoffs with
small probability, despite having the same expected value, contribute
little to expected utility. This feature captures the human tendency
towards risk aversion, explored in the next section. Note that while
logarithmic utility functions are a useful model (especially in
resolving such paradoxes), they do not perfectly describe human behavior
across choices, such as the tendency to buy lottery tickets, which we
will explore in the next chapter.<p>
</p>
<figure id="fig:flip">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/stpetersburg_ev_eu.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure B.3: In the St. Petersburg Paradox, each subsequent flip has the same expected value but
expected utility falls sharply.</p>
</figure>
<p><strong>Summary.</strong> In this section, we examined the properties
of Bernoulli utility functions, which allow us to compare an agent’s
preferences across different outcomes. We then introduced von
Neumann-Morgenstern utility functions, which calculate the average, or
expected, utility over different possible outcomes. From there, we
derived the idea that rational agents are able to make decisions that
maximize expected utility. Through the St. Petersburg Paradox, we showed
that taking the expected utility of a logarithmic function leads to more
reasonable behavior. Having understood some properties of utility
functions, we can now examine the problem of incorrigibility, where AI
systems do not accept corrective interventions because of rigid
preferences.</p>
<h1 id="tail-risk-corrigibility">B.3 Tail Risk: Corrigibility</h1>
<p><strong>Overview.</strong> In this section, we will explore how
utility functions provide insight into whether an AI system is open to
corrective interventions and discuss related implications for AI risks.
The von Neumann-Morgenstern (vNM) axioms of completeness and
transitivity can lead to strict preferences over shutting down or being
shut down, which affects how easily an agent can be corrected. We will
emphasize the importance of developing corrigible AI systems that are
responsive to human feedback and that can be safely controlled to
prevent unwanted AI behavior.</p>
<p><strong>Corrigibility measures our ability to correct an AI if and
when things go wrong.</strong> An AI system is <em>corrigible</em> if it
accepts and cooperates with corrective interventions like being shut
down or having its utility function changed <span class="citation"
data-cites="pacecorrigibility">[5]</span>. Without many assumptions, we
can argue that typical rational agents will resist corrective measures:
changing an agent’s utility function necessarily means that the agent
will pursue goals that result in less utility relative to their current
preferences.</p>
<p><strong>Suppose we own an AI that fetches coffee for us every
morning.</strong> Its utility function assigns “10 utils” to getting us
coffee quickly, “5 utils” to getting us coffee slowly, and “0 utils” to
not getting us coffee at all. Now, let’s say we want to change the AI’s
objective to instead make us breakfast. A regular agent would resist
this change, reasoning that making breakfast would mean it is less able
to efficiently make coffee, resulting in lower utility. However, a
corrigible AI would recognize that making breakfast could be just as
valuable to humans as fetching coffee and would be open to the change in
objective. The AI would move on to maximizing its new utility function.
In general, corrigible AIs are more amenable to feedback and
corrections, rather than stubbornly adhering to their initial goals or
directives. When AIs are corrigible, humans can more easily correct
rogue actions and prevent any harmful or unwanted behavior.</p>
<p><strong>Completeness and transitivity imply that an AI has strict
preferences over shutting down.</strong> Assume that an agent’s
preferences satisfy the vNM axioms of completeness, such that it can
rank all options, as well as transitivity, such that its preferences are
consistent. For instance, if the AI prefers an apple to a
banana and a banana to a cherry, it must prefer an apple to a
cherry. Then, we know that the agent’s utility function ranks every
option.<p>
Consider again the coffee-fetching AI. Suppose that in addition to
getting us coffee quickly (10 utils), getting us coffee slowly (5
utils), and not getting us coffee (0 utils), there is a fourth option,
where the agent gets shut down immediately. The AI expects that
immediate shutdown will result in its owner getting coffee slowly
without AI assistance, which it values at 5 units of utility
(the same as getting us coffee slowly). The agent thus strictly
prefers getting us coffee quickly to shutting down, and strictly prefers
shutting down to us not having coffee at all.<p>
Generally, unless indifferent between everything, completeness and
transitivity imply that the AI has strict preferences about
potentially shutting down <span class="citation"
data-cites="thornley2023shutdown">[6]</span>. Without completeness, the
agent could have no preference between shutting down immediately and all
other actions. Without transitivity, the agent could be indifferent
between shutting down immediately and all other possible actions without
that implying that the agent is indifferent between all possible
actions.</p>
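<p>To make this concrete, here is a hypothetical sketch (our illustration, not a model from the text) of a naive expected-utility maximizer choosing between complying with a possible shutdown and disabling its off-switch; because shutdown is worth less utility than fetching coffee quickly, the maximizer resists.</p>
<pre><code>
# Hypothetical utilities from the coffee example: 10 for fetching coffee
# quickly, 5 for an immediate shutdown (the owner then makes coffee slowly),
# matching the utilities described above.
U_FAST, U_SHUTDOWN = 10, 5

# Each available action leads to a lottery of (utility, probability) pairs.
actions = {
    # Comply: accept, say, a 50% chance of being shut down before finishing.
    "comply_with_possible_shutdown": [(U_SHUTDOWN, 0.5), (U_FAST, 0.5)],
    # Resist: disable the off-switch so shutdown never happens.
    "disable_off_switch": [(U_FAST, 1.0)],
}

def expected_utility(lottery):
    return sum(u * p for u, p in lottery)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # disable_off_switch: the naive maximizer resists shutdown
</code></pre>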
<p><strong>It is bad if an AI either increases or reduces the
probability of immediate shutdown.</strong> Suppose that in trying to
get us coffee quickly, the AI drives at unsafe speeds. We’d like to shut
down the AI until we can reprogram it safely. A corrigible AI would
recognize our intention to shut down as a signal that it is misaligned.
However, an incorrigible AI would instead stay the course with what it
wanted to do initially—get us coffee—since that results in the most
utility. If possible, the AI would decrease the probability of immediate
shutdown, say by disabling its off-switch or locking the entrance to its
server rooms. Clearly, this would be bad.<p>
Consider a different situation where the AI realizes that making coffee
is actually quite difficult and that we would make coffee faster
manually, but fails to realize that we don’t want to exert the effort to
do so. The AI may then try to shut down, so that we’d have to make the
coffee ourselves. Suppose we tell the AI to continue making coffee at
its slow pace, rather than shut down. A corrigible AI would recognize
our instruction as a signal that it is misaligned and would continue to
make coffee. However, an incorrigible AI would instead stick with its
decision to shut down without our permission, since shutting down
provides it more utility. Clearly, this is also bad. We’d like to be
able to alter AIs without facing resistance.</p>
<p><strong>Summary.</strong> In this section, we introduced the concept
of corrigibility in AI systems. We discussed the relevance of utility
functions in determining corrigibility, particularly challenges that
arise if an AI’s preferences are complete and transitive, which can lead
to strict preferences over shutting down. We explored the potential
problems of an AI system reducing or increasing the probability of
immediate shutdown. The takeaway is that developing corrigible AI
systems—systems that are responsive and adaptable to human feedback and
changes—is essential in ensuring safe and effective control over AIs’
behavior. Examining the properties of utility functions illuminates
potential problems in implementing corrigibility.<p>
</p>
<br>
<div class="visionbox">
<legend class="visionboxlegend">
<p><span><b>A Note on Utility Functions vs. Reward Functions</b></span></p>
</legend>
<p> Utility
functions and reward functions are two interrelated yet distinct
concepts in understanding agent behavior. Utility functions represent an
agent’s preferences about states or the choice-worthiness of a state,
while reward functions represent externally imposed reinforcement. The
fact that an outcome is rewarded externally does not guarantee that it
will become part of an agent’s internal utility function.<p>
An example where utility and reinforcement come apart can be seen with
Galileo Galilei. Despite the safety and societal acceptance he could
gain by conforming to the widely accepted geocentric model, Galileo
maintained his heliocentric view. His environment provided ample
reinforcement to conform, yet he deemed the pursuit of scientific truth
more choiceworthy, highlighting a clear difference between environmental
reinforcement and the concepts of choice-worthiness or utility.<p>
As another example, think of evolutionary processes as selecting or
reinforcing some traits over others. If we considered taste buds as
components that help maximize fitness, we would expect more people to
want the taste of salads over cheeseburgers. However, it is more
accurate to view taste buds as “adaptation executors” rather than
“fitness maximizers,” as taste buds evolved in our ancestral environment
where calories were scarce. This illustrates the concept that agents act
on adaptations without necessarily adopting behavior that reliably helps
maximize reward.<p>
The same could be true for reinforcement learning agents. RL agents
might execute learned behaviors without necessarily maximizing reward;
they may form <em>decision procedures</em> that are not fully aligned
with their reinforcement. The fact that what is rewarded is not
necessarily what an agent thinks is choiceworthy could lead to AIs that
are not fully aligned with externally designed rewards. The AI might not
inherently consider reinforced behaviors as choiceworthy or of high
utility, so its utility function may differ from the one we want it to
have.<p>
</p>
</div>
<br>
<h1 id="attitudes-to-risk">B.4 Attitudes to Risk</h1>
<p><strong>Overview.</strong> The concept of risk is central to the
discussion of utility functions. Knowing an agent’s attitude towards
risk—whether they like, dislike, or are indifferent to risk—gives us a
good idea of what their utility function looks like. Conversely, if we
know an agent’s utility function, we can also understand their attitude
towards risk. We will first outline the three attitudes towards risk:
risk aversion, risk neutrality, and risk seeking. Then, we will consider
some arguments for why we might adopt each attitude, and provide
examples of situations where each attitude may be appropriate.<p>
It is crucial to understand what risk attitudes are appropriate in which
contexts. To make AIs safe, we will need to give them safe risk
attitudes, such as by favoring risk-aversion over risk-neutrality. Risk
attitudes will help explain how people do and should act in different
situations. National governments, for example, will differ in risk
outlook from rogue states, and big tech companies will differ from
startups. Moreover, we should know how risk averse we should be with AI
development, as it has both large upsides and downsides.</p>
<h2 id="what-are-the-different-attitudes-to-risk">B.4.1 What Are the Different
Attitudes to Risk?</h2>
<p><strong>There are three broad types of risk preferences.</strong>
Agents can be risk averse, risk neutral, or risk seeking. In this
section, we first explore what these terms mean. We consider a few
equivalent definitions by examining different concepts associated with
risk <span class="citation" data-cites="dixitslides">[7]</span>. Then,
we analyze what the advantages of adopting each attitude toward
risk might be.</p>
<p><strong>Let’s consider these in the context of a bet on a coin
toss.</strong> Suppose agents are given the opportunity to bet <span
class="math inline">$</span>1000 on a fair coin toss—upon guessing
correctly, they would receive <span class="math inline">$</span>2000 for
a net gain of <span class="math inline">$</span>1000. However, if they
guess incorrectly, they would receive nothing and lose their initial bet
of <span class="math inline">$</span>1000. The expected value of this
bet is <span class="math inline">$</span>0, irrespective of who is
playing: the player gains or loses <span
class="math inline">$</span>1000 with equal probabilities. However, a
particular player’s willingness to take this bet, reflecting their risk
attitude, depends on how they calculate expected utility.</p>
<ol>
<li><p><em>Risk aversion</em> is the tendency to prefer a certain
outcome over a risky option with the same expected value. A risk-averse
agent would not want to participate in the coin toss. The individual is
unwilling to take the risk of a potential loss in order to potentially
earn a higher reward. Most humans are instinctively risk averse. A
common example of a risk-averse utility function is <span
class="math inline"><em>u</em>(<em>x</em>) = log <em>x</em></span> (red
line in Figure B.4).</p></li>
<li><p><em>Risk neutrality</em> is the tendency to be indifferent
between a certain outcome and a risky option with the same expected
value. For such players, expected utility is proportional to expected
value. A risk-neutral agent would not care whether they were offered
this coin toss, as its expected value is zero. If the expected value
were negative, they would prefer not to participate in the lottery;
conversely, if it were positive, they would prefer to
participate. The simplest risk-neutral utility function is
<span class="math inline"><em>u</em>(<em>x</em>) = <em>x</em></span>
(blue line in Figure B.4).</p></li>
<li><p><em>Risk seeking</em> is the tendency to prefer a risky option
over a sure thing with the same expected value. A risk-seeking agent
would be happy to participate in this lottery. The individual is willing
to take the risk of a potential loss in order to potentially earn a higher reward.
We tend to associate risk seeking with irrationality, as it leads to
lower wealth through repeated choices made over time. However, this is
not necessarily the case. An example of a risk-seeking utility function
is <span
class="math inline"><em>u</em>(<em>x</em>) = <em>x</em><sup>2</sup></span>
(green line in Figure B.4).</p></li>
</ol>
<p>We can define each risk attitude in three equivalent ways. Each draws
on a different aspect of how we represent an agent’s preferences.</p>
<figure id="fig:risk-att">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/utilityfunctionsrisk.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure B.4: Concave, linear, and convex utility functions model risk averse, risk neutral, and risk
seeking agents’ preferences.</p>
</figure>
<p><strong>Risk attitudes are fully explained by how an agent values
uncertain outcomes.</strong> According to expected utility theory, an
agent’s risk preferences can be understood from the shape of their
utility function, and vice-versa. We will illustrate this point by
showing that concave utility functions necessarily imply risk aversion.
An agent with a concave utility function faces decreasing marginal
utility. That is, the jump from <span class="math inline">$</span>1000
to <span class="math inline">$</span>2000 is less satisfying than the
jump from wealth <span class="math inline">$</span>0 to wealth <span
class="math inline">$</span>1000. Conversely, the agent dislikes
dropping from wealth <span class="math inline">$</span>1000 to wealth
<span class="math inline">$</span>0 more than they like jumping from
wealth <span class="math inline">$</span>1000 to wealth <span
class="math inline">$</span>2000. Thus, the agent will not enter the
aforementioned double-or-nothing coin toss, displaying risk
aversion.</p>
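<p>The sketch below (illustrative only) evaluates the $1,000 double-or-nothing coin toss against keeping $1,000 for sure, under the three utility functions of Figure B.4, assuming a starting wealth of $1,000 that is entirely staked on the bet.</p>
<pre><code>
import math

# Final wealth after the bet: heads doubles the $1,000 stake, tails loses it.
gamble = [(2000, 0.5), (0, 0.5)]
sure_thing = [(1000, 1.0)]

def log_u(x):
    # Risk-averse (concave) utility; log 0 is treated as negative infinity.
    return math.log(x) if x > 0 else float("-inf")

utility_functions = {
    "risk averse (log x)": log_u,
    "risk neutral (x)": lambda x: x,
    "risk seeking (x^2)": lambda x: x ** 2,
}

for name, u in utility_functions.items():
    eu_gamble = sum(u(x) * p for x, p in gamble)
    eu_sure = sum(u(x) * p for x, p in sure_thing)
    if eu_gamble > eu_sure:
        verdict = "takes the bet"
    elif eu_gamble == eu_sure:
        verdict = "is indifferent"
    else:
        verdict = "declines the bet"
    print(name, verdict)
# risk averse declines, risk neutral is indifferent, risk seeking takes the bet
</code></pre>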
<p><strong>Preferences over outcomes may not fully explain risk
attitudes.</strong> It may seem unintuitive that risk attitudes are
entirely explained by how humans calculate utility of outcomes. As we
just saw, in expected utility theory, it is assumed that agents are risk
averse only because they have diminishing returns to larger outcomes.
Many economists and philosophers have countered that people also have an
inherent aversion to risk that is separate from preferences over
outcomes. At the end of this chapter, we will explore how non-expected
utility theories have attempted to more closely capture human behavior
in risky situations.</p>
<h2 id="risk-and-decision-making">B.4.2 Risk and Decision Making</h2>
<p><strong>Overview.</strong> Having defined risk attitudes, we will now
consider situations where it is appropriate to act in a risk-averse,
risk-neutral, or risk-seeking manner. Often, our risk approach in a
situation aligns with our overall risk preference—if we are risk averse
in day-to-day life, then we will also likely be risk averse when
investing our money. However, sometimes we might want to make decisions
as if we have a different attitude towards risk than we truly do.</p>
<p><strong>Criterion of rightness vs. decision procedure.</strong>
Philosophers distinguish between a <em>criterion of rightness</em>, the
way of judging whether an outcome is good, and a <em>decision
procedure</em>, the method of making decisions that lead to the good
outcomes. A good criterion of rightness may not be a good decision
procedure. This is related to the gap between theory and practice, as
explicitly pursuing an ideal outcome may not be the best way to achieve
it. For example, a criterion of rightness for meditation might be to
have a mind clear of thoughts. However, as a decision procedure,
thinking about not having thoughts may not help the meditator achieve a
clear mind—a better decision procedure would be to focus on the
breath.<p>
As another example, the <em>hedonistic paradox</em> reminds us that
people who directly aim at pleasure rarely secure it <span
class="citation" data-cites="sidgwick2019methods">[8]</span>. While a
person’s pleasure level could be a criterion of rightness, it is not
necessarily a good guide to increasing pleasure—that is, not necessarily
a good decision procedure. Whatever one’s vision of pleasure looks
like—lying on a beach, buying a boat, consuming drugs—people who
directly aim at pleasure often find these things are not as pleasing as
hoped. People who aim at meaningful experiences, helping others and
engaging in activities that are intrinsically worthwhile, are more
likely to be happy. People tend to get more happiness out of life when
not aiming explicitly for happiness but for some other goal. Using the
criterion of rightness of happiness as a decision procedure can
predictably lead to unhappiness.<p>
Maximizing expected value can be a criterion of rightness, but it is not
always a good decision procedure. In the context of utility, we observe
a similar discrepancy where explicitly pursuing the criterion of
rightness (maximizing the utility function) may not lead to the best
outcome. Suppose an agent is risk neutral, such that their criterion of
rightness is maximizing a linear utility function. In the first
subsection, we will explore how they might be best served by making
decisions as if they are risk averse, such that their decision procedure
is maximizing a concave utility function.</p>
<h3 id="why-be-risk-averse">Why Be Risk Averse?</h3>
<p><strong>Risk-averse behavior is ubiquitous.</strong> In this section,
we will explore the advantages of risk aversion and how it can be a good
way to advance goals across different domains, from evolutionary fitness
to wealth accumulation. It might seem that by behaving in a risk-averse
way, thereby refusing to participate in some positive expected value
situations, agents leave a lot of value on the table. Indeed, extreme
risk aversion may be counterproductive—people who keep all their money
as cash under their bed will lose value to inflation over time. However,
as we will see, there is a sweet spot that balances the safety of
certainty and value maximization: risk-averse agents with logarithmic
utility almost surely outperform other agents over time, under certain
assumptions.</p>
<p><strong>Response to computational limits.</strong> In complex
situations, decision makers may not have the time or resources to
thoroughly analyze all options to determine the one with the highest
expected value. This problem is further complicated when the outcomes of
some risks we take have effects on other decisions down the line, like
how risky investments may affect retirement plans. To minimize these
complexities, it may be rational to be risk averse. This helps us avoid
the worst effects of our incomplete estimates when our uncertain
calculations are seriously wrong.<p>
Suppose Martin is deciding between purchasing a direct flight or two
connecting flights with a tight layover. The direct flight is more
expensive, but Martin is having trouble estimating the likelihood and
consequences of missing his connecting flight. He may prefer to play the
situation safe and pay for the more expensive direct flight, even though
the true value-for-money of the connected route may have been higher.
Now Martin can confidently make future decisions like booking a bus from
the airport to his hotel. Risk-averse decision making not only reduces
computational burden, but can also increase decision-making speed.
Instead of constantly making difficult calculations, an agent may prefer
to have a bias against risk.</p>
<p><strong>Behavioral advantage.</strong> Risk aversion is not only a
choice but a fundamental psychological phenomenon, and is influenced by
factors such as past experiences, emotions, and cognitive biases. Since
taking risks could lead to serious injury or death, agents undergoing
natural selection usually develop strategies to avoid such risks
whenever possible. Humans often shy away from risk, prioritizing safety
and security over riskier ventures, even when the potential rewards are
higher.<p>
Studies have shown that animals across diverse species exhibit
risk-averse behaviors. In a study conducted on bananaquits, a
nectar-drinking bird, researchers presented the birds with a garden
containing two types of flowers: one with consistent amounts of nectar
and one with variable amounts. They found that the birds never preferred
the latter, and that their preference for the consistent variety was
intensified when the birds were provided fewer resources in total <span
class="citation" data-cites="wunderle1987risk">[9]</span>. This risk
aversion helps the birds survive and procreate, as risk-neutral or
risk-seeking species are more likely to die out over time: it is much
worse to have no nectar than it is better to have double the nectar.
Risk aversion is often seen as a survival mechanism.</p>
<p><strong>Natural selection favors risk aversion.</strong> Just as
individual organisms demonstrate risk aversion, entire populations are
pushed by natural selection towards risk-averse behavior that maximizes
the expected logarithm of their growth rather than its expected value.
Consider the following, highly simplified example.
Suppose there are three types of animals—antelope, bear, crocodile—in an
area where each year is either scorching or freezing with probability
0.5. Every year, the populations grow or shrink depending on the
weather—some animals are better suited to the hot weather, and some to
the cold. The populations’ per-capita offspring, or equivalently the
populations’ growth multipliers, are shown in the table below.<p>
</p>
<p>Antelope have the same growth in each state, bears grow faster in the
warmth but slower in the cold when they hibernate, and crocodiles grow
rapidly when it is scorching and animals gather near water sources but
die out when their habitats freeze over. Notice, however, that the three
populations have the same average growth multiplier of 1.1.<p>
This average is misleading. Suppose we observe these populations over
two periods, one hot followed by one cold. The average growth multiplier
over these two periods would be 1.1 for every species, but this does not
mean that they all grow by the same amount. In the table below, we can
see the animals’ growth over time.<p>
</p>
<p>Adding the logarithms of each species’ hot and cold growth rates
indicates its long-term growth trajectory. The antelope population will
continue growing no matter what, compounding over time. However, the
crocodile population will not—as soon as it enters a cold year, the
crocodiles go permanently extinct. The bear population faces no
immediate extinction risk, but over time it will likely shrink towards
extinction. Notice that maximizing long-run growth in this case is
equivalent to maximizing the sum of the logarithms of the growth
rates—this is risk aversion. The stable-growth population, or
equivalently the risk-averse population, is favored by natural selection
<span class="citation" data-cites="okasha2007rational">[10]</span>.</p>
<p><strong>Avoid risk of ruin.</strong> Risk aversion’s key benefit is
that it avoids risk of ruin. Consider a repeated game of
equal-probability “triple-or-nothing” bets. That is, players are offered a
<span class="math inline">$$\frac{1}{2}$$</span> probability of tripling
their initial wealth <span class="math inline"><em>w</em></span>, and a
<span class="math inline">$$\frac{1}{2}$$</span> probability of losing it
all. A risk-neutral player can calculate the expected value of a single
round as:<p>
<span class="math display">$$\frac{1}{2} \cdot 0+\frac{1}{2} \cdot 3w =
1.5w.$$</span> Since the expected value is greater than the player’s
initial wealth, a risk-neutral player would bet their entire wealth on
the game. Additionally, if offered this bet repeatedly, they would
reinvest everything they had in it each time. The expected value of
taking this bet <span class="math inline"><em>n</em></span> times in a
row, reinvesting all winnings, would be:<p>
<span class="math display">$$\frac{1}{2} \cdot 0+\frac{1}{4} \cdot
0+\cdots +\frac{1}{2^{n}} \cdot 0+\frac{1}{2^{n}} \cdot 3^{n} \cdot w =
(1.5)^{n}w.$$</span> If the agent were genuinely offered this bet as many
times as they wanted, they would continue to reinvest everything
indefinitely, which gives an expected value of:<p>
<span class="math display">$$\lim_{n \to \infty} (1.5)^{n} w = \infty.$$</span>
This is another infinite expected value game—just like in the St.
Petersburg Paradox! However, notice that this calculation is again
heavily skewed by a single, low-probability branch in which an extremely
lucky individual continues to win, exponentially increasing their
wealth. In the figure below, we show the first four bets in this
strategy with a starting wealth of 16. Only along the cyan branch does
the player win any money, and this branch quickly becomes
astronomically improbable. In real life, we would rarely choose to
repeatedly stake everything we own on triple-or-nothing bets. We are
risk averse when dealing with high probabilities of losing all our
money. Acting risk neutral and relying on expected value would be a poor
decision-making strategy.<p>
</p>
<figure id="fig:risk-neutral-betting">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/riskaversebetting.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure B.5: Risk-neutral betting can lead to ruin.</p>
</figure>
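<p>A minimal simulation sketch makes the figure’s point concrete; the player count, number of rounds, and random seed below are arbitrary illustrative choices. When many players each repeatedly stake their entire wealth, the average final wealth is large, yet almost every individual player ends up ruined.</p>
<pre><code># Minimal sketch: many players repeatedly betting everything on triple-or-nothing.
import random

random.seed(0)
players, rounds, start_wealth = 10_000, 10, 16.0

final_wealths = []
for _ in range(players):
    wealth = start_wealth
    for _ in range(rounds):
        # Each round: half the time the stake triples, half the time it is lost.
        wealth = 3 * wealth if random.getrandbits(1) else 0.0
    final_wealths.append(wealth)

# The expectation is 16 * 1.5**10 (about 923), but the sample average is noisy
# because it rests on the handful of players who never lost.
average = sum(final_wealths) / players
ruined = sum(w == 0.0 for w in final_wealths)  # roughly 1 - 1/2**10 of players
print(f"average final wealth: {average:.1f}")
print(f"fraction of players ruined: {ruined / players:.1%}")
</code></pre>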
<p><strong>Maximizing logarithmic utility is a better decision
procedure.</strong> Agents might want to act as if they were maximizing
the logarithm of their wealth rather than its expected value. A
logarithmic utility function avoids risk of ruin because it assigns a
utility of negative infinity to the outcome of zero wealth, since <span
class="math inline">log 0 → − ∞</span>. Therefore, an agent with a
logarithmic utility function in wealth will never participate in a
lottery that could, however unlikely, leave them with zero wealth.
The logarithmic function also grows slowly, placing less weight on very
unlikely, high-payout branches, a property that we used to resolve the
St. Petersburg Paradox. While we might have preferences that are linear
over wealth (our criterion of rightness), we might be better served by a
different decision procedure: maximizing the logarithm of
wealth rather than maximizing wealth directly.</p>
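<p>To make this concrete with the triple-or-nothing bet above: the expected log utility of staking all of one’s wealth <span class="math inline"><em>w</em></span> is
<span class="math display">$$\frac{1}{2} \cdot \log(0)+\frac{1}{2} \cdot \log(3w) = -\infty,$$</span>
so an agent maximizing the logarithm of wealth refuses to stake everything, no matter how large the bet’s expected monetary value.</p>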
<p><strong>Maximizing the logarithm of wealth maximizes every percentile
of wealth.</strong> Maximizing logarithmic utility avoids risk of ruin
because investors never stake their entire wealth on a single
opportunity, much as diversifying across multiple assets avoids
over-investing in any one of them. Instead of
maximizing average wealth (as expected value does), maximizing the
logarithmic utility of wealth maximizes other measures associated with
the distribution of wealth. In fact, doing so maximizes the median,
which is the 50th percentile of wealth, and it also delivers the highest
value at any arbitrary percentile of wealth. It even maximizes the
mode—the most likely outcome. Mathematically, maximizing a logarithmic
utility function in wealth outperforms any other investment strategy in
the long run, with probability one (certainty) <span class="citation"
data-cites="kelly1956new">[11]</span>. Thus, variations on maximizing the
logarithm of wealth are widely used in the financial sector.<p>
</p>
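<p>As a hedged sketch under the same triple-or-nothing payoffs, the code below numerically searches for the fixed fraction of wealth to stake each round that maximizes expected log growth; the function name and grid resolution are our own illustrative choices. Staking everything scores negative infinity, while staking about a quarter of one’s wealth each round comes out best, which is the kind of fractional staking that the logarithmic-wealth strategies described above prescribe.</p>
<pre><code># Hedged sketch: optimal fixed betting fraction for the triple-or-nothing bet
# under a logarithmic utility (a simple grid search; names are illustrative).
import math

def expected_log_growth(f):
    """Expected log growth per round when staking a fraction f of wealth."""
    if f >= 1.0:
        return float("-inf")  # staking everything risks log(0)
    # Win (prob 1/2): the stake triples, so wealth is multiplied by (1 + 2f).
    # Lose (prob 1/2): the stake is gone, so wealth is multiplied by (1 - f).
    return 0.5 * math.log(1 + 2 * f) + 0.5 * math.log(1 - f)

fractions = [i / 1000 for i in range(1000)]
best = max(fractions, key=expected_log_growth)
print(f"best fraction to stake: {best:.3f}")                                      # about 0.25
print(f"expected log growth at that fraction: {expected_log_growth(best):.4f}")   # about 0.059
print(f"expected log growth staking everything: {expected_log_growth(1.0)}")      # -inf
</code></pre>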
<h3 id="why-be-risk-neutral">Why Be Risk Neutral?</h3>
<p><strong>Risk neutrality is equivalent to acting on the expected
value.</strong> Since averages are straightforward and widely taught,
expected value is the most widely known explicit decision-making
procedure. However, despite expected value calculations being a common
concept in popular discourse, situations where agents do and should act
risk neutral are limited. In this section, we will first look at the
conditions under which risk neutrality might be a good decision
procedure—in such cases, maximizing expected value can be a significant
improvement over being too cautious. However, being mistaken about