TITL:
*AI Timelines via Cumulative Optimization Power:*
*Less Long, More Short*
by jacob_cannell
(Excerpts only)
ABST:
Extrapolation of model 'P' into the future
leads to short AGI timelines of
appx 75% chance of AGI by 2032.
PREF:
Text below extracted from (@ URL https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long).
TEXT:
- that we can best predict the future
by using simple models
which best postdict the past
(à la Bayes/Solomonoff).
- A simple generalized scaling model 'P'
predicts the emergence of capabilities in
trained ANNs (Artificial Neural Nets)
and BNNs (Biological Neural Nets).
- where model 'P' is defined:.
- where for sufficiently flexible and efficient
NN architectures and learning algorithms,
the relative intelligence and capabilities
of the best systems
are simply proportional to net training compute
(see the sketch after this block).
- assuming efficient allocation of
(equivalent uncompressed)
model capacity bits N roughly proportional
to data size bits D.
- as a simple model based on
net training compute postdicts
the relative performance of
successful biological and artificial
neural networks.
- 'P' is a very simple model which explains
a large amount of the entropy/variance
in a rank order intelligence measure.
- much more so than any other
simple proposed candidates.
Since 'P' follows a predictable temporal trajectory
due to Moore's Law style technological progress,
we can then extrapolate the trends
to predict bounds and estimates
on the arrival of AGI.
- Naturally P is only a constraint on capabilities,
but it tends to be a dominant constraint for brains
due to strong evolutionary pressure
on energy efficiency,
and likewise P is a dominant constraint on ANNs
due to analogous strong market evolutionary pressure
on economic efficiency.
(Note; cite table of various types of learning systems)
- as providing data as an input to model 'P'.
Extrapolation of model 'P' into the future
leads to short AGI timelines of
appx 75% chance of AGI by 2032.
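A minimal sketch (my construction, not the post's) of how model 'P' is used: rank systems by net training compute. All numbers below are illustrative placeholders, not the post's data.

# Model 'P': relative capability rank is taken to be proportional to
# net training compute. Units are rough synaptic-op equivalents;
# every value here is an illustrative placeholder.
systems = {
    "C. elegans":     1e12,
    "honey bee":      1e18,
    "raven":          1e22,
    "human":          1e24,
    "large 2022 LLM": 1e23,
}

def rank_by_P(systems):
    """Order systems by model 'P' (net training compute)."""
    return sorted(systems.items(), key=lambda kv: kv[1])

for name, p in rank_by_P(systems):
    print(f"{name:15s}  P ~ {p:.0e}")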
~ ~ ~
:qjn
The general trend is clear:
larger lifetime compute enables systems
of greater generality and capability.
Generality and performance
are both independently expensive,
as an efficient general system often
ends up requiring combinations of
many specialist subnetworks.
In the smallest brains intra-lifetime
learning optimization power
is dwarfed by the past inter-lifetime
optimization of evolution,
but the genome has only a small information capacity
equivalent to only a tiny brain.
- evolution is slower than neural learning
by some large factor proportional to
lifespan in seconds or neural clocks.
- evolution already adjusts for these tradeoffs
via organism lifespan.
- Larger brains are generally associated with
longer lifetimes,
across a variety of lineages
and brain sizes.
- that total brain model capacity bits
tracks lifetime data bits
just as it does in leading ANNs
(after adjusting for compression).
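A back-of-envelope sketch of the lifetime-compute idea above; the synapse counts, spike rates, and lifespans are rough assumptions of mine, not figures from the post.

def lifetime_compute(synapses, spike_rate_hz, lifespan_years):
    """Approximate lifetime synaptic ops: synapses * mean rate * seconds alive."""
    seconds = lifespan_years * 365.25 * 24 * 3600
    return synapses * spike_rate_hz * seconds

# Illustrative parameter guesses only:
print(f"honey bee ~ {lifetime_compute(1e9, 10, 0.1):.0e} synaptic ops")
print(f"raven     ~ {lifetime_compute(1e11, 1, 10):.0e} synaptic ops")
print(f"human     ~ {lifetime_compute(1e14, 1, 30):.0e} synaptic ops")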
:qn4
- I have only included 16 datapoints here,
but each was chosen semi-randomly
on the basis of rough impact/importance,
and well before any calculations.
- This simple net training compute model
has strong postdictive fit relative to
its complexity in the sense that
we could easily add hundreds or thousands more
such datapoints for successful ANNs
and BNNs only to get the same general results.
- where (@ cite https://scholar.google.com/scholar?cluster=3495982114195270407&hl=en&as_sdt=0,5).
The largest foundation models are
already quickly approaching the human brain
in net training compute.
- ?; Is AGI then imminent?.
- ^; Basically, yes.
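For scale, a rough comparison of commonly cited training-compute estimates for recent large models against an assumed ~1e24-op human lifetime figure; treating FLOPs and synaptic ops as loosely comparable is itself an assumption of this style of model.

human_brain_lifetime_ops = 1e24        # assumed order of magnitude
model_training_flops = {
    "GPT-3 (2020)":      3.1e23,       # commonly cited estimate
    "Chinchilla (2022)": 5.8e23,       # commonly cited estimate
    "PaLM (2022)":       2.5e24,       # commonly cited estimate
}
for name, flops in model_training_flops.items():
    print(f"{name:18s} ~ {flops:.1e} FLOPs "
          f"({flops / human_brain_lifetime_ops:.2f}x assumed human figure)")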
:qwj
But not because AGI will be reached
merely by any simple scaling
of existing models
or even their Frankenstein integrations.
Algorithmic innovation is rarely the key constraint
on progress in DL,
due to the vast computational training expense of
testing new ideas.
Ideas are cheap,
hardware is not.
- where (@ image URL https://i.imgur.com/HAY1IlG.png).
Where considering some hardware constraint
model postdictions/predictions:.
- "AI roughly as intelligent/capable as"...
- ...C. elegans in the mid-1990s.
- ...Honey Bees between 2012 and 2016.
- ...Ravens between 2016 and 2024.
- ...Homo sapiens between 2026 and 2032.
- where (@ image URL https://i.imgur.com/y0VnpWy.png).
- where cite (@ brain efficiency https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know).
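A sketch of the extrapolation step behind such dates: assume the net training compute of the largest systems keeps doubling on a fixed cadence and solve for the crossing year. The 2022 baseline and the doubling time here are assumptions for illustration, not the post's fitted values.

import math

def crossing_year(threshold_ops, base_year=2022, base_ops=1e24, doubling_years=1.5):
    """Year when extrapolated training compute first reaches threshold_ops."""
    doublings = math.log2(threshold_ops / base_ops)
    return base_year + doublings * doubling_years

print(f"human-brain-scale P (~1e24): {crossing_year(1e24):.1f}")
print(f"10x human-brain-scale P:     {crossing_year(1e25):.1f}")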
:r3j
Moore's Law is approaching its end
(in most measures),
but that actually implies
that brain parity is near,
because the end of Moore's Law
is caused by semiconductor tech
approaching the very same physical limits
which bound the brain's performance.
- where/for the trend to break down,
the model must be missing or
failing to capture some key aspect
of ground truth reality.
- where/for AGI to be far away in time
(more than a decade or two away),
then there must be an imminent breakdown
in the simple P cumulative optimization power model
(net training compute)
somewhere between today
(with top LLMs approaching linguistic cortex,
top vision models approaching visual cortex)
and the near future
(where numerous research teams
will gain routine affordable
experimental access to the incredible compute P
range exceeding that of the human brain).
We are already quickly approaching the key
practical limit on reliable
switching energy of around 1e-18 J.
- that we may already have enough performance
and energy efficiency.
- where (@ image URL https://i.imgur.com/YXme6nz.png).
- where (@ cite https://scholar.google.com/scholar?cluster=10773536632504446573&hl=en&as_sdt=2005&sciodt=0,5).
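Quick arithmetic on that limit: at roughly 1e-18 J per reliable switching event, an assumed brain-like ~10 W power budget caps throughput at about 1e19 switching events per second.

switch_energy_j = 1e-18   # practical limit on reliable switching (from the text)
power_budget_w = 10.0     # assumed brain-like power budget
print(f"~{power_budget_w / switch_energy_j:.0e} switching events/s at the limit")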
:r7a
Human intelligence does appear exceptional:
some other animals
(primates, birds, cetaceans, elephants...)
do occasionally use tools,
some have sophisticated communication abilities
bordering on pre-language,
some self-recognize in a mirror,
and a few even have human like
flexible working memory
(only less of it,
in proportion to lower P).
But only humans
have complex culture and technology.
Why?.
- that human vs animal intelligence
is partly due to the product of
our large compute capacity
and long training time.
Our brains are the largest among primates,
about 4x larger than expected for
a primate of our body size,
and are then trained longer and harder
through extended neoteny and education.
There are a few other mammals
that have brains with similar
or perhaps even larger capacities
and similar long lifespans
and training timelines: elephants
and some cetaceans.
The elephant brain has roughly 2.6e11 neurons,
about 3x that of the human brain.
However the elephant brain also
has a greatly enlarged cerebellum,
and the cerebellum
has a very low synapse/neuron count
(as it is dominated by granule cells
that have only a handful of synapses each).
The cerebral cortex dominates
mammalian brain synapse count,
and elephant cerebral cortex
has only 5.6e9 neurons,
about 3x fewer than in humans
(see the rough comparison after this block).
- as unlikely that the elephant brain
has larger synaptic capacity than human.
- that elephants seem to be amongst
the most intelligent animals,
rivaling primates.
- that elephants use tools,
have complex communication and social behavior
(including perhaps burial),
self-recognize, etc.
At least one species of large dolphin,
the long-finned pilot whale,
may have more total synapses
than the human brain
due to having about twice as many
cortical neurons (3.7e10 vs 1.5e10).
- they reach sexual maturity
about 50% faster than humans,
which suggests a shorter brain training schedule.
- cetaceans
(especially large cetaceans
which inhabit the open oceans)
have impoverished learning environments
(as compared to terrestrial animals),
with far less opportunity for tool use.
- that these oceanic animals
are much more massive than humans.
- their brains have a much lower relative
energetic cost.
- that it is far easier for evolution
to endow huge animals with larger brains
under lower selection pressure
for or against size.
- where/for these reasons it is not surprising
that they have not yet reached criticality
despite their obvious high intelligence.
- that the source of human exceptionalism
is not some key architectural secret
buried deep in the brain.
- It is instead the outcome of
a critical phase transition.
- Primate brains are scaling efficient
and the human brain
is just a standard primate brain,
but scaled up in capacity
concomitant with extended training time
through neoteny.
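A rough version of the cortex-dominated capacity comparison above; the cortical neuron counts are those given in the text, while the ~1e4 synapses-per-cortical-neuron average is a rough assumption of mine for illustration.

cortical_neurons = {
    "elephant":                5.6e9,
    "human":                   1.5e10,
    "long-finned pilot whale": 3.7e10,
}
synapses_per_neuron = 1e4   # assumed rough cortical average
for name, n in cortical_neurons.items():
    print(f"{name:24s} ~ {n * synapses_per_neuron:.1e} cortical synapses")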
:rf4
- where cite (@ 'the scaling hypothesis' https://www.gwern.net/Scaling-hypothesis#scaling-hypothesis).
New capabilities emerge automatically from scaling P,
and early hominids were well positioned to
cross a critical threshold of
technological cultural transmission fidelity
(which requires some general intelligence both to
invent lucky new tech insights
and then also adequately communicate them
to kin/allies).
Like a nuclear chain reaction,
this recursive feedback loop then greatly
increased the selection pressure
for larger brains,
extended neoteny,
opposable thumbs,
improved cooling,
and long distance running,
quickly transforming our ancestors
from arboreal specialists
to the ultra-capable generalists
who conquered then transformed the planet.
- where cite (@ 'society fixed, biology mutable' https://slatestarcodex.com/2014/09/10/society-is-fixed-biology-is-mutable/).
Animal brains train only on the life
experiences of a single individual.
Through language, human brains train on
the linguistically compressed datasets
of their ancestors.
- as a radically different scaling regime.
So even though the training dataset
for a single human brain
is only on the order of 1e16 bits,
for highly educated humans
that dataset includes a (highly)
compressed encoding of
the full 1e27 bit (and growing)
dataset of human civilization.
- the net sum of everything humanity
has ever thought or experienced or imagined,
and then recorded.
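A back-of-envelope sketch of those two dataset sizes; the per-second bit rate is my assumption, chosen to land near the post's ~1e16-bit lifetime figure, and the 1e27-bit civilization dataset is taken from the text.

lifetime_seconds = 1e9      # ~30 years
bits_per_second = 1e7       # assumed effective learning-relevant bit rate
lifetime_bits = lifetime_seconds * bits_per_second
civilization_bits = 1e27    # from the text (and growing)
print(f"single-lifetime dataset ~ {lifetime_bits:.0e} bits")
print(f"civilization dataset    ~ {civilization_bits:.0e} bits "
      f"({civilization_bits / lifetime_bits:.0e}x larger)")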
:s2g
Our large brains triggered the criticality,
but human exceptionalism
is the end outcome of the transition
to a new exponential scaling regime
where brain effective training data
scales with population size
and thus exponentially over time,
rather than remaining constant
as in animals.
There is little to no evidence
for any such new major architectural functionality
in the human brain.
There simply was too little evolutionary time,
and consequently too few genetic changes
separating proto-humans (i.e. Homo habilis)
from hominid ancestors,
to develop any major new
brain architectural innovations.
Evolution only had time to change
a few hyperparameters of the general
primate brain architecture,
and indeed the human brain
appears to be just a scaled up primate brain.
:s6j
The major functional architectural components
of the human brain are all present
(the neocortex, cerebellum,
thalamus, basal ganglia, etc)
and have similar functionality.
- in not only other primates,
but also mammals in general.
- Most vertebrates even have
the same general brain architecture
(with different sub-components differentially
scaled according to
lineage-specific scaling plans).
It seems that evolution
found the scalable architecture of intelligence
long ago in deep time,
perhaps even in the Cambrian era,
and perhaps it was not actually
all that hard to find.
It seems plausible
that the core exceptional abilities
of the human brain stem from its capacity for
consequentialist reasoning.
- as the ability to plan ahead
and anticipate consequences of actions.
~ ~ ~
:sde
- that the brain has incredible
pattern recognition abilities.
- where cite (@ 'brain as universal learning machine' https://www.lesswrong.com/posts/9Yc7Pp7szcjPgPsjf/the-brain-as-a-universal-learning-machine).
- that brains implement a powerful and efficient
universal learning algorithm,
such that intelligence then comes from
compute scaling.
- that further refinements and scaling of ANNs
could solve many/most of the hard sensory,
pattern recognition,
and even control problems.
- The novel capabilities of GPT-3,
and moreover the fact
that they arose so quickly merely from scaling,
should cast serious doubts on the theory
that language is the unique human capability
whose explanation requires complex novel
brain architectural innovations.
:skn
Consider recent work:.
2022: Disco Diffusion,
(@ Imagen https://imagen.research.google/)
(@ Stable Diffusion https://stability.ai/blog/stable-diffusion-public-release)
(@ Chinchilla https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training)
(@ DALL-E-2 https://openai.com/dall-e-2/)
(@ VPT https://openai.com/blog/vpt/)
(@ Minerva https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html)
(@ Pathways https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)
And all of this recent progress
is the net result of spending
only a few tens of billions of dollars
on (mostly Nvidia) hardware
over the last 5 years.
- that industry could easily spend 10x that
in the next 5 years.
:sqc
Modern large-scale DL research
is mostly driven by
huge public tech companies
who ultimately seek profits,
not the creation of new intelligent,
sentient agents.
There are several legitimate reasons
why true general intelligence
may be unprofitable,
but they all reduce to forms of risk.
If alignment proves difficult,
AGI could be dangerous
not only in the global existential sense,
but also politically and ethically.
And even if alignment proves tractable,
full generality
may still have significant
political, ethical, legal and PR risks
that simply outweigh the advantages
vs somewhat more narrow and specialized systems.
Large tech companies may then have incentives
to intentionally skirt, but not cross,
the border of full generality.
> - and we can hope so!.
- ?; maybe some AGI agents
are made sufficiently anthropomorphic
that they unavoidably attract human sympathies?.
Since corporate exploitation
of their economic value
could then be construed
as a form of enslavement,
the great pressure of the modern cultural
and political zeitgeist could thus strongly
shape incentives towards avoiding generality.
- that the critical risks remain.
- eventually creating AGI will simply become
too cheap and easy.
:sy6
David Roodman has developed
a simple trend economic model
of the world GWP trajectory
across all of human history,
and the best fit model is hyperexponential
leading to a singularity around 2047
(or at least that is the current
most likely rollout;
a toy curve of this kind
is sketched after this block).
- cite (@ Modeling the human trajectory https://www.openphilanthropy.org/research/modeling-the-human-trajectory/).
- where in the early phases:.
- AGI will still compete with humans.
- the net impact of AGI on GWP
will be constrained by foundry output
and eventually affordable energy production.
- that future extrapolation predicts
AGI in roughly a decade,
absent unlikely defeaters.
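A toy illustration of the hyperexponential (finite-time-singularity) shape of such a fit: growth dY/dt ~ a*Y^(1+eps) implies Y(t) proportional to (t_s - t)^(-1/eps). The parameters below are assumptions for illustration, not Roodman's fitted values.

t_singularity = 2047.0
eps = 0.5        # assumed super-exponential exponent
scale = 100.0    # assumed scale constant (arbitrary units)

def gwp_index(t):
    """Hyperexponential trajectory diverging at t_singularity."""
    return scale * (t_singularity - t) ** (-1.0 / eps)

for year in (2000, 2020, 2040, 2045, 2046):
    print(f"{year}: GWP index ~ {gwp_index(year):.2f}")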
:t5g
- that DL systems already seem competitive
enough with equivalent brain modules
and animal brains of similar P values.
- and DL architectures/algorithms
will only improve.
- It is unlikely that the human brain
is highly exceptional other than in P.
- from this model I give a 75% chance of AGI by 2032
(see the closing sketch at the end of this excerpt).
- Just as humans share a common brain architecture
but then specialize in many different branches
and fields of knowledge,
I expect AI/AGI to initially share
common architectures
(at least in core principles)
but to diversify through training more greatly,
and into more myriad specializations
and economic niches,
than humanity.
"Like us, but more so".