TITL:
*AI Timelines via Cumulative Optimization Power:*
*Less Long, More Short*
by jacob_cannell
(Excerpts only)
ABST:
Extrapolation of model 'P' into the future
leads to short AGI timelines of
appx 75% chance of AGI by 2032.
PREF:
Text below extracted from (@ URL https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long).
TEXT:
- that we can best predict the future
by using simple models
which best postdict the past
(à la Bayes/Solomonoff).
- A simple generalized scaling model 'P'
predicts the emergence of capabilities in
trained ANNs (Artificial Neural Nets)
and BNNs (Biological Neural Nets).
- where model 'P' is defined:.
- where for sufficiently flexible and efficient
NN architectures and learning algorithms,
the relative intelligence and capabilities
of the best systems
are simply proportional to net training compute
(see the sketch after this block).
- assuming efficient allocation of
(equivalent uncompressed)
model capacity bits N roughly proportional
to data size bits D.
- as a simple model based on
net training compute postdicts
the relative performance of
successful biological and artificial
neural networks.
- 'P' is a very simple model which explains
a large amount of the entropy/variance
in a rank order intelligence measure.
- much more so than any other
simple proposed candidates.
Since 'P' follows a predictable temporal trajectory
due to Moore's Law style technological progress,
we can then extrapolate the trends
to predict bounds and estimates
on the arrival of AGI.
- Naturally P is only a constraint on capabilities,
but it tends to be a dominant constraint for brains
due to strong evolutionary pressure
on energy efficiency,
and likewise P is a dominant constraint on ANNs
due to analogous strong market evolutionary pressure
on economic efficiency.
(Note; cite table of various types of learning systems)
- as providing data as an input to model 'P'.
Extrapolation of model 'P' into the future
leads to short AGI timelines of
appx 75% chance of AGI by 2032.
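A minimal sketch (my construction, not the post's) of how model 'P' is used: rank systems by net training compute. All numbers below are illustrative placeholders, not the post's data.

# Model 'P': relative capability rank is taken to be proportional to
# net training compute. Units are rough synaptic-op equivalents;
# every value here is an illustrative placeholder.
systems = {
    "C. elegans":     1e12,
    "honey bee":      1e18,
    "raven":          1e22,
    "human":          1e24,
    "large 2022 LLM": 1e23,
}

def rank_by_P(systems):
    """Order systems by model 'P' (net training compute)."""
    return sorted(systems.items(), key=lambda kv: kv[1])

for name, p in rank_by_P(systems):
    print(f"{name:15s}  P ~ {p:.0e}")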
~ ~ ~
:qjn
The general trend is clear:
larger lifetime compute enables systems
of greater generality and capability.
Generality and performance
are both independently expensive,
as an efficient general system often
ends up requiring combinations of
many specialist subnetworks.
In the smallest brains intra-lifetime
learning optimization power
is dwarfed by the past inter-lifetime
optimization of evolution,
but the genome has only a small information capacity
equivalent to only a tiny brain.
- evolution is slower than neural learning
by some large factor proportional to
lifespan in seconds or neural clocks.
- evolution already adjusts for these tradeoffs
via organism lifespan.
- Larger brains are generally associated with
longer lifetimes,
across a variety of lineages
and brain sizes.
- that total brain model capacity bits
tracks lifetime data bits
just as it does in leading ANNs
(after adjusting for compression).
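A back-of-envelope sketch of the lifetime-compute idea above; the synapse counts, spike rates, and lifespans are rough assumptions of mine, not figures from the post.

def lifetime_compute(synapses, spike_rate_hz, lifespan_years):
    """Approximate lifetime synaptic ops: synapses * mean rate * seconds alive."""
    seconds = lifespan_years * 365.25 * 24 * 3600
    return synapses * spike_rate_hz * seconds

# Illustrative parameter guesses only:
print(f"honey bee ~ {lifetime_compute(1e9, 10, 0.1):.0e} synaptic ops")
print(f"raven     ~ {lifetime_compute(1e11, 1, 10):.0e} synaptic ops")
print(f"human     ~ {lifetime_compute(1e14, 1, 30):.0e} synaptic ops")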
:qn4
- I have only included 16 datapoints here,
but each was chosen semi-randomly
on the basis of rough impact/importance,
and well before any calculations.
- This simple net training compute model
has strong postdictive fit relative to
its complexity in the sense that
we could easily add hundreds or thousands more
such datapoints for successful ANNs
and BNNs only to get the same general results.
- where (@ cite https://scholar.google.com/scholar?cluster=3495982114195270407&hl=en&as_sdt=0,5).
The largest foundation models are
already quickly approaching the human brain
in net training compute.
- ?; Is AGI then imminent?.
- ^; Basically, yes.
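For scale, a rough comparison of commonly cited training-compute estimates for recent large models against an assumed ~1e24-op human lifetime figure; treating FLOPs and synaptic ops as loosely comparable is itself an assumption of this style of model.

human_brain_lifetime_ops = 1e24        # assumed order of magnitude
model_training_flops = {
    "GPT-3 (2020)":      3.1e23,       # commonly cited estimate
    "Chinchilla (2022)": 5.8e23,       # commonly cited estimate
    "PaLM (2022)":       2.5e24,       # commonly cited estimate
}
for name, flops in model_training_flops.items():
    print(f"{name:18s} ~ {flops:.1e} FLOPs "
          f"({flops / human_brain_lifetime_ops:.2f}x assumed human figure)")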
:qwj
But not because AGI will be reached
merely by any simple scaling
of existing models
or even their Frankenstein integrations.
Algorithmic innovation is rarely the key constraint
on progress in DL,
due to the vast computational training expense of
testing new ideas.
Ideas are cheap,
hardware is not.
- where (@ image URL https://i.imgur.com/HAY1IlG.png).
Where considering some hardware constraint
model postdictions/predictions:.
- "AI roughly as intelligent/capable as"...
- ...C. elegans in the mid-1990s.
- ...Honey Bees between 2012 and 2016.
- ...Ravens between 2016 and 2024.
- ...Homo sapiens between 2026 and 2032.
- where (@ image URL https://i.imgur.com/y0VnpWy.png).
- where cite (@ brain efficiency https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know).
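A sketch of the extrapolation step behind such dates: assume the net training compute of the largest systems keeps doubling on a fixed cadence and solve for the crossing year. The 2022 baseline and the doubling time here are assumptions for illustration, not the post's fitted values.

import math

def crossing_year(threshold_ops, base_year=2022, base_ops=1e24, doubling_years=1.5):
    """Year when extrapolated training compute first reaches threshold_ops."""
    doublings = math.log2(threshold_ops / base_ops)
    return base_year + doublings * doubling_years

print(f"human-brain-scale P (~1e24): {crossing_year(1e24):.1f}")
print(f"10x human-brain-scale P:     {crossing_year(1e25):.1f}")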
:r3j
Moore's Law is approaching its end
(in most measures),
but that actually implies
that brain parity is near,
because the end of Moore's Law
is caused by semiconductor tech
approaching the very same physical limits
which bound the brain's performance.
- where/for the trend to break down,
the model must be missing or
failing to capture some key aspect
of ground truth reality.
- where/for AGI to be far away in time
(more than a decade or two away),
then there must be an imminent breakdown
in the simple P cumulative optimization power model
(net training compute)
somewhere between today
(with top LLMs approaching linguistic cortex,
top vision models approaching visual cortex)
and the near future
(where numerous research teams
will gain routine affordable
experimental access to the incredible compute P
range exceeding that of the human brain).
We are already quickly approaching the key
practical limit on reliable
switching energy of around 1e-18 J.
- that we may already have enough performance
and energy efficiency.
- where (@ image URL https://i.imgur.com/YXme6nz.png).
- where (@ cite https://scholar.google.com/scholar?cluster=10773536632504446573&hl=en&as_sdt=2005&sciodt=0,5).
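Quick arithmetic on that limit: at roughly 1e-18 J per reliable switching event, an assumed brain-like ~10 W power budget caps throughput at about 1e19 switching events per second.

switch_energy_j = 1e-18   # practical limit on reliable switching (from the text)
power_budget_w = 10.0     # assumed brain-like power budget
print(f"~{power_budget_w / switch_energy_j:.0e} switching events/s at the limit")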
:r7a
Human intelligence does appear exceptional:
some other animals
(primates, birds, cetaceans, elephants...)
do occasionally use tools,
some have sophisticated communication abilities
bordering on pre-language,
some self-recognize in a mirror,
and a few even have human like
flexible working memory
(only less of it,
in proportion to lower P).
But only humans
have complex culture and technology.
Why?.
- that human vs animal intelligence
is partly due to the product of
our large compute capacity
and long training time.
Our brains are the largest among primates,
about 4x larger than expected for
a primate of our body size,
and are then trained longer and harder
through extended neoteny and education.
There are a few other mammals
that have brains with similar
or perhaps even larger capacities
and similar long lifespans
and training timelines: elephants
and some cetaceans.
The elephant brain has roughly 2.6e11 neurons,
about 3x that of the human brain.
However the elephant brain also
has a greatly enlarged cerebellum,
and the cerebellum
has a very low synapse/neuron count
(as it is dominated by granule cells
that have only a handful of synapses each).
The cerebral cortex dominates
mammalian brain synapse count,
and elephant cerebral cortex
has only 5.6e9 neurons,
about 3x fewer than in humans
(see the rough comparison after this block).
- as unlikely that the elephant brain
has larger synaptic capacity than human.
- that elephants seem to be amongst
the most intelligent animals,
rivaling primates.
- that elephants use tools,
have complex communication and social behavior
(including perhaps burial),
self-recognize, etc.
At least one species of large dolphin,
the long-finned pilot whale,
may have more total synapses
than the human brain
due to having about twice as many
cortical neurons (3.7e10 vs 1.5e10).
- they reach sexual maturity
about 50% faster than humans,
which suggests a shorter brain training schedule.
- cetaceans
(especially large cetaceans
which inhabit the open oceans)
have impoverished learning environments
(as compared to terrestrial animals),
with far less opportunity for tool use.
- that these oceanic animals
are much more massive than humans.
- their brains have a much lower relative
energetic cost.
- that it is far easier for evolution
to endow huge animals with larger brains
under lower selection pressure
for or against size.
- where/for these reasons it is not surprising
that they have not yet reached criticality
despite their obvious high intelligence.
- that the source of human exceptionalism
is not some key architectural secret
buried deep in the brain.
- It is instead the outcome of
a critical phase transition.
- Primate brains are scaling efficient
and the human brain
is just a standard primate brain,
but scaled up in capacity
concomitant with extended training time
through neoteny.
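A rough version of the cortex-dominated capacity comparison above; the cortical neuron counts are those given in the text, while the ~1e4 synapses-per-cortical-neuron average is a rough assumption of mine for illustration.

cortical_neurons = {
    "elephant":                5.6e9,
    "human":                   1.5e10,
    "long-finned pilot whale": 3.7e10,
}
synapses_per_neuron = 1e4   # assumed rough cortical average
for name, n in cortical_neurons.items():
    print(f"{name:24s} ~ {n * synapses_per_neuron:.1e} cortical synapses")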
:rf4
- where cite (@ 'the scaling hypothesis' https://www.gwern.net/Scaling-hypothesis#scaling-hypothesis).
New capabilities emerge automatically from scaling P,
and early hominids were well positioned to
cross a critical threshold of
technological cultural transmission fidelity
(which requires some general intelligence both to
invent lucky new tech insights
and then also adequately communicate them
to kin/allies).
Like a nuclear chain reaction,
this recursive feedback loop then greatly
increased the selection pressure
for larger brains,
extended neoteny,
opposable thumbs,
improved cooling,
and long distance running,
quickly transforming our ancestors
from arboreal specialists
to the ultra-capable generalists
who conquered then transformed the planet.
- where cite (@ 'society fixed, biology mutable' https://slatestarcodex.com/2014/09/10/society-is-fixed-biology-is-mutable/).
Animal brains train only on the life
experiences of a single individual.
Through language, human brains train on
the linguistically compressed datasets
of their ancestors.
- as a radically different scaling regime.
So even though the training dataset
for a single human brain
is only on the order of 1e16 bits,
for highly educated humans
that dataset includes a (highly)
compressed encoding of
the full 1e27 bit (and growing)
dataset of human civilization.
- the net sum of everything humanity
has ever thought or experienced or imagined,
and then recorded.
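A back-of-envelope sketch of those two dataset sizes; the per-second bit rate is my assumption, chosen to land near the post's ~1e16-bit lifetime figure, and the 1e27-bit civilization dataset is taken from the text.

lifetime_seconds = 1e9      # ~30 years
bits_per_second = 1e7       # assumed effective learning-relevant bit rate
lifetime_bits = lifetime_seconds * bits_per_second
civilization_bits = 1e27    # from the text (and growing)
print(f"single-lifetime dataset ~ {lifetime_bits:.0e} bits")
print(f"civilization dataset    ~ {civilization_bits:.0e} bits "
      f"({civilization_bits / lifetime_bits:.0e}x larger)")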
:s2g
Our large brains triggered the criticality,
but human exceptionalism
is the end outcome of the transition
to a new exponential scaling regime
where brain effective training data
scales with population size
and thus exponentially over time,
rather than remaining constant
as in animals.
There is little to no evidence
for any such new major architectural functionality
in the human brain.
There simply was too little evolutionary time,
and consequently too few genetic changes
separating proto-humans (i.e. Homo habilis)
from hominid ancestors,
to develop any major new
brain architectural innovations.
Evolution only had time to change
a few hyperparameters of the general
primate brain architecture,
and indeed the human brain
appears to be just a scaled up primate brain.
:s6j
The major functional architectural components
of the human brain are all present
(the neocortex, cerebellum,
thalamus, basal ganglia, etc)
and have similar functionality.
- in not only other primates,
but also mammals in general.
- Most vertebrates even have
the same general brain architecture
(with different sub-components differentially
scaled according to
lineage-specific scaling plans).
It seems that evolution
found the scalable architecture of intelligence
long ago in deep time,
perhaps even in the Cambrian era,
and perhaps it was not actually
all that hard to find.
It seems plausible
that the core exceptional abilities
of the human brain stem from its capacity for
consequentialist reasoning.
- as the ability to plan ahead
and anticipate consequences of actions.
~ ~ ~
:sde
- that the brain has incredible
pattern recognition abilities.
- where cite (@ 'brain as universal learning machine' https://www.lesswrong.com/posts/9Yc7Pp7szcjPgPsjf/the-brain-as-a-universal-learning-machine).
- that brains implement a powerful and efficient
universal learning algorithm,
such that intelligence then comes from
compute scaling.
- that further refinements and scaling of ANNs
could solve many/most of the hard sensory,
pattern recognition,
and even control problems.
- The novel capabilities of GPT-3,
and moreover the fact
that they arose so quickly merely from scaling,
should cast serious doubts on the theory
that language is the unique human capability
whose explanation requires complex novel
brain architectural innovations.
:skn
Consider recent work:.
2022: Disco Diffusion,
(@ Imagen https://imagen.research.google/)
(@ Stable Diffusion https://stability.ai/blog/stable-diffusion-public-release)
(@ Chinchilla https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training)
(@ DALL-E-2 https://openai.com/dall-e-2/)
(@ VPT https://openai.com/blog/vpt/)
(@ Minerva https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html)
(@ Pathways https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)
And all of this recent progress
is the net result of spending
only a few tens of billions of dollars
on (mostly Nvidia) hardware
over the last 5 years.
- that industry could easily spend 10x that
in the next 5 years.
:sqc
Modern large-scale DL research
is mostly driven by
huge public tech companies
who ultimately seek profits,
not the creation of new intelligent,
sentient agents.
There are several legitimate reasons
why true general intelligence
may be unprofitable,
but they all reduce to forms of risk.
If alignment proves difficult,
AGI could be dangerous
not only in the global existential sense,
but also politically and ethically.
And even if alignment proves tractable,
full generality
may still have significant
political, ethical, legal and PR risks
that simply outweigh the advantages
vs somewhat more narrow and specialized systems.
Large tech companies may then have incentives
to intentionally skirt, but not cross,
the border of full generality.
> - and we can hope so!.
- ?; maybe some AGI agents
are made sufficiently anthropomorphic
that they unavoidably attract human sympathies?.
Since corporate exploitation
of their economic value
could then be construed
as a form of enslavement,
the great pressure of the modern cultural
and political zeitgeist could thus strongly
shape incentives towards avoiding generality.
- that the critical risks remain.
- eventually creating AGI will simply become
too cheap and easy.
:sy6
David Roodman has developed
a simple trend economic model
of the world GWP trajectory
across all of human history,
and the best fit model is hyperexponential
leading to a singularity around 2047
(or at least that is the current
most likely rollout;
a toy curve of this kind
is sketched after this block).
- cite (@ Modeling the human trajectory https://www.openphilanthropy.org/research/modeling-the-human-trajectory/).
- where in the early phases:.
- AGI will still compete with humans.
- the net impact of AGI on GWP
will be constrained by foundry output
and eventually affordable energy production.
- that future extrapolation predicts
AGI in roughly a decade,
absent unlikely defeaters.
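A toy illustration of the hyperexponential (finite-time-singularity) shape of such a fit: growth dY/dt ~ a*Y^(1+eps) implies Y(t) proportional to (t_s - t)^(-1/eps). The parameters below are assumptions for illustration, not Roodman's fitted values.

t_singularity = 2047.0
eps = 0.5        # assumed super-exponential exponent
scale = 100.0    # assumed scale constant (arbitrary units)

def gwp_index(t):
    """Hyperexponential trajectory diverging at t_singularity."""
    return scale * (t_singularity - t) ** (-1.0 / eps)

for year in (2000, 2020, 2040, 2045, 2046):
    print(f"{year}: GWP index ~ {gwp_index(year):.2f}")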
:t5g
- that DL systems already seem competitive
enough with equivalent brain modules
and animal brains of similar P values.
- and DL architectures/algorithms
will only improve.
- It is unlikely that the human brain
is highly exceptional other than in P.
- from this model I give a 75% chance of AGI by 2032
(see the closing sketch at the end of this excerpt).
- Just as humans share a common brain architecture
but then specialize in many different branches
and fields of knowledge,
I expect AI/AGI to initially share
common architectures
(at least in core principles)
but to diversify through training more greatly,
and into more myriad specializations
and economic niches,
than humanity.
"Like us, but more so".