1994: Exponential nature of forgetting


[TOC=2,5]

Forgetting curve: power or exponential


The shape of the forgetting curve is vital for understanding memory. The math behind the curve may even weigh in on the understanding of the role of sleep (see later). When Ebbinghaus first determined the rate of forgetting, he got a pretty nice set of data with a good fit to the power function. However, today we know forgetting is exponential. The discrepancy is explained here.


Figure: Forgetting curve adapted from Hermann Ebbinghaus (1885). The curve has been rendered from original tabular data published by Ebbinghaus (Piotr Wozniak, 2017)


Wrong thinking helped spaced repetition


For many years, the actual shape of the curve did not play much of a role in spaced repetition. My early intuitions about the nature of forgetting were all over the place, depending on the context. Back in 1982, I thought that evolution had designed forgetting to make sure the brain does not run out of memory space. The optimum time for forgetting would be determined by the statistical properties of the environment, and decay would be programmed to maximize survival. If a review did not take place in time, the memory would be deleted to make room for new learning.


I was wrong to think that there might be an optimum time for forgetting, and this error was actually helpful in inventing spaced repetition. That "optimum time" intuition guided the first experiment in 1985. An optimum time for forgetting would imply a sigmoidal forgetting curve with a clear inflection point that determines optimality. Before the review, forgetting would be minimal; a delayed review would result in rapid forgetting. This is why finding the optimum interval seemed so critical. When data started pouring in later on, my confirmation bias still kept me from seeing the error. I wrote in my Master's Thesis about sigmoidal forgetting: "this follows directly from the observation that before the elapse of the optimal interval, the number of memory lapses is negligible". I must have forgotten my own forgetting curve plot produced in late 1984.


Today this sigmoidal proposition may seem preposterous, but even my model of intermittent learning provided some support for the notion. Exponential approximation yielded a particularly high deviation error for data collected in my work on the model of intermittent learning, and the superposition of sigmoid curves for different E-Factors could easily mimic early linearity. A linear approximation seemed to fit the model of intermittent learning excellently within the recall range of the available data. No wonder that, with whole pages of heterogeneous material, the exponential nature of forgetting remained well hidden.


Contradictory models


I did not ponder forgetting curves much. However, my biological model of memory, dating back to 1988, spoke of the exponential decay in retrievability. Apparently, in those days, the forgetting curve and retrievability could exist in my head as independent entities.


In my credit paper for a class in computer simulation (Dr Katulski, Jan 1988), my figures clearly show exponential forgetting curves:



Figure: In my Master's Thesis, titled "Optimization of learning" (1990), I presented some hypothetical concepts that might underlie the process of optimal learning based on spaced repetition. (A) Molecular phenomena (B) Quantitative changes in the synapse. Those ideas are a bit dated today, but the serrated curves representing memory retrievability came to be widely known in popular publications on spaced repetition. They are usually wrongly attributed to Hermann Ebbinghaus


By that time, I might have picked up a better idea from the literature. In the years 1986-1987, I spent a lot of time in the university library looking for good research on spaced repetition. I found none. I might have already been familiar with the forgetting curve determined by Ebbinghaus; it is mentioned in my Master's Thesis.


Collecting data


I collected the data for my first forgetting curve plot in late 1984. As all the learning over the course of those 11 months was done for learning's sake, and the cost of producing the graph was minimal, I forgot about the graph and it lay unused in my archives for 34 years:



Figure: My very first forgetting curve for the retention of English vocabulary, plotted back in 1984, i.e. a few months before designing SuperMemo on paper. This graph was not part of the experiment. It was simply a cumulative assessment of the results of intermittent learning of English vocabulary. The graph was soon forgotten and was re-discovered only 34 years later. After memorization, 49 pages of ~40 word pairs of English were reviewed at different intervals and the number of recall errors was recorded. After rejecting outliers and averaging, the curve appears to be far less steep than the curve obtained by Ebbinghaus (1885), who used nonsense syllables and a different measure of forgetting: savings on re-learning


My 1985 experiment could also be seen as a noisy attempt to collect forgetting curve data. However, the first SuperMemos did not care about the forgetting curve. The optimization was bang-bang in nature, even though today collecting retention data seems like an obvious solution (as it did in 1985).

Until I started collecting data with SuperMemo software, where each item could be scrutinized independently, I could not fully recover from my early erroneous ideas about forgetting.


SuperMemo 1 for DOS (1987) collected full repetition histories that would have made it possible to determine the nature of forgetting. However, within 10 days (on Dec 23, 1987), I had to ditch the full record of repetitions. At that time, my disk space was 360KB. That's correct: I would run SuperMemo from old 5.25-inch diskettes. The full repetition history record returned to SuperMemo only 8 long years later (Feb 15, 1996), after a hectic effort by Dr Janusz Murakowski, who considered every ticking minute a waste of valuable data that could power future algorithms and memory research. Two decades later, we have more data than we can effectively process.

Without repetition history, I could still investigate forgetting with the help of forgetting curve data collected independently. On Jan 6, 1991, I figured out how to record forgetting curves in a small file that would not bloat the size of the database (i.e. without the full record of repetition history).

It was only SuperMemo 6 (1991) that started collecting forgetting curve data to determine optimum intervals. It was doing the same thing as my first experiment, except that it did it automatically, on a massive scale, and for memories separated into individual questions (this solved the heterogeneity problem). SuperMemo 6 initially used a binary chop to find the best moment corresponding with the requested forgetting index; a good-fit approximation was still 3 years into the future.
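
The binary chop idea can be pictured roughly as follows (a minimal sketch, not SuperMemo's actual code: `measure_forgetting` is a hypothetical stand-in for the lapse statistics that would be collected from real repetitions, and the starting bounds and tolerance are arbitrary):

```python
import math

def find_optimum_interval(measure_forgetting, lo=1.0, hi=100.0,
                          target_fi=0.10, tolerance=0.005, max_steps=20):
    """Binary chop: find the interval (in days) at which the measured
    forgetting index is close to the requested target (e.g. 10%)."""
    for _ in range(max_steps):
        mid = (lo + hi) / 2.0
        fi = measure_forgetting(mid)
        if abs(fi - target_fi) <= tolerance:
            return mid
        if fi < target_fi:    # too little forgetting: the interval can be longer
            lo = mid
        else:                 # too much forgetting: the interval must be shorter
            hi = mid
    return (lo + hi) / 2.0

# Illustration with a simulated exponential forgetting curve of stability S = 20 days,
# using the convention that recall drops to 0.9 when the interval equals S:
S = 20.0
k = -math.log(0.9)
simulated_lapses = lambda t: 1.0 - math.exp(-k * t / S)
print(find_optimum_interval(simulated_lapses))   # ~20 days for a 10% forgetting index
```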


First forgetting curve data


By May 1991, I had the first data to peek at, and it was a major disappointment. I had predicted I would need a year to see any regularity. However, every couple of months, I kept noting down my disappointment with the minimal progress. Collecting the data was agonizingly slow and the wait was excruciating. A year later, I was no closer to the goal. If Ebbinghaus was able to plot a good curve with nonsense syllables, his pain of non-coherence must have been worth it. With meaningful data, the truth was very slow to emerge, even with the convenience of having it all done by a computer while having fun with learning.


On Sep 3, 1992, SuperMemo 7 for Windows made it possible to have a first nice peek at a real forgetting curve. The view was mesmerizing:



Figure: SuperMemo 7 for Windows was written in 1992. As of Sep 03, 1992, it was able to display the user's forgetting curve graph. The horizontal axis, labeled U-Factor, corresponded with days in this particular graph. The kinks between days 14 and 20 were one of the reasons it was difficult to determine the nature of forgetting. Old erroneous hypotheses were hard to falsify. Until day 13, forgetting seemed nearly linear and might also have provided a good exponential fit. It took two more years of data collecting to find answers (source: SuperMemo 7: User's Guide)


Forgetting curve approximations


By 1994, I was still not sure about the nature of forgetting. I took the data collected over the previous 3 years (1991-1994) and set out to figure out the curve once and for all. I focused on my own data from over 200,000 repetitions. However, it was not easy. If SuperMemo schedules a repetition at R=0.9, you can draw a straight line from R=1.0 to R=0.9 and still do great with noisy data:



Figure: Difficulty approximating the forgetting curve. Back in 1994, it was difficult to understand the nature of forgetting in SuperMemo because most of the data was collected in a high recall range


My notes from May 6, 1994 illustrate the degree of uncertainty:


Personal anecdote. Why use anecdotes?


May 6, 1994: All day of crazy attempts to better approximate forgetting curves. First I tried R = 1 - i^n/(H^n + i^n), where i - interval, H - memory half-life, and n - cooperativity factor. Late in the evening, I had it work quite slowly, but ... it appeared that R = exp(-a*i) works not much worse! Even the old linear approximation was not very much worse (sigmoid: D=8.6%, exponential: D=8.8%, and linear: D=10.8%). Perhaps forgetting curves are indeed exponential? Going to sleep at 2:50


It was not easy to separate linear, power, exponential, Zipf, Hill, and other functions. Exponential, power, and even linear approximations brought pretty good outcomes depending on circumstances that were hard to disentangle. Only when looking at forgetting curves well sorted for complexity at higher levels of stability, despite those graphs being data-poor, could I see the exponential nature of forgetting more clearly.
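
The kind of comparison involved can be sketched in a few lines (an illustration on synthetic noisy data rather than on the original repetition histories; the deviation measure here is a plain RMS residual, not the measure used back then):

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic forgetting data: exponential decay with stability S = 15 days plus noise
rng = np.random.default_rng(0)
t = np.arange(1, 31, dtype=float)
recall = np.exp(-t / 15.0) + rng.normal(0.0, 0.02, t.size)

exponential = lambda t, a, b: a * np.exp(-b * t)
power       = lambda t, a, b: a * np.power(t, -b)
linear      = lambda t, a, b: a - b * t

def rms_deviation(model, t, y):
    params, _ = curve_fit(model, t, y, p0=(1.0, 0.1), maxfev=10000)
    return np.sqrt(np.mean((y - model(t, *params)) ** 2))

for name, model in [("exponential", exponential), ("power", power), ("linear", linear)]:
    print(f"{name:12s} RMS deviation: {rms_deviation(model, t, recall):.4f}")
```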

One of the red herrings in 1994 was that, naturally, most of my data had been collected for the first review. New items at the entry to the process form a heterogeneous group that obeys the power law of forgetting.


Figure: The first-review forgetting curve for newly learned knowledge collected with SuperMemo

Later on, when items are sorted by complexity and stability, their forgetting curves become exponential. In Algorithm SM-6, complexity and stability were imperfectly expressed by E-Factors and the repetition number respectively. This resulted in algorithmic imperfections that made for imperfect sorting. In addition, SuperMemo stays within the area of high retention, where forgetting is nearly linear.

By May 1994, the main first-review curve in my Advanced English database had collected 18,000 data points and seemed like the best analytical material. However, that curve encompasses all the learning material that enters the process, independently of its difficulty. Little did I know that this curve follows the power law. My best deviation was 2.0.


For a similar curve from 2018 see:



Figure: Forgetting curve obtained in 2018 with SuperMemo 17 for average difficulty (A-Factor=3.9). With 19,315 repetitions and a least-squares deviation of 2.319, it is pretty similar to the curve from 1994, except that it is best approximated with an exponential function (for a power function example see: forgetting curve).


Exponential forgetting prevails


By summer 1994, I was reasonably sure of the exponential nature of forgetting. By 1995, we published "2 components of memory" with the formula R=exp(-t/S). Our publication remains largely ignored by mainstream science but is all over the web when forgetting curves are discussed.


Interestingly, in 1966, Nobel Prize winner Herbert Simon had a peek at Jost's Law, derived from Ebbinghaus's work in 1897. Simon noticed that the exponential nature of forgetting necessitates the existence of a memory property that today we call memory stability. Simon wrote a short paper and moved on to the hundreds of other projects he was busy with. His text was largely forgotten; however, it was prophetic. In 1988, similar reasoning led to the idea of the two component model of long-term memory.


Today we can add one more implication: if forgetting is exponential, it implies a constant probability of forgetting per unit time, which implies neural network interference, which in turn implies that sleep might build stability not by strengthening memories, but simply by removing the cause of interference: unnecessary synapses. Giulio Tononi might then be right about the net loss of synapses in sleep. However, he believes that the loss is homeostatic. Exponential forgetting indicates that it could be much more than that: a form of "intelligent forgetting" of things that interfere with key memories reinforced in waking.


Negatively exponential forgetting curve


Only in 2005 did we write more extensively about the exponential nature of forgetting. In a paper presented by Dr Gorzelanczyk at a modelling conference in Poland, we wrote:

Archive warning: Why use literal archives?

Although it has always been suspected that forgetting is exponential in nature, proving this fact has never been simple. Exponential decay appears routinely in biological and physical systems, from radioactive decay to drying wood. It occurs wherever the expected decay rate is proportional to the size of the sample, and where the probability of a single particle decaying is constant. The following problems have hampered the effort of modeling forgetting:

By employing SuperMemo, we can overcome all these obstacles to study the nature of memory decay. As a popular commercial application, SuperMemo provides virtually unlimited access to huge bodies of data collected from students all over the world. The forgetting curve graphs available to every user of the program (Tools : Statistics : Analysis : Forgetting curves) are plotted on relatively homogeneous data samples and are a bona fide reflection of memory decay in time (as opposed to other forms of learning curves). The quest for homogeneity significantly affects the sample size though. It is important to note that the forgetting curves for material with different memory stability and different knowledge difficulty differ. Whereas memory stability affects the decay rate, heterogeneous learning material produces a superposition of individual forgetting curves, each characterized by a different decay rate. Consequently, even in bodies with hundreds of thousands of individual pieces of information participating in the learning process, only relatively small homogeneous samples of data can be filtered out. These samples rarely exceed several thousand repetition cases. Even then, these bodies of data go far beyond the sample quality available to researchers studying the properties of memory in controlled conditions. Yet the stochastic nature of forgetting still makes it hard to take an ultimate stand on the mathematical nature of the decay function (see two examples below). Having analyzed several hundred thousand samples, we have come closest yet to showing that forgetting is a form of exponential decay.



Figure: Exemplary forgetting curve sketched by SuperMemo. The database sample of nearly a million repetition cases has been sifted for average difficulty and low stability (A-Factor=3.9, S in [4,20]), resulting in 5850 repetition cases (less than 1% of the entire sample). The red line is a result of regression analysis with R = e^(-k*t/S). Curve fitting with other elementary functions demonstrates that the exponential decay provides the best match to the data. The measure of time used in the graph is the so-called U-Factor, defined as the quotient of the present and the previous inter-repetition interval. Note that the exponential decay in the range of R from 1 to 0.9 can plausibly be approximated with a straight line, which would not be the case had the decay been characterized by a power function.



Figure: Exemplary forgetting curve sketched by SuperMemo. The database sample of nearly a million repetition cases has been sifted for average difficulty and medium stability (A-Factor=3.3, S > 1 year), resulting in 1082 repetition cases. The red line is a result of regression analysis with R = e^(-k*t/S).


Forgetting curve: Retrievability formula


In Algorithm SM-17, retrievability R corresponds with the probability of recall and represents the exponential forgetting curve. Retrievability is derived from stability and the interval:


R[n] = exp(-k*t/S[n-1])

where:

  • R[n] - retrievability after the n-th repetition,
  • k - decay constant,
  • t - time elapsed since the previous repetition,
  • S[n-1] - memory stability after the (n-1)th repetition.

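
A quick numerical illustration (a minimal sketch only; it assumes the common SuperMemo convention that stability is the interval at which retrievability falls to 0.9, hence k = ln(10/9), while the algorithm itself estimates these quantities from repetition histories):

```python
import math

K = math.log(10 / 9)   # assumed convention: R = 0.9 when t equals the stability S

def retrievability(t_days: float, stability_days: float) -> float:
    """Probability of recall after t_days for a memory of the given stability."""
    return math.exp(-K * t_days / stability_days)

print(retrievability(10, 10))   # 0.900 - review exactly at the optimum interval
print(retrievability(20, 10))   # 0.810 - review twice as late
print(retrievability(1, 10))    # ~0.99 - review after a single day
```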

That neat theoretical approach becomes a bit more complex when we consider that forgetting may not be perfectly exponential if items are difficult or of mixed difficulty. In addition, forgetting curves in SuperMemo can be marred by user strategies.

In Algorithm SM-8, we hoped that retrievability information might be derived from grades. This turned out to be false. There is very little correlation between grades and retrievability, and what little there is comes primarily from the fact that complex items get worse grades and tend to be forgotten faster (at least at the beginning).


Retention vs. the forgetting index


The exponential nature of forgetting implies that the relationship between the measured forgetting index and knowledge retention can be accurately expressed using the following formula:


Retention = -FI/ln(1-FI)

where:

  • Retention - overall knowledge retention expressed as a fraction (0..1),
  • FI - forgetting index expressed as a fraction (forgetting index equals 1 minus knowledge retention at repetitions).


For example, by default, well-executed spaced repetition should result in a retention of 0.949 (i.e. 94.9%) for a forgetting index of 0.1 (i.e. 10%). The figure of 94.9% illustrates how closely exponential decay resembles a linear function at first. For linear forgetting, the figure would be 95.000% (i.e. 100% minus half the forgetting index).
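
The relationship is easy to verify numerically (a minimal check; the formula follows from averaging R = exp(-k*t) over the inter-repetition interval):

```python
import math

def retention(forgetting_index: float) -> float:
    """Average retention over an interval when recall decays exponentially
    and drops to (1 - forgetting_index) by the time of the repetition."""
    return -forgetting_index / math.log(1.0 - forgetting_index)

print(retention(0.10))   # 0.9491... i.e. ~94.9% for the default 10% forgetting index
print(retention(0.05))   # ~0.975 for a 5% forgetting index
```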


Forgetting curve for poorly formulated material


In 1994, I was lucky that my learning collections were largely well-formulated. This often wasn't the case with users of SuperMemo. For badly-formulated items, the forgetting curve is flattened. It is not purely exponential (being a superposition of several exponential curves). SuperMemo can never predict the moment of forgetting of a single item. Forgetting is a stochastic process and can only be handled on averages. A frequently propagated fallacy about SuperMemo is that it predicts the exact moment of forgetting: this is not true, and it is not possible. What SuperMemo does is search for intervals at which items of a given difficulty are likely to show a given probability of forgetting (e.g. 10%). Those flattened forgetting curves led to a paradox: neglecting complex items may result in surprisingly good memory survival after long breaks from review. Even for a purely negatively exponential forgetting curve, a 10-fold deviation in interval estimation will result in a retention of R2 = exp(10*ln(R1)), i.e. R1 raised to the 10th power. This is equivalent to a drop from 98% to 81%. For the flattened forgetting curve typical of badly-formulated items, this drop may be as little as 98%->95%. This leads to the conclusion that keeping complex material at lower priorities is a good learning strategy.
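
For the purely exponential case, the arithmetic behind those figures is a one-liner (a worked check of the numbers quoted above):

```python
# Pure exponential forgetting: a 10-fold longer interval multiplies the exponent by 10,
# so retention falls from R1 to exp(10*ln(R1)) = R1**10.
R1 = 0.98
R2 = R1 ** 10
print(R2)   # ~0.817, i.e. the drop from 98% to ~81% quoted above
```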


Power law emerges in superposition of exponential forgetting curves


To illustrate the importance of homogeneous samples in studying forgetting curves, let us see the effect of mixing difficult knowledge with easy knowledge on the shape of the forgetting curve. The figure below shows why heterogeneous samples may lead to wrong conclusions about the nature of forgetting. The heterogeneous sample in this demonstration is best approximated with a power function! The fact that power curves emerge through averaging of exponential forgetting curves has been reported by others before (Anderson & Tweney, 1997; Ritter & Schooler, 2002).



Figure: Superposition of forgetting curves may result in obscuring the exponential nature of forgetting. A theoretical sample of two types of memory traces has been composed: 50% of the traces in the sample with stability S=1 (thin yellow line) and 50% of the traces in the sample with stability S=40 (thin violet line). The superimposed forgetting curve will, naturally, exhibit retrievability R = 0.5*Ra + 0.5*Rb = 0.5*(e^(-k*t) + e^(-k*t/40)). The forgetting curve of such a composite sample is shown in granular black in the graph. The thick blue line shows the exponential approximation (R^2=0.895), and the thick red line shows the power approximation of the same curve (R^2=0.974). In this case, it is the power function that provides the best match to the data, even though the forgetting of sample subsets is negatively exponential.
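
The effect is easy to reproduce (a sketch of the same two-component mixture as in the figure; the decay constant k is assumed to follow the convention R = 0.9 at t = S, and the fit quality numbers will not exactly match those in the figure):

```python
import numpy as np
from scipy.optimize import curve_fit

k = np.log(10 / 9)   # assumption: R = 0.9 when the elapsed time equals the stability
t = np.linspace(1, 60, 200)
composite = 0.5 * np.exp(-k * t / 1.0) + 0.5 * np.exp(-k * t / 40.0)   # S=1 and S=40 mixed 50/50

exponential = lambda t, a, b: a * np.exp(-b * t)
power       = lambda t, a, b: a * np.power(t, -b)

def r_squared(model, t, y):
    params, _ = curve_fit(model, t, y, p0=(1.0, 0.05), maxfev=10000)
    residuals = y - model(t, *params)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print("exponential R^2:", r_squared(exponential, t, composite))
print("power       R^2:", r_squared(power, t, composite))
# The power function fits the mixture better even though each component decays exponentially.
```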


SuperMemo 17 also includes a single forgetting curve that is best approximated by a power function. This is the first forgetting curve after memorizing items. At the time of memorization, we do not know item complexity. This is why the material is heterogeneous and we get a power curve of forgetting.



Figure: The first forgetting curve for newly learned knowledge collected with SuperMemo. Power approximation is used in this case due to the heterogeneity of the learning material freshly introduced into the learning process. The lack of separation by memory complexity results in a superposition of exponential forgetting curves with different decay constants. On a semi-log graph, the power regression curve is logarithmic (in yellow) and appears almost straight. The curve shows that in the presented case recall drops merely to 58% in four years, which can be explained by a high reuse of memorized knowledge in real life. The first optimum interval for review at a retrievability of 90% is 3.96 days. The forgetting curve can be described with the formula R=0.9906*power(interval,-0.07), where 0.9906 is the recall after one day, while -0.07 is the decay constant. In this case, the formula yields 90% recall after 4 days. 80,399 repetition cases were used to plot the presented graph. A steeper drop in recall will occur if the material contains a higher proportion of difficult knowledge (esp. poorly formulated knowledge), or in new students with lesser mnemonic skills. The curve irregularity at intervals 15-20 comes from a smaller sample of repetitions (later interval categories on a log scale encompass a wider range of intervals)
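
Plugging numbers into that formula reproduces the figures quoted in the caption (a quick check; the four-year value is only approximate because the fitted constants are rounded):

```python
def first_review_recall(interval_days: float) -> float:
    """Power-law first-review forgetting curve quoted in the caption above."""
    return 0.9906 * interval_days ** -0.07

print(first_review_recall(3.96))         # ~0.90 - the first optimum interval
print(first_review_recall(4 * 365.25))   # ~0.59 - roughly the ~58% recall after four years
```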
