
Three types of incremental learning

Received: 1 October 2021
Accepted: 18 October 2022
Published online: 5 December 2022


Gido M. van de Ven, Tinne Tuytelaars & Andreas S. Tolias

Abstract

Incrementally learning new information from a non-stationary stream of data, referred to as 'continual learning', is a key feature of natural intelligence, but a challenging problem for deep neural networks. In recent years, numerous deep learning methods for continual learning have been proposed, but comparing their performances is difficult due to the lack of a common framework. To help address this, we describe three fundamental types, or 'scenarios', of continual learning: task-incremental, domain-incremental and class-incremental learning. Each of these scenarios has its own set of challenges. To illustrate this, we provide a comprehensive empirical comparison of currently used continual learning strategies, by performing the Split MNIST and Split CIFAR-100 protocols according to each scenario. We demonstrate substantial differences between the three scenarios in terms of difficulty and in terms of the effectiveness of different strategies. The proposed categorization aims to structure the continual learning field, by forming a key foundation for clearly defining benchmark problems.


An important open problem in deep learning is enabling neural networks to incrementally learn from non-stationary streams of data. For example, when deep neural networks are trained on samples from a new task or data distribution, they tend to rapidly lose previously acquired capabilities, a phenomenon referred to as catastrophic forgetting. In stark contrast, humans and other animals are able to incrementally learn new skills without compromising those that were already learned. The field of continual learning, also referred to as lifelong learning, is devoted to closing the gap in incremental learning ability between natural and artificial intelligence. In recent years, this area of machine learning research has been rapidly expanding, fuelled by the potential utility of deploying continual learning algorithms for applications such as medical diagnosis, autonomous driving or predicting financial markets.

Despite its scope, continual learning research is relatively unstructured and the field lacks a shared framework. Because of an abundance of subtle, but often important, differences between evaluation protocols, systematic comparison between continual learning algorithms is challenging, even when papers use the same datasets. It is therefore not surprising that numerous continual learning methods claim to be state-of-the-art. To help address this, here we describe a structured and intuitive framework for continual learning.
We put forward the view that, at the computational level, there are three fundamental types, or 'scenarios', of supervised continual learning. Informally, (a) in task-incremental learning, an algorithm must incrementally learn a set of clearly distinguishable tasks; (b) in domain-incremental learning, an algorithm must learn the same kind of problem but in different contexts; and (c) in class-incremental learning, an algorithm must incrementally learn to distinguish between a growing number of objects or classes. In this article, we formally define these three scenarios and point out different challenges associated with each one of them. We also review existing strategies for continual learning with deep neural networks and we provide a comprehensive, empirical comparison to test how suitable these different strategies are for each scenario.

Three continual learning scenarios

In classical machine learning, an algorithm has access to all training data at the same time. In continual learning, the data instead arrives in a sequence, or in a number of steps, and the underlying distribution of the data changes over time. In this article, we propose that, depending on how the aspect of the data that changes over time relates to the function or mapping that must be learned, there are three fundamental ways in which a supervised learning problem can be incremental (Table 1). Below, we start by describing the resulting three continual learning scenarios intuitively. After that we define them more formally: first in a restricted, 'academic' setting, before generalizing them to more flexible continual learning settings.
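The difference in data access between the two settings can be sketched in a few lines of Python (a toy illustration with hypothetical names such as `MeanModel`; nothing here is from the article's experiments):

```python
# Minimal sketch of the two regimes (hypothetical names, not from the article).
# A "model" here is just a running class-mean estimate, enough to show the
# access pattern: joint training sees all data at once, continual training
# sees one context at a time and never revisits earlier data.

def train_joint(model, contexts):
    # Classical machine learning: all training data available simultaneously.
    all_data = [x for ctx in contexts for x in ctx]
    model.fit(all_data)
    return model

def train_continual(model, contexts):
    # Continual learning: data arrives in a sequence of experiences; at each
    # step only the current experience is accessible.
    for ctx in contexts:
        model.fit(ctx)  # earlier contexts can no longer be accessed here
    return model

class MeanModel:
    """Toy model that tracks the mean of whatever it was last fit on."""
    def __init__(self):
        self.mean = None
    def fit(self, data):
        self.mean = sum(data) / len(data)

contexts = [[0.0, 1.0], [10.0, 11.0]]
joint = train_joint(MeanModel(), contexts)          # mean over all four values
continual = train_continual(MeanModel(), contexts)  # "forgets" the first context
```

The naive continual learner ends up reflecting only the most recent context, which is exactly the failure mode the methods discussed later try to mitigate.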

Intuitive descriptions and each scenario's challenges

The first continual learning scenario we refer to as 'task-incremental learning' (or Task-IL). This scenario is best described as the case where an algorithm must incrementally learn a set of distinct tasks (see refs. for examples from the literature). The defining characteristic of task-incremental learning is that it is always clear to the algorithm, also at test time, which task must be performed. In practice, this could mean that task identity is explicitly provided, or that the tasks are clearly distinguishable. In this scenario it is possible to train models with task-specific components (for example, a separate output layer per task), or even to have a completely separate network for each task to be learned. In this last case there is no forgetting at all. The challenge with task-incremental learning, therefore, is not, or should not be, simply to prevent catastrophic forgetting, but rather to find effective ways to share learned representations across tasks, to optimize the trade-off between performance and computational complexity, and to use information learned in one task to improve performance on other tasks (that is, to achieve positive forward or even backward transfer between tasks). These are still open challenges. Real-world examples of task-incremental learning are learning to play different sports or different musical instruments, because typically it is always clear which sport or instrument should be played.
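The 'no forgetting by design' property of task-specific components can be sketched as follows (a hypothetical minimal setup with one linear output head per task; the shared trunk and the article's actual experiments are omitted):

```python
import numpy as np

# Sketch of "no forgetting by design" in task-incremental learning
# (hypothetical minimal setup, not the article's experiments): one linear
# output head per task; training task 1 never touches task 0's head.

rng = np.random.default_rng(0)
n_features, n_classes_per_task, n_tasks = 4, 2, 3

# One weight matrix ("head") per task; a shared trunk is omitted for brevity.
heads = [np.zeros((n_classes_per_task, n_features)) for _ in range(n_tasks)]

def train_head(task_id, xs, ys, lr=0.1, epochs=50):
    W = heads[task_id]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            scores = W @ x
            pred = scores.argmax()
            if pred != y:            # simple perceptron-style update
                W[y] += lr * x
                W[pred] -= lr * x

def predict(task_id, x):
    # Task identity is provided at test time, so only that head is used.
    return heads[task_id] @ x

xs = rng.normal(size=(10, n_features))
ys = (xs[:, 0] > 0).astype(int)
snapshot = heads[0].copy()
train_head(1, xs, ys)                          # training task 1 ...
assert np.array_equal(heads[0], snapshot)      # ... leaves task 0's head intact
```

With fully separate heads (or networks) and known task identity, earlier tasks cannot be overwritten; the open problems are sharing representations and transfer, not forgetting.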
We call the second scenario 'domain-incremental learning' (or Domain-IL). In this scenario, the structure of the problem is always the same, but the context or input-distribution changes (for example, there are domain shifts; see refs.). Similarly to task-incremental learning, this scenario can be described as one in which an algorithm must incrementally learn a set of 'tasks' (although now it might be more intuitive to think of them as 'domains'), but with the crucial difference that, at least at test time, the algorithm does not know to which task a sample belongs. However, identifying the task is not necessary, because each task has the same possible outputs (for example, the same classes are used in each task). Using task-specific components in this scenario is only possible if an algorithm first identifies the task, but that is not necessarily the most efficient strategy. 'Preventing forgetting by design' is therefore not possible with domain-incremental learning, and alleviating catastrophic forgetting is still an important unsolved challenge. Examples of this scenario are incrementally learning to recognize objects under variable lighting conditions (for example, indoors versus outdoors) or learning to drive in different weather conditions.
The third continual learning scenario is 'class-incremental learning' (or Class-IL). This scenario is best described as the case where an algorithm must incrementally learn to discriminate between a growing number of objects or classes (for example, refs.). An often used set-up for this scenario is that a sequence of classification-based tasks (although now it might be more intuitive to think of them as 'episodes') is encountered, whereby each task contains different classes and the algorithm must learn to distinguish between all classes. In this case, task identification is necessary to solve the problem, as it determines which possible classes the current sample might belong to. In other words, the algorithm should be able both to solve each individual task (that is, distinguish between classes within an episode) and to identify which task a sample belongs to (that is, distinguish between classes from different episodes). For example, an agent might first learn about cats and dogs, and later about cows and horses; while with task-incremental learning

Table 1 | Overview of the three continual learning scenarios

Scenario                       Intuitive description                                     Mapping to learn
Task-incremental learning      Sequentially learn to solve a number of distinct tasks    f: X × C → Y
Domain-incremental learning    Learn to solve the same problem in different contexts     f: X → Y
Class-incremental learning     Discriminate between incrementally observed classes       f: X → C × Y

Notation: X is the input space, Y is the within-context output space and C is the context space. In this article, the term 'context' refers to an underlying distribution from which observations are sampled. The context changes over time. In the continual learning literature, the term 'task' is often used in a way analogous to how the term 'context' is used here.
the agent would not be expected to distinguish between animals encountered in different episodes (for example, between cats and cows), with class-incremental learning this is required. An important challenge in this scenario is learning to discriminate between classes that are not observed together, which has turned out to be very challenging for deep neural networks, especially when storing examples of previously seen classes is not allowed.

Formalization in a restricted, 'academic' setting

To more formally define these three scenarios, we start by considering the simple, but frequently studied, continual learning setting in which a classification problem is split up into multiple parts or episodes that must be learned sequentially, with no overlap between the different episodes. In the continual learning literature, these episodes are often called tasks, but in this article we will refer to them as 'contexts'. The term task is problematic because in the literature it is used with several different meanings or connotations. From here on, we will use the term task only to refer to a context when it is always clear to the learning algorithm when a sample belongs to that context (as is the case with task-incremental learning).
In the 'academic continual learning setting' sketched above (that is, classification-based, non-overlapping contexts encountered sequentially), a clear distinction can be drawn between the three scenarios. To formalize this, we express each sample as consisting of three components: an input x ∈ X, a within-context label y ∈ Y and a context label c ∈ C. The three scenarios can then be defined based on how the function or mapping that must be learned relates to the context space C. With task-incremental learning, an algorithm is expected to learn a mapping of the form f: X × C → Y; with domain-incremental learning, a mapping of the form f: X → Y must be learned; and with class-incremental learning, the mapping to be learned is of the form f: X → C × Y. For class-incremental learning this mapping can also be written as f: X → G, with the 'global label space' G obtained by combining C and Y.
These definitions imply that the three scenarios can be distinguished based on whether at test time context identity information is known to the algorithm and, in case it is not, whether it must be inferred (Fig. 1). Each scenario thus specifies whether context labels are available during testing, but not necessarily whether they are available during training. With task- and class-incremental learning, it is often implicit that context labels are provided during training (for example, in the case of supervised learning), but with domain-incremental learning it is good practice to explicitly state whether context labels (or context boundaries) are provided during training.
To illustrate the continual learning scenarios with an example, Fig. 2 shows how Split MNIST, which is a popular toy problem for continual learning, can be performed according to each of the three scenarios. Further examples illustrating these scenarios with other context sequences are provided in Supplementary Note 1.
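Under the usual pairing of consecutive digits into five two-digit contexts (as in Fig. 2), the three target labelings for a single digit can be sketched as follows (hypothetical helper `split_mnist_targets`, for illustration only):

```python
# Sketch of how one digit label maps to targets under each scenario for
# Split MNIST (assuming the common pairing of consecutive digits into
# five two-digit contexts, as in Fig. 2).

CONTEXTS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

def split_mnist_targets(digit):
    context = digit // 2                     # context label c: which episode
    within = CONTEXTS[context].index(digit)  # within-context label y: position in pair
    return {
        # Task-IL: context label is given as input; predict the within-context label.
        "task_il": (context, within),
        # Domain-IL: context unknown; same output space in every context
        # (here: position within the pair, i.e. even versus odd digit).
        "domain_il": within,
        # Class-IL: context unknown; predict the global label (all ten digits).
        "class_il": digit,
    }

targets = split_mnist_targets(7)
```

The same image thus receives a two-way target in Task-IL and Domain-IL, but a ten-way target in Class-IL, which is what makes the latter so much harder.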
Fig. 1 | Decision tree for the three continual learning scenarios. The scenarios can be defined based on whether at test time context identity is known and, if it is not, whether it must be inferred.
It might be unintuitive to distinguish domain- and class-incremental learning by whether context identity must be inferred, because with class-incremental learning context identification is often not explicitly performed, as typically a direct mapping is learned from the input space to the set of global labels. Another way to tell these two scenarios apart is by whether different contexts contain the same classes (domain-incremental learning) or different classes (class-incremental learning). However, it should then be realized that whether two samples belong to the same class can change depending on perspective: in the Split MNIST example (Fig. 2), with domain-incremental learning the digits '0' and '2' belong to the same class (as they are both even digits), but with class-incremental learning they are considered different classes.

Generalization to more flexible settings

The clear separation between the three scenarios makes the academic continual learning setting convenient for studying these scenarios and their different challenges in isolation. However, this setting does not reflect well the arbitrary non-stationarity that can be observed in the real world. To generalize the three scenarios to more flexible continual learning settings, we first introduce a distinction between the concepts 'context set' and 'data stream':
The 'context set' is defined as a collection of underlying distributions {D_c : c ∈ C}, from which the observations presented to the algorithm are sampled. For a supervised continual learning problem, for each context c, samples from D_c consist of an input x and a within-context label y. (With class-incremental learning each context could also contain a single class, in which case the within-context label is not used.)
The 'data stream' is defined as a (possibly unbounded) stream of experiences that are sequentially presented to the algorithm. Each experience consists of a set of observations sampled from one or more of the underlying distributions of the context set. These experiences are the incremental steps of a continual learning problem, in the sense that at each step, the algorithm has free access to the data of the current experience, but not to the data from past or future experiences (see also ref.).
In the academic continual-learning setting, there is no distinction between the context set and the data stream, because each experience consists of all the training data of a particular context. In general, however, such a direct relation is not needed. Every observation within each experience can in principle be sampled from any combination of underlying datasets from the context set. This can be formalized as:
x_i^(e) ∼ Σ_{c∈C} p_{i,c}^(e) D_c,   (1)

whereby x_i^(e) denotes observation i of experience e and p_{i,c}^(e) is the probability that this observation is sampled from D_c. Importantly, in this framework, from a probabilistic perspective, two observations at different points in time can only differ from each other with respect to the (combination of) contexts from which they are sampled. With this formulation, the context set describes the aspects of the data that 'can' change over time and the probabilities p_{i,c}^(e) describe 'how' they change over time.
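This mixture view can be sketched numerically (a toy context set of one-dimensional Gaussians, chosen only for illustration; the mixture weights play the role of the sampling probabilities in equation (1)):

```python
import numpy as np

# Sketch of sampling a "data stream" from a "context set" as a mixture
# (hypothetical distributions, not the article's data). Each context c is a
# distribution D_c; an observation of an experience is drawn from D_c with
# probability p[c], so experiences may blend contexts.

rng = np.random.default_rng(1)

def sample_context(c, n):
    # Toy context set: D_c is a Gaussian centred at 10 * c.
    return rng.normal(loc=10.0 * c, scale=1.0, size=n)

def sample_experience(p, n_obs):
    # p: mixture weights over contexts for this experience (sums to 1).
    contexts = rng.choice(len(p), size=n_obs, p=p)
    return np.array([sample_context(c, 1)[0] for c in contexts]), contexts

# Academic setting: each experience contains exactly one context ...
x_sharp, c_sharp = sample_experience([1.0, 0.0, 0.0], 100)
# ... versus a gradual transition that blends two contexts.
x_blend, c_blend = sample_experience([0.5, 0.5, 0.0], 100)
```

Setting the weights to a one-hot vector recovers the academic setting; spreading them over several contexts yields gradual transitions or revisits.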
An advantage of distinguishing between the context set and the data stream is that it makes it possible to describe continual learning problems with gradual transitions between contexts or whereby contexts are revisited. In this framework, which is suitable for so-called 'task-free' continual learning, generalized versions of the three scenarios can be defined based on how the mapping that must be learned relates to the context space C, which describes the non-stationary aspect of the data. Supplementary Note 2 illustrates how a 'task-free' data stream can be performed according to each of the generalized versions of the three scenarios.
We note that for complex real-world incremental learning problems, it might not be straightforward to express the mapping that must be learned in terms of the context space C, for example, because there are different aspects of the data that change over time. To accommodate this, a multidimensional context space can be used, whereby each dimension could adhere to a different scenario. This allows for continual learning problems that are mixtures of scenarios (Supplementary Note 3). Another generalization is that contexts do not need to be discrete, but can be continuous (in that case the summation in equation (1) becomes an integral); an example of a continuous context set is Rotated MNIST with arbitrary rotation (Supplementary Note 3).

Empirical comparison

To further explore the differences between the three continual learning scenarios, here we provide an empirical comparison of the performance of different deep learning strategies. To do this comparison in a structured manner, in Supplementary Note 4 we discuss and distinguish five computational strategies for continual learning (Fig. 3). For each strategy, we included a few representative methods in our comparison.

Compared methods

The use of context-specific components was represented by context-dependent gating, which masks for each context a randomly selected subset of hidden units; and the separate networks approach, where the available parameter budget is divided over all contexts and a separate network is learned for each context.
Included parameter regularization methods were elastic weight consolidation (EWC), which estimates parameter importance using a diagonal approximation to the Fisher information; and synaptic intelligence (SI), which estimates parameter importance online based on the training trajectory.
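The core idea of such a quadratic penalty can be sketched as follows (toy parameter vectors and an assumed importance estimate, not the article's implementation):

```python
import numpy as np

# Sketch of an EWC-style quadratic penalty (toy setting, not the article's
# implementation): after a context, store the parameters theta_old and a
# per-parameter importance estimate (a diagonal Fisher approximation);
# when training the next context, penalize movement of important parameters.

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    # lam scales how strongly old knowledge is protected.
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])
fisher = np.array([10.0, 0.0, 1.0])   # first parameter important, second not

# Moving an unimportant parameter is free; moving an important one is costly.
cheap = ewc_penalty(np.array([1.0, 5.0, 0.5]), theta_old, fisher)
costly = ewc_penalty(np.array([2.0, -2.0, 0.5]), theta_old, fisher)
```

This penalty is added to the loss on the new context, so gradient descent trades off new-context accuracy against movement of parameters deemed important for old contexts.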
For functional regularization, compared were learning without forgetting (LwF), which uses the inputs from the current context as anchor points; and functional regularization of the memorable past (FROMP), which uses stored examples from past contexts as anchor points.
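The distillation-style loss underlying this kind of functional regularization can be sketched as follows (toy logits; the temperature value is an assumption, not taken from the article):

```python
import numpy as np

# Sketch of LwF-style functional regularization (toy logits, not the
# article's implementation): before training on a new context, record the
# old model's softened outputs at the anchor points, then penalize the new
# model for deviating from those outputs.

def softmax(z, T=2.0):
    z = np.asarray(z, dtype=float) / T   # temperature T softens the targets
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(new_logits, old_logits, T=2.0):
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    # Cross-entropy of the new outputs against the old soft targets.
    return -np.sum(p_old * np.log(p_new))

old_logits = [2.0, 0.5, -1.0]
loss_same = distillation_loss(old_logits, old_logits)        # matches old model
loss_diff = distillation_loss([-1.0, 0.5, 2.0], old_logits)  # deviates from it
```

The loss is minimized when the new model reproduces the old model's input-output mapping at the anchor points, so what is regularized is the function rather than the parameters.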
The included replay methods were deep generative replay (DGR), which replays generated representations at the input level; brain-inspired replay (BI-R), which replays generated representations at the latent feature level; experience replay (ER), which replays stored samples in the 'standard way' (that is, loss on replayed data added to loss on current data); and averaged gradient episodic memory (A-GEM), which replays the same stored samples but using the loss on replayed data as inequality constraint (that is, loss on current data optimized under the constraint that loss on replayed data cannot increase).
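The 'standard way' of replaying stored samples can be sketched with a small memory buffer (the capacity of 100 matches the budget column of the results tables; the reservoir-sampling buffer itself is a common choice and an assumption here, not necessarily the article's):

```python
import random

# Sketch of a memory buffer for experience replay (hypothetical minimal
# version, not the article's code): each training batch on the current
# context is complemented with a batch drawn from stored past samples.

class ReplayBuffer:
    def __init__(self, capacity=100, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        # Reservoir sampling keeps a uniform subset of everything seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, batch_size):
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=100)
for i in range(1000):          # stream of 1,000 past observations
    buf.add(i)
replayed = buf.sample(32)      # mixed into the loss for the current context
```

In standard ER the loss on `replayed` is simply added to the loss on the current batch; A-GEM instead uses it only as an inequality constraint on the gradient step.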
The compared template-based methods were iCaRL, with mean latent feature representations of stored examples as templates; and the generative classifier from ref., which uses class-specific generative models as templates.
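Template-based classification in the style of a nearest-mean rule can be sketched as follows (toy two-dimensional 'features' and hypothetical class names; iCaRL additionally learns the feature extractor, which is omitted here):

```python
import numpy as np

# Sketch of template-based classification in the style of a
# nearest-mean-of-exemplars rule (toy features, not the article's setup):
# each class keeps a template (here: the mean feature vector of its stored
# examples) and a sample is assigned to the nearest template. New classes
# just add new templates, so earlier templates are untouched.

def build_templates(examples_per_class):
    return {c: np.mean(ex, axis=0) for c, ex in examples_per_class.items()}

def classify(x, templates):
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

examples = {
    "cat": np.array([[0.9, 0.1], [1.1, -0.1]]),
    "dog": np.array([[-1.0, 0.0], [-1.2, 0.2]]),
}
templates = build_templates(examples)
pred = classify(np.array([1.0, 0.0]), templates)      # nearest template: "cat"

# A class from a later episode is added without touching earlier templates.
templates["cow"] = np.array([0.0, 5.0])
pred_new = classify(np.array([0.1, 4.0]), templates)  # nearest template: "cow"
```

Because classes compete only through distances to their templates, classes that were never observed together can still be compared at test time, which is why this strategy suits class-incremental learning.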

Fig. 2 | Split MNIST according to the three scenarios. a, The Split MNIST protocol is obtained by splitting the original MNIST dataset into five contexts, with each context consisting of two digits. b, Overview of what is expected of the algorithm at test time when the Split MNIST protocol is performed according to each continual learning scenario:

Scenario                       Input (at test time)    Expected output        Intuitive description
Task-incremental learning      Image + context label   Within-context label   Choice between two digits of the same context (e.g. 0 or 1)
Domain-incremental learning    Image                   Within-context label   Is the digit odd or even?
Class-incremental learning     Image                   Global label           Choice between all ten digits

With task-incremental learning, at the computational level, there is no difference between whether the algorithm must return the within-context label or the global label, because the within-context label can be combined with the context label (which is provided as input) to get the global label.
Fig. 3 | Schematic illustrations of different continual learning strategies. a, Context-specific components. Certain parts of the network are only used for specific contexts. b, Parameter regularization. Parameters important for past contexts are encouraged not to change too much when learning new contexts. c, Functional regularization. The input-output mapping learned previously is encouraged not to change too much at a particular set of inputs (the 'anchor points') when training on new contexts. d, Replay. The training data of a new context is complemented with data representative of past contexts. The replayed data can be sampled from a memory buffer or from a generative model. e, Template-based classification. A 'template' is learned for each class (for example, a prototype, an energy value or a generative model), and classification is performed based on which template is most suitable for the sample to be classified. See Supplementary Note 4 for a detailed discussion of these strategies.

Finally, two baselines were included. As lower target, referred to as 'none', the model was incrementally trained on all contexts in the standard way. As upper target, referred to as 'joint', the model was trained on the data of all contexts at the same time.

Set-up

We performed both the Split MNIST and the Split CIFAR-100 protocol according to each of the three scenarios. All experiments used the academic continual learning setting and context identity information was available during training. To make the comparisons as informative as possible, we used similar network architectures and similar training protocols for all compared methods. Depending on the continual learning scenario, the output layer of the network was treated differently. With task-incremental learning, a multi-headed output layer was used whereby each context had its own output units and only the units of the context under consideration were used. For the other two scenarios, single-headed output layers were used, with the number of output units equal to the number of classes per context (domain-incremental learning) or to the total number of classes (class-incremental learning). See Methods for more detail.

Results

For both Split MNIST (Table 2) and Split CIFAR-100 (Table 3), we found clear differences between the three continual learning scenarios. With task-incremental learning, almost all tested methods performed well compared to the 'none' and 'joint' baselines, with domain-incremental learning the relative performances of many methods dropped considerably and with class-incremental learning they decreased even further.
The decline in performance across the three scenarios was most pronounced for the parameter regularization methods. On both protocols, EWC and SI performed close to the upper target when context identity was known during testing (that is, task-incremental learning); with domain-incremental learning the performance of both methods was substantially lower, but remained above the lower target of sequentially training a network in the standard way; and with class-incremental learning the performance of EWC and SI was similar to the lower target, indicating that in this scenario these methods failed completely. There was a similar trend across the three scenarios for the functional regularization methods, albeit less pronounced for FROMP than for LwF.
Replay-based methods performed relatively well in all three scenarios. Although on both protocols their performance still decreased from task- to domain- to class-incremental learning, the decline was less sharp than for the regularization-based methods, and replay-based methods were among the top performers in each scenario. Template-based classification also performed well with class-incremental learning, with iCaRL and the Generative Classifier among the best performing methods on both protocols.
For class-incremental learning, the methods that performed best either used a generative model or they stored previously seen data in a memory buffer. Directly comparing methods using these two
Table 2 | Results on Split MNIST

Strategy                        Method                     Budget   GM    Task-IL   Domain-IL   Class-IL
Baselines                       None (lower target)        -        -
                                Joint (upper target)       -        -
Context-specific components     Separate Networks          -        -
                                Context-dependent gating   -        -
Parameter regularization        EWC                        -        -
                                SI                         -        -
Functional regularization       LwF                        -        -
                                FROMP                      100      -
Replay                          DGR                        -        Yes
                                BI-R                       -        Yes
                                ER                         100      -
                                A-GEM                      100      -
Template-based classification   Generative Classifier      -        Yes
                                iCaRL                      100      -

Each protocol was performed 20 times with different random seeds; reported is the mean (± s.e.m.) over these runs.
Table 3 | Results on Split CIFAR-100

Strategy                        Method                     Budget   GM    Task-IL   Domain-IL   Class-IL
Baselines                       None (lower target)        -        -
                                Joint (upper target)       -        -
Context-specific components     Separate Networks          -        -
                                Context-dependent gating   -        -
Parameter regularization        EWC                        -        -
                                SI                         -        -
Functional regularization       LwF                        -        -
Replay                          DGR                        -        Yes
                                BI-R                       -        Yes
                                ER                         100      -