
Improving Generalization of Adversarial Training via
Robust Critical Fine-Tuning

Kaijie Zhu1,2, Jindong Wang3, Xixu Hu4, Xing Xie3, Ge Yang1,2

1School of Artificial Intelligence, University of Chinese Academy of Sciences

2Institute of Automation, CAS    3 Microsoft Research    4 City University of Hong Kong
Corresponding author: Ge Yang <ge.yang@ia.ac.cn>.
Abstract

Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at the cost of decreased generalization ability. This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module. To do so, we introduce module robust criticality (MRC), a measure that evaluates the significance of a given module to model robustness under worst-case weight perturbations. Using this measure, we identify the module with the lowest MRC value as the non-robust-critical module and fine-tune its weights to obtain fine-tuned weights. Subsequently, we linearly interpolate between the adversarially trained weights and fine-tuned weights to derive the optimal fine-tuned model weights. We demonstrate the efficacy of RiFT on ResNet18, ResNet34, and WideResNet34-10 models trained on CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Our experiments show that RiFT can significantly improve both generalization and out-of-distribution robustness by around 1.5% while maintaining or even slightly enhancing adversarial robustness. Code is available at https://github.com/microsoft/robustlearn.

1 Introduction

Figure 1: Interpolation results of fine-tuning on different modules of ResNet18 on CIFAR10 dataset. Dots denote different interpolation points between the final fine-tuned weights of RiFT and the initial adversarially trained weights. All fine-tuning methods improve the generalization ability, but only fine-tuning on the non-robust-critical module (layer2.1.conv2) can preserve robustness. Additionally, fine-tuning on the robust-critical module (layer4.1.conv1) causes the worst trade-off between generalization and robustness. In the initial interpolation stage, fine-tuning on non-robust-critical modules enhances adversarial robustness by around 0.3%.

The pursuit of accurate and trustworthy artificial intelligence systems is a fundamental objective in the deep learning community. Adversarial examples [43, 14], which perturb inputs with small, human-imperceptible noise that can cause deep neural networks to make incorrect predictions, pose a significant threat to the security of AI systems. Notable experimental and theoretical progress has been made in defending against such adversarial examples [6, 4, 10, 18, 11, 15, 36]. Among various defense methods [49, 32, 54, 30, 8], adversarial training (AT) [28] has been shown to be one of the most promising approaches [4, 11] to enhance adversarial robustness. However, compared to standard training, AT severely sacrifices generalization on in-distribution data [40, 44, 55, 35, 31] and is exceptionally vulnerable to certain out-of-distribution (OOD) examples [13, 50, 21] such as contrast, brightness, and fog corruptions, resulting in unsatisfactory performance.

Prior works tend to mitigate the trade-off between generalization and adversarial robustness within the adversarial training procedure. For example, some approaches have explored reweighting instances [56], using unlabeled data [35], or redefining the robust loss function [55, 46, 48, 31]. In this paper, we take a different perspective to address such a trade-off by leveraging the redundant capacity for robustness of neural networks after adversarial training. Recent research has demonstrated that deep neural networks can exhibit redundant capacity for generalization due to their complex and opaque nature, where specific network modules can be deleted, permuted [45], or reset to their initial values [52, 9] with only minor degradation in generalization performance. Hence, it is intuitive to ask: Do adversarially trained models have such redundant capacity? If so, how can we leverage it to improve generalization and OOD robustness (here, generalization refers to generalization to in-distribution (ID) samples, and OOD robustness refers to generalization to OOD samples) while maintaining adversarial robustness?

Based on such motivation, we introduce a new concept called Module Robust Criticality (MRC; in this paper, a module refers to a layer of the neural network) to investigate the redundant capacity of adversarially trained models for robustness. MRC aims to quantify the maximum increase in robust loss of a module's parameters under a constrained weight perturbation. As illustrated in Figure 3, we empirically find that certain modules do exhibit redundant characteristics under such perturbations, resulting in negligible drops in adversarial robustness. We refer to the modules with the lowest MRC values as the non-robust-critical modules. These findings further inspire us to propose a novel fine-tuning technique called Robust Critical Fine-Tuning (RiFT), which aims to leverage the redundant capacity of the non-robust-critical module to improve generalization while maintaining adversarial robustness. RiFT consists of three steps: (1) Module robust criticality characterization, which calculates the MRC value for each module and identifies the non-robust-critical module. (2) Non-robust-critical module fine-tuning, which exploits the redundant capacity of the non-robust-critical module by fine-tuning its weights on standard examples. (3) Mitigating the robustness-generalization trade-off via interpolation, which interpolates between the adversarially trained parameters and the fine-tuned parameters to find the best weights that maximize the improvement in generalization while preserving adversarial robustness.

Experimental results demonstrate that RiFT significantly improves both the generalization performance and OOD robustness by around 2% while maintaining or even improving the adversarial robustness of the original models. Furthermore, we also incorporate RiFT into other adversarial training regimes such as TRADES [55], MART [46], AT-AWP [48], and SCORE [31], and show that such incorporation leads to further enhancements. More importantly, our experiments reveal several noteworthy insights. First, we found that fine-tuning on non-robust-critical modules can effectively mitigate the trade-off between adversarial robustness and generalization, showing that these two can both be improved (Section 5.3). As illustrated in Figure 1, adversarial robustness increases alongside the generalization in the initial interpolation procedure, indicating that the features learned by fine-tuning can benefit both generalization and adversarial robustness. This contradicts the previous claim [44] that the features learned by optimal standard and robust classifiers are fundamentally different. Second, the existence of non-robust-critical modules suggests that current adversarial training regimes do not fully utilize the capacity of DNNs (Section 5.2). This motivates future work to design more efficient adversarial training approaches using such capacity. Third, while a previous study [24] reported that fine-tuning on pre-trained models could distort the learned robust features and result in poor performance on OOD samples, we find that fine-tuning adversarially trained models does NOT lead to worse OOD performance (Section 5.3).

The contribution of this work is summarized as follows:

  1. Novel approach. We propose the concept of module robust criticality and verify the existence of redundant capacity for robustness in adversarially trained models. We then propose RiFT to exploit such redundancy to improve the generalization of AT models.

  2. Superior experimental results. Our approach improves both generalization and OOD robustness of AT models by around 2%. It can also be incorporated with previous AT methods to mitigate the trade-off between generalization and adversarial robustness.

  3. Interesting insights. The findings of our experiments shed light on the intricate interplay between generalization, adversarial robustness, and OOD robustness. Our work emphasizes the potential of leveraging the redundant capacity in adversarially trained models to improve generalization and robustness further, which may inspire more efficient and effective training methods to fully utilize this redundancy.

2 Related Work

Trade-off between adversarial robustness and generalization

The existence of such a trade-off has been extensively debated in the adversarial learning community [40, 44, 55, 20, 35, 31]. Despite lingering controversies, the prevalent viewpoint is that this trade-off is inherent. Theoretical analyses [44, 35, 20] demonstrated that the trade-off provably exists even in simple cases, e.g., binary classification and linear regression. To address this trade-off, various methods have been proposed during adversarial training, such as instance reweighting [56], robust self-training [35], incorporating unlabeled data [7, 18], and redefining the robust loss function [55, 46, 48, 31]. This paper presents a novel post-processing approach that exploits the excess capacity of the model after adversarial training to address this trade-off. Our RiFT can be used in conjunction with existing adversarial training techniques, providing a practical and effective way to mitigate the trade-off further.

Redundant Fitting Capacity

Over-parameterized deep neural networks (DNNs) exhibit striking fitting power even for random labels [52, 3]. Recent studies have shown that not all modules contribute equally to the generalization ability of DNNs [45, 38, 53, 9], indicating redundant fitting capacity for generalization. Veit et al. [45] found that some blocks can be deleted or permuted without degrading the test performance too much. Rosenfeld and Tsotsos [38] demonstrated that one can achieve comparable performance by training only a small fraction of network parameters. Further, recent studies have identified certain neural network modules, referred to as robust modules [53, 9], for which rewinding their parameters to initial values results in a negligible decline in generalization. Previous studies have proposed methods to reduce the computational and storage costs of deep neural networks by removing the redundant capacity for generalization while preserving comparable performance, such as compression [16] and distillation [19]. In contrast, our work focuses on the redundant capacity for robustness of adversarially trained models and tries to exploit such redundancy.

Fine-tuning Methods

Pre-training on large-scale datasets has been shown to be a powerful approach for developing high-performing deep learning models [5, 12, 34, 22]. Fine-tuning is a widely adopted approach to enhance the transferability of pre-trained models to downstream tasks and domain shifts. Typically, fine-tuning methods involve fine-tuning the last layer (linear probing) [1, 24] or all layers (fully fine-tuning) [1, 18, 29, 24]. Salman et al. [39] demonstrated that both fully fine-tuning and linear probing of adversarially trained models can improve transfer performance on downstream tasks. Nevertheless, recent studies [2, 47, 24] have suggested that fine-tuning can degrade pre-trained features and lead to underperformance on out-of-distribution (OOD) samples. To address this issue, different fine-tuning techniques have been proposed, such as WiSE-FT [47] and surgical fine-tuning [27], which leverage ensemble learning or selective fine-tuning for better OOD performance. Kumar et al. [24] suggested that the two-step strategy of linear probing then full fine-tuning (LP-FT) combines the benefits of both fully fine-tuning and linear probing.

3 Module Robust Criticality

Improving the generalization of adversarially trained models requires a thorough understanding of DNNs, which, however, proves to be difficult due to the lack of explainability. Luckily, recent studies show that specific modules in neural networks, referred to as critical modules [53, 9], significantly impact model generalization if their parameters are rewound to initial values. In this work, we propose a metric called Module Robust Criticality (MRC) to evaluate the robustness contribution of each module explicitly.

3.1 Preliminaries

We denote an $l$-layered DNN as $f({\bm{\theta}})=\phi({\bm{x}}^{(l)};{\bm{\theta}}^{(l)})\circ\ldots\circ\phi({\bm{x}}^{(1)};{\bm{\theta}}^{(1)})$, where ${\bm{\theta}}^{(i)}$ is the parameter of the $i$-th layer and $\phi(\cdot)$ denotes the activation function. We use ${\bm{\theta}}_{AT}$ and ${\bm{\theta}}_{FT}$ to denote the weights of the adversarially trained and fine-tuned model, respectively. We use $\mathcal{D}=\{({\bm{x}}_1,y_1),\ldots,({\bm{x}}_n,y_n)\}$ to denote a dataset, and $\mathcal{D}_{\mathit{std}}$ means a standard dataset such as CIFAR10. The cross-entropy loss is denoted by $\mathcal{L}$, and $\lVert\cdot\rVert_p$ denotes the $\ell_p$ norm.

Let $\Delta{\bm{x}}\in\mathcal{S}$ denote the adversarial perturbation applied to a clean input ${\bm{x}}$, where $\mathcal{S}$ represents the allowed range of input perturbations. Given a neural network $f({\bm{\theta}})$ and a dataset $\mathcal{D}$, adversarial training aims to minimize the robust loss [28] as:

$$\mathop{\arg\min}\limits_{{\bm{\theta}}}\ \mathcal{R}(f({\bm{\theta}}),\mathcal{D}),\quad\text{where}\quad \mathcal{R}(f({\bm{\theta}}),\mathcal{D})=\sum\limits_{({\bm{x}},y)\in\mathcal{D}}\max\limits_{\Delta{\bm{x}}\in\mathcal{S}}\mathcal{L}(f({\bm{\theta}},{\bm{x}}+\Delta{\bm{x}}),y). \qquad (1)$$

Here, $\mathcal{R}(f({\bm{\theta}}),\mathcal{D})$ is the robust loss, which seeks the worst-case input perturbation that maximizes the cross-entropy classification error.
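For concreteness, the inner maximization in Eq. (1) is typically approximated with projected gradient descent (PGD). Below is a minimal PyTorch sketch under the $\ell_\infty$ threat model used later in the paper ($\epsilon=8/255$); the function name and defaults are ours, not the authors' released code:

```python
import torch

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate max_{||dx||_inf <= eps} L(f(x + dx), y) with PGD,
    i.e., the inner maximization of the robust loss in Eq. (1)."""
    criterion = torch.nn.CrossEntropyLoss()
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = criterion(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()               # ascend the loss
            delta.clamp_(-eps, eps)                    # project onto the ell_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep pixels in [0, 1]
    return (x + delta).detach()
```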

3.2 Module Robust Criticality

Definition 3.1 (Module Robust Criticality).

Given a weight perturbation scaling factor $\epsilon>0$ and a neural network $f({\bm{\theta}})$, the robust criticality of a module $i$ is defined as

$$\mathit{MRC}(f,{\bm{\theta}}^{(i)},\mathcal{D},\epsilon)=\max\limits_{\Delta{\bm{\theta}}\in\mathcal{C}_{{\bm{\theta}}}}\mathcal{R}(f({\bm{\theta}}+\Delta{\bm{\theta}}),\mathcal{D})-\mathcal{R}(f({\bm{\theta}}),\mathcal{D}), \qquad (2)$$

where $\Delta{\bm{\theta}}=\{\mathbf{0},\ldots,\mathbf{0},\Delta{\bm{\theta}}^{(i)},\mathbf{0},\ldots,\mathbf{0}\}$ denotes the weight perturbation with respect to the module weights ${\bm{\theta}}^{(i)}$, $\mathcal{C}_{{\bm{\theta}}}=\{\Delta{\bm{\theta}}\,|\,\lVert\Delta{\bm{\theta}}\rVert_p\leq\epsilon\lVert{\bm{\theta}}^{(i)}\rVert_p\}$ is the set of allowed weight perturbations, and $\mathcal{R}(\cdot)$ is the robust loss defined in Eq. (1).



The MRC value of each module represents how critically it contributes to the adversarial robustness of the model. The module with the lowest MRC value is considered redundant, as changing its weights has a negligible effect on robustness. We refer to this module as the non-robust-critical module. Intuitively, MRC serves as an upper bound on the robust loss increase incurred by changing the weights of a particular module, as demonstrated in Theorem 3.1. Since we do not know the optimization directions and how they might affect the model's robustness to adversarial examples, we measure the extent to which worst-case weight perturbations affect robustness, providing an upper-bound loss for optimizing the weights. Further, the MRC of a module depicts the sharpness of the robust loss landscape [48, 41] around the minimum ${\bm{\theta}}^{(i)}$. If the MRC score is high, the robust loss landscape with respect to ${\bm{\theta}}^{(i)}$ is sharp, and fine-tuning this module is likely to hurt adversarial robustness.

Theorem 3.1.

The MRC value for a module $i$ serves as an upper bound on the robust loss increase when we optimize the module under constraint $\mathcal{C}_{{\bm{\theta}}}$:

$$\mathcal{R}(f({\bm{\theta}}^{*}),\mathcal{D})-\mathcal{R}(f({\bm{\theta}}),\mathcal{D})\leq\mathit{MRC}(f,{\bm{\theta}}^{(i)},\mathcal{D},\epsilon),\quad\text{where }{\bm{\theta}}^{*}=\mathop{\arg\min}\limits_{{\bm{\theta}}^{\prime},\,({\bm{\theta}}^{\prime}-{\bm{\theta}})\in\mathcal{C}_{{\bm{\theta}}}}\sum\limits_{({\bm{x}},y)\in\mathcal{D}}\mathcal{L}(f({\bm{\theta}}^{\prime},{\bm{x}}),y). \qquad (3)$$

Proof.

By the definition of MRC, for any weights $({\bm{\theta}}^{\prime}-{\bm{\theta}})\in\mathcal{C}_{{\bm{\theta}}}$, we have:

$$\mathcal{R}(f({\bm{\theta}}^{\prime}),\mathcal{D})-\mathcal{R}(f({\bm{\theta}}),\mathcal{D})\leq\mathit{MRC}(f,{\bm{\theta}}^{(i)},\mathcal{D},\epsilon). \qquad (4)$$

Thus, for the optimized weights:

$${\bm{\theta}}^{*}=\mathop{\arg\min}\limits_{{\bm{\theta}}^{\prime},\,({\bm{\theta}}^{\prime}-{\bm{\theta}})\in\mathcal{C}_{{\bm{\theta}}}}\sum\limits_{({\bm{x}},y)\in\mathcal{D}}\mathcal{L}(f({\bm{\theta}}^{\prime},{\bm{x}}),y), \qquad (5)$$

it satisfies

$$\mathcal{R}(f({\bm{\theta}}^{*}),\mathcal{D})-\mathcal{R}(f({\bm{\theta}}),\mathcal{D})\leq\mathit{MRC}(f,{\bm{\theta}}^{(i)},\mathcal{D},\epsilon). \qquad (6)$$

This completes the proof. ∎



Remark:

The definition of MRC is similar in spirit to the work of Zhang et al. [53] and Chatterji et al. [9]. However, MRC differs fundamentally from them in two aspects. First, MRC aims to capture the influence of a module on adversarial robustness, while Zhang et al. [53] and Chatterji et al. [9] focus on studying the impact of a module on generalization. Second, MRC investigates the robustness characteristics of module weights under worst-case weight perturbations, whereas Zhang et al. [53] and Chatterji et al. [9] analyzed the properties of a module by rewinding its weights to their initial values. Similar to [25, 41], we define the weight perturbation constraint $\mathcal{C}_{{\bm{\theta}}}$ as a multiple of the $\ell_p$ norm of the original parameters, which ensures the scale-invariance property and allows us to compare the robust criticality of modules across different layers; see Appendix A for a detailed proof.

Theorem 3.1 establishes a clear upper bound on the robust loss increase incurred by fine-tuning a particular module. This theorem assures us that fine-tuning on non-robust-critical modules should not harm model robustness. However, it does not ascertain whether fine-tuning a robust-critical module will lead to a significant decline in robust accuracy.

3.3 Relaxation of MRC

Optimizing Eq. (2) requires simultaneously finding the worst-case weight perturbation $\Delta{\bm{\theta}}$ and the worst-case input perturbation $\Delta{\bm{x}}$, which is time-consuming. Thus, we propose a relaxed version by fixing $\Delta{\bm{x}}$ at the initial optimizing phase. Concretely, we first compute the adversarial examples $\Delta{\bm{x}}$ with respect to ${\bm{\theta}}_{AT}$. Keeping these adversarial examples fixed during the optimization, we iteratively optimize $\Delta{\bm{\theta}}$ by gradient ascent to maximize the robust loss. We set a weight perturbation constraint and check it after each optimization step; if the constraint is violated, we project the perturbation onto the constraint set. The pseudo-code is described in Algorithm 1. In our experiments, unless otherwise specified, we set $\lVert\cdot\rVert_p=\lVert\cdot\rVert_2$ and $\epsilon=0.1$ for $\mathcal{C}_{{\bm{\theta}}}$, and the number of iterative steps for optimizing $\Delta{\bm{\theta}}$ is 10.

Algorithm 1 Module Robust Criticality Characterization
Input: neural network $f$, adversarially trained model weights ${\bm{\theta}}_{AT}$, desired module $i$'s weights ${\bm{\theta}}^{(i)}$, standard dataset $\mathcal{D}_{std}$, weight perturbation scaling factor $\epsilon$, optimization iteration steps $T$, learning rate $\gamma$.
Output: the module robust criticality of module $i$.
1:  Initialize adversarial dataset: $\mathcal{D}_{adv}=\{\}$
2:  for batch $\mathcal{B}_k\in\mathcal{D}_{std}$ do    ▷ Generate adversarial dataset
3:      $\mathcal{B}_k^{adv}$ = PGD-10(${\bm{\theta}}_{AT}$, $\mathcal{B}_k$)
4:      $\mathcal{D}_{adv}=\mathcal{D}_{adv}\cup\mathcal{B}_k^{adv}$
5:  end for
6:  Freeze all parameters of ${\bm{\theta}}_{AT}$ except for ${\bm{\theta}}^{(i)}$
7:  ${\bm{\theta}}_1={\bm{\theta}}_{AT}$
8:  for $t=1,\ldots,T$ do    ▷ Iterate $T$ epochs
9:      ${\bm{\theta}}_{t+1}={\bm{\theta}}_t$
10:     for batch $\mathcal{B}_k^{adv}\in\mathcal{D}_{adv}$ do
11:         Calculate loss: $\mathcal{L}(f,{\bm{\theta}}_t,\mathcal{B}_k^{adv})$
12:         ${\bm{\theta}}_{t+1}={\bm{\theta}}_{t+1}+\gamma\nabla_{{\bm{\theta}}_t}(\mathcal{L})$    ▷ Gradient ascent
13:     end for
14:     $\Delta{\bm{\theta}}^{(i)}={\bm{\theta}}_{t+1}^{(i)}-{\bm{\theta}}_{AT}^{(i)}$    ▷ Check perturbation constraint
15:     if $\lVert\Delta{\bm{\theta}}^{(i)}\rVert_2\geq\epsilon\lVert{\bm{\theta}}_{AT}^{(i)}\rVert_2$ then
16:         $\Delta{\bm{\theta}}^{(i)}=\epsilon\,\frac{\lVert{\bm{\theta}}_{AT}^{(i)}\rVert_2}{\lVert\Delta{\bm{\theta}}^{(i)}\rVert_2}\,\Delta{\bm{\theta}}^{(i)}$
17:         ${\bm{\theta}}_{t+1}={\bm{\theta}}_t+\Delta{\bm{\theta}}^{(i)}$
18:         break
19:     end if
20: end for
21: $\mathit{MRC}({\bm{\theta}}^{(i)})=\mathcal{L}(f,{\bm{\theta}}_T,\mathcal{D}_{adv})-\mathcal{L}(f,{\bm{\theta}}_{AT},\mathcal{D}_{adv})$
22: Return $\mathit{MRC}({\bm{\theta}}^{(i)})$
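For reference, a minimal PyTorch sketch of Algorithm 1 follows. It assumes the adversarial examples have already been generated (e.g., with PGD-10 against ${\bm{\theta}}_{AT}$) and collected in `adv_loader`; all names and defaults are illustrative rather than taken from the released code:

```python
import copy
import torch

def module_robust_criticality(model, module_name, adv_loader,
                              eps=0.1, steps=10, lr=1e-3, device="cuda"):
    """Sketch of Algorithm 1: estimate the MRC of one module by ascending the
    robust loss on fixed adversarial examples under an ell_2 weight constraint."""
    model = copy.deepcopy(model).to(device)
    model.eval()  # keep BatchNorm statistics fixed while perturbing weights
    criterion = torch.nn.CrossEntropyLoss()

    # Freeze all parameters except those of the chosen module.
    params = [p for n, p in model.named_parameters() if n.startswith(module_name)]
    for n, p in model.named_parameters():
        p.requires_grad = n.startswith(module_name)
    theta_at = [p.detach().clone() for p in params]
    norm_at = torch.sqrt(sum((p0 ** 2).sum() for p0 in theta_at))

    def robust_loss():
        total, n_samples = 0.0, 0
        with torch.no_grad():
            for x, y in adv_loader:
                x, y = x.to(device), y.to(device)
                total += criterion(model(x), y).item() * y.size(0)
                n_samples += y.size(0)
        return total / n_samples

    base_loss = robust_loss()
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        for x, y in adv_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            (-criterion(model(x), y)).backward()  # gradient ascent on the loss
            opt.step()
        with torch.no_grad():  # project back onto ||delta||_2 <= eps * ||theta||_2
            delta = [p - p0 for p, p0 in zip(params, theta_at)]
            norm_d = torch.sqrt(sum((d ** 2).sum() for d in delta))
            if norm_d >= eps * norm_at:
                for p, p0, d in zip(params, theta_at, delta):
                    p.copy_(p0 + (eps * norm_at / norm_d) * d)
                break
    return robust_loss() - base_loss  # the MRC estimate
```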
Figure 2: The pipeline of our proposed Robust Critical Fine-Tuning (RiFT).

4 RiFT: Robust Critical Fine-tuning

In this paper, we propose RiFT, a robust critical fine-tuning approach that leverages MRC to guide the fine-tuning of a deep neural network to improve both generalization and robustness. Let $\mathcal{P}_{\mathit{adv}}(x,y)$ and $\mathcal{P}_{\mathit{std}}(x,y)$ denote the distributions of adversarial and standard inputs, respectively. Then, applying a model adversarially trained on $\mathcal{P}_{\mathit{adv}}(x,y)$ to $\mathcal{P}_{\mathit{std}}(x,y)$ can be viewed as a distributional shift problem. Thus, it is natural for RiFT to exploit the redundant capacity by fine-tuning adversarially trained models on the standard dataset.

Specifically, RiFT consists of three steps, as shown in Figure 2. First, we calculate the MRC of each module and choose the module with the lowest MRC score as our non-robust-critical module. Second, we freeze the parameters of the adversarially trained model except for the chosen non-robust-critical module and fine-tune the model on the corresponding standard dataset $\mathcal{D}_{\mathit{std}}$. Third, we linearly interpolate between the weights of the original adversarially trained model and the fine-tuned model to identify the optimal interpolation point that maximizes the generalization improvement while maintaining robustness.

Step 1: Module robust criticality characterization

According to Algorithm 1, we iteratively calculate the MRC value for each module ${\bm{\theta}}^{(i)}\in{\bm{\theta}}_{AT}$; then we choose the module with the lowest MRC value, denoted as $\tilde{{\bm{\theta}}}$:

$$\tilde{{\bm{\theta}}}={\bm{\theta}}^{(i)}\quad\text{where}\quad i=\mathop{\arg\min}\limits_{i}\mathit{MRC}(f,{\bm{\theta}}^{(i)},\mathcal{D},\epsilon). \qquad (7)$$
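Given an MRC routine such as the `module_robust_criticality` sketch in Section 3.3, module selection reduces to an argmin over candidate modules; the candidate list below is illustrative:

```python
def find_non_robust_critical_module(model, candidate_modules, adv_loader):
    """Pick the module with the lowest MRC value, per Eq. (7)."""
    scores = {name: module_robust_criticality(model, name, adv_loader)
              for name in candidate_modules}
    return min(scores, key=scores.get)

# e.g., candidate_modules = [n for n, m in model.named_modules()
#                            if isinstance(m, torch.nn.Conv2d)]
```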

Step 2: Fine-tuning on non-robust-critical modules

Next, we freeze the rest of the parameters and fine-tune the desired parameters $\tilde{{\bm{\theta}}}$. We solve the following optimization problem by SGD with momentum [42]:

$$\mathop{\arg\min}\limits_{\tilde{{\bm{\theta}}}}\sum\limits_{({\bm{x}},y)\in\mathcal{D}}\mathcal{L}(f(x,(\tilde{{\bm{\theta}}};{\bm{\theta}}\setminus\tilde{{\bm{\theta}}})),y)+\lambda\lVert\tilde{{\bm{\theta}}}\rVert_{2}, \qquad (8)$$

where $\lambda$ is the $\ell_2$ weight decay factor.
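As a concrete illustration of this step, the sketch below freezes everything except the selected module and runs SGD with momentum and $\ell_2$ weight decay on standard data, matching Eq. (8). The hyperparameter defaults are assumptions consistent with Section 5.1, and the function name is ours:

```python
import torch

def finetune_module(model, module_name, std_loader, epochs=10, lr=1e-3,
                    momentum=0.9, weight_decay=5e-4, device="cuda"):
    """Fine-tune only the non-robust-critical module on the standard dataset."""
    model = model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    trainable = []
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(module_name)
        if p.requires_grad:
            trainable.append(p)
    # weight_decay implements the ell_2 penalty on the fine-tuned weights.
    opt = torch.optim.SGD(trainable, lr=lr, momentum=momentum,
                          weight_decay=weight_decay)
    model.train()
    for _ in range(epochs):
        for x, y in std_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
```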

Step 3: Mitigating robustness-generalization trade-off via interpolation

For an interpolation coefficient $\alpha$, the interpolated weights are calculated as:

$${\bm{\theta}}_{\alpha}=(1-\alpha){\bm{\theta}}_{AT}+\alpha{\bm{\theta}}_{FT}, \qquad (9)$$

where ${\bm{\theta}}_{AT}$ denotes the initial adversarially trained weights and ${\bm{\theta}}_{FT}$ denotes the fine-tuned weights obtained by Eq. (8). Since our goal is to improve generalization while preserving adversarial robustness, the best interpolation point is chosen as the one that most significantly improves generalization while keeping the corresponding adversarial robustness within 0.1% of the original robustness.
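A minimal sketch of this interpolation sweep follows; `eval_std` and `eval_adv` are assumed to return accuracies in percent (e.g., standard test accuracy and PGD-10 robust accuracy), and the grid of $\alpha$ values is our choice:

```python
import torch

def interpolate_weights(theta_at, theta_ft, alpha):
    """theta_alpha = (1 - alpha) * theta_AT + alpha * theta_FT (Eq. 9)."""
    out = {}
    for k, v in theta_at.items():
        if v.is_floating_point():
            out[k] = (1 - alpha) * v + alpha * theta_ft[k]
        else:
            out[k] = v  # e.g., integer BatchNorm batch counters
    return out

def select_interpolation_point(model, theta_at, theta_ft,
                               eval_std, eval_adv, tol=0.1):
    """Pick the alpha with the best standard accuracy whose robust accuracy
    stays within `tol` percentage points of the adversarially trained model."""
    model.load_state_dict(theta_at)
    base_robust = eval_adv(model)
    best_alpha, best_std = 0.0, eval_std(model)
    for alpha in [i / 10 for i in range(1, 11)]:
        model.load_state_dict(interpolate_weights(theta_at, theta_ft, alpha))
        std_acc, adv_acc = eval_std(model), eval_adv(model)
        if adv_acc >= base_robust - tol and std_acc > best_std:
            best_alpha, best_std = alpha, std_acc
    return best_alpha
```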

Remark:

Theorem 3.1 establishes an upper bound on the increase in robust loss that can be incurred through fine-tuning. One might expect the second optimization step to enforce the parameters to lie within the boundary $\mathcal{C}_{{\bm{\theta}}}$ in order to satisfy the theorem. However, we do not employ constrained optimization; instead, we find the optimal point by first optimizing without constraints and then interpolating. This is because (1) the constraints are empirically given and may not always provide the optimal range for preserving robustness, and it is possible to fine-tune outside the constraint range and still ensure that there is not much loss of robustness; and (2) the interpolation procedure serves as a weight ensemble, which may benefit both robustness and generalization, as noted in WiSE-FT [47]. The complete algorithm of RiFT is shown in Appendix B.

5 Experiments

5.1 Experimental Setup

Figure 3: Example of module robust criticality (MRC) and its corresponding robust accuracy drop for ResNet18 trained on CIFAR10. Each column represents an individual module. The first row represents the corresponding robust accuracy drop and the second row represents the MRC value of each module. The higher the MRC value is, the more robust-critical the module is. Some modules are not critical to robustness, exhibiting redundant characteristics for contributing to robustness. However, some modules are critical to robustness. For example, the robust accuracy drop is only 2.86% for layer2.1.conv2, while for layer4.1.conv1 the robust accuracy drop is up to 53.03%.

Datasets

We adopt three popular image classification datasets: CIFAR10 [23], CIFAR100 [23], and Tiny-ImageNet [26]. CIFAR10 and CIFAR100 comprise 60,000 32×32 color images in 10 and 100 classes, respectively. Tiny-ImageNet is a subset of ImageNet and contains 200 classes, where each class contains 500 color images of size 64×64. We use three OOD datasets accordingly to evaluate OOD robustness: CIFAR10-C, CIFAR100-C, and Tiny-ImageNet-C [18]. These datasets simulate 15 types of common visual corruptions, grouped into four classes: Noise, Blur, Weather, and Digital.

Evaluation metrics

We use the test set accuracy of each standard dataset to represent generalization ability. For evaluating adversarial robustness, we adopt the common setting of PGD-10 [28] with constraint $\ell_\infty=8/255$. We run PGD-10 three times and select the worst robust accuracy as the final metric. OOD robustness is evaluated by the accuracy on the test set of the corrupted dataset corresponding to the standard dataset.
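To mirror this protocol, robust accuracy can be taken as the worst result over several randomized attack runs; a sketch (reusing an attack such as the `pgd_attack` example in Section 3.1) is shown below:

```python
import torch

def robust_accuracy(model, loader, attack, runs=3, device="cuda"):
    """Worst-case robust accuracy (%) over `runs` randomized attack runs."""
    model.eval()
    worst = 100.0
    for _ in range(runs):
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack(model, x, y)  # e.g., PGD-10 with eps = 8/255
            with torch.no_grad():
                correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.size(0)
        worst = min(worst, 100.0 * correct / total)
    return worst
```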

Training details

We use ResNet18 [17], ResNet34 [17], and WideResNet34-10 (WRN34-10) [51] as backbones. ResNet18 and ResNet34 are 18-layer and 34-layer ResNet models, respectively. WideResNet34-10 is a 34-layer WideResNet model with a widening factor of 10. Similarly, we adopt PGD-10 [28] with constraint $\ell_\infty=8/255$ for adversarial training. Following standard settings [37, 33], we train models with adversarial examples for 110 epochs. The learning rate starts from 0.1 and decays by a factor of 0.1 at epochs 100 and 105. We select the weights with the highest test robust accuracy as our adversarially trained models.

We fine-tune the adversarially trained models ${\bm{\theta}}_{AT}$ using SGD with momentum [42] for 10 epochs. The initial learning rate is set to 0.001 (the best learning rate for fine-tuning varies across architectures and datasets and needs to be chosen carefully), and we decay it by a factor of 1/10 after fine-tuning for 5 epochs. We choose the weights with the highest test accuracy as the fine-tuned model weights, denoted as ${\bm{\theta}}_{FT}$. We then interpolate between the initial adversarially trained model weights ${\bm{\theta}}_{AT}$ and ${\bm{\theta}}_{FT}$; the best interpolation point selected by Step 3 in Section 4 is denoted as ${\bm{\theta}}_{FT}^{*}$. We then compare the generalization, adversarial robustness, and OOD robustness of ${\bm{\theta}}_{FT}^{*}$ and ${\bm{\theta}}_{AT}$.

We report the average over three different seeds and omit the standard deviations of the 3 runs, as they are tiny (<0.20%) and hardly affect the results. Refer to Appendix C for more training details.

Table 1: Results of RiFT on different datasets and backbones. Std denotes the standard test accuracy for in-distribution generalization, OOD denotes the robust accuracy on the corresponding corruption dataset (e.g., CIFAR10-C), and Adv denotes the adversarial robust accuracy. In each column, we bold the entry with the higher accuracy. RiFT improves both generalization and OOD robustness across architectures and datasets while maintaining adversarial robustness.
Architecture | Method  |      CIFAR10       |      CIFAR100      |   Tiny-ImageNet
             |         | Std   OOD   Adv    | Std   OOD   Adv    | Std   OOD   Adv
ResNet18     | AT      | 81.46 73.56 53.63  | 57.10 46.43 30.15  | 49.10 27.68 23.28
             | AT+RiFT | 83.44 75.69 53.65  | 58.74 48.06 30.17  | 50.61 28.73 23.34
             | Δ       | +1.98 +2.13 +0.02  | +1.64 +1.63 +0.02  | +1.51 +1.05 +0.06
ResNet34     | AT      | 84.23 75.37 55.31  | 58.67 48.24 30.50  | 50.96 27.91 24.27
             | AT+RiFT | 85.41 77.15 55.34  | 60.88 49.97 30.58  | 52.54 30.07 24.37
             | Δ       | +1.18 +1.78 +0.03  | +2.21 +1.73 +0.08  | +1.58 +2.16 +0.10
WRN34-10     | AT      | 87.41 78.75 55.40  | 62.35 50.61 31.66  | 52.78 31.81 26.07
             | AT+RiFT | 87.89 79.31 55.41  | 64.56 52.69 31.64  | 55.31 33.86 26.17
             | Δ       | +0.48 +0.56 +0.01  | +2.21 +2.08 -0.02  | +2.53 +2.05 +0.10
Avg          | Δ       | +1.21 +1.49 +0.02  | +2.02 +1.81 +0.02  | +1.87 +1.75 +0.08

5.2 Empirical Analysis of MRC

Before delving into the main results of RiFT, we first empirically analyze our proposed MRC metric in Definition 3.1, which serves as the foundation of our RiFT approach. We present the MRC analysis on ResNet18 [17] on CIFAR-10 in Figure 3, where each column corresponds to the MRC value and its corresponding robust accuracy drop of a specific module.

Our analysis shows that the impact of worst-case weight perturbations on model robustness varies across different modules. Some modules exhibit minimal impact on robustness under perturbation, indicating the presence of redundant capacity for robustness. Conversely, for other modules, the worst-case weight perturbation shows a significant impact, resulting in a substantial decline in robustness. For example, in module layer2.1.conv2, worst-case weight perturbations only result in a meager addition of 0.09 to the robust loss. However, for layer4.1.conv1, the worst-case weight perturbations increase the model's robust loss by an additional 12.94, resulting in a substantial decline (53.03%) in robust accuracy. Such robust-critical and non-robust-critical modules are verified to exist in various network architectures and datasets, as detailed in Appendix C.4. We also observe that as the network capacity decreases (e.g., from WRN34-10 to ResNet18) and the task becomes more challenging (e.g., from CIFAR10 to Tiny-ImageNet), the proportion of non-robust-critical modules increases, as less complex tasks require less capacity, leading to more non-robust-critical modules.

It is worth noting that the decrease in robust accuracy does not directly correlate with MRC. For instance, both layer4.0.conv2 and layer4.1.conv1 have a robust accuracy drop of 53.05%, yet their MRC values differ. This discrepancy can be attributed to the different probability distributions of misclassified samples across modules, resulting in the same accuracy decline but different losses.

5.3 Main Results

Table 1 summarizes the main results of our study, from which we have the following findings.

RiFT improves generalization

First, RiFT effectively mitigates the trade-off between generalization and robustness raised by adversarial training. Across different datasets and network architectures, RiFT improves the generalization of adversarially trained models by approximately 2%. This result prompts us to rethink the trade-off, as it may be caused by inefficient adversarial training algorithms rather than an inherent limitation of DNNs. Furthermore, as demonstrated in Figure 1, adversarial robustness and generalization increase simultaneously during the initial interpolation process, indicating that these two characteristics can be improved together. This trend is observed across different datasets and network architectures; see Appendix C.5 for more illustrations. This finding challenges the notion, previously claimed by Tsipras et al. [44], that the features of optimal standard and optimal robust classifiers are fundamentally different, as fine-tuning procedures can increase both robustness and generalization.

Fine-tuning improves OOD robustness

Second, our study also investigated the out-of-distribution (OOD) robustness of the fine-tuned models and observed an improvement of approximately 2%. This observation is noteworthy because recent work [2, 24, 47] showed that fine-tuning pre-trained models can distort learned features and result in underperformance on OOD samples. Furthermore, Yi et al. [50] demonstrated that adversarial training enhances OOD robustness, but it was unclear whether fine-tuning adversarially trained models distorts robust features. Our results indicate that fine-tuning adversarially trained models does not distort the robust features learned by adversarial training and instead helps improve OOD robustness. We suggest that fine-tuning adversarially trained models may be a promising avenue for further improving OOD robustness.

5.4 Incorporating RiFT into Other AT Methods

Table 2: Results of RiFT + other AT methods.
Method       |      CIFAR10       |      CIFAR100
             | Std   OOD   Adv    | Std   OOD   Adv
TRADES       | 81.54 73.42 53.31  | 57.44 47.23 30.20
TRADES+RiFT  | 81.87 74.09 53.30  | 57.78 47.52 30.22
Δ            | +0.33 +0.67 -0.01  | +0.34 +0.29 +0.02
MART         | 76.77 68.62 56.90  | 51.46 42.07 31.47
MART+RiFT    | 77.14 69.41 56.92  | 52.42 43.35 31.48
Δ            | +0.37 +0.79 +0.02  | +0.96 +1.28 +0.01
AWP          | 78.40 70.48 53.83  | 52.85 43.10 31.00
AWP+RiFT     | 78.79 71.12 53.84  | 54.89 45.08 31.05
Δ            | +0.39 +0.64 +0.01  | +2.04 +1.98 +0.05
SCORE        | 84.20 75.82 54.59  | 54.83 45.39 29.49
SCORE+RiFT   | 85.65 77.37 54.62  | 57.63 47.77 29.50
Δ            | +1.45 +1.55 +0.03  | +2.80 +2.38 +0.01
Table 3: Results of fine-tuning on different modules.
Method              | Std   OOD   Adv
All layers          | 83.56 75.48 52.66
Last layer          | 83.35 75.16 52.75
Robust-critical     | 83.36 75.42 52.48
Non-robust-critical | 83.44 75.69 53.65
Table 4: Results of fine-tuning on multiple non-robust-critical modules.
Method | Std   OOD   Adv
Top 1  | 83.44 75.69 53.65
Top 2  | 83.41 75.61 52.47
Top 3  | 83.59 75.77 52.22
Top 5  | 83.70 75.82 52.35

To further validate the effectiveness of RiFT, we conduct experiments on ResNet18 [17] trained on CIFAR10 and CIFAR100 [23] using four different adversarial training techniques: TRADES [55], MART [46], AWP [48], and SCORE [31], and then apply our RiFT to the resulting models. As shown in Table 2, our approach is compatible with various adversarial training methods and improves generalization and OOD robustness.

5.5 Ablation Study

Fine-tuning on different modules

To evaluate the efficacy of fine-tuning the non-robust-critical module, we conducted further experiments by fine-tuning the adversarially trained model on different modules. Specifically, we used four fine-tuning methods: fully fine-tuning, linear probing (fine-tuning the last layer), fine-tuning the non-robust-critical module, and fine-tuning the robust-critical module. The experiments were conducted using ResNet18 on CIFAR-10, and the results are presented in Figure 1 and Table 3. As described in Section 3.2, MRC is an upper bound for weight perturbation, indicating the criticality of a module in terms of model robustness. Fine-tuning a non-robust-critical module can help preserve adversarial robustness but does not guarantee improvement in generalization. Similarly, fine-tuning the robust-critical module does not necessarily hurt robustness. However, in our experiments, all fine-tuning methods improved generalization ability, but only fine-tuning the non-robust-critical module preserved adversarial robustness. Moreover, fine-tuning the robust-critical module exhibited the worst trade-off between generalization and robustness, even compared to fine-tuning all layers.

More non-robust-critical modules, more useful?

To investigate whether fine-tuning more non-critical modules could further improve generalization, we additionally fine-tune the top two, top three, and top five non-robust-critical modules. However, Table 4 reveals that generalization and OOD robustness did not surpass the results achieved by fine-tuning a single non-robust-critical module. Notably, performance deteriorated when fine-tuning multiple non-critical modules compared to fine-tuning all layers. It is pivotal to note that this does not negate MRC's applicability to several modules. The MRC of module $i$ is evaluated with the other modules' parameters held constant, making it challenging to discern the impact of worst-case perturbations across multiple modules using the MRC of a single one. We posit that broadening MRC's definition to encompass multiple modules might address this problem.

Ablation on interpolation factor $\alpha^*$

The value of $\alpha^*$ is closely related to the fine-tuning learning rate. Specifically, a large learning rate can result in substantial weight updates that may push the fine-tuned weights ${\bm{\theta}}_{FT}$ away from their adversarially trained counterparts ${\bm{\theta}}_{AT}$. Our empirical results indicate that a fine-tuning learning rate of 0.001 is suitable for most cases and that the corresponding $\alpha^*$ value generally ranges between 0.6 and 0.9.

Factors related to the generalization gain of RiFT

Our results unveiled patterns and behaviors that offer insights into the determinants of the generalization gains observed with RiFT. First, the generalization gain of RiFT is a function of both the neural network's inherent capacity and the inherent difficulty of the classification task. Specifically, as the classification task becomes more challenging, the robust criticality of each module increases, which in turn decreases the generalization gain of RiFT. This effect can be mitigated by using a model with larger capacity. For instance, we observe that the generalization gain of RiFT increases as we switch from ResNet18 to ResNet34 and to WRN34-10 when evaluating on CIFAR100 and Tiny-ImageNet. Further, we observed that the generalization gain of RiFT with WRN34-10 on CIFAR10 is notably lower, at approximately 0.5%, compared to 2% on other datasets. This might be attributed to the minimal generalization disparity between adversarially trained models and their standard-trained counterparts: while WRN34-10's standard test accuracy stands at around 95%, its adversarial counterpart registers 87%. It is evident that fine-tuning a single module may not yield significant improvements in this case. Investigating these patterns further could offer strategies for enhancing the robustness and generalization capabilities of deep neural networks.

6 Conclusion

In this paper, we aim to exploit the redundant capacity of adversarially trained models. Our proposed RiFT leverages the concept of module robust criticality (MRC) to guide the fine-tuning process, leading to improved generalization and OOD robustness. Extensive experiments demonstrate the effectiveness of RiFT across various network architectures and datasets. Our findings shed light on the intricate relationship between generalization, adversarial robustness, and OOD robustness. RiFT is a preliminary exploration of fine-tuning adversarially trained models. We believe that fine-tuning holds great promise, and we call for more theoretical and empirical analyses to advance our understanding of this important technique.


Appendix A Proof of the scale-invariant property

Without loss of generality, consider a two-layer neural network $f$, where $\phi$ is the ReLU activation function:

f(\bm{\theta}_f, \bm{x}) = \bm{\theta}^{(2)}\, \phi(\bm{\theta}^{(1)} \bm{x}).   (10)

The corresponding scaled neural network $g$ is:

g(\bm{\theta}_g, \bm{x}) = \frac{1}{\beta}\, \bm{\theta}^{(2)}\, \phi(\beta \bm{\theta}^{(1)} \bm{x}),   (11)

where $\beta \in \mathbb{R}^{+}$ is the scaling factor.

Suppose we calculate the MRC value of the first module of each network, i.e., $\bm{\theta}^{(1)}$ for $f$ and $\beta\, \bm{\theta}^{(1)}$ for $g$.

Theorem A.1.

The rectified function $\phi(x) = \max(x, 0)$ is a (positively) homogeneous function:

\forall (z, \beta) \in \mathbb{R} \times \mathbb{R}^{+}, \quad \phi(\beta z) = \beta\, \phi(z).   (12)

Proof.
\phi(\beta z) = \max(\beta z, 0) = \beta \max(z, 0) = \beta\, \phi(z).   (13)

Theorem A.2.

\forall \bm{x}, \quad f(\bm{\theta}_f, \bm{x}) \equiv g(\bm{\theta}_g, \bm{x}).

Proof.
g(\bm{\theta}_g, \bm{x}) = \frac{1}{\beta}\, \bm{\theta}^{(2)}\, \phi(\beta \bm{\theta}^{(1)} \bm{x})   (14)
\equiv \frac{1}{\beta}\, \beta\, \bm{\theta}^{(2)}\, \phi(\bm{\theta}^{(1)} \bm{x}) \quad \text{(by Theorem A.1)}   (15)
\equiv \bm{\theta}^{(2)}\, \phi(\bm{\theta}^{(1)} \bm{x})   (16)
\equiv f(\bm{\theta}_f, \bm{x}).   (17)
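As a quick numerical sanity check of Theorem A.2 (our illustration, not part of the paper), the following PyTorch snippet instantiates the two-layer ReLU network f and its scaled counterpart g and verifies that their outputs coincide:

import torch

torch.manual_seed(0)
theta1 = torch.randn(64, 32)    # first-layer weights theta^(1)
theta2 = torch.randn(10, 64)    # second-layer weights theta^(2)
x = torch.randn(32)
beta = 3.7                      # positive scaling factor

f_out = theta2 @ torch.relu(theta1 @ x)                     # f(theta_f, x)
g_out = (theta2 / beta) @ torch.relu((beta * theta1) @ x)   # g(theta_g, x)

print(torch.allclose(f_out, g_out, atol=1e-4))              # prints True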

Theorem A.3.

The robust losses of $f$ and $g$ are equal:

\mathcal{R}(f(\bm{\theta}_f), \mathcal{D}) \equiv \mathcal{R}(g(\bm{\theta}_g), \mathcal{D}).   (18)

Proof.

According to Theorem A.2,

\forall\, \bm{x} + \Delta\bm{x}, \quad f(\bm{\theta}_f, \bm{x} + \Delta\bm{x}) \equiv g(\bm{\theta}_g, \bm{x} + \Delta\bm{x}).   (19)


Thus,

\max_{\Delta\bm{x} \in \mathcal{S}} \ell(f(\bm{\theta}_f, \bm{x} + \Delta\bm{x}), y)   (20)
\equiv \max_{\Delta\bm{x} \in \mathcal{S}} \ell(g(\bm{\theta}_g, \bm{x} + \Delta\bm{x}), y).   (21)

Thus,

\mathcal{R}(f(\bm{\theta}_f), \mathcal{D}) = \sum_{(\bm{x}, y) \in \mathcal{D}} \max_{\Delta\bm{x} \in \mathcal{S}} \ell(f(\bm{\theta}_f, \bm{x} + \Delta\bm{x}), y)   (22)
\equiv \sum_{(\bm{x}, y) \in \mathcal{D}} \max_{\Delta\bm{x} \in \mathcal{S}} \ell(g(\bm{\theta}_g, \bm{x} + \Delta\bm{x}), y)   (23)
= \mathcal{R}(g(\bm{\theta}_g), \mathcal{D}).   (24)

Theorem A.4.

The Module Robustness Criticality (MRC) proposed in Definition 3.1 is invariant to the scaling of the parameters.


Proof.

Let $\Delta\bm{\theta}_f = \{\Delta\bm{\theta}_f^{(1)}, \mathbf{0}\}$ and $\Delta\bm{\theta}_g = \{\Delta\bm{\theta}_g^{(1)}, \mathbf{0}\}$ be perturbations of the first layer of networks $f$ and $g$, respectively. First, we prove

\max_{\Delta\bm{\theta}_f \in \mathcal{C}_{\bm{\theta}_f}} \mathcal{R}(f(\bm{\theta}_f + \Delta\bm{\theta}_f), \mathcal{D})   (25)
\leq \max_{\Delta\bm{\theta}_g \in \mathcal{C}_{\bm{\theta}_g}} \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g), \mathcal{D}).   (26)

Let

\Delta\bm{\theta}_f^{*} = \arg\max_{\Delta\bm{\theta}_f \in \mathcal{C}_{\bm{\theta}_f}} \mathcal{R}(f(\bm{\theta}_f + \Delta\bm{\theta}_f), \mathcal{D}),   (27)
\Delta\bm{\theta}_g^{*} = \arg\max_{\Delta\bm{\theta}_g \in \mathcal{C}_{\bm{\theta}_g}} \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g), \mathcal{D}).   (28)

Consider the perturbation $\Delta\tilde{\bm{\theta}}_g = \beta\, \Delta\bm{\theta}_f^{*}$ for $g$; it is easy to show that $\Delta\tilde{\bm{\theta}}_g \in \mathcal{C}_{\bm{\theta}_g}$, since

\mathcal{C}_{\bm{\theta}_f} = \{\Delta\bm{\theta}_f \mid \lVert\Delta\bm{\theta}_f\rVert_p \leq \epsilon\, \lVert\bm{\theta}_f^{(1)}\rVert_p\},   (29)
\mathcal{C}_{\bm{\theta}_g} = \{\Delta\bm{\theta}_g \mid \lVert\Delta\bm{\theta}_g\rVert_p \leq \epsilon\, \lVert\bm{\theta}_g^{(1)}\rVert_p\}   (30)
= \{\Delta\bm{\theta}_g \mid \lVert\Delta\bm{\theta}_g\rVert_p = \beta\, \lVert\Delta\bm{\theta}_f\rVert_p \leq \epsilon \beta\, \lVert\bm{\theta}_f^{(1)}\rVert_p\}.   (31)

Therefore,

\mathcal{R}(g(\bm{\theta}_g + \Delta\tilde{\bm{\theta}}_g), \mathcal{D}) \leq \max_{\Delta\bm{\theta}_g \in \mathcal{C}_{\bm{\theta}_g}} \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g), \mathcal{D})   (32)
= \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g^{*}), \mathcal{D}).   (33)

Repeating the same analysis as in the proof of Theorem A.2,

g(\bm{\theta}_g + \Delta\tilde{\bm{\theta}}_g)   (34)
= \frac{1}{\beta}\, \bm{\theta}^{(2)}\, \phi\big((\beta \bm{\theta}^{(1)} + \beta \Delta\bm{\theta}_f^{*})\, \bm{x}\big)   (35)
= \bm{\theta}^{(2)}\, \phi\big((\bm{\theta}^{(1)} + \Delta\bm{\theta}_f^{*})\, \bm{x}\big)   (36)
\equiv f(\bm{\theta}_f + \Delta\bm{\theta}_f^{*}).   (37)

According to Theorem A.3,

\mathcal{R}(f(\bm{\theta}_f + \Delta\bm{\theta}_f^{*}), \mathcal{D}) \equiv \mathcal{R}(g(\bm{\theta}_g + \Delta\tilde{\bm{\theta}}_g), \mathcal{D})   (38)
\leq \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g^{*}), \mathcal{D}).   (39)

Similarly, we can prove

\max_{\Delta\bm{\theta}_g \in \mathcal{C}_{\bm{\theta}_g}} \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g), \mathcal{D})   (40)
\leq \max_{\Delta\bm{\theta}_f \in \mathcal{C}_{\bm{\theta}_f}} \mathcal{R}(f(\bm{\theta}_f + \Delta\bm{\theta}_f), \mathcal{D}).   (41)

Thus,

\max_{\Delta\bm{\theta}_f \in \mathcal{C}_{\bm{\theta}_f}} \mathcal{R}(f(\bm{\theta}_f + \Delta\bm{\theta}_f), \mathcal{D})   (42)
= \max_{\Delta\bm{\theta}_g \in \mathcal{C}_{\bm{\theta}_g}} \mathcal{R}(g(\bm{\theta}_g + \Delta\bm{\theta}_g), \mathcal{D}).   (43)

This completes the proof. ∎

Appendix B Algorithm of RiFT

The complete algorithm of RiFT is presented in Algorithm 2.

Algorithm 2 Robust Critical Fine-Tuning
Require: adversarially trained model weights $\bm{\theta}_{AT}$; standard dataset $\mathcal{D}_{std}$; weight perturbation scaling factor $\epsilon$ (used by the MRC computation in Algorithm 1); fine-tuning iteration steps $T$ and learning rate $\gamma$; weight decay factor $\lambda$.
Ensure: the fine-tuned model weights $\bm{\theta}_{FT}^{*}$.
Step 1: Calculate MRC for each module
for each module weight $\bm{\theta}^{(j)}$ do
    Calculate the MRC value of $\bm{\theta}^{(j)}$ using Algorithm 1.
end for
Select the module with the lowest MRC value; denote it as the non-robust-critical module $\bm{\theta}^{(i)}$.
Step 2: Fine-tune the non-robust-critical module
$\bm{\theta}_{1} = \bm{\theta}_{AT}$
for $t = 1, \ldots, T$ do    ▷ fine-tune for $T$ epochs
    for each batch $\mathcal{B}_{k} \in \mathcal{D}_{std}$ do
        Calculate the loss $\mathcal{L}(f(\bm{\theta}_{t}), \mathcal{B}_{k})$
        $\bm{\theta}_{t+1}^{(i)} = \bm{\theta}_{t}^{(i)} - \gamma \nabla_{\bm{\theta}_{t}^{(i)}} \mathcal{L}$    ▷ gradient descent on module $i$ only
    end for
    $\bm{\theta}_{FT} = \bm{\theta}_{t}$ if $\bm{\theta}_{t}$ obtains the highest standard test accuracy.
end for
Step 3: Interpolation
for $\alpha \in \{0, 0.05, \ldots, 1\}$ do
    $\bm{\theta}_{\alpha} = (1 - \alpha)\, \bm{\theta}_{AT} + \alpha\, \bm{\theta}_{FT}$
    $\bm{\theta}_{FT}^{*} = \bm{\theta}_{\alpha}$ if it reaches the best standard test accuracy while preserving the robustness of $\bm{\theta}_{AT}$.
end for
Return the fine-tuned model weights $\bm{\theta}_{FT}^{*}$
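As an illustration of Step 2, the sketch below fine-tunes only the non-robust-critical module while freezing all other parameters. It is a hedged sketch, not the released code: module_name is assumed to be the module selected by the MRC scan, train_loader iterates over the standard dataset, eval_std_acc is an assumed evaluation helper, and the momentum and default hyper-parameter values are our assumptions.

import copy
import torch

def finetune_module(model, module_name, train_loader, eval_std_acc,
                    epochs=10, lr=1e-3, weight_decay=5e-4, device="cuda"):
    # Freeze every parameter except those of the non-robust-critical module.
    model.to(device)
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(module_name)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9,
                                weight_decay=weight_decay)
    criterion = torch.nn.CrossEntropyLoss()
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
        acc = eval_std_acc(model)   # keep the epoch checkpoint with best std acc
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    return best_state               # theta_FT, to be interpolated with theta_AT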

Appendix C Training Details

C.1 Experiment Environment

All experiments are conducted on workstations equipped with an NVIDIA GeForce RTX 3090 GPU with 24GB memory and an NVIDIA A100 GPU with 80GB memory. The PyTorch version is 1.11.0.

C.2 Adversarial Training Details

For vanilla adversarial training, we set the initial learning rate to 0.1 and decay it by a factor of 10 at epochs 100 and 105. When generating adversarial examples, we set BN to train mode, since this usually yields higher robustness.
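In PyTorch, this schedule corresponds to roughly the following setup (a sketch of the configuration described above, not the released training script; the momentum and weight decay values and the placeholder model are our assumptions):

import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)   # placeholder architecture
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Decay the learning rate by a factor of 10 at epochs 100 and 105.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 105], gamma=0.1)

model.train()   # BN stays in train mode while crafting adversarial examples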

When incorporating RiFT with other adversarial training methods, the SCORE method is combined with TRADES. For CIFAR100 training, we ran three different learning rates and selected the best model weights as those with the highest robust accuracy. The hyper-parameter settings either follow the original papers or match vanilla AT, depending on which achieves better robust accuracy.

C.3 Fine-tuning Details

The hyper-parameter that most affects fine-tuning is the initial learning rate. In our experience, a small learning rate usually performs better. If the adversarial robustness of the final fine-tuned weights is still higher than that of the initial adversarially trained weights, we increase the learning rate.

C.4 The MRC value of ResNet34 and WRN34-10

Figure C.1 and Figure C.2 show the Module Robust Criticality (MRC) value of each module in ResNet34 trained on CIFAR100 and WideResNet34 trained on Tiny-ImageNet, respectively. Both models exhibit redundant capacity. Additionally, Figure C.3 and Figure C.4 show the MRC value of each module in ResNet18 trained on CIFAR100 and Tiny-ImageNet, respectively. As discussed in Sections 5.3 and 5.5, ResNet18 has lower redundant capacity than ResNet34 and WideResNet34, and the redundant capacity decreases as the classification task becomes more complex.

Figure C.1: Example of module robust criticality (MRC) and its corresponding robust accuracy drop of ResNet34 trained on CIFAR100.
Figure C.2: Example of module robust criticality (MRC) and its corresponding robust accuracy drop of WideResNet34 trained on Tiny-ImageNet.
Figure C.3: Example of module robust criticality (MRC) and its corresponding robust accuracy drop of ResNet18 trained on CIFAR100.
Figure C.4: Example of module robust criticality (MRC) and its corresponding robust accuracy drop of ResNet18 trained on Tiny-ImageNet.

C.5 More interpolation results

Figure C.5 shows the interpolation results for different modules of ResNet18 trained on the CIFAR100 dataset. Fine-tuning on the robust-critical module can also improve generalization and robustness. This does not contradict MRC: as we noted in Section 5.2, fine-tuning on a robust-critical module does not necessarily hurt robustness. MRC provides guidance on which module to fine-tune for optimal results, and fine-tuning on the non-robust-critical module still achieves the highest test accuracy while preserving robustness.

Figure C.5: Interpolation results of fine-tuning on different modules of ResNet18 on CIFAR100 dataset. Dots denote different interpolation points between the final fine-tuned weights of RiFT and the initial adversarially trained weights.

Appendix D Analysis of the complexity of the MRC algorithm

When identifying the most non-robust-critical module, all modules of the model must be iterated over. Suppose the model has $n$ modules. For each module, the computational cost depends on the number of iteration steps in Algorithm 1. The per-iteration overhead differs across module locations: for example, computing the last module's MRC value only requires forward-backward propagation through the last layer of parameters. Averaged over all modules, each iteration step therefore costs about half of a full forward-backward pass. In our experiments, we set the learning rate to 1 and the number of iteration steps to 10, so the MRC algorithm costs a total of $5n$ full forward-backward propagations.
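For concreteness, here is a hedged sketch of the per-module inner loop being costed: ten gradient-ascent steps with learning rate 1 on a single module's weights, projected back onto the constraint set $\lVert\Delta\bm{\theta}\rVert_p \leq \epsilon\lVert\bm{\theta}\rVert_p$ of Eq. (29). The helper robust_loss_fn, which computes the robust loss on a batch, and the default eps value are assumptions on our part.

import torch

@torch.enable_grad()
def worst_case_perturbation(model, module_param, robust_loss_fn, batch,
                            eps=0.1, steps=10, lr=1.0):
    # Gradient *ascent* on one module's weights; all other modules stay fixed.
    theta0 = module_param.detach().clone()
    budget = eps * theta0.norm()
    for _ in range(steps):
        loss = robust_loss_fn(model, batch)          # robust loss at theta + delta
        grad, = torch.autograd.grad(loss, module_param)
        with torch.no_grad():
            module_param += lr * grad
            delta = module_param - theta0
            if delta.norm() > budget:                # project onto the norm ball
                module_param.copy_(theta0 + delta * (budget / delta.norm()))
    # The MRC value is the resulting rise in robust loss; restore the original
    # weights afterwards with module_param.data.copy_(theta0).
    return module_param.detach() - theta0

Each ascent step requires one forward-backward pass that, averaged over module positions, touches about half of the network, which yields the $5n$ total above.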