
OpenPrompt: An Open-source Framework for Prompt-learning

Ning Ding*, Shengding Hu*, Weilin Zhao*, Yulin Chen,
Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
Tsinghua University, Beijing, China
{dingn18, hsd20, zwl19, yl-chen21}@mails.tsinghua.edu.cn

Abstract

Prompt-learning has become a new paradigm in modern natural language processing. It directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, and yields promising performance on various tasks. However, no standard implementation framework for prompt-learning has been proposed yet, and most existing prompt-learning codebases, often unregulated, provide only limited implementations for specific scenarios. Because many details, such as the templating strategy, initializing strategy, and verbalizing strategy, need to be considered in prompt-learning, practitioners face impediments to quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit for conducting prompt-learning over PLMs. OpenPrompt is a research-friendly framework equipped with efficiency, modularity, and extendibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can expediently deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints. OpenPrompt is publicly released at https://github.com/thunlp/OpenPrompt.

1 Introduction

Pre-trained language models (PLMs) (Han et al., 2021a; Qiu et al., 2020) have been widely proven to be effective in natural language understanding and generation, ushering in a new era of modern natural language processing (NLP). In the early stage of this revolution, the standard approach to adapting PLMs to specific NLP tasks was the pretraining-finetuning paradigm, where additional parameters and task-specific objectives are introduced in the tuning procedure. Recently, however, the paradigm for adapting PLMs has been shifting. Starting with T5 (Raffel et al., 2019) and GPT-3 (Brown et al., 2020), researchers have found that PLMs can be effectively stimulated by textual prompts or demonstrations, especially in low-data scenarios.
Take a simple prompt-based sentiment classification as an example. The pipeline consists of a template and a verbalizer: the template processes the original text with some extra tokens, and the verbalizer projects the original labels to words in the vocabulary for the final prediction. Assume the template is "<text> It is <mask>", where the token <text> stands for the original text, and the verbalizer is {"positive": "great", "negative": "terrible"}. The sentence "Albert Einstein was one of the greatest intellects of his time." is first wrapped by the pre-defined template as "Albert Einstein was one of the greatest intellects of his time. It is <mask>". The wrapped sentence is then tokenized and fed into a PLM to predict the distribution over the vocabulary at the <mask> token position. The word great is expected to have a larger probability than terrible.
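The sentiment example above can be traced in a few lines of plain Python. This is a minimal sketch, not OpenPrompt's API: the slot names <text> and <mask>, the helper names wrap and classify, and the probabilities in toy_mask_probs are all assumptions made for illustration, with the hand-written distribution standing in for what a PLM would predict at the masked position.

```python
# Illustrative sketch only; none of these names come from OpenPrompt.
TEMPLATE = "<text> It is <mask>"
VERBALIZER = {"positive": "great", "negative": "terrible"}


def wrap(text: str, template: str = TEMPLATE) -> str:
    """Fill the <text> slot of the template with the original input."""
    return template.replace("<text>", text)


def classify(mask_probs: dict, verbalizer: dict = VERBALIZER) -> str:
    """Return the label whose label word is most probable at <mask>."""
    return max(verbalizer, key=lambda label: mask_probs.get(verbalizer[label], 0.0))


sentence = "Albert Einstein was one of the greatest intellects of his time."
wrapped = wrap(sentence)

# Hypothetical probabilities standing in for a PLM's output at <mask>.
toy_mask_probs = {"great": 0.62, "terrible": 0.03, "fine": 0.10}
prediction = classify(toy_mask_probs)
print(wrapped)
print(prediction)
```

In a real pipeline the distribution would come from the PLM after tokenization; only the wrapping and the label-word lookup are shown here.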
As illustrated above, prompt-learning projects the downstream tasks to the pre-training objectives of PLMs with the help of textual or soft-encoding prompts. A series of studies on prompt-learning (Liu et al., 2021a) have investigated strategies for constructing templates (Schick and Schütze, 2021; Gao et al., 2021; Liu et al., 2021b), verbalizers (Hu et al., 2021), optimization (Lester et al., 2021), and applications (Li and Liang, 2021; Han et al., 2021b; Ding et al., 2021a) for this paradigm.
A prompt-learning problem could be regarded as a synthesis of PLMs, human prior knowledge, and specific NLP tasks that need to be handled.
| Example | PLM | Template | Verbalizer | Task | Reference |
| --- | --- | --- | --- | --- | --- |
| Naive TC | MLM & Seq2Seq | M. text | M. One-Many | Text Classification | - |
| Naive KP | LM & Seq2Seq | M. text | - | Knowledge Probing | - |
| Naive FET | MLM | M. text (meta info) | M. One-Many | Entity Typing | (Ding et al., 2021a) |
| PTR | MLM | M. text (complex) | M. One-One | Relation Extraction | (Han et al., 2021b) |
| P-tuning | LM | Soft tokens | M. One-One | Text Classification | (Liu et al., 2021b) |
| Prefix-tuning | LM, Seq2Seq | Soft tokens | - | Text Generation | (Li and Liang, 2021) |
| LM-BFF | MLM | A. text | M. One-Many | Text Classification | (Gao et al., 2021) |
Table 1: Some examples implemented by OpenPrompt, where M. abbreviates "manually defined" and A. abbreviates "automatically generated". Note that different approaches focus on different parts of prompt-learning. In addition to the whole pipeline, our implementations of these methods are integrated into specific classes of OpenPrompt. For example, the core implementation of KPT is in the KnowledgeableVerbalizer class.
Hence, it is hard to support the particular implementations of prompt-learning elegantly with current deep learning or NLP libraries, and a standard paradigm is also lacking. Previous works pursue the most efficient way to implement prompt-learning with the least modification to existing frameworks for traditional fine-tuning, resulting in poor readability and even unstable reproducibility. Moreover, the performance of a prompt-learning pipeline varies greatly with the choice of templates and verbalizers (Zhao et al., 2021), creating more barriers for implementations. Lastly, there is at present no comprehensive open-source framework designed specifically for prompt-learning, which makes it difficult to try out new methods and to make rigorous comparisons with previous approaches.
To this end, we present OpenPrompt, an open-source, easy-to-use, and extensible toolkit for prompt-learning. OpenPrompt modularizes the whole framework of prompt-learning and considers the interactions between the modules. We highlight the combinability of OpenPrompt, which supports flexible combinations of diverse task formats, PLMs, and prompting modules. For example, we can easily adapt prefix-tuning (Li and Liang, 2021) to a text classification task in OpenPrompt. This feature enables users to assess the generalization of their prompt-learning models on various tasks, rather than only their performance on specific tasks.
Specifically, in OpenPrompt, a Template class is used to define or generate textual or soft-encoding templates that wrap the original input. To flexibly support various templates under a unified paradigm, we design a new template language that makes token-level customization of the corresponding attributes easy. For example, users can specify which tokens share embeddings, which are trainable, and how these tokens are to be post-processed, without having to write complex implementations for specific templates. A Verbalizer projects the classification labels to words in the vocabulary, and a PromptModel is responsible for the training and inference process. Each module in OpenPrompt is clearly defined while retaining its independence and coupling, so that researchers can easily deploy a model and make targeted improvements. We also implement baselines with OpenPrompt and evaluate them on a broad scope of NLP tasks, demonstrating the effectiveness of OpenPrompt.
The area of prompt-learning is in an exploratory stage and is developing rapidly. We hope that OpenPrompt can help beginners quickly understand prompt-learning, enable researchers to efficiently deploy prompt-learning research pipelines, and empower engineers to readily apply prompt-learning to practical NLP systems that solve real-world problems. OpenPrompt will not only open-source all the code, but will also continue to update the documentation to provide detailed tutorials.

2 Background

Prompt-learning reveals what the next generation of NLP may look like.
Although PLMs have achieved tremendous success on almost all the subtasks in NLP, one question still hangs in the air: have we really fully exploited the potential of PLMs, especially the big ones? Conventional fine-tuning uses extra task-specific heads and objectives for adaptation, but this strategy may face two issues. On the one hand, such an approach creates a natural gap between model tuning and pre-training. On the other hand, as the number of model parameters increases, this fine-tuning approach becomes increasingly difficult to operate due to the massive computational volume (e.g., GPT-3 (Brown et al., 2020)).
By mimicking the process of pre-training, prompt-learning intuitively bridges the gap between pre-training and model tuning. Practically, this paradigm is surprisingly effective in the low-data regime (Le Scao and Rush, 2021; Gao et al., 2021). For example, with an appropriate template, zero-shot prompt-learning can even outperform 32-shot fine-tuning (Ding et al., 2021a). Another promising empirical attribute of prompt-learning is its potential to stimulate large-scale PLMs. For a 10B model, solely optimizing prompts (with the parameters of the model fixed) can achieve performance comparable to full-parameter fine-tuning (Lester et al., 2021). These practical studies imply that we may use prompts to dig out the knowledge kept in PLMs more effectively and efficiently, leading to a deeper understanding of the underlying principles of their mechanisms (Wei et al., 2021; Qin et al., 2021; Vu et al., 2021).
From a practical implementation point of view, prompt-learning is actually complex and requires a lot of detailed consideration. With general-purpose NLP under the prompt-learning paradigm as our target, we present OpenPrompt, a unified toolkit to effectively and efficiently implement prompt-learning approaches. OpenPrompt demonstrates a comprehensive view of the programming details of prompt-learning and enables practitioners to quickly understand the mechanisms and practical attributes of this technique. One can also quickly deploy the representative prompt-learning algorithms that are already implemented in the package under a unified programming framework. Moreover, OpenPrompt allows researchers and developers to quickly try out new ideas in prompt-learning, which include not only newly designed templates or verbalizers, but also explorations of the attributes of prompt-learning, e.g., prompt-based adversarial attacking.

3 Design and Implementation

As stated in § 1, prompt-learning is a comprehensive process that combines PLMs, human knowledge, and specific NLP tasks. Keeping that in mind, the design philosophy is to simultaneously consider the independence and mutual coupling of each module. As illustrated in Figure 1, OpenPrompt provides the full life-cycle of prompt-learning based on PyTorch (Paszke et al., 2019). In this section, we first introduce the combinability of OpenPrompt, and then the detailed design and implementation of each component in OpenPrompt.

3.1 Combinability

In the NLP world, we usually adopt different PLMs with corresponding objective functions for different underlying tasks (roughly, classification and generation). But in prompt-learning, given that the core idea of the framework is to mimic pre-training tasks in the downstream task, and these are essentially "predicting words based on context", we can further unify the execution of downstream tasks. OpenPrompt supports flexible combinations of tasks (classification and generation), PLMs (MLM, LM, and Seq2Seq), and prompt modules (different templates and verbalizers). For example, from a model perspective, T5 (Raffel et al., 2019) is not only used for span prediction and GPT (Brown et al., 2020) is not only used for generative tasks. From the perspective of prompting, prefix-tuning can also be used for classification, and soft prompts can be used for generation. All these combinations can easily be implemented and validated on NLP tasks in our framework so that we can better understand the mechanisms involved.

3.2 Pre-trained Language Models

One core idea of prompt-learning is to use additional context with masked tokens to imitate the pre-training objectives of PLMs and better stimulate these models. Hence, the choice of PLMs is crucial to the whole pipeline of prompt-learning. PLMs could be roughly divided into three groups according to their pre-training objectives.
The first group of PLMs uses masked language modeling (MLM) to reconstruct a sequence corrupted by randomly masked tokens, where only the losses of the masked tokens are computed. Typical PLMs with the MLM objective include BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019), and such an objective is regarded as suitable for natural language understanding (NLU). The second group exploits autoregressive-style language modeling (LM) to predict the current token according to its leading tokens. GPT-3 (Brown et al., 2020) is one of the representative works adopting this objective. The third group consists of sequence-to-sequence (Seq2Seq) models, which aim to generate a sequence with a decoder conditioned on a separate encoder for an input sequence. Typical Seq2Seq PLMs include T5 (Raffel et al., 2020), MASS (Song et al., 2019), and BART (Lewis et al., 2020).
Figure 1: The overall architecture of OpenPrompt. Note that, depending on the prompt-learning strategy, not all the modules are necessarily used. For example, in generation tasks, there are no verbalizers in the learning procedure. The PromptTrainer is a controller that controls the data flow and the training process with some unique attributes; users can also implement the training process in a conventional fashion.
Different PLMs have different attributes, resulting in various adaptation capabilities for different NLP tasks in prompt-learning. Practically, in OpenPrompt, we support directly loading PLMs from Hugging Face Transformers (Wolf et al., 2020), and PLMs implemented by other libraries will be supported in the future. Once the PLM is determined, researchers can deploy a known valid prompt-learning pipeline (e.g., RoBERTa for few-shot sentiment classification) or explore other uses of the PLM that could exploit its potential. Users of OpenPrompt do not need to implement objective heads for different PLMs to calculate the corresponding loss; a unified interface performs these operations automatically (§3.6).
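To make the three objective families concrete, the sketch below shows where the "position to predict" ends up for each of them when the same short prompt is appended to an input. It is an illustration under stated assumptions rather than OpenPrompt's implementation: the function add_prediction_slot and the table SLOT_BY_OBJECTIVE are hypothetical, "[MASK]" is the BERT-style mask token, and "<extra_id_0>" is the first T5-style sentinel token.

```python
# Sketch under stated assumptions; the names here are hypothetical.
SLOT_BY_OBJECTIVE = {
    "mlm": "[MASK]",            # BERT-style mask token
    "seq2seq": "<extra_id_0>",  # T5-style sentinel token
    "lm": "",                   # autoregressive: the slot is simply the next token
}


def add_prediction_slot(text: str, objective: str) -> str:
    """Append the prompt "It is" plus the slot the PLM is asked to fill."""
    if objective not in SLOT_BY_OBJECTIVE:
        raise ValueError(f"unknown objective: {objective!r}")
    slot = SLOT_BY_OBJECTIVE[objective]
    # rstrip removes the trailing space left when the slot is empty (LM case)
    return f"{text} It is {slot}".rstrip()


for objective in ("mlm", "lm", "seq2seq"):
    print(objective, "->", add_prediction_slot("A moving film.", objective))
```

In all three cases the model is asked to predict words from context, which is the observation that lets a single interface serve every PLM type.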

3.3 Tokenization

Tokenization is a crucial step in processing data for NLP, and it faces new challenges in prompt-learning. After the template is designed, implementing the tokenization of the original input together with the designed template can be time-consuming and error-prone. First, in prompt-learning, specific information such as the indices of entities and masked tokens should be carefully tackled in tokenization. Small errors, such as a mismatch of masked token indices, may lead to serious consequences. Moreover, concatenation and truncation issues after tokenization (templates are not supposed to be truncated) should also be handled. Since different PLMs may have different tokenization strategies, we should also consider inconsistencies in the details of additional context processing.
In OpenPrompt, we specifically design the tokenization module for prompt-learning and significantly simplify the process. By using our encapsulated data processing APIs, users can design templates in a human-readable style and conveniently operate on the input and the template at the same time. Our component integrates complex information from the input and the template and then conducts tokenization. Based on the choice of PLM (MLM, LM, or Seq2Seq), OpenPrompt automatically chooses the appropriate tokenizer for prompt-learning, which can save users considerable time in processing prompt-related data.
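The two pitfalls named above, truncating the template and losing track of the masked index, can be illustrated with a small sketch. It assumes whitespace-split tokens as a stand-in for a real subword tokenizer and ignores special tokens; the function build_input and the constant MASK are hypothetical names, not part of OpenPrompt.

```python
from typing import List, Tuple

MASK = "<mask>"


def build_input(text_tokens: List[str], template_tokens: List[str],
                max_len: int) -> Tuple[List[str], int]:
    """Concatenate input and template tokens, truncating only the input.

    Returns the final token list and the index of the mask token in it.
    """
    mask_offset = template_tokens.index(MASK)  # ValueError if no mask is present
    budget = max_len - len(template_tokens)
    if budget < 0:
        raise ValueError("template alone exceeds max_len")
    kept = text_tokens[:budget]                # only the original text is cut
    tokens = kept + template_tokens            # template is appended intact
    # The mask index is derived from the kept length, so it stays correct
    # whether or not truncation happened.
    return tokens, len(kept) + mask_offset


tokens, mask_index = build_input("a b c d e f".split(), ["It", "is", MASK], max_len=5)
print(tokens, mask_index)
```

Computing the mask position from the truncated length, rather than searching the final sequence, avoids a mismatch when the original text itself happens to contain the mask string.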

3.4 Templates

As one of the central parts of prompt-learning, a template module wraps the original text with the textual or soft-encoding template. A template normally contains contextual tokens (textual or soft) and masked tokens. In OpenPrompt, all the templates are inherited from a common base class with universal attributes and abstract methods.
Figure 2: Some examples of our template language. In our template language, we can use the key "meta" to refer to the original input text (Example B), parts of the original input (Examples A, C, G), or other key information. We can also freely specify which tokens are hard and which are soft (and their initialization strategy). We can assign an id to a soft token to specify which tokens share embeddings (Example F). OpenPrompt also supports post-processing (Example E) for each token, e.g., a lambda expression or an MLP.
Previous works design a wide variety of templates, including manually written templates (Schick and Schütze, 2021) and pure soft templates (Lester et al., 2021). Gu et al. (2021) report that a mix of manual template tokens and soft (trainable) tokens sometimes yields better results than separate manual and soft templates. In Liu et al. (2021b), promising performance is achieved by fixing the majority of manual tokens while tuning a small number of the others. In Han et al. (2021b), the template is contextualized and must be filled with the head entity and the tail entity to form a complete one; moreover, the outputs of multiple positions are used in the loss calculation. Logan IV et al. (2021) design a null template that simply concatenates the inputs and an appended mask token.
It is not reasonable to design a separate template format for each prompt, since this would impose a high learning cost in practical use. To this end, in OpenPrompt, we design a template language with which we can construct various types of templates under a unified paradigm. Our template language draws on the dict grammar of Python. Such a design ensures flexibility and clarity at the same time, allowing users to build different prompts with relative ease.
More specifically, a template node is a text (or empty text) with a description of its attributes. In our template language, one is free to edit the attributes of each token in the template, such as which tokens share embeddings and how the tokens are post-processed (e.g., by an MLP). We show some template examples in Figure 2, and the detailed tutorial for writing templates is in our documentation at https://thunlp.github.io/openPrompt.
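As a rough illustration of a dict-like template language, the sketch below splits a template string into plain-text nodes and attribute nodes. It is an assumption-laden toy, not OpenPrompt's parser: the function parse_template, the regular expression NODE, the "text" key used for literal spans, and the treatment of a bare {"mask"} as a key with no value are all choices made for this example, and nested braces are not supported.

```python
import json
import re
from typing import Any, Dict, List

# Matches one brace-delimited attribute node; nested braces are not handled.
NODE = re.compile(r"\{[^{}]*\}")


def parse_template(template: str) -> List[Dict[str, Any]]:
    """Split a template into literal-text nodes and attribute-dict nodes."""
    nodes: List[Dict[str, Any]] = []
    pos = 0
    for match in NODE.finditer(template):
        literal = template[pos:match.start()].strip()
        if literal:
            nodes.append({"text": literal})
        body = match.group(0)
        try:
            attrs = json.loads(body)               # e.g. {"meta": "sentence"}
        except json.JSONDecodeError:
            # Shorthand such as {"mask"}: a single key with no value.
            attrs = {json.loads(body[1:-1].strip()): None}
        nodes.append(attrs)
        pos = match.end()
    tail = template[pos:].strip()
    if tail:
        nodes.append({"text": tail})
    return nodes


nodes = parse_template('{"meta": "sentence"} It is {"mask"} .')
print(nodes)
```

Because every node is a small dictionary, per-token attributes such as a shared soft-token id could be attached as extra keys without changing the node format.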
Figure 3: An example to define a Verbalizer, the number of the label words for each class is flexible.
Figure 4: The illustration of the validation space of OpenPrompt. By driving different modules of the framework, we could implement and evaluate different methods on a broad set of NLP tasks. We show four examples in this illustration, the colored lines denote the implementation flow of the corresponding method.

3.5 Verbalizers

For prompt-based classification, a verbalizer class should be constructed to map the original labels to label words in the vocabulary. When a PLM predicts a probability distribution over the vocabulary for one masked position, the verbalizer extracts the logits of the label words and integrates them into a score for the corresponding class, and is thereby responsible for the loss calculation. Figure 3 shows a simple way to define a binary sentiment classification verbalizer.
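The extraction-and-integration step can be sketched as follows for a one-to-many verbalizer. This is a minimal illustration, not the library's code: averaging the label-word logits is only one possible aggregation and is assumed here, and the function class_scores and the toy logits are hypothetical.

```python
import math
from typing import Dict, List


def class_scores(mask_logits: Dict[str, float],
                 label_words: Dict[str, List[str]]) -> Dict[str, float]:
    """Average each class's label-word logits, then softmax over classes."""
    pooled = {
        label: sum(mask_logits[word] for word in words) / len(words)
        for label, words in label_words.items()
    }
    top = max(pooled.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in pooled.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy logits standing in for a PLM's output at the masked position.
toy_logits = {"great": 2.0, "good": 1.0, "terrible": -1.0, "bad": 0.0}
toy_label_words = {"positive": ["great", "good"], "negative": ["terrible", "bad"]}
scores = class_scores(toy_logits, toy_label_words)
print(scores)
```

The resulting class distribution is what a cross-entropy loss would be computed on during training.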
Similar to templates, all the verbalizer classes are inherited from a common base class with the necessary attributes and abstract methods. In addition to manually defined verbalizers, we implement automatic verbalizers such as AutomaticVerbalizer and KnowledgeableVerbalizer (Hu et al., 2021). Moreover, important operations such as calibration (Zhao et al., 2021) are also realized in OpenPrompt.

3.6 PromptModel

In OpenPrompt, we use a PromptModel object to be responsible for training and inference, which contains a PLM, a Template object, and a Verbalizer object (optional). Users could flexibly combine these modules and define advanced interactions among them. A model-agnostic forward method is implemented in the base class to predict words for the masked positions. One goal of this module is that users do not need to specifically implement heads for different PLMs, but use a unified API to "predict words for positions that need to be predicted" regardless of the pre-training objective. An example to define a PromptModel is shown in Figure 5.
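The composition described above, a PLM plus a template plus an optional verbalizer behind one forward method, can be mimicked with a toy class. Everything in this sketch is hypothetical: ToyPromptModel is not OpenPrompt's PromptModel, the "PLM" is a plain function from a wrapped string to word scores, and keyword_plm is a hand-written scorer used only so that the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Stand-in "PLM": maps a wrapped string to word scores at the masked position.
WordScorer = Callable[[str], Dict[str, float]]


@dataclass
class ToyPromptModel:
    plm: WordScorer
    template: str                                # contains <text> and <mask>
    verbalizer: Optional[Dict[str, str]] = None  # label -> label word

    def forward(self, text: str) -> Dict[str, float]:
        """Predict word scores for the masked position, whatever the PLM is."""
        return self.plm(self.template.replace("<text>", text))

    def predict(self, text: str) -> str:
        scores = self.forward(text)
        if self.verbalizer is None:
            # No verbalizer (generation-style use): return the best word itself.
            return max(scores, key=scores.get)
        return max(
            self.verbalizer,
            key=lambda label: scores.get(self.verbalizer[label], float("-inf")),
        )


def keyword_plm(wrapped: str) -> Dict[str, float]:
    """Toy scorer: prefers 'great' when the input mentions 'greatest'."""
    positive = "greatest" in wrapped
    return {"great": 2.0 if positive else -1.0,
            "terrible": -1.0 if positive else 2.0,
            "fine": 0.5}


model = ToyPromptModel(keyword_plm, "<text> It is <mask>",
                       {"positive": "great", "negative": "terrible"})
label = model.predict("Albert Einstein was one of the greatest intellects of his time.")
print(label)
```

Swapping keyword_plm for another scorer, or dropping the verbalizer, changes nothing in the calling code, which is the kind of module independence the design aims for.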
Figure 5: An example to define a PromptModel and conduct evaluation.

3.7 Training

From the perspective of trainable parameters, the training of prompt-learning can be divided into two types of strategies. The first strategy simultaneously tunes the prompts and the PLM, which is verified to be effective in a low-data regime (OpenPrompt also provides a FewshotSampler to support the few-shot learning scenario). The second strategy trains only the parameters of the prompts and keeps the PLM frozen; this is regarded as a parameter-efficient tuning method and is considered a promising way to stimulate super-large PLMs. Both strategies can be invoked with a single setting in the trainer (or runner) module of OpenPrompt. Trainer modules in OpenPrompt implement the training process together with prompt-oriented training tricks, e.g., the ensemble of templates. Meanwhile, OpenPrompt supports experimentation through configuration, which makes it easy to drive large-scale empirical studies.
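For the few-shot scenario mentioned above, a sampler typically draws a fixed number of examples per class. The sketch below shows one way this could be done in plain Python; sample_few_shot is a hypothetical helper written for illustration and makes no claim about how OpenPrompt's own sampler is implemented.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def sample_few_shot(dataset: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw k examples per label without replacement, reproducibly."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in dataset:
        by_label[example[1]].append(example)
    rng = random.Random(seed)           # local generator: no global state touched
    sampled: List[Example] = []
    for label in sorted(by_label):      # sorted so the order is deterministic
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"label {label!r} has only {len(pool)} examples")
        sampled.extend(rng.sample(pool, k))
    return sampled


toy_data = [(f"review {i}", "positive" if i % 2 == 0 else "negative") for i in range(20)]
few_shot = sample_few_shot(toy_data, k=4, seed=13)
print(len(few_shot))
```

Fixing the seed matters in this setting because few-shot results are known to vary noticeably with the particular examples drawn.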

4 Evaluation

OpenPrompt aims to support a broad set of NLP tasks under the paradigm of prompt-learning. In terms of evaluation, we use OpenPrompt to implement various baselines and assess them on the corresponding NLP tasks. We show the validation space in Figure 4. And the evaluation tasks include WebNLG (Gardent et al., 2017) for conditional generation, GLUE (Wang et al., 2018) and SuperGLUE (Wang et al., 2019) for natural language understanding; SemEval (Hendrickx et al., 2010) for relation extraction; Few-NERD (Ding et al., 2021b) for fine-grained entity typing; MNLI (Williams et al., 2017), AG's News (Zhang et al., 2015), DBPedia (Lehmann et al., 2015) and IMDB (Maas et al., 2011) for text classification; LAMA (Petroni et al., 2019) for knowledge probing. The processors of these datasets have already been implemented in OpenPrompt, and they are all inherited from a common base DataProcessor class. To keep the results up to date, we are constantly updating and reporting the latest results on our GitHub repository https://github.com/thunlp/OpenPrompt.
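The idea of dataset processors sharing a common base class can be sketched with Python's abc module. The classes below are hypothetical and only illustrate the pattern: BaseProcessor, ToySentimentProcessor, the Example fields, and the read_rows hook are assumptions made for this example, not the interface of OpenPrompt's DataProcessor.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    guid: str
    text_a: str
    label: Optional[int] = None


class BaseProcessor(ABC):
    """Hypothetical base class: a subclass only says how to read one dataset."""

    labels: List[str] = []

    @abstractmethod
    def read_rows(self, split: str) -> List[Tuple[str, str]]:
        """Return raw (text, label_name) rows for the given split."""

    def get_examples(self, split: str) -> List[Example]:
        """Shared logic: turn raw rows into examples with integer labels."""
        return [
            Example(guid=f"{split}-{i}", text_a=text, label=self.labels.index(name))
            for i, (text, name) in enumerate(self.read_rows(split))
        ]


class ToySentimentProcessor(BaseProcessor):
    labels = ["negative", "positive"]

    def read_rows(self, split: str) -> List[Tuple[str, str]]:
        # In-memory rows stand in for reading files from disk.
        return [("A moving film.", "positive"), ("Flat and lifeless.", "negative")]


examples = ToySentimentProcessor().get_examples("train")
print(examples)
```

With this split, adding a dataset means writing one small subclass, while label indexing and example construction stay in a single place.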

5 Conclusion and Future Work

We propose OpenPrompt, a unified, easy-to-use and extensible toolkit for prompt-learning. OpenPrompt establishes a unified framework with clearly defined blocks and flexible interactions to support solid research on prompt-learning. At the application level, OpenPrompt could facilitate researchers and developers to effectively and efficiently deploy prompt-learning pipelines. In the future, we will continue to integrate new techniques and features to OpenPrompt to facilitate the research progress of prompt-learning.

References

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, pages 4171-4186, Minneapolis, Minnesota.
Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, and Hong-Gee Kim. 2021a. Prompt-learning for fine-grained entity typing. arXiv preprint arXiv:2108.10604.
Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Hai-Tao Zheng, and Zhiyuan Liu. 2021b. Few-NERD: A few-shot named entity recognition dataset. In Proceedings of ACL.
Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making pre-trained language models better few-shot learners. In Proceedings of ACL, pages 3816-3830, Online.
Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. The WebNLG challenge: Generating text from RDF data. In Proceedings of INLG, pages 124-133.
Yuxian Gu, Xu Han, Zhiyuan Liu, and Minlie Huang. 2021. PPT: Pre-trained prompt tuning for few-shot learning. arXiv preprint arXiv:2109.04332.
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, and Jun Zhu. 2021a. Pre-trained models: Past, present and future. arXiv preprint arXiv:2106.07139.
Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, and Maosong Sun. 2021b. PTR: Prompt tuning with rules for text classification. arXiv preprint arXiv:2105.11259.
Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of SemEval, pages 33-38.
Shengding Hu, Ning Ding, Huadong Wang, Zhiyuan Liu, Juanzi Li, and Maosong Sun. 2021. Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification. arXiv preprint arXiv:2108.02035.
Teven Le Scao and Alexander M Rush. 2021. How many data points is a prompt worth? In Proceedings of NAACL, pages 2627-2636.
Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. 2015. Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia. Semantic web, 6(2):167-195.
Jens Lehmann,Robert Isele,Max Jakob,Anja Jentzsch,Dimitris Kontokostas,Pablo N Mendes,Sebastian Hellmann,Mohamed Morsey,Patrick Van Kleef,Sören Auer 等。2015 年。Dbpedia-從維基百科中提取的大規模多語言知識庫。語義網,6(2):167-195。
Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. ArXiv preprint, abs/2104.08691.
Brian Lester, Rami Al-Rfou, 和 Noah Constant。2021 年。參數高效提示調整的規模優勢。ArXiv 預印本,abs/2104.08691。
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pretraining for natural language generation, translation, and comprehension. In Proceedings of ACL, pages 7871-7880, Online.
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, 和 Luke Zettlemoyer。2020 年。BART:用於自然語言生成、翻譯和理解的去噪序列到序列預訓練。在 ACL 會議論文集中,頁面 7871-7880,線上。
Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings ACL, pages 4582-4597, Online. Association for Computational Linguistics.
Xiang Lisa Li 和 Percy Liang。2021 年。前綴調整:優化生成的連續提示。在 ACL 會議論文集中,頁面 4582-4597,線上。計算語言學協會。
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021a. Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing. ArXiv preprint, abs/2107.13586.
劉鵬飛,袁維哲,傅金蘭,江正寶,林博,和格雷厄姆·紐比格。2021a。預訓練,提示和預測:自然語言處理中提示方法的系統性調查。ArXiv 預印本,abs/2107.13586。
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2021b. Gpt understands, too. arXiv preprint arXiv:2103.10385.
劉曉,鄭亞男,杜正曉,丁明,錢宇潔,楊志林,和唐杰。2021b。Gpt 也能理解。arXiv 預印本 arXiv:2103.10385。
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019 RoBERTa: A robustly optimized BERT pretraining approach. ArXiv preprint, abs/1907.11692.
劉殷瀚,Myle Ott,Naman Goyal,Jingfei Du,Mandar Joshi,Danqi Chen,Omer Levy,Mike Lewis,Luke Zettlemoyer,和 Veselin Stoyanov。2019 RoBERTa:一種強健的優化 BERT 預訓練方法。ArXiv 預印本,abs/1907.11692。
Robert L Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, and Sebastian Riedel. 2021. Cutting down on prompts and parameters: Simple few-shot learning with language models. arXiv preprint arXiv:2106.13353.
羅伯特·L·洛根四世,伊萬娜·巴拉傑維奇,埃里克·華萊士,法比奧·佩特羅尼,薩米爾·辛格和塞巴斯蒂安·里德爾。2021 年。減少提示和參數:語言模型的簡單少數樣本學習。arXiv 預印本 arXiv:2106.13353。
Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of .
安德魯·馬斯,雷蒙德·E·戴利,彼得·T·范,黃丹,吳恩達和克里斯托弗·波茨。2011 年。學習情感分析的詞向量。在 的會議論文中。
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Proceedings of NeurIPS, 32:8026-8037.
亞當·帕斯克,山姆·格羅斯,弗朗西斯科·馬薩,亞當·勒勒,詹姆斯·布拉德伯里,格雷戈里·查南,特雷弗·基倫,林澤明,娜塔莉亞·吉梅爾舍因,盧卡·安蒂加等。2019 年。PyTorch:一個命令式風格、高性能的深度學習庫。NeurIPS 會議論文集,32:8026-8037。
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. 2019. Language models as knowledge bases? arXiv preprint arXiv:1909.01066.
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. 2019. 語言模型作為知識庫?arXiv 預印本 arXiv:1909.01066。
Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, et al. 2021. Exploring lowdimensional intrinsic task subspace via prompt tuning. arXiv preprint arXiv:2110.07867.
Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun 等。2021. 通過提示調整探索低維內在任務子空間。arXiv 預印本 arXiv:2110.07867。
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, pages 1-26.
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai 和 Xuanjing Huang。2020. 預訓練模型用於自然語言處理:一項調查。《中國科技科學》, 頁 1-26。
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. ArXiv preprint, abs/1910.10683.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu。2019 年。探索統一的文本到文本轉換器的遷移學習極限。ArXiv 預印本,abs/1910.10683。
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-totext transformer. Journal of Machine Learning Research, 21(140):1-67.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu。2020 年。探索統一的文本到文本轉換器的遷移學習極限。機器學習研究期刊,21(140):1-67。
Timo Schick and Hinrich Schütze. 2021. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 255-269, Online. Association for Computational Linguistics.
Timo Schick 和 Hinrich Schütze。2021 年。利用填空問題進行少樣本文本分類和自然語言推理。在第 16 屆歐洲計算語言學協會年會論文集: 主要卷,頁 255-269,線上。計算語言學協會。
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and TieYan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450
Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, and Daniel Cer. 2021. Spot: Better frozen model adaptation through soft prompt transfer. arXiv preprint arXiv:2110.07904.
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2019. Superglue: A stickier benchmark for general-purpose language understanding systems. arXiv preprint arXiv:1905.00537.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, 和 Samuel R Bowman。2018 年。Glue: 一個多任務基準和分析平台,用於自然語言理解。arXiv 預印本 arXiv:1804.07461。
Colin Wei, Sang Michael Xie, and Tengyu Ma. 2021. Why do pretrained language models help in downstream tasks? an analysis of head and prompt tuning.
Colin Wei, Sang Michael Xie, 和 Tengyu Ma。2021 年。為什麼預訓練語言模型在下游任務中有幫助?對頭部和提示調整的分析。
Adina Williams, Nikita Nangia, and Samuel R Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
Adina Williams, Nikita Nangia, 和 Samuel R Bowman。2017 年。一個廣泛覆蓋的挑戰語料庫,用於通過推理進行句子理解。arXiv 預印本 arXiv:1704.05426。
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP, pages 38-45, Online.
Thomas Wolf、Lysandre Debut、Victor Sanh、Julien Chaumond、Clement Delangue、Anthony Moi、Pierric Cistac、Tim Rault、Rémi Louf、Morgan Funtowicz、Joe Davison、Sam Shleifer、Patrick von Platen、Clara Ma、Yacine Jernite、Julien Plu、Canwen Xu、Teven Le Scao、Sylvain Gugger、Mariama Drame、Quentin Lhoest 和 Alexander M. Rush。2020 年。Transformers: 最先進的自然語言處理。在 EMNLP 會議論文集中,頁面 38-45,線上。
Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems.
Xiang Zhang、Junbo Zhao 和 Yann LeCun。2015 年。基於字符級卷積網絡的文本分類。神經信息處理系統的進展。
Tony Z Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. arXiv preprint arXiv:2102.09690.
Tony Z Zhao、Eric Wallace、Shi Feng、Dan Klein 和 Sameer Singh。2021 年。在使用之前進行校準:提高語言模型的少樣本性能。arXiv 預印本 arXiv:2102.09690。

    * equal contribution
    corresponding authors