Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity
Working Paper (this version: 15 March 2024)
Claudio Novelli¹, Federico Casolari¹, Philipp Hacker², Giorgio Spedicato¹, Luciano Floridi¹,³
¹ Department of Legal Studies, University of Bologna, Via Zamboni 27/29, 40126, Bologna, IT
² European New School of Digital Studies, European University Viadrina, Große Scharrnstraße 59, 15230 Frankfurt (Oder), Germany
³ Digital Ethics Center, Yale University, 85 Trumbull Street, New Haven, CT 06511, US
*Email of the corresponding author: claudio.novelli@unibo.it
Abstract
The advent of Generative AI, particularly through Large Language Models (LLMs) like ChatGPT and its successors, marks a paradigm shift in the AI landscape. Advanced LLMs exhibit multimodality, handling diverse data formats, thereby broadening their application scope. However, the complexity and emergent autonomy of these models introduce challenges in predictability and legal compliance. This paper analyses the legal and regulatory implications of Generative AI and LLMs in the European Union context, focusing on liability, privacy, intellectual property, and cybersecurity. It examines the adequacy of the existing and proposed EU legislation, including the Artificial Intelligence Act (AIA), in addressing the challenges posed by Generative AI in general and LLMs in particular. The paper identifies potential gaps and shortcomings in the EU legislative framework and proposes recommendations to ensure the safe and compliant deployment of generative models.
Acknowledgements: CN's contributions were supported by funding provided by Intesa Sanpaolo to the University of Bologna.
1. Overview
Since the release of ChatGPT at the end of 2022, Generative AI in general, and Large Language Models (LLMs) in particular, have taken the world by storm. On a technical level, they can be distinguished from more traditional AI models in various ways.¹ They are trained on vast amounts of text and generate language as output, as opposed to scores or labels in traditional regression or classification (Foster 2022, 4-7; Hacker, Engel, and Mauer 2023). Often, Generative AI models are marked by their wider scope and greater autonomy in extracting patterns within large datasets. In particular, LLMs' capability for smooth general scalability enables them to generate content by processing a varying range of input from several domains. Many LLMs are multimodal (also called Large Multimodal Models, LMMs), meaning they can process and produce various types of data formats simultaneously: e.g., GPT-4 can handle text, image, and audio inputs concurrently for generating text, images, or even videos (e.g., Dall-E and Sora integrations). However, while advanced LLMs generally perform well across a broad spectrum of tasks, this comes with unpredictable outputs, raising concerns over the lawfulness and accuracy of LLM-generated texts (Ganguli et al. 2022).
As powerful LLMs such as GPT-4 and Gemini, along with image and video generators, continue to rise, their very momentum throws into stark relief the question of the adequacy of existing and forthcoming EU legislation. In this article, we discuss some key legal and regulatory concerns brought up by Generative AI and LLMs regarding liability, privacy, intellectual property, and cybersecurity. The EU's response to these concerns should be contextualised within the guidelines of the Artificial Intelligence Act (AIA), which comprehensively addresses the design, development, and deployment of AI models, including Generative AI within its scope. Where we identify gaps or flaws in the EU legislation, we offer some recommendations to ensure that Generative AI models evolve lawfully.
2. Liability and AI Act
33% of firms view "liability for damage" as the top external obstacle to AI adoption, especially for LLMs, only rivalled by the "need for new laws", expressed by 29% of companies.² A new, efficient liability regime may address these concerns by securing compensation to victims and minimizing the cost of preventive measures. In this context, two recent EU regulatory proposals on AI liability may affect LLMs (Dheu et al. 2022): one updating the existing Product Liability Directive (PLD) for defective products,³ the other introducing procedures for fault-based liability for AI-related damages through the Artificial Intelligence Liability Directive (AILD).⁴ While an interinstitutional agreement has been reached on the text of the new PLD,⁵ the AILD is currently parked in the legislative process, but may be taken up again once the AI Act has entered into force.
The two proposals offer benefits for regulating AI liability, including Generative AI and LLMs. First, the scope of the PLD has been extended to include all AI systems and AI-enabled goods, except for open-source software, to avoid burdening research and innovation (Rec. 13 PLD; but see Rec. 13a PLD: open-source software is covered if integrated into other products).
However, both the AILD and the PLD reveal three major weaknesses (see below) when used in the context of Generative AI, largely stemming from their dependence on the AI Act (AIA), which appears ill-suited to govern LLMs effectively. Although the text of the AIA is now stable, it is important to consider improvements in the next legislative phases, such as the comitology procedure enabling implementing acts, before the AIA is enforced, which is expected to happen no earlier than 2026. For Generative AI and LLMs - labelled as General Purpose AI (GPAI) models - obligations will apply sooner, specifically 12 months after the AIA's entry into force. For GPAIs already on the market when the AI Act rules start to apply, this transition period is extended to 24 months (Art. 83(3) AIA).
1) Scope. The disclosure mechanism and rebuttable presumption of a causal link in the AILD only apply to high-risk AI systems under the AIA. Hence, the primary issue here is to establish whether, and under what conditions, Generative AI (and LLMs) might fall under the scope of the AILD and its liability mechanism.
During the drafting of the AIA, GPAI models were first classified as high-risk by default. Subsequently, the risk assessment shifted to consider their downstream application (e.g., whether they are used in a high-risk context such as judicial settings). Finally, the consolidated version provides a distinct classification: GPAI models carry a set of distinct, overarching obligations (Articles 53 ff. AIA). This framework introduces a tiered risk classification that diverges from the traditional high, medium, or low-risk categories: (1) providers of standard GPAI must always ensure detailed technical and informational documentation, also to enable downstream users to comprehend their capabilities and limitations, intellectual property law adherence (e.g., the Copyright Directive), and transparency about training data (Article 53, AIA); (2) providers of openly licensed GPAI models, i.e., with publicly accessible parameters and architecture, need only meet technical documentation requirements (Article 53, point 2); (3) providers of GPAI models posing systemic risks must fulfil standard obligations and additionally conduct model evaluations, including adversarial testing (red teaming), assess and mitigate risks, document and report incidents to the AI Office, and maintain adequate cybersecurity.
The Commission advises providers of GPAI models with systemic risks to create a code of conduct with expert help, demonstrating compliance with the AI Act. This is especially important for outlining how to assess and manage the risks of GPAI models with systemic risks. As a result, GPAI models with systemic risks are likely to be subject to the disclosure mechanism and rebuttable presumption in the AILD.
While these revisions to the AIA represent a positive step toward more effective risk assessment, concerns remain. For instance, the three-tier classification system for GPAIs - standard, openly licensed, and systemically risky - may fail to account for the peculiarities of downstream applications, potentially leading to over-inclusive or under-inclusive risk categories (Novelli et al. 2024; 2023). The very definition of systemically risky GPAI models, primarily based on the computational resources used for training (FLOPs), may not capture their multidimensional nature: risks depend on various factors such as the context of application, model architecture, and the quality of training, rather than just the quantity of computational resources used. FLOPs offer only a partial perspective on dangerousness and do not account for how different, non-computational risk factors might interact and potentially lead to cascading failures, including interactions among various LLMs. Finally, the very threshold of 10^25 FLOPs as a risk parameter is questionable (The Future Society 2023; Moës and Ryan 2023). LLMs trained with 10^24 or 10^23 FLOPs can be equally risky (e.g., GPT-3; Bard). This is further compounded by the trend towards downsizing LLMs while maintaining high performance and associated risks, such as in the case of Mistral's Mixtral 8x7B model (Hacker 2023b). Again, while this is an ancillary issue, as the AI Office will have the power to adjust this parameter, relying solely on FLOPs as a risk indicator remains inadequate.
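To illustrate how coarse a purely compute-based trigger is, the following sketch estimates training compute with the common rule of thumb of roughly 6 FLOPs per parameter per training token and compares hypothetical models against the 10^25 FLOP threshold; the model sizes and token counts are invented for illustration, and the heuristic is an approximation from the scaling-law literature, not a method prescribed by the AIA.

```python
# Illustrative sketch: comparing hypothetical models against the AIA's
# 10^25 FLOP trigger for "systemic risk" GPAI classification.
# Training compute is approximated with the common 6 * N * D heuristic
# (N = parameters, D = training tokens); all model figures are assumptions.

AIA_SYSTEMIC_RISK_FLOPS = 1e25  # threshold named in the AI Act

def approx_training_flops(params: float, tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

hypothetical_models = {
    "large-model-A": (1.0e12, 2.0e12),   # 1T params, 2T tokens (assumed)
    "mid-model-B":   (7.0e10, 2.0e12),   # 70B params, 2T tokens (assumed)
    "small-model-C": (7.0e9,  1.0e12),   # 7B params, 1T tokens (assumed)
}

for name, (params, tokens) in hypothetical_models.items():
    flops = approx_training_flops(params, tokens)
    systemic = flops >= AIA_SYSTEMIC_RISK_FLOPS
    print(f"{name}: ~{flops:.1e} FLOPs -> systemic-risk trigger: {systemic}")
```

A model landing just below the threshold on this estimate is not necessarily less dangerous in deployment, which is precisely the concern raised above about relying on FLOPs alone.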
The AILD, proposed in September 2022, predates the drafting process of the final text of the AIA, which has undergone significant changes, particularly with the rise of LLMs in 2023. Therefore, it is necessary to update the AILD to align with the new technologies, risk categories, and obligations introduced in the AIA. A question arises regarding which type of Generative AI models the disclosure and rebuttable presumption mechanism should apply to. Given that all providers of GPAI models, including those with open licenses, will be subject to rigorous transparency and record-keeping obligations, it seems reasonable to extend the disclosure mechanism and rebuttable presumption of causal link to all of them. This is because they are assumed to have the necessary information in case of incidents, and their failure to provide it can be used as a presumption of violation of the standards set by the same
AIA. However, the AILD's liability rules may prove overly stringent for some GPAI models, suggesting the need for exemptions. To facilitate these exemptions, additional criteria for classifying GPAI models are necessary.⁸ In a similar vein, the AI Act introduces criteria that prevent AI systems operating in the areas listed in Annex III from being automatically deemed high-risk; they must instead present a significant risk to people or the environment. Likewise, Article 7 of the AIA empowers the Commission to adjust the high-risk designation by adding or removing specific applications or categories. A similar approach for GPAI could exempt certain Generative AI models from the AILD's strict requirements. This could involve tailoring the three-tier classification to real-world Generative AI risk scenarios (Novelli et al. 2024; 2023), based not only on computational potency but also on their specific deployment contexts, considering the potential harms to assets and individuals (Bender et al. 2021).⁹ For example, in the employment sector - deemed high-risk by the AIA - the risk levels can differ significantly between using LLMs merely for resume-screening optimization and using them for automated virtual interviews, where biases may be more common and human oversight less effective. Alternatively, exemptions for GPAI models could be established by aligning the three-tier system with the broad application areas designated for AI systems (e.g., Annex III). This way, models used in lower-risk areas, such as video games, could be exempted from the AILD's more stringent liability rules.
2) Defectiveness and fault. The two directive proposals assume that liability may arise from two different sources - defectiveness (PLD) and fault (AILD) - both of which are evaluated by reference to compliance with the requirements of the AIA. Both presume fault or a defect in case of non-compliance with the requirements of the AIA for high-risk systems (Article 9(2)(b) PLD; Article 4(2) AILD), requirements which could also be introduced at a later stage by sectoral EU legal instruments.¹⁰ However, these requirements may not be easily met during the development of Generative AI, particularly LLMs: e.g., their lack of a single or specific purpose before adaptation (Bommasani et al. 2022) could hamper predictions of their concrete impact on the health, safety, and fundamental rights of persons in the Union, which are required by the AIA's risk management system and transparency obligations (Articles 9 and 13, AIA). Moreover, as just mentioned, further requirements are likely to be introduced in the EU regulatory framework concerning GPAI models.
To enhance the effectiveness and reliability of Generative AI models, a necessary recommendation is to combine the conventional AI fault and defectiveness criteria with new methods specifically designed to align with their technical nuances. This may imply that the compliance requirements for evaluating faults and defectiveness should prioritize techniques for steering the randomness of their non-deterministic outputs over their intended purposes. Indeed, their capability for smooth general scalability
enables them to generate content by processing diverse inputs from arbitrary domains with minimal training (Ganguli et al. 2022). To this end, several techniques might be incentivised by the regulator, also concurrently: e.g., temperature scaling, top-k sampling, prompt engineering, and adversarial training (Hu et al. 2018). Methods for tempering the randomness may also include so-called regularization techniques, such as dropout, which involves temporarily disabling a random selection of neurons during each training step of Generative AI models, fostering the development of more robust and generalized features (Lee, Cho, and Kang 2020). Consequently, it prevents the model from overfitting, ensuring more coherent and less random outputs.
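As a purely illustrative sketch of two of the randomness-steering techniques just mentioned, the snippet below applies temperature scaling and top-k filtering to a vector of raw logits before sampling the next token; the vocabulary, logit values, and parameter settings are invented for the example and do not reflect any particular model or regulatory requirement.

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=3, rng=None):
    """Temperature scaling + top-k sampling over raw logits.

    Lower temperature sharpens the distribution (less random output);
    top-k keeps only the k most likely tokens before renormalising.
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature   # temperature scaling
    top_idx = np.argsort(scaled)[-top_k:]                    # keep the k best tokens
    probs = np.exp(scaled[top_idx] - scaled[top_idx].max())
    probs /= probs.sum()                                     # softmax over the top-k
    return int(rng.choice(top_idx, p=probs))

# Toy example with an invented 5-token vocabulary.
vocab = ["the", "a", "court", "model", "risk"]
logits = [2.1, 1.9, 0.3, -0.5, -1.2]
print(vocab[sample_next_token(logits, temperature=0.7, top_k=3)])
```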
Furthermore, compliance requirements for Generative AI and LLMs should also prioritize monitoring measures. These measures would serve to verify that the models operate as planned and to pinpoint and amend any divergences or unfavourable results. For example, calculating the uncertainty of outputs could be instrumental in recognizing situations where the model might be producing highly random results (Xiao et al. 2022). Such information is vital for end-users to have before utilizing, for instance, an LLM, and it represents a metric for evaluating the fault of the designers and deployers (or the defectiveness of the model itself).
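One minimal way to operationalise such output-uncertainty monitoring is to compute the entropy of the model's next-token distributions and flag generations whose average entropy exceeds a threshold. The sketch below assumes access to those per-step distributions and uses invented numbers and an arbitrary threshold; it is an illustration of the idea, not a metric mandated by the AIA or used by any specific provider.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain_output(stepwise_probs, threshold_bits=2.5):
    """Average the per-step entropies and flag the generation if too random.

    `stepwise_probs` is a list of next-token distributions, one per generated
    token; the threshold is an arbitrary illustrative value.
    """
    avg = sum(token_entropy(p) for p in stepwise_probs) / len(stepwise_probs)
    return avg, avg > threshold_bits

# Invented distributions: the first step is confident, the second is diffuse.
steps = [
    [0.90, 0.05, 0.03, 0.02],
    [0.25, 0.25, 0.25, 0.25],
]
avg_entropy, flagged = flag_uncertain_output(steps)
print(f"average entropy: {avg_entropy:.2f} bits, flag for review: {flagged}")
```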
3) Disclosure of evidence. Both proposals state that the defendant - in our analysis, the deployers and designers of a Generative AI model - must provide evidence that is both relevant and proportionate to the claimant's presented facts and evidence. Shortcomings here concern the content of such disclosure. First, the PLD and the AIA are misaligned as the former requires evidence disclosure for all AI systems, whereas the AIA proposal mandates record-keeping obligations only for high-risk systems (Article 12, AIA) (Hacker 2023a). Regarding Generative AI, there is no blanket requirement for GPAI providers to continuously and automatically record events ('logging') throughout the model's lifecycle. The obligation to document and report significant incidents to the AI Office and national authorities is limited to models classified as systemically risky (Article 55c, AIA). Providers of standard GPAI are required to maintain technical documentation related to training, testing, model evaluation outcomes, and proof of copyright law compliance, yet there is no directive for ongoing performance recording.
Second, both the PLD and the AILD do not indicate what type of information must be disclosed. While this issue can be attributed to their status as proposals, it is this gap that should be promptly addressed. Failing to establish clear guidelines on the necessary disclosures might leave the claimants practically unprotected.
Regarding the first issue, the requirement to disclose evidence should not be confined to high-risk systems alone. The PLD could potentially adopt the AILD's approach, which broadens the disclosure requirement to include opaque systems that are not classified as high-risk while exempting those high-risk systems that already have ample documentation under the AIA (Article 4(4) and (5), AILD). This strategy could broaden the scope to include standard GPAI models, not just those systemically risky. This adjustment is reasonable, particularly since standard GPAI models typically process less data than their systemically risky counterparts and already face stringent transparency obligations that should facilitate the implementation of record-keeping practices. While the content of disclosure might vary based on the system's risk level, maintaining the obligation to disclose is important.
This leads us to the second point of discussion: the content of disclosure. It should include a report of the damaging incident, noting the exact time and a brief description of its nature. It might include interaction logs and timestamps between users and the GPAI model, demonstrating adherence to relevant standards, possibly verified through third-party audit reports (Falco et al. 2021). Moreover, the disclosure should also mirror the sociotechnical structure of AI liability (Novelli, Taddeo, and Floridi 2023; Theodorou and Dignum 2020) and demonstrate that training data are representative and well-documented, e.g., in terms of the motivation behind data selection and transparency about the objectives of data collection (Bender et al. 2021; Jo and Gebru 2020). Also, producers might be obligated to use only documentable datasets of an appropriate size for the capabilities of the organization. For instance, LLMs operating on restricted datasets - thanks to their few/zero-shot learning skills (Brown et al. 2020) - may instead need to disclose the auxiliary information used for associating observed and non-observed classes of objects.
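As a hedged illustration of what such interaction logs and timestamps could look like, the snippet below assembles a minimal structured log entry for a single prompt-response exchange. The field names, the pseudonymous user identifier, and the choice to store hashes rather than raw text are our own assumptions for the example, not requirements drawn from the PLD, the AILD, or the AIA.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_interaction_record(user_id, model_version, prompt, response):
    """Minimal structured log entry for one user/GPAI exchange.

    Prompts and responses are stored as hashes here so that the log itself
    does not duplicate personal data; whether hashing suffices in a given
    case is a legal question that this sketch does not settle.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,                      # pseudonymous identifier
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
        "incident_flag": False,                  # set to True on reported harm
    }

record = make_interaction_record(
    user_id="user-0042",
    model_version="example-gpai-1.0",            # hypothetical model name
    prompt="Draft an email to reschedule a meeting.",
    response="Dear colleague, ...",
)
print(json.dumps(record, indent=2))
```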
To conclude, the process of evidential disclosure presupposes that individuals are informed when they are engaging with these models and, consequently, whether they have been adversely affected in specific ways. However, even though the stipulations outlined in the AIA mandate the notification of users during interactions with GPAI models, the methodology for user notification remains ambiguous (Ziosi et al. 2023). This is a key point, as the efficacy of the disclosure mechanisms hinges on this prerequisite: to lodge claims, users must possess a reasonable basis to suspect harm and furnish substantial details and corroborating evidence to substantiate a potential damages claim. Since the acquisition of this knowledge can present challenges, it is recommended to encourage Generative AI producers to actively notify users of occurrences of potential harm. This strategy would not only bolster the claimant's ability to access crucial evidence but would also cultivate a more transparent environment within the operational sphere of Generative AI models. Such incentives might include initiatives like forming alliances with credible third-party organizations, including auditing agencies, to facilitate a thorough disclosure of information (and evidence) concerning adverse effects.
3. Privacy and Data Protection
Privacy and data protection pose critical legal hurdles to the development and deployment of Generative AI, as exemplified by the 2023 temporary ban on ChatGPT by the Italian data protection authority (Garante della Privacy) (Hacker, Engel, and Mauer 2023, Technical Report) and the subsequent notice issued in January 2024 by the same authority to OpenAI, alleging that its ChatGPT chatbot violates the EU General Data Protection Regulation. On an abstract level, a Generative AI model preserves privacy if it was trained in a privacy-sensitive way, processes prompts containing personal data diligently, and discloses information relating to identifiable persons in appropriate contexts and to authorized individuals only. Privacy and data protection are not binary variables and, therefore, what counts as the right context or the right recipients of the information is a matter of debate. In the context of LLMs, these debates are further
complicated due to the diverse purposes, applications, and environments they operate in.¹¹
Generative AI models are exposed to privacy and data protection violations due to pervasive training on (partially) personal data, the memorization of training data, inversion attacks (Nicholas Carlini et al. 2021), interactions with users, and the output the AI produces. Memorization of data may occur either through overfitting of abundant parameters to small datasets, which reduces the capacity to generalize to new data, or because memorizing long-tailed data distributions can optimize generalization (Feldman 2021). When the memorized training data contains personal information, LLMs may leak data and disclose it directly. When training data is not memorized, personal information can still be inferred or reconstructed by malicious actors using model inversion attacks, which reverse-engineer the input data to reveal private information (Fredrikson, Jha, and Ristenpart 2015). Against this, existing privacy-preserving strategies, such as data sanitization and differential privacy, provide limited privacy protection when applied to LLMs (Brown et al. 2022). This raises the question of whether, and how, personal data may be processed to train LLMs - a particularly thorny question concerning sensitive data. Moreover, users may enter private information through prompts, which may resurface in other instances. Some users, in addition, will be minors, for whom specific data protection rules apply.
These considerations lead to seven main problems at the intersection of data protection and Generative AI:¹² the appropriate legal basis for AI training; the appropriate legal basis for processing prompts; information requirements; model inversion, data leakage, and the right to erasure; automated decision-making; protection of minors; and purpose limitation and data minimization. We analyse them first and then offer some thoughts on potential ways forward.
1) Legal basis for AI training on personal data. First and foremost, every processing operation of personal data - be it storage, transfer, copying, or else - needs a legal basis under Article 6 GDPR. For companies without an establishment in the EU, the GDPR still applies if their services are offered in the EU, which is the case, for example, for many major LLM products. The GDPR also covers processing before the actual release of the model, i.e., for training purposes (Oostveen 2016). LLMs are typically trained on broad data at scale, with data sources ranging from proprietary information to everything available on the Internet - including personal data, i.e., data that can be related to an identifiable individual (Bommasani et al. 2021). Using this type of data for AI training purposes, hence, is illegal under the GDPR unless a specific legal basis applies. The same holds for any fine-tuning operations after the initial pre-training.
1.a) Consent and the balancing test
The most prominent legal basis in the GDPR is consent (Article 6(1)(a) GDPR). However, for large data sets including personal information from a vast group of people unknown to the developers beforehand, eliciting valid consent from each individual is generally not an option due to prohibitive transaction costs (Mourby, Ó Cathaoir, and Collin 2021). Furthermore, using LLMs with web-scraped datasets and unpredictable applications is difficult to square with informed and specific consent (Bommasani et al. 2022). At the same time, requiring data subjects to be informed about the usage of their personal data may slow down the development of LLMs (Goldstein et al. 2023). Hence, for legal and economic reasons, AI training can typically be based only on the balancing test of Article 6(1)(f) GDPR (Zuiderveen Borgesius et al. 2018; Zarsky 2017), according to which the legitimate interests of the controller (i.e., the developing entity) justify processing unless they are overridden by the rights and freedoms of the data subjects (i.e., the persons whose data are used).¹³
Whether the balancing test provides a legal basis is, unfortunately, a matter of case-by-case analysis (Gil Gonzalez and de Hert 2019; Peloquin et al. 2020; Donnelly and McDonagh 2019). Generally, particularly socially beneficial applications will speak in favour of developers; similarly, the data subject is unlikely to prevail if the use of the data for AI training purposes could reasonably be expected by data subjects (Recital 47 GDPR). That latter criterion, however, will rarely be fulfilled. In addition, privacy-enhancing strategies, such as pseudonymization, transparency, or encryption, will count toward the legality of AI training under the balancing test. By contrast, the nature and scope of processing, the type of data (sensitive or not), the degree of transparency towards and control for data subjects, and other factors may tip the balance in the other direction (Hacker, Engel, and Mauer 2023, Technical Report, 2).
For narrowly tailored AI models based on supervised learning strategies, one may argue that AI training is not particularly harmful as it does not, generally, reveal any new information about the data subjects themselves (Hacker 2021; Zarsky 2017; Bonatti and Kirrane 2019). This argument is particularly strong if the model is not passed along to other entities and state-of-the-art IT security makes data breaches less likely.
However, this position is difficult to maintain concerning Generative AI (Hacker, Engel, and Mauer 2023, Technical Report, 2): these models are generally used by millions of different actors, and models have been shown to reveal personal data through data leakage as well as model inversion (Nicholas Carlini et al. 2021; Bederman 2010; Lehman et al. 2021; Nicolas Carlini et al. 2023). This poses an even greater challenge in fine-tuning scenarios (Borkar 2023).
1.b) Sensitive Data
To make matters even more complex, a much larger number of personal data points than expected may be particularly protected as sensitive data pursuant to Article 9 GDPR, under a new ruling of the CJEU. In Meta v. Bundeskartellamt, the Court decided that information need not directly refer to protected categories - such as ethnic or racial origin, religion, age, or health - to fall under Article 9. Rather, it suffices "that data processing allows information falling within one of those categories to be revealed".¹⁴ That case was decided concerning Meta, the parent company of Facebook, based on its vast collection of data tracking users and linking that data with the user's Facebook account.
Arguably, however, as is generally the case in technology-neutral data protection law, the exact method of tracking or identification is irrelevant; the Court held that it does not matter, for example, whether the profiled person is a Facebook user or not.¹⁵ Rather, from the perspective of data protection law, what is decisive is the controller's ability to infer sensitive traits based on the available data - irrespective of whether the operator intends to make that inference. This broader understanding casts a wide net for the applicability of Article 9 GDPR, as machine learning techniques increasingly allow for the deduction of protected categories from otherwise innocuous data points (Bi et al. 2013; Chaturvedi and Chaturvedi 2024).
Hence, in many cases concerning big data formats, the hypothetical possibility to infer sensitive data potentially brings the processing, for example, for AI training purposes, under the ambit of Article 9. Developers then need to avail themselves of the specific exception in Article 9(2) GDPR. Outside of explicit consent, such an exception will, however, often not be available: Article 9(2) does not contain a general balancing test, in contrast to Article 6(1) GDPR (and the secondary use clause in Article 6(4)). The research exemption in Article 9(2)(j) GDPR, for example, is limited to building models for research purposes, but cannot be used to exploit them commercially (cf. Recitals 159 and 162).
Overall, this discussion points to the urgent need to design a novel exemption to Article 9, accompanied by strong safeguards, similar to the ones contemplated in Article 10, point 5 of the AIA, to balance the societal interest in socially beneficial AI training and development with the protection of individual rights and freedoms, particularly in crucial areas such as medicine, education, or employment. While the TDM exception provides a specific framework for the use of copyrighted material for AI training purposes (see Section 4 below), such rules are, unfortunately, entirely lacking under the GDPR.
2) Legal basis for prompts containing personal data.
The situation is different for prompts containing personal data entered into a trained model. Here, we have to fundamentally distinguish two situations. First, users may include personal information about themselves in prompts, for example, when they ask an LLM to draft an email concerning a specific event, appointment, or task. This may occur intentionally or inadvertently. In both cases, consent may indeed work as a legal basis as users have to individually register for the LLM product. During that procedure, controllers may request consent (respecting the conditions for valid consent under Articles 4(11) and 7 GDPR, of course).
The second scenario concerns prompts containing personal information about third parties, i.e., not the person entering the prompt. This situation is more common among users who might not be fully aware of privacy and data protection laws. They might inadvertently include the personal details of others if the task at hand involves these third parties, and they expect the language model to provide personalized responses. Users cannot, however, validly consent for another person (unless they have been explicitly mandated by that person to do just that, which is unlikely).
A similar problem resurfaces as in the AI training or fine-tuning scenario, with the additional twist that the information is provided, and processing initiated, by the user, not the developers. While the user may be regarded as the sole controller, or joint controller together with the company operating the LLM (Article 4(7) GDPR), for the initial storage and transfer of the prompt (i.e., writing and sending the prompt), any further memorization or data leakage is under the sole control of the entity operating the LLM. Hence, under the Fashion ID judgment of the CJEU,¹⁶ that operational entity will likely be considered the sole controller, and thus the responsible party (Art. 5(2) GDPR), for any storage, transfer, leakage, or other processing of the third-party-related personal data included in the prompt that occurs after the initial prompting by the user. Again, as in the training scenario, both the third-party-related prompt itself and any additional leakage or storage are difficult to justify under Article 6(1)(f) and, if applicable, Article 9 GDPR.
3) Information requirements.
The next major roadblocks for GDPR-compliant Generative AI models are Articles 12-15 GDPR, which detail the obligations regarding the information that must be provided to data subjects. These articles pose a unique challenge for Generative AI due to the nature and scope of data they process (Hacker, Engel, and Mauer 2023, Technical Report, 2-3).
When considering data harvested from the internet for training purposes, the applicability of Article 14 of the GDPR is crucial. This article addresses the need for transparency in instances where personal data is not directly collected from the individuals concerned. However, individually informing those whose data form part of the training set is often impractical due to the extensive effort required, potentially exempting it under Article 14(5)(b) of the GDPR. Factors such as the number of data subjects, the data's age, and implemented safeguards are significant in this assessment, as noted in Recital 62 of the GDPR. The Article 29 Working Party particularly notes the impracticality when data is aggregated from numerous individuals, especially when contact details are unavailable (Article 29 Data Protection Working Party 2018, para. 63, example).
Conversely, the processing of personal data submitted by users about themselves in a chat interface (prompts) is not subject to such exemptions. Article 13 of the GDPR explicitly requires that data subjects be informed of several key aspects, including processing purposes, the legal basis for processing, and any legitimate interests pursued. Current practices may not have fully addressed these requirements, marking a significant gap in GDPR compliance.
Importantly, the balance between the practical challenges of compliance and the rights of data subjects is delicate. While the concept of disproportionate effort under Article 14(5) GDPR presents a potential exemption, it remains a contentious point, particularly for training data scraping and processing for commercial purposes. In this regard, the data controller, as defined in Article 4(7) of the GDPR, should meticulously document the considerations made under this provision. This documentation is a crucial aspect of the accountability principle enshrined in Article 5(2) of the GDPR. Furthermore, in our view, documents regarding the methods of collecting training data should be made publicly accessible, reinforcing the commitment to GDPR principles.
4) Model inversion, data leakage, and the right to erasure.
GDPR compliance for Generative AI models gets even trickier with concerns about reconstructing training data from the model (model inversion) and unintentional data leaks, especially in light of the right to be forgotten (or right to erasure) under Article 17. Some scholars even argue that LLMs themselves might be considered personal data due to their vulnerability to these attacks (Veale, Binns, and Edwards 2018). Inversion attacks refer to techniques whereby individuals' data used in the training of these models can be extracted or inferred. Similarly, the memorization problem, which causes LLMs to potentially output personal data contained in the training data, may be invoked to qualify LLMs themselves as personal data.
The ramifications of classifying the model as personal data are profound and far-reaching. If an LLM is indeed deemed personal data, a legal basis is needed even for using or downloading the model. Furthermore, such a qualification implies that data subjects could, in theory, invoke their right to erasure under Article 17 of the GDPR with respect to the entire model. This right, also known as the 'right to be forgotten,' allows individuals to request the deletion of their personal data under specific conditions. In the context of LLMs, this could lead to unprecedented demands for the deletion of the model itself, should it be established that the model contains or constitutes personal data of the individuals.
Such a scenario poses significant challenges for the field of AI and machine learning. The practicality of complying with a request for erasure in this context is fraught with technical and legal complexities (Villaronga et al. 2018; Zhang et al. 2023). Deleting a model, particularly one that has been widely distributed or deployed, could be technologically challenging and may have significant implications for the utility and functionality of the corresponding AI system. Furthermore, this approach raises questions about the balance between individual rights and the broader benefits of AI technologies. The deletion of entire models, with a potential subsequent economic need to retrain the entire model, also conflicts with environmental sustainability given the enormous energy and water consumption of (re-)training LLMs (Hacker 2024).
Although LLM producers, such as OpenAI, claim to comply with the right to erasure, it is unclear how they can do so, because personal information may be contained in multiple forms in an LLM, which escalates the complexity of identifying and isolating specific data points, particularly when the data is not presented in a structured format (e.g., phone numbers). Additionally, removal requests initiated by a single data subject may prove to be inadequate, especially in scenarios where identical information has been circulated by multiple users during their engagements with the LLM (Brown et al. 2022). In other words, the deletion of data from a training dataset represents a superficial solution, as it does not necessarily obliterate the potential for data retrieval or the extraction of associated information encapsulated within the model's parameters. Data incorporated during the training phase can permeate the outputs generated by certain machine learning models, creating a scenario where original training data, or information linked to the purged data, can be inferred or "leaked," thereby undermining the integrity of the deletion process and perpetuating potential privacy violations (De Cristofaro 2020). At a minimum, this points to the need for more robust and comprehensive strategies to address data privacy and "machine unlearning" (Hine et al. 2023; Floridi 2023; Nguyen et al. 2022) within the operational area of LLMs.
5) Automated decision-making.
Furthermore, given new CJEU jurisprudence, the use of Generative AI models might be qualified as automated decision-making processes, a topic scrutinized under Article 22 of the GDPR. This article generally prohibits automated individual decision-making, including profiling, which produces legal effects concerning an individual or similarly significantly affects them, unless specific exceptions apply.
In cases where LLMs are used for evaluation, such as in recruitment or credit scoring, the importance of this regulation becomes even more significant. A pertinent illustration is provided by the recent ruling in the SCHUFA case by the CJEU.¹⁷ The Court determined that the automated generation of a probability value regarding an individual's future ability to meet payment commitments by a credit information agency constitutes 'automated individual decision-making' as defined in Article 22. According to the Court, this presupposes, however, that this probability value significantly influences a third party's decision to enter into, execute, or terminate a contractual relationship with that individual.
Extrapolating from this ruling, the automated evaluation or ranking of individuals by LLMs will constitute automated decision-making if it is of paramount importance for the decision at hand - even if a human signs off on it afterward. The legal implications of this judgment are profound. Exemptions from the general prohibition of such automated decision-making are limited to scenarios where there is a specific law allowing the process, explicit consent, or where the automated processing is necessary for contractual purposes, as per Article 22(2) of the GDPR.
Obtaining valid consent in these contexts is challenging, especially considering the power imbalances often present between entities like employers or credit agencies and individuals seeking jobs or credit (Recital 43 GDPR). Therefore, the legality of using LLMs in such situations may largely depend on whether their use can be justified as necessary for the specific task at hand (Article 22(2)(a) GDPR). Arguments based solely on efficiency are unlikely to be sufficient. Instead, those deploying LLMs for such purposes might need to demonstrate tangible benefits to the applicants, such as more reliable, less biased, or more transparent evaluation processes. Absent such a qualification, only specific Union or Member State laws, containing sufficient safeguards, may permit such automated decision-making (Article 22(2)(b) GDPR).
6) Protection of minors.
The deployment of Generative AI models has raised significant concerns regarding age-appropriate content, especially given the potential for generating outputs that may not be suitable for minors. Under Article 8(2) GDPR, the controller must undertake "reasonable efforts to verify […] that [children's] consent is given or authorized by the holder of parental responsibility over the child, taking into consideration available technology."
A notable instance of regulatory intervention in this context is the action taken by the Italian Data Protection Authority (Garante per la Protezione dei Dati Personali, GPDP). On March 30, 2023, the GPDP imposed a temporary restriction on OpenAI's processing of data from Italian users, with a particular emphasis on safeguarding minors.¹⁸ This move underscores the increasing scrutiny by data protection authorities of the implications of LLMs in the context of protecting vulnerable groups, especially children (Malgieri 2023).
In response to these concerns, OpenAI, for example, has implemented measures aimed at enhancing the protection of minors. These include the establishment of an age gate and the integration of age verification tools. The effectiveness and robustness of these tools, however, remain an area of keen interest and ongoing evaluation, especially in the rapidly evolving landscape of AI and data protection.
7) Purpose limitation and data minimization.
One approach to address data calibration for open-ended LLM applications is requiring developers to train models on smaller datasets and leverage few/zero-shot learning skills. As an alternative to imposing restrictions on the dataset, however, it could be more beneficial to strengthen privacy-preserving measures proportionally to dataset size. For example, rather than relying solely on pseudonymization and encryption (Article 10, point 5(b), AIA), LLM providers should implement methods like differential privacy to counter adversarial attacks on large datasets (Shi et al. 2022; Plant, Giuffrida, and Gkatzia 2022).
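As a minimal sketch of the kind of differential-privacy measure referred to above, the snippet below applies the classic Gaussian mechanism to a single aggregate statistic; in practice, differentially private LLM training would instead add calibrated noise to clipped gradients (as in DP-SGD), which is considerably more involved. The statistic and the privacy parameters are invented for illustration.

```python
import math
import random

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    """Release a noisy statistic satisfying (epsilon, delta)-differential privacy.

    Noise scale follows the standard calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon.
    """
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return true_value + random.gauss(0.0, sigma)

# Toy example: privately release how many training documents mention a term.
true_count = 1_240          # invented aggregate over a training corpus
noisy_count = gaussian_mechanism(true_count, sensitivity=1, epsilon=1.0, delta=1e-5)
print(f"noisy count released to analysts: {noisy_count:.0f}")
```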
8) Ways forward.
To enable Generative AI models to comply with GDPR data protection standards, we have already suggested a tailored regime under Art. 9(2) GDPR above. Another reasonable step would be to adapt the data governance measures outlined for high-risk systems in the AIA. The European Parliament had made a proposal for an Article 28(b), which would have delineated the following obligations for GPAI providers: "process and incorporate only datasets that are subject to appropriate data governance measures […] in particular measures to examine the suitability of the data sources and possible biases and appropriate mitigation". However, this proposal has not made it into the final version of the AI Act; rather, if used in specific high-risk scenarios, GPAIs will fall under the data governance rules of Article 10.
While the revised iteration of the compromise text for Article 10 is extensive, it may also be too generic, necessitating the incorporation of more tailored measures or incentives to aptly address the complexities inherent to GPAI models like LLMs (e.g., under harmonised standards and common specifications, Art. 40 and 41 AIA). These technical standards should be refined by incorporating LLM-specific measures, such as requiring training on publicly available data wherever possible. A significant portion of these datasets might also take advantage of the GDPR's right-to-be-forgotten exceptions for public interest, scientific, and historical research (Article 17(3)(d)). Where these exceptions do not apply, it could be feasible for LLMs to use datasets not contingent upon explicit consent, which are intended for public usage. Hence, the most appropriate way to use these systems could require fine-tuning public data with private information for individual data subjects' local use. This should be allowed to maximize LLMs' potential, as proposed by Brown et al. (2022).
Other potential strategies to enhance data privacy are: encouraging the proper implementation of the opt-out right by LLM providers and deployers and exploring the potential of machine unlearning (MU) techniques, as mentioned.
Regarding the first strategy, OpenAI has recently made a potentially significant advancement in this direction by releasing a web crawler, named GPTbot, that comes with an opt-out feature for website owners. This feature enables them to deny access to the crawler, as well as customize or filter accessible content, granting them control over the content that the crawler interacts with.^{19} This is useful not only for implementing the opt-out right under the EU TDM copyright exception but also under Article 21 GDPR.
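For concreteness, the sketch below uses Python's standard-library robots.txt parser to check whether a given site has opted out of OpenAI's crawler (which identifies itself with the documented user-agent string "GPTBot"); the URLs are illustrative placeholders.

```python
from urllib import robotparser

# Minimal check of whether a site's robots.txt opts out of OpenAI's GPTBot
# crawler. A site can opt out with rules such as:
#   User-agent: GPTBot
#   Disallow: /
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # illustrative URL
rp.read()

for agent in ("GPTBot", "*"):
    allowed = rp.can_fetch(agent, "https://example.com/some-article")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```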
Turning to the second strategy, MU is potentially a more efficient method to fully implement the right to erasure (Nguyen et al. 2022), a critical aspect when dealing with LLMs. Unlike conventional methods that merely remove or filter data from a training set (a process that is often inadequate, since the removed data continues to linger in the model's parameters), MU focuses on erasing the specific influence of certain data points on the model, without the need for complete retraining. This technique, therefore, could more effectively enhance both individual and group privacy when using LLMs (Hine et al. 2023; Floridi 2023).
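As one hedged illustration of the general idea (not of the specific methods surveyed in the works cited above), the sketch below implements shard-based retraining in the spirit of "SISA"-style unlearning: the training set is split into shards, one sub-model is trained per shard, and erasing a record only requires retraining the shard that contained it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: 300 records, 5 features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split into shards and train one sub-model per shard.
n_shards = 3
shards = list(np.array_split(np.arange(len(X)), n_shards))
models = [LogisticRegression().fit(X[idx], y[idx]) for idx in shards]

def predict(x):
    # Majority vote over the shard models.
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return int(round(np.mean(votes)))

def unlearn(record_index):
    # Retrain only the shard that contained the erased record.
    for s, idx in enumerate(shards):
        if record_index in idx:
            kept = idx[idx != record_index]
            shards[s] = kept
            models[s] = LogisticRegression().fit(X[kept], y[kept])
            return s

affected_shard = unlearn(record_index=42)
print(f"Retrained shard {affected_shard}; prediction:", predict(X[0]))
```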
4. Intellectual Property
Next to data protection concerns, Generative AI presents various legal challenges related to its "creative" outputs. Specifically, content generated by LLMs results from processing text data such as websites, textbooks, newspapers, scientific articles, and programming code. Viewed through the lens of intellectual property (IP) law, the use of LLMs raises a variety of theoretical and practical issues^{20} that can only be briefly touched upon in this paper, and that the EU legislation seems not yet fully equipped to address. Even the most advanced piece of legislation currently under consideration by the EU institutions, the AIA, does not contain qualified answers to the issues that will be outlined below. The stakes have been raised significantly, however, by several high-profile lawsuits levelled by content creators (e.g., the New York Times; Getty Images) against Generative AI developers, both in the US^{21} and in the EU (de la Durantaye 2023).
Within the context of this article, it is advisable to distinguish between the training of LLMs and the subsequent generation of outputs. Furthermore, concerning the generation of outputs, it is worthwhile to further differentiate, as suggested, among others, by the European Parliament,^{22} between instances in which LLMs serve as mere instruments to enhance human creativity and situations in which LLMs operate with a significantly higher degree of autonomy. By contrast, the possibility of protecting LLMs themselves through an IP right will not be discussed in this paper.
1) Training.
The main copyright issue concerning AI training arises from the possibility that the training datasets may consist of or include text or other materials protected by copyright or related rights (Sartor, Lagioia, and Contissa 2018). Indeed, for text and materials to be lawfully reproduced (or otherwise used within the training process), either the rightholders must give their permission or the law must specifically allow their use in LLM training.
The extensive scale of the datasets used and, consequently, the significant number of rightholders potentially involved render it exceedingly difficult to envision the possibility that those training LLMs could seek (and obtain) an explicit license from all rightholders, reproducing the problem of data protection consent. This issue becomes particularly evident when, as often occurs, LLM training is carried out using web scraping techniques, a practice whose legality has been (and continues to be) debated by courts and scholars in Europe (Sammarco 2020; Klawonn 2019), even in terms of potential infringement of the sui generis right granted to the maker of a database by Directive 96/9/EC.^{23} On the one hand, some content available online, including texts and images, might be subject to permissive licensing conditions (e.g., some Creative Commons licenses) authorizing reproduction and reuse of such content even for commercial purposes. The owner of a website could, on the other hand, include contractual clauses in the Terms and Conditions of the website that prohibit web scraping even when all or some of the website's content is not per se protected by intellectual property rights.^{24} To mitigate legal risk, LLMs should be suitably capable of autonomously analyzing website Terms and Conditions, thereby discerning between materials whose use has been expressly reserved by their rightholders and materials that may be freely used (also) for training purposes.
The OpenAI above’s GPTbot web crawler which allows website owners to opt-out or filter/customize content access offers a significant technical tool in this context. While it does not eliminate all IP law concerns, it is a proactive measure that could, in the future, set a standard of care that all LLMs’ providers might be expected to OpenAI 的 GPTbot 网页爬虫允许网站所有者选择退出或过滤/定制内容访问,在这种情况下提供了一个重要的技术工具。虽然它不能消除所有知识产权法的担忧,但它是一项主动措施,未来可能成为所有 LLMs提供商应该遵守的标准。
uphold. ^(25){ }^{25} Significantly, the GPAI rules of the AIA discussed in the trilogue contained precisely an obligation for providers of such systems to establish a compliance system, via technical and organizational measures, capable of recognizing and respecting rightholders’ opt-outs (Hacker 2023c). For the moment, it remains unclear, however, if this provision will be contained in the final version of the AI Act. It would be a step in the right direction, as commercial LLM training without such a compliance system typically amounts to systematic copyright infringement, even under the new and permissive EU law provisions, to which we now turn. 坚持。 ^(25){ }^{25} 值得注意的是,三边会议中讨论的 GPAI 人工智能法规包含了供应商必须建立合规系统的义务,通过技术和组织措施来识别和尊重权利人的退出选择(Hacker 2023c)。然而目前还不清楚这一规定是否会出现在最终版本的人工智能法案中。这将是一个正确的方向,因为没有此类合规系统的商业LLM训练通常构成系统性的版权侵犯,即便是在新的宽松的欧盟法律规定下,我们现在就来讨论这一点。
A potential regulatory solution to ensure the lawful use of training datasets would involve applying the text and data mining (TDM) exception provided by Directive 2019/790/EU (DSMD)^{26} to the training of LLMs. Indeed, Article 2(2) DSMD defines text and data mining as "any automated analytical technique aimed at analyzing text and data in digital form to generate information which includes but is not limited to patterns, trends and correlations". Considering that the training of LLMs certainly encompasses (although it likely extends beyond) automated analysis of textual and data content in digital format to generate information, an argument could be made that such activity falls within the definition provided by the DSM Directive (Dermawan, n.d.). However, the application of the TDM exception in the context of LLM training raises non-trivial issues (Pesch and Böhme 2023; Hacker 2021; see also, more generally, Geiger, Frosio, and Bulayenko 2018; Rosati 2018).
Firstly, where the TDM activity is not carried out by research organizations and cultural heritage institutions for scientific research (e.g., by private companies and/or for commercial purposes), it is permitted under Article 4(3) DSMD only on condition that the use of works and other protected materials "has not been expressly reserved by their right-holders in an appropriate manner, such as machine-readable means in the case of content made publicly available online". This condition underscores our earlier note on the need for LLMs to automatically analyze the Terms and Conditions of websites and online databases.
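Purely as an illustration of what checking a machine-readable reservation might look like, the sketch below probes a page for a "tdm-reservation" HTTP header in the spirit of the W3C TDM Reservation Protocol draft; the header name, the URL, and the assumption that the site exposes such a header at all are assumptions made for the example, not a statement about how any particular provider implements the opt-out.

```python
import urllib.request

def tdm_reserved(url: str) -> bool:
    # Assumption: the site follows the TDM Reservation Protocol draft and
    # signals an opt-out via a "tdm-reservation: 1" response header.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("tdm-reservation", "0").strip() == "1"

print(tdm_reserved("https://example.com/article"))  # illustrative URL
```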
Secondly, a further element of complexity is that Article 4(2) DSMD stipulates that the reproductions and extractions of content made under Article 4(1) may only be retained "for as long as is necessary for the purposes of text and data mining". In this sense, if one interprets the TDM exception to merely cover the training phase of LLMs (as separate from the validation and testing phases), LLMs should delete copyrighted content used during training immediately after its use. Consequently, these materials could not be employed to validate or test LLMs. In this perspective, to make the text and data mining exception more effective in facilitating LLM development, it is advisable to promote a broad normative interpretation of "text and data mining", encompassing not only the training activity in the strict sense but also the validation and testing of the LLM.
Thirdly, the exception covers only reproductions and extractions, but not modifications of the content, which will often be necessary to bring the material into a format suitable for AI training. Finally, according to Article 7(2) DSMD, the three-step test (Geiger, Griffiths, and Hilty 2008) contained in Article 5(5) of the InfoSoc Directive 2001/29/EC restricts the scope of the TDM exception. According to this general limit to copyright exceptions, contained as well in international treaties (Oliver 2001; Griffiths 2009), such exceptions apply only "in certain special cases which do not conflict with a normal exploitation of the work or other subject-matter and do not unreasonably prejudice the legitimate interests of the rightholder." Importantly, this suggests that the TDM exception cannot justify reproductions that lead to applications that substitute, or otherwise significantly economically compete with, the protected material used for AI training. However, this is, arguably, precisely what many applications are doing (Marcus and Southen 2024). It remains unclear, though, to what extent the three-step test limits individual applications of the TDM exception in concrete cases before the courts, as opposed to being a general constraint on Member States' competence to curtail the ambit of copyright (Griffiths 2009, 3-4).
As mentioned, legal proceedings have recently been brought in the United States and the EU to contest copyright infringement related to materials used in the training phase by AI systems.^{27} While the outcomes of such cases are not necessarily predictive of how analogous cases might be resolved in the EU (for example, in the US the fair use doctrine could be invoked (Gillotte 2020), which lacks exact equivalents in the legal systems of continental Europe), it will be intriguing to observe the approach taken by courts across the Atlantic. Note, particularly, that these cases may, among other things, be decided by the extent to which AI systems substitute for, i.e., compete with, the materials they were trained on (so-called transformativeness; see, e.g., Henderson et al. 2023), a consideration that parallels the debate mentioned above in EU law on the interpretation of the three-step test and its transposition into Member State law (Griffiths 2009, 3-4).
2) Output generation.
It is now worth focusing on the legal issues raised by the generation of outputs by LLMs. In this regard, two different aspects must be primarily addressed: the legal relationship between these outputs and the materials used during the training of LLMs, and the possibility of granting copyright or patent protection to these outputs.
An answer to this complex legal issue could hardly be provided in general and abstract terms; it requires a case-by-case assessment, i.e., comparing a specific LLM-generated output with one or more specific pre-existing materials. Such a comparison could in principle be conducted by applying the legal doctrines currently adopted by courts in cases of copyright or patent infringement (or, when appropriate, the legal doctrines adopted to assess whether a certain work/invention qualifies as a derivative work/invention). In this perspective, indeed, whether the output is generated by a human creator or an AI system does not make a significant legal difference, except in terms of identifying the subject legally accountable for the copyright infringement.
In general terms, however, the use of protected materials in the training of an LLM does not imply, per se, that the LLM-generated outputs infringe upon the intellectual property rights in these materials^{28} or qualify as derivative creations thereof. Broadly speaking, an LLM-generated output could infringe upon legal rights in two main ways. First, if the output exhibits substantial and direct similarities to legally protected elements of pre-existing materials, it would likely violate the (reproduction) right in those materials. Second, if the legally protected aspects or elements of the pre-existing materials appear in the LLM output through indirect, and in any case unauthorized, adaptations or modifications, then this output would likely qualify as a derivative creation from the pre-existing materials (Gervais 2022; Henderson et al. 2023). For instance, the fact that a text generated by an LLM shares the same style as the works of a specific author (as would occur if a prompt such as "write a novel in the style of Dr. Seuss" were used) would not imply, per se, an infringement of the intellectual property rights of that author. This is because, in most European legal systems, the literary or artistic style of an author is not an aspect upon which an exclusive right can be claimed.
If, by contrast, an infringement is found in an LLM output, the person prompting the LLM would first and foremost be liable because she directly brings the reproduction into existence. However, LLM developers might, ultimately, also be liable. The Court of Justice of the European Union (CJEU) has recently determined that if platforms fail to comply with any of three distinct duties of care, they will be directly accountable for violations of the right to publicly communicate a work.^{29} These duties amount to i) expeditiously deleting, or blocking access to, infringing uploads of which the platform has specific knowledge; ii) putting in place the appropriate technological measures that can be expected from a reasonably diligent operator in its situation to counter credibly and effectively copyright infringements if the platform knows or ought to know, in a general sense, that users of its platform are making protected content available to the public illegally via its platform; iii) not providing tools on its platform specifically intended for the illegal sharing of such content and not knowingly promoting such sharing, including by adopting a financial model that encourages users of its platform illegally to communicate protected content.^{30} These duties could, mutatis mutandis, be transposed to LLM developers concerning the right of reproduction (Nordemann 2024, 2023), although such transposition may not be so straightforward. However, this would make good sense, both from a normative perspective encouraging active prevention of copyright infringement and from the perspective of the coherence of EU copyright law across technical facilities.^{31}
A distinct and further legal issue arises when an LLM-generated output can be regarded as an autonomous creation, legally independent from the pre-existing materials. In this scenario, the question pertains to whether such output may be eligible for protection under IP law, specifically through copyright (in the case of literary, artistic, or scientific works) or through patent protection (in the case of an invention) (Engel 2020; Hristov 2016; Klawonn 2023; Varytimidou 2023).
As mentioned at the beginning of this section, the fundamental legal problem here stems from the anthropocentric stance taken by intellectual property law. While international treaties and EU law do not explicitly state that the author or inventor must be human, various normative hints seem to support this conclusion. In the context of copyright, for instance, for a work to be eligible for protection, it must be original, i.e., it must constitute an author's intellectual creation.^{32} This requirement is typically interpreted, also by the Court of Justice of the EU, as requiring the work to reflect the author's personality (something that AI lacks, at least for now). Patent law takes a less marked anthropocentric approach, but even here, the so-called inventive step, which, together with novelty and industrial applicability, is required for an invention to be patentable, is normatively defined in terms of non-obviousness to a person skilled in the art.^{33} The very existence of moral rights (such as the so-called right of paternity) safeguarding the personality of the author or inventor suggests that the subject of protection can only be human.
In light of these brief considerations, we can return to the initial question, namely whether an LLM-generated output may be eligible for protection under intellectual property law.
The answer to this question is relatively straightforward when the LLM constitutes a mere instrument in the hands of a human creator, or, to put it differently, when the creative outcome is the result of predominantly human intellectual activity, albeit assisted or enhanced by an AI system. In such a scenario, the European Parliament has stressed that where AI is used only as a tool to assist an author in the process of creation, the current IP framework remains fully applicable.^{34} Indeed, as far as copyright protection is concerned, the Court of Justice of the EU has made clear in the Painer case^{35} that it is certainly possible to create copyright-protected works with the aid of a machine or device. A predominant human intellectual activity can be recognized, also based on the CJEU case law, when the human creator using an LLM makes free and creative choices in the phases of conception, execution, and/or redaction of the work (Hugenholtz and Quintais 2021).
A similar conclusion can be drawn regarding the patent protection of inventive outcomes generated with the support of an LLM (Engel 2020). In this perspective, as noted by some scholars, it would likely be necessary to adopt a broader interpretation of the inventive step requirement, which should be understood in terms of non-obviousness to a person skilled in the art assisted by AI, i.e., an AI-aided human expert (Ramalho 2018; Abbott 2018).
An opposite conclusion is often reached when the LLM operates in a substantially autonomous manner. For the sake of clarity, it is necessary to explain the meaning of "autonomous" as used in this context (Dornis 2021). Obviously, in the current state of technology, some degree of human intervention (at the very least in the form of prompts) will always be necessary for an LLM to generate any output. However, the mere formulation of a prompt by a human being is likely insufficient to recognize a substantial human contribution to the creative output generated by the LLM. The fundamental legal aspect is that a notable human contribution must be discernible not in the broader creative process, but specifically in the resulting creative outcome. This condition is not met when human intervention merely involves providing a prompt to an LLM or even when minor, legally insignificant modifications are made to the creative outcome generated by the LLM (e.g., minor editing of an LLM-generated text). By contrast, a level of IP protection might be appropriate for significant modifications made to the text produced by the LLM.
The conclusion above, which argues against copyright or patent protection for content generated by LLMs in a substantially autonomous manner, finds confirmation in the positions taken on this issue by, e.g., the US Copyright Office,^{36} affirmed by the United States District Court for the District of Columbia,^{37} and the European Patent Office.^{38} Furthermore, such a conclusion is consistent with the fundamental rationale of intellectual property of promoting and protecting human creativity, as also reflected at the normative level.^{39}
However, some authors have observed (sometimes with critical undertones) that a rationale for protecting LLMs' autonomously generated content is the need to protect investments, made by individuals and/or organizations, aimed at bringing creative products to the market (Hilty, Hoffmann, and Scheuerer 2021; Geiger, Frosio, and Bulayenko 2018).
In this case, the further issue emerges of determining to whom such intellectual property rights should be granted. Some national legislation (not coincidentally, following the common law tradition, which exhibits a less pronounced anthropocentric character compared to the civil law tradition) acknowledges the possibility of protecting computer-created works (Goold 2021), i.e., works "generated by computer in circumstances such that there is no human author of the work",^{40} granting the copyright to the person "by whom the arrangements necessary for the creation of the work are undertaken".^{41} The identity of such a person, however, remains somewhat unclear, as this could be, depending on the circumstances, the developer of the LLM, its trainer, or its user, possibly even jointly (Guadamuz 2021).
In civil law systems, while awaiting a potential ad hoc regulatory intervention, a possible solution could involve applying to LLM-generated outputs the same principle that applies to works and inventions created by an employee within the scope of an employment contract. In such cases, in most EU legal systems, copyright or patent rights are vested in the employer. Similarly, in situations where the "employee" is artificial, the intellectual property right could be granted to the user of the LLM in the course of entrepreneurial activity (Spedicato 2019).
5. Cybersecurity
Cybersecurity is a complex and, in the current geopolitical environment marked by armed and non-armed conflicts in many parts of the world, increasingly urgent matter. The EU has tackled this area with a range of instruments and provisions that apply, to varying degrees, to Generative AI models, too.
1) The Cyber Resilience Act and the AI Act.
While the GDPR, in Art. 32, does mandate state-of-the-art cybersecurity measures for any personal data processing, this provision does not, at least not easily, apply to industrial data (Purtova 2018), which is, however, often the target of cyberattacks.
This gap is supposed to be filled by the Cyber Resilience Act (CRA), recently approved by the EU Parliament. It introduces cybersecurity measures for digital products across Europe. Targeting both hardware and software, the act mandates that Products with Digital Elements (PDEs) adhere to certain cybersecurity standards from design to deployment. A PDE is defined as 'a software or hardware product and its remote data processing solutions, including software or hardware components being placed on the market separately' (Article 3(1) CRA). Hence, AI systems will generally constitute PDEs, to the extent that they are placed on the market in the EU.
The CRA establishes a comprehensive framework to bolster cybersecurity measures across the European Union. It introduces a staggered approach to securing PDEs, starting with Article 6, which mandates that all PDEs must meet basic cybersecurity requirements to enter the EU market. These essential requirements are outlined in Annex I of the CRA and adopt a risk-based methodology. They encompass a wide range of measures, including conducting cybersecurity risk assessments to eliminate known vulnerabilities, implementing exploitation mitigation mechanisms, ensuring security by default, providing cybersecurity updates automatically, protecting against unauthorized access, ensuring the confidentiality and integrity of data, requiring incident reporting, and maintaining resilience against DDoS attacks. Additionally, it necessitates ongoing responsibilities throughout the product's lifecycle, such as promptly addressing emerging vulnerabilities, conducting regular security testing, and disseminating security patches swiftly.
For products deemed as 'important PDEs,' Article 7 stipulates that they must adhere to more stringent requirements, including undergoing conformity assessments. This classification is determined based on a specified list in Annex III, which includes important components like operating systems, browsers, personal information management systems, cybersecurity-related systems, and password managers. The integration of AI in any of these listed products automatically subjects the AI models to these enhanced cybersecurity protocols, ensuring a robust defense mechanism is in place against potential cyber threats.
‘Critical PDEs,’ under the scrutiny of Article 8, are required to implement substantial cybersecurity measures. The CRA empowers the Commission to designate what constitutes a critical PDE through delegated acts, referencing an exhaustive list of products that are integral to cybersecurity infrastructure, such as hardware devices with security boxes, smart meter gateways, and smart cards. The use of AI within these specified settings mandates compliance with the substantial cybersecurity framework. This constitutes the highest security level; however, Member States may establish even more stringent obligations for products used in national security or defense. The CRA’s dynamic structure, allowing for the updating of Annexes by the Commission, ensures that the legislative framework can, at least in theory, adapt to the rapidly evolving cyber threat landscape and technological advancements. 在第 8 条的审查下,'关键 PDEs'需要实施重大网络安全措施。CRA 赋予委员会通过授权法案指定什么构成关键 PDE,引用与网络安全基础设施不可或缺的产品清单,如带有安全箱的硬件设备、智能电表网关和智能卡。这些特定环境中 AI 的使用要求遵守重大网络安全框架。这构成了最高的安全级别;但是,成员国可以为用于国家安全或国防的产品制定更加严格的义务。CRA 的动态结构允许委员会更新附件,这确保立法框架至少在理论上可以适应快速发展的网络威胁环境和技术进步。
Although it broadly encompasses AI systems under the category of PDEs, the CRA specifically delineates targeted requirements for high-risk AI systems in accordance with the classification set forth in the AIA (Article 8 CRA). To obtain a declaration of conformity, such products must comply with the CRA's essential requirements as detailed in Annex I. As mentioned, this encompasses a range of measures; data processing should be limited strictly to what is necessary for the product's intended purpose, emphasizing data minimization.
Hence, the CRA does not explicitly address Generative AI or LLMs. This gap likely stems from the CRA's alignment with an earlier version of the AIA that did not encompass Generative AI or LLMs. Yet, interpreting the CRA legislator's intent as seeking to specifically target the most potentially hazardous AI systems through Article 8 and Annex I, and to maintain systemic coherence within the EU legal framework (especially in alignment with the AIA), it becomes evident that the CRA may benefit from adjustments to explicitly encompass Generative AI and align it with the requirements in the AI Act.
Adapting the CRA to explicitly include Generative AI should be relatively straightforward. The AIA has already laid down a risk-tiered classification and specific regulations for Generative AI (i.e., GPAI). This pre-existing framework offers a clear pathway for incorporating Generative AI into the CRA, potentially through the European Commission's delegated acts. Such integration would enhance the CRA's effectiveness in governing AI technologies and align it more closely with the evolving landscape of AI and its potential risks, thereby reinforcing the EU's commitment to a comprehensive and harmonized legal framework for AI regulation.
Importantly, the AIA currently only mandates cybersecurity measures for high-risk systems (Art. 15) and for GPAI with systemic risk (Art. 55). The Joint Research Centre has issued helpful guidance for interpreting and implementing cybersecurity in the context of AI systems (Joint Research Centre (European Commission) et al. 2023). However, in our view, the regulatory framework in the AIA fails to mirror the
fundamental importance of cybersecurity in our age. Generative AI models, in particular, are bound to become new building blocks for literally thousands of derived apps and products, functioning much like a new operating system in some respects. Hence, a backdoor created via insufficient cybersecurity will potentially enable attackers to exploit vulnerabilities in a range of derivative products. Therefore, economic efficiency (patching vulnerabilities once upstream instead of manifold times downstream) and prudence argue for stringent and obligatory cybersecurity measures for all GPAI, not only the largest ones ("systemic risk"), such as GPT-4 or Gemini. Strategic rivals, both nation-states and non-state actors, will be actively trying to exploit any vulnerabilities in advanced AI systems, particularly if the systems are widely used and integrated. Not addressing these threats for all GPAI seems naïve at best, and irresponsible in the current and future geopolitical climate.
Hence, in our view, general-purpose AI systems should be included under the categories of Annex III CRA. This would ensure that they fulfill the most stringent cybersecurity requirements, including conformity assessments. In the current geopolitical climate, and with the importance of foundation models starting to rival that of operating systems (which are included in Annex III CRA already), this seems like a sensible update. In addition, a link between Article 55 AI Act and the CRA should be included for the cybersecurity requirements concerning systemic-risk GPAIs, mirroring the integration of cybersecurity obligations for high-risk AI systems into the AI Act (Article 12 CRA).
In short, generative AI legislation needs a critical cybersecurity patch. Below, we show that several specific cybersecurity concerns remain unaddressed by the current regulatory landscape, including the AIA, CRA, and broader EU legislation.
2) Adversarial attacks.
The complexity and high dimensionality of Generative AI models make them particularly susceptible to adversarial attacks, i.e., attempts to deceive the model and induce incorrect outputs (such as misclassification) by feeding it carefully crafted, adversarial data. Cybersecurity is a national competence (Cybersecurity Act, Recital 5), but joint efforts to address it should still be pursued at the EU level, going beyond the general principle of AI robustness. Importantly, the AIA mandates high-risk systems to implement technical measures to prevent or control attacks trying to manipulate the training dataset ('data poisoning'), inputs designed to cause the model to make a mistake ('adversarial examples'), or model flaws (Article 15, AIA). The EU's Joint Research Centre has recently unveiled a comprehensive guidance document on cybersecurity measures in the context of AI and LLMs (Joint Research Centre (European Commission) et al. 2023). The European Parliament's draft legislation adds another layer: Article 28b asks GPAI providers to build in "appropriate cybersecurity and safety" safeguards, echoing the two-tiered approach tentatively agreed upon in the trilogue (Hacker 2023c). However, effectively countering adversarial attacks requires careful prioritization and targeting within any AI system, not just high-risk ones.
The AIA’s risk levels, based on the likelihood of an AI system compromising fundamental legal values, are not a reliable predictor of vulnerability to adversarial attacks. Some AI deemed as high-risk by the AIA, e.g., for vocational training, may not have those technical traits that trigger adversarial attacks, and vice versa. Therefore, the AIA, and by extension the CRA which relies on its risk classification, should 根据 AI 系统可能危及基本法律价值的可能性,AIA 的风险级别并不能可靠地预测对抗性攻击的易受性。一些被 AIA 评为高风险的 AI 系统(例如职业培训系统)可能没有引发对抗性攻击的那些技术特征,反之亦然。因此,AIA 及其风险评估依赖的 CRA 应该
provide, through supplementary implementation acts, technical safeguards that are proportionate to the attack-triggers of a specific LLM, independently of the AIA risk levels. Attack-triggers include model complexity, overfitting, linear behaviour, gradient-based optimization, and exposure to universal adversarial triggers like inputagnostic sequences of tokens (Wallace et al. 2021). Finally, novel methods to counter adversarial attacks might involve limiting LLM access to trusted users or institutions and restricting the quantity or nature of user queries (Goldstein et al. 2023). 通过补充实施法案,提供与特定LLM的攻击触发器成比例的技术保护措施,而不考虑 AIA 风险等级。攻击触发器包括模型复杂性、过拟合、线性行为、基于梯度的优化以及暴露于通用对抗性触发器(如输入无关的标记序列)(Wallace 等,2021)。最后,抵御对抗性攻击的新方法可能包括限制LLM的访问权限,只允许经过信任的用户或机构访问,并限制用户查询的数量或性质(Goldstein 等,2023)。
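To make the notion of an 'adversarial example' concrete, the following toy sketch applies the fast gradient sign method (FGSM) to a linear classifier: a small, deliberately crafted perturbation flips the model's decision. The weights and inputs are made-up illustrative values, not taken from any real system.

```python
import numpy as np

# Toy linear classifier: score = x . w + b, class 1 if score > 0.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x):
    return 1 if x @ w + b > 0 else 0

x = np.array([0.2, -0.1, 0.4])   # benign input, classified as 1
epsilon = 0.3                     # perturbation budget

# For a linear model the gradient of the score w.r.t. the input is w;
# stepping against the sign of the gradient lowers the score (FGSM).
x_adv = x - epsilon * np.sign(w)

print("clean prediction:", predict(x))          # 1
print("adversarial prediction:", predict(x_adv))  # flipped to 0
```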
3) Misinformation.
LLMs can disseminate misinformation easily, widely, and at low cost by attributing a high probability to false or misleading claims. This is mainly due to web-scraped training data containing false or non-factual (e.g., fictional) information, which lacks truth value when taken out of context. At other times, an opinion reflecting the majority's viewpoint is misrepresented as truth, despite not being a verified fact. Misinformation may facilitate fraud, scams, targeted and non-targeted manipulation (e.g., during elections) (AlgorithmWatch AIForensics 2023), and cyber-attacks (Weidinger et al. 2021; Ranade et al. 2021).
A concerning aspect of natural language processing (NLP) in general is the phenomenon of "hallucinations". It refers to the generation of seemingly plausible text that diverges from the input data or contradicts factual information (Ye et al. 2023). These hallucinations arise due to the models' tendency to extrapolate beyond their training data and synthesize information that aligns with their internal patterns, even if it is not supported by evidence. As a result, while NLP models may produce texts that demonstrate coherence, linguistic fluidity, and a semblance of authenticity, their outputs often lack fidelity to the original input and/or are misaligned with empirical truth and verifiable facts (Ji et al. 2023). This can lead to a situation where uncritical reliance on LLMs results in erroneous decisions and a cascade of negative consequences (Zhang et al. 2023), including the spread of misinformation, especially if false outputs are shared without critical evaluation.
There are different kinds of LLMs’ hallucinations (Ye et al. 2023) but we cannot discuss them here in detail. In the recent generation of LLMs-e.g., GPT4 and Bardthe ‘Question and Answer’ kind is particularly frequent. These hallucinations manifest due to the models’ tendency to provide answers even when presented with incomplete or irrelevant information (Ye et al. 2023; Adlakha et al. 2023). A recent study found that hallucinations are particularly common when using LLMs on a wide range of legal tasks (Dahl et al. 2024). 有不同类型的LLMs幻觉(Ye 等,2023),但我们在此不能详细讨论。在最近一代LLMs中,例如 GPT4 和 Bard,'问答'类型尤为常见。这些幻觉的出现源于这些模型即使在获取不完整或无关信息的情况下也倾向于提供答复(Ye 等,2023;Adlakha 等,2023)。一项最新研究发现,在广泛的法律任务中使用LLMs时,幻觉特别普遍(Dahl 等,2024)。
EU legislation lacks specific regulations for misinformation created by Generative AI. As LLMs become increasingly integrated into online platforms, expanding the Digital Services Act (DSA) to include them, and mandating online platforms to prevent misinformation, seems the most feasible approach. Also, the project to strengthen the EU Code of Practice on Disinformation (2022) can contribute, though the voluntary nature of adherence to it reduces its overall effectiveness. Tackling LLM-generated misinformation requires updating both the AIA and the DSA. The DSA contains a range of provisions that can be fruitfully applied to LLMs: e.g., Article 22, which introduces "trusted flaggers" to report illegal content to providers and document their notification (Hacker, Engel and Mauer 2023).
However, it is essential to broaden the DSA's scope and the content subject to the platform removal duty, which currently covers only illegal content, as LLM-generated misinformation may be completely lawful (Berz, Engel, and Hacker 2023). Being the most technology-focused regulation, the AIA, or its implementing acts, should tackle design and development guidelines to prevent LLMs from spreading misinformation. Normative adjustments should not focus only on the limitation of dataset size but also explore innovative strategies that accommodate LLMs' data hunger. Some measures might be the same as (or similar to) those mentioned for adversarial attacks, such as restricting LLM usage to trusted users with limited interactions to prevent online misinformation proliferation,^{42} while others may include innovative ideas like fingerprinting LLM-generated texts, training models on traceable radioactive data, or enhancing fact sensitivity using reinforcement learning techniques (Goldstein et al. 2023).
Specific solutions to address hallucinations in LLMs are crucial for mitigating the spread of misinformation and should be employed in policy-related applications. Numerous approaches have been proposed in the literature to address this challenge (Tonmoy et al. 2024; Ye et al. 2023). Some of these solutions are broad strategies that focus on optimizing dataset construction, such as implementing a self-curation phase within the instruction construction process. During this phase, the LLM identifies and selects high-quality demonstration examples (candidate pairs of prompts and responses) to fine-tune the underlying model to better follow instructions (Li et al. 2023). Other strategies address the alignment of LLMs with specific downstream applications, which can benefit from supervised fine-tuning (Chung et al. 2022), as hallucinations often arise from discrepancies between the model's capabilities and the application's requirements (Ye et al. 2023).
Other approaches are narrower and focus on specific techniques, such as prompt engineering, to optimize the output generated by LLMs. These include incorporating external authoritative knowledge bases (retrieval-augmented generation) (Kang, Ni, and Yao 2023) or introducing innovative decoding strategies or faithfulness-based loss functions (Tonmoy et al. 2024).^{43}
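As a hedged illustration of the retrieval-augmented pattern just mentioned, the sketch below grounds an answer in retrieved reference passages rather than in the model's parametric memory alone; the toy bag-of-words retrieval, the sample passages, and the `call_llm` placeholder are all assumptions made for the example, not a description of any cited system.

```python
import numpy as np

# Reference passages the answer should be grounded in (illustrative).
DOCS = [
    "Article 4(3) DSMD allows commercial TDM unless rights are reserved.",
    "Article 17 GDPR grants data subjects a right to erasure.",
    "Article 15 AIA requires robustness measures for high-risk systems.",
]

def embed(text, vocab):
    # Toy bag-of-words embedding: counts of each vocabulary word.
    tokens = text.lower().split()
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def retrieve(query, docs, k=1):
    # Rank documents by cosine similarity to the query and keep the top k.
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    scores = []
    for d in docs:
        v = embed(d, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v) or 1.0
        scores.append(float(q @ v / denom))
    best = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in best]

def call_llm(prompt):
    # Hypothetical stand-in for any text-generation API call.
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

query = "Does the GDPR provide a right to erasure?"
context = "\n".join(retrieve(query, DOCS))
print(call_llm(f"Answer using only this context:\n{context}\n\nQ: {query}"))
```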
Another technical solution to mitigate hallucinations in LLMs worth considering is the Multiagent Debate approach, where multiple LLMs engage in an iterative process of proposing, debating, and refining their responses to a given query (Du et al. 2023). The aim is to achieve a consensus answer that is not only more accurate and factually correct but also preserves the richness of multiple perspectives (Ye et al. 2023). This approach draws inspiration from judicial techniques, particularly cross-examination, to foster a more rigorous examination of the LLMs' responses (Cohen et al. 2023).
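In the same hedged spirit, the following sketch shows only the bare structure of such a multiagent-debate loop; `ask_model` is a hypothetical stand-in for a real LLM call, and the aggregation of the final answers into a consensus is left open.

```python
def ask_model(agent_id, question, peer_answers):
    # Placeholder: a real implementation would call an LLM here, passing
    # the question together with the peers' previous answers.
    return f"agent-{agent_id} answer after seeing {len(peer_answers)} peers"

def debate(question, n_agents=3, n_rounds=2):
    # Round 0: each agent answers independently.
    answers = [ask_model(i, question, []) for i in range(n_agents)]
    # Later rounds: each agent revises its answer after reading the others.
    for _ in range(n_rounds):
        answers = [
            ask_model(i, question, [a for j, a in enumerate(answers) if j != i])
            for i in range(n_agents)
        ]
    return answers  # a final consensus step would aggregate these

print(debate("Which article of the GDPR covers erasure?"))
```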
4) Ways forward: NIS2.
The EU's updated Network and Information Systems Directive (NIS2 Directive) signifies a major update to the bloc's cybersecurity framework, superseding the initial Network and Information Systems Directive. Now formally adopted and due to be transposed by Member States by October 2024, NIS2 extends coverage to more sectors and entities (Annexes I and II).
NIS2 mandates that designated essential and important entities adopt measures across technical, operational, and organizational domains to address risks to their network and information systems (Article 3 NIS2). These precautions aim to either prevent or mitigate the effects of cyber incidents on users, maintaining security proportionate to assessed risks (Article 21(1) NIS2). It also introduces requirements for enhancing supply chain security, focusing on the relationship with direct suppliers and service providers, to shield against cyber incidents.
The NIS2 Directive significantly expands cybersecurity measures beyond those of its predecessor, the NIS Directive, covering additional sectors and entities. This makes it highly relevant for those working on Generative AI, including the digital infrastructure and services sectors, which naturally involve companies working with (Generative) AI. Additionally, NIS2 mandates quick incident reporting, requiring entities to inform authorities within 24 hours of certain cybersecurity incidents (Article 23(4)(a) NIS2). This is crucial for the AI sector, where only a rapid response to security breaches can mitigate the consequences, such as the exploitation of AI vulnerabilities or malicious AI activities.
In this context, the interplay between the NIS2 Directive and the CRA is crucial, particularly in how NIS2 can enhance or compensate for the CRA's limitations. For instance, the CRA proposal focuses on ensuring high cybersecurity standards for products with digital elements (PDEs), yet it does not fully extend these standards to services, except for "remote data processing solutions" (Article 3 CRA) (Eckhardt and Kotovskaia 2023). This gap could leave various Generative AI models without adequate cybersecurity coverage, especially when these technologies are integrated into products or services beyond remote data processing. This includes scenarios where Generative AI and LLMs are part of more complex systems or services that offer decision-making, content generation, or predictive analytics. The NIS2 Directive takes a broader approach by targeting essential and important entities, including cloud computing service providers. This implies that if Generative AI and LLMs are offered through cloud services that meet the criteria for being considered essential or important (e.g., due to their size or the critical nature of the services they provide), they would fall under the cybersecurity and incident notification requirements of NIS2.
6. Conclusion
State-of-the-art Generative AI models in general, and LLMs in particular, exhibit high performance across a broad spectrum of tasks, but their unpredictable outputs raise concerns about the lawfulness and accuracy of the generated content. Overall, the EU does not seem adequately prepared to cope with these novelties. Policy proposals include updating current and forthcoming regulations, especially those encompassing AI more broadly, as well as the enactment of specific regulations for Generative AI. In this article, we have offered an overall analysis of some of the most pressing challenges and some suggestions about how to address them. The broader point about how best to proceed in the development of a very complex and yet entirely coherent EU architecture of "digital laws" remains to be addressed. Ultimately, technological solutions may help if we start asking not only what the law can do for the development of socially preferable AI, but also what AI can do to improve the relevance, coherence and timeliness of the law. But this is a topic beyond the scope of this article.
7. References
Abbott, Ryan. 2018. 'Everything Is Obvious'. UCLA Law Review. https://www.uclalawreview.org/everything-is-obvious/.
Adlakha, Vaibhav, Parishad Behnam Ghader, Xing Han Lu, Nicholas Meade, and Siva Reddy. 2023. 'Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering'. arXiv. https://doi.org/10.48550/arXiv.2307.16877.
AlgorithmWatch AIForensics. 2023. 'An Analysis of Microsoft's Bing Chat Generative AI and Elections: Are Chatbots a Reliable Source of Information for Voters?'
Article 29 Data Protection Working Party. 2018. 'Guidelines on Transparency under Regulation 2016/679, WP260 rev.01'.
Bederman, David J. 2010. ‘The Souls of International Organizations: Legal Personality and the Lighthouse at Cape Spartel’. In International Legal Personality. Routledge. 伯德曼,大卫·J.2010 年。'国际组织的灵魂:法律人格和开普斯帕特尔灯塔'。载于《国际法人格》。伦敦:劳特利奇。
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. 'On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 2’. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-23. Virtual Event Canada: ACM. https://doi.org/10.1145/3442188.3445922. 本德,艾米丽·M.,蒂姆尼特·盖布鲁,安吉丽娜·麦克米伦-梅杰,玛格丽特·米切尔。2021 年。《论随机口述鹦鹉的危险性:语言模型会太大吗?2》。载于《2021 年 ACM 公平性、责任和透明度会议论文集》,610-23 页。加拿大虚拟事件:ACM。https://doi.org/10.1145/3442188.3445922。
Berz, Amelie, Andreas Engel, and Philipp Hacker. 2023. “Generative KI, Datenschutz, Hassrede und Desinformation-Zur Regulierung von KI-Meinungen.” Zeitschrift für Urheber- und Medienrecht: 586. 贝尔兹、艾米丽、安德里亚斯·恩格尔和菲利普·哈克.2023."生成式人工智能、数据隐私、仇恨言论和虚假信息——关于人工智能观点的监管."知识产权与媒体法杂志:586.
Bi, B., Shokouhi, M., Kosinski, M., & Graepel, T. (2013). Inferring the demographics of search users: Social data meets search queries. Proceedings of the 22nd international conference on World Wide Web. 毕, 博科希, 科辛斯基, 格雷佩尔. (2013). 推断搜索用户的人口统计学特征:社交数据与搜索查询. 第 22 届万维网国际会议论文集.
Biderman, Stella, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, and Edward Raf. 2023. “Emergent and predictable memorization in large language models.” arXiv preprint arXiv:2304.11158. 碧德曼、薩伊·普拉桑斯、琳塔·苏塔维卡、海利·舒柯普夫、奎丁·安东尼、希瓦希·普罗希特和爱德华·拉夫。2023 年。"大型语言模型中的自然和可预测的记忆化"。arXiv 预印本 arXiv:2304.11158。
Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, et al. 2022. ‘On the Opportunities and Risks of Foundation Models’. arXiv. https://doi.org/10.48550/arXiv.2108.07258. 波马萨尼、利什、德鲁·A·哈德森、埃赫桑·阿德利、拉斯·奥尔特曼、西蒙·阿罗拉、悉尼·冯·阿尔克斯、迈克尔·S·伯恩斯坦等. 2022. '论基础模型的机遇和风险'. arXiv. https://doi.org/10.48550/arXiv.2108.07258.
Bonatti, Piero A., and Sabrina Kirrane. 2019. ‘Big Data and Analytics in the Age of the GDPR’. 2019 IEEE International Congress on Big Data (BigDataCongress), July, 716. https://doi.org/10.1109/BigDataCongress.2019.00015. 博纳蒂,皮耶罗 A.和萨布丽娜·基拉内。 2019 年。 "GDPR 时代的大数据和分析"。 2019 年 IEEE 大数据大会(BigDataCongress),7 月,71-6。 https://doi.org/10.1109/BigDataCongress.2019.00015.
Borkar, Jaydeep. 2023. 'What Can We Learn from Data Leakage and Unlearning for Law?'
Brown, Hannah, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. 2022. 'What Does It Mean for a Language Model to Preserve Privacy?' In 2022 ACM Conference on Fairness, Accountability, and Transparency, 2280-92. FAccT '22. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3531146.3534642.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. 'Language Models Are Few-Shot Learners'. arXiv. https://doi.org/10.48550/arXiv.2005.14165.
Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, et al. 2021. 'Extracting Training Data from Large Language Models'. arXiv. https://doi.org/10.48550/arXiv.2012.07805.
Carlini, Nicolas, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, and Eric Wallace. 2023. 'Extracting Training Data from Diffusion Models'. In USENIX Security 2023, 5253-70. https://www.usenix.org/conference/usenixsecurity23/presentation/carlini.
Chaturvedi, R., and S. Chaturvedi. 2024. 'It's All in the Name: A Character-Based Approach to Infer Religion'. Political Analysis 32 (1): 34-49.
Chung, Hyung Won, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, et al. 2022. 'Scaling Instruction-Finetuned Language Models'. arXiv. https://doi.org/10.48550/arXiv.2210.11416.
Clifford Chance. 2023. 'The EU's AI Act: What Do We Know about the Critical Political
Cohen, Roi, May Hamri, Mor Geva, and Amir Globerson. 2023. 'LM vs LM: Detecting Factual Errors via Cross Examination'. arXiv. https://doi.org/10.48550/arXiv.2305.13281.
Dahl, Matthew, Varun Magesh, Mirac Suzgun, and Daniel E. Ho. 2024. 'Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models'. arXiv, 2 January 2024. https://doi.org/10.48550/arXiv.2401.01301.
Dheu, Orian, Jan De Bruyne, and Charlotte Ducuing. 2022. 'The European Commission's Approach to Extra-Contractual Liability and AI - A First Analysis and Evaluation of the Two Proposals'. CiTiP Working Paper Series. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4239792.
De Cristofaro, Emiliano. 2020. 'An Overview of Privacy in Machine Learning'. arXiv. https://doi.org/10.48550/arXiv.2005.08679.
de la Durantaye, Katharina. 2023. '"Garbage In, Garbage Out" - Die Regulierung generativer KI durch Urheberrecht'. ZUM 10 (2023): 645-660.
Dermawan, Artha. n.d. 'Text and Data Mining Exceptions in the Development of Generative AI Models: What the EU Member States Could Learn from the Japanese "Nonenjoyment" Purposes?' The Journal of World Intellectual Property. Accessed 13 August 2023. https://doi.org/10.1111/jwip.12285.
Donnelly, Mary, and Maeve McDonagh. 2019. 'Health Research, Consent, and the GDPR Exemption'. European Journal of Health Law 26 (2): 97-119. https://doi.org/10.1163/15718093-12262427.
Dornis, Tim W. 2021. 'Of "Authorless Works" and "Inventions without Inventor": The Muddy Waters of "AI Autonomy" in Intellectual Property Doctrine'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3776236.
Du, Yilun, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. 2023. 'Improving Factuality and Reasoning in Language Models through Multiagent Debate'. arXiv, 23 May 2023. https://arxiv.org/abs/2305.14325v1.
Durantaye, Katharina de la. 2023. '"Garbage In, Garbage Out" - Die Regulierung Generativer KI Durch Urheberrecht'. SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=4571908.
Eckhardt, Philipp, and Anastasia Kotovskaia. 2023. 'The EU's Cybersecurity Framework: The Interplay between the Cyber Resilience Act and the NIS 2 Directive'. International Cybersecurity Law Review 4: 147-164. https://doi.org/10.1365/s43439-023-00084-z.
Engel, Andreas. 2020. 'Can a Patent Be Granted for an AI-Generated Invention?' GRUR International 69 (11): 1123-29. https://doi.org/10.1093/grurint/ikaa117.
Falco, Gregory, Ben Shneiderman, Julia Badger, Ryan Carrier, Anton Dahbura, David Danks, Martin Eling, et al. 2021. 'Governing AI Safety through Independent Audits'. Nature Machine Intelligence 3 (7): 566-71. https://doi.org/10.1038/s42256-021-00370-7.
Feldman, Vitaly. 2021. 'Does Learning Require Memorization? A Short Tale about a Long Tail'. arXiv. https://doi.org/10.48550/arXiv.1906.05271.
Floridi, Luciano. 2023. 'Machine Unlearning: Its Nature, Scope, and Importance for a "Delete Culture"'. Philosophy & Technology 36 (2): 42. https://doi.org/10.1007/s13347-023-00644-5.
Foster, David. 2022. Generative Deep Learning. O'Reilly.
Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. 2015. 'Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures'. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1322-33. CCS '15. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2810103.2813677.
Ganguli, Deep, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, et al. 2022. 'Predictability and Surprise in Large Generative Models'. In 2022 ACM Conference on Fairness, Accountability, and Transparency, 1747-64. https://doi.org/10.1145/3531146.3533229.
Geiger, Christophe, and Bernd Justin Jütte. 2021. 'Towards a Virtuous Legal Framework for Content Moderation by Digital Platforms in the EU? The Commission's Guidance on Article 17 CDSM Directive in the Light of the YouTube/Cyando Judgement and the AG's Opinion in C-401/19'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3889049.
Geiger, Christophe, Giancarlo Frosio, and Oleksandr Bulayenko. 2018. 'The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3160586.
Gervais, Daniel. 2022. 'AI Derivatives: The Application to the Derivative Work Right to Literary and Artistic Productions of AI Machines'. Seton Hall Law Review 53 (4). https://scholarship.law.vanderbilt.edu/faculty-publications/1263.
Gil Gonzalez, Elena, and Paul de Hert. 2019. 'Understanding the Legal Provisions That Allow Processing and Profiling of Personal Data - An Analysis of GDPR Provisions and Principles'. ERA Forum 2019 (4): 597-621. https://doi.org/10.1007/s12027-018-0546-z.
Goldstein, Josh A., Girish Sastry, Micah Musser, Renee DiResta, Matthew Gentzel, and Katerina Sedova. 2023. 'Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations'. arXiv. https://doi.org/10.48550/arXiv.2301.04246.
Goold, Patrick Russell. 2021. 'The Curious Case of Computer-Generated Works under the Copyright, Designs and Patents Act 1988'. SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=4072004.
Griffiths, Jonathan. 2009. 'The "Three-Step Test" in European Copyright Law: Problems and Solutions'. SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=1476968.
Guadamuz, Andres. 2021. 'Do Androids Dream of Electric Copyright? Comparative Analysis of Originality in Artificial Intelligence Generated Works'. In Artificial Intelligence and Intellectual Property, edited by Jyh-An Lee, Reto Hilty, and Kung-Chung Liu. Oxford University Press. https://doi.org/10.1093/oso/9780198870944.003.0008.
Hacker, Philipp, Andreas Engel, and Marco Mauer. 2023. 'Regulating ChatGPT and Other Large Generative AI Models'. arXiv. https://doi.org/10.48550/arXiv.2302.02337.
Hacker, Philipp. 2021. 'A Legal Framework for AI Training Data - From First Principles to the Artificial Intelligence Act'. Law, Innovation and Technology 13 (2): 257-301. https://doi.org/10.1080/17579961.2021.1977219.
—. 2023a. 'The European AI Liability Directives - Critique of a Half-Hearted Approach and Lessons for the Future'. Computer Law & Security Review 51 (November): 105871. https://doi.org/10.1016/j.clsr.2023.105871.
—. 2023b. 'What's Missing from the EU AI Act: Addressing the Four Key Challenges of Large Language Models'. Verfassungsblog, December. https://doi.org/10.17176/20231214-111133-0.
—. 2023c. 'Statement on the AI Act Trilogue Results'. Working Paper, on arXiv.
—. 2024. 'Sustainable AI Regulation'. Common Market Law Review (forthcoming). https://arxiv.org/abs/2306.00292.
Henderson, Peter, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, and Percy Liang. 2023. 'Foundation Models and Fair Use'. arXiv. https://doi.org/10.48550/arXiv.2303.15715.
Hilty, Reto M., Jörg Hoffmann, and Stefan Scheuerer. 2021. 'Intellectual Property Justification for Artificial Intelligence'. In Artificial Intelligence and Intellectual Property, edited by Jyh-An Lee, Reto Hilty, and Kung-Chung Liu. Oxford University Press. https://doi.org/10.1093/oso/9780198870944.003.0004.
Hine, Emmie, Claudio Novelli, Mariarosaria Taddeo, and Luciano Floridi. 2023. 'Supporting Trustworthy AI Through Machine Unlearning'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.4643518.
Hristov, K. 2016. 'Artificial Intelligence and the Copyright Dilemma'. IDEA 57: 431.
Hu, Zhiting, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2018. 'Toward Controlled Generation of Text'. arXiv. https://doi.org/10.48550/arXiv.1703.00955.
Hugenholtz, P. Bernt, and João Pedro Quintais. 2021. 'Copyright and Artificial Creation: Does EU Copyright Law Protect AI-Assisted Output?' IIC - International Review of Intellectual Property and Competition Law 52 (9): 1190-1216. https://doi.org/10.1007/s40319-021-01115-0.
Ji, Ziwei, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. 'Survey of Hallucination in Natural Language Generation'. ACM Computing Surveys 55 (12): 248:1-248:38. https://doi.org/10.1145/3571730.
Jo, Eun Seo, and Timnit Gebru. 2020. 'Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning'. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 306-16. FAT* '20. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3351095.3372829.
Joint Research Centre (European Commission), Henrik Junklewitz, Ronan Hamon, Antoine-Alexandre André, Tatjana Evas, Josep Soler Garrido, and Ignacio Sanchez Martin. 2023. Cybersecurity of Artificial Intelligence in the AI Act: Guiding Principles to Address the Cybersecurity Requirement for High Risk AI Systems. LU: Publications Office of the European Union. https://data.europa.eu/doi/10.2760/271009.
Kang, Haoqiang, Juntong Ni, and Huaxiu Yao. 2023. 'Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification'. arXiv. https://doi.org/10.48550/arXiv.2311.09114.
Kinsella, Bret. 2023. 'What Is GPTBot and Why You Want OpenAI's New Web Crawler to Index Your Content'. Synthedia (Substack newsletter), 7 August 2023. https://synthedia.substack.com/p/what-is-gptbot-and-why-you-want-openais?utm_medium=reader2.
Klawonn, Thilo. 2019. 'Urheberrechtliche Grenzen des Web Scrapings (Web Scraping under German Copyright Law)'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3491192.
Lee, Cheolhyoung, Kyunghyun Cho, and Wanmo Kang. 2020. 'Mixout: Effective Regularization to Finetune Large-Scale Pretrained Language Models'. arXiv. https://doi.org/10.48550/arXiv.1909.11299.
Lee, Jyh-An, Reto Hilty, and Kung-Chung Liu, eds. 2021. Artificial Intelligence and Intellectual Property. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198870944.003.0001.
Lehman, Eric, Sarthak Jain, Karl Pichotta, Yoav Goldberg, and Byron C. Wallace. 2021. 'Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?' arXiv. https://doi.org/10.48550/arXiv.2104.07762.
Leistner, Matthias. 2020. 'European Copyright Licensing and Infringement Liability Under Art. 17 DSM-Directive Compared to Secondary Liability of Content Platforms in the U.S. - Can We Make the New European System a Global Opportunity Instead of a Local Challenge?' SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=3572040.
Li, Xian, Ping Yu, Chunting Zhou, Timo Schick, Luke Zettlemoyer, Omer Levy, Jason Weston, and Mike Lewis. 2023. 'Self-Alignment with Instruction Backtranslation'. arXiv. https://doi.org/10.48550/arXiv.2308.06259.
Malgieri, Gianclaudio. 2023. Vulnerability and Data Protection Law. Oxford Data Protection & Privacy Law. Oxford, New York: Oxford University Press.
Marcus, Gary, and Reid Southen. 2024. 'Generative AI Has a Visual Plagiarism Problem'. IEEE Spectrum, 6 January 2024. https://spectrum.ieee.org/midjourney-copyright.
Moës, Nicolas, and Frank Ryan. 2023. Heavy Is the Head That Wears the Crown: A Risk-Based Tiered Approach to Governing General-Purpose AI. The Future Society.
Mourby, Miranda, Katharina Ó Cathaoir, and Catherine Bjerre Collin. 2021. 'Transparency of Machine-Learning in Healthcare: The GDPR & European Health Law'. Computer Law & Security Review 43 (November): 105611. https://doi.org/10.1016/j.clsr.2021.105611.
Nguyen, Thanh Tam, Thanh Trung Huynh, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet Hung Nguyen. 2022. 'A Survey of Machine Unlearning'. arXiv. https://doi.org/10.48550/arXiv.2209.02299.
Nordemann, Jan Bernd. 2023. 'Neu: Täterschaftliche Haftung von Hostprovidern im Urheberrecht bei (Verkehrs-)Pflichtverletzungen im Internet'. ZUM: 806-816.
Novelli, Claudio, Federico Casolari, Antonino Rotolo, Mariarosaria Taddeo, and Luciano Floridi. 2023. 'Taking AI Risks Seriously: A New Assessment Model for the AI Act'. AI & SOCIETY, July. https://doi.org/10.1007/s00146-023-01723-z.
—. 2024. 'AI Risk Assessment: A Scenario-Based, Proportional Methodology for the AI Act'. Digital Society 3 (1): 13. https://doi.org/10.1007/s44206-024-00095-1.
Novelli, Claudio, Mariarosaria Taddeo, and Luciano Floridi. 2023. 'Accountability in Artificial Intelligence: What It Is and How It Works'. AI & SOCIETY, February. https://doi.org/10.1007/s00146-023-01635-y.
Oliver, Jo. 2001. 'Copyright in the WTO: The Panel Decision on the Three-Step Test'. Colum. JL & Arts 25: 119.
Oostveen, Manon. 2016. 'Identifiability and the Applicability of Data Protection to Big Data'. International Data Privacy Law 6 (4): 299-309. https://doi.org/10.1093/idpl/ipw012.
Peloquin, David, Michael DiMaio, Barbara Bierer, and Mark Barnes. 2020. 'Disruptive and Avoidable: GDPR Challenges to Secondary Research Uses of Data'. European Journal of Human Genetics 28 (6): 697-705. https://doi.org/10.1038/s41431-020-0596-x.
Pesch, Paulina Jo, and Rainer Böhme. 2023. 'Artpocalypse Now? - Generative KI und die Vervielfältigung von Trainingsbildern'. GRUR: 997-1007.
Purtova, Nadezhda. 2018. 'The Law of Everything. Broad Concept of Personal Data and Future of EU Data Protection Law'. Law, Innovation and Technology 10 (1): 40-81. https://doi.org/10.1080/17579961.2018.1452176.
Ramalho, Ana. 2018. 'Patentability of AI-Generated Inventions: Is a Reform of the Patent System Needed?' SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3168703.
Ranade, Priyanka, Aritran Piplai, Sudip Mittal, Anupam Joshi, and Tim Finin. 2021. 'Generating Fake Cyber Threat Intelligence Using Transformer-Based Models'. arXiv. https://doi.org/10.48550/arXiv.2102.04351.
Rosati, Eleonora. 2018. 'The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market: Technical Aspects'. European Parliament.
Sammarco, Pieremilio. 2020. 'L'attività di web scraping nelle banche dati ed il riuso delle informazioni'. Diritto dell'informazione e dell'informatica 35 (2).
Sartor, Giovanni, Francesca Lagioia, and Giuseppe Contissa. 2018. 'The Use of Copyrighted Works by AI Systems: Art Works in the Data Mill'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.3264742.
Spedicato, Giorgio. 2019. 'Creatività artificiale, mercato e proprietà intellettuale'. Rivista di Diritto Industriale 4-5: 253-307.
The Future Society. 2023. EU AI Act Compliance Analysis: General-Purpose AI Models in Focus. Report.
Theodorou, Andreas, and Virginia Dignum. 2020. 'Towards Ethical and Socio-Legal Governance in AI'. Nature Machine Intelligence 2 (1): 10-12. https://doi.org/10.1038/s42256-019-0136-y.
Tonmoy, S. M. Towhidul Islam, S. M. Mehedi Zaman, Vinija Jain, Anku Rani, Vipula Rawte, Aman Chadha, and Amitava Das. 2024. 'A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models'. arXiv. https://doi.org/10.48550/arXiv.2401.01313.
Varytimidou, C. 2023. 'The New A(I)rt Movement and Its Copyright Protection: Immoral or E-Moral?' GRUR International: 357-363.
Veale, Michael, Reuben Binns, and Lilian Edwards. 2018. 'Algorithms That Remember: Model Inversion Attacks and Data Protection Law'. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376 (2133): 20180083. https://doi.org/10.1098/rsta.2018.0083.
Villaronga, E. F., P. Kieseberg, and T. Li. 2018. 'Humans Forget, Machines Remember: Artificial Intelligence and the Right to Be Forgotten'. Computer Law & Security Review 34 (2): 304-313.
Wallace, Eric, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2021. 'Universal Adversarial Triggers for Attacking and Analyzing NLP'. arXiv. https://doi.org/10.48550/arXiv.1908.07125.
Weidinger, Laura, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, et al. 2021. 'Ethical and Social Risks of Harm from Language Models'. arXiv. https://doi.org/10.48550/arXiv.2112.04359.
Xiao, Yuxin, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2022. 'Uncertainty Quantification with Pre-Trained Language Models: A Large-Scale Empirical Analysis'. arXiv. https://doi.org/10.48550/arXiv.2210.04714.
Ye, Hongbin, Tong Liu, Aijia Zhang, Wei Hua, and Weiqiang Jia. 2023. 'Cognitive Mirage: A Review of Hallucinations in Large Language Models'. arXiv. https://doi.org/10.48550/arXiv.2309.06794.
Zarsky, Tal. 2017. 'Incompatible: The GDPR in the Age of Big Data'. SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=3022646.
Zhang, D., P. Finckenberg-Broman, T. Hoang, S. Pan, Z. Xing, M. Staples, and X. Xu. 2023. 'Right to Be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions'. arXiv preprint arXiv:2307.03941.
Zhang, Muru, Ofir Press, William Merrill, Alisa Liu, and Noah A. Smith. 2023. 'How Language Model Hallucinations Can Snowball'. arXiv. https://doi.org/10.48550/arXiv.2305.13534.
Ziosi, Marta, Jakob Mökander, Claudio Novelli, Federico Casolari, Mariarosaria Taddeo, and Luciano Floridi. 2023. 'The EU AI Liability Directive: Shifting the Burden From Proof to Evidence'. SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.4470725.
Zuiderveen Borgesius, Frederik, Sanne Kruikemeier, Sophie Boerman, and Natali Helberger. 2018. 'Tracking Walls, Take-It-Or-Leave-It Choices, the GDPR, and the ePrivacy Regulation'. SSRN Scholarly Paper. Rochester, NY. https://papers.ssrn.com/abstract=3141290.
1 While Generative AI encompasses a wider range of systems than LLMs, their overlapping legal concerns necessitate considering them together. However, we will maintain a focus on LLMs.
2 European Commission, Directorate-General for Communications Networks, Content, and Technology, European enterprise survey on the use of technologies based on artificial intelligence: final report, Publications Office, 2020. The survey refers to the broader category of natural language processing models, pp. 71-72.
3 Proposal for a Directive of the European Parliament and of the Council on liability for defective products (COM/2022/495 final).
4 Proposal for a Directive of the European Parliament and of the Council on adapting non-contractual civil liability rules to artificial intelligence (COM/2022/496 final).
5 See Council of the European Union, Interinstitutional File: 2022/0302(COD), Doc. 5809/24 of Jan. 24, 2024, Letter sent to the European Parliament (setting out the final text of the PLD).
6 See the corresponding policy suggestion and argument made in (Hacker 2023a, at footnote 107).
7 However, the Commission must adjust the threshold as technology advances (e.g., better algorithms or more efficient hardware) to stay current with the latest developments in general-purpose AI models.
8 These can be introduced both in the Commission's delegated acts and throughout the standardization process.
9 The PLD, which is not tied to the risk categories of the AIA in terms of applicability, cannot do all the work because its provisions apply only to professionals - economic operators - and not to non-professional users, as the AILD does.
10 The dependence on the AIA is less of an issue for the PLD, as it has greater harmonization and extensive case law. However, identifying the appropriate safety requirements (Articles 6 and 7) to assess the defectiveness of Generative AI and LLMs remains a challenge.
11 For this discussion, we will concentrate on strategies to prevent LLMs from compromising user privacy and personal data, setting aside the question of what constitutes a context or a recipient; for an analysis of these issues, see (Brown et al. 2022).
12 This list is not exhaustive. For practitioners in particular, the records of processing activities (Article 30 GDPR) and the data protection impact assessment (Article 35 GDPR) are very relevant as well. See, e.g., the data protection checklist for AI issued by the Bavarian Data Protection Authority, https://www.lda.bayern.de/media/ki_checkliste.pdf.
13 Another possibility is the purpose change test (Article 6(4) GDPR), not explored further here for space constraints. Note that Article 9 GDPR, in our view, applies in addition.
14 CJEU, C-252/21, Meta vs. Bundeskartellamt, ECLI:EU:C:2023:537, para. 73.
15 CJEU, C-252/21, Meta vs. Bundeskartellamt, ECLI:EU:C:2023:537, para. 73.
18 Garante per la Protezione dei Dati Personali, decision (Provvedimento) of 30 March 2023 [9870832].
19 However, skepticism about opt-out tools has been raised because, for example, individual users opting out are not the only holders of their sensitive information (Brown et al. 2022).
20 For a general discussion of these issues, see (J.-A. Lee, Hilty, and Liu 2021) and the compendium provided by WIPO, Revised Issues Paper on Intellectual Property Policy and Artificial Intelligence, 21 May 2020, WIPO/IP/AI/2/GE/20/1 REV.
21 See, e.g., https://www.bakerlaw.com/services/artificial-intelligence-ai/case-tracker-artificial-intelligence-copyrights-and-class-actions/.
22 Cf. European Parliament resolution of 20 October 2020 on intellectual property rights for the development of artificial intelligence technologies, 2020/2015(INI), par. 15.
23 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases ("Database Directive"), OJ L 77, 27.3.1996, p. 20-28.
24 As clarified by the Court of Justice of the EU in the Ryanair case: CJEU, 15 January 2015, case C-30/14 - Ryanair, ECLI:EU:C:2015:10.
25 B. Kinsella, 'What is GPTBot and Why You Want OpenAI's New Web Crawler to Index Your Content', blogpost in Synthedia, available at: https://synthedia.substack.com/p/what-is-gptbot-and-why-you-want-openais?utm_source=profile&utm_medium=reader2.
26 Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC ("Digital Single Market Directive"), OJ L 130, 17.5.2019, p. 92-125.
27 See, e.g., Z. Small, 'Sarah Silverman Sues OpenAI and Meta Over Copyright Infringement', The New York Times, 10 July 2023, available at: https://www.nytimes.com/2023/07/10/arts/sarah-silverman-lawsuit-openai-meta.html; B. Brittain, 'Lawsuit says OpenAI violated US authors' copyrights to train AI chatbot', Reuters, 29 June 2023, available at: https://www.reuters.com/legal/lawsuit-says-openai-violated-us-authors-copyrights-train-ai-chatbot-2023-06-29/.
28 However, some cases might pose more challenges than others: consider, e.g., the case where an AI system is used to create works that involve existing fictional characters (who are per se protected).
29 CJEU, Joined Cases C-682/18 and C-683/18, YouTube vs. Cyando, ECLI:EU:C:2021:503.
30 CJEU, Joined Cases C-682/18 and C-683/18, YouTube vs. Cyando, ECLI:EU:C:2021:503, para. 102. The latter point addresses specifically piracy platforms, not YouTube (paras. 96 and 101).
31 In this case, one would further have to investigate whether Art. 17 DSMD constitutes a lex specialis to the more general Cyando case law (Geiger and Jütte 2021; Leistner 2020).
32 Cf. Art. 3(1) of the Database Directive; Art. 6 of Directive 2006/116/EC of the European Parliament and of the Council of 12 December 2006 on the term of protection of copyright and certain related rights ("Term Directive"), OJ L 372, 27.12.2006, p. 12-18; Art. 1(3) of Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs, OJ L 111, 5.5.2009, p. 16-22.
33 Cf. Art. 56 of the European Patent Convention.
34 Cf. European Parliament resolution of 20 October 2020 on intellectual property rights for the development of artificial intelligence technologies, 2020/2015(INI), par. 15.
35 CJEU, 1 December 2011, case C-145/10, Painer, ECLI:EU:C:2011:798.
36 On 16 March 2023 the US Copyright Office issued formal guidance on the registration of AI-generated works, confirming that "copyright can protect only material that is the product of human creativity": see Federal Register, Vol. 88, No. 51, March 16, 2023, Rules and Regulations, p. 16191.
37 United States District Court for the District of Columbia [2023]: Thaler v. Perlmutter, No. 22-CV384-1564-BAH.
38 On 21 December 2021 the Legal Board of Appeal of the EPO issued a decision in case J 8/20 (DABUS), confirming that under the European Patent Convention (EPC) an inventor designated in a patent application must be a human being.
39 Cf. recital no. 10 of Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society.
40 Cf. Sec. 178 of the UK Copyright, Designs and Patents Act 1988 ("CDP Act").
41 Cf. Sec. 9(3) of the CDP Act. Similarly, Sec. 11 of the 1997 Copyright Ordinance (Cap. 528) of Hong Kong and Art. 2 of the 1994 New Zealand Copyright Act.
42 For instance, the draft legislative proposal of the European Parliament requires that the provider of a foundation model (now GPAI model) demonstrate the reduction and mitigation of reasonably foreseeable risks to democracy and the rule of law (Article 28b).
43 Which basically means establishing a metric to measure faithfulness, that is, the extent to which a model's outputs align with the input data or established truths.
i Authors have worked on different sections according to the following division: Claudio Novelli has worked on Sections 1, 2, 3, 5, 6; Federico Casolari has worked on Section 2; Philipp Hacker has worked on Sections 2, 3, 4, 5; Giorgio Spedicato has worked on Section 4; Luciano Floridi has worked on Sections 1, 3, 6.