
Logic Tensor Networks

Samy Badreddine a,b,∗, Artur d'Avila Garcez c, Luciano Serafini d, Michael Spranger a,b
a Sony Computer Science Laboratories Inc, 3-14-13 Higashigotanda, 141-0022, Tokyo, Japan
b Sony AI Inc, 1-7-1 Konan, 108-0075, Tokyo, Japan
c City, University of London, Northampton Square, EC1V 0HB, London, United Kingdom
d Fondazione Bruno Kessler, Via Sommarive 18, 38123, Trento, Italy

Abstract

Attempts at combining logic and neural networks into neurosymbolic approaches have been on the increase in recent years. In a neurosymbolic system, symbolic knowledge assists deep learning, which typically uses a sub-symbolic distributed representation, to learn and reason at a higher level of abstraction. We present Logic Tensor Networks (LTN), a neurosymbolic framework that supports querying, learning and reasoning with both rich data and abstract knowledge about the world. LTN introduces a fully differentiable logical language, called Real Logic, whereby the elements of a first-order logic signature are grounded onto data using neural computational graphs and first-order fuzzy logic semantics. We show that LTN provides a uniform language to represent and compute efficiently many of the most important AI tasks such as multi-label classification, relational learning, data clustering, semi-supervised learning, regression, embedding learning and query answering. We implement and illustrate each of the above tasks with several simple explanatory examples using TensorFlow 2. The results indicate that LTN can be a general and powerful framework for neurosymbolic AI.
Keywords: Neurosymbolic AI, Deep Learning and Reasoning, Many-valued Logics.

1. Introduction

Artificial Intelligence (AI) agents are required to learn from their surroundings and reason about what has been learned to make decisions, act in the world, or react to various stimuli. The latest Machine Learning (ML) has adopted mostly a pure sub-symbolic learning approach. Using distributed representations of entities, the latest ML performs quick decision-making without building a comprehensible model of the world. While achieving impressive results in computer vision, natural language, game playing, and multimodal learning, such approaches are known to be data inefficient and to struggle at out-of-distribution generalization. Although the use of appropriate inductive biases can alleviate such shortcomings, in general, sub-symbolic models lack comprehensibility. By contrast, symbolic AI is based on rich, high-level representations of the world that use human-readable symbols. By rich knowledge, we refer to logical representations which are more expressive than propositional logic or propositional probabilistic approaches, and which can express knowledge using full first-order logic, including universal and existential quantification (∀x and ∃y), arbitrary n-ary relations over variables, e.g. R(x, y, z, …), and function symbols, e.g. fatherOf(x), x + y, etc. Symbolic AI has achieved success at theorem proving, logical inference, and verification. However, it also has shortcomings when dealing with incomplete knowledge. It can be inefficient with large amounts of inaccurate data and lack robustness to outliers. Purely symbolic decision algorithms usually have high computational complexity making them impractical for the real world. It is now clear that the predominant approach to ML, where learning is based on recognizing the latent structures hidden in the data, is insufficient and may benefit from symbolic AI [17]. In this context, neurosymbolic AI, which stems from neural networks and symbolic AI, attempts to combine the strength of both paradigms (see [16, 40, 54] for recent surveys). That is to say, combine reasoning with complex representations of knowledge (knowledge-bases, semantic networks, ontologies, trees, and graphs) with learning from complex data (images, time series, sensorimotor data, natural language). Consequently, a main challenge for neurosymbolic AI is the grounding of symbols, including constants, functional and relational symbols, into real data, which is akin to the longstanding symbol grounding problem [30].

*Corresponding author
Email addresses: badreddine.samy@gmail.com (Samy Badreddine), a.garcez@city.ac.uk (Artur d'Avila Garcez), serafini@fbk.eu (Luciano Serafini), michael.spranger@sony.com (Michael Spranger)

Logic Tensor Networks (LTN) are a neurosymbolic framework and computational model that supports learning and reasoning about data with rich knowledge. In LTN, one can represent and effectively compute the most important tasks of deep learning with a fully differentiable first-order logic language, called Real Logic, which adopts infinitely many truth-values in the interval [0,1] [22, 25]. In particular, LTN supports the specification and computation of the following AI tasks uniformly using the same language: data clustering, classification, relational learning, query answering, semi-supervised learning, regression, and embedding learning.
LTN and Real Logic were first introduced in [62]. Since then, LTN has been applied to different AI tasks involving perception, learning, and reasoning about relational knowledge. In [18, 19], LTN was applied to semantic image interpretation whereby relational knowledge about objects was injected into deep networks for object relationship detection. In [6], LTN was evaluated on its capacity to perform reasoning about ontological knowledge. Furthermore, [7] shows how LTN can be used to learn an embedding of concepts into a latent real space by taking into consideration ontological knowledge about such concepts. In [3], LTN is used to annotate a reinforcement learning environment with prior knowledge and incorporate latent information into an agent. In [42], authors embed LTN in a state-of-the-art convolutional object detector. Extensions and generalizations of LTN have also been proposed in the past years, such as LYRICS [47] and Differentiable Fuzzy Logic (DFL) [68,69]. LYRICS provides an input language allowing one to define background knowledge using a first-order logic where predicate and function symbols are grounded onto any computational graph. DFL analyzes how a large collection of fuzzy logic operators behave in a differentiable learning setting. DFL also introduces new semantics for fuzzy logic implications called sigmoidal implications, and it shows that such semantics outperform other semantics in several semi-supervised machine learning tasks.
This paper provides a thorough description of the full formalism and several extensions of LTN. We show, using an extensive set of explanatory examples, how LTN can be applied to solve many ML tasks with the help of logical knowledge. In particular, the earlier versions of LTN have been extended with: (1) Explicit domain declaration: constants, variables, functions and predicates are now domain typed (e.g. the constants John and Paris can be from the domain of person and city, respectively). The definition of structured domains is also possible (e.g. the domain couple can be defined as the Cartesian product of two domains of persons); (2) Guarded quantifiers: guarded universal and existential quantifiers now allow the user to limit the quantification to the elements that satisfy some Boolean condition, e.g. ∀x: age(x) < 10 (playsPiano(x) → enfantProdige(x)) restricts the quantification to the cases where age is lower than 10; (3) Diagonal quantification: Diagonal quantification allows the user to write statements about specific tuples extracted in order from n variables. For example, if the variables capital and country both have k instances such that the i-th instance of capital corresponds to the i-th instance of country, one can write ∀ Diag(capital, country) capitalOf(capital, country).
Inspired by the work of [69], this paper also extends the product t-norm configuration of LTN with the generalized mean aggregator, and it introduces solutions to the vanishing or exploding gradient problems. Finally, the paper formally defines a semantic approach to refutation-based reasoning in Real Logic to verify if a statement is a logical consequence of a knowledge base. Example 4.8 proves that this new approach can better capture logical consequences compared to simply querying unknown formulas after learning (as done in [6]).
The new version of LTN has been implemented in TensorFlow 2 [1]. Both the LTN library and the code for the examples used in this paper are available at https://github.com/logictensornetworks/logictensornetworks
The remainder of the paper is organized as follows: In Section 2, we define and illustrate Real Logic as a fully-differentiable first-order logic. In Section 3, we specify learning and reasoning in Real Logic and its modeling into deep networks with Logic Tensor Networks (LTN). In Section 4 we illustrate the reach of LTN by investigating a range of learning problems from clustering to embedding learning. In Section 5, we place LTN in the context of the latest related work in neurosymbolic AI. In Section 6 we conclude and discuss directions for future work. The Appendix contains information about the implementation of LTN in TensorFlow 2, experimental set-ups, the different options for the differentiable logic operators, and a study of their relationship with gradient computations.

2. Real Logic

2.1. Syntax

Real Logic forms the basis of Logic Tensor Networks. Real Logic is defined on a first-order language L with a signature that contains a set C of constant symbols (objects), a set F of functional symbols, a set P of relational symbols (predicates), and a set X of variable symbols. L-formulas allow us to specify relational knowledge with variables, e.g. the atomic formula is_friend(v1, v2) may state that the person v1 is a friend of the person v2, the formula ∀x∀y (is_friend(x, y) → is_friend(y, x)) states that the relation is_friend is symmetric, and the formula ∀x (∃y (Italian(y) ∧ is_friend(x, y))) states that every person has a friend that is Italian. Since we are interested in learning and reasoning in real-world scenarios where degrees of truth are often fuzzy and exceptions are present, formulas can be partially true, and therefore we adopt fuzzy semantics.
Objects can be of different types. Similarly, functions and predicates are typed. Therefore, we assume there exists a non-empty set of symbols D called domain symbols. To assign types to the elements of L we introduce the functions D,Din  and Dout  such that:
  • D: X ∪ C → D. Intuitively, D(x) and D(c) return the domain of a variable x or a constant c.
  • D_in: F ∪ P → D*, where D* is the Kleene star of D, that is, the set of all finite sequences of symbols in D. Intuitively, D_in(f) and D_in(p) return the domains of the arguments of a function f or a predicate p. If f takes two arguments (for example, f(x, y)), D_in(f) returns two domains, one per argument.
  • D_out: F → D. Intuitively, D_out(f) returns the range of a function symbol.
Real Logic may also contain propositional variables, as follows: if P is a 0-ary predicate with D_in(P) = ⟨⟩ (the empty sequence of domains) then P is a propositional variable (an atom with truth-value in the interval [0,1]).
A term is constructed recursively in the usual way from constant symbols, variables, and function symbols. An expression formed by applying a predicate symbol to an appropriate number of terms with appropriate domains is called an atomic formula, which evaluates to true or false in classical logic and a number in [0,1] in the case of Real Logic. We define the set of terms of the language as follows:
  • each element t of X ∪ C is a term of the domain D(t);
  • if t_i is a term of domain D(t_i) for 1 ≤ i ≤ n, then t_1 t_2 … t_n (the sequence composed of t_1 followed by t_2 and so on, up to t_n) is a term of the domain D(t_1) D(t_2) … D(t_n);
  • if t is a term of the domain D_in(f) then f(t) is a term of the domain D_out(f).
We allow the following set of formulas in L:
  • t_1 = t_2 is an atomic formula for any terms t_1 and t_2 with D(t_1) = D(t_2);
  • p(t) is an atomic formula if D(t) = D_in(p);
  • if ϕ and ψ are formulas and x_1, …, x_n are n distinct variable symbols, then ∗ϕ, ϕ ∘ ψ and Q x_1 … x_n ϕ are formulas, where ∗ is a unary connective, ∘ is a binary connective and Q is a quantifier.
We use ∗ ∈ {¬} (negation), ∘ ∈ {∧, ∨, →, ↔} (conjunction, disjunction, implication and biconditional, respectively) and Q ∈ {∀, ∃} (universal and existential, respectively).
Example 1. Let Town denote the domain of towns in the world and People denote the domain of living people. Suppose that L contains the constant symbols Alice, Bob and Charlie of domain People, and Rome and Seoul of domain Town. Let x be a variable of domain People and u be a variable of domain Town. The term x, u (i.e. the sequence x followed by u) has domain People, Town, which denotes the Cartesian product between People and Town (People × Town). Alice, Rome is interpreted as an element of the domain People, Town. Let lives_in be a predicate with input domain D_in(lives_in) = People, Town. lives_in(Alice, Rome) is a well-formed expression, whereas lives_in(Bob, Charlie) is not.

2.2. Semantics of Real Logic

The semantics of Real Logic departs from the standard abstract semantics of First-order Logic (FOL). In Real Logic, domains are interpreted concretely by tensors in the real field.¹ Every object denoted by constants, variables, and terms, is interpreted as a tensor of real values. Functions are interpreted as real functions or tensor operations. Predicates are interpreted as functions or tensor operations projecting onto a value in the interval [0,1].

1 In the rest of the paper,we commonly use "tensor" to designate "tensor in the real field".

To emphasize the fact that in Real Logic symbols are grounded onto real-valued features, we use the term grounding, denoted by G, in place of interpretation.² Notice that this is different from the common use of the term grounding in logic, which indicates the operation of replacing the variables of a term or formula with constants or terms containing no variables. To avoid confusion, we use the synonym instantiation for this purpose. G associates a tensor of real numbers to any term of L, and a real number in the interval [0,1] to any formula ϕ of L. Intuitively, G(t) are the numeric features of the objects denoted by t, and G(ϕ) represents the system's degree of confidence in the truth of ϕ; the higher the value, the higher the confidence.

2.2.1. Grounding domains and the signature

A grounding for a logical language L on the set of domains D provides the interpretation of both the domain symbols in D and the non-logical symbols in L .
Definition 1. A grounding G associates to each domain D ∈ D a set G(D) ⊆ ⋃_{n_1…n_d ∈ N*} R^{n_1×…×n_d}.
For every D_1 … D_n ∈ D*, G(D_1 … D_n) = ×_{i=1}^{n} G(D_i), that is, G(D_1) × G(D_2) × … × G(D_n).
Notice that the elements in G(D) may be tensors of any rank d and any dimensions n_1 × … × n_d, as N* denotes the Kleene star of N.³
Example 2. Let digit_images denote a domain of images of handwritten digits. If we use images of 256 × 256 RGB pixels, then G(digit_images) ⊆ R^{256×256×3}. Let us consider the predicate is_digit(Z, 8), where Z is a constant denoting one such image. The terms Z, 8 have domains digit_images, digits. Any input to the predicate is a tuple in G(digit_images, digits) = G(digit_images) × G(digits).
A grounding assigns to each constant symbol c a tensor G(c) in the domain G(D(c)); it assigns to a variable x a finite sequence of tensors d_1 … d_k, each in G(D(x)). These tensors represent the instances of x. Differently from FOL, where a variable is assigned to a single value of the domain of interpretations at a time, in Real Logic a variable is assigned to a sequence of values in its domain, the k examples of x. A grounding assigns to a function symbol f a function taking tensors from G(D_in(f)) as input, and producing a tensor in G(D_out(f)) as output. Finally, a grounding assigns to a predicate symbol p a function taking tensors from G(D_in(p)) as input, and producing a truth-value in the interval [0,1] as output.
Definition 2. A grounding G of L is a function defined on the signature of L that satisfies the following conditions:
  1. G(x) = d_1 … d_k ∈ ×_{i=1}^{k} G(D(x)) for every variable symbol x ∈ X, with k ∈ N⁺. Notice that G(x) is a sequence and not a set, meaning that the same value of G(D(x)) can occur multiple times in G(x), as is usual in a Machine Learning data set with "attributes" and "values";

2 An interpretation is an assignment of truth-values true or false, or in the case of Real Logic a value in [0,1], to a formula. A model is an interpretation that maps a formula to true.
3 A tensor of rank 0 corresponds to a scalar, a tensor of rank 1 to a vector, a tensor of rank 2 to a matrix and so forth, in the usual way.

  2. G(f) ∈ G(D_in(f)) → G(D_out(f)) for every function symbol f ∈ F;
  3. G(p) ∈ G(D_in(p)) → [0,1] for every predicate symbol p ∈ P.
If a grounding depends on a set of parameters θ, we denote it as Gθ(·) or G(θ) interchangeably. Section 4 describes how such parameters can be learned using the concept of satisfiability.
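To make Definition 2 concrete, the following minimal numpy sketch (illustrative only, and not the interface of the LTN library) grounds a constant as a fixed tensor, a variable as a sequence of instances of its domain, and a predicate as a function whose output lies in [0,1]. All values and the predicate's weights below are made up.

```python
# Minimal sketch of Definition 2 (illustrative, not the LTN library API).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# G(c): a constant of a 2-dimensional real domain, grounded as a fixed vector.
G_c = np.array([4.2, 0.7])

# G(x): a variable grounded as a sequence of k = 3 instances of the same domain.
G_x = np.array([[0.1, 0.3],
                [2.0, 1.5],
                [0.4, 0.9]])

# G(p): a predicate grounded as a (hand-fixed) function into [0, 1].
w = np.array([1.0, -1.0])
def G_p(v):
    return sigmoid(v @ w)

print(G_p(G_c))   # a single truth-value for the constant
print(G_p(G_x))   # one truth-value per instance of the variable x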

2.2.2. Grounding terms and atomic formulas

We now extend the definition of grounding to all first-order terms and atomic formulas. Before formally defining these groundings, we describe on a high level what happens when grounding terms that contain free variables. 4
Let x be a variable that denotes people. As explained in Definition 2, x is grounded as an explicit sequence of k instances (k = |G(x)|). Consequently, a term height(x) is also grounded in k height values, each corresponding to one instance. We can generalize to expressions with multiple free variables, as shown in Example 3.
In the formal definition below, instead of considering a single term at a time, it is convenient to consider sequences of terms t = t_1 t_2 … t_k and define the grounding on t (with the definition of the grounding of a single term being derived as a special case). The fact that the sequence of terms t contains n distinct variables x_1, …, x_n is denoted by t(x_1, …, x_n). The grounding of t(x_1, …, x_n), denoted by G(t(x_1, …, x_n)), is a tensor with n corresponding axes, one for each free variable, defined as follows:
Definition 3. Let t(x_1, …, x_n) be a sequence t_1 … t_m of m terms containing n distinct variables x_1, …, x_n. Let each term t_i in t contain n_i variables x_{j_i1}, …, x_{j_in_i}.
  • G(t) is a tensor with dimensions (|G(x_1)|, …, |G(x_n)|) such that the element of this tensor indexed by k_1, …, k_n, written as G(t)_{k_1…k_n}, is equal to the concatenation of G(t_i)_{k_{j_i1}…k_{j_in_i}} for 1 ≤ i ≤ m;
  • G(f(t))_{i_1…i_n} = G(f)(G(t)_{i_1…i_n}), i.e. the element-wise application of G(f) to G(t);
  • G(p(t))_{i_1…i_n} = G(p)(G(t)_{i_1…i_n}), i.e. the element-wise application of G(p) to G(t).
If term t_i contains n_i variables x_{j_1}, …, x_{j_n_i} selected from x_1, …, x_n then G(t_i)_{k_{j_1}…k_{j_n_i}} can be obtained from G(t)_{i_1…i_n} with an appropriate mapping of indices i to k.

4 We assume the usual syntactic definition of free and bound variables in FOL. A variable is free if it is not bound by a quantifier (∀, ∃).

Figure 1: Illustration of Example 3 - x and y indicate dimensions associated with the free variables x and y . A tensor representing a term that includes a free variable x will have an axis x . One can index x to obtain results calculated using each of the v1,v2 or v3 values of x . In our graphical convention,the depth of the boxes indicates that the tensor can have feature dimensions (refer to the end of Example 3).
Example 3. Suppose that L contains the variables x and y, the function f, the predicate p and the set of domains D = {V, W}. Let D(x) = V, D(y) = W, D_in(f) = V W, D_out(f) = W and D_in(p) = V W. In what follows, an example of the grounding of L and D is shown, and the grounding of some examples of possible terms and atomic formulas is discussed.
G(V) = R⁺
G(W) = R
G(x) = v1, v2, v3
G(y) = w1, w2
G(p): x, y ↦ σ(x + y)
G(f): x, y ↦ xy
Notice the dimensions of the results. G(f(x, y)) and G(p(x, f(x, y))) return |G(x)| × |G(y)| = 3 × 2 values, one for each combination of individuals that occur in the variables. For functions, we can have additional dimensions associated to the output domain. Let us suppose a different grounding such that G(D_out(f)) = R^m. Then the dimensions of G(f(x, y)) would have been |G(x)| × |G(y)| × m, where |G(x)| × |G(y)| are the dimensions for indexing the free variables and m are dimensions associated to the output domain of f. Let us call the latter feature dimensions, as captioned in Figure 1. Notice that G(p(x, f(x, y))) will always return a tensor with the exact dimensions |G(x)| × |G(y)| × 1 because, under any grounding, a predicate always returns a value in [0,1]. Therefore, as the "feature dimension" of predicates is always 1, we choose to "squeeze it" and not to represent it in our graphical convention (see Figure 1, the box output by the predicate has no depth).
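The shapes discussed in Example 3 can be reproduced with ordinary tensor broadcasting. The following numpy sketch is purely illustrative: the instance values are arbitrary and the product f(x, y) = x·y from the example above is assumed.

```python
# Sketch of how a term with two free variables is grounded as a tensor with
# one axis per free variable (cf. Example 3 and Figure 1). Values are arbitrary.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

G_x = np.array([1.0, 2.0, 3.0])       # |G(x)| = 3 instances, domain V = R+
G_y = np.array([-0.5, 4.0])           # |G(y)| = 2 instances, domain W = R

# Broadcast so that axis 0 indexes x and axis 1 indexes y.
X = G_x[:, None]                       # shape (3, 1)
Y = G_y[None, :]                       # shape (1, 2)

G_f_xy = X * Y                         # G(f(x, y)), shape (3, 2)
G_p_x_fxy = sigmoid(X + G_f_xy)        # G(p(x, f(x, y))), shape (3, 2)

print(G_f_xy.shape, G_p_x_fxy.shape)   # (3, 2) (3, 2)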
Figure 2: Illustration of an element-wise operator implementing conjunction (p(x)q(y)) . We assume that x and y are two different variables. The result has one number in the interval [0,1] to every combination of individuals from G(x) and G(y) .

2.2.3. Connectives and Quantifiers

The semantics of the connectives is defined according to the semantics of first-order fuzzy logic [28]. Conjunction (∧), disjunction (∨), implication (→) and negation (¬) are associated, respectively, with a t-norm (T), a t-conorm (S), a fuzzy implication (I) and a fuzzy negation (N) operation, FuzzyOp ∈ {T, S, I, N}. Definitions of some common fuzzy operators are presented in Appendix B. Let ϕ and ψ be two formulas with free variables x_1, …, x_m and y_1, …, y_n, respectively. Let us assume that the first k variables are common to ϕ and ψ. Recall that ∗ and ∘ denote the set of unary and binary connectives, respectively. Formally:
(1) G(∗ϕ)_{i_1,…,i_m} = FuzzyOp(∗)(G(ϕ)_{i_1,…,i_m})
(2) G(ϕ ∘ ψ)_{i_1,…,i_{m+n−k}} = FuzzyOp(∘)(G(ϕ)_{i_1,…,i_k,i_{k+1},…,i_m}, G(ψ)_{i_1,…,i_k,i_{m+1},…,i_{m+n−k}})
In (2), (i_1, …, i_k) denote the indices of the k common variables, (i_{k+1}, …, i_m) denote the indices of the m − k variables appearing only in ϕ, and (i_{m+1}, …, i_{m+n−k}) denote the indices of the n − k variables appearing only in ψ. Intuitively, G(ϕ ∘ ψ) is a tensor whose elements are obtained by applying FuzzyOp(∘) element-wise to every combination of individuals from x_1, …, x_m and y_1, …, y_n (see Figure 2).
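As a concrete illustration of Equation (2), the sketch below evaluates a conjunction p(x) ∧ q(y) with two distinct free variables; the product t-norm (introduced in Section 2.4) is used here only as one possible choice of FuzzyOp(∧), and the truth-values are arbitrary.

```python
# Element-wise conjunction over every combination of instances (cf. Figure 2).
import numpy as np

G_p_x = np.array([0.9, 0.2, 0.6])          # truth-values of p(x), |G(x)| = 3
G_q_y = np.array([0.8, 0.1])               # truth-values of q(y), |G(y)| = 2

G_and = G_p_x[:, None] * G_q_y[None, :]    # T_P(a, b) = a * b, shape (3, 2)
print(G_and)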
The semantics of the quantifiers (Q ∈ {∀, ∃}) is defined with the use of aggregation. Let Agg be a symmetric and continuous aggregation operator, Agg: ⋃_{n∈N} [0,1]^n → [0,1]. An analysis of suitable aggregation operators is presented in Appendix B. For every formula ϕ containing x_1, …, x_n free variables, suppose, without loss of generality, that quantification applies to the first h variables. We shall therefore apply Agg to the first h axes of G(ϕ), as follows:
(3) G(Q x_1, …, x_h (ϕ))_{i_{h+1},…,i_n} = Agg(Q)_{i_1=1,…,|G(x_1)|; …; i_h=1,…,|G(x_h)|} G(ϕ)_{i_1,…,i_h,i_{h+1},…,i_n}
where Agg(Q) is the aggregation operator associated with the quantifier Q. Intuitively, we obtain G(Q x_1, …, x_h (ϕ)) by reducing the dimensions associated with x_1, …, x_h using the operator Agg(Q) (see Figure 3).
Notice that the above grounded semantics can assign different meanings to the three formulas:
∀xy (ϕ(x, y))    ∀x(∀y(ϕ(x, y)))    ∀y(∀x(ϕ(x, y)))
Figure 3: Illustration of an aggregation operation implementing quantification (yx) over variables x and y . We assume that x and y have different domains. The result is a single number in the interval [0,1] .
The semantics of the three formulas will coincide if the aggregation operator is bi-symmetric. LTN also allows the following form of quantification, here called diagonal quantification (Diag):
(4) G(Q Diag(x_1, …, x_h)(ϕ))_{i_{h+1},…,i_n} = Agg(Q)_{i=1,…,min_{1≤j≤h}|G(x_j)|} G(ϕ)_{i,…,i,i_{h+1},…,i_n}
Diag(x_1, …, x_h) quantifies over specific tuples such that the i-th tuple contains the i-th instance of each of the variables in the argument of Diag, under the assumption that all variables in the argument are grounded onto sequences with the same number of instances. Diag(x_1, …, x_h) is called diagonal quantification because it quantifies over the diagonal of G(ϕ) along the axes associated with x_1 … x_h, although in practice only the diagonal is built and not the entire G(ϕ), as shown in Figure 4. For example, given a data set with samples x and target labels y, if looking to write a statement p(x, y) that holds true for each pair of sample and label, one can write Diag(x, y) p(x, y) given that |G(x)| = |G(y)|. As another example, given two variables x and y whose groundings contain 10 instances of x and y each, the expression Diag(x, y) p(x, y) produces 10 results such that the i-th result corresponds to the i-th instances of each grounding. Without Diag, the expression would be evaluated for all 10 × 10 combinations of the elements in G(x) and G(y).⁵ Diag will find much application in the examples and experiments to follow.
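The following numpy sketch illustrates Equations (3) and (4): quantification reduces the axes associated with the quantified variables, and Diag aggregates only over matched tuples. The mean and the max are used here as simple stand-ins for Agg(∀) and Agg(∃); also, in practice LTN builds only the diagonal rather than the full grid.

```python
# Quantifiers as axis-reducing aggregations (illustrative operator choices).
import numpy as np

G_phi = np.random.rand(3, 2)        # G(ϕ(x, y)): axis 0 for x, axis 1 for y

forall_x_exists_y = np.mean(np.max(G_phi, axis=1), axis=0)   # ∀x ∃y ϕ(x, y)
forall_xy = np.mean(G_phi, axis=(0, 1))                       # ∀xy ϕ(x, y)

# Diagonal quantification (Eq. 4): aggregate only over matched tuples
# (i-th instance of x with i-th instance of y), assuming |G(x)| = |G(y)|.
G_psi = np.random.rand(4, 4)        # G(ψ(x, y)) with 4 instances each
forall_diag = np.mean(np.diagonal(G_psi))                     # ∀ Diag(x, y) ψ(x, y)

print(forall_x_exists_y, forall_xy, forall_diag)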

2.3. Guarded Quantifiers

In many situations, one may wish to quantify over a set of elements of a domain whose grounding satisfy some condition. In particular, one may wish to express such condition using formulas of the language of the form:
(5) ∀y (∃x: age(x) > age(y) (parent(x, y)))
The grounding of such a formula is obtained by aggregating the values of parent (x,y) only for the instances of x that satisfy the condition age(x)>age(y) ,that is:
Agg(∀)_{j=1,…,|G(y)|} Agg(∃)_{i=1,…,|G(x)| s.t. G(age(x))_i > G(age(y))_j} G(parent(x, y))_{i,j}

5 Notice how Diag is not simply "syntactic sugar" for creating a new variable pairs_xy by stacking pairs of examples from G(x) and G(y) . If the groundings of x and y have incompatible ranks (for instance,if x denotes images and y denotes their labels),stacking them in a tensor G (pairs_xy) is non-trivial,requiring several reshaping operations.

Figure 4: Diagonal Quantification: Diag (x1,x2) quantifies over specific tuples only,such that the i -th tuple contains the i -th instances of the variables x1 and x2 in the groundings G(x1) and G(x2) ,respectively. Diag (x1,x2) assumes,therefore, that x1 and x2 have the same number of instances as in the case of samples x1 and their labels x2 in a typical supervised learning tasks.
The evaluation of which tuple is safe is purely symbolic and non-differentiable. Guarded quantifiers operate over only a subset of the variables, when this symbolic knowledge is crisp and available. More generally,in what follows, m is a symbol representing the condition,which we shall call a mask,and G(m) associates a function 6 returning a Boolean to m .
(6) G(Q x_1, …, x_h : m(x_1, …, x_n) (ϕ))_{i_{h+1},…,i_n} =def Agg(Q)_{i_1=1,…,|G(x_1)|; …; i_h=1,…,|G(x_h)| s.t. G(m)(G(x_1)_{i_1}, …, G(x_n)_{i_n})} G(ϕ)_{i_1,…,i_h,i_{h+1},…,i_n}
Notice that the semantics of a guarded sentence ∀x: m(x) (ϕ(x)) is different than the semantics of ∀x (m(x) → ϕ(x)). In crisp and traditional FOL, the two statements would be equivalent. In Real Logic, they can give different results. Let G(x) be a sequence of 3 values, G(m(x)) = (0, 1, 1) and G(ϕ(x)) = (0.2, 0.7, 0.8). Only the second and third instances of x are safe, that is, are in the masked subset. Let → be defined using the Reichenbach operator I_R(a, b) = 1 − a + ab and ∀ be defined using the mean operator. We have G(∀x (m(x) → ϕ(x))) = (1 + 0.7 + 0.8)/3 ≈ 0.833 whereas G(∀x: m(x) (ϕ(x))) = (0.7 + 0.8)/2 = 0.75. Also, in the computational graph of the guarded sentence, there are no gradients attached to the instances that do not verify the mask. Similarly, the semantics of ∃x: m(x) (ϕ(x)) is not equivalent to that of ∃x (m(x) ∧ ϕ(x)).
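The numeric example above can be reproduced directly. The sketch below assumes the same mean aggregator for ∀ and the Reichenbach implication, and a crisp mask.

```python
# Guarded vs. unguarded universal quantification on the example values above.
import numpy as np

m   = np.array([0.0, 1.0, 1.0])     # crisp mask values m(x)
phi = np.array([0.2, 0.7, 0.8])     # truth-values of ϕ(x)

unguarded = np.mean(1.0 - m + m * phi)   # ∀x (m(x) → ϕ(x))  -> 0.8333...
guarded   = np.mean(phi[m > 0.5])        # ∀x: m(x) (ϕ(x))   -> 0.75

print(unguarded, guarded)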

6 In some edge cases, a masking may produce an empty sequence, e.g. if for some value of G(y) there is no value in G(x) that satisfies age(x) > age(y). In such cases, we resort to the concept of an empty semantics: ∀ returns 1 and ∃ returns 0.

Figure 5: Example of Guarded Quantification: One can filter out elements of the various domains that do not satisfy some condition before the aggregation operators for ∀ and ∃ are applied.

2.4. Stable Product Real Logic

It has been shown in [69] that not all first-order fuzzy logic semantics are equally suited for gradient-descent optimization. Many fuzzy logic operators can lead to vanishing or exploding gradients. Some operators are also single-passing, in that they propagate gradients to only one input at a time.
In general, the best performing symmetric configuration⁷ for the connectives uses the product t-norm T_P for conjunction, its dual t-conorm S_P for disjunction, standard negation N_S, and the Reichenbach implication I_R (the corresponding S-implication to the above operators). This subset of Real Logic where the grounding of the connectives is restricted to the product configuration is called Product Real Logic in [69]. Given a and b two truth-values in [0,1]:
(7) ¬: N_S(a) = 1 − a
(8) ∧: T_P(a, b) = a · b
(9) ∨: S_P(a, b) = a + b − a · b
(10) →: I_R(a, b) = 1 − a + a · b
Appropriate aggregators for ∃ and ∀ are the generalized mean A_pM with p ≥ 1 to approximate the existential quantification, and the generalized mean w.r.t. the error A_pME with p ≥ 1 to approximate the universal quantification. They can be understood as a smooth maximum and a smooth minimum, respectively. Given n truth-values a_1, …, a_n, all in [0,1]:
(11) ∃: A_pM(a_1, …, a_n) = ( (1/n) Σ_{i=1}^{n} a_i^p )^{1/p},   p ≥ 1
(12) ∀: A_pME(a_1, …, a_n) = 1 − ( (1/n) Σ_{i=1}^{n} (1 − a_i)^p )^{1/p},   p ≥ 1
A_pME measures the power of the deviation of each value from the ground truth 1. With p = 2, it is equivalent to 1 − RMSE(a, 1), where RMSE is the root-mean-square error, a is the vector of truth-values and 1 is a vector of 1s.

7 We define a symmetric configuration as a set of fuzzy operators such that conjunction and disjunction are defined by a t-norm and its dual t-conorm, respectively, and the implication operator is derived from such conjunction or disjunction operators and standard negation (c.f. Appendix B for details). In [69], van Krieken et al. also analyze non-symmetric configurations and even operators that do not strictly verify fuzzy logic semantics.

The intuition behind the choice of p is that the higher p is, the more weight A_pM (resp. A_pME) will give to true (resp. false) truth-values, converging to the max (resp. min) operator. Therefore, the value of p can be seen as a hyper-parameter as it offers flexibility to account for outliers in the data depending on the application.
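A direct numpy transcription of Equations (11) and (12) makes the role of p visible; the truth-values below are arbitrary.

```python
# Generalized mean aggregators and the effect of the exponent p.
import numpy as np

def A_pM(a, p):            # approximates ∃ (smooth maximum)
    return np.mean(a ** p) ** (1.0 / p)

def A_pME(a, p):           # approximates ∀ (smooth minimum)
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)

a = np.array([0.1, 0.6, 0.95])
for p in (1, 2, 10):
    print(p, A_pM(a, p), A_pME(a, p))
# p = 1 gives the arithmetic mean for both; larger p moves the results
# towards max(a) = 0.95 and min(a) = 0.1, respectively.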
Nevertheless, Product Real Logic still has the following gradient problems: T_P(a, b) has vanishing gradients on the edge case a = b = 0; S_P(a, b) has vanishing gradients on the edge case a = b = 1; I_R(a, b) has vanishing gradients on the edge case a = 0, b = 1; A_pM(a_1, …, a_n) has exploding gradients when Σ_i (a_i)^p tends to 0; A_pME(a_1, …, a_n) has exploding gradients when Σ_i (1 − a_i)^p tends to 0 (see Appendix C for details).
To address these problems,we define the projections π0 and π1 below with ϵ an arbitrarily small positive real number:
(13) π_0: [0,1] → ]0,1] : a ↦ (1 − ϵ)·a + ϵ
(14) π_1: [0,1] → [0,1[ : a ↦ (1 − ϵ)·a
We then derive the following stable operators to produce what we call the Stable Product Real Logic configuration:
(15) N_S′(a) = N_S(a)
(16) T_P′(a, b) = T_P(π_0(a), π_0(b))
(17) S_P′(a, b) = S_P(π_1(a), π_1(b))
(18) I_R′(a, b) = I_R(π_0(a), π_1(b))
(19) A_pM′(a_1, …, a_n) = A_pM(π_0(a_1), …, π_0(a_n)),   p ≥ 1
(20) A_pME′(a_1, …, a_n) = A_pME(π_1(a_1), …, π_1(a_n)),   p ≥ 1
It is important to note that the conjunction operator in stable product semantics is not a t-norm:⁸ T_P′(a, b) does not satisfy identity in [0,1[ since for any 0 ≤ a < 1, T_P′(a, 1) = (1 − ϵ)·a + ϵ ≠ a, although ϵ can be chosen arbitrarily small. In the experimental evaluations reported in Section 4, we find that the adoption of the stable product semantics is an important practical step to improve the numerical stability of the learning system.
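The stable configuration can be sketched as thin wrappers around the product-configuration operators. The value ϵ = 10⁻⁴ and the aggregator exponent p below are hyper-parameters chosen only for illustration.

```python
# Sketch of the stable product configuration (Equations 13-20): truth-values
# are pushed away from the problematic edge values 0 and 1 before the
# product-configuration operators are applied.
import numpy as np

eps = 1e-4

def pi_0(a):                 # (13): [0,1] -> ]0,1]
    return (1.0 - eps) * a + eps

def pi_1(a):                 # (14): [0,1] -> [0,1[
    return (1.0 - eps) * a

def not_s(a):                # (15) standard negation (unchanged)
    return 1.0 - a

def and_p(a, b):             # (16) stable product t-norm
    return pi_0(a) * pi_0(b)

def or_p(a, b):              # (17) stable probabilistic sum
    a, b = pi_1(a), pi_1(b)
    return a + b - a * b

def implies_r(a, b):         # (18) stable Reichenbach implication
    a, b = pi_0(a), pi_1(b)
    return 1.0 - a + a * b

def exists(a, p=2):          # (19) stable A_pM
    a = pi_0(np.asarray(a))
    return np.mean(a ** p) ** (1.0 / p)

def forall(a, p=2):          # (20) stable A_pME
    a = pi_1(np.asarray(a))
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)

print(and_p(0.0, 0.0), implies_r(0.0, 1.0))  # inputs no longer sit exactly on the edge cases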

3. Learning, Reasoning, and Querying in Real Logic

In Real Logic, one can define the tasks of learning, reasoning and query-answering. Given a Real Logic theory that represents the knowledge of an agent at a given time, learning is the task of making generalizations from specific observations obtained from data. This is often called inductive inference. Reasoning is the task of deriving what knowledge follows from the facts which are currently known. Query answering is the task of evaluating the truth value of a certain logical expression (called a query), or finding the set of objects in the data that evaluate a certain expression to true. In what follows, we define and exemplify each of these tasks. To do so, we first need to specify which types of knowledge can be represented in Real Logic.

8 Recall that a t-norm is a function T: [0,1] × [0,1] → [0,1] satisfying commutativity, monotonicity, associativity and identity, that is, T(a, 1) = a.

3.1. Representing Knowledge with Real Logic

In logic-based knowledge representation systems, knowledge is represented by logical formulas whose intended meanings are propositions about a domain of interest. The connection between the symbols occurring in the formulas and what holds in the domain is not represented in the knowledge base and is left implicit since it does not have any effect on the logic computations. In Real Logic, by contrast, the connection between the symbols and the domain is represented explicitly in the language by the grounding G ,which plays an important role in both learning and reasoning. G is an integral part of the knowledge represented by Real Logic. A Real Logic knowledge base is therefore defined by the formulas of the logical language and knowledge about the domain in the form of groundings obtained from data. The following types of knowledge can be represented in Real Logic.

3.1.1. Knowledge through symbol groundings

Boundaries for domain grounding. These are constraints specifying that the value of a certain logical expression must be within a certain range. For instance, one may specify that the domain D must be interpreted in the [0,1] hyper-cube or in the standard n-simplex, i.e. the set of d_1, …, d_n ∈ (R⁺)^n such that Σ_i d_i = 1. Other intuitive examples of range constraints include the elements of the domain "colour" grounded onto points in [0,1]³ such that every element is associated with the triplet of values (R, G, B) with R, G, B ∈ [0,1], or the range of a function age(x) as an integer between 0 and 100.
Explicit definition of grounding for symbols. Knowledge can be more strictly incorporated by fixing the grounding of some symbols. If a constant c denotes an object with known features v_c ∈ R^n, we can fix its grounding G(c) = v_c. Training data that consists in a set of n data items such as n images (or tuples known as training examples) can be specified in Real Logic by n constants, e.g. img1, img2, …, imgn, and by their groundings, e.g. G(img1), G(img2), …, G(imgn) set to the corresponding tensors of pixel values. These can be gathered in a variable imgs. A binary predicate sim that measures the similarity of two objects can be grounded as, e.g., a cosine similarity function of two vectors v and w, (v, w) ↦ (v · w)/(‖v‖ ‖w‖). The output layer of the neural network associated with a multi-class single-label predicate P(x, class) can be a softmax function normalizing the output such that it guarantees exclusive classification, i.e. Σ_i P(x, i) = 1.⁹ Grounding of constants and functions allows the computation of the grounding of their results. If, for example, G(transp) is the function that transposes a matrix, then G(transp(img1)) is the transpose of the image tensor G(img1).
Parametric definition of grounding for symbols. Here, the exact grounding of a symbol σ is not known, but it is known that it can be obtained by finding a set of real-valued parameters, that is, via learning. To emphasize this fact, we adopt the notation G(σ) = G(σ | θ_σ), where θ_σ is the set of parameter values that determines the value of G(σ). The typical example of parametric grounding for constants is the learning of an embedding. Let emb(word | θ_emb) be a word embedding with parameters θ_emb which takes as input a word and returns its embedding in R^n. If the words of a vocabulary W = {w_1, …, w_|W|} are constant symbols, their groundings G(w_i | θ_emb) are defined parametrically w.r.t. θ_emb as emb(w_i | θ_emb). An example of parametric grounding for a function symbol f is to assume that G(f) is a linear function such that G(f): R^m → R^n maps each v ∈ R^m into A_f v + b_f, with A_f a matrix of real numbers and b_f a vector of real numbers. In this case, G(f) = G(f | θ_f), where θ_f = {A_f, b_f}. Finally, the grounding of a predicate symbol can be given, for example, by a neural network N with parameters θ_N. As an example, consider a neural network N trained for image classification into n classes: cat, dog, horse, etc. N takes as input a vector v of pixel values and produces as output a vector y = (y_cat, y_dog, y_horse, …) in [0,1]^n such that y = N(v | θ_N), where y_c is the probability that input image v is of class c. In case classes are, alternatively, chosen to be represented by unary predicate symbols such as cat(v), dog(v), horse(v), then G(cat(v)) = N(v | θ_N)_cat, G(dog(v)) = N(v | θ_N)_dog, G(horse(v)) = N(v | θ_N)_horse, etc.
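As an illustration (and not the interface of the LTN library), the following TensorFlow 2 sketch grounds a constant as a trainable embedding and a predicate as a small neural network whose sigmoid output lies in [0,1]; the dimensions and architecture are arbitrary.

```python
# Parametric groundings as trainable TensorFlow objects (illustrative only).
import tensorflow as tf

# G(c | θ_c): a constant grounded as a trainable embedding in R^4.
G_c = tf.Variable(tf.random.normal([4]), name="embedding_of_c")

# G(P | θ_P): a predicate grounded as a small MLP returning a value in [0, 1].
P_net = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="elu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

truth_value = P_net(tf.expand_dims(G_c, 0))[0, 0]   # G(P(c)) in [0, 1]
print(float(truth_value))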

9 Notice that softmax is often used as the last layer in neural networks to turn logits into a probability distribution. However, we do not use the softmax function as such here. Instead, we use it here to enforce an exclusivity constraint on satisfiability scores.

3.1.2. Knowledge through formulas

Factual propositions. Knowledge about the properties of specific objects in the domain is represented, as usual, by logical propositions, as exemplified below. Suppose that it is known that img1 is a number eight, img2 is a number nine, and imgn is a number two. This can be represented by adding the following facts to the knowledge-base: nine(img1), eight(img2), …, two(imgn). Supervised learning, that is, learning with the use of training examples which include target values (labelled data), is specified in Real Logic by combining grounding definitions and factual propositions. For example, the fact that an image Z is a positive example for the class nine and a negative example for the class eight is specified by defining G(img1) = Z alongside the propositions nine(img1) and ¬eight(img1). Notice how semi-supervision can be specified naturally in Real Logic by adding propositions containing disjunctions, e.g. eight(img1) ∨ nine(img1), which state that img1 is either an eight or a nine (or both). Finally, relational learning can be achieved by relating logically multiple objects (defined as constants or variables or even as more complex sequences of terms) such as e.g.: nine(img1) → ¬nine(img2) (if img1 is a nine then img2 is not a nine) or nine(img) → ¬eight(img) (if an image is a nine then it is not an eight). The use of more complex knowledge including the use of variables such as img above is the topic of generalized propositions, discussed next.
Generalized propositions. General knowledge about all or some of the objects of some domains can be specified in Real Logic by using first-order logic formulas with quantified variables. This general type of knowledge allows one to specify arbitrary constraints on the groundings independently from the specific data available. It allows one to specify, in a concise way, knowledge that holds true for all the objects of a domain. This is especially useful in Machine Learning in the semi-supervised and unsupervised settings, where there is no specific knowledge about a single individual. For example, as part of a task of multi-label classification with constraints on the labels [12], a positive label constraint may express that if an example is labelled with l_1, …, l_k then it should also be labelled with l_{k+1}. This can be specified in Real Logic with a universally quantified formula: ∀x (l_1(x) ∧ … ∧ l_k(x) → l_{k+1}(x)).¹⁰ Another example of soft constraints used in Statistical Relational Learning associates the labels of related examples. For instance, in Markov Logic Networks [55], as part of the well-known Smokers and Friends example, people who are smokers are associated by the friendship relation. In Real Logic, the formula ∀x∀y ((smokes(x) ∧ friend(x, y)) → smokes(y)) would be used to encode the soft constraint that friends of smokers are normally smokers.

10 This can also be specified using a guarded quantifier ∀x: ((l_1(x) ∧ … ∧ l_k(x)) > th) (l_{k+1}(x)), where th is a threshold value in [0,1].
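As an illustration of how such a generalized proposition is evaluated on data, the following numpy sketch computes the truth-value of the Smokers and Friends constraint ∀x∀y ((smokes(x) ∧ friend(x, y)) → smokes(y)). The truth-values and the operator choices (product t-norm, Reichenbach implication, mean aggregator for ∀) are illustrative assumptions.

```python
# Evaluating a universally quantified soft constraint on hypothetical data.
import numpy as np

smokes = np.array([0.9, 0.1, 0.8])          # smokes(x) for 3 people
friend = np.array([[0.0, 1.0, 0.0],         # friend(x, y), crisp here
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

antecedent = smokes[:, None] * friend        # smokes(x) ∧ friend(x, y)
consequent = smokes[None, :]                 # smokes(y), broadcast over x
implication = 1.0 - antecedent + antecedent * consequent   # Reichenbach
constraint_truth = np.mean(implication)      # ∀x∀y aggregation
print(constraint_truth)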

3.1.3. Knowledge through fuzzy semantics

Definition for operators. The grounding of a formula ϕ depends on the operators approximating the connectives and quantifiers that appear in ϕ. Different operators give different interpretations of the satisfaction associated with the formula. For instance, the operator A_pME(a_1, …, a_n) that approximates universal quantification can be understood as a smooth minimum. It depends on a hyper-parameter p (the exponent used in the generalized mean). If p = 1 then A_pME(a_1, …, a_n) corresponds to the arithmetic mean. As p increases, given the same input, the value of the universally quantified formula will decrease as A_pME converges to the min operator. To define how strictly the universal quantification should be interpreted in each proposition, one can use different values of p for different propositions of the knowledge base. For instance, a formula ∀x P(x) where A_pME is used with a low value for p will in fact denote that P holds for some x, whereas a formula ∀x Q(x) with a higher p may denote that Q holds for most x.

3.1.4. Satisfiability

In summary, a Real Logic knowledge-base has three components: the first describes knowledge about the grounding of symbols (domains, constants, variables, functions, and predicate symbols); the second is a set of closed logical formulas describing factual propositions and general knowledge; the third lies in the operators and the hyperparameters used to evaluate each formula. The definition that follows formalizes this notion.
Definition 4 (Theory/Knowledge-base). A theory of Real Logic is a triple T = ⟨K, G(θ), Θ⟩, where K is a set of closed first-order logic formulas defined on the set of symbols S = D ∪ X ∪ C ∪ F ∪ P denoting, respectively, domains, variables, constants, function and predicate symbols; G(θ) is a parametric grounding for all the symbols s ∈ S and all the logical operators; and Θ = {Θ_s}_{s∈S} is the hypothesis space for each set of parameters θ_s associated with symbol s.
Learning and reasoning in a Real Logic theory are both associated with searching and applying the set of values of parameters θ from the hypothesis space Θ that maximize the satisfaction of the formulas in K . We use the term grounded theory,denoted by K,Gθ ,to refer to a Real Logic theory with a specific set of learned parameter values. This idea shares some similarity with the weighted MAX-SAT problem [43],where the weights for formulas in K are given by their fuzzy truth-values obtained by choosing the parameter values of the grounding. To define this optimization problem, we aggregate the truth-values of all the formulas in K by selecting a formula aggregating operator SatAgg : [0,1][0,1] .
Definition 5. The satisfiability of a theory T=K,Gθ with respect to the aggregating operator SatAgg is defined as SatAggϕKGθ(ϕ) .

3.2. Learning

Given a Real Logic theory T=(K,G(θ),Θ) ,learning is the process of searching for the set of parameter values θ that maximize the satisfiability of T w.r.t. a given aggregator:
θ=argmaxθΘSatAggϕKGθ(ϕ)
Notice that with this general formulation, one can learn the grounding of constants, functions, and predicates. The learning of the grounding of constants corresponds to the learning of embeddings. The learning of the grounding of functions corresponds to the learning of generative models or a regression task. Finally, the learning of the grounding of predicates corresponds to a classification task in Machine Learning.
In some cases, it is useful to impose some regularization (as done customarily in ML) on the set of parameters θ ,thus encoding a preference on the hypothesis space Θ ,such as a preference for smaller parameter values. In this case, learning is defined as follows:
θ=argmaxθΘ(SatAggθGθ(ϕ)λR(θ))
where λR+ is the regularization parameter and R is a regularization function,e.g. L1 or L2 regularization,that is, L1(θ)=θθ|θ| and L2(θ)=θθθ2 .
LTN can generalize and extrapolate when querying formulas grounded with unseen data (for example, new individuals from a domain), using knowledge learned with previous groundings (for example, re-using a trained predicate). This is explained in Section 3.3
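As an illustration of this optimization, the sketch below (TensorFlow 2; a hypothetical toy setup, not the API of the LTN library) maximizes the satisfaction of a single axiom ∀x P(x) over a batch of data by minimizing 1 − SatAgg, with an optional L2 regularization term:

    import tensorflow as tf

    # trainable grounding of a unary predicate P: truth-values in [0,1]
    P = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="elu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    def forall_pME(truths, p=2):
        return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    x = tf.random.uniform((64, 2))     # stand-in data for the axiom "forall x: P(x)"
    lam = 1e-4                         # regularization weight (hypothetical value)

    for epoch in range(100):
        with tf.GradientTape() as tape:
            sat = forall_pME(P(x), p=2)                # satisfaction of the only axiom
            reg = tf.add_n([tf.nn.l2_loss(w) for w in P.trainable_variables])
            loss = 1.0 - sat + lam * reg               # maximize satisfaction, prefer small weights
        grads = tape.gradient(loss, P.trainable_variables)
        optimizer.apply_gradients(zip(grads, P.trainable_variables))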

3.3. Querying

Given a grounded theory T=(K,Gθ) ,query answering allows one to check if a certain fact is true (or, more precisely, by how much it is true since in Real Logic truth-values are real numbers in the interval [0,1]) . There are various types of queries that can be asked of a grounded theory.
A first type of query is called truth queries. Any formula in the language of T can be a truth query. The answer to a truth query ϕq is the truth value of ϕq obtained by computing its grounding, i.e. Gθ(ϕq) . Notice that, if ϕq is a closed formula, the answer is a scalar in [0,1] denoting the truth-value of ϕq according to Gθ . If ϕq contains n free variables x1,…,xn , the answer to the query is a tensor of order n such that the component indexed by i1…in is the truth-value of ϕq evaluated in Gθ(x1)i1,…,Gθ(xn)in.
The second type of query is called value queries. Any term in the language of T can be a value query. The answer to a value query tq is a tensor of real numbers obtained by computing the grounding of the term,i.e. Gθ(tq) . Analogously to truth queries,the answer to a value query is a "tensor of tensors" if tq contains variables. Using value queries,one can inspect how a constant or a term, more generally, is embedded in the manifold.
The third type of query is called generalization truth queries. With generalization truth queries, we are interested in knowing the truth-values of formulas when these are applied to a new (unseen) set of objects of a domain, such as a validation or a test set of examples typically used in the evaluation of machine learning systems. A generalization truth query is a pair (ϕq(x),U) , where ϕq is a formula with a free variable x and U=(u(1),…,u(k)) is a set of unseen examples whose dimensions are compatible with those of the domain of x . The answer to the query (ϕq(x),U) is Gθ(ϕq(x)) for x taking each value u(i),1≤i≤k , in U . The result of this query is therefore a vector of |U| truth-values corresponding to the evaluation of ϕq on new data u(1),…,u(k) .
The fourth and final type of query is generalization value queries. These are analogous to generalization truth queries with the difference that they evaluate a term tq(x) ,and not a formula,on new data U . The result,therefore,is a vector of |U| values corresponding to the evaluation of the trained model on a regression task using test data U .

3.4. Reasoning

3.4.1. Logical consequence in Real Logic

From a pure logic perspective, reasoning is the task of verifying if a formula is a logical consequence of a set of formulas. This can be achieved semantically using model theory (⊨) or syntactically via a proof theory (⊢) . To characterize reasoning in Real Logic, we adapt the notion of logical consequence for fuzzy logic provided in [9]: A formula ϕ is a fuzzy logical consequence of a finite set of formulas Γ , in symbols Γ⊨ϕ , if for every fuzzy interpretation f , if all the formulas in Γ are true (i.e. evaluate to 1) in f then ϕ is true in f . In other words, every model of Γ is a model of ϕ . A direct application of this definition to Real Logic is not practical since in most practical cases the level of satisfiability of a grounded theory (K,Gθ) will not be equal to 1 . We therefore define an interval [q,1] with 1/2<q<1 and assume that a formula is true if its truth-value is in the interval [q,1] . This leads to the following definition:
Definition 6. A closed formula ϕ is a logical consequence of a knowledge-base (K,G(θ),Θ) ,in symbols (K,G(θ),Θ)qϕ ,if,for every grounded theory K,Gθ ,if SatAgg(K,Gθ)q then Gθ(ϕ)q .

3.4.2. Reasoning by optimization

Logical consequence by direct application of Definition 6 requires querying the truth value of ϕ for a potentially infinite set of groundings. Therefore,we consider in practice the following directions:
Reasoning Option 1 (Querying after learning). This is approximate logical inference by considering only the grounded theories that maximally satisfy (K,G(θ),Θ) . We therefore define that ϕ is a brave logical consequence of a Real Logic knowledge-base (K,G(θ),Θ) if Gθ(ϕ)q for all the θ such that:
θ=argmaxθSatAgg(K,Gθ) and SatAgg(K,Gθ)q
The objective is to find all θ that optimally satisfy the knowledge base and to measure if they also satisfy ϕ . One can search for such θ by running multiple optimizations with the objective function of Section 3.2
This approach is somewhat naive. Even if we run the optimization multiple times with multiple parameter initializations (to, hopefully, reach different optima in the search space), the obtained groundings may not be representative of other optimal or close-to-optimal groundings. In Section 4.8 we give an example that shows the limitations of this approach and motivates the next one.
Reasoning Option 2 (Proof by Refutation). Here, we reason by refutation and search for a counterexample to the logical consequence by introducing an alternative search objective. Normally, according to Definition 6, one tries to verify that 11
(21) for all θ∈Θ, if Gθ(K)≥q then Gθ(ϕ)≥q.
Instead, we solve the dual problem:
(22) there exists θ∈Θ such that Gθ(K)≥q and Gθ(ϕ)<q.
If Eq.(22) is true then a counterexample to Eq.(21) has been found and the logical consequence does not hold. If Eq. 22 is false then no counterexample to Eq. 21 has been found and the logical consequence is assumed to hold true. A search for such parameters θ (the counterexample) can be performed by minimizing Gθ(ϕ) while imposing a constraint that seeks to invalidate results where Gθ(K)<q . We therefore define:
penalty(Gθ,q) = c if Gθ(K)<q , and 0 otherwise, where c>1 .

11 For simplicity,we temporarily define the notation G(K):=SatAggϕK(K,G) .

Given G∗ such that:
(23)G=argminGθ(Gθ(ϕ)+penalty(Gθ,q))
  • If G(K)<q : Then for all Gθ,Gθ(K)<q and therefore (K,G(θ),Θ)qϕ .
  • If G(K)q and G(ϕ)q : Then for all Gθ with Gθ(K)q ,we have that Gθ(ϕ)G(ϕ)q and therefore (K,G(θ),Θ)qϕ .
  • If G∗(K)≥q and G∗(ϕ)<q : Then (K,G(θ),Θ)⊭qϕ .
Clearly, Equation (23) cannot be used as an objective function for gradient-descent due to null derivatives. Therefore, we propose to approximate the penalty function with the soft constraint:
elu(α,β(q−Gθ(K))) = β(q−Gθ(K)) if Gθ(K)≤q , and α(e^(q−Gθ(K))−1) otherwise,
where α0 and β0 are hyper-parameters (see Figure 6). When Gθ(K)<q ,the penalty is linear in qGθ(K) with a slope of β . Setting β high,the gradients for Gθ(K) will be high in absolute value if the knowledge-base is not satisfied. When Gθ(K)>q ,the penalty is a negative exponential that converges to α . Setting α low but non-zero seeks to ensure that the gradients do not vanish when the penalty should not apply (when the knowledge-base is satisfied). We obtain the following approximate objective function:
(24) G∗=argminGθ(Gθ(ϕ)+elu(α,β(q−Gθ(K))))
Section 4.8 will illustrate the use of reasoning by refutation with an example in comparison with reasoning as querying after learning. Of course, other forms of reasoning are possible, not least that adopted in [6], but a direct comparison is outside the scope of this paper and left as future work.
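A minimal sketch of this soft constraint (plain Python/NumPy; the values of q, α and β are hypothetical) mirrors Equations (23) and (24):

    import numpy as np

    def soft_penalty(sat_K, q=0.95, alpha=0.05, beta=10.0):
        # elu(alpha, beta*(q - sat_K)): linear with slope beta when the knowledge-base is
        # not satisfied (sat_K <= q), a small negative exponential converging to -alpha otherwise
        x = q - sat_K
        return beta * x if x >= 0 else alpha * (np.exp(x) - 1.0)

    def refutation_objective(sat_phi, sat_K, q=0.95):
        # minimized over groundings when searching for a counterexample to K |=_q phi
        return sat_phi + soft_penalty(sat_K, q)

    print(refutation_objective(sat_phi=0.3, sat_K=0.80))   # penalized: knowledge-base unsatisfied
    print(refutation_objective(sat_phi=0.3, sat_K=0.99))   # a genuine counterexample candidate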

4. The Reach of Logic Tensor Networks

The objective of this section is to show how the language of Real Logic can be used to specify a number of tasks that involve learning from data and reasoning. Examples of such tasks are classification, regression, clustering, and link prediction. The solution of a problem specified in Real Logic is obtained by interpreting such a specification in Logic Tensor Networks. The LTN library implements Real Logic in TensorFlow 2 [1] and is available from GitHub.13 Every logical operator is grounded using TensorFlow primitives such that LTN directly implements a TensorFlow graph. Due to TensorFlow's built-in optimization, LTN is relatively efficient while providing the expressive power of first-order logic. Details on the implementation of the examples described in this section are reported in Appendix A. The implementation of the examples presented here is also available from the LTN repository on GitHub. Except when stated otherwise, the results reported are the average result over 10 runs using a 95% confidence interval. Every example uses a stable real product configuration to approximate the Real Logic operators and the Adam optimizer [35] with a learning rate of 0.001 . Table A.3 in the Appendix gives an overview of the network architectures used to obtain the results reported in this section.
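For reference, a minimal sketch of product-configuration operators of the kind used throughout these examples is given below (plain Python; the stable variants used by LTN additionally keep truth-values away from exactly 0 and 1 to avoid gradient problems, which is omitted here):

    def not_(a):              # standard negation
        return 1.0 - a

    def and_(a, b):           # product t-norm
        return a * b

    def or_(a, b):            # probabilistic sum
        return a + b - a * b

    def implies_(a, b):       # Reichenbach implication
        return 1.0 - a + a * b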

12 In the objective function, G should satisfy G(K)q before reducing G(ϕ) because the penalty c which is greater than 1 is higher than any potential reduction in G(ϕ) which is smaller or equal to 1 .
13 https://github.com/logictensornetworks/logictensornetworks

Figure 6: elu(α,βx) where α0 and β0 are hyper-parameters. The function elu(α,β(qGθ(K))) with α low and β high is a soft constraint for penalty (Gθ,q) suitable for learning.

4.1. Binary Classification

The simplest machine learning task is binary classification. Suppose that one wants to learn a binary classifier A for a set of points in [0,1]2 . Suppose that a set of positive and negative training examples is given. LTN uses the following language and grounding:

Domains:

points (denoting the examples).

Variables:

x+ for the positive examples.
x− for the negative examples.
x for all examples.
D(x)=D(x+)=D(x−)= points.
Predicates:
A(x) for the trainable classifier.
Din(A)= points.
Axioms:
(25)x+A(x+)
(26)x¬A(x)

Grounding:

G (points) =[0,1]2 .  G (分) =[0,1]2
G(x)[0,1]m×2(G(x)is a sequence ofmpoints,that is,mexamples) .
G(x+)=dG(x)∣∥d(0.5,0.5)∥<0.09
G(x)=dG(x)∣∥d(0.5,0.5)∥≥0.0915
G(Aθ):xsigmoid(MLPθ(x)) ,where MLP is a Multilayer Perceptron with a single output neuron,whose parameters θ are to be learned 16
G(Aθ):xsigmoid(MLPθ(x)) ,其中 MLP 是一个具有单个输出神经元的多层感知器,其参数 θ 需要学习 16

Learning:

Let us define D the data set of all examples. The objective function with K={∀x+A(x+),∀x−¬A(x−)} is given by argmaxθ∈ΘSatAggϕ∈KGθ,x←D(ϕ) .17 In practice, the optimizer uses the following loss function:
L=(1SatAggϕKGθ,xB(ϕ))
where B is a mini-batch sampled from D.18 The objective and loss functions depend on the following hyper-parameters:
  • the choice of fuzzy logic operator semantics used to approximate each connective and quantifier,
  • the choice of hyper-parameters underlying the operators, such as the value of the exponent p in any generalized mean,
  • the choice of formula aggregator function.
Using the stable product configuration to approximate connectives and quantifiers,and p=2 for every occurrence of ApME ,and using for the formula aggregator also ApME with p=2 , yields the following satisfaction equation:
SatAggϕ∈K Gθ(ϕ) = 1 − ( (1/2) [ (1 − (1 − ((1/|G(x+)|) Σv∈G(x+) (1 − sigmoid(MLPθ(v)))^2 )^(1/2) ))^2
+ (1 − (1 − ((1/|G(x−)|) Σv∈G(x−) (sigmoid(MLPθ(v)))^2 )^(1/2) ))^2 ] )^(1/2)

14G(x+) are,by definition in this example,the training examples with Euclidean distance to the center (0.5,0.5) smaller than the threshold of 0.09 .
15G(x) are,by definition,the training examples with Euclidean distance to the centre (0.5,0.5) larger or equal to the threshold of 0.09 .
15G(x) 是根据定义,到中心 (0.5,0.5) 的欧氏距离大于或等于 0.09 的训练示例。
16 sigmoid(x) = 1/(1+e^(−x))
17 The notation Gx←D(ϕ(x)) means that the variable x is grounded with the data D (that is, G(x):=D ) when grounding ϕ(x) .
18 As usual in ML,while it is possible to compute the loss function and gradients over the entire data set,it is preferred to use mini-batches of the examples.

Figure 7: Symbolic Tensor Computational Graph for the Binary Classification Example. In the figure, Gx+ and Gx are inputs to the network Gθ(A) and the dotted lines indicate the propagation of activation from each input through the network, which produces two outputs.
The computational graph of Figure 7 shows SatAggϕ∈K Gθ(ϕ) as used with the above loss function.
We are therefore interested in learning the parameters θ of the MLP used to model the binary classifier. We sample 100 data points uniformly from [0,1]2 to populate the data set of positive and negative examples. The data set was split into 50 data points for training and 50 points for testing. The training was carried out for a fixed number of 1000 epochs using backpropagation with the Adam optimizer [35] with a batch size of 64 examples. Figure 8 shows the classification accuracy and satisfaction level of the LTN on both training and test sets averaged over 10 runs using a 95% confidence interval. The accuracy shown is the ratio of examples correctly classified, with an example deemed as being positive if the classifier outputs a value higher than 0.5 .
Notice that a model can reach an accuracy of 100% while the satisfaction of the knowledge base is not yet maximized. For example, if the threshold for an example to be deemed as positive is 0.7, all examples may be classified correctly with a confidence score of 0.7. In that case, while the accuracy is already maximized, the satisfaction of ∀x+A(x+) would still be 0.7, and can still improve until the confidence for every sample reaches 1.0.
This first example, although straightforward, illustrates step-by-step the process of using LTN in a simple setting. Notice that, according to the nomenclature of Section 3.3 measuring accuracy amounts to querying the truth query (respectively,the generalization truth query) A(x) for all the examples of the training set (respectively, test set) and comparing the results with the classification threshold. In Figure 9,we show the results of such queries A(x) after optimization. Next,we show how the LTN language can be used to solve progressively more complex problems by combining learning and reasoning.
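The whole example fits in a few lines. The following sketch (TensorFlow 2, a toy reconstruction of the setup above rather than the repository code; it oversamples the data so that the small positive region is not empty) trains the predicate A by maximizing the satisfaction of the two axioms:

    import tensorflow as tf

    points = tf.random.uniform((1000, 2))                 # toy data in [0,1]^2
    dist = tf.norm(points - 0.5, axis=1)
    x_pos = tf.boolean_mask(points, dist < 0.09)          # positive examples
    x_neg = tf.boolean_mask(points, dist >= 0.09)         # negative examples

    A = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu"),
                             tf.keras.layers.Dense(16, activation="elu"),
                             tf.keras.layers.Dense(1, activation="sigmoid")])

    def forall(truths, p=2):                              # A_pME
        return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

    def sat_agg(sats, p=2):                               # formula aggregator, also A_pME
        return forall(tf.stack(sats), p)

    opt = tf.keras.optimizers.Adam(0.001)
    for epoch in range(1000):
        with tf.GradientTape() as tape:
            sat = sat_agg([forall(A(x_pos)),               # forall x+: A(x+)
                           forall(1.0 - A(x_neg))])        # forall x-: not A(x-)
            loss = 1.0 - sat
        grads = tape.gradient(loss, A.trainable_variables)
        opt.apply_gradients(zip(grads, A.trainable_variables))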

4.2. Multi-Class Single-Label Classification

The natural extension of binary classification is a multi-class classification task. We first approach multi-class single-label classification, which assumes that each example is assigned to one and only one label.
For illustration purposes, we use the Iris flower data set [20], which consists of classification into three mutually exclusive classes; call these A,B ,and C . While one could train three unary predicates A(x),B(x) and C(x) ,it turns out to be more effective if this problem is modeled by a single binary predicate P(x,l) ,where l is a variable denoting a multi-class label,in this case,
Figure 8: Binary Classification task (training and test set performance): Average accuracy (left) and satisfiability (right). Due to the random initializations, accuracy and satisfiability start on average at 0.5 with performance increasing rapidly after a few epochs.
classes A,B or C . This syntax allows one to write statements quantifying over the classes, e.g. ∀x(∃l(P(x,l))) . Since the classes are mutually exclusive, the output layer of the MLP representing P(x,l) will be a softmax layer, instead of a sigmoid function, to ensure the exclusivity constraint on satisfiability scores.19 The problem can be specified as follows:
A,BC 。这种语法允许人们编写量化类的语句,例如 x(l(P(x,l))) 。由于这些类是相互排斥的,代表 P(x,l) 的 MLP 的输出层将是一个 softmax 层,而不是一个 sigmoid 函数,以确保满足得分的排他性约束 19 问题可以如下指定:

Domains:

items, denoting the examples from the Iris flower data set.
labels, denoting the class labels.

Variables:

xA,xB,xC for the positive examples of classes A,B,C .
x for all examples.
D(xA)=D(xB)=D(xC)=D(x)= items.

Constants:

lA,lB,lC , the labels of classes A (Iris setosa), B (Iris virginica), C (Iris versicolor), respectively.
D(lA)=D(lB)=D(lC)= labels.
Predicates:
P(x,l) denoting the fact that item x is classified as l .
Din(P)= items,labels.
Axioms:
(27)xAP(xA,lA)
(28)xBP(xB,lB)
(29)xCP(xC,lC)
Notice that rules about exclusiveness such as x(P(x,lA)(¬P(x,lB)¬P(x,lC))) are not included since such constraints are already imposed by the grounding of P below,more specifically the softmax function.

19 softmax(x)i = e^(xi) / Σj e^(xj)

Figure 9: Binary Classification task (querying the trained predicate A(x) ): It is interesting to see how A(x) could be appropriately named as denoting the inside of the central region shown in the figure,and therefore ¬A(x) represents the outside of the region.

Grounding:

G (items) =R4 ,items are described by 4 features: the length and the width of the sepals and petals, in centimeters.
G (labels) =N3 ,we use a one-hot encoding to represent classes.
G(xA)Rm1×4 ,that is, G(xA) is a sequence of m1 examples of class A .
G(xA)Rm1×4 ,即 G(xA)A 类的 m1 个示例序列。
G(xB)Rm2×4,G(xB) is a sequence of m2 examples of class B .
G(xB)Rm2×4,G(xB)B 类的 m2 个示例的序列。
G(xC)Rm3×4,G(xC) is a sequence of m3 examples of class C .
G(xC)Rm3×4,G(xC)C 类的 m3 个示例的序列。
G(x)R(m1+m2+m3)×4,G(x) is a sequence of all the examples.
G(x)R(m1+m2+m3)×4,G(x) 是所有示例的序列。
G(lA)=[1,0,0],G(lB)=[0,1,0],G(lC)=[0,0,1] .
G(Pθ):x,llsoftmax(MLPθ(x)) ,where the MLP has three output neurons corresponding to as many classes,and denotes the dot product as a way of selecting an output for G(Pθ) ; multiplying the MLP’s output by the one-hot vector l gives the truth degree corresponding to the class denoted by l .
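A small sketch of this grounding (TensorFlow 2, hypothetical layer sizes) makes the weight-sharing explicit: the one-hot label simply selects one component of a shared softmax output, so the truth degrees of the three classes are mutually exclusive by construction.

    import tensorflow as tf

    mlp = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu"),
                               tf.keras.layers.Dense(3)])      # logits for classes A, B, C

    def P(x, l_onehot):
        probs = tf.nn.softmax(mlp(x), axis=-1)                 # shape (batch, 3), sums to 1
        return tf.reduce_sum(probs * l_onehot, axis=-1)        # truth degree of "x is labelled l"

    x_batch = tf.random.uniform((5, 4))                        # five Iris-like examples, 4 features
    l_A = tf.constant([1.0, 0.0, 0.0])
    print(P(x_batch, l_A))                                     # degrees of membership in class A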

Learning:

The logical operators and connectives are approximated using the stable product configuration with p=2 for ApME . For the formula aggregator, ApME is used also with p=2 .
The computational graph of Figure 10 illustrates how SatAggϕ∈KGθ(ϕ) is obtained. If B denotes a mini-batch sampled from the data set of all examples, the loss function (to minimize) is:
L=1SatAggϕKGθ,xB(ϕ).
Figure 11 shows the result of training with the Adam optimizer with batches of 64 examples. Accuracy measures the ratio of examples correctly classified, with example x labeled as argmaxl(P(x,l)).20 Classification accuracy reaches an average value near 1.0 for both the training and test data after some 100 epochs. Satisfaction levels of the Iris flower predictions continue to increase for the rest of the training (500 epochs) to more than 0.8 .
It is worth contrasting the choice of using a binary predicate (P(x,l)) in this example with the option of using multiple unary predicates (lA(x),lB(x),lC(x)) ,one for each class. Notice how each predicate is normally associated with an output neuron. In the case of the unary predicates, the networks would be disjoint (or modular), whereas weight-sharing takes place with the use of the binary predicate. Since l is instantiated into lA,lB,lC ,in practice P(x,l) becomes P(x,lA),P(x,lB),P(x,lC) ,which is implemented via three output neurons to which a softmax function applies.

4.3. Multi-Class Multi-Label Classification

We now turn to multi-label classification, whereby multiple labels can be assigned to each example. As a first example of the reach of LTNs, we shall see how the previous example can be extended naturally using LTN to account for multiple labels, not always a trivial extension for most ML algorithms. The standard approach to the multi-label problem is to provide explicit negative examples for each class. By contrast, LTN can use background knowledge to relate classes directly to each other, thus becoming a powerful tool in the case of the multi-label problem when typically the labeled data is scarce. We explore the Leptograpsus crabs data set [10] consisting of 200 examples of 5 morphological measurements of 50 crabs. The task is to classify the crabs according to their color and sex. There are four labels: blue, orange, male, and female. The color labels are mutually exclusive, and so are the labels for sex. LTN will be used to specify such information logically.

20 This is also known as top-1 accuracy,as proposed in [39]. Cross-entropy results (tlog(y)) could have been reported here as is common with the use of softmax, although it is worth noting that, of course, the loss function used by LTN is different.

Figure 10: Symbolic Tensor Computational Graph for the Multi-Class Single-Label Problem. As before, the dotted lines in the figure indicate the propagation of activation from each input through the network, in this case producing three outputs.
Figure 11: Multi-Class Single-Label Classification: Classification accuracy (left) and satisfaction level (right).

Domains:

items denoting the examples from the crabs dataset.
labels denoting the class labels.

Variables:

xblue ,xorange ,xmale ,xfemale  for the positive examples of each class.
x , used to denote all the examples.
D(xblue )=D(xorange )=D(xmale )=D(xfemale )=D(x)= items.

Constants:

lblue ,lorange ,lmale ,lfemale  (the labels for each class).
D(lblue )=D(lorange )=D(lmale )=D(lfemale )= labels.
Predicates:
P(x,l) , denoting the fact that item x is labelled as l .
Din(P)= items,labels.

Axioms:

(30)xblue P(xblue ,lblue )
(31)xorange P(xorange ,lorange )
(32)xmale P(xmale ,lmale )
(33)xfemale P(xfemale ,lfemale )
(34)x¬(P(x,lblue )P(x,lorange ))
(35)x¬(P(x,lmale )P(x,lfemale ))
Notice how logical rules 34 and 35 above represent the mutual exclusion of the labels on colour and sex, respectively. As a result, negative examples are not used explicitly in this specification.

Grounding:

G (items) =R5 ; the examples from the data set are described using 5 features.
G (labels) =N4 ; one-hot vectors are used to represent class labels 21
G(xblue )∈Rm1×5, G(xorange )∈Rm2×5, G(xmale )∈Rm3×5, G(xfemale )∈Rm4×5 . These sequences are not mutually exclusive; one example can, for instance, be in both xblue  and xmale .
G(lblue )=[1,0,0,0], G(lorange )=[0,1,0,0], G(lmale )=[0,0,1,0], G(lfemale )=[0,0,0,1] .
G(Pθ):x,l↦l⊤sigmoid(MLPθ(x)) , with the MLP having four output neurons corresponding to as many classes. As before, ⊤ denotes the dot product which selects a single output. By contrast with the previous example, notice the use of a sigmoid function instead of a softmax function.

21 There are two possible approaches here: either each item is labeled with one multi-hot encoding or each item is labeled with several one-hot encodings. The latter approach was used in this example.

Learning:

As before, the fuzzy logic operators and connectives are approximated using the stable product configuration with p=2 for ApME ,and for the formula aggregator, ApME is also used with p=2 .
Figure 12 shows the result of the Adam optimizer using backpropagation trained with batches of 64 examples. This time, the accuracy is defined as 1 − HL, where HL is the average Hamming loss, i.e. the fraction of labels predicted incorrectly, with a classification threshold of 0.5 (given an example u , if the model outputs a value greater than 0.5 for class C then u is deemed as belonging to class C ). The rightmost graph in Figure 12 illustrates how LTN learns the constraint that a crab cannot have both blue and orange color, which is discussed in more detail in what follows.

Querying:

To illustrate the learning of constraints by LTN, we have queried three formulas that were not explicitly part of the knowledge-base, over time during learning:
(36)ϕ1:x(P(x,lblue )¬P(x,lorange ))
(37)ϕ2:x(P(x,lblue )P(x,lorange ))
(38)ϕ3:x(P(x,lblue )P(x,lmale ))
For querying, we use p=5 when approximating the universal quantifiers with ApME . A higher p denotes a stricter universal quantification with a stronger focus on outliers (see Section 2.4).22 We should expect ϕ1 to hold true (every blue crab cannot be orange and vice-versa23), and we should expect ϕ2 (every blue crab is also orange) and ϕ3 (every blue crab is male) to be false. The results are reported in the rightmost plot of Figure 12. Prior to training, the truth-values of ϕ1 to ϕ3 are non-informative. During training one can see, with the maximization of the satisfaction of the knowledge-base, a trend towards the satisfaction of ϕ1 , and an opposite trend of ϕ2 and ϕ3 towards false.
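The queries above only require evaluating formulas on the trained predicate. A small sketch of how ϕ1 is computed (plain NumPy, with hypothetical truth degrees and a product-style implication; the precise operators depend on the chosen configuration):

    import numpy as np

    def implies_(a, b):                      # Reichenbach implication, 1 - a + a*b
        return 1.0 - a + a * b

    def forall_pME(truths, p=5):
        truths = np.asarray(truths)
        return 1.0 - np.mean((1.0 - truths) ** p) ** (1.0 / p)

    # hypothetical truth degrees P(x, l_blue) and P(x, l_orange) for four crabs
    p_blue   = np.array([0.90, 0.10, 0.80, 0.05])
    p_orange = np.array([0.10, 0.90, 0.15, 0.95])

    phi1 = forall_pME(implies_(p_blue, 1.0 - p_orange), p=5)
    print(phi1)   # close to 1 when no crab is both blue and orange to a high degree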

4.4. Semi-Supervised Pattern Recognition

Let us now explore two, more elaborate, classification tasks, which showcase the benefit of using logical reasoning alongside machine learning. With these two examples, we also aim to provide a more direct comparison with a related neurosymbolic system DeepProbLog [41]. The benchmark examples below were introduced in the DeepProbLog paper [41].

22 Training should usually not focus on outliers, as optimizers would struggle to generalize and tend to get stuck in local minima. However, when querying ϕ1,ϕ2,ϕ3 , we wish to be more careful about the interpretation of our statement. See also 3.1.3.
23 ∀x(P(x,lblue )→¬P(x,lorange ))

Figure 12: Multi-Class Multi-Label Classification: Classification Accuracy (left), Satisfiability level (middle), and Querying of Constraints (right).
Single Digits Addition: Consider the predicate addition (X,Y,N) , where X and Y are images of digits (the MNIST data set will be used), and N is a natural number corresponding to the sum of these digits. This predicate should return an estimate of the validity of the addition. For instance, addition(3,8,11), with 3 and 8 given as images, is a valid addition; addition(3,8,5) is not.
Multi Digits Addition: The experiment is extended to numbers with more than one digit. Consider the predicate addition ([X1,X2],[Y1,Y2],N). [X1,X2] and [Y1,Y2] are lists of images of digits, representing two multi-digit numbers; N is a natural number corresponding to the sum of the two multi-digit numbers. For instance, addition([3,8],[9,2],130), with the digits given as images, is a valid addition; addition([3,8],[9,2],26) is not.
A natural neurosymbolic approach is to seek to learn a single-digit classifier and benefit from knowledge readily available about the properties of addition in this case. For instance, suppose that a predicate digit(x,d) gives the likelihood of an image x being of digit d . A definition for addition (3,8,11) in LTN is:
∃d1,d2 : d1+d2=11 (digit(3,d1)∧digit(8,d2))
In [41], the above task is made more complicated by not providing labels for the single-digit images during training. Instead, training takes place on pairs of images with labels made available for the result only, that is, the sum of the individual labels. The single-digit classifier is not explicitly trained by itself; its output is a piece of latent information that is used by the logic. However, this does not pose a problem for end-to-end neurosymbolic systems such as LTN or DeepProbLog for which the gradients can propagate through the logical structures.
We start by illustrating a LTN theory that can be used to learn the predicate digit. The specification of the theory below is for the single digit addition example, although it can be extended easily to the multiple digits case.

Domains:

images, denoting the MNIST digit images,
results, denoting the integers that label the results of the additions,
digits, denoting the digits from 0 to 9 .

Variables:

x,y , ranging over the MNIST images in the data,
n for the labels, i.e. the result of each addition,
d1,d2 ranging over digits.
D(x)=D(y)= images,
D(n)= results,
D(d1)=D(d2)= digits.

Predicates:

digit(x,d) for the single digit classifier, where d is a term denoting a digit constant or a digit variable. The classifier should return the probability of an image x being of digit d .
Din(digit)= images,digits.
Axioms:
Single Digit Addition:
Diag(x,y,n)
(39)(d1,d2:d1+d2=n
(digit(x,d1)digit(y,d2)))
Multiple Digit Addition:
Diag(x1,x2,y1,y2,n)
(40)(d1,d2,d3,d4:10d1+d2+10d3+d4=n
(digit(x1,d1)digit(x2,d2)digit(y1,d3)digit(y2,d4)))
Notice the use of Diag: when grounding x,y,n with three sequences of values,the i -th examples of each variable are matching. That is, (G(x)i,G(y)i,G(n)i) is a tuple from our dataset of valid additions. Using the diagonal quantification, LTN aggregates pairs of images and their corresponding result, rather than any combination of images and results.
Notice also the guarded quantification: by quantifying only on the latent "digit labels" (i.e. d1,d2,) that can add up to the result label ( n ,given in the dataset),we incorporate symbolic information into the system. For example,in (39),if n=3 ,the only valid tuples (d1,d2) are (0,3),(3,0),(1,2),(2,1) . Gradients will only backpropagate to these values.
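A sketch of how the guarded existential of axiom (39) can be computed for a single image pair (plain NumPy, with hypothetical classifier outputs; conjunction as the product t-norm and ApM as the existential aggregator):

    import numpy as np

    def exists_pM(truths, p=1):
        truths = np.asarray(truths)
        return np.mean(truths ** p) ** (1.0 / p)

    def addition_sat(p_digits_x, p_digits_y, n, p=1):
        # guarded domain: only pairs (d1, d2) with d1 + d2 = n contribute
        pairs = [(d1, n - d1) for d1 in range(10) if 0 <= n - d1 <= 9]
        conj = [p_digits_x[d1] * p_digits_y[d2] for d1, d2 in pairs]     # product t-norm
        return exists_pM(conj, p)

    # hypothetical softmax outputs of the digit classifier for two images
    px = np.full(10, 0.02); px[3] = 0.82
    py = np.full(10, 0.02); py[8] = 0.82
    print(addition_sat(px, py, 11))   # much larger for the correct sum 11 ...
    print(addition_sat(px, py, 5))    # ... than for a wrong one such as 5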

Grounding:

G (images) =[0,1]28×28×1 . The MNIST data set has images of 28 by 28 pixels. The images are grayscale and have just one channel. The RGB pixel values from 0 to 255 of the MNIST data set are converted to the range [0,1] .
G (results) =N .
G(digits)={0,1,,9} .
G(x)[0,1]m×28×28×1,G(y)[0,1]m×28×28×1,G(n)Nm 24
G(d1)=G(d2)=0,1,,9.
G(digitθ):x,d onehot (d)softmax(CNNθ(x)) ,where CNNis a  Convolutional Neural Network with 10 output neurons for each class. Notice that, in contrast with the previous examples, d is an integer label; onehot (d) converts it into a one-hot label.
G(digitθ):x,d onehot (d)softmax(CNNθ(x)) ,其中 CNNis a  每个类别有 10 个输出神经元的卷积神经网络。请注意,与先前的示例相比, d 是一个整数标签;onehot (d) 将其转换为一个独热标签。

24 Notice the use of the same number m of examples for each of these variables as they are supposed to match one-to-one due to the use of Diag.

Learning:

The computational graph of Figure 13 shows the objective function for the satisfiability of the knowledge base. A stable product configuration is used with hyper-parameter p=2 of the operator ApME for universal quantification (∀) . Let p∃ denote the exponent hyper-parameter used in the generalized mean ApM for existential quantification (∃). Three scenarios are investigated and compared in the Multiple Digit experiment (Figure 15):
  1. p∃=1 throughout the entire experiment,
  2. p∃=2 throughout the entire experiment, or
  3. p∃ follows a schedule, changing from p∃=1 to p∃=6 gradually with the number of training epochs.
In the Single Digit experiment, only the last scenario above (schedule) is investigated (Figure 14).
We train to maximize satisfiability by using batches of 32 examples of image pairs, labeled by the result of their addition. As done in [41], the experimental results vary the number of examples in the training set to emphasize the generalization abilities of a neurosymbolic approach. Accuracy is measured by predicting the digit values using the predicate digit and reporting the ratio of examples for which the addition is correct. A comparison is made with the same baseline method used in [41]: given a pair of MNIST images, a non-pre-trained CNN outputs embeddings for each image (Siamese neural network). The embeddings are provided as input to dense layers that classify the addition into one of the 19 (respectively, 199) possible results of the Single Digit Addition (respectively, Multiple Digit Addition) experiments. The baseline is trained using a cross-entropy loss between the labels and the predictions. As expected, such a standard deep learning approach struggles with the task without the provision of symbolic meaning about intermediate parts of the problem.
Experimentally, we find that the optimizer for the neurosymbolic system gets stuck in a local optimum at the initialization in about 1 out of 5 runs. We, therefore, present the results on an average of the 10 best outcomes out of 15 runs of each algorithm (that is, for the baseline as well). The examples of digit pairs selected from the full MNIST data set are randomized at each run.
Figure 15 shows that the use of p=2 from the start produces poor results. A higher value for p in ApM weighs up the instances with a higher truth-value (see also Appendix C for a discussion). Starting already with a high value for p ,the classes with a higher initial truth-value for a given example will have higher gradients and be prioritized for training, which does not make practical sense when randomly initializing the predicates. Increasing p by following a schedule is the most promising approach. In this particular example, p=1 is also shown to be adequate purely from a learning perspective. However, p=1 implements a simple average which does not account for the meaning of well; the resulting satisfaction value is not meaningful within a reasoning perspective.
Table 1 shows that the training and test times of LTN are of the same order of magnitude as those of the CNN baselines. Table 2 shows that LTN reaches similar accuracy as that reported by DeepProbLog.
Figure 13: Symbolic Tensor Computational Graph for the Single Digit Addition task. Notice that the figure does not depict accurate dimensions for the tensors; G(x) and G(y) are in fact 4D tensors of dimensions m×28×28×1 . Computing results with the variables d1 or d2 corresponds to the addition of a further axis of dimension 10 .
Figure 14: Single Digit Addition Task: Accuracy and satisfiability results (top) and results in the presence of fewer examples (bottom) in comparison with standard Deep Learning using a CNN (blue lines).
Model      Single Digits, Train    Single Digits, Test    Multi Digits, Train    Multi Digits, Test
baseline   2.72±0.23 ms            1.45±0.21 ms           3.87±0.24 ms           2.10±0.30 ms
LTN        5.36±0.25 ms            3.44±0.39 ms           8.51±0.72 ms           5.72±0.57 ms
Table 1: The computation time of training and test steps on the single and multiple digit addition tasks, measured on a computer with a single Nvidia Tesla V100 GPU and averaged over 1000 steps. Each step operates on a batch of 32 examples. The computational efficiency of the LTN and the CNN baseline systems are of the same order of magnitude.
Model         Number of training examples
              Single Digits: 30 000    Single Digits: 3 000    Multi Digits: 15 000    Multi Digits: 1 500
baseline      95.95±0.27               70.59±1.45              47.19±0.69              2.07±0.12
LTN           98.04±0.13               93.49±0.28              95.37±0.29              88.21±0.63
DeepProbLog   97.20±0.45               92.18±1.57              95.16±1.70              87.21±1.92
Table 2: Accuracy (in %) on the test set: comparison of the final results obtained with LTN and those reported with DeepProbLog[41]. Although it is difficult to compare directly the results over time (the frameworks are implemented in different libraries), while achieving similar computational efficiency as the CNN baseline, LTN also reaches similar accuracy as that reported by DeepProbLog.

4.5. Regression

Another important problem in Machine Learning is regression where a relationship is estimated between one independent variable X and a continuous dependent variable Y . The essence of regression is,therefore,to approximate a function f(x)=y by a function f ,given examples (xi,yi) such that f(xi)=yi . In LTN one can model a regression task by defining f as a learnable function whose parameter values are constrained by data. Additionally, a regression task requires a notion of equality. We,therefore,define the predicate eq as a smooth version of the symbol = to turn the constraint f(xi)=yi into a smooth optimization problem.
In this example,we explore regression using a problem from a real estate data set25 with 414 examples, each described in terms of 6 real-numbered features: the transaction date (converted to a float), the age of the house, the distance to the nearest station, the number of convenience stores in the vicinity, and the latitude and longitude coordinates. The model has to predict the house price per unit area.

Domains:
samples, denoting the houses and their features.
prices, denoting the house prices.
Variables:
x for the samples.
y for the prices.
D(x)= samples.
D(y)= prices.


25 https://www.kaggle.com/quantbruce/real-estate-price-prediction

Figure 15: Multiple Digit Addition Task: Accuracy and satisfiability results (top) and results in the presence of fewer examples (bottom) in comparison with standard Deep Learning using a CNN (blue lines).

Functions:

f(x) ,the regression function to be learned.
Din(f)= samples, Dout(f)= prices.

Predicates:

eq(y1,y2) ,a smooth equality predicate that measures how similar y1 and y2 are.
Din(eq)= prices,prices.
Axioms:
(41)Diag(x,y)eq(f(x),y)
Notice again the use of Diag: when grounding x and y onto sequences of values,this is done by obeying a one-to-one correspondence between the sequences. In other words, we aggregate pairs of corresponding samples and prices, instead of any combination thereof.

Grounding:

G (samples) =R6 .
G (prices) =R .
G(x)Rm×6,G(y)Rm×1 . Notice that this specification refers to the same number m of examples for x and y due to the above one-to-one correspondence obtained with the use of Diag.
G(x)Rm×6,G(y)Rm×1 . 请注意,由于使用 Diag 获得的上述一一对应关系,此规范涉及相同数量的 xy 示例 m
G(eq(u,v))=exp(αj(ujvj)2) ,where the hyper-parameter α is a real number that scales how strict the smooth equality is. 26 In our experiments,we use α=0.05 .
Figure 17: Visualization of LTN solving a regression problem.
G(fθ):x↦MLPθ(x) , where MLPθ is a multilayer perceptron which ends in one neuron corresponding to a price prediction, with a linear output layer (no activation function).

Learning:

The theory is constrained by the parameters of the model of f . LTN is used to estimate such parameters by maximizing the satisfaction of the knowledge-base, in the usual way. Approximating using ApME with p=2 ,as before,we randomly split the data set into 330 examples for training and 84 examples for testing. Figure 16 shows the satisfaction level over 500 epochs. We also plot the Root Mean Squared Error (RMSE) between the predicted prices and the labels (i.e. actual prices, also known as target values). We visualize in Figure 17 the strong correlation between actual and predicted prices at the end of one of the runs.
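The smooth equality used in the grounding above is easy to inspect in isolation. A minimal sketch (plain NumPy, using the α=0.05 from the grounding):

    import numpy as np

    def smooth_eq(u, v, alpha=0.05):
        # eq(u, v) = exp(-alpha * ||u - v||): 1 when u == v, decaying with the distance
        d = np.sqrt(np.sum((np.asarray(u, dtype=float) - np.asarray(v, dtype=float)) ** 2))
        return np.exp(-alpha * d)

    print(smooth_eq(42.0, 42.0))   # 1.0
    print(smooth_eq(40.0, 45.0))   # exp(-0.05 * 5) ~ 0.78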

4.6. Unsupervised Learning (Clustering)

In unsupervised learning, labels are either not available or are not used for learning. Clustering is a form of unsupervised learning whereby, without labels, the data is characterized by constraints

26 Intuitively, the smooth equality is exp(−αd(u,v)) , where d(u,v) is the Euclidean distance between u and v . It produces a 1 if the distance is zero; as the distance increases, the result decreases exponentially towards 0 . In case an exponential decrease is undesirable, one can adopt the following alternative equation: eq(u,v)=1/(1+αd(u,v)) .

alone. LTN can formulate such constraints, such as:
  • clusters should be disjoint,
  • every example should be assigned to a cluster,
  • a cluster should not be empty,
  • if the points are near, they should belong to the same cluster,
  • if the points are far, they should belong to different clusters, etc.
Domains:
points, denoting the data to cluster.
points_pairs, denoting pairs of examples.
clusters, denoting the clusters.
Variables:

x,y for all points.
D(x)=D(y)= points.
D(c)= clusters.

Predicates:

C(x,c) ,the truth degree of a given point belonging in a given cluster.
Din(C)= points,clusters.
Axioms:
(42)xcC(x,c)
(43)cxC(x,c)
(44)(c,x,y:|xy|<thclose )(C(x,c)C(y,c))
(45)(c,x,y:|xy|>thdistant )¬(C(x,c)C(y,c))
Notice the use of guarded quantifiers: all the pairs of points with Euclidean distance lower (resp. higher) than a value thclose  (resp. thdistant  ) should belong in the same cluster (resp. should not). thclose  and thdistant  are arbitrary threshold values that define some of the closest and most distant pairs of points. In our example, they are set to, respectively, 0.2 and 1.0.
As done in the example of Section 4.2, the clustering predicate has mutually exclusive satisfiability scores for each cluster using a softmax layer. Therefore, there is no explicit constraint about clusters being disjoint.
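As an illustration of such guarded constraints, the sketch below (plain NumPy; product t-norm for ∧, 1 − a for ¬ and ApME for ∀, which are assumed choices consistent with the configuration used in this example) evaluates axiom (45) for given cluster memberships:

    import numpy as np

    def forall_pME(truths, p=4):
        truths = np.asarray(truths, dtype=float)
        return 1.0 - np.mean((1.0 - truths) ** p) ** (1.0 / p)

    def distant_pair_axiom(points, memberships, th_distant=1.0, p=4):
        # axiom (45): for every cluster c and every pair (x, y) with ||x - y|| > th_distant,
        # not (C(x,c) and C(y,c)); memberships[i, k] is the truth degree C(points[i], cluster k)
        sats = []
        n = len(points)
        for i in range(n):
            for j in range(n):
                if i != j and np.linalg.norm(points[i] - points[j]) > th_distant:
                    sats.extend(1.0 - memberships[i] * memberships[j])
        return forall_pME(sats, p) if sats else 1.0

    pts = np.array([[0.0, 0.0], [1.5, 1.5]])
    mem = np.array([[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]])
    print(distant_pair_axiom(pts, mem))   # high: the two distant points sit in different clusters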

Grounding:

G (points) =[1,1]2 .  G (分) =[1,1]2
G (clusters) =N4 , we use one-hot vectors to represent a choice of 4 clusters.
G(x)[1,1]m×2 ,that is, x is a sequence of m points. G(y)=G(x) .
G(x)[1,1]m×2 ,即 x 是一系列 m 点。 G(y)=G(x)
thclose =0.2,thdistant =1.0 .
G(c)=[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1].
G(Cθ):x,ccsoftmax(MLPθ(x)) ,where MLP has 4 output neurons corresponding to the 4 clusters.
G(Cθ):x,ccsoftmax(MLPθ(x)) ,其中 MLP 具有 4 个输出神经元,对应于 4 个簇。
Figure 18: LTN solving a clustering problem by constraint optimization: ground-truth (top) and querying of each cluster C0,C1,C2 and C3 ,in turn.

Learning:

We use the stable real product configuration to approximate the logical operators. For ∀ , we use ApME with p=4 . For ∃ , we use ApM with p=1 during the first 100 epochs, and p=6 thereafter, as a simplified version of the schedule used in Section 4.4. The formula aggregator is approximated by ApME with p=2 . The model is trained for a total of 1000 epochs using the Adam optimizer, which is sufficient for LTN to solve the clustering problem shown in Figure 18. Ground-truth data for this task was generated artificially by creating 4 centers, and generating 50 random samples from a multivariate Gaussian distribution around each center. The trained LTN achieves a satisfaction level of the clustering constraints of 0.857 .

4.7. Learning Embeddings with LTN

A classic example of Statistical Relational Learning is the smokers-friends-cancer example introduced in [55]. Below, we show how this example can be formalized in LTN using semi-supervised embedding learning.
There are 14 people divided into two groups {a,b,,h} and {i,j,,n} . Within each group, there is complete knowledge about smoking habits. In the first group, there is complete knowledge about who has and who does not have cancer. Knowledge about the friendship relation is complete within each group only if symmetry is assumed,that is, x,y (friends (x,y)friends(y,x) ).
Otherwise,knowledge about friendship is incomplete in that it may be known that e.g. a is a friend of b ,and it may be not known whether b is a friend of a . Finally,there is general knowledge about smoking, friendship, and cancer, namely that smoking causes cancer, friendship is normally symmetric and anti-reflexive, everyone has a friend, and smoking propagates (actively or passively) among friends. All this knowledge is represented in the axioms further below.

Domains:

people, to denote the individuals.
Constants:
a,b,…,h,i,j,…,n , the 14 individuals. Our goal is to learn an adequate embedding for each constant.
D(a)=D(b)=…=D(n)= people.
Variables:
x,y ranging over the individuals.
D(x)=D(y)= people.

Predicates:

S(x) for smokes, F(x,y) for friends, C(x) for cancer.
D(S)=D(C)= people. D(F)= people,people.

Axioms:

Let X1 = {a, b, …, h} and X2 = {i, j, …, n} be the two groups of individuals.
Let S = {a, e, f, g, j, n} be the smokers; knowledge is complete in both groups.
Let C = {a, e} be the individuals with cancer; knowledge is complete in X1 only.
Let F = {(a,b), (a,e), (a,f), (a,g), (b,c), (c,d), (e,f), (g,h), (i,j), (j,m), (k,l), (m,n)} be the set of friendship relations; knowledge is complete if assuming symmetry.
These facts are illustrated in Figure 20a.
We have the following axioms:
F(u,v)    for (u,v) ∈ F      (46)
¬F(u,v)    for (u,v) ∉ F, u > v      (47)
S(u)    for u ∈ S      (48)
¬S(u)    for u ∈ (X1 ∪ X2) \ S      (49)
C(u)    for u ∈ C      (50)
¬C(u)    for u ∈ X1 \ C      (51)
∀x ¬F(x,x)      (52)
∀x,y (F(x,y) → F(y,x))      (53)
∀x ∃y F(x,y)      (54)
∀x,y ((F(x,y) ∧ S(x)) → S(y))      (55)
∀x (S(x) → C(x))      (56)
∀x (¬C(x) → ¬S(x))      (57)
Notice that the knowledge base is not satisfiable in the strict logical sense of the word. For instance, f is said to smoke but not to have cancer, which is inconsistent with the rule ∀x (S(x) → C(x)). Hence, it is important to adopt a fuzzy approach as done with MLN, or a many-valued fuzzy logic interpretation as done with LTN.

Grounding:

G(people) = R^5. The model is expected to learn embeddings in R^5.
G(aθ) = vθ(a), …, G(nθ) = vθ(n). Every individual is associated with a vector of 5 real numbers. The embeddings are initialized uniformly at random.
G(xθ) = G(yθ) = ⟨vθ(a), …, vθ(n)⟩.
G(Sθ): x ↦ sigmoid(MLP_Sθ(x)), where MLP_Sθ has 1 output neuron.
G(Fθ): x, y ↦ sigmoid(MLP_Fθ(x, y)), where MLP_Fθ has 1 output neuron.
G(Cθ): x ↦ sigmoid(MLP_Cθ(x)), where MLP_Cθ has 1 output neuron.
The MLP models for S, F, C are kept simple, so that most of the learning is focused on the embeddings.
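These groundings translate directly into trainable TensorFlow 2 objects. The sketch below is illustrative only: the hidden-layer size and the initialization range of the embeddings are assumptions, since the text above fixes only the embedding dimension (5), the sigmoid outputs, and the single output neuron per predicate.

import tensorflow as tf

individuals = list("abcdefghijklmn")            # the 14 constants a, ..., n

# Trainable embeddings in R^5, initialized uniformly at random (the range is an assumption).
embeddings = {i: tf.Variable(tf.random.uniform([5], -1.0, 1.0)) for i in individuals}

def make_mlp(n_inputs):
    # A single small hidden layer (size 8 is an assumption) keeps the predicate simple,
    # so that most of the learning happens in the embeddings.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="elu", input_shape=(n_inputs,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

mlp_S, mlp_C, mlp_F = make_mlp(5), make_mlp(5), make_mlp(10)

def Smokes(x):     return tf.squeeze(mlp_S(x), axis=-1)                       # G(S)
def Cancer(x):     return tf.squeeze(mlp_C(x), axis=-1)                       # G(C)
def Friends(x, y): return tf.squeeze(mlp_F(tf.concat([x, y], -1)), axis=-1)   # G(F)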

Learning:

We use the stable real product configuration to approximate the operators. For ∀, we use ApME with p = 2 for all the rules, except for rules (52) and (53), where we use p = 6. The intuition behind this choice of p is that no outliers are to be accepted for the friendship relation, since it is expected to be symmetric and anti-reflexive, whereas outliers are accepted for the other rules. For ∃, we use ApM with p = 1 during the first 200 epochs of training, and p = 6 thereafter, with the same motivation as that of the schedule used in Section 4.4. The formula aggregator is approximated by ApME with p = 2.
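A training epoch then simply maximizes the satisfaction of the whole knowledge base. The following is a minimal sketch of that loop (the learning rate and the way the axioms are passed in are assumptions); each axiom function is expected to aggregate its own quantifiers with the p value assigned to it above.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)   # learning rate is an assumption

def train_step(axiom_fns, variables):
    # axiom_fns: callables returning the truth degree of each axiom (46)-(57) in [0, 1]
    # variables: the trainable embeddings and MLP weights
    with tf.GradientTape() as tape:
        sats = tf.stack([f() for f in axiom_fns])
        sat_kb = 1.0 - tf.reduce_mean((1.0 - sats) ** 2.0) ** 0.5   # formula aggregator A_pME, p=2
        loss = 1.0 - sat_kb                                          # maximize satisfiability
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return sat_kb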
Figure 19 shows the satisfiability over 1000 epochs of training. At the end of one of these runs, we query S(x), F(x,y), C(x) for each individual; the results are shown in Figure 20b. We also plot the principal components of the learned embeddings [51] in Figure 21. The friendship relations are learned as expected. Rule (56), "smoking implies cancer", is inferred for group 2 even though such information was not present in the knowledge base. For group 1, the given facts about smoking and cancer for the individuals f and g are slightly altered, as these were inconsistent with the rules (the rule (55) that smoking propagates via friendship is incompatible with many of the given facts). Increasing the satisfaction of this rule would require decreasing the overall satisfaction of the knowledge base, which explains why it is partly ignored by LTN during training. Finally, it is interesting to note that the principal components of the learned embeddings seem to be linearly separable for the smoking and cancer classifiers (c.f. Figure 21, top right and bottom right plots).

Querying:

To illustrate querying in LTN, we query over time two formulas that are not present in the knowledge-base:
ϕ1 : ∀p (C(p) → S(p))      (58)
ϕ2 : ∀p,q ((C(p) ∧ C(q)) → F(p,q))      (59)
We use p = 5 when approximating ∀, since the impact of an outlier at querying time should be seen as more important than at learning time. It can be seen that as the grounding approaches satisfiability of the knowledge-base, ϕ1 approaches true, whereas ϕ2 approaches false (c.f. Figure 20a).
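For instance, the truth degree of ϕ1 can be computed directly from the learned groundings. The sketch below assumes the Reichenbach implication 1 − a + a·b of the product configuration and reuses the Cancer and Smokes predicates sketched earlier.

import tensorflow as tf

def query_phi1(embedding_matrix, cancer_fn, smokes_fn, p=5):
    # Truth degree of  phi1: forall p (C(p) -> S(p))  over the 14 embeddings.
    c, s = cancer_fn(embedding_matrix), smokes_fn(embedding_matrix)   # [14] each
    implies = 1.0 - c + c * s                                         # Reichenbach implication (assumed)
    # Universal quantifier approximated by A_pME with the stricter exponent p = 5.
    return 1.0 - tf.reduce_mean((1.0 - implies) ** float(p)) ** (1.0 / p)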
Figure 19: Smoker-Friends-Cancer example: Satisfiability levels during training (left) and truth-values of queries ϕ1 and ϕ2 over time (right).
(a) Incomplete facts in the knowledge-base: axioms for smokers and cancer for individuals a to n (left),friendship relations in group 1 (middle), and friendship relations in group 2 (right).
(b) Querying all the truth-values using LTN after training: smokers and cancer (left), friendship relations (middle and right).
Figure 20: Smoker-Friends-Cancer example: Illustration of the facts before and after training.
Figure 21: Smoker-Friends-Cancer example: learned embeddings showing the result of applying PCA on the individuals (top left); truth-values of smokes and cancer predicates for each embedding (top and bottom right); illustration of the friendship relations which are satisfied after learning (bottom left).

4.8. Reasoning in LTN

The essence of reasoning is to find out if a closed formula ϕ is the logical consequence of a knowledge-base (K,Gθ,Θ) . Section 3.4 introduced two approaches to this problem in LTN:
  • By simply querying after learning (see footnote 27), one seeks to verify whether, for the grounded theories that maximally satisfy K, the grounding of ϕ gives a truth-value greater than a threshold q. This often requires checking an infinite number of groundings. Instead, the user approximates the search for these grounded theories by running the optimization a fixed number of times only.
  • By reasoning by refutation, one seeks to find a counter-example: a grounding that satisfies the knowledge-base K but not the formula ϕ, given the threshold q. The search is performed using a different objective function.
We now demonstrate that reasoning by refutation is the preferred option, using a simple example in which we seek to find out whether (A ∨ B) ⊨_q A.

Propositional Variables:

The symbols A and B denote two propositional variables.
Axioms:
A ∨ B      (60)

27 Here, learning refers to Section 3.2, which optimizes using the satisfaction of the knowledge base as an objective.

Figure 22: Querying after learning: 10 runs of the optimizer with objective G* = argmax_Gθ Gθ(K). All runs converge to the optimum G1; the grid search misses the counter-example.

Grounding:

G(A) = a, G(B) = b, where a and b are two real-valued parameters. The set of parameters is therefore θ = {a, b}. At initialization, a = b = 0.
We use the probabilistic sum S_P to approximate ∨, resulting in the following satisfiability measure.
Gθ(K) = Gθ(A ∨ B) = a + b − ab.      (61)
There are infinitely many global optima maximizing the satisfiability of the theory, as any Gθ such that Gθ(A) = 1 (resp. Gθ(B) = 1) gives a satisfiability Gθ(K) = 1 for any value of Gθ(B) (resp. Gθ(A)). As expected, the following groundings are examples of global optima:
G1: G1(A) = 1, G1(B) = 1, G1(K) = 1,
G2: G2(A) = 1, G2(B) = 0, G2(K) = 1,
G3: G3(A) = 0, G3(B) = 1, G3(K) = 1.

Reasoning:

Does (A ∨ B) ⊨_q A? That is, given the threshold q = 0.95, does every Gθ such that Gθ(K) ≥ q verify Gθ(ϕ) ≥ q? Immediately, one can notice that this is not the case. For instance, the grounding G3 is a counter-example.
If one simply reasons by querying multiple groundings after learning with the usual objective argmax_Gθ Gθ(K), the results will all converge to G1: ∂Gθ(K)/∂a = 1 − b and ∂Gθ(K)/∂b = 1 − a. Every run of the optimizer will increase a and b simultaneously until they reach the optimum a = b = 1. Because the grid search always converges to the same point, no counter-example is found and the logical consequence is mistakenly assumed to be true. This is illustrated in Figure 22.
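This behaviour is easy to reproduce in a few lines; the sketch below (optimizer and learning rate are illustrative) maximizes Gθ(K) = a + b − ab from a = b = 0 and, as in Figure 22, always lands on G1.

import tensorflow as tf

a, b = tf.Variable(0.0), tf.Variable(0.0)       # G(A), G(B)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(200):
    with tf.GradientTape() as tape:
        sat_K = a + b - a * b                   # probabilistic sum for A v B
        loss = 1.0 - sat_K                      # usual objective: maximize Sat(K)
    opt.apply_gradients(zip(tape.gradient(loss, [a, b]), [a, b]))

# a and b both approach 1 (the optimum G1), so G(A) >= q and no counter-example is ever seen.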
When reasoning by refutation, however, the objective function has an incentive to find a counter-example with ¬A, as illustrated in Figure 23. LTN converges to the optimum G3, which refutes the logical consequence.

28 We use the notation G(K) := SatAgg_{ϕ∈K} G(ϕ).

Figure 23: Reasoning by refutation: one run of the optimizer with objective G* = argmin_Gθ (Gθ(ϕ) + elu(α, β(q − Gθ(K)))), with q = 0.95, α = 0.05, β = 10. In the first training epochs, the directed search prioritizes the satisfaction of the knowledge base. Then, the minimization of Gθ(ϕ) starts to weigh in more and the search focuses on finding a counter-example. Eventually, the run converges to the optimum G3, which refutes the logical consequence.
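A minimal sketch of this refutation search is given below: the elu penalty keeps the run close to satisfying K, while the first term pushes G(ϕ) = G(A) down. The optimizer, learning rate, and the clipping of a and b to [0, 1] are assumptions of the sketch.

import tensorflow as tf

q, alpha, beta = 0.95, 0.05, 10.0

def elu(alpha, x):
    # elu(alpha, x) = x if x > 0, else alpha * (exp(x) - 1)
    return tf.where(x > 0, x, alpha * (tf.exp(x) - 1.0))

a, b = tf.Variable(0.0), tf.Variable(0.0)       # G(A), G(B)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(500):
    with tf.GradientTape() as tape:
        sat_K = a + b - a * b                   # G(A v B)
        sat_phi = a                             # G(phi), with phi = A
        loss = sat_phi + elu(alpha, beta * (q - sat_K))
    opt.apply_gradients(zip(tape.gradient(loss, [a, b]), [a, b]))
    for v in (a, b):                            # keep the truth-values in [0, 1]
        v.assign(tf.clip_by_value(v, 0.0, 1.0))

# The run drifts towards the counter-example G3 (a ~ 0, b ~ 1), refuting (A v B) |=_q A.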

5. Related Work

The past years have seen considerable work aiming to integrate symbolic systems and neural networks. We shall focus on work whose objective is to build computational models that integrate deep learning and logical reasoning into a so-called end-to-end (fully differentiable) architecture. We summarize a categorization in Figure 24 where the class containing LTN is further expanded into three sub-classes. The sub-class highlighted in red is the one that contains LTN. The reason why one may wish to combine symbolic AI and neural networks into a neurosymbolic AI system may vary, c.f. [17] for a recent comprehensive overview of approaches and challenges for neurosymbolic AI.

5.1. Neural architectures for logical reasoning

These use neural networks to perform (probabilistic) inference on logical theories. Early work in this direction has shown correspondences between various logical-symbolic systems and neural network models [27, 32, 52, 63, 65]. They have also highlighted the limits of current neural networks as models for knowledge representation. In a nutshell, current neural networks (including deep learning) have been shown capable of representing propositional logic, nonmonotonic logic programming, propositional modal logic, and fragments of first-order logic, but not full first-order or higher-order logic. Recently, there has been a resurgence of interest in the topic with many proposals emerging [13, 48, 53]. In [13], each clause of a Stochastic Logic Program is converted into a factor graph with reasoning becoming differentiable so that it can be implemented by deep networks. In [49], a differentiable unification algorithm is introduced with theorem proving sought to be carried out inside the neural network. Furthermore, in [11, 49] neural networks are used to learn reasoning strategies and logical rule induction.
Reasoning with LTN (Section 3.4) is reminiscent of this category, given that knowledge is not represented in a traditional logical language but in Real Logic.

5.2. Logical specification of neural network architectures

Here the goal is to use a logical language to specify the architecture of a neural network. Examples include [13, 24, 26, 56, 66]. In [26], the languages of extended logic programming (logic programs with negation by failure) and answer set programming are used as background knowledge to set up the initial architecture and set of weights of a recurrent neural network, which is subsequently trained from data using backpropagation. In [24], first-order logic programs in the form of Horn clauses are used to define a neural network that can solve Inductive Logic Programming tasks, starting from the most specific hypotheses covering the set of examples. Lifted relational neural networks [66] is a declarative framework where a Datalog program is used as a compact specification of a diverse range of existing advanced neural architectures, with a particular focus on Graph Neural Networks (GNNs) and their generalizations. In [56] a weighted Real Logic is introduced and used to specify neurons in a highly modular neural network that resembles a tree structure, whereby neurons with different activation functions are used to implement the different logic operators.
To some extent, it is also possible to specify neural architectures using logic in LTN. For example, a user can define a classifier P(x,y) as the formula P(x,y) = (Q(x,y) ∧ R(y)) ∨ S(x,y). G(P) then becomes a computational graph that combines the sub-architectures G(Q), G(R), and G(S) according to the syntax of the logical formula.

5.3. Neurosymbolic architectures for the integration of inductive learning and deductive reasoning

These architectures seek to enable the integration of inductive and deductive reasoning in a unique fully differentiable framework [15, 23, 41, 46, 47]. The systems that belong to this class combine a neural component with a logical component. The former consists of one or more neural networks, the latter provides a set of algorithms for performing logical tasks such as model checking, satisfiability, and logical consequence. These two components are tightly integrated so that learning and inference in the neural component are influenced by reasoning in the logical component and vice versa. Logic Tensor Networks belong to this category. Neurosymbolic architectures for integrating learning and reasoning can be further separated into three sub-classes:
  1. Approaches that introduce additional layers to the neural network to encode logical constraints which modify the predictions of the network. This sub-class includes Deep Logic Models [46] and Knowledge Enhanced Neural Networks [15].
  2. Approaches that integrate logical knowledge as additional constraints in the objective function or loss function used to train the neural network (LTN and [23, 33, 47]).
  3. Approaches that apply (differentiable) logical inference to compute the consequences of the predictions made by a set of base neural networks. Examples of this sub-class are DeepProbLog [41] and Abductive Learning [14].
In what follows, we review recent neurosymbolic architectures in the same class as LTN: integrating learning and reasoning.
Systems that modify the predictions of a base neural network: Among the approaches that modify the predictions of the neural network using logical constraints are Deep Logic Models [46] and Knowledge Enhanced Neural Networks [15]. Deep Logic Models (DLM) are a general architecture for learning with constraints. Here, we will consider the special case where constraints are expressed by logical formulas. In this case, a DLM predicts the truth-values of a set of n ground atoms of a domain Δ = {a1, …, ak}. It consists of two models: a neural network f(x|w), which takes as input the features x of the elements of Δ and produces as output an evaluation f for all the ground atoms, i.e. f ∈ [0,1]^n, and a probability distribution p(y | f, λ), which is modeled by an undirected graphical model of the exponential family with each logical constraint characterized by a clique that contains the ground atoms, rather similarly to GNNs. The model returns the assignment to the atoms that maximizes the weighted truth-value of the constraints and minimizes the difference between the prediction of the neural network and a target value y. Formally:
DLM(x | λ, w) = argmax_y ( Σ_c λ_c Φ_c(y_c) − ½ ‖y − f(x|w)‖² )
Figure 24: Three classes of neurosymbolic approaches with Architectures Integrating Learning and Reasoning further subdivided into three sub-classes, with LTN belonging to the sub-class highlighted in red.
Each Φc(yc) corresponds to a ground propositional formula which is evaluated w.r.t. the target truth assignment y ,and λc is the weight associated with formula Φc . Intuitively,the upper model (the undirected graphical model) should modify the prediction of the lower model (the neural network) minimally to satisfy the constraints. f and y are truth-values of all the ground atoms obtained from the constraints appearing in the upper model in the domain specified by the data input.
Similar to LTN, DLM evaluates constraints using fuzzy semantics. However, it considers only propositional connectives, whereas universal and existential quantifiers are supported in LTN.
Inference in DLM requires maximizing the prediction of the model, which might be prohibitive in the presence of a large number of instances. In LTN, inference involves only a forward pass through the neural component which is rather simple and can be carried out in parallel. However, in DLM the weight associated with constraints can be learned, while in LTN they are specified in the background knowledge.
The approach taken in Knowledge Enhanced Neural Networks (KENN) [15] is similar to that of DLM. Starting from the predictions y = fnn(x|w) made by a base neural network fnn(·|w), KENN adds a knowledge enhancer, which is a function that modifies y based on a set of weighted constraints formulated in terms of clauses. The formal model can be specified as follows:
KENN(x | λ, w) = σ( f′nn(x|w) + Σ_c λ_c ( softmax(sign(c) ⊙ f′nn(x|w)) ⊙ sign(c) ) )
where f′nn(x|w) are the pre-activations of fnn(x|w), sign(c) is a vector of the same dimension as y containing 1, −1 and 0, such that sign(c)_i = 1 (resp. sign(c)_i = −1) if the i-th atom occurs positively (resp. negatively) in c, and 0 otherwise, and ⊙ is the element-wise product. KENN learns the weights λ of the clauses in the background knowledge and the base network parameters w by minimizing some standard loss (e.g. cross-entropy) on a set of training data. If the training data is inconsistent with a constraint, the weight of the constraint will be close to zero. This intuitively implies that the latent knowledge present in the data is preferred to the knowledge specified in the constraints. In LTN, instead, training data and logical constraints are represented uniformly with formulas, and we require that they are both satisfied. A second difference between KENN and LTN is the language: while LTN supports constraints written in full first-order logic, constraints in KENN are limited to universally quantified clauses.
Systems that add knowledge to a neural network by adding a term to the loss function: In [33], a framework is proposed that learns simultaneously from labeled data and logical rules. The proposed architecture is made of a student network fnn and a teacher network, denoted by q. The student network is trained to do the actual predictions, while the teacher network encodes the information of the logical rules. The transfer of information from the teacher to the student network is done by defining a joint loss L for both networks as a convex combination of the loss of the student and that of the teacher. If ỹ = fnn(x|w) is the prediction of the student network for input x, the loss is defined as:
(1 − π) L(y, ỹ) + π L(q(ỹ | x), ỹ)
where q(ỹ | x) = exp(−Σ_c λ_c (1 − ϕ_c(x, ỹ))) measures how much the predictions ỹ satisfy the constraints encoded in the set of clauses {λ_c : ϕ_c}_{c∈C}. Training is iterative. At every iteration, the parameters of the student network are optimized to minimize the loss that takes into account the feedback of the teacher network on the predictions from the previous step. The main difference between this approach and LTN is how the constraints are encoded in the loss. LTN integrates the constraints in the network and optimizes their satisfiability directly, with no need for additional training data. Furthermore, the constraints proposed in [33] are universally quantified formulas only.
The approach adopted by Lyrics [47] is analogous to the first version of LTN [61]. Logical constraints are translated into a loss function that measures the (negative) satisfiability level of the network. Differently from LTN, formulas in Lyrics can be associated with weights that are hyper-parameters. In [47], a logarithmic loss function is also used when the product t-norm is adopted. Notice that weights can also be added (indirectly) to LTN by introducing a 0-ary predicate pw to represent a constraint of the form pw → ϕ, as sketched below. An advantage of this approach would be that the weights could be learned.
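A hypothetical sketch of that trick: the 0-ary predicate pw is grounded onto a single trainable parameter, and the weighted constraint is the truth degree of pw → ϕ (Reichenbach implication assumed), so a low pw effectively switches the constraint off.

import tensorflow as tf

w = tf.Variable(0.0)                  # learnable parameter behind the 0-ary predicate p_w

def p_w():
    return tf.sigmoid(w)              # G(p_w): a truth degree in [0, 1]

def weighted_constraint(sat_phi):
    pw = p_w()
    return 1.0 - pw + pw * sat_phi    # p_w -> phi; pw close to 0 makes the constraint vacuous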
In [72], a neural network computes the probability of some events being true. The neural network should satisfy a set of propositional logic constraints on its output. These constraints are compiled into arithmetic circuits for weighted model counting, which are then used to compute a loss function. The loss function then captures how close the neural network is to satisfying the propositional logic constraints.
Systems that apply logical reasoning on the predictions of a base neural network: The most notable architecture in this category is DeepProbLog [41]. DeepProbLog extends the ProbLog framework for probabilistic logic programming to allow the computation of probabilistic evidence from neural networks. A ProbLog program is a logic program where facts and rules can be associated with probability values. Such values can be learned. Inference in ProbLog to answer a query q is performed by knowledge compilation into a function p(q|λ) that computes the probability that q is true according to the logic program with relative frequencies λ. In DeepProbLog, a neural network fnn that outputs a probability distribution t = (t1, …, tn) over a set of atoms a = (a1, …, an) is integrated into ProbLog by extending the logic program with a and the respective probabilities t. The probability of a query q is then given by p(q | λ, fnn(x|w)), where x is the input of fnn and p is the function corresponding to the logic program extended with a. Given a set of queries q, input vectors x and ground-truths y for all the queries, training is performed by minimizing a loss function that measures the distance between the probabilities predicted by the logic program and the ground-truths, as follows:
L(y, p(q | λ, fnn(x|w)))
The most important difference between DeepProbLog and LTN concerns the logic on which they are based. DeepProbLog adopts probabilistic logic programming. The output of the base neural network is interpreted as the probability of certain atoms being true. LTN instead is based on many-valued logic. The predictions of the base neural network are interpreted as fuzzy truth-values (though previous work [67] also formalizes Real Logic as handling probabilities with relaxed constraints). This difference of logic leads to the second main difference between LTN and Deep-Problog: their inference mechanism. DeepProblog performs probabilistic inference (based on model counting) while LTN inference consists of computing the truth-value of a formula starting from the truth-values of its atomic components. The two types of inference are incomparable. However, computing the fuzzy truth-value of a formula is more efficient than model counting, resulting in a more scalable inference task that allows LTN to use full first-order logic with function symbols. In DeepProblog, to perform probabilistic inference, a closed-world assumption is made and a function-free language is used. Typically, DeepProbLog clauses are compiled into Sentential Decision Diagrams (SDDs) to accelerate inference considerably [36], although the compilation step of clauses into the SDD circuit is still costly.
An approach that extends the predictions of a base neural network using abductive reasoning is [14]. Given a neural network fnn(x|w) that produces a crisp output y ∈ {0,1}^n for n predicates p1, …, pn, and background knowledge in the form of a logic program p, the parameters w of fnn are learned alongside a set of additional rules ΔC that define a new concept C w.r.t. p1, …, pn such that, for every object o with features x_o:
p ∪ fnn(x_o|w) ∪ ΔC ⊨ C(o)    if o is an instance of C      (62)
p ∪ fnn(x_o|w) ∪ ΔC ⊨ ¬C(o)    if o is not an instance of C
The task is solved by iterating the following three steps:
  1. Given the predictions of the neural network {fnn(x_o|w)}_{o∈O} on the set O of training objects, search for the best ΔC that maximizes the number of objects for which (62) holds;
  2. For each object o, compute by abduction on p ∪ ΔC the explanation p(o);
  3. Retrain fnn with the training set {x_o, p(o)}_{o∈O}.
Differently from LTN, in [14] the optimization is done separately in an iterative way. The semantics of the logic is crisp, neither fuzzy nor probabilistic, and therefore not fully differentiable. Abductive reasoning is adopted, which is a potentially relevant addition for comparison with symbolic ML and Inductive Logic Programming approaches [50].
Various other loosely-coupled approaches have been proposed recently such as [44], where image classification is carried out by a neural network in combination with reasoning from text data for concept learning at a higher level of abstraction than what is normally possible with pixel data alone. The proliferation of such approaches has prompted Henry Kautz to propose a taxonomy for neurosymbolic AI in [34] (also discussed in [17]), including recent work combining neural networks with graphical models and graph neural networks [4, 40, 58], statistical relational learning [21, 55], and even verification of neural multi-agent systems [2, 8].

6. Conclusions and Future Work

In this paper, we have specified the theory and exemplified the reach of Logic Tensor Networks as a model and system for neurosymbolic AI. LTN is capable of combining approximate reasoning and deep learning, knowledge and data.
For ML practitioners, learning in LTN (see Section 3.2) can be understood as optimizing under first-order logic constraints relaxed into a loss function. For logic practitioners, learning is similar to inductive inference: given a theory, learning makes generalizations from specific observations obtained from data. Compared to other neuro-symbolic architectures (see Section 5), the LTN framework has useful properties for gradient-based optimization (see Section 2.4) and a syntax that supports many traditional ML tasks and their inductive biases (see Section 4), all while remaining computationally efficient (see Table 1).
Section 3.4 discussed reasoning in LTN. Reasoning is normally under-specified within neural networks. Logical reasoning is the task of proving if some knowledge follows from the facts which are currently known. It is traditionally achieved semantically using model theory or syntactically via a proof system. The current LTN framework approaches reasoning semantically, although it should be possible to use LTN and querying alongside a proof system. When reasoning by refutation in LTN,to find out if a statement ϕ is a logical consequence of given data and knowledge-base K ,a proof by refutation attempts to find a semantic counterexample where ¬ϕ and K are satisfied. If the search fails then ϕ is assumed to hold. This approach is efficient in LTN when we allow for a direct search to find counterexamples via gradient-descent optimization. It is assumed that ϕ ,the statement to prove or disprove,is known. Future work could explore automatically inducing which statement ϕ to consider,possibly using syntactical reasoning in the process.
The paper formalizes Real Logic, the language supporting LTN. The semantics of Real Logic are close to the semantics of Fuzzy FOL with the following major differences: 1) Real Logic domains are typed and restricted to real numbers and real-valued tensors; 2) Real Logic variables are sequences of fixed length, whereas FOL variables are placeholders for any individual in a domain; 3) Real Logic relations are interpreted as mathematical functions, whereas Fuzzy Logic relations are interpreted as fuzzy set membership functions. Concerning the semantics of connectives and quantifiers, some LTN implementations correspond to semantics for t-norm fuzzy logic, but not all. For example, the conjunction operator in stable product semantics is not a t-norm, as pointed out at the end of Section 2.4.
Integrative neural-symbolic approaches are known for either seeking to bring neurons into a symbolic system (neurons into symbols) [41] or to bring symbols into a neural network (symbols into neurons) [60]. LTN adopts the latter approach but maintaining a close link between the symbols and their grounding into the neural network. The discussion around these two options - neurons into symbols vs. symbols into neurons - is likely to take center stage in the debate around neurosymbolic AI in the next decade. LTN and related approaches are well placed to play an important role in this debate by offering a rich logical language tightly coupled with an efficient distributed implementation into TensorFlow computational graphs.
The close connection between first-order logic and its implementation in LTN makes LTN very suitable as a model for the neural-symbolic cycle [27, 29], which seeks to translate between neural and symbolic representations. Such translations can take place at the level of the structure of a neural network, given a symbolic language [27], or at the level of the loss functions, as done by LTN and related approaches [13,45,46]. LTN opens up a number of promising avenues for further research:
Firstly, a continual learning approach might allow one to start with very little knowledge, build up and validate knowledge over time by querying the LTN network. Translations to and from neural and symbolic representations will enable reasoning also to take place at the symbolic level (e.g. alongside a proof system), as proposed recently in [70] with the goal of improving fairness of the network model.
Secondly, LTN should be compared in large-scale practical use cases with other recent efforts to add structure to neural networks such as the neuro-symbolic concept learner [44] and high-level capsules which were used recently to learn the part-of relation [38], similarly to how LTN was used for semantic image interpretation in [19].
Finally, LTN should also be compared with Tensor Product Representations, e.g. [59], which show that state-of-the-art recurrent neural networks may fail at simple question-answering tasks, despite achieving very high accuracy. Efforts in the area of transfer learning, mostly in computer vision, which seek to model systematicity could also be considered a benchmark [5]. Experiments using fewer data and therefore lower energy consumption, out-of-distribution extrapolation, and knowledge-based transfer are all potentially suitable areas of application for LTN as a framework for neurosymbolic AI based on learning from data and compositional knowledge.

Acknowledgement

We would like to thank Benedikt Wagner for his comments and a number of productive discussions on continual learning, knowledge extraction and reasoning in LTNs.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Good-fellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Ku-nal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from ten-sorflow.org.
[2] Michael Akintunde, Elena Botoeva, Panagiotis Kouvaros, and Alessio Lomuscio. Verifying strategic abilities of neural multi-agent systems. In Proceedings of 17th International Conference on Principles of Knowledge Representation and Reasoning, KR2020, Rhodes, Greece, September 2020.
[3] Samy Badreddine and Michael Spranger. Injecting Prior Knowledge for Transfer Learning into Reinforcement Learning Algorithms using Logic Tensor Networks. arXiv:1906.06576 [cs, stat], June 2019. arXiv: 1906.06576.
[4] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray kavukcuoglu. Interaction networks for learning about objects, relations and physics. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 4509-4517, USA, 2016. Curran Associates Inc.
[5] Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Nan Rosemary Ke, Sebastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. In International Conference on Learning Representations, 2020.
[6] Federico Bianchi and Pascal Hitzler. On the capabilities of logic tensor networks for deductive reasoning. In Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019) Stanford University, Palo Alto, California, USA, March 25-27, 2019., Stanford University, Palo Alto, California, USA, March 25-27, 2019., 2019.
[7] Federico Bianchi, Matteo Palmonari, Pascal Hitzler, and Luciano Serafini. Complementing logical reasoning with sub-symbolic commonsense. In International Joint Conference on Rules and Reasoning, pages 161-170. Springer, 2019.
[8] Rafael Borges, Artur d'Avila Garcez, and Luís Lamb. Learning and representing temporal knowledge in recurrent networks. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council, 22:2409-21, 122011.
[9] Liber Běhounek, Petr Cintula, and Petr Hájek. Introduction to mathematical fuzzy logic. In Petr Cintula, Petr Hájek, and Carles Noguera, editors, Handbook of Mathematical Fuzzy Logic, Volume 1, volume 37 of Studies in Logic, Mathematical Logic and Foundations, pages 1-102. College Publications, 2011.
[10] N. A. Campbell and R. J. Mahon. A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Australian Journal of Zoology, 22(3):417-425, 1974. Publisher: CSIRO PUBLISHING.
[11] Andres Campero, Aldo Pareja, Tim Klinger, Josh Tenenbaum, and Sebastian Riedel. Logical rule induction and theory learning using neural theorem proving. CoRR, abs/1809.02193, 2018.
[12] Benhui Chen, Xuefen Hong, Lihua Duan, and Jinglu Hu. Improving multi-label classification performance by label constraints. In The 2013 International Joint Conference on Neural Networks (IJCNN), pages 1-5. IEEE, 2013.
[13] William W. Cohen, Fan Yang, and Kathryn Mazaitis. Tensorlog: A probabilistic database implemented using deep-learning infrastructure. J. Artif. Intell. Res., 67:285-325, 2020.
[14] W.-Z. Dai, Q. Xu, Y. Yu, and Z.-H. Zhou. Bridging machine learning and logical reasoning by abductive learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, NeurIPS'19, USA, 2019. Curran Associates Inc.
[15] Alessandro Daniele and Luciano Serafini. Knowledge enhanced neural networks. In Pacific Rim International Conference on Artificial Intelligence, pages 542-554. Springer, 2019.
[16] Artur d'Avila Garcez, Marco Gori, Luís C. Lamb, Luciano Serafini, Michael Spranger, and Son N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. FLAP, 6(4):611-632, 2019.
[17] Artur d'Avila Garcez and Luis C. Lamb. Neurosymbolic AI: The 3rd wave, 2020.
[18] Ivan Donadello and Luciano Serafini. Compensating supervision incompleteness with prior knowledge in semantic image interpretation. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1-8. IEEE, 2019.
[19] Ivan Donadello, Luciano Serafini, and Artur d'Avila Garcez. Logic tensor networks for semantic image interpretation. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1596-1602, 2017.
[20] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
[21] Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. J. Artif. Intell. Res., 61:1-64, 2018.
[22] Ronald Fagin, Ryan Riegel, and Alexander Gray. Foundations of reasoning with uncertainty via real-valued logics, 2020.
[23] Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang, and Martin Vechev. D12: Training and querying neural networks with logic. In International Conference on Machine Learning, pages 1931-1941, 2019.
[24] Manoel Franca, Gerson Zaverucha, and Artur d'Avila Garcez. Fast relational learning using bottom clause propositionalization with artificial neural networks. Machine Learning, 94:81- 104,012014.
[25] Dov M. Gabbay and John Woods, editors. The Many Valued and Nonmonotonic Turn in Logic, volume 8 of Handbook of the History of Logic. Elsevier, 2007.
[26] Artur d'Avila Garcez, Dov M. Gabbay, and Krysia B. Broda. Neural-Symbolic Learning System: Foundations and Applications. Springer-Verlag, Berlin, Heidelberg, 2002.
[27] Artur d'Avila Garcez, Lus C. Lamb, and Dov M. Gabbay. Neural-Symbolic Cognitive Reasoning. Springer Publishing Company, Incorporated, 1 edition, 2008.
[28] Petr Hajek. Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, 1998.
[29] Barbara Hammer and Pascal Hitzler, editors. Perspectives of Neural-Symbolic Integration, volume 77 of Studies in Computational Intelligence. Springer, 2007.
[30] Stevan Harnad. The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1-3):335- 346, 1990.
[31] Patrick Hohenecker and Thomas Lukasiewicz. Ontology reasoning with deep neural networks. Journal of Artificial Intelligence Research, 68:503-540, 2020.
[32] Steffen Hölldobler and Franz J. Kurfess. CHCL - A connectionist infernce system. In Bertram Fronhöfer and Graham Wrightson, editors, Parallelization in Inference Systems, International Workshop, Dagstuhl Castle, Germany, December 17-18, 1990, Proceedings, volume 590 of Lecture Notes in Computer Science, pages 318-342. Springer, 1990.
[33] Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric Xing. Harnessing deep neural networks with logic rules. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2410-2420, Berlin, Germany, August 2016. Association for Computational Linguistics.
[34] Henry Kautz. The Third AI Summer, AAAI Robert S. Engelmore Memorial Lecture, Thirty-fourth AAAI Conference on Artificial Intelligence, New York, NY, February 10, 2020.
[35] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], January 2017. arXiv: 1412.6980.
[36] Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), July 2014.
[37] Erich Peter Klement, Radko Mesiar, and Endre Pap. Triangular Norms, volume 8 of Trends in Logic. Springer Netherlands, Dordrecht, 2000.
[38] Adam Kosiorek, Sara Sabour, Yee Whye Teh, and Geoffrey E Hinton. Stacked capsule autoen-coders. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 15512-15522. Curran Associates, Inc., 2019.
[39] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, page 1097-1105, Red Hook, NY, USA, 2012. Curran Associates Inc.
[40] Luís C. Lamb, Artur d'Avila Garcez, Marco Gori, Marcelo O. R. Prates, Pedro H. C. Avelar, and Moshe Y. Vardi. Graph neural networks meet neural-symbolic computing: A survey and perspective. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020 [scheduled for July 2020, Yokohama, Japan, postponed due to the Corona pandemic], pages 4877-4884. ijcai.org, 2020.
[41] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. Deepproblog: Neural probabilistic logic programming. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NeurIPS'18, pages 3753-3763, USA, 2018. Curran Associates Inc.
[42] Francesco Manigrasso, Filomeno Davide Miro, Lia Morra, and Fabrizio Lamberti. Faster-LTN: a neuro-symbolic, end-to-end object detection architecture. arXiv:2107.01877 [cs], July 2021.
[43] Vasco Manquinho, Joao Marques-Silva, and Jordi Planes. Algorithms for weighted boolean optimization. In International conference on theory and applications of satisfiability testing, pages 495-508. Springer, 2009.
[44] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. CoRR, abs/1904.12584, 2019.
[45] Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, and Marco Gori. Constraint-based visual generation. In Igor V. Tetko, Vera Kurková, Pavel Karpov, and Fabian J. Theis, editors, Artificial Neural Networks and Machine Learning - ICANN 2019: Image Processing - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17-19, 2019, Proceedings, Part III, volume 11729 of Lecture Notes in Computer Science, pages 565-577. Springer, 2019.
[46] Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, and Marco Gori. Integrating learning and reasoning with deep logic models. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Wiirzburg, Germany, September 16-20, 2019, Proceedings, Part II, volume 11907 of Lecture Notes in Computer Science, pages 517-532. Springer, 2019.
[47] Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, and Marco Gori. Lyrics: A general interface layer to integrate logic inference and deep learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 283-298. Springer, 2019.
[48] Giuseppe Marra and Ondřej Kuželka. Neural markov logic networks. arXiv preprint arXiv:1905.13462, 2019.
[49] Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, and Tim Rock-täschel. Learning reasoning strategies in end-to-end differentiable proving, 2020.
[49] Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, and Tim Rock-täschel. 学习端到端可微分证明中的推理策略,2020。
[50] Stephen H. Muggleton, Dianhuan Lin, Niels Pahlavi, and Alireza Tamaddoni-Nezhad. Meta-interpretive learning: Application to grammatical inference. Mach. Learn., 94(1):25-49, January 2014.
Stephen H. Muggleton, Dianhuan Lin, Niels Pahlavi 和 Alireza Tamaddoni-Nezhad。元解释学习:应用于语法推断。机器学习,94(1):25-49,2014 年 1 月。
[51] Karl Pearson. Liii. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559-572, 1901.
[51] 卡尔·皮尔逊。关于空间中点系统的最佳拟合直线和平面。伦敦、爱丁堡和都柏林哲学杂志与科学期刊,2(11):559-572,1901 年。
[52] Gadi Pinkas. Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artif. Intell., 77(2):203-247, 1995.
[52] Gadi Pinkas. 连接主义网络中的推理、非单调性和学习,捕捉命题知识。人工智能,77(2):203-247,1995 年。
[53] Meng Qu and Jian Tang. Probabilistic logic neural networks for reasoning. In Advances in Neural Information Processing Systems, pages 7712-7722, 2019.
[53] 孟渠和唐健。用于推理的概率逻辑神经网络。在神经信息处理系统的进展中,第 7712-7722 页,2019 年。
[54] Luc De Raedt, Sebastijan Dumančić, Robin Manhaeve, and Giuseppe Marra. From statistical relational to neuro-symbolic artificial intelligence, 2020.
[54] Luc De Raedt, Sebastijan Dumančić, Robin Manhaeve, and Giuseppe Marra. 从统计关系到神经符号人工智能,2020。
[55] Matthew Richardson and Pedro Domingos. Markov logic networks. Mach. Learn., 62(1-2):107- 136, February 2006.
马修·理查森(Matthew Richardson)和佩德罗·多明戈斯(Pedro Domingos)。马尔可夫逻辑网络。机器学习,62(1-2):107-136,2006 年 2 月。
[56] Ryan Riegel, Alexander Gray, Francois Luus, Naweed Khan, Ndivhuwo Makondo, Is-mail Yunus Akhalwaya, Haifeng Qian, Ronald Fagin, Francisco Barahona, Udit Sharma, Sha-jith Ikbal, Hima Karanam, Sumit Neelam, Ankita Likhyani, and Santosh Srivastava. Logical Neural Networks. arXiv:2006.13155 [cs], June 2020. arXiv: 2006.13155.
Ryan Riegel, Alexander Gray, Francois Luus, Naweed Khan, Ndivhuwo Makondo, Is-mail Yunus Akhalwaya, Haifeng Qian, Ronald Fagin, Francisco Barahona, Udit Sharma, Sha-jith Ikbal, Hima Karanam, Sumit Neelam, Ankita Likhyani, and Santosh Srivastava. 逻辑神经网络. arXiv:2006.13155 [cs], 2020 年 6 月. arXiv:2006.13155.
[57] Tim Rocktäschel and Sebastian Riedel. End-to-end differentiable proving. In Advances in Neural Information Processing Systems, pages 3788-3800, 2017.
[57] Tim Rocktäschel 和 Sebastian Riedel。端到端可微证明。在 2017 年神经信息处理系统进展中,第 3788-3800 页。
[58] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfar-dini. The graph neural network model. Trans. Neur. Netw., 20(1):61-80, January 2009.
[58] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfar-dini. 图神经网络模型. Trans. Neur. Netw., 20(1):61-80, 2009 年 1 月。
[59] Imanol Schlag and Jürgen Schmidhuber. Learning to reason with third-order tensor products. CoRR, abs/1811.12143, 2018.
[59] Imanol Schlag 和 Jürgen Schmidhuber. 学习使用三阶张量积进行推理。CoRR, abs/1811.12143, 2018.
[60] Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, and Jianfeng Gao. Enhancing the transformer with explicit relational encoding for math problem solving. CoRR, abs/1910.06611, 2019.
Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber 和 Jianfeng Gao. 通过显式关系编码增强变压器以解决数学问题。CoRR, abs/1910.06611, 2019.
[61] Luciano Serafini and Artur d'Avila Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422, 2016.
[61] Luciano Serafini 和 Artur d'Avila Garcez。逻辑张量网络:从数据和知识进行深度学习和逻辑推理。arXiv 预印本 arXiv:1606.04422,2016。
[62] Luciano Serafini and Artur d’Avila Garcez. Learning and reasoning with logic tensor networks. In Conference of the Italian Association for Artificial Intelligence, pages 334–348. Springer, 2016.
[62] Luciano Serafini 和 Artur d'Avila Garcez。使用逻辑张量网络进行学习和推理。在意大利人工智能协会会议上,第 334-348 页。Springer,2016 年。
[63] Lokendra Shastri. Advances in SHRUTI-A neurally motivated model of relational knowledge representation and rapid inference using temporal synchrony. Appl. Intell., 11(1):79-108, 1999.
[63] Lokendra Shastri. SHRUTI 的进展-一种神经启发的关系知识表示模型和利用时间同步进行快速推理。Appl. Intell.,11(1):79-108,1999。
[64] Yun Shi. A deep study of fuzzy implications. PhD thesis, Ghent University, 2009.
[64] 史云。模糊蕴涵的深入研究。根特大学博士论文,2009 年。
[65] Paul Smolensky and Géraldine Legendre. The Harmonic Mind: From Neural Computation to Optimality-Theoretic GrammarVolume I: Cognitive Architecture (Bradford Books). The MIT Press, 2006.
保罗·斯莫伦斯基(Paul Smolensky)和杰拉尔丁·勒让德尔(Géraldine Legendre)著。《和谐心灵:从神经计算到最优理论语法 第一卷:认知架构(Bradford Books)》。麻省理工学院出版社,2006 年。
[66] Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. Lifted relational neural networks: Efficient learning of latent relational structures. Journal of Artificial Intelligence Research, 62:69-100, 2018.
[66] Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. 提升的关系神经网络:高效学习潜在的关系结构。 人工智能研究杂志,62:69-100,2018。
[67] Emile van Krieken, Erman Acar, and Frank van Harmelen. Semi-Supervised Learning using Differentiable Reasoning. arXiv:1908.04700 [cs], August 2019. arXiv: 1908.04700.
[67] Emile van Krieken, Erman Acar 和 Frank van Harmelen. 使用可微分推理的半监督学习. arXiv:1908.04700 [cs], 2019 年 8 月. arXiv: 1908.04700.
[68] Emile van Krieken, Erman Acar, and Frank van Harmelen. Analyzing Differentiable Fuzzy Implications. In Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, pages 893-903, 92020.
[68] Emile van Krieken, Erman Acar, 和 Frank van Harmelen. 分析可微模糊蕴涵。在第 17 届国际知识表示与推理原理会议论文集中,第 893-903 页,92020 年。
[69] Emile van Krieken, Erman Acar, and Frank van Harmelen. Analyzing Differentiable Fuzzy Logic Operators. arXiv:2002.06100 [cs], February 2020. arXiv: 2002.06100.
[69] Emile van Krieken, Erman Acar, 和 Frank van Harmelen. 分析可微模糊逻辑运算符. arXiv:2002.06100 [cs], 2020 年 2 月. arXiv: 2002.06100.
[70] Benedikt Wagner and Artur d'Avila Garcez. Neural-Symbolic Integration for Fairness in AI. Proceedings of the AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering 2021, page 14, 2021.
[70] Benedikt Wagner 和 Artur d'Avila Garcez. 神经符号一体化在人工智能公平性中的应用. AAAI 春季研讨会论文集:结合机器学习与知识工程 2021, 第 14 页, 2021.
[71] Po-Wei Wang, Priya L Donti, Bryan Wilder, and Zico Kolter. Satnet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. arXiv preprint arXiv:1905.12149, 2019.
[71] 王柏伟,普里娅·Donti,布莱恩·怀尔德和 Zico Kolter。Satnet:使用可微可满足性求解器桥接深度学习和逻辑推理。arXiv 预印本 arXiv:1905.12149,2019。
[72] Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang, and Guy Broeck. A Semantic Loss Function for Deep Learning with Symbolic Knowledge. In International Conference on Machine Learning, pages 5502-5511. PMLR, July 2018. ISSN: 2640-3498.
[72] 徐静怡,张子路,塔尔·弗里德曼,梁一涛和盖伊·布鲁克。一种用符号知识进行深度学习的语义损失函数。在机器学习国际会议上,第 5502-5511 页。PMLR,2018 年 7 月。ISSN:2640-3498。

Appendix A. Implementation Details

The LTN library is implemented in TensorFlow 2 [1] and is available from GitHub 29. Every logical operator is grounded using TensorFlow primitives, and the LTN code directly builds a TensorFlow computational graph. Thanks to TensorFlow's built-in optimizations, LTN is relatively efficient while providing the expressive power of FOL.
Table A.3 shows an overview of the network architectures used to obtain the results of the examples in Section 4. The LTN repository includes the code for these examples. Unless explicitly mentioned otherwise, the reported results are averaged over 10 runs and reported with a 95% confidence interval. Every example uses the stable real product configuration to approximate the Real Logic operators, and the Adam optimizer [35] with a learning rate of 0.001 to train the parameters.
Task | Network | Architecture
4.1 | MLP | Dense(16)*, Dense(16)*, Dense(1)
4.2 | MLP | Dense(16)*, Dropout(0.2), Dense(16)*, Dropout(0.2), Dense(8)*, Dropout(0.2), Dense(1)
4.3 | MLP | Dense(16), Dense(16), Dense(8), Dense(1)
4.4 | CNN | MNISTConv, Dense(84)*, Dense(10)
4.4 | baseline – SD | MNISTConv ×2, Dense(84), Dense(19), Softmax
4.4 | baseline – MD | MNISTConv ×4, Dense(128), Dense(199), Softmax
4.5 | MLP | Dense(8)*, Dense(8)*, Dense(1)
4.6 | MLP | Dense(16)*, Dense(16)*, Dense(16)*, Dense(1)
4.7 | MLP_S | Dense(8)*, Dense(8)*, Dense(1)
4.7 | MLP_F | Dense(8)*, Dense(8)*, Dense(1)
4.7 | MLP_C | Dense(8)*, Dense(8)*, Dense(1)

* : layer ends with an elu activation
Dense(n) : regular fully-connected layer of n units
Dropout(r) : dropout layer with rate r
Conv(f,k) : 2D convolution layer with f filters and a kernel of size k
MP(w,h) : max pooling operation with a w×h pooling window
MNISTConv : Conv(6,5), MP(2,2), Conv(16,5), MP(2,2), Dense(100)
Table A.3: Overview of the neural network architectures used in each example. Notice that, in the examples, the networks are usually used with some additional layer(s) to ground symbols. For instance, in experiment 4.2, in G(P) : x, l ↦ l⊤·softmax(MLP(x)), the softmax layer normalizes the raw predictions of MLP to probabilities in [0,1], and the multiplication with the one-hot label l selects the probability for one given class.
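The following is a minimal sketch in plain TensorFlow 2 of a grounding of this form. The layer sizes, the number of classes, and the names (mlp, ground_P) are illustrative assumptions, not the repository code.

import tensorflow as tf

# Hypothetical MLP; the sizes and the 4-class output are assumptions for illustration.
mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="elu"),
    tf.keras.layers.Dense(16, activation="elu"),
    tf.keras.layers.Dense(4),   # raw scores for 4 hypothetical classes
])

def ground_P(x, onehot_label):
    # G(P): x, l -> l . softmax(MLP(x)); returns a truth value in [0, 1].
    probs = tf.nn.softmax(mlp(x), axis=-1)                # normalize raw predictions
    return tf.reduce_sum(onehot_label * probs, axis=-1)   # select the labelled class

x = tf.random.uniform((5, 2))                             # batch of 5 dummy feature vectors
l = tf.one_hot([0, 1, 2, 3, 0], depth=4)
truth_values = ground_P(x, l)                             # shape (5,), values in [0, 1]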

29 https://github.com/logictensornetworks/logictensornetworks

Appendix B. Fuzzy Operators and Properties

This appendix presents the most common operators used in fuzzy logic literature and some noteworthy properties [28, 37, 64, 69].
Appendix B.1. Negation
Definition 7. A negation is a function N : [0,1] → [0,1] that at least satisfies:
N1. Boundary conditions: N(0) = 1 and N(1) = 0,
N2. Monotonically decreasing: ∀(x,y) ∈ [0,1]², x ≤ y ⇒ N(x) ≥ N(y).
Moreover, a negation is said to be strict if N is continuous and strictly decreasing. A negation is said to be strong if ∀x ∈ [0,1], N(N(x)) = x.
We commonly use the standard strict and strong negation N_S(a) = 1 - a.
Appendix B.2. Conjunction
Definition 8. A conjunction is a function C : [0,1]² → [0,1] that at least satisfies:
C1. boundary conditions: C(0,0) = C(0,1) = C(1,0) = 0 and C(1,1) = 1,
C2. monotonically increasing: ∀(x,y,z) ∈ [0,1]³, if x ≤ y, then C(x,z) ≤ C(y,z) and C(z,x) ≤ C(z,y).
In fuzzy logic, t-norms are widely used to model conjunction operators.
Definition 9. A t-norm (triangular norm) is a function T : [0,1]² → [0,1] that at least satisfies:
T1. boundary conditions: T(x,1) = x,
T2. monotonically increasing,
T3. commutative,
T4. associative.
Example 4. Three commonly used t-norms are:
(minimum) T_M(x,y) = min(x,y)
(product) T_P(x,y) = x·y
(Łukasiewicz) T_L(x,y) = max(x + y - 1, 0)
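As an illustration, these three t-norms can be written directly as element-wise TensorFlow operations (a small sketch; the function names are ours):

import tensorflow as tf

# Element-wise t-norms from Example 4; x and y hold truth values in [0, 1].
def t_minimum(x, y):       # Goedel
    return tf.minimum(x, y)

def t_product(x, y):       # Goguen / product
    return x * y

def t_lukasiewicz(x, y):   # Łukasiewicz
    return tf.maximum(x + y - 1.0, 0.0)

x = tf.constant([0.2, 0.9, 1.0])
y = tf.constant([0.7, 0.4, 1.0])
# t_minimum(x, y)     -> [0.2, 0.4, 1.0]
# t_product(x, y)     -> [0.14, 0.36, 1.0]
# t_lukasiewicz(x, y) -> [0.0, 0.3, 1.0]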
Appendix B.3. Disjunction
Name | a ∧ b | a ∨ b | a ⇒_R c | a ⇒_S c
Goedel | min(a,b) | max(a,b) | 1 if a ≤ c, c otherwise | max(1 - a, c)
Goguen/Product | a·b | a + b - a·b | 1 if a ≤ c, c/a otherwise | 1 - a + a·c
Lukasiewicz | max(a + b - 1, 0) | min(a + b, 1) | min(1 - a + c, 1) | min(1 - a + c, 1)
Table B.4: Common Symmetric Configurations
Definition 10. A disjunction is a function D : [0,1]² → [0,1] that at least satisfies:
D1. boundary conditions: D(0,0) = 0 and D(0,1) = D(1,0) = D(1,1) = 1,
D2. monotonically increasing: ∀(x,y,z) ∈ [0,1]³, if x ≤ y, then D(x,z) ≤ D(y,z) and D(z,x) ≤ D(z,y).
Disjunctions in fuzzy logic are often modeled with t-conorms.
Definition 11. A t-conorm (triangular conorm) is a function S : [0,1]² → [0,1] that at least satisfies:
S1. boundary conditions: S(x,0) = x,
S2. monotonically increasing,
S3. commutative,
S4. associative.
Example 5. Three commonly used t-conorms are:
(maximum) S_M(x,y) = max(x,y)
(probabilistic sum) S_P(x,y) = x + y - x·y
(Łukasiewicz) S_L(x,y) = min(x + y, 1)
Note that the only distributive pair of t-norm and t-conorm is T_M and S_M, that is, the only pair for which the t-norm distributes over the t-conorm and vice versa.
Definition 12. The N-dual t-conorm S of a t-norm T w.r.t. a strict fuzzy negation N is defined as:
∀(x,y) ∈ [0,1]², S(x,y) = N(T(N(x), N(y))).    (B.1)
If N is a strong negation, we also get:
∀(x,y) ∈ [0,1]², T(x,y) = N(S(N(x), N(y))).    (B.2)
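A small sketch of this construction with the standard negation and the product t-norm, recovering the probabilistic sum of Example 5 (the function names are ours, for illustration):

import tensorflow as tf

def n_standard(x):
    return 1.0 - x

def dual_t_conorm(t_norm, negation=n_standard):
    # Builds S(x, y) = N(T(N(x), N(y))) as in (B.1).
    def s(x, y):
        return negation(t_norm(negation(x), negation(y)))
    return s

t_product = lambda x, y: x * y
s_probabilistic = dual_t_conorm(t_product)

x, y = tf.constant(0.3), tf.constant(0.6)
print(float(s_probabilistic(x, y)))   # 0.72, i.e. x + y - x*y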
Appendix B.4. Implication
Definition 13. An implication is a function I : [0,1]² → [0,1] that at least satisfies:
I1. boundary conditions: I(0,0) = I(0,1) = I(1,1) = 1 and I(1,0) = 0.
Definition 14. There are two main classes of implications generated from the fuzzy logic operators for negation, conjunction and disjunction.
S-Implications: Strong implications are defined using x ⇒ y = ¬x ∨ y (material implication).
R-Implications: Residuated implications are defined using x ⇒ y = sup{z ∈ [0,1] : x ∧ z ≤ y}. One way of understanding this approach is as a generalization of modus ponens: the consequent is at least as true as the (fuzzy) conjunction of the antecedent and the implication.
Example 6. Popular fuzzy implications and their classes are presented in Table B.5.
Name | I(x,y) = | S-Implication | R-Implication
Kleene-Dienes I_KD | max(1 - x, y) | S = S_M, N = N_S | -
Goedel I_G | 1 if x ≤ y, y otherwise | - | T = T_M
Reichenbach I_R | 1 - x + x·y | S = S_P, N = N_S | -
Goguen I_P | 1 if x ≤ y, y/x otherwise | - | T = T_P
Lukasiewicz I_Luk | min(1 - x + y, 1) | S = S_L, N = N_S | T = T_L
Table B.5: Popular fuzzy implications and their classes. Strong implications (S-Implications) are defined using a fuzzy negation and a fuzzy disjunction. Residuated implications (R-Implications) are defined using a fuzzy conjunction.
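For example, the Reichenbach S-implication and the Goguen R-implication from Table B.5 can be sketched as follows (an illustration; the small constant guarding the division by x is our own addition):

import tensorflow as tf

def i_reichenbach(x, y):
    # S-implication built from S_P and N_S: I_R(x, y) = 1 - x + x*y.
    return 1.0 - x + x * y

def i_goguen(x, y):
    # R-implication of the product t-norm: 1 if x <= y, otherwise y / x.
    return tf.where(x <= y, tf.ones_like(x), y / tf.maximum(x, 1e-7))

x = tf.constant([0.2, 0.8])
y = tf.constant([0.5, 0.1])
# i_reichenbach(x, y) -> [0.9, 0.28]
# i_goguen(x, y)      -> [1.0, 0.125]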

Appendix B.5. Aggregation

Definition 15. An aggregation operator is a function A : ⋃_{n∈ℕ} [0,1]^n → [0,1] that at least satisfies:
A1. A(x_1,…,x_n) ≤ A(y_1,…,y_n) whenever x_i ≤ y_i for all i ∈ {1,…,n},
A2. A(x) = x for all x ∈ [0,1],
A3. A(0,…,0) = 0 and A(1,…,1) = 1.
Example 7. Candidates for universal quantification can be obtained using t-norms with A_T(x_i) = x_i and A_T(x_1,…,x_n) = T(x_1, A_T(x_2,…,x_n)):
(minimum) A_TM(x_1,…,x_n) = min(x_1,…,x_n)
(product) A_TP(x_1,…,x_n) = ∏_{i=1}^n x_i
(Łukasiewicz) A_TL(x_1,…,x_n) = max(∑_{i=1}^n x_i - n + 1, 0)
Similarly, candidates for existential quantification can be obtained using t-conorms with A_S(x_i) = x_i and A_S(x_1,…,x_n) = S(x_1, A_S(x_2,…,x_n)):
(maximum) A_SM(x_1,…,x_n) = max(x_1,…,x_n)
(probabilistic sum) A_SP(x_1,…,x_n) = 1 - ∏_{i=1}^n (1 - x_i)
(Łukasiewicz) A_SL(x_1,…,x_n) = min(∑_{i=1}^n x_i, 1)
Table B.6: Common properties for different configurations. The table reports, for each of the configurations (T_M, S_M, N_S) with I_KD and I_G, (T_P, S_P, N_S) with I_R and I_P, and (T_L, S_L, N_S) with I_Luk, whether the following properties hold: commutativity of ∧, ∨; associativity of ∧, ∨; distributivity of ∧ over ∨; distributivity of ∨ over ∧; distributivity of ⇒ over ∧, ∨; double negation ¬¬p = p; law of excluded middle; law of non-contradiction; De Morgan's laws; material implication; and contraposition.
Following are other common aggregators:
(mean) A_M(x_1,…,x_n) = (1/n) ∑_{i=1}^n x_i
(p-mean) A_pM(x_1,…,x_n) = ((1/n) ∑_{i=1}^n x_i^p)^(1/p)
(p-mean error) A_pME(x_1,…,x_n) = 1 - ((1/n) ∑_{i=1}^n (1 - x_i)^p)^(1/p)
where A_pM is the generalized mean and A_pME can be understood as the generalized mean measured w.r.t. the errors. That is, A_pME measures the power of the deviation of each value from the ground truth 1. A few particular values of p yield special cases of aggregators. Notably:
- lim_{p→+∞} A_pM(x_1,…,x_n) = max(x_1,…,x_n),
- lim_{p→-∞} A_pM(x_1,…,x_n) = min(x_1,…,x_n),
- lim_{p→+∞} A_pME(x_1,…,x_n) = min(x_1,…,x_n),
- lim_{p→-∞} A_pME(x_1,…,x_n) = max(x_1,…,x_n).
These "smooth" min (resp. max) approximators are good candidates for (resp. ) in a fuzzy context. The value of p leaves more or less room for outliers depending on the use case and its needs. Note that ApME and ApM are related in the same way that and are related using the definition ¬¬ ,where ¬ would be approximated by the standard negation.
这些“平滑”最小(或最大)逼近器是模糊环境中 (或 )的良好候选者。根据使用情况及其需求, p 的值在一定程度上留有异常值的空间。请注意, ApMEApM 的关系与 的关系相同,使用定义 ¬¬ ,其中 ¬ 将由标准否定近似。
We propose to use A_pME with p ≥ 1 to approximate ∀ and A_pM with p ≥ 1 to approximate ∃. When p ≥ 1, these operators resemble the l_p norm of a vector u = (u_1, u_2,…,u_n), where ‖u‖_p = (|u_1|^p + |u_2|^p + … + |u_n|^p)^(1/p). In our case, many properties of the l_p norm apply to A_pM (positive homogeneity, triangular inequality, …).
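A minimal TensorFlow sketch of A_pM and A_pME (the default value of p and the function names are assumptions for illustration):

import tensorflow as tf

def a_pmean(xs, p=2.0, axis=-1):
    # Generalized mean A_pM, a smooth surrogate for the existential quantifier.
    return tf.reduce_mean(xs ** p, axis=axis) ** (1.0 / p)

def a_pmean_error(xs, p=2.0, axis=-1):
    # Generalized mean of the errors A_pME, a smooth surrogate for the universal quantifier.
    return 1.0 - tf.reduce_mean((1.0 - xs) ** p, axis=axis) ** (1.0 / p)

xs = tf.constant([0.1, 0.9, 0.95])
print(float(a_pmean(xs, p=20.0)))        # ~0.91, tends toward max(xs) as p grows
print(float(a_pmean_error(xs, p=20.0)))  # ~0.15, tends toward min(xs) as p grows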

Appendix C. Analyzing Gradients of Generalized Mean Aggregators

[69] show that some operators used in Fuzzy Logics are unsuitable for use in a differentiable learning setting. Three types of gradient problems commonly arise in fuzzy logic operators.
Single-Passing The derivatives of some operators are non-null for only one argument. The gradients propagate to only one input at a time.
Vanishing Gradients The gradients vanish on some part of the domain. The learning does not update inputs that are in the vanishing domain.
Exploding Gradients Large error gradients accumulate and result in unstable updates.
Tables C.7 and C.8 summarize their conclusions for the most common operators. In addition, we highlight exploding-gradient issues that arise experimentally with A_pM and A_pME and that are not in the original report. Given the truth values of n propositions (x_1,…,x_n) in [0,1]^n:
1. A_pM(x_1,…,x_n) = ((1/n) ∑_i x_i^p)^(1/p)
The partial derivatives are ∂A_pM(x_1,…,x_n)/∂x_i = (1/n)^(1/p) (∑_{j=1}^n x_j^p)^(1/p - 1) x_i^(p-1).
When p > 1, the operator gives more weight to inputs with a higher truth value, i.e. their partial derivative is also higher, and is suited to existential quantification. When p < 1, the operator gives more weight to inputs with a lower truth value and is suited to universal quantification.
Exploding Gradients: When p > 1, if ∑_{j=1}^n x_j^p → 0, then (∑_{j=1}^n x_j^p)^(1/p - 1) → ∞ and the gradients explode. When p < 1, if x_i → 0, then x_i^(p-1) → ∞.
2. A_pME(x_1,…,x_n) = 1 - ((1/n) ∑_i (1 - x_i)^p)^(1/p)
The partial derivatives are ∂A_pME(x_1,…,x_n)/∂x_i = (1/n)^(1/p) (∑_{j=1}^n (1 - x_j)^p)^(1/p - 1) (1 - x_i)^(p-1).
When p > 1, the operator gives more weight to inputs with a lower truth value, i.e. their partial derivative is also higher, and is suited to universal quantification. When p < 1, the operator gives more weight to inputs with a higher truth value and is suited to existential quantification.
Exploding Gradients: When p > 1, if ∑_{j=1}^n (1 - x_j)^p → 0, then (∑_{j=1}^n (1 - x_j)^p)^(1/p - 1) → ∞ and the gradients explode. When p < 1, if 1 - x_i → 0, then (1 - x_i)^(p-1) → ∞.
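These edge cases can be checked numerically. The sketch below shows the p < 1 case for A_pM, where the partial derivative w.r.t. an input close to 0 blows up (the chosen p and input values are arbitrary):

import tensorflow as tf

# With p < 1, the partial derivative of A_pM w.r.t. an input close to 0 blows up.
p = 0.5
x = tf.Variable([1e-6, 0.5, 0.5])

with tf.GradientTape() as tape:
    a_pm = tf.reduce_mean(x ** p) ** (1.0 / p)

print(tape.gradient(a_pm, x).numpy())
# The first entry is on the order of 1e2 and grows without bound as x[0] -> 0.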
Operator | Single-Passing | Vanishing | Exploding
Goedel (minimum): T_M, S_M | x | |
Goedel (minimum): I_KD | x | |
Goedel (minimum): I_G | x | x |
Goguen (product): T_P, S_P | | (X) |
Goguen (product): I_R | | (X) |
Goguen (product): I_P | | x | (X)
Lukasiewicz: T_L, S_L | | x |
Lukasiewicz: I_Luk | | x |
Table C.7: Gradient problems for some binary connectives. (X) means that the problem only appears on an edge case.

Operator | Single-Passing | Vanishing | Exploding
A_TM / A_SM | x | |
A_TP / A_SP | | x |
A_TL / A_SL | | x |
A_pM | | | (X)
A_pME | | | (X)
Table C.8: Gradient problems for some aggregators. (X) means that the problem only appears on an edge case.

We propose the following stable product configuration, which does not have any of the aforementioned gradient problems:
π_0(x) = (1 - ε)·x + ε    (C.1)
π_1(x) = (1 - ε)·x    (C.2)
N_S(x) = 1 - x    (C.3)
T_P(x,y) = π_0(x)·π_0(y)    (C.4)
S_P(x,y) = π_1(x) + π_1(y) - π_1(x)·π_1(y)    (C.5)
I_R(x,y) = 1 - π_0(x) + π_0(x)·π_1(y)    (C.6)
A_pM(x_1,…,x_n) = ((1/n) ∑_{i=1}^n π_0(x_i)^p)^(1/p),  p ≥ 1    (C.7)
A_pME(x_1,…,x_n) = 1 - ((1/n) ∑_{i=1}^n (1 - π_1(x_i))^p)^(1/p),  p ≥ 1    (C.8)
N_S is the operator for negation, T_P for conjunction, S_P for disjunction, I_R for implication, A_pM for existential aggregation, and A_pME for universal aggregation.
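Below is a minimal TensorFlow sketch of this stable configuration. The value of ε and the default p are assumptions: any small positive ε and any p ≥ 1 fit the definitions above.

import tensorflow as tf

eps = 1e-4  # assumed value; any small positive constant fits (C.1)-(C.2)

def pi_0(x):                       # (C.1): shifts truth values away from 0
    return (1.0 - eps) * x + eps

def pi_1(x):                       # (C.2): shifts truth values away from 1
    return (1.0 - eps) * x

def not_s(x):                      # (C.3)
    return 1.0 - x

def and_prod(x, y):                # (C.4)
    return pi_0(x) * pi_0(y)

def or_prod(x, y):                 # (C.5)
    return pi_1(x) + pi_1(y) - pi_1(x) * pi_1(y)

def implies_reichenbach(x, y):     # (C.6)
    return 1.0 - pi_0(x) + pi_0(x) * pi_1(y)

def exists_pmean(xs, p=2.0, axis=-1):        # (C.7), p >= 1
    return tf.reduce_mean(pi_0(xs) ** p, axis=axis) ** (1.0 / p)

def forall_pmean_error(xs, p=2.0, axis=-1):  # (C.8), p >= 1
    return 1.0 - tf.reduce_mean((1.0 - pi_1(xs)) ** p, axis=axis) ** (1.0 / p)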