
Logic Tensor Networks

Samy Badreddine a,b,*, Artur d'Avila Garcez c, Luciano Serafini d, Michael Spranger a,b

a Sony Computer Science Laboratories Inc, 3-14-13 Higashigotanda, 141-0022, Tokyo, Japan
b Sony AI Inc, 1-7-1 Konan, 108-0075, Tokyo, Japan
c City, University of London, Northampton Square, EC1V 0HB, London, United Kingdom
d Fondazione Bruno Kessler, Via Sommarive 18, 38123, Trento, Italy

Abstract

Attempts at combining logic and neural networks into neurosymbolic approaches have been on the increase in recent years. In a neurosymbolic system, symbolic knowledge assists deep learning, which typically uses a sub-symbolic distributed representation, to learn and reason at a higher level of abstraction. We present Logic Tensor Networks (LTN), a neurosymbolic framework that supports querying, learning and reasoning with both rich data and abstract knowledge about the world. LTN introduces a fully differentiable logical language, called Real Logic, whereby the elements of a first-order logic signature are grounded onto data using neural computational graphs and first-order fuzzy logic semantics. We show that LTN provides a uniform language to represent and compute efficiently many of the most important AI tasks such as multi-label classification, relational learning, data clustering, semi-supervised learning, regression, embedding learning and query answering. We implement and illustrate each of the above tasks with several simple explanatory examples using TensorFlow 2. The results indicate that LTN can be a general and powerful framework for neurosymbolic AI.
Keywords: Neurosymbolic AI, Deep Learning and Reasoning, Many-valued Logics.

1. Introduction

Artificial Intelligence (AI) agents are required to learn from their surroundings and reason about what has been learned to make decisions, act in the world, or react to various stimuli. The latest Machine Learning (ML) has adopted mostly a pure sub-symbolic learning approach. Using distributed representations of entities, the latest ML performs quick decision-making without building a comprehensible model of the world. While achieving impressive results in computer vision, natural language, game playing, and multimodal learning, such approaches are known to be data inefficient and to struggle at out-of-distribution generalization. Although the use of appropriate inductive biases can alleviate such shortcomings, in general, sub-symbolic models lack comprehensibility. By contrast, symbolic AI is based on rich, high-level representations of the world that use human-readable symbols. By rich knowledge, we refer to logical representations which are more expressive than propositional logic or propositional probabilistic approaches, and which can express knowledge using full first-order logic, including universal and existential quantification (∀x and ∃y), arbitrary n-ary relations over variables, e.g. R(x, y, z, …), and function symbols, e.g. fatherOf(x), x + y, etc. Symbolic AI has achieved success at theorem proving, logical inference, and verification. However, it also has shortcomings when dealing with incomplete knowledge. It can be inefficient with large amounts of inaccurate data and lacks robustness to outliers. Purely symbolic decision algorithms usually have high computational complexity, making them impractical for the real world. It is now clear that the predominant approach to ML, where learning is based on recognizing the latent structures hidden in the data, is insufficient and may benefit from symbolic AI [17]. In this context, neurosymbolic AI, which stems from neural networks and symbolic AI, attempts to combine the strength of both paradigms (see [16, 40, 54] for recent surveys). That is to say, combine reasoning with complex representations of knowledge (knowledge-bases, semantic networks, ontologies, trees, and graphs) with learning from complex data (images, time series, sensorimotor data, natural language). Consequently, a main challenge for neurosymbolic AI is the grounding of symbols, including constants, functional and relational symbols, into real data, which is akin to the longstanding symbol grounding problem [30].

*Corresponding author
Email addresses: badreddine.samy@gmail.com (Samy Badreddine), a.garcez@city.ac.uk (Artur d'Avila Garcez), serafini@fbk.eu (Luciano Serafini), michael.spranger@sony.com (Michael Spranger)

Logic Tensor Networks (LTN) are a neurosymbolic framework and computational model that supports learning and reasoning about data with rich knowledge. In LTN, one can represent and effectively compute the most important tasks of deep learning with a fully differentiable first-order logic language, called Real Logic, which adopts infinitely many truth-values in the interval [0, 1] [22, 25]. In particular, LTN supports the specification and computation of the following AI tasks uniformly using the same language: data clustering, classification, relational learning, query answering, semi-supervised learning, regression, and embedding learning.
LTN and Real Logic were first introduced in [62]. Since then, LTN has been applied to different AI tasks involving perception, learning, and reasoning about relational knowledge. In [18, 19], LTN was applied to semantic image interpretation whereby relational knowledge about objects was injected into deep networks for object relationship detection. In [6], LTN was evaluated on its capacity to perform reasoning about ontological knowledge. Furthermore, [7] shows how LTN can be used to learn an embedding of concepts into a latent real space by taking into consideration ontological knowledge about such concepts. In [3], LTN is used to annotate a reinforcement learning environment with prior knowledge and incorporate latent information into an agent. In [42], authors embed LTN in a state-of-the-art convolutional object detector. Extensions and generalizations of LTN have also been proposed in the past years, such as LYRICS [47] and Differentiable Fuzzy Logic (DFL) [68,69]. LYRICS provides an input language allowing one to define background knowledge using a first-order logic where predicate and function symbols are grounded onto any computational graph. DFL analyzes how a large collection of fuzzy logic operators behave in a differentiable learning setting. DFL also introduces new semantics for fuzzy logic implications called sigmoidal implications, and it shows that such semantics outperform other semantics in several semi-supervised machine learning tasks.
This paper provides a thorough description of the full formalism and several extensions of LTN. We show, using an extensive set of explanatory examples, how LTN can be applied to solve many ML tasks with the help of logical knowledge. In particular, the earlier versions of LTN have been extended with: (1) Explicit domain declaration: constants, variables, functions and predicates are now domain typed (e.g. the constants John and Paris can be from the domain of person and city, respectively). The definition of structured domains is also possible (e.g. the domain couple can be defined as the Cartesian product of two domains of persons); (2) Guarded quantifiers: guarded universal and existential quantifiers now allow the user to limit the quantification to the elements that satisfy some Boolean condition, e.g. ∀x : age(x) < 10 (playsPiano(x) → enfantProdige(x)) restricts the quantification to the cases where age is lower than 10; (3) Diagonal quantification: diagonal quantification allows the user to write statements about specific tuples extracted in order from n variables. For example, if the variables capital and country both have k instances such that the i-th instance of capital corresponds to the i-th instance of country, one can write ∀ Diag(capital, country) capitalOf(capital, country).
Inspired by the work of [69], this paper also extends the product t-norm configuration of LTN with the generalized mean aggregator, and it introduces solutions to the vanishing or exploding gradient problems. Finally, the paper formally defines a semantic approach to refutation-based reasoning in Real Logic to verify if a statement is a logical consequence of a knowledge base. Example 4.8 proves that this new approach can better capture logical consequences compared to simply querying unknown formulas after learning (as done in [6]).
The new version of LTN has been implemented in TensorFlow 2 [1]. Both the LTN library and the code for the examples used in this paper are available at https://github.com/logictensornetworks/logictensornetworks.
The remainder of the paper is organized as follows: In Section 2, we define and illustrate Real Logic as a fully-differentiable first-order logic. In Section 3, we specify learning and reasoning in Real Logic and its modeling into deep networks with Logic Tensor Networks (LTN). In Section 4 we illustrate the reach of LTN by investigating a range of learning problems from clustering to embedding learning. In Section 5, we place LTN in the context of the latest related work in neurosymbolic AI. In Section 6 we conclude and discuss directions for future work. The Appendix contains information about the implementation of LTN in TensorFlow 2, experimental set-ups, the different options for the differentiable logic operators, and a study of their relationship with gradient computations.

2. Real Logic

2.1. Syntax

Real Logic forms the basis of Logic Tensor Networks. Real Logic is defined on a first-order language L with a signature that contains a set C of constant symbols (objects), a set F of functional symbols, a set P of relational symbols (predicates), and a set X of variable symbols. L-formulas allow us to specify relational knowledge with variables, e.g. the atomic formula is_friend(v_1, v_2) may state that the person v_1 is a friend of the person v_2, the formula ∀x∀y(is_friend(x, y) → is_friend(y, x)) states that the relation is_friend is symmetric, and the formula ∀x(∃y(Italian(y) ∧ is_friend(x, y))) states that every person has a friend that is Italian. Since we are interested in learning and reasoning in real-world scenarios where degrees of truth are often fuzzy and exceptions are present, formulas can be partially true, and therefore we adopt fuzzy semantics.
Objects can be of different types. Similarly, functions and predicates are typed. Therefore, we assume there exists a non-empty set of symbols D called domain symbols. To assign types to the elements of L we introduce the functions D, D_in and D_out such that:
  • D : X ∪ C → D. Intuitively, D(x) and D(c) return the domain of a variable x or a constant c.
  • D_in : F ∪ P → D*, where D* is the Kleene star of D, that is, the set of all finite sequences of symbols in D. Intuitively, D_in(f) and D_in(p) return the domains of the arguments of a function f or a predicate p. If f takes two arguments (for example, f(x, y)), D_in(f) returns two domains, one per argument.
  • D_out : F → D. Intuitively, D_out(f) returns the range of a function symbol.
Real Logic may also contain propositional variables, as follows: if P is a 0-ary predicate whose input domain D_in(P) is the empty sequence of domains, then P is a propositional variable (an atom with truth-value in the interval [0, 1]).
A term is constructed recursively in the usual way from constant symbols, variables, and function symbols. An expression formed by applying a predicate symbol to an appropriate number of terms with appropriate domains is called an atomic formula, which evaluates to true or false in classical logic and a number in [0,1] in the case of Real Logic. We define the set of terms of the language as follows:
  • each element t of X ∪ C is a term of the domain D(t);
  • if t_i is a term of domain D(t_i) for 1 ≤ i ≤ n, then t_1 t_2 … t_n (the sequence composed of t_1 followed by t_2 and so on, up to t_n) is a term of the domain D(t_1) D(t_2) … D(t_n);
  • if t is a term of the domain D_in(f), then f(t) is a term of the domain D_out(f).
We allow the following set of formulas in L:
  • t_1 = t_2 is an atomic formula for any terms t_1 and t_2 with D(t_1) = D(t_2);
  • p(t) is an atomic formula if D(t) = D_in(p);
  • if ϕ and ψ are formulas and x_1, …, x_n are n distinct variable symbols, then ∗ϕ, ϕ ∘ ψ and Q x_1 … x_n ϕ are formulas, where ∗ is a unary connective, ∘ is a binary connective and Q is a quantifier.
We use ∗ ∈ {¬} (negation), ∘ ∈ {∧, ∨, →, ↔} (conjunction, disjunction, implication and biconditional, respectively) and Q ∈ {∀, ∃} (universal and existential, respectively).
Example 1. Let Town denote the domain of towns in the world and People denote the domain of living people. Suppose that L contains the constant symbols Alice, Bob and Charlie of domain People, and Rome and Seoul of domain Town. Let x be a variable of domain People and u be a variable of domain Town. The term x, u (i.e. the sequence x followed by u) has domain People, Town, which denotes the Cartesian product between People and Town (People × Town). Alice, Rome is interpreted as an element of the domain People, Town. Let lives_in be a predicate with input domain D_in(lives_in) = People, Town. lives_in(Alice, Rome) is a well-formed expression, whereas lives_in(Bob, Charlie) is not.

2.2. Semantics of Real Logic

The semantics of Real Logic departs from the standard abstract semantics of First-order Logic (FOL). In Real Logic, domains are interpreted concretely by tensors in the real field. 1 Every object denoted by constants, variables, and terms is interpreted as a tensor of real values. Functions are interpreted as real functions or tensor operations. Predicates are interpreted as functions or tensor operations projecting onto a value in the interval [0, 1].

1 In the rest of the paper,we commonly use "tensor" to designate "tensor in the real field".

To emphasize the fact that in Real Logic symbols are grounded onto real-valued features, we use the term grounding, denoted by G, in place of interpretation 2 . Notice that this is different from the common use of the term grounding in logic, which indicates the operation of replacing the variables of a term or formula with constants or terms containing no variables. To avoid confusion, we use the synonym instantiation for this purpose. G associates a tensor of real numbers to any term of L, and a real number in the interval [0, 1] to any formula ϕ of L. Intuitively, G(t) are the numeric features of the objects denoted by t, and G(ϕ) represents the system's degree of confidence in the truth of ϕ; the higher the value, the higher the confidence.

2.2.1. Grounding domains and the signature

A grounding for a logical language L on the set of domains D provides the interpretation of both the domain symbols in D and the non-logical symbols in L .
Definition 1. A grounding G associates to each domain D ∈ D a set G(D) ⊆ ⋃_{n_1 … n_d ∈ N*} R^{n_1 × … × n_d}.
For every D_1 … D_n ∈ D*, G(D_1 … D_n) = ×_{i=1}^{n} G(D_i), that is, G(D_1) × G(D_2) × … × G(D_n).
Notice that the elements in G(D) may be tensors of any rank d and any dimensions n_1 × … × n_d, as N* denotes the Kleene star of N. 3
Example 2. Let digit_images denote a domain of images of handwritten digits. If we use images of 256 × 256 RGB pixels, then G(digit_images) ⊆ R^{256×256×3}. Let us consider the predicate is_digit(Z, 8). The terms Z, 8 have domains digit_images, digits. Any input to the predicate is a tuple in G(digit_images, digits) = G(digit_images) × G(digits).
A grounding assigns to each constant symbol c a tensor G(c) in the domain G(D(c)); it assigns to a variable x a finite sequence of tensors d_1 … d_k, each in G(D(x)). These tensors represent the instances of x. Differently from FOL, where a variable is assigned a single value of the domain of interpretation at a time, in Real Logic a variable is assigned a sequence of values in its domain, the k examples of x. A grounding assigns to a function symbol f a function taking tensors from G(D_in(f)) as input, and producing a tensor in G(D_out(f)) as output. Finally, a grounding assigns to a predicate symbol p a function taking tensors from G(D_in(p)) as input, and producing a truth-value in the interval [0, 1] as output.
Definition 2. A grounding G of L is a function defined on the signature of L that satisfies the following conditions:
  1. G(x) = ⟨d_1 … d_k⟩ ∈ ×_{i=1}^{k} G(D(x)) for every variable symbol x ∈ X, with k ∈ N^+. Notice that G(x) is a sequence and not a set, meaning that the same value of G(D(x)) can occur multiple times in G(x), as is usual in a Machine Learning data set with "attributes" and "values";

2 An interpretation is an assignment of truth-values true or false, or in the case of Real Logic a value in [0, 1], to a formula. A model is an interpretation that maps a formula to true.
3 A tensor of rank 0 corresponds to a scalar,a tensor of rank 1 to a vector,a tensor of rank 2 to a matrix and so forth,in the usual way.

  2. G(f) : G(D_in(f)) → G(D_out(f)) for every function symbol f ∈ F;
  3. G(p) : G(D_in(p)) → [0, 1] for every predicate symbol p ∈ P.
If a grounding depends on a set of parameters θ, we denote it as G_θ(·) or G(· | θ), interchangeably. Section 4 describes how such parameters can be learned using the concept of satisfiability.

2.2.2. Grounding terms and atomic formulas

We now extend the definition of grounding to all first-order terms and atomic formulas. Before formally defining these groundings, we describe on a high level what happens when grounding terms that contain free variables. 4
Let x be a variable that denotes people. As explained in Definition 2, x is grounded as an explicit sequence of k instances (k = |G(x)|). Consequently, a term height(x) is also grounded in k height values, each corresponding to one instance. We can generalize to expressions with multiple free variables, as shown in Example 3.
In the formal definition below, instead of considering a single term at a time, it is convenient to consider sequences of terms t = t_1 t_2 … t_k and define the grounding on t (with the definition of the grounding of a single term being derived as a special case). The fact that the sequence of terms t contains n distinct variables x_1, …, x_n is denoted by t(x_1, …, x_n). The grounding of t(x_1, …, x_n), denoted by G(t(x_1, …, x_n)), is a tensor with n corresponding axes, one for each free variable, defined as follows:
Definition 3. Let t(x_1, …, x_n) be a sequence t_1 … t_m of m terms containing n distinct variables x_1, …, x_n. Let each term t_i in t contain n_i variables x_{j_{i1}}, …, x_{j_{i n_i}}.
  • G(t) is a tensor with dimensions (|G(x_1)|, …, |G(x_n)|) such that the element of this tensor indexed by k_1, …, k_n, written as G(t)_{k_1 … k_n}, is equal to the concatenation of G(t_i)_{k_{j_{i1}} … k_{j_{i n_i}}} for 1 ≤ i ≤ m;
  • G(f(t))_{i_1 … i_n} = G(f)(G(t)_{i_1 … i_n}), i.e. the element-wise application of G(f) to G(t);
  • G(p(t))_{i_1 … i_n} = G(p)(G(t)_{i_1 … i_n}), i.e. the element-wise application of G(p) to G(t).
If a term t_i contains n_i variables x_{j_1}, …, x_{j_{n_i}} selected from x_1, …, x_n, then G(t_i)_{k_{j_1} … k_{j_{n_i}}} can be obtained from G(t)_{i_1 … i_n} with an appropriate mapping of indices i to k.

4 We assume the usual syntactic definition of free and bound variables in FOL. A variable is free if it is not bound by a quantifier (∀, ∃).

Figure 1: Illustration of Example 3 - x and y indicate dimensions associated with the free variables x and y . A tensor representing a term that includes a free variable x will have an axis x . One can index x to obtain results calculated using each of the v1,v2 or v3 values of x . In our graphical convention,the depth of the boxes indicates that the tensor can have feature dimensions (refer to the end of Example 3).
Example 3. Suppose that L contains the variables x and y, the function f, the predicate p and the set of domains D = {V, W}. Let D(x) = V, D(y) = W, D_in(f) = V, W, D_out(f) = W and D_in(p) = V, W. Below, an example of the grounding of L and D is given; the groundings of some possible terms and atomic formulas are then discussed.
G(V) = R^+
G(W) = R
G(x) = ⟨v_1, v_2, v_3⟩
G(y) = ⟨w_1, w_2⟩
G(p) : x, y ↦ σ(x + y)
G(f) : x, y ↦ x · y
Notice the dimensions of the results. G(f(x, y)) and G(p(x, f(x, y))) return |G(x)| × |G(y)| = 3 × 2 values, one for each combination of individuals that occur in the variables. For functions, we can have additional dimensions associated to the output domain. Let us suppose a different grounding such that G(D_out(f)) = R^m. Then the dimensions of G(f(x, y)) would have been |G(x)| × |G(y)| × m, where |G(x)| × |G(y)| are the dimensions for indexing the free variables and m are the dimensions associated to the output domain of f. Let us call the latter feature dimensions, as captioned in Figure 1. Notice that G(p(x, f(x, y))) will always return a tensor with the exact dimensions |G(x)| × |G(y)| × 1 because, under any grounding, a predicate always returns a value in [0, 1]. Therefore, as the "feature dimension" of predicates is always 1, we choose to "squeeze it" and not to represent it in our graphical convention (see Figure 1; the box output by the predicate has no depth).
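As a minimal sketch of how such groundings can be computed in practice, the following NumPy fragment broadcasts over the axes of the two free variables; the concrete numbers for v_1, v_2, v_3, w_1, w_2, the sigmoid and the product grounding of f are illustrative assumptions, not values taken from the example above.

import numpy as np

def sigma(z):                      # logistic function used to ground p
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative instances for the variables (assumed values).
G_x = np.array([1.0, 2.0, 3.0])    # G(x) = <v1, v2, v3>, |G(x)| = 3
G_y = np.array([-1.0, 0.5])        # G(y) = <w1, w2>,     |G(y)| = 2

# Broadcast so that axis 0 indexes x and axis 1 indexes y.
X = G_x[:, None]                   # shape (3, 1)
Y = G_y[None, :]                   # shape (1, 2)

G_f_xy = X * Y                     # G(f(x,y)), shape (3, 2), one value per pair
G_p = sigma(X + G_f_xy)            # G(p(x, f(x,y))), shape (3, 2), values in [0, 1]

print(G_f_xy.shape, G_p.shape)     # (3, 2) (3, 2)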
Figure 2: Illustration of an element-wise operator implementing conjunction (p(x) ∧ q(y)). We assume that x and y are two different variables. The result has one number in the interval [0, 1] for every combination of individuals from G(x) and G(y).

2.2.3. Connectives and Quantifiers

The semantics of the connectives is defined according to the semantics of first-order fuzzy logic [28]. Conjunction (∧), disjunction (∨), implication (→) and negation (¬) are associated, respectively, with a t-norm (T), a t-conorm (S), a fuzzy implication (I) and a fuzzy negation (N) operation: FuzzyOp ∈ {T, S, I, N}. Definitions of some common fuzzy operators are presented in Appendix B. Let ϕ and ψ be two formulas with free variables x_1, …, x_m and y_1, …, y_n, respectively. Let us assume that the first k variables are common to ϕ and ψ. Recall that ∗ and ∘ denote a unary and a binary connective, respectively. Formally:
(1) G(∗ϕ)_{i_1, …, i_m} = FuzzyOp(∗)(G(ϕ)_{i_1, …, i_m})
(2) G(ϕ ∘ ψ)_{i_1, …, i_{m+n−k}} = FuzzyOp(∘)(G(ϕ)_{i_1, …, i_k, i_{k+1}, …, i_m}, G(ψ)_{i_1, …, i_k, i_{m+1}, …, i_{m+n−k}})
In (2), (i_1, …, i_k) denote the indices of the k common variables, (i_{k+1}, …, i_m) denote the indices of the m − k variables appearing only in ϕ, and (i_{m+1}, …, i_{m+n−k}) denote the indices of the n − k variables appearing only in ψ. Intuitively, G(ϕ ∘ ψ) is a tensor whose elements are obtained by applying FuzzyOp(∘) element-wise to every combination of individuals from x_1, …, x_m and y_1, …, y_n (see Figure 2).
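A small NumPy sketch of Equation (2) for the conjunction p(x) ∧ q(y) of Figure 2, where x and y share no free variable; the truth-values are made up, and the product t-norm of Section 2.4 is used for illustration as FuzzyOp(∧).

import numpy as np

G_p_x = np.array([0.2, 0.9, 0.6])          # G(p(x)), one truth-value per instance of x
G_q_y = np.array([0.8, 0.4])               # G(q(y)), one truth-value per instance of y

# FuzzyOp(and) applied to every combination of instances: result shape (3, 2).
G_and = G_p_x[:, None] * G_q_y[None, :]    # product t-norm T_P(a, b) = a * b
print(G_and.shape)                         # (3, 2)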
The semantics of the quantifiers ({∀, ∃}) is defined with the use of aggregation. Let Agg be a symmetric and continuous aggregation operator, Agg : ⋃_{n∈N} [0, 1]^n → [0, 1]. An analysis of suitable aggregation operators is presented in Appendix B. For every formula ϕ containing x_1, …, x_n free variables, suppose, without loss of generality, that quantification applies to the first h variables. We shall therefore apply Agg to the first h axes of G(ϕ), as follows:
(3) G(Q x_1, …, x_h (ϕ))_{i_{h+1}, …, i_n} = Agg(Q)_{i_1 = 1, …, |G(x_1)|; …; i_h = 1, …, |G(x_h)|} G(ϕ)_{i_1, …, i_h, i_{h+1}, …, i_n}
where Agg(Q) is the aggregation operator associated with the quantifier Q. Intuitively, we obtain G(Q x_1, …, x_h (ϕ)) by reducing the dimensions associated with x_1, …, x_h using the operator Agg(Q) (see Figure 3).
Notice that the above grounded semantics can assign different meanings to the three formulas:
∀x, y (ϕ(x, y))    ∀x(∀y(ϕ(x, y)))    ∀y(∀x(ϕ(x, y)))
Figure 3: Illustration of an aggregation operation implementing quantification over the variables x and y. We assume that x and y have different domains. The result is a single number in the interval [0, 1].
The semantics of the three formulas will coincide if the aggregation operator is bi-symmetric. LTN also allows the following form of quantification, here called diagonal quantification (Diag):
(4) G(Q Diag(x_1, …, x_h)(ϕ))_{i_{h+1}, …, i_n} = Agg(Q)_{i = 1, …, min_{1 ≤ j ≤ h} |G(x_j)|} G(ϕ)_{i, …, i, i_{h+1}, …, i_n}
Diag(x_1, …, x_h) quantifies over specific tuples such that the i-th tuple contains the i-th instance of each of the variables in the argument of Diag, under the assumption that all variables in the argument are grounded onto sequences with the same number of instances. Diag(x_1, …, x_h) is called diagonal quantification because it quantifies over the diagonal of G(ϕ) along the axes associated with x_1 … x_h, although in practice only the diagonal is built and not the entire G(ϕ), as shown in Figure 4. For example, given a data set with samples x and target labels y, if looking to write a statement p(x, y) that holds true for each pair of sample and label, one can write ∀ Diag(x, y) p(x, y), given that |G(x)| = |G(y)|. As another example, given two variables x and y whose groundings contain 10 instances of x and y each, the expression Diag(x, y) p(x, y) produces 10 results, such that the i-th result corresponds to the i-th instances of each grounding. Without Diag, the expression would be evaluated for all 10 × 10 combinations of the elements in G(x) and G(y). 5 Diag will find much application in the examples and experiments to follow.
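The following NumPy sketch illustrates Equations (3) and (4): quantification reduces the axes of the quantified variables with an aggregator (here the arithmetic mean stands in for Agg(∀); Section 2.4 discusses better-behaved operators), while Diag aggregates only over matching indices. The truth-values below are assumed for illustration.

import numpy as np

G_phi = np.array([[0.9, 0.3],
                  [0.8, 0.7],
                  [0.6, 0.5]])               # G(phi(x,y)), axes: (x, y), with 3 and 2 instances

forall = lambda a, axis: a.mean(axis=axis)   # placeholder choice for Agg(forall)

# Forall x, y phi(x,y): reduce both axes -> a single truth-value.
print(forall(G_phi, axis=(0, 1)))

# Forall x phi(x,y): reduce the x axis only -> one truth-value per instance of y.
print(forall(G_phi, axis=0))

# Diag(x, y): only the pairs (x_i, y_i), i = 1..min(|G(x)|, |G(y)|), are evaluated.
diag = np.array([G_phi[i, i] for i in range(min(G_phi.shape))])
print(forall(diag, axis=0))                  # Forall Diag(x, y) phi(x, y)

In an actual implementation only the diagonal values would be computed, rather than building the full G(ϕ) first and then indexing it as done in this sketch.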

2.3. Guarded Quantifiers

In many situations, one may wish to quantify over a set of elements of a domain whose groundings satisfy some condition. In particular, one may wish to express such a condition using formulas of the language of the form:
(5) ∀y(∃x : age(x) > age(y) (parent(x, y)))
The grounding of such a formula is obtained by aggregating the values of parent(x, y) only for the instances of x that satisfy the condition age(x) > age(y), that is:
Agg(∀)_{j = 1, …, |G(y)|} Agg(∃)_{i = 1, …, |G(x)| s.t. G(age(x))_i > G(age(y))_j} G(parent(x, y))_{i, j}

5 Notice how Diag is not simply "syntactic sugar" for creating a new variable pairs_xy by stacking pairs of examples from G(x) and G(y) . If the groundings of x and y have incompatible ranks (for instance,if x denotes images and y denotes their labels),stacking them in a tensor G (pairs_xy) is non-trivial,requiring several reshaping operations.

Figure 4: Diagonal Quantification: Diag(x_1, x_2) quantifies over specific tuples only, such that the i-th tuple contains the i-th instances of the variables x_1 and x_2 in the groundings G(x_1) and G(x_2), respectively. Diag(x_1, x_2) assumes, therefore, that x_1 and x_2 have the same number of instances, as in the case of samples x_1 and their labels x_2 in a typical supervised learning task.
The evaluation of which tuple is safe is purely symbolic and non-differentiable. Guarded quantifiers operate over only a subset of the variables, when this symbolic knowledge is crisp and available. More generally, in what follows, m is a symbol representing the condition, which we shall call a mask, and G(m) associates to m a function 6 returning a Boolean.
(6) G(Q x_1, …, x_h : m(x_1, …, x_n)(ϕ))_{i_{h+1}, …, i_n} =def Agg(Q)_{i_1 = 1, …, |G(x_1)|; …; i_h = 1, …, |G(x_h)| s.t. G(m)(G(x_1)_{i_1}, …, G(x_n)_{i_n})} G(ϕ)_{i_1, …, i_h, i_{h+1}, …, i_n}
Notice that the semantics of a guarded sentence ∀x : m(x) (ϕ(x)) is different than the semantics of ∀x (m(x) → ϕ(x)). In crisp and traditional FOL, the two statements would be equivalent. In Real Logic, they can give different results. Let G(x) be a sequence of 3 values, G(m(x)) = (0, 1, 1) and G(ϕ(x)) = (0.2, 0.7, 0.8). Only the second and third instances of x are safe, that is, are in the masked subset. Let → be defined using the Reichenbach operator I_R(a, b) = 1 − a + ab and ∀ be defined using the mean operator. We have G(∀x (m(x) → ϕ(x))) = (1 + 0.7 + 0.8)/3 ≈ 0.833, whereas G(∀x : m(x) (ϕ(x))) = (0.7 + 0.8)/2 = 0.75. Also, in the computational graph of the guarded sentence, there are no gradients attached to the instances that do not verify the mask. Similarly, the semantics of ∃x : m(x) (ϕ(x)) is not equivalent to that of ∃x (m(x) ∧ ϕ(x)).
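A small NumPy sketch reproducing the numbers above, with ∀ grounded as the mean and → as the Reichenbach implication:

import numpy as np

G_m   = np.array([0.0, 1.0, 1.0])      # G(m(x)): which instances are "safe"
G_phi = np.array([0.2, 0.7, 0.8])      # G(phi(x))

I_R = lambda a, b: 1.0 - a + a * b     # Reichenbach implication

# Forall x (m(x) -> phi(x)): all three instances contribute.
print(np.mean(I_R(G_m, G_phi)))        # (1 + 0.7 + 0.8) / 3 = 0.8333...

# Forall x : m(x) (phi(x)): only the masked instances contribute.
print(np.mean(G_phi[G_m > 0.5]))       # (0.7 + 0.8) / 2 = 0.75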

6 In some edge cases, a masking may produce an empty sequence, e.g. if for some value of G(y), there is no value in G(x) that satisfies age(x) > age(y). In such cases, we resort to the concept of an empty semantics: ∀ returns 1 and ∃ returns 0.

Figure 5: Example of Guarded Quantification: one can filter out elements of the various domains that do not satisfy some condition before the aggregation operators for ∀ and ∃ are applied.

2.4. Stable Product Real Logic

It has been shown in [69] that not all first-order fuzzy logic semantics are equally suited for gradient-descent optimization. Many fuzzy logic operators can lead to vanishing or exploding gradients. Some operators are also single-passing, in that they propagate gradients to only one input at a time.
In general, the best performing symmetric configuration 7 for the connectives uses the product t-norm T_P for conjunction, its dual t-conorm S_P for disjunction, standard negation N_S, and the Reichenbach implication I_R (the S-implication corresponding to the above operators). This subset of Real Logic, where the grounding of the connectives is restricted to the product configuration, is called Product Real Logic in [69]. Given two truth-values a and b in [0, 1]:
(7) ¬ : N_S(a) = 1 − a
(8) ∧ : T_P(a, b) = ab
(9) ∨ : S_P(a, b) = a + b − ab
(10) → : I_R(a, b) = 1 − a + ab
Appropriate aggregators for ∃ and ∀ are the generalized mean A_pM with p ≥ 1 to approximate the existential quantification, and the generalized mean w.r.t. the error A_pME with p ≥ 1 to approximate the universal quantification. They can be understood as a smooth maximum and a smooth minimum, respectively. Given n truth-values a_1, …, a_n, all in [0, 1]:
(11) ∃ : A_pM(a_1, …, a_n) = (1/n Σ_{i=1}^{n} a_i^p)^{1/p},   p ≥ 1
(12) ∀ : A_pME(a_1, …, a_n) = 1 − (1/n Σ_{i=1}^{n} (1 − a_i)^p)^{1/p},   p ≥ 1
A_pME measures the power of the deviation of each value from the ground truth 1. With p = 2, it is equivalent to 1 − RMSE(a, 1), where RMSE is the root-mean-square error, a is the vector of truth-values and 1 is a vector of ones.
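A direct NumPy transcription of the operators in Equations (7)-(12); the input truth-values are arbitrary illustrative numbers.

import numpy as np

def N_S(a):     return 1.0 - a                 # negation, Eq. (7)
def T_P(a, b):  return a * b                   # conjunction, Eq. (8)
def S_P(a, b):  return a + b - a * b           # disjunction, Eq. (9)
def I_R(a, b):  return 1.0 - a + a * b         # implication, Eq. (10)

def A_pM(a, p=2):        # "exists": smooth maximum, Eq. (11)
    return np.mean(np.asarray(a) ** p) ** (1.0 / p)

def A_pME(a, p=2):       # "forall": smooth minimum, Eq. (12)
    return 1.0 - np.mean((1.0 - np.asarray(a)) ** p) ** (1.0 / p)

a = np.array([0.1, 0.9, 0.7])
print(A_pM(a, p=2), A_pME(a, p=2))   # higher p pushes these towards max(a) and min(a)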

7 We define a symmetric configuration as a set of fuzzy operators such that conjunction and disjunction are defined by a t-norm and its dual t-conorm, respectively, and the implication operator is derived from such conjunction or disjunction operators and standard negation (c.f. Appendix B for details). In [69], van Krieken et al. also analyze non-symmetric configurations and even operators that do not strictly verify fuzzy logic semantics.

The intuition behind the choice of p is that the higher p is, the more weight A_pM (resp. A_pME) gives to true (resp. false) truth-values, converging to the max (resp. min) operator. Therefore, the value of p can be seen as a hyper-parameter, as it offers flexibility to account for outliers in the data depending on the application.
Nevertheless, Product Real Logic still has the following gradient problems: T_P(a, b) has vanishing gradients on the edge case a = b = 0; S_P(a, b) has vanishing gradients on the edge case a = b = 1; I_R(a, b) has vanishing gradients on the edge case a = 0, b = 1; A_pM(a_1, …, a_n) has exploding gradients when Σ_i (a_i)^p tends to 0; A_pME(a_1, …, a_n) has exploding gradients when Σ_i (1 − a_i)^p tends to 0 (see Appendix C for details).
To address these problems, we define the projections π_0 and π_1 below, with ϵ an arbitrarily small positive real number:
(13) π_0 : [0, 1] → ]0, 1] : a ↦ (1 − ϵ)a + ϵ
(14) π_1 : [0, 1] → [0, 1[ : a ↦ (1 − ϵ)a
We then derive the following stable operators to produce what we call the Stable Product Real Logic configuration:
(15) N_S'(a) = N_S(a)
(16) T_P'(a, b) = T_P(π_0(a), π_0(b))
(17) S_P'(a, b) = S_P(π_1(a), π_1(b))
(18) I_R'(a, b) = I_R(π_0(a), π_1(b))
(19) A_pM'(a_1, …, a_n) = A_pM(π_0(a_1), …, π_0(a_n)),   p ≥ 1
(20) A_pME'(a_1, …, a_n) = A_pME(π_1(a_1), …, π_1(a_n)),   p ≥ 1
It is important to note that the conjunction operator in stable product semantics is not a t-norm: 8 T_P'(a, b) does not satisfy identity in [0, 1[ since, for any 0 ≤ a < 1, T_P'(a, 1) = (1 − ϵ)a + ϵ ≠ a, although ϵ can be chosen arbitrarily small. In the experimental evaluations reported in Section 4, we find that the adoption of the stable product semantics is an important practical step to improve the numerical stability of the learning system.
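A NumPy sketch of part of the stable configuration, obtained by composing the operators above with the projections π_0 and π_1 of Equations (13)-(14); the value of ϵ is an illustrative assumption (negation and A_pM are analogous).

import numpy as np

eps = 1e-4                                      # assumed small constant epsilon

def pi_0(a): return (1.0 - eps) * a + eps       # [0,1] -> ]0,1], Eq. (13)
def pi_1(a): return (1.0 - eps) * a             # [0,1] -> [0,1[, Eq. (14)

def T_P_stable(a, b):  return pi_0(a) * pi_0(b)                          # Eq. (16)
def S_P_stable(a, b):  a, b = pi_1(a), pi_1(b); return a + b - a * b     # Eq. (17)
def I_R_stable(a, b):  a, b = pi_0(a), pi_1(b); return 1.0 - a + a * b   # Eq. (18)

def A_pME_stable(a, p=2):                                                # Eq. (20)
    a = pi_1(np.asarray(a))
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)

# The projections keep the gradients finite on the edge cases listed above,
# e.g. A_pME_stable never sees a truth-value exactly equal to 1.
print(A_pME_stable([1.0, 1.0, 0.0], p=2))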

3. Learning, Reasoning, and Querying in Real Logic

In Real Logic, one can define the tasks of learning, reasoning and query-answering. Given a Real Logic theory that represents the knowledge of an agent at a given time, learning is the task of making generalizations from specific observations obtained from data. This is often called inductive inference. Reasoning is the task of deriving what knowledge follows from the facts which are currently known. Query answering is the task of evaluating the truth value of a certain logical expression (called a query), or finding the set of objects in the data that evaluate a certain expression to true. In what follows, we define and exemplify each of these tasks. To do so, we first need to specify which types of knowledge can be represented in Real Logic.

8 Recall that a t-norm is a function T : [0, 1] × [0, 1] → [0, 1] satisfying commutativity, monotonicity, associativity and identity, that is, T(a, 1) = a.

3.1. Representing Knowledge with Real Logic

In logic-based knowledge representation systems, knowledge is represented by logical formulas whose intended meanings are propositions about a domain of interest. The connection between the symbols occurring in the formulas and what holds in the domain is not represented in the knowledge base and is left implicit since it does not have any effect on the logic computations. In Real Logic, by contrast, the connection between the symbols and the domain is represented explicitly in the language by the grounding G ,which plays an important role in both learning and reasoning. G is an integral part of the knowledge represented by Real Logic. A Real Logic knowledge base is therefore defined by the formulas of the logical language and knowledge about the domain in the form of groundings obtained from data. The following types of knowledge can be represented in Real Logic.

3.1.1. Knowledge through symbol groundings

Boundaries for domain grounding. These are constraints specifying that the value of a certain logical expression must be within a certain range. For instance, one may specify that the domain D must be interpreted in the [0, 1] hyper-cube or in the standard n-simplex, i.e. the set of ⟨d_1, …, d_n⟩ ∈ (R^+)^n such that Σ_i d_i = 1. Other intuitive examples of range constraints include the elements of the domain "colour" grounded onto points in [0, 1]^3 such that every element is associated with the triplet of values (R, G, B) with R, G, B ∈ [0, 1], or the range of a function age(x) as an integer between 0 and 100.
Explicit definition of grounding for symbols. Knowledge can be more strictly incorporated by fixing the grounding of some symbols. If a constant c denotes an object with known features v_c ∈ R^n, we can fix its grounding G(c) = v_c. Training data that consists of a set of n data items, such as n images (or tuples known as training examples), can be specified in Real Logic by n constants, e.g. img_1, img_2, …, img_n, and by their groundings, e.g. G(img_1), G(img_2), …, G(img_n) fixed to the corresponding arrays of pixel values. These can be gathered in a variable imgs. A binary predicate sim that measures the similarity of two objects can be grounded as, e.g., a cosine similarity function of two vectors v and w, (v, w) ↦ (v · w)/(∥v∥∥w∥). The output layer of the neural network associated with a multi-class single-label predicate P(x, class) can be a softmax function normalizing the output such that it guarantees exclusive classification, i.e. Σ_i P(x, i) = 1. 9 Grounding of constants and functions allows the computation of the grounding of their results. If, for example, G(transp) is the function that transposes a matrix, then G(transp(img_1)) is the corresponding transposed image.
Parametric definition of grounding for symbols. Here, the exact grounding of a symbol σ is not known, but it is known that it can be obtained by finding a set of real-valued parameters, that is, via learning. To emphasize this fact, we adopt the notation G(σ) = G(σ | θ_σ), where θ_σ is the set of parameter values that determines the value of G(σ). The typical example of parametric grounding for constants is the learning of an embedding. Let emb(word | θ_emb) be a word embedding with parameters θ_emb which takes as input a word and returns its embedding in R^n. If the words of a vocabulary W = {w_1, …, w_|W|} are constant symbols, their groundings G(w_i | θ_emb) are defined parametrically w.r.t. θ_emb as emb(w_i | θ_emb). An example of parametric grounding for a function symbol f is to assume that G(f) is a linear function such that G(f) : R^m → R^n maps each v ∈ R^m into A_f v + b_f, with A_f a matrix of real numbers and b_f a vector of real numbers. In this case, G(f) = G(f | θ_f), where θ_f = {A_f, b_f}. Finally, the grounding of a predicate symbol can be given, for example, by a neural network N with parameters θ_N. As an example, consider a neural network N trained for image classification into n classes: cat, dog, horse, etc. N takes as input a vector v of pixel values and produces as output a vector y = (y_cat, y_dog, y_horse, …) in [0, 1]^n such that y = N(v | θ_N), where y_c is the probability that input image v is of class c. In case classes are, alternatively, chosen to be represented by unary predicate symbols such as cat(v), dog(v), horse(v), …, then G(cat(v)) = N(v | θ_N)_cat, G(dog(v)) = N(v | θ_N)_dog, G(horse(v)) = N(v | θ_N)_horse, etc.
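A TensorFlow 2 sketch of the three kinds of parametric grounding just described: a learnable embedding for a constant, a linear map for a function symbol, and a small neural network ending in a sigmoid for a binary predicate. The names, layer sizes and dimensions below are illustrative assumptions, not the paper's models.

import tensorflow as tf

# Constant with a learnable grounding (an embedding in R^5).
theta_c = tf.Variable(tf.random.normal([5]))            # G(c | theta_c)

# Function symbol grounded as a linear map R^5 -> R^3: v -> A_f v + b_f.
A_f = tf.Variable(tf.random.normal([3, 5]))
b_f = tf.Variable(tf.zeros([3]))
def G_f(v):
    return tf.linalg.matvec(A_f, v) + b_f               # G(f | theta_f)

# Predicate symbol grounded as a neural network with output in [0, 1].
G_p = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="elu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

truth = G_p(tf.reshape(G_f(theta_c), [1, 3]))           # G(p(f(c))), a value in [0, 1]

For a multi-class single-label predicate, the final sigmoid layer would be replaced by a softmax layer over the classes, as discussed above.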

9 Notice that softmax is often used as the last layer in neural networks to turn logits into a probability distribution. However, we do not use the softmax function as such here. Instead, we use it here to enforce an exclusivity constraint on satisfiability scores.

3.1.2. Knowledge through formulas

Factual propositions. Knowledge about the properties of specific objects in the domain is represented, as usual, by logical propositions, as exemplified below. Suppose that it is known that img_1 is a number nine, img_2 is a number eight, and img_n is a number two. This can be represented by adding the following facts to the knowledge-base: nine(img_1), eight(img_2), …, two(img_n). Supervised learning, that is, learning with the use of training examples which include target values (labelled data), is specified in Real Logic by combining grounding definitions and factual propositions. For example, the fact that an image Z is a positive example for the class nine and a negative example for the class eight is specified by defining G(img_1) = Z alongside the propositions nine(img_1) and ¬eight(img_1). Notice how semi-supervision can be specified naturally in Real Logic by adding propositions containing disjunctions, e.g. eight(img_1) ∨ nine(img_1), which states that img_1 is either an eight or a nine (or both). Finally, relational learning can be achieved by relating logically multiple objects (defined as constants or variables or even as more complex sequences of terms), such as e.g.: nine(img_1) → ¬nine(img_2) (if img_1 is a nine then img_2 is not a nine) or nine(img) → ¬eight(img) (if an image is a nine then it is not an eight). The use of more complex knowledge, including the use of variables such as img above, is the topic of generalized propositions, discussed next.
Generalized propositions. General knowledge about all or some of the objects of some domains can be specified in Real Logic by using first-order logic formulas with quantified variables. This general type of knowledge allows one to specify arbitrary constraints on the groundings independently from the specific data available. It allows one to specify, in a concise way, knowledge that holds true for all the objects of a domain. This is especially useful in Machine Learning in the semi-supervised and unsupervised settings, where there is no specific knowledge about a single individual. For example, as part of a task of multi-label classification with constraints on the labels [12], a positive label constraint may express that if an example is labelled with l_1, …, l_k then it should also be labelled with l_{k+1}. This can be specified in Real Logic with a universally quantified formula: ∀x(l_1(x) ∧ … ∧ l_k(x) → l_{k+1}(x)). 10 Another example of soft constraints used in Statistical Relational Learning associates the labels of related examples. For instance, in Markov Logic Networks [55], as part of the well-known Smokers and Friends example, people who are smokers are associated by the friendship relation. In Real Logic, the formula ∀x∀y((smokes(x) ∧ friend(x, y)) → smokes(y)) would be used to encode the soft constraint that friends of smokers are normally smokers.
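As a sketch, the truth-value of the soft constraint ∀x∀y((smokes(x) ∧ friend(x, y)) → smokes(y)) can be computed from the groundings of the two predicates using the operators of Section 2.4; the truth-values below are made up for illustration.

import numpy as np

smokes = np.array([0.9, 0.1, 0.8])          # G(smokes(x)) for three people
friend = np.array([[0.0, 0.9, 0.2],         # G(friend(x, y)), axes (x, y)
                   [0.9, 0.0, 0.7],
                   [0.2, 0.7, 0.0]])

T_P   = lambda a, b: a * b                  # conjunction
I_R   = lambda a, b: 1.0 - a + a * b        # implication
A_pME = lambda a, p=2: 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)   # forall

body = T_P(smokes[:, None], friend)         # smokes(x) AND friend(x, y), shape (3, 3)
head = smokes[None, :]                      # smokes(y), broadcast along x
print(A_pME(I_R(body, head)))               # degree to which the constraint holds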

10 This can also be specified using a guarded quantifier ∀x : ((l_1(x) ∧ … ∧ l_k(x)) > th) (l_{k+1}(x)), where th is a threshold value in [0, 1].

3.1.3. Knowledge through fuzzy semantics

Definition for operators. The grounding of a formula ϕ depends on the operators approximating the connectives and quantifiers that appear in ϕ. Different operators give different interpretations of the satisfaction associated with the formula. For instance, the operator A_pME(a_1, …, a_n) that approximates universal quantification can be understood as a smooth minimum. It depends on a hyper-parameter p (the exponent used in the generalized mean). If p = 1 then A_pME(a_1, …, a_n) corresponds to the arithmetic mean. As p increases, given the same input, the value of the universally quantified formula will decrease as A_pME converges to the min operator. To define how strictly the universal quantification should be interpreted in each proposition, one can use different values of p for different propositions of the knowledge base. For instance, a formula ∀x P(x) where A_pME is used with a low value for p will in fact denote that P holds for some x, whereas a formula ∀x Q(x) with a higher p may denote that Q holds for most x.
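A tiny numerical illustration of the role of p (the truth-values are assumed): with one outlier among mostly-true instances, A_pME moves from the arithmetic mean towards the minimum as p grows.

import numpy as np

a = np.array([1.0, 1.0, 1.0, 1.0, 0.1])      # P holds for most instances, one outlier

def A_pME(a, p):
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)

for p in (1, 2, 6, 20):
    print(p, round(A_pME(a, p), 3))          # decreases towards min(a) = 0.1 as p grows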

3.1.4. Satisfiability

In summary, a Real Logic knowledge-base has three components: the first describes knowledge about the grounding of symbols (domains, constants, variables, functions, and predicate symbols); the second is a set of closed logical formulas describing factual propositions and general knowledge; the third lies in the operators and the hyperparameters used to evaluate each formula. The definition that follows formalizes this notion.
Definition 4 (Theory/Knowledge-base). A theory of Real Logic is a triple T = ⟨K, G(· | θ), Θ⟩, where K is a set of closed first-order logic formulas defined on the set of symbols S = D ∪ X ∪ C ∪ F ∪ P denoting, respectively, domains, variables, constants, function and predicate symbols; G(· | θ) is a parametric grounding for all the symbols s ∈ S and all the logical operators; and Θ = {Θ_s}_{s∈S} is the hypothesis space for each set of parameters θ_s associated with symbol s.
Learning and reasoning in a Real Logic theory are both associated with searching for and applying the set of values of parameters θ from the hypothesis space Θ that maximize the satisfaction of the formulas in K. We use the term grounded theory, denoted by ⟨K, G_θ⟩, to refer to a Real Logic theory with a specific set of learned parameter values. This idea shares some similarity with the weighted MAX-SAT problem [43], where the weights for formulas in K are given by their fuzzy truth-values obtained by choosing the parameter values of the grounding. To define this optimization problem, we aggregate the truth-values of all the formulas in K by selecting a formula aggregating operator SatAgg : [0, 1]* → [0, 1].
Definition 5. The satisfiability of a theory T = ⟨K, Gθ⟩ with respect to the aggregating operator SatAgg is defined as SatAggϕ∈K Gθ(ϕ).

3.2. Learning

Given a Real Logic theory T=(K,G(θ),Θ) ,learning is the process of searching for the set of parameter values θ that maximize the satisfiability of T w.r.t. a given aggregator:
θ=argmaxθΘSatAggϕKGθ(ϕ)
Notice that with this general formulation, one can learn the grounding of constants, functions, and predicates. The learning of the grounding of constants corresponds to the learning of embeddings. The learning of the grounding of functions corresponds to the learning of generative models or a regression task. Finally, the learning of the grounding of predicates corresponds to a classification task in Machine Learning.
In some cases, it is useful to impose some regularization (as done customarily in ML) on the set of parameters θ ,thus encoding a preference on the hypothesis space Θ ,such as a preference for smaller parameter values. In this case, learning is defined as follows:
θ=argmaxθΘ(SatAggθGθ(ϕ)λR(θ))
where λR+ is the regularization parameter and R is a regularization function,e.g. L1 or L2 regularization,that is, L1(θ)=θθ|θ| and L2(θ)=θθθ2 .
其中 λR+ 是正则化参数, R 是正则化函数,例如 L1L2 正则化,即 L1(θ)=θθ|θ|L2(θ)=θθθ2
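As a minimal sketch of this optimization (plain TensorFlow 2; the single-axiom knowledge-base and all names here are hypothetical, not the LTN library API), the loss below maximizes satisfiability while penalizing large parameter values with an L2 term:

```python
import tensorflow as tf

mlp = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu", input_shape=(2,)),
                           tf.keras.layers.Dense(1, activation="sigmoid")])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
lam = 1e-4  # regularization weight λ

def forall_pme(truths, p=2.0):
    # A_pME approximation of the universal quantifier.
    return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

def train_step(x_pos):
    with tf.GradientTape() as tape:
        sat = forall_pme(mlp(x_pos))   # satisfiability of the toy axiom ∀x P(x)
        l2 = tf.add_n([tf.reduce_sum(w ** 2) for w in mlp.trainable_variables])
        loss = 1.0 - sat + lam * l2    # maximize sat while preferring small weights
    grads = tape.gradient(loss, mlp.trainable_variables)
    optimizer.apply_gradients(zip(grads, mlp.trainable_variables))
    return sat
```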
LTN can generalize and extrapolate when querying formulas grounded with unseen data (for example, new individuals from a domain), using knowledge learned with previous groundings (for example, re-using a trained predicate). This is explained in Section 3.3.

3.3. Querying

Given a grounded theory T=(K,Gθ) ,query answering allows one to check if a certain fact is true (or, more precisely, by how much it is true since in Real Logic truth-values are real numbers in the interval [0,1]) . There are various types of queries that can be asked of a grounded theory.
A first type of query is called truth queries. Any formula in the language of T can be a truth query. The answer to a truth query ϕq is the truth value of ϕq obtained by computing its grounding, i.e. Gθ(ϕq). Notice that, if ϕq is a closed formula, the answer is a scalar in [0,1] denoting the truth-value of ϕq according to Gθ. If ϕq contains n free variables x1,…,xn, the answer to the query is a tensor of order n such that the component indexed by i1…in is the truth-value of ϕq evaluated in Gθ(x1)i1,…,Gθ(xn)in.
The second type of query is called value queries. Any term in the language of T can be a value query. The answer to a value query tq is a tensor of real numbers obtained by computing the grounding of the term,i.e. Gθ(tq) . Analogously to truth queries,the answer to a value query is a "tensor of tensors" if tq contains variables. Using value queries,one can inspect how a constant or a term, more generally, is embedded in the manifold.
The third type of query is called generalization truth queries. With generalization truth queries, we are interested in knowing the truth-values of formulas when these are applied to a new (unseen) set of objects of a domain, such as a validation or a test set of examples typically used in the evaluation of machine learning systems. A generalization truth query is a pair (ϕq(x), U), where ϕq is a formula with a free variable x and U = (u(1),…,u(k)) is a set of unseen examples whose dimensions are compatible with those of the domain of x. The answer to the query (ϕq(x), U) is Gθ(ϕq(x)) for x taking each value u(i), 1 ≤ i ≤ k, in U. The result of this query is therefore a vector of |U| truth-values corresponding to the evaluation of ϕq on new data u(1),…,u(k).
The fourth and final type of query is generalization value queries. These are analogous to generalization truth queries with the difference that they evaluate a term tq(x) ,and not a formula,on new data U . The result,therefore,is a vector of |U| values corresponding to the evaluation of the trained model on a regression task using test data U .
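As an informal illustration of these query types (plain TensorFlow 2 with hypothetical names; not the LTN library API), the sketch below evaluates a grounded predicate as an open truth query on its training grounding, as a closed-formula truth query, and as a generalization truth query on unseen data:

```python
import tensorflow as tf

# A grounding for a unary predicate P: R^2 -> [0,1] (assumed already trained).
P = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu", input_shape=(2,)),
                         tf.keras.layers.Dense(1, activation="sigmoid")])

def forall_pme(truths, p=2.0):
    return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

x_train = tf.random.uniform((50, 2))   # grounding used during learning
u_test  = tf.random.uniform((20, 2))   # unseen examples U

truth_query = P(x_train)                 # P(x): one truth-value per training example
closed_query = forall_pme(truth_query)   # ∀x P(x): a single scalar in [0,1]
generalization_query = P(u_test)         # (P(x), U): one truth-value per unseen example
```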

3.4. Reasoning

3.4.1. Logical consequence in Real Logic

From a pure logic perspective, reasoning is the task of verifying if a formula is a logical consequence of a set of formulas. This can be achieved semantically using model theory (⊨) or syntactically via a proof theory (⊢). To characterize reasoning in Real Logic, we adapt the notion of logical consequence for fuzzy logic provided in [9]: A formula ϕ is a fuzzy logical consequence of a finite set of formulas Γ, in symbols Γ ⊨ ϕ, if for every fuzzy interpretation f, if all the formulas in Γ are true (i.e. evaluate to 1) in f then ϕ is true in f. In other words, every model of Γ is a model of ϕ. A direct application of this definition to Real Logic is not practical since in most practical cases the level of satisfiability of a grounded theory ⟨K, Gθ⟩ will not be equal to 1. We therefore define an interval [q,1] with 1/2 < q < 1 and assume that a formula is true if its truth-value is in the interval [q,1]. This leads to the following definition:
Definition 6. A closed formula ϕ is a logical consequence of a knowledge-base (K, G(θ), Θ), in symbols (K, G(θ), Θ) ⊨q ϕ, if, for every grounded theory ⟨K, Gθ⟩, if SatAgg(K, Gθ) ≥ q then Gθ(ϕ) ≥ q.

3.4.2. Reasoning by optimization

Logical consequence by direct application of Definition 6 requires querying the truth value of ϕ for a potentially infinite set of groundings. Therefore,we consider in practice the following directions:
Reasoning Option 1 (Querying after learning). This is approximate logical inference by considering only the grounded theories that maximally satisfy (K, G(θ), Θ). We therefore define that ϕ is a brave logical consequence of a Real Logic knowledge-base (K, G(θ), Θ) if Gθ*(ϕ) ≥ q for all the θ* such that:
θ* = argmaxθ SatAgg(K, Gθ)  and  SatAgg(K, Gθ*) ≥ q
The objective is to find all θ* that optimally satisfy the knowledge base and to measure if they also satisfy ϕ. One can search for such θ* by running multiple optimizations with the objective function of Section 3.2.
This approach is somewhat naive. Even if we run the optimization multiple times with multiple parameter initializations (to, hopefully, reach different optima in the search space), the obtained groundings may not be representative of other optimal or close-to-optimal groundings. In Section 4.8 we give an example that shows the limitations of this approach and motivates the next one.
Reasoning Option 2 (Proof by Refutation). Here, we reason by refutation and search for a counterexample to the logical consequence by introducing an alternative search objective. Normally, according to Definition 6, one tries to verify that:11

(21) for all θ ∈ Θ, if Gθ(K) ≥ q then Gθ(ϕ) ≥ q.

Instead, we solve the dual problem:

(22) there exists θ ∈ Θ such that Gθ(K) ≥ q and Gθ(ϕ) < q.

If Eq. (22) is true then a counterexample to Eq. (21) has been found and the logical consequence does not hold. If Eq. (22) is false then no counterexample to Eq. (21) has been found and the logical consequence is assumed to hold true. A search for such parameters θ (the counterexample) can be performed by minimizing Gθ(ϕ) while imposing a constraint that seeks to invalidate results where Gθ(K) < q. We therefore define:

penalty(Gθ, q) = c if Gθ(K) < q, and 0 otherwise, where c > 1.

11 For simplicity, we temporarily define the notation G(K) := SatAggϕ∈K G(ϕ).

Given G* such that:
(23) G* = argminGθ (Gθ(ϕ) + penalty(Gθ, q))

  • If G*(K) < q: Then for all Gθ, Gθ(K) < q and therefore (K, G(θ), Θ) ⊨q ϕ.
  • If G*(K) ≥ q and G*(ϕ) ≥ q: Then for all Gθ with Gθ(K) ≥ q, we have that Gθ(ϕ) ≥ G*(ϕ) ≥ q and therefore (K, G(θ), Θ) ⊨q ϕ.
  • If G*(K) ≥ q and G*(ϕ) < q: Then (K, G(θ), Θ) ⊭q ϕ.
Clearly, Equation (23) cannot be used as an objective function for gradient-descent due to null derivatives. Therefore, we propose to approximate the penalty function with the soft constraint:
elu(α, β(q − Gθ(K))) = β(q − Gθ(K)) if Gθ(K) ≤ q, and α(e^(q − Gθ(K)) − 1) otherwise,

where α ≥ 0 and β ≥ 0 are hyper-parameters (see Figure 6). When Gθ(K) < q, the penalty is linear in q − Gθ(K) with a slope of β. Setting β high, the gradients for Gθ(K) will be high in absolute value if the knowledge-base is not satisfied. When Gθ(K) > q, the penalty is a negative exponential that converges to −α. Setting α low but non-zero seeks to ensure that the gradients do not vanish when the penalty should not apply (when the knowledge-base is satisfied). We obtain the following approximate objective function:

(24) G* = argminGθ (Gθ(ϕ) + elu(α, β(q − Gθ(K))))
Section 4.8 will illustrate the use of reasoning by refutation with an example in comparison with reasoning as querying after learning. Of course, other forms of reasoning are possible, not least that adopted in [6], but a direct comparison is outside the scope of this paper and left as future work.
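The following is a minimal TensorFlow 2 sketch (hypothetical names and values; not the LTN library API) of the refutation objective of Eq. (24): the soft elu penalty keeps the search inside the region where the knowledge-base is satisfied above q, while the truth-value of the queried formula ϕ is minimized.

```python
import tensorflow as tf

q, alpha, beta = 0.95, 0.05, 10.0

def soft_penalty(sat_K):
    # Piecewise soft constraint from the text: linear with slope beta when the
    # knowledge-base is not satisfied (Gθ(K) <= q), a negative exponential
    # saturating at -alpha otherwise.
    return tf.where(sat_K <= q,
                    beta * (q - sat_K),
                    alpha * (tf.exp(q - sat_K) - 1.0))

def refutation_loss(sat_K, sat_phi):
    # Eq. (24): minimize Gθ(ϕ) subject to the soft constraint Gθ(K) >= q.
    return sat_phi + soft_penalty(sat_K)

print(float(refutation_loss(tf.constant(0.97), tf.constant(0.4))))  # KB satisfied, ϕ low: small loss
print(float(refutation_loss(tf.constant(0.60), tf.constant(0.4))))  # KB violated: penalty dominates
```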

4. The Reach of Logic Tensor Networks

The objective of this section is to show how the language of Real Logic can be used to specify a number of tasks that involve learning from data and reasoning. Examples of such tasks are classification, regression, clustering, and link prediction. The solution of a problem specified in Real Logic is obtained by interpreting such a specification in Logic Tensor Networks. The LTN library implements Real Logic in TensorFlow 2 [1] and is available from GitHub.13 Every logical operator is grounded using TensorFlow primitives such that LTN implements directly a TensorFlow graph. Due to TensorFlow's built-in optimization, LTN is relatively efficient while providing the expressive power of first-order logic. Details on the implementation of the examples described in this section are reported in Appendix A. The implementation of the examples presented here is also available from the LTN repository on GitHub. Except when stated otherwise, the results reported are the average result over 10 runs using a 95% confidence interval. Every example uses a stable real product configuration to approximate the Real Logic operators and the Adam optimizer [35] with a learning rate of 0.001. Table A.3 in the Appendix gives an overview of the network architectures used to obtain the results reported in this section.

12 In the objective function, G should satisfy G(K) ≥ q before reducing G(ϕ) because the penalty c, which is greater than 1, is higher than any potential reduction in G(ϕ), which is smaller than or equal to 1.
13 https://github.com/logictensornetworks/logictensornetworks

Figure 6: elu(α,βx) where α0 and β0 are hyper-parameters. The function elu(α,β(qGθ(K))) with α low and β high is a soft constraint for penalty (Gθ,q) suitable for learning.

4.1. Binary Classification

The simplest machine learning task is binary classification. Suppose that one wants to learn a binary classifier A for a set of points in [0,1]2 . Suppose that a set of positive and negative training examples is given. LTN uses the following language and grounding:

Domains:

points (denoting the examples).

Variables:

x+ for the positive examples.
x− for the negative examples.
x for all examples.
D(x) = D(x+) = D(x−) = points.

Predicates:

A(x) for the trainable classifier.
Din(A) = points.

Axioms:
(25)x+A(x+)
(26)x¬A(x)

Grounding:

G(points) = [0,1]2.
G(x) ∈ [0,1]m×2 (G(x) is a sequence of m points, that is, m examples).
G(x+) = ⟨d ∈ G(x) ∣ ‖d − (0.5,0.5)‖ < 0.09⟩.14
G(x−) = ⟨d ∈ G(x) ∣ ‖d − (0.5,0.5)‖ ≥ 0.09⟩.15
G(Aθ) : x ↦ sigmoid(MLPθ(x)), where MLP is a Multilayer Perceptron with a single output neuron, whose parameters θ are to be learned.16

Learning:

Let us define D as the data set of all examples. The objective function with K = {∀x+ A(x+), ∀x− ¬A(x−)} is given by argmaxθ∈Θ SatAggϕ∈K Gθ,x←D(ϕ).17 In practice, the optimizer uses the following loss function:

L = 1 − SatAggϕ∈K Gθ,x←B(ϕ)

where B is a mini-batch sampled from D.18 The objective and loss functions depend on the following hyper-parameters:
  • the choice of fuzzy logic operator semantics used to approximate each connective and quantifier,
  • the choice of hyper-parameters underlying the operators, such as the value of the exponent p in any generalized mean,
  • the choice of formula aggregator function.
Using the stable product configuration to approximate connectives and quantifiers, with p=2 for every occurrence of ApME, and using ApME with p=2 also as the formula aggregator, yields the following satisfaction equation:

SatAggϕ∈K Gθ(ϕ) = 1 − ( (1/2) [ (1 − (1 − ( (1/|G(x+)|) ∑v∈G(x+) (1 − sigmoid(MLPθ(v)))^2 )^(1/2) ))^2 + (1 − (1 − ( (1/|G(x−)|) ∑v∈G(x−) sigmoid(MLPθ(v))^2 )^(1/2) ))^2 ] )^(1/2)

14G(x+) are,by definition in this example,the training examples with Euclidean distance to the center (0.5,0.5) smaller than the threshold of 0.09 .
15G(x) are,by definition,the training examples with Euclidean distance to the centre (0.5,0.5) larger or equal to the threshold of 0.09 .
16 sigmoid(x) = 1/(1 + e^(−x))
17 The notation Gx←D(ϕ(x)) means that the variable x is grounded with the data D (that is, G(x) := D) when grounding ϕ(x).
18 As usual in ML,while it is possible to compute the loss function and gradients over the entire data set,it is preferred to use mini-batches of the examples.

Figure 7: Symbolic Tensor Computational Graph for the Binary Classification Example. In the figure, Gx+ and Gx are inputs to the network Gθ(A) and the dotted lines indicate the propagation of activation from each input through the network, which produces two outputs.
The computational graph of Figure 7 shows SatAggϕ∈K Gθ(ϕ) as used with the above loss function.
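For concreteness, the following is a compact TensorFlow 2 sketch (hypothetical helper names; not the LTN library API) of the satisfaction of the two axioms and of the loss 1 − SatAgg under the operator choices above:

```python
import tensorflow as tf

A = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu", input_shape=(2,)),
                         tf.keras.layers.Dense(16, activation="elu"),
                         tf.keras.layers.Dense(1, activation="sigmoid")])

def forall_pme(truths, p=2.0):
    # A_pME: smooth-minimum approximation of the universal quantifier.
    return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

def sat_agg(formulas, p=2.0):
    # Formula aggregator, also A_pME with p=2.
    return 1.0 - tf.reduce_mean((1.0 - tf.stack(formulas)) ** p) ** (1.0 / p)

def loss(x_pos, x_neg):
    sat_pos = forall_pme(A(x_pos))          # ∀x+ A(x+)
    sat_neg = forall_pme(1.0 - A(x_neg))    # ∀x− ¬A(x−), with ¬a = 1 − a
    return 1.0 - sat_agg([sat_pos, sat_neg])
```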
We are therefore interested in learning the parameters θ of the MLP used to model the binary classifier. We sample 100 data points uniformly from [0,1]2 to populate the data set of positive and negative examples. The data set was split into 50 data points for training and 50 points for testing. The training was carried out for a fixed number of 1000 epochs using backpropagation with the Adam optimizer [35] with a batch size of 64 examples. Figure 8 shows the classification accuracy and satisfaction level of the LTN on both training and test sets averaged over 10 runs using a 95% confidence interval. The accuracy shown is the ratio of examples correctly classified, with an example deemed as being positive if the classifier outputs a value higher than 0.5 .
Notice that a model can reach an accuracy of 100% while satisfaction of the knowledge base is not yet maximized. For example, if the threshold for an example to be deemed as positive is 0.7, all examples may be classified correctly with a confidence score of 0.7. In that case, while the accuracy is already maximized, the satisfaction of ∀x+ A(x+) would still be 0.7, and can still improve until the confidence for every sample reaches 1.0.
This first example, although straightforward, illustrates step-by-step the process of using LTN in a simple setting. Notice that, according to the nomenclature of Section 3.3, measuring accuracy amounts to querying the truth query (respectively, the generalization truth query) A(x) for all the examples of the training set (respectively, test set) and comparing the results with the classification threshold. In Figure 9, we show the results of such queries A(x) after optimization. Next, we show how the LTN language can be used to solve progressively more complex problems by combining learning and reasoning.

4.2. Multi-Class Single-Label Classification

The natural extension of binary classification is a multi-class classification task. We first approach multi-class single-label classification, which assumes that each example is assigned to one and only one label.
For illustration purposes, we use the Iris flower data set [20], which consists of classification into three mutually exclusive classes; call these A,B ,and C . While one could train three unary predicates A(x),B(x) and C(x) ,it turns out to be more effective if this problem is modeled by a single binary predicate P(x,l) ,where l is a variable denoting a multi-class label,in this case,
Figure 8: Binary Classification task (training and test set performance): Average accuracy (left) and satisfiability (right). Due to the random initializations, accuracy and satisfiability start on average at 0.5 with performance increasing rapidly after a few epochs.
classes A, B or C. This syntax allows one to write statements quantifying over the classes, e.g. ∀x(∃l(P(x,l))). Since the classes are mutually exclusive, the output layer of the MLP representing P(x,l) will be a softmax layer, instead of a sigmoid function, to ensure the exclusivity constraint on satisfiability scores.19 The problem can be specified as follows:

Domains:

items, denoting the examples from the Iris flower data set.
labels, denoting the class labels.

Variables:

xA, xB, xC for the positive examples of classes A, B, C.
x for all examples.
D(xA) = D(xB) = D(xC) = D(x) = items.

Constants:

lA, lB, lC, the labels of classes A (Iris setosa), B (Iris virginica), C (Iris versicolor), respectively.
D(lA) = D(lB) = D(lC) = labels.

Predicates:

P(x,l) denoting the fact that item x is classified as l.
Din(P) = items, labels.

Axioms:
(27)xAP(xA,lA)
(28)xBP(xB,lB)
(29)xCP(xC,lC)
Notice that rules about exclusiveness, such as ∀x(P(x,lA) → (¬P(x,lB) ∧ ¬P(x,lC))), are not included since such constraints are already imposed by the grounding of P below, more specifically by the softmax function.

19 softmax(x)_i = e^(x_i) / ∑_j e^(x_j)

Figure 9: Binary Classification task (querying the trained predicate A(x) ): It is interesting to see how A(x) could be appropriately named as denoting the inside of the central region shown in the figure,and therefore ¬A(x) represents the outside of the region.

Grounding:

G(items) = R4; items are described by 4 features: the length and the width of the sepals and petals, in centimeters.
G(labels) = N3; we use a one-hot encoding to represent classes.
G(xA) ∈ Rm1×4, that is, G(xA) is a sequence of m1 examples of class A.
G(xB) ∈ Rm2×4, G(xB) is a sequence of m2 examples of class B.
G(xC) ∈ Rm3×4, G(xC) is a sequence of m3 examples of class C.
G(x) ∈ R(m1+m2+m3)×4, G(x) is a sequence of all the examples.
G(lA) = [1,0,0], G(lB) = [0,1,0], G(lC) = [0,0,1].
G(Pθ) : x, l ↦ l · softmax(MLPθ(x)), where the MLP has three output neurons corresponding to as many classes, and · denotes the dot product as a way of selecting an output for G(Pθ); multiplying the MLP's output by the one-hot vector l gives the truth degree corresponding to the class denoted by l.
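To illustrate this grounding of P(x,l), here is a small TensorFlow 2 sketch (hypothetical names; not the LTN library API): the one-hot label selects the softmax output that serves as the truth degree.

```python
import tensorflow as tf

# Shared MLP with a softmax output over the three classes.
mlp = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="elu", input_shape=(4,)),
                           tf.keras.layers.Dense(3, activation="softmax")])

def P(x, l):
    # x: (batch, 4) feature vectors; l: (3,) one-hot label, e.g. lA = [1, 0, 0].
    return tf.reduce_sum(mlp(x) * l, axis=-1)   # dot product with the one-hot vector

x_batch = tf.random.uniform((5, 4))
l_A = tf.constant([1.0, 0.0, 0.0])
print(P(x_batch, l_A))   # truth degrees of P(x, lA) for the five items
```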

Learning:

The logical operators and connectives are approximated using the stable product configuration with p=2 for ApME . For the formula aggregator, ApME is used also with p=2 .
The computational graph of Figure 10 illustrates how SatAggϕ∈K Gθ(ϕ) is obtained. If B denotes batches sampled from the data set of all examples, the loss function (to minimize) is:
L=1SatAggϕKGθ,xB(ϕ).
Figure 11 shows the result of training with the Adam optimizer with batches of 64 examples. Accuracy measures the ratio of examples correctly classified, with example x labeled as argmaxl(P(x,l)).20 Classification accuracy reaches an average value near 1.0 for both the training and test data after some 100 epochs. Satisfaction levels of the Iris flower predictions continue to increase for the rest of the training (500 epochs) to more than 0.8.
It is worth contrasting the choice of using a binary predicate (P(x,l)) in this example with the option of using multiple unary predicates (lA(x),lB(x),lC(x)) ,one for each class. Notice how each predicate is normally associated with an output neuron. In the case of the unary predicates, the networks would be disjoint (or modular), whereas weight-sharing takes place with the use of the binary predicate. Since l is instantiated into lA,lB,lC ,in practice P(x,l) becomes P(x,lA),P(x,lB),P(x,lC) ,which is implemented via three output neurons to which a softmax function applies.

4.3. Multi-Class Multi-Label Classification

We now turn to multi-label classification, whereby multiple labels can be assigned to each example. As a first example of the reach of LTNs, we shall see how the previous example can be extended naturally using LTN to account for multiple labels, not always a trivial extension for most ML algorithms. The standard approach to the multi-label problem is to provide explicit negative examples for each class. By contrast, LTN can use background knowledge to relate classes directly to each other, thus becoming a powerful tool in the case of the multi-label problem when typically the labeled data is scarce. We explore the Leptograpsus crabs data set [10] consisting of 200 examples of 5 morphological measurements of 50 crabs. The task is to classify the crabs according to their color and sex. There are four labels: blue, orange, male, and female. The color labels are mutually exclusive, and so are the labels for sex. LTN will be used to specify such information logically.

20 This is also known as top-1 accuracy, as proposed in [39]. Cross-entropy results (−t log(y)) could have been reported here as is common with the use of softmax, although it is worth noting that, of course, the loss function used by LTN is different.

Figure 10: Symbolic Tensor Computational Graph for the Multi-Class Single-Label Problem. As before, the dotted lines in the figure indicate the propagation of activation from each input through the network, in this case producing three outputs.
Figure 11: Multi-Class Single-Label Classification: Classification accuracy (left) and satisfaction level (right).

Domains:

items, denoting the examples from the crabs data set.
labels, denoting the class labels.

Variables:

xblue, xorange, xmale, xfemale for the positive examples of each class.
x, used to denote all the examples.
D(xblue) = D(xorange) = D(xmale) = D(xfemale) = D(x) = items.

Constants:

lblue, lorange, lmale, lfemale (the labels for each class).
D(lblue) = D(lorange) = D(lmale) = D(lfemale) = labels.

Predicates:

P(x,l), denoting the fact that item x is labelled as l.
Din(P) = items, labels.

Axioms:

(30)xblue P(xblue ,lblue )
(31)xorange P(xorange ,lorange )
(32)xmale P(xmale ,lmale )
(33)xfemale P(xfemale ,lfemale )
(34)x¬(P(x,lblue )P(x,lorange ))
(35)x¬(P(x,lmale )P(x,lfemale ))
Notice how logical rules 34 and 35 above represent the mutual exclusion of the labels on colour and sex, respectively. As a result, negative examples are not used explicitly in this specification.

Grounding:

G(items) = R5; the examples from the data set are described using 5 features.
G(labels) = N4; one-hot vectors are used to represent class labels.21
G(xblue) ∈ Rm1×5, G(xorange) ∈ Rm2×5, G(xmale) ∈ Rm3×5, G(xfemale) ∈ Rm4×5. These sequences are not mutually exclusive; one example can, for instance, be in both xblue and xmale.
G(lblue) = [1,0,0,0], G(lorange) = [0,1,0,0], G(lmale) = [0,0,1,0], G(lfemale) = [0,0,0,1].
G(Pθ) : x, l ↦ l · sigmoid(MLPθ(x)), with the MLP having four output neurons corresponding to as many classes. As before, · denotes the dot product which selects a single output. By contrast with the previous example, notice the use of a sigmoid function instead of a softmax function.

21 There are two possible approaches here: either each item is labeled with one multi-hot encoding or each item is labeled with several one-hot encodings. The latter approach was used in this example.

Learning:

As before, the fuzzy logic operators and connectives are approximated using the stable product configuration with p=2 for ApME ,and for the formula aggregator, ApME is also used with p=2 .
Figure 12 shows the result of the Adam optimizer using backpropagation trained with batches of 64 examples. This time, the accuracy is defined as 1 − HL, where HL is the average Hamming loss, i.e. the fraction of labels predicted incorrectly, with a classification threshold of 0.5 (given an example u, if the model outputs a value greater than 0.5 for class C then u is deemed as belonging to class C). The rightmost graph in Figure 12 illustrates how LTN learns the constraint that a crab cannot have both blue and orange color, which is discussed in more detail in what follows.
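As a small aside on the metric (illustrative NumPy only, with hypothetical values), 1 − HL with a 0.5 threshold can be computed as follows:

```python
import numpy as np

# scores: model outputs in [0,1] for the labels (blue, orange, male, female);
# targets: ground-truth multi-hot labels. Both are hypothetical values.
scores  = np.array([[0.9, 0.1, 0.8, 0.2],
                    [0.3, 0.7, 0.4, 0.6]])
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0]])

predictions = (scores > 0.5).astype(int)          # classification threshold of 0.5
hamming_loss = np.mean(predictions != targets)    # fraction of labels predicted incorrectly
accuracy = 1.0 - hamming_loss
print(accuracy)   # 0.75 for these values
```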

Querying:

To illustrate the learning of constraints by LTN, we have queried three formulas that were not explicitly part of the knowledge-base, over time during learning:
(36)ϕ1:x(P(x,lblue )¬P(x,lorange ))
(37)ϕ2:x(P(x,lblue )P(x,lorange ))
(38)ϕ3:x(P(x,lblue )P(x,lmale ))
For querying, we use p=5 when approximating the universal quantifiers with ApME. A higher p denotes a stricter universal quantification with a stronger focus on outliers (see Section 2.4).22 We should expect ϕ1 to hold true (every blue crab cannot be orange and vice-versa23), and we should expect ϕ2 (every blue crab is also orange) and ϕ3 (every blue crab is male) to be false. The results are reported in the rightmost plot of Figure 12. Prior to training, the truth-values of ϕ1 to ϕ3 are non-informative. During training one can see, with the maximization of the satisfaction of the knowledge-base, a trend towards the satisfaction of ϕ1, and an opposite trend of ϕ2 and ϕ3 towards false.
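To make such a query concrete, here is a small TensorFlow sketch (hypothetical truth degrees; the implication shown is the Reichenbach operator, which we assume here as part of a product-style configuration) of evaluating ϕ1 with the stricter quantifier p=5:

```python
import tensorflow as tf

def not_(a):                # standard negation
    return 1.0 - a
def implies(a, b):          # Reichenbach implication: 1 - a + a*b (assumed operator)
    return 1.0 - a + a * b
def forall_pme(truths, p):  # A_pME universal quantifier
    return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

# Hypothetical truth degrees P(x, lblue) and P(x, lorange) for a batch of examples.
p_blue   = tf.constant([0.9, 0.1, 0.8, 0.05])
p_orange = tf.constant([0.05, 0.95, 0.1, 0.9])

phi1 = forall_pme(implies(p_blue, not_(p_orange)), p=5)   # ∀x (P(x,lblue) → ¬P(x,lorange))
print(float(phi1))
```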

4.4. Semi-Supervised Pattern recognition

Let us now explore two, more elaborate, classification tasks, which showcase the benefit of using logical reasoning alongside machine learning. With these two examples, we also aim to provide a more direct comparison with a related neurosymbolic system DeepProbLog [41]. The benchmark examples below were introduced in the DeepProbLog paper [41].

22 Training should usually not focus on outliers, as optimizers would struggle to generalize and tend to get stuck in local minima. However, when querying ϕ1, ϕ2, ϕ3, we wish to be more careful about the interpretation of our statement. See also Section 3.1.3.
23 ∀x(P(x,lblue) → ¬P(x,lorange))

Figure 12: Multi-Class Multi-Label Classification: Classification Accuracy (left), Satisfiability level (middle), and Querying of Constraints (right).
Single Digits Addition: Consider the predicate addition(X, Y, N), where X and Y are images of digits (the MNIST data set will be used), and N is a natural number corresponding to the sum of these digits. This predicate should return an estimate of the validity of the addition. For instance, with images of the digits 3 and 8, addition(3, 8, 11) is a valid addition; addition(3, 8, 5) is not.
Multi Digits Addition: The experiment is extended to numbers with more than one digit. Consider the predicate addition ([X1,X2],[Y1,Y2],N).[X1,X2] and [Y1,Y2] are lists of images of digits,representing two multi-digit numbers; N is a natural number corresponding to the sum of the two multi-digit numbers. For instance,addition ([3,5],[7,2],130) is a valid addition; addition ([3,8],[9,2],26) is not.
A natural neurosymbolic approach is to seek to learn a single-digit classifier and benefit from knowledge readily available about the properties of addition in this case. For instance, suppose that a predicate digit(x,d) gives the likelihood of an image x being of digit d . A definition for addition (3,8,11) in LTN is:
∃d1, d2 : d1 + d2 = 11 (digit(3, d1) ∧ digit(8, d2))
In [41], the above task is made more complicated by not providing labels for the single-digit images during training. Instead, training takes place on pairs of images with labels made available for the result only, that is, the sum of the individual labels. The single-digit classifier is not explicitly trained by itself; its output is a piece of latent information that is used by the logic. However, this does not pose a problem for end-to-end neurosymbolic systems such as LTN or DeepProbLog for which the gradients can propagate through the logical structures.
We start by illustrating an LTN theory that can be used to learn the predicate digit. The specification of the theory below is for the single digit addition example, although it can be extended easily to the multiple digits case.

Domains:

images, denoting the MNIST digit images,
results, denoting the integers that label the results of the additions,
digits, denoting the digits from 0 to 9.

Variables:

x, y, ranging over the MNIST images in the data,
n for the labels, i.e. the result of each addition,
d1, d2 ranging over digits.
D(x) = D(y) = images,
D(n) = results,
D(d1) = D(d2) = digits.

Predicates:

digit(x,d) for the single digit classifier, where d is a term denoting a digit constant or a digit variable. The classifier should return the probability of an image x being of digit d.
Din(digit) = images, digits.

Axioms:
Single Digit Addition:
∀ Diag(x,y,n)
(39) (∃d1,d2 : d1+d2 = n
(digit(x,d1) ∧ digit(y,d2)))
Multiple Digit Addition:
∀ Diag(x1,x2,y1,y2,n)
(40) (∃d1,d2,d3,d4 : 10d1+d2+10d3+d4 = n
(digit(x1,d1) ∧ digit(x2,d2) ∧ digit(y1,d3) ∧ digit(y2,d4)))
Notice the use of Diag: when grounding x,y,n with three sequences of values,the i -th examples of each variable are matching. That is, (G(x)i,G(y)i,G(n)i) is a tuple from our dataset of valid additions. Using the diagonal quantification, LTN aggregates pairs of images and their corresponding result, rather than any combination of images and results.
Notice also the guarded quantification: by quantifying only on the latent "digit labels" (i.e. d1,d2,) that can add up to the result label ( n ,given in the dataset),we incorporate symbolic information into the system. For example,in (39),if n=3 ,the only valid tuples (d1,d2) are (0,3),(3,0),(1,2),(2,1) . Gradients will only backpropagate to these values.
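The following is a minimal sketch (plain TensorFlow 2; hypothetical names, not the LTN library API) of how the guarded existential quantification of Axiom (39) can be computed for a single image pair, with the conjunction taken as the product and the existential quantifier approximated by a generalized mean:

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def exists_pmean(truths, p=2.0):
    # A_pM generalized mean, used to approximate the existential quantifier.
    return tf.reduce_mean(truths ** p) ** (1.0 / p)

def addition_truth(x, y, n, p_exists=2.0):
    # Truth of ∃ d1,d2 : d1+d2 = n (digit(x,d1) ∧ digit(y,d2)) for one image pair.
    px, py = cnn(x[None])[0], cnn(y[None])[0]          # class probabilities, shape (10,)
    pairs = [(d1, d2) for d1 in range(10) for d2 in range(10) if d1 + d2 == n]  # the guard
    truths = tf.stack([px[d1] * py[d2] for d1, d2 in pairs])   # conjunction as product
    return exists_pmean(truths, p_exists)
```

Only the digit pairs allowed by the guard contribute to the truth-value, so gradients flow back only to those class probabilities, as described above.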

Grounding:

G(images) = [0,1]28×28×1. The MNIST data set has images of 28 by 28 pixels. The images are grayscale and have just one channel. The pixel values from 0 to 255 of the MNIST data set are converted to the range [0,1].
G(results) = N.
G(digits) = {0,1,…,9}.
G(x) ∈ [0,1]m×28×28×1, G(y) ∈ [0,1]m×28×28×1, G(n) ∈ Nm.24
G(d1) = G(d2) = ⟨0,1,…,9⟩.
G(digitθ) : x, d ↦ onehot(d) · softmax(CNNθ(x)), where CNN is a Convolutional Neural Network with 10 output neurons, one for each class. Notice that, in contrast with the previous examples, d is an integer label; onehot(d) converts it into a one-hot label.

24 Notice the use of the same number m of examples for each of these variables as they are supposed to match one-to-one due to the use of Diag.

Learning:

The computational graph of Figure 13 shows the objective function for the satisfiability of the knowledge base. A stable product configuration is used with hyper-parameter p=2 of the operator ApME for universal quantification (∀). Let p∃ denote the exponent hyper-parameter used in the generalized mean ApM for existential quantification (∃). Three scenarios are investigated and compared in the Multiple Digit experiment (Figure 15):
  1. p∃=1 throughout the entire experiment,
  2. p∃=2 throughout the entire experiment, or
  3. p∃ follows a schedule, changing from p∃=1 to p∃=6 gradually with the number of training epochs.
In the Single Digit experiment, only the last scenario above (schedule) is investigated (Figure 14).
We train to maximize satisfiability by using batches of 32 examples of image pairs, labeled by the result of their addition. As done in [41], the experimental results vary the number of examples in the training set to emphasize the generalization abilities of a neurosymbolic approach. Accuracy is measured by predicting the digit values using the predicate digit and reporting the ratio of examples for which the addition is correct. A comparison is made with the same baseline method used in [41]: given a pair of MNIST images, a non-pre-trained CNN outputs embeddings for each image (Siamese neural network). The embeddings are provided as input to dense layers that classify the addition into one of the 19 (respectively, 199) possible results of the Single Digit Addition (respectively, Multiple Digit Addition) experiments. The baseline is trained using a cross-entropy loss between the labels and the predictions. As expected, such a standard deep learning approach struggles with the task without the provision of symbolic meaning about intermediate parts of the problem.
Experimentally, we find that the optimizer for the neurosymbolic system gets stuck in a local optimum at the initialization in about 1 out of 5 runs. We, therefore, present the results on an average of the 10 best outcomes out of 15 runs of each algorithm (that is, for the baseline as well). The examples of digit pairs selected from the full MNIST data set are randomized at each run.
Figure 15 shows that the use of p∃=2 from the start produces poor results. A higher value for p in ApM weighs up the instances with a higher truth-value (see also Appendix C for a discussion). Starting already with a high value for p∃, the classes with a higher initial truth-value for a given example will have higher gradients and be prioritized for training, which does not make practical sense when randomly initializing the predicates. Increasing p∃ by following a schedule is the most promising approach. In this particular example, p∃=1 is also shown to be adequate purely from a learning perspective. However, p∃=1 implements a simple average which does not account for the meaning of ∃ well; the resulting satisfaction value is not meaningful from a reasoning perspective.
Table 1 shows that the training and test times of LTN are of the same order of magnitude as those of the CNN baselines. Table 2 shows that LTN reaches similar accuracy as that reported by DeepProbLog.
Figure 13: Symbolic Tensor Computational Graph for the Single Digit Addition task. Notice that the figure does not depict accurate dimensions for the tensors; G(x) and G(y) are in fact 4D tensors of dimensions m×28×28×1 . Computing results with the variables d1 or d2 corresponds to the addition of a further axes of dimension 10 .
Figure 14: Single Digit Addition Task: Accuracy and satisfiability results (top) and results in the presence of fewer examples (bottom) in comparison with standard Deep Learning using a CNN (blue lines).
Model    | Single Digits (Train) | Single Digits (Test) | Multi Digits (Train) | Multi Digits (Test)
baseline | 2.72±0.23 ms          | 1.45±0.21 ms         | 3.87±0.24 ms         | 2.10±0.30 ms
LTN      | 5.36±0.25 ms          | 3.44±0.39 ms         | 8.51±0.72 ms         | 5.72±0.57 ms

Table 1: The computation time of training and test steps on the single and multiple digit addition tasks, measured on a computer with a single Nvidia Tesla V100 GPU and averaged over 1000 steps. Each step operates on a batch of 32 examples. The computational efficiency of the LTN and the CNN baseline systems is of the same order of magnitude.
Model       | Single Digits (30 000 examples) | Single Digits (3 000 examples) | Multi Digits (15 000 examples) | Multi Digits (1 500 examples)
baseline    | 95.95±0.27 | 70.59±1.45 | 47.19±0.69 | 2.07±0.12
LTN         | 98.04±0.13 | 93.49±0.28 | 95.37±0.29 | 88.21±0.63
DeepProbLog | 97.20±0.45 | 92.18±1.57 | 95.16±1.70 | 87.21±1.92

Table 2: Accuracy (in %) on the test set: comparison of the final results obtained with LTN and those reported with DeepProbLog [41]. Although it is difficult to compare directly the results over time (the frameworks are implemented in different libraries), while achieving similar computational efficiency as the CNN baseline, LTN also reaches similar accuracy as that reported by DeepProbLog.

4.5. Regression

Another important problem in Machine Learning is regression, where a relationship is estimated between one independent variable X and a continuous dependent variable Y. The essence of regression is, therefore, to approximate a function f(x)=y by a function f*, given examples (xi,yi) such that f(xi)=yi. In LTN one can model a regression task by defining f* as a learnable function whose parameter values are constrained by data. Additionally, a regression task requires a notion of equality. We, therefore, define the predicate eq as a smooth version of the symbol = to turn the constraint f(xi)=yi into a smooth optimization problem.
In this example,we explore regression using a problem from a real estate data set25 with 414 examples, each described in terms of 6 real-numbered features: the transaction date (converted to a float), the age of the house, the distance to the nearest station, the number of convenience stores in the vicinity, and the latitude and longitude coordinates. The model has to predict the house price per unit area.

Domains:

samples, denoting the houses and their features.
prices, denoting the house prices.

Variables:

x for the samples.
y for the prices.
D(x) = samples.
D(y) = prices.


25 https://www.kaggle.com/quantbruce/real-estate-price-prediction

Figure 15: Multiple Digit Addition Task: Accuracy and satisfiability results (top) and results in the presence of fewer examples (bottom) in comparison with standard Deep Learning using a CNN (blue lines).

Functions:

f(x), the regression function to be learned.
Din(f) = samples, Dout(f) = prices.

Predicates:

eq(y1,y2), a smooth equality predicate that measures how similar y1 and y2 are.
Din(eq) = prices, prices.

Axioms:

(41) ∀ Diag(x,y) eq(f(x), y)
Notice again the use of Diag: when grounding x and y onto sequences of values,this is done by obeying a one-to-one correspondence between the sequences. In other words, we aggregate pairs of corresponding samples and prices, instead of any combination thereof.

Grounding:

G(samples) = R6.
G(prices) = R.
G(x) ∈ Rm×6, G(y) ∈ Rm×1. Notice that this specification refers to the same number m of examples for x and y due to the above one-to-one correspondence obtained with the use of Diag.
G(eq(u,v)) = exp(−α √(∑j (uj − vj)^2)), where the hyper-parameter α is a real number that scales how strict the smooth equality is.26 In our experiments, we use α=0.05.
Figure 17: Visualization of LTN solving a regression problem.
G(fθ) : x ↦ MLPθ(x), where MLPθ is a multilayer perceptron which ends in one neuron corresponding to a price prediction, with a linear output layer (no activation function).
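A minimal TensorFlow 2 sketch of this regression setup (hypothetical names; not the LTN library API): the smooth equality turns the fitting constraint into a differentiable satisfaction value that can be maximized.

```python
import tensorflow as tf

alpha = 0.05
f = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="elu", input_shape=(6,)),
                         tf.keras.layers.Dense(1)])          # linear output: price prediction

def eq(u, v):
    # Smooth equality exp(-alpha * Euclidean distance); equals 1 when u == v.
    return tf.exp(-alpha * tf.sqrt(tf.reduce_sum((u - v) ** 2, axis=-1)))

def forall_pme(truths, p=2.0):
    return 1.0 - tf.reduce_mean((1.0 - truths) ** p) ** (1.0 / p)

def loss(x, y):
    # 1 − satisfaction of ∀Diag(x,y) eq(f(x), y); x: (m,6), y: (m,1).
    return 1.0 - forall_pme(eq(f(x), y))
```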

Learning:

The theory is constrained by the parameters of the model of f*. LTN is used to estimate such parameters by maximizing the satisfaction of the knowledge-base, in the usual way. Approximating ∀ using ApME with p=2, as before, we randomly split the data set into 330 examples for training and 84 examples for testing. Figure 16 shows the satisfaction level over 500 epochs. We also plot the Root Mean Squared Error (RMSE) between the predicted prices and the labels (i.e. actual prices, also known as target values). We visualize in Figure 17 the strong correlation between actual and predicted prices at the end of one of the runs.

4.6. Unsupervised Learning (Clustering)

In unsupervised learning, labels are either not available or are not used for learning. Clustering is a form of unsupervised learning whereby, without labels, the data is characterized by constraints

26 Intuitively, the smooth equality is exp(−αd(u,v)), where d(u,v) is the Euclidean distance between u and v. It produces a 1 if the distance is zero; as the distance increases, the result decreases exponentially towards 0. In case an exponential decrease is undesirable, one can adopt the following alternative equation: eq(u,v) = 1/(1 + αd(u,v)).

alone. LTN can formulate such constraints, such as:
  • clusters should be disjoint,
  • every example should be assigned to a cluster,
  • a cluster should not be empty,
  • if the points are near, they should belong to the same cluster,
  • if the points are far, they should belong to different clusters, etc.
Domains:

points, denoting the data to cluster.
points_pairs, denoting pairs of examples.
clusters, denoting the clusters.

Variables:

x, y for all points.
c for the clusters.
D(x) = D(y) = points.
D(c) = clusters.

Predicates:

C(x,c), the truth degree of a given point belonging in a given cluster.