32

In SQL, an equality or inequality comparison involving null always returns null, regardless of the other operand; for example, all of the following evaluate to null:

0 = null
null = null
0 <> null
null <> null

SQL requires the is null or is not null operator to test whether a nullable value is null; these operators return true or false. For example, null is null is true, while null is not null is false.
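As a minimal sketch of the behaviour described above, the comparisons can be run against an in-memory SQLite database from Python. SQLite is only an assumption here, chosen because it ships with Python; it reports SQL NULL as Python None and true/false as 1/0.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient SQL engine (an assumption).
conn = sqlite3.connect(":memory:")

expressions = [
    "0 = NULL", "NULL = NULL", "0 <> NULL", "NULL <> NULL",
    "NULL IS NULL", "NULL IS NOT NULL",
]

# sqlite3 maps SQL NULL to None; SQLite reports true/false as 1/0.
results = {e: conn.execute(f"SELECT {e}").fetchone()[0] for e in expressions}

for expr, value in results.items():
    print(f"{expr:18} -> {value!r}")
```

The four comparisons come back as None (NULL), while the two IS tests come back as 1 and 0.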

In contrast, all mainstream programming languages that support null have equality operators that behave in the usual, predictable way, treating null no differently from any regular value. For example, in Java, equality with null behaves just like equality with any other value:

final String x = "";
boolean a = (x == null);    // false
boolean b = (null == null); // true
boolean c = (x != null);    // true
boolean d = (null != null); // false

In both SQL and mainstream programming languages, null (or its equivalent) is a placeholder for the absence of a value. In fact, when we process data returned from a SQL database in a programming language that supports null, a SQL null is returned as a null in that language. Why, then, do the equality operators treat null differently in SQL and in other programming languages?

CC BY-SA 4.0
21
  • 22
    The same is true for the null arithmetic value (usually called "NaN") in many languages, so SQL is hardly unique here.
    Commented Dec 21, 2023 at 16:17
  • 9
    SQL uses null for conflated purposes. Sometimes it means missing data, and other times it means not applicable.
    – Erik Eidt
    Commented Dec 21, 2023 at 16:42
  • 14
    @TobySpeight Not really; NaN == NaN is false, not NaN.
    Commented Dec 21, 2023 at 18:52
  • 6
    @KarlKnechtel That's more due to technical limitations than intentional design. IEEE 754 is meant to apply to a wide variety of systems that may not be able to easily support a value other than true or false.
    Commented Dec 21, 2023 at 18:55
  • 11
    @KarlKnechtel But the main point is that it's not true even though it seems like it should be.
    – Barmar
    Commented Dec 21, 2023 at 22:49

7 Answers

40

It's convenient for the way SQL is typically used. Consider this statement:

SELECT people.name, cars.model FROM people
INNER JOIN cars
    ON people.car_licenceplate = cars.licenceplate

If null = null were true, this would return every pairing of a person who has no license plate with an unregistered car in the database, which is usually an undesirable result.
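A small sketch of that join, assuming an in-memory SQLite database and made-up sample rows (the names, models and plates below are hypothetical): two people and two cars have no plate, yet none of them are paired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, car_licenceplate TEXT);
    CREATE TABLE cars (model TEXT, licenceplate TEXT);
    -- Hypothetical sample data: Bob and Carol have no plate on record.
    INSERT INTO people VALUES ('Alice', 'ABC-123'), ('Bob', NULL), ('Carol', NULL);
    INSERT INTO cars VALUES ('Golf', 'ABC-123'), ('Unplated A', NULL), ('Unplated B', NULL);
""")

# The join condition is NULL (not true) whenever either plate is NULL,
# so the unplated people and unplated cars are never matched.
rows = conn.execute("""
    SELECT people.name, cars.model FROM people
    INNER JOIN cars
        ON people.car_licenceplate = cars.licenceplate
""").fetchall()

print(rows)
```

Only the Alice/Golf pair is returned; if null = null were true, the four Bob/Carol pairings with the unplated cars would appear as well.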

It's particularly convenient that if any null value is used, even inside a more complex expression, the result is null rather than a value, even if other values also happen to be null. In other languages you would need to null-check everything in advance to get that behavior; having it by default is very convenient for the kind of work SQL is typically used for.

null in SQL is exempt from a lot of other rules too. For example, nulls are excluded from unique constraints in most database systems. All of this indicates that null represents the absence of a value rather than a special value.
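The unique-constraint exemption can be sketched as follows, assuming SQLite (other engines may differ on this point) and a hypothetical cars table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, licenceplate TEXT UNIQUE)")

# Two rows with a NULL plate are both accepted by SQLite.
conn.execute("INSERT INTO cars VALUES ('Unplated A', NULL)")
conn.execute("INSERT INTO cars VALUES ('Unplated B', NULL)")
conn.execute("INSERT INTO cars VALUES ('Golf', 'ABC-123')")

# A duplicate non-NULL plate violates the constraint.
try:
    conn.execute("INSERT INTO cars VALUES ('Polo', 'ABC-123')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_plates = conn.execute(
    "SELECT COUNT(*) FROM cars WHERE licenceplate IS NULL"
).fetchone()[0]
print(null_plates, duplicate_rejected)
```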

Some other languages also have a ThreeValueBoolean or similar type that behaves more like a SQL null, though only for booleans. Also, almost every language has a similar non-self-equality for NaN. It's not a concept unique to SQL.
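For comparison, here is a short Python sketch of the NaN behaviour mentioned above. Unlike SQL, the comparison yields a plain false rather than a third value, and a dedicated test (math.isnan) plays the role that IS NULL plays in SQL.

```python
import math

nan = float("nan")

nan_equals_itself = (nan == nan)         # False under IEEE 754 comparison rules
nan_differs_from_itself = (nan != nan)   # True
is_nan = math.isnan(nan)                 # dedicated test, loosely analogous to IS NULL

print(nan_equals_itself, nan_differs_from_itself, is_nan)
```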

7
  • This example is flawed. The ownership of cars should be a 1 person to N cars relationship, therefore if a person has no cars, there should be no entries for that person when doing the join.
    Commented Dec 22, 2023 at 0:26
  • 9
    @MichaelTsang Yeah, the example is set up for 1 car to N people. Imagine instead of using the license plate as key, they use the driver's license. After all, at least where I'm at, one can't get a license plate or renew it without a valid driver's license. So, people.drivers_license = cars.owners_drivers_license. If null = null were true, then that would pair people without a driver's license to cars that aren't registered for circulation.
    – JoL
    Commented Dec 22, 2023 at 0:43
  • 7
    @MichaelTsang In this hypothetical case some cars are unregistered and have no license plate.
    Commented Dec 22, 2023 at 7:17
  • 10
    NaN != NaN is widely considered a mistake. All languages implement it not because it's desirable behavior, but because they're required to in order to follow IEEE 754.
    Commented Dec 22, 2023 at 12:46
  • 3
    @P.Hopkinson: Well spotted. This is not unique to IBM's database; it appears to be specified in the SQL Standard. SELECT NULL UNION SELECT NULL returns only one row in all DB systems I am familiar with.
    – Heinzi
    Commented Dec 23, 2023 at 8:06
23

One way to look at this is to compare these two questions:

  1. Is value A definitely the same as value B?
  2. Is value A definitely different from value B?

On the face of it, these are symmetrical: if question 1 is true, question 2 is false, and vice versa.

But what if both A and B are missing or invalid data points?

  1. False. We can't know for sure that the two missing or invalid data points are the same.
  2. False. We can't know for sure that the two missing or invalid data points are different.

That puts us in a peculiar position: A = B and A <> B should both be false, but that means that NOT (A = B) is no longer the same as A <> B, which is surprising.

SQL handles this by returning a further NULL: if the data for A and B is missing, then the information about whether they are the same or different is also missing. This is consistent with other operations on NULL, e.g. NULL + NULL is NULL, because adding two unknown numbers gives you a third unknown number. And since that also includes boolean negation (if A is NULL, then NOT A is also NULL), the result of NOT (A = B) is always the same as A <> B, as we'd intuitively expect.
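A sketch of that consistency, assuming SQLite as the engine: for a few sample pairs, NOT (A = B) and A <> B are evaluated side by side, with Python None bound as SQL NULL. The helper name evaluate is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr, a, b):
    # Bind a and b as named parameters; None is sent to SQLite as NULL.
    return conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0]

pairs = [(1, 1), (1, 2), (1, None), (None, None)]

# For each pair: (result of NOT (A = B), result of A <> B)
comparison = {
    (a, b): (evaluate("NOT (:a = :b)", a, b), evaluate(":a <> :b", a, b))
    for a, b in pairs
}
null_sum = evaluate(":a + :b", None, None)

for pair, outcome in comparison.items():
    print(pair, outcome)
print("NULL + NULL ->", null_sum)
```

In every case the two expressions agree, including the cases where both come back as None (NULL).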

However, there are situations where we want to ask the strict negation of those questions:

  1. Is value A not definitely the same as value B? (Strict inverse of question 1)
  2. Is value A not definitely different from value B? (Strict inverse of question 2)

For these, SQL provides the IS DISTINCT FROM and IS NOT DISTINCT FROM operators.

More commonly, you want to know explicitly that a particular value is or is not null, for which there are the operators IS NULL and IS NOT NULL.
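A hedged sketch of these null-safe comparisons, assuming SQLite: there the general-purpose IS and IS NOT operators behave like the standard's IS NOT DISTINCT FROM and IS DISTINCT FROM (recent SQLite releases also accept the standard spelling, but the shorter form is used here for compatibility). The helper name null_safe is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def null_safe(a, b):
    # Returns (a = b, a IS b, a IS NOT b); in SQLite, IS / IS NOT are null-safe.
    return conn.execute(
        "SELECT :a = :b, :a IS :b, :a IS NOT :b", {"a": a, "b": b}
    ).fetchone()

both_null = null_safe(None, None)
one_null = null_safe(1, None)
same = null_safe(1, 1)

print(both_null, one_null, same)
```

Plain = returns NULL whenever a NULL is involved, while the null-safe forms always return a definite 1 or 0.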

1
13

I find it can be quite fiendish to explain this aspect of SQL convincingly, because of the sheer depth of an explanation which fully justifies the design and rebuts superficial objections. I don't know whether I have the capability to deliver that explanation.

Relational Algebra

The first thing to mention is E. F. Codd's Relational Algebra. Although SQL and RA are not completely synonymous, RA was the main theoretical foundation for the design of the SQL language.

Some of the main conceptual features of RA are the "relation" (what in SQL is simply called a "table"), the "relational operators" (including the "joins"), the "Null" value, and a system of so-called 3-valued logic (including a certain approach to handling the Null value).

3VL

I'll use "3VL" as a shorthand for referring to the system of both the Null value itself and the handling of it by operators.

It's a bit of a misnomer here in the sense that Null is a value which is available in the domain of all SQL data types, including those which have many more than 3 possible values. It is not simply limited to supplementing the Boolean/logical/bit type with a third value. But there's no better and more commonly understood term than "3VL".

There are also many more confusing details, but they need not be examined for this answer.

Joins

The way the join operators work in RA and SQL depends inextricably on 3VL.

Firstly, each join operation can produce Null values to represent the case where certain rows in input tables were not joined.

Secondly, each join operator can have Null values in its inputs. This may be either because these Nulls exist in static data, or because a previous join operation (in a query consisting of more than one join) has produced Nulls at an earlier stage.

The only sensible behaviour for the join operators is that Nulls do not join to one another. Therefore, at least in the context of the join control expressions, Null compared to Null must not be true.

Justification of how joins work

This might not seem very intuitive or obviously correct at first glance.

But there are two main justifications.

The first is that join operators with this exact behaviour have important algebraic properties (hence the name of the theoretical framework, "Relational Algebra"), and these algebraic properties are crucial for optimisation and performance when dealing with "large shared data banks" (as EF Codd described what we now describe as "databases" serving typical "OLTP loads"). Alterations to the behaviour of the join operators potentially destroy their algebraic properties, and with it the crucial optimising capability.

The meaning of Null

The second justification relies on explaining what Null means, and how it is used in practice.

To many programmers who are new to SQL but familiar with other languages, the word "Null" is what linguists call a "false friend". No mainstream programming language has an analogue for how Null works in SQL.

Typically in other languages, Null is associated with the "null pointer", which is invariably the zero-valued pointer on any hardware architecture I'm familiar with. Not so in SQL, which has no concept of memory pointers in its syntax, and where Null definitely does not mean zero.

The Null value in SQL broadly represents the same thing as a "blank space on a paper form". That is, its meaning is very ambiguous and inconsistent, but it broadly means either "missing" or "inapplicable".

"Missing" broadly means that a certain value should be recorded or relates to a fact that is capable of being recorded, but for whatever reason isn't recorded.

"Inapplicable" broadly means that a certain field is somehow not applicable to a particular case. For example, a vet might typically record a dog's "owner name and address", but if the dog is a stray born in the wild and has no owner, then the owner name and address is inapplicable to the vet's record about that dog.

Very often, it may not be clear whether data is missing or inapplicable. For example, at the time of veterinary treatment, it may not be possible to distinguish between a dog which has an unknown owner and an unowned dog. On a paper form, a blank would be left in either case.

But there is one consistent thing about "paper-blank", which is that a blank on one paper form doesn't mean an association with blanks on other paper forms. If you have a series of forms with names missing from many of them, this doesn't mean all the ones with blank names belong to the same person (who has no name). They almost certainly belong to different people, all of whose names happen not to be recorded on the forms.

If you understand that analogy, you understand why Null doesn't join to Null in SQL: blanks don't connect to blanks when dealing with paper records.

Records and keys

There is an underlying conceptualisation here of "records" and "keys", which predates both SQL and RA and is common practice when dealing with paper records.

What you commonly have with business records are linkages between different records based on "keys" - for example, a customer account number is a "key". When a customer places an order, you record the details of the order on the form, and you also record a "key" on the same form which is the customer account number. Detailed information about the customer account, and which defines the "key" for that particular account, will be recorded separately from information about each order.

The use of the key on the order forms means that all orders can be linked via the account number. This linkage allows a business to do certain things which depend on organising or analysing all the orders of a particular account together, like controlling the total amount of credit extended to a particular customer.

Now, if there are order forms with no customer account number recorded, these do not link to a single customer account whose key is "blank". Rather, the blank means those forms are unassociated with any customer account - that the customer account is missing or inapplicable.

So that is the practice that the join operators in SQL reflect. I hope at this stage I've explained why Nulls shouldn't join to Nulls, and by implication, why Null doesn't compare equal to Null.

Why the equals operator is defined as it is

Because SQL was designed primarily around its relational algebra capability and the concept of joining tables, as well as the 3VL concept, the designers prioritised terseness when using that functionality, and so comparisons involving Nulls (equality using =, but also the other standard comparison operators) are never true.

Instead, when Nulls must be specifically compared, the special IS NULL operator is used.

It is still possible to perform all kinds of comparisons in SQL. It is simply more long-winded to write comparison expressions which treat Nulls as equal, such as having to write (x = y) OR (x IS NULL AND y IS NULL).
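As a sketch of that longer-winded form, assuming SQLite and a hypothetical helper null_safe_equal, the expression can be evaluated for a few inputs. Note that for one NULL and one non-NULL operand it yields NULL rather than false, which still counts as "not true" in a WHERE or ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The expression from the text, with x and y supplied as named parameters.
NULL_SAFE_EQ = "(:x = :y) OR (:x IS NULL AND :y IS NULL)"

def null_safe_equal(x, y):
    return conn.execute(f"SELECT {NULL_SAFE_EQ}", {"x": x, "y": y}).fetchone()[0]

outcomes = {
    "both null": null_safe_equal(None, None),
    "one null": null_safe_equal(1, None),
    "equal": null_safe_equal(1, 1),
    "different": null_safe_equal(1, 2),
}
print(outcomes)
```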

A final word of warning

You might find explanations in this area which attempt to describe Null as "the absence of a value".

In my view, that does not describe the reality. Null is very much an explicit value, because computers only work with values (or symbols) and cannot encode or process pure non-values, but it is a value which is frequently (not always) used to represent the fact that there was an absence of recordable information available when the computer record was made.

You might also see explanations of how Null comparison is handled, which involve saying that you can't tell whether two Nulls are different or not.

In many cases, this begs the question of whether Null is in fact being used to represent missing/unknown information. If Null is used in a capacity of being a marker of inapplicability, or as a default value, then there is no natural reason why these markers should not be considered equal.

But it's also a red herring. The main explanation for the behaviour of Null is how it integrates with the behaviour of the join operators, and how those join operators model the linkages between records, where the presence of Null should almost always mean "do not join" (without necessarily implying missing information).

7
  • In programming languages with dynamic typing, such as PHP or JavaScript, null (also undefined in JavaScript) is also used as a placeholder for a missing or inapplicable value in place of any data type as well. Their type system (Typescript for JavaScript) explicitly defines a nullable type as a union of the original type and null, a data type itself.
    Commented Dec 22, 2023 at 9:39
  • @MichaelTsang, yes many languages have something called null and it is used for some of the same purposes. But the overall mechanics are quite different. Also, I'm not quite sure whether SQL treats nullable fields as being union types. On the whole, I'd be inclined to say it treats the domains of all data types as intrinsically containing a null value - rather than thinking of it as being like a conventional data type plus a bolted-on null value. These kinds of subtleties are why a so-called "impedance mismatch" occurs between SQL and other conventional programming languages.
    – Steve
    Commented Dec 22, 2023 at 11:34
  • @Steve Treating null as an implicitly valid value for all data types is actually quite common, particularly in languages which expose pointers, or make explicit the concept of "reference types". Often, you can actually have null pointers of different types. In that sense, SQL is treating it more like a union type, since there is only one type of null, and operations are overloaded to give specific results for that type.
    – IMSoP
    Commented Dec 22, 2023 at 13:04
  • @IMSoP, again I'm thinking on the hoof, but I'm not sure there is "only one type of Null". Certainly, there is no independent Null data type. And the Null constant/keyword can be cast to a specific type. Perhaps the design of SQL has been muddled in this area.
    – Steve
    Commented Dec 22, 2023 at 15:14
  • 7
    @IMSoP: The problem with bringing types into this is that SQL is not a type-theoretic language. It is a set-theoretic language, which makes limited use of type theory for the sake of coherence and optimization. While nullable fields may coincidentally share some features with a union type, SQL has no real notion of union types or ADTs in general. Rather than thinking of null as a unit type, it is probably better to think of it as the "missing" output of a partial function (in the set theory sense of "function," not the type theory sense).
    – Kevin
    Commented Dec 22, 2023 at 16:43
4

For booleans, NULL means "the whole Boolean domain", or the set {true, false}

In SQL, the expression TRUE OR NULL is true (not a null, as some might expect), and the expression FALSE AND NULL is false.

If we treat the expression TRUE OR X as a boolean function over the boolean X, its domain is {true, false} and its image is {true}. This function is a constant which doesn't depend on its input. When passed NULL, which we can treat as "any value from the Boolean domain", its result is still true; that's why it evaluates to true.

If we take something like TRUE AND X, whose image is {true, false}, and pass it "any Boolean value", the result can be "any Boolean value" as well, i.e. NULL.
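These cases can be checked with a short sketch, assuming SQLite. TRUE and FALSE are written as 1 and 0 here (an assumption made for compatibility with older SQLite versions, which lack the keywords).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1 and 0 stand in for TRUE and FALSE.
expressions = ["1 OR NULL", "0 AND NULL", "1 AND NULL", "0 OR NULL", "NOT NULL"]
results = {e: conn.execute(f"SELECT {e}").fetchone()[0] for e in expressions}

for expr, value in results.items():
    print(f"{expr:12} -> {value!r}")
```

The first two have a result that does not depend on the unknown operand, so they return 1 and 0; the remaining three return None (NULL).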

The NULL comparison behavior seems to be the remnant of a half-assed attempt to extend this logic to other types. For instance, 10 NOT IN (NULL, 5) will evaluate to NULL (because we can make it both true and false, depending on what concrete integer we put instead of the NULL), but 10 NOT IN (NULL, 10) will evaluate to false (because no matter what other value we put instead of NULL, it will always be false).

In the same vein, NULL = NULL can be made either true or false by substituting different concrete values on either side of the comparison, so it evaluates to NULL.
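The IN and NOT IN examples above can be reproduced with this sketch, again assuming SQLite as the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

expressions = [
    "10 NOT IN (NULL, 5)",   # could be true or false depending on the unknown
    "10 NOT IN (NULL, 10)",  # false whatever the unknown is
    "10 IN (NULL, 5)",       # could be true or false depending on the unknown
    "10 IN (NULL, 10)",      # true whatever the unknown is
]
results = {e: conn.execute(f"SELECT {e}").fetchone()[0] for e in expressions}

for expr, value in results.items():
    print(f"{expr:22} -> {value!r}")
```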

Of course, had this logic been followed through, we would expect something like 0 * NULL to evaluate to 0 and COUNT(NULL) to evaluate to the same thing as COUNT(*), but that didn't happen (nor realistically could it, if you start thinking about finer details, like what a - a should evaluate to).
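A sketch of what actually happens, assuming SQLite and a hypothetical three-row table t whose middle row is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (NULL), (3);
""")

# Scalar arithmetic propagates NULL, even when multiplying by zero.
zero_times_null = conn.execute("SELECT 0 * NULL").fetchone()[0]

# Aggregates skip NULL inputs; only COUNT(*) counts every row.
count_star, count_x, count_null, avg_x = conn.execute(
    "SELECT COUNT(*), COUNT(x), COUNT(NULL), AVG(x) FROM t"
).fetchone()

print(zero_times_null, count_star, count_x, count_null, avg_x)
```

Here 0 * NULL is NULL, COUNT(*) is 3, COUNT(x) is 2, COUNT(NULL) is 0, and AVG(x) averages only the two non-NULL values.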

The operator x IS NOT DISTINCT FROM y returns true for two nulls, and so does EXISTS (SELECT x INTERSECT SELECT y) which I use a lot in SQL Server.
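The INTERSECT trick can be sketched like this. The answer uses it in SQL Server; the sketch below assumes SQLite instead, which also treats NULLs as equal to each other when evaluating compound SELECTs. The helper name intersect_equal is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def intersect_equal(x, y):
    # 1 if the one-row sets {x} and {y} share a row, else 0.
    return conn.execute(
        "SELECT EXISTS (SELECT :x INTERSECT SELECT :y)", {"x": x, "y": y}
    ).fetchone()[0]

outcomes = {
    "both null": intersect_equal(None, None),
    "one null": intersect_equal(1, None),
    "equal": intersect_equal(1, 1),
    "different": intersect_equal(1, 2),
}
print(outcomes)
```

Unlike the plain = operator, this always gives a definite 1 or 0.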

15
  • I don't think your examples justify the label "half-assed". Multiplying by zero is an interesting edge case which could be defined that way, but wouldn't be very useful in practice. The behaviour of COUNT is perfectly logical: the only thing it does with its optional argument (where COUNT(*) can be seen as a funny way of writing it with no argument) is examine whether it is null or not; if it didn't do that, there would be no point in the argument at all.
    – IMSoP
    Commented Dec 22, 2023 at 8:12
  • @IMSoP: what you are saying is true, but I fail to see how it makes this behavior "perfectly logical". To be consistent with the logic of IN, the aggregate functions should be instantly poisoned by a single null, and if you wanted to skip the null values, you would need to filter them out in the WHERE clause. I'm not saying it would be better or easier to use, just that it's inconsistent.
    – Quassnoi
    Commented Dec 22, 2023 at 9:13
  • I didn't say it was "perfectly logical", just that it wasn't "half-assed". I think the clearest description of aggregate behaviour is that the aggregate itself is never examining NULL values; rather, the definition of which values to feed into it is "values which aren't null". If you wanted to model them as set-operating functions, you could say that SQL's COUNT(x) is defined as count_items( filter_nulls( evaluate_per_row(x) ) ). In other words, it's more a short-hand in the way the syntax of the language works, rather than in the semantics of the operations.
    – IMSoP
    Commented Dec 22, 2023 at 9:56
  • 1
    @Quassnoi, we had an interesting discussion of this a few months ago on SE.SE. The basic principle is that scalar operators propagate nulls, and aggregate operators eliminate nulls. The reason for this distinction in approach is because each caters to the most common use, and the opposite behaviour is easily achieved with a few extra words of syntax localised to the operator in question. If either kind of operator worked like the other, then the current behaviour which each kind of operator has becomes extremely difficult to achieve. So any extra consistency would be foolish.
    – Steve
    Commented Dec 22, 2023 at 11:08
  • 1
    @Steve Yes, I hadn't thought before just how difficult the opposite would be! Given my pseudo-code above, you can easily change the expression x to never evaluate to null (e.g. with coalesce), but if the expansion was count_items( evaluate_per_row(x) ), there would be no change you could make to x that would emulate filter_nulls, because the expression is evaluated per row. AVG(x) is probably a better example, since its value depends on both the number of items in the set and their values, so there's no natural value that you can add to the set which doesn't affect the result.
    – IMSoP
    Commented Dec 22, 2023 at 12:54
3

Simply put, in the world of SQL, NULL means that the value is unknown. It can also mean that the value simply doesn't exist, as it does in other languages. But since it can mean that the value is unknown, comparing two items where one or both are unknown (NULL) yields an unknown result, which SQL represents as NULL.

3

Most programming languages implement Boolean, or 2-valued, logic with the familiar TRUE and FALSE. SQL implements a 3-valued logic with TRUE, FALSE, and UNKNOWN. In most SQL DBs, NULLs are treated in comparisons as UNKNOWNs (though I believe T-SQL might differ).

When we use the logical operators AND, OR, and NOT, we sometimes get the same output whether we replace UNKNOWN with TRUE or with FALSE.

  • For example, for TRUE OR UNKNOWN, replacing UNKNOWN with TRUE and replacing UNKNOWN with FALSE both still result in the expression evaluating TRUE.
  • On the other hand, TRUE AND UNKNOWN would be TRUE if the UNKNOWN were TRUE, and FALSE if the UNKNOWN were FALSE.

Just like for 2VL logical operations, we can create truth tables for 3VL that match this "unknowns logic", a subset of Kleene's K3 logic. Since these operations are commutative, I'll present the main cases:

A       | B       | A AND B | A OR B  | A = B
TRUE    | UNKNOWN | UNKNOWN | TRUE    | UNKNOWN
UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN
FALSE   | UNKNOWN | FALSE   | UNKNOWN | UNKNOWN

The symmetry is reminiscent of the symmetry in Boolean logic.
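The truth table above can be modelled in a few lines of plain Python, as a rough sketch that uses None for UNKNOWN. The function names and3, or3 and eq3 are hypothetical and not part of any library.

```python
def and3(a, b):
    # FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    # TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def eq3(a, b):
    # Equality is UNKNOWN as soon as either side is UNKNOWN.
    if a is None or b is None:
        return None
    return a == b

# Rows of (A, B, A AND B, A OR B, A = B) with B fixed to UNKNOWN.
table = [(a, None, and3(a, None), or3(a, None), eq3(a, None))
         for a in (True, None, False)]
for row in table:
    print(row)
```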

= in SQL treats all NULLs as distinct, so A = NULL is always UNKNOWN. Use A IS NULL to check if A is populated with NULL. NOT UNKNOWN is UNKNOWN.

A       | NOT A   | A IS NULL
TRUE    | FALSE   | FALSE
UNKNOWN | UNKNOWN | TRUE
FALSE   | TRUE    | FALSE

This niceness goes out the window when it comes to other syntax. WHERE returns a row only when the search condition is TRUE, not when it is FALSE or UNKNOWN. In a left (outer) join, any row from the left table that has no match in the right table gets NULL values for the right table's columns. From the T-SQL docs on joins:

The results do not make it easy to distinguish a NULL in the data from a NULL that represents a failure to join. When NULL values are present in data being joined, it is usually preferable to omit them from the results by using a regular join.
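The WHERE behaviour described above can be sketched as follows, assuming SQLite and a hypothetical table t holding 1, 2 and NULL. The NULL row satisfies neither a condition nor its negation, and only IS NULL finds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (2), (NULL);
""")

def select_where(condition):
    return conn.execute(f"SELECT x FROM t WHERE {condition}").fetchall()

equal_one = select_where("x = 1")
not_equal_one = select_where("x <> 1")
negated = select_where("NOT (x = 1)")
only_null = select_where("x IS NULL")

print(equal_one, not_equal_one, negated, only_null)
```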

See also How NULL values are treated in UNION and UNION ALL. I will defer to other answers on this, as I always have to look up how different SQL keywords handle NULL. It is not intuitive.


If you think about NULL carefully, you'll realize it serves at least two purposes in SQL: 1. unknown/missing data, and 2. not applicable data. (3. NULLs introduced by joins.) This opens the door for having separate UNKNOWN and NA values, and 4-valued or further many-valued logic, but that gets even more complicated. Lots of literature has been written about SQL's treatment of NULL and why it is not intuitive. See Steve's insightful comment on why this is worse.

  • 2
    I've found out why so much Microsoft documentation refers to UNKNOWN despite no relevant use of that keyword in TSQL. It's because the ISO/IEC 9075 standard itself uses the Unknown term. But the standard also states that Unknown is synonymous with Null typed as the Boolean/Bit data type.
    – Steve
    Commented Dec 28, 2023 at 10:46
  • @Steve yeah it's hard for me to find out exactly when SQL treats NULL as UNKNOWN. NULL could be in any data type and SQL implementations don't have to implement their Boolean type.
    – qwr
    Commented Dec 29, 2023 at 0:26
0

In an outer join (A LEFT JOIN B), a null value in a column from table B could mean that the value does not exist, that is, that the whole row would not exist in an inner join. If someone adds further restrictions on the values of table B after the join (in the WHERE or HAVING clause instead of the ON clause) without null checks, the query effectively falls back to an inner join: because comparison is implemented this way, the extra rows produced by the outer join are always discarded.
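A minimal SQLite sketch of that fallback, with made-up tables `a` and `b` and an arbitrary filter value of 99:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (a_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO b VALUES (1, 10)")  # a.id = 2 has no match in b

outer = "SELECT a.id, b.score FROM a LEFT JOIN b ON b.a_id = a.id"

# Plain outer join: the unmatched row is kept, with NULL for b.score.
all_rows = conn.execute(outer + " ORDER BY a.id").fetchall()

# Restriction in WHERE without a null check: NULL <> 99 is UNKNOWN,
# so the unmatched row is dropped, just as in an inner join.
filtered = conn.execute(outer + " WHERE b.score <> 99 ORDER BY a.id").fetchall()

# Adding an explicit null check keeps the unmatched row.
null_checked = conn.execute(
    outer + " WHERE b.score <> 99 OR b.score IS NULL ORDER BY a.id"
).fetchall()

# Putting the restriction in ON also keeps it.
in_on = conn.execute(
    "SELECT a.id, b.score FROM a "
    "LEFT JOIN b ON b.a_id = a.id AND b.score <> 99 ORDER BY a.id"
).fetchall()

print(all_rows, filtered, null_checked, in_on)
```

Only the WHERE variant without the null check loses the row for `a.id = 2`.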

I can't think of a good example right now. But think about saving the result of an outer join in another table: it makes sense to keep the semantics of the data unchanged and accessible. If a single indicator that a row should be nonexistent works as expected, then comparing two such indicators with each other shouldn't give you an extra row.

In SQL, nulls behave like an unknown truth value, somewhere between true and false, in logical operations. That is, NULL AND TRUE is null, but NULL AND FALSE is false. The null result of an equality comparison matches the result of XNOR written out as (NULL AND NULL) OR (NOT NULL AND NOT NULL), so it's not really unexpected.
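The XNOR expansion can be checked in SQLite, as a sketch. The helper names `xnor` and `sql_equals` are invented for this example; SQLite uses 1 and 0 for true and false, and Python's `None` is bound as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def xnor(a, b):
    # Evaluates (a AND b) OR (NOT a AND NOT b) inside SQLite.
    sql = "SELECT (:a AND :b) OR (NOT :a AND NOT :b)"
    return conn.execute(sql, {"a": a, "b": b}).fetchone()[0]


def sql_equals(a, b):
    # Evaluates a = b inside SQLite for comparison.
    return conn.execute("SELECT :a = :b", {"a": a, "b": b}).fetchone()[0]


null_and_true = conn.execute("SELECT NULL AND 1").fetchone()[0]
null_and_false = conn.execute("SELECT NULL AND 0").fetchone()[0]

# Compare XNOR with = over every combination of true, false and NULL.
for a in (1, 0, None):
    for b in (1, 0, None):
        print(a, b, xnor(a, b), sql_equals(a, b))
```

For truth-valued operands the two columns agree in all nine cases, including NULL whenever either operand is NULL.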

  • From the T-SQL docs on joins: The results do not make it easy to distinguish a NULL in the data from a NULL that represents a failure to join. When NULL values are present in data being joined, it is usually preferable to omit them from the results by using a regular join.
    – qwr
    Commented Dec 26, 2023 at 19:33
    @qwr If you use nulls in the data with other semantics in mind (most likely the semantics of the language making the queries) and don't consider their semantics in SQL, it's always better to just match the logic of the other language, which makes any argument moot. But since SQL has its own ways of generating nulls, it's more productive to think the SQL way: use nulls to store data with similar semantics, and treat values that don't fit as different kinds of special values. It's not productive to avoid SQL features just because you are using nulls as placeholders for values that don't even fit SQL's logic.
    – user23013
    Commented Jan 1 at 20:39
  • @qwr And if you really do follow that recommendation, that you don't use outer joins at all when nulls are present in the data, it means as long as you use outer joins, nulls are only used for what I wrote and you don't need to consider anything else, as they are not present elsewhere. So I don't quite understand how it changes anything in my answer.
    – user23013
    Commented Jan 1 at 20:40
