I find it can be quite fiendish to explain this aspect of SQL convincingly, because of the sheer depth of an explanation which fully justifies the design and rebuts superficial objections. I don't know whether I have the capability to deliver that explanation.
Relational Algebra
The first thing to mention is EF Codd's Relational Algebra. Although SQL and RA are not completely synonymous, RA was the main theoretical foundation for the design of the SQL language.
Some of the main conceptual features of RA are the "relation" (what in SQL is simply called a "table"), the "relational operators" (including the "joins"), the "Null" value, and a system of so-called 3-valued logic (including a certain approach to handling the Null value).
3VL
I'll use "3VL" as a shorthand for referring to the system of both the Null value itself and the handling of it by operators.
It's a bit of a misnomer here in the sense that Null is a value which is available in the domain of all SQL data types, including those which have many more than 3 possible values. It is not simply limited to supplementing the Boolean/logical/bit type with a third value. But there's no better and more commonly understood term than "3VL".
There are also many more confusing details, but it isn't necessary to examine them for this answer.
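As a rough illustration of what 3VL looks like in practice, here is a minimal sketch that evaluates a few expressions using Python's built-in `sqlite3` module. The choice of SQLite and the helper name `evaluate` are my own assumptions for the sake of a runnable example; other engines display the results differently, but the logic is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Hypothetical helper: evaluate one SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# SQLite reports true as 1, false as 0, and Null as Python's None.
null_equals_null = evaluate("NULL = NULL")  # Null ("unknown"), not true
null_equals_one = evaluate("NULL = 1")      # Null ("unknown")
null_and_false = evaluate("NULL AND 0")     # 0: a false operand decides an AND
null_or_true = evaluate("NULL OR 1")        # 1: a true operand decides an OR
not_null = evaluate("NOT (NULL)")           # Null ("unknown")
null_plus_one = evaluate("NULL + 1")        # Null also propagates through arithmetic

print(null_equals_null, null_equals_one, null_and_false,
      null_or_true, not_null, null_plus_one)
```

The last line shows the point made above: Null is not confined to the logical type, it is available to (and propagates through) other types as well.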
Joins
The way the join operators work in RA and SQL depends inextricably on 3VL.
Firstly, each join operation can produce Null values to represent the case where certain rows in input tables were not joined.
Secondly, each join operator can have Null values in its inputs. This may be either because these Nulls exist in static data, or because a previous join operation (in a query consisting of more than one join) has produced Nulls at an earlier stage.
The only sensible behaviour for the join operators is that Nulls do not join to one another. Therefore, at least in the context of the join control expressions, Null compared to Null must not be true (under 3VL the result is "unknown", which a join condition treats the same as false).
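Both points can be seen in a small sketch, again run through Python's `sqlite3` module. The table names `left_t` and `right_t` and their contents are invented purely for illustration. Each table contains one row whose key is Null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t  (id INTEGER, k TEXT);
    CREATE TABLE right_t (id INTEGER, k TEXT);
    INSERT INTO left_t  VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO right_t VALUES (10, 'a'), (20, NULL);
""")

# Inner join: the Null key on the left does not join to the Null key on the right.
inner = conn.execute("""
    SELECT l.id, r.id
    FROM left_t AS l
    JOIN right_t AS r ON l.k = r.k
    ORDER BY l.id
""").fetchall()

# Left outer join: unjoined left rows are kept, with Null produced on the right side.
outer = conn.execute("""
    SELECT l.id, r.id
    FROM left_t AS l
    LEFT JOIN right_t AS r ON l.k = r.k
    ORDER BY l.id
""").fetchall()

print(inner)  # [(1, 10)]
print(outer)  # [(1, 10), (2, None), (3, None)]
```

Row 2 (Null key) and row 3 (key 'c' with no partner) end up looking the same in the outer join: both are simply "not joined", and the Nulls in the output could feed into a further join in a longer query.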
Justification of how joins work
This might not seem very intuitive or obviously correct at first glance.
But there are two main justifications.
The first is that join operators with this exact behaviour have important algebraic properties (hence the name of the theoretical framework, "Relational Algebra"), and these algebraic properties are crucial for optimisation and performance when dealing with "large shared data banks" (E. F. Codd's term for what we would now call "databases" serving typical "OLTP loads"). Altering the behaviour of the join operators risks destroying those algebraic properties, and with them the crucial optimising capability.
The meaning of Null
The second justification relies on explaining what Null means, and how it is used in practice.
To many programmers who are new to SQL but familiar with other languages, the word "Null" is what linguists call a "false friend". There is no analogy in any mainstream programming language for how Null works in SQL.
Typically in other languages, Null is associated with the "null pointer", which is invariably the zero-valued pointer on any hardware architecture I'm familiar with. Not so in SQL, which has no concept of memory pointers in its syntax, and where Null definitely does not mean zero.
The Null value in SQL broadly represents the same thing as a "blank space on a paper form". That is, its meaning is very ambiguous and inconsistent, but it broadly means either "missing" or "inapplicable".
"Missing" broadly means that a certain value should be recorded or relates to a fact that is capable of being recorded, but for whatever reason isn't recorded.
"Inapplicable" broadly means that a certain field is somehow not applicable to a particular case. For example, a vet might typically record a dog's "owner name and address", but if the dog is a stray born in the wild and has no owner, then the owner name and address is inapplicable to the vet's record about that dog.
Very often, it may not be clear whether data is missing or inapplicable - for example, at the time of veterinary treatment, it may not be possible to distinguish between a dog which has an unknown owner, and an unowned dog. On a paper form, the field would be left blank in either case.
But there is one consistent thing about "paper-blank", which is that a blank on one paper form doesn't mean an association with blanks on other paper forms. If you have a series of forms with names missing from many of them, this doesn't mean all the ones with blank names belong to the same person (who has no name). They almost certainly belong to different people, all of whose names happen not to be recorded on the forms.
If you understand that analogy, you understand why Null doesn't join to Null in SQL. Because blanks don't connect to blanks when dealing with paper records.
Records and keys
There is an underlying conceptualisation here of "records" and "keys", which predates both SQL and RA and is common practice when dealing with paper records.
What you commonly have with business records are linkages between different records based on "keys" - for example, a customer account number is a "key". When a customer places an order, you record the details of the order on the form, and you also record a "key" on the same form which is the customer account number. Detailed information about the customer account, and which defines the "key" for that particular account, will be recorded separately from information about each order.
The use of the key on the order forms means that all orders can be linked via the account number. This linkage allows a business to do certain things which depend on organising or analysing all the orders of a particular account together, like controlling the total amount of credit extended to a particular customer.
Now, if there are order forms with no customer account number recorded, these do not link to a single customer account whose key is "blank". Rather, the blank means those forms are unassociated with any customer account - that the customer account is missing or inapplicable.
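The order-form scenario can be sketched as follows, using Python's `sqlite3` module. The schema (`customer`, `orders`, `account_no`, `amount`) and all of the data are hypothetical, made up only to mirror the description above. Two of the orders have a blank account number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (account_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         account_no INTEGER,   -- the "key"; may be left blank
                         amount INTEGER);
    INSERT INTO customer VALUES (100, 'Acme'), (200, 'Bolt');
    INSERT INTO orders VALUES
        (1, 100, 50), (2, 100, 70), (3, 200, 20),
        (4, NULL, 10), (5, NULL, 15);
""")

# Total ordered per account: the blank-key orders are attributed to nobody.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.account_no = c.account_no
    GROUP BY c.account_no, c.name
    ORDER BY c.account_no
""").fetchall()

# Pairs of distinct orders linked by sharing an account number.
linked_pairs = conn.execute("""
    SELECT a.order_no, b.order_no
    FROM orders AS a
    JOIN orders AS b
      ON a.account_no = b.account_no AND a.order_no < b.order_no
    ORDER BY a.order_no, b.order_no
""").fetchall()

# Orders unassociated with any customer account.
unlinked = [row[0] for row in conn.execute("""
    SELECT o.order_no
    FROM orders AS o
    LEFT JOIN customer AS c ON o.account_no = c.account_no
    WHERE c.account_no IS NULL
    ORDER BY o.order_no
""")]

print(totals)        # [('Acme', 120), ('Bolt', 20)]
print(linked_pairs)  # [(1, 2)]
print(unlinked)      # [4, 5]
```

Orders 4 and 5 both have a blank key, yet they are not linked to each other in `linked_pairs`, and they are not lumped into some phantom "blank" account in `totals`.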
So that's the practice which the join operators in SQL reflect. I hope at this stage I've explained why Nulls shouldn't join to Nulls, and by implication, why Null doesn't compare equal to Null.
Why the equals operator is defined as it is
Because SQL was designed primarily around its relational algebra capability and the concept of joining tables, as well as the 3VL concept, the designers prioritised terseness when using that functionality, and comparisons involving Nulls (such as equality using `=`, but also including the other standard comparison operators) are never true.
Instead, when Nulls must be specifically compared, the special `IS NULL` operator is used.
It is still possible to perform all kinds of comparisons in SQL. It is simply more long-winded to write comparison expressions which treat Nulls as equal, such as having to write `(x = y) OR (x IS NULL AND y IS NULL)`.
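A minimal sketch of the difference between the plain comparison, the long-winded Null-tolerant comparison, and `IS NULL`, run through Python's `sqlite3` module. The `pair` table and the helper `ids_where` are assumptions of mine, invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pair (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    INSERT INTO pair VALUES (1, 1, 1), (2, 1, 2), (3, NULL, 1), (4, NULL, NULL);
""")

def ids_where(condition):
    """Hypothetical helper: ids of the rows for which the condition is true."""
    query = f"SELECT id FROM pair WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(query)]

plain = ids_where("x = y")                                      # Nulls never match
null_safe = ids_where("(x = y) OR (x IS NULL AND y IS NULL)")   # Nulls treated as equal
x_is_null = ids_where("x IS NULL")                              # explicit test for Null

print(plain)      # [1]
print(null_safe)  # [1, 4]
print(x_is_null)  # [3, 4]
```

Row 4 (both Null) is excluded by `x = y` and only appears once the comparison is spelled out in the longer form.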
A final word of warning
You might find explanations in this area which attempt to describe Null as "the absence of a value".
In my view, that does not describe the reality. Null is very much an explicit value, because computers only work with values (or symbols) and cannot encode or process pure non-values, but it is a value which is frequently (not always) used to represent the fact that there was an absence of recordable information available when the computer record was made.
You might also see explanations of how Null comparison is handled, which involve saying that you can't tell whether two Nulls are different or not.
In many cases, this begs the question of whether Null is in fact being used to represent missing/unknown information. If Null is used as a marker of inapplicability, or as a default value, then there is no natural reason why these markers should not be considered equal.
But it's also a red herring. The main explanation for the behaviour of Null is how it integrates with the behaviour of the join operators, and how those join operators model the linkages between records, where the presence of Null should almost always mean "do not join" (without necessarily implying missing information).
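The same is true of the null arithmetic value in many languages (usually called "NaN"), so SQL is hardly unique in this respect.
SQL uses null to mean several things. Sometimes it means missing data, other times it means not applicable.
@TobySpeight Not so; `NaN == NaN` is false, not NaN.
@KarlKnechtel That's more due to technical limitations than deliberate design. IEEE is intended to work on a variety of systems that may not easily support values other than true or false.
@KarlKnechtel But the key point is that it isn't true, even when it looks like it should be.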