Chapter 2. Entity–Relationship Concepts
第 2 章实体关系概念
Until now we have dealt with databases made up of a number of distinct tables, without concerning ourselves very much with
how the tables and their constituent columns were originally generated. Logical database design, also known simply as database design or database modeling, studies basic properties and interrelationships among data items, with the aim of providing faithful representations of
such items in the basic data structures of a database. Databases with different data models have different structures for
representing data; in relational databases the fundamental structures for representing data are what we have been calling
relational tables. We concentrate on relational databases in this chapter because design for the object-relational model is still in its infancy.
到目前为止,我们已经处理了由许多不同表组成的数据库,而不太关心这些表及其组成列最初是如何生成的。逻辑数据库设计,也简称为数据库设计或数据库建模,研究数据项之间的基本属性和相互关系,目的是在数据库的基本数据结构中提供这些项的忠实表示。不同数据模型的数据库有不同的数据表示结构;在关系数据库中,表示数据的基本结构就是我们所说的关系表。本章我们主要关注关系数据库,因为对象关系模型的设计仍处于起步阶段。
It is the responsibility of the database administrator (DBA) to perform this logical database design, assigning the related
data items of the database to columns of tables in a manner that preserves desirable properties. The most important test of
logical design is that the tables and attributes faithfully reflect interrelationships among objects in the real world and
that this remains true after all likely database updates in the future.
数据库管理员(DBA)有责任执行这种逻辑数据库设计,以保留所需属性的方式将数据库的相关数据项分配给表的列。逻辑设计最重要的测试是表和属性是否忠实地反映了现实世界中对象之间的相互关系,并且在未来所有可能的数据库更新之后仍然如此。
The DBA starts by studying some real-world enterprise, such as a wholesale order business, a company personnel office, or
a college registration department, whose operation needs to be supported on a computerized database system. Often working
with someone who has great expertise about the details of the enterprise, the DBA comes up with a list of data items and underlying
data objects that must be kept track of (in college student registration, this list might include student_names, courses, course_sections, class_rooms, class_periods, etc.), together with a number of rules, or constraints, concerning the interrelatedness of these data items. Typical rules for student registration are the following:
DBA首先研究一些现实世界的企业,例如批发订单业务、公司人事办公室或大学注册部门,其运作需要计算机化数据库系统的支持。 DBA 经常与对企业细节具有丰富专业知识的人合作,提出必须跟踪的数据项和底层数据对象的列表(在大学生注册中,此列表可能包括 student_names , courses , course_sections , class_rooms , class_periods 等),以及有关这些数据项的相互关联性的许多规则或约束。学生注册的典型规则如下:
- Every registered student has a unique student ID number (which we name sid).
每个注册的学生都有一个唯一的学生 ID 号(我们将其命名为 sid )。 - A student can be registered for at most one course section for a given class period.
学生在给定的课程期间最多可以注册一个课程部分。 - A classroom can house at most one course section for a given class period.
在给定的上课时间内,一间教室最多可以容纳一个课程部分。 - And so on. 等等。
From these data items and constraints, the DBA is expected to perform the logical design of the database. Two common techniques
covered in this chapter are used to perform the task of database design. The first is known as the entity–relationship approach (or ER approach), and the second is the normalization approach. The ER approach attempts to provide a taxonomy of data items to allow a DBA to intuitively recognize different
types of data classification objects (entities, weak entities, attributes, relationships, etc.) to classify the listed data
items and their relationships. After creating an ER diagram that illustrates these objects, a relatively straightforward procedure
allows the DBA to translate the design into relational tables and integrity constraints in the database system. The normalization
approach seems entirely different, and perhaps less dependent on intuition: all the data items are listed, and then all interrelatedness
rules (of a recognized kind, known as dependencies) are identified. Design starts with the assumption that all data items are placed in a single huge table and then proceeds
to break down the table into smaller tables. In the resulting set of tables, joins are needed to retrieve the original relationships.
Both the ER modeling approach and the normalization approach are best applied by a DBA with a developed intuition about data
relationships in the real world and about the way those relationships are ultimately modeled as relational tables. The two
approaches tend to lead to identical relational table designs and in fact reinforce one another in providing the needed intuition.
We will not attempt to discriminate between the two in terms of which is more applicable.
DBA 需要根据这些数据项和约束来执行数据库的逻辑设计。本章介绍的两种常见技术用于执行数据库设计任务。第一个称为实体关系方法(或ER方法),第二个是标准化方法。 ER 方法试图提供数据项的分类法,使 DBA 能够直观地识别不同类型的数据分类对象(实体、弱实体、属性、关系等),从而对列出的数据项及其关系进行分类。创建说明这些对象的 ER 图后,DBA 可以通过一个相对简单的过程将设计转换为数据库系统中的关系表和完整性约束。规范化方法似乎完全不同,并且可能不太依赖直觉:列出所有数据项,然后识别所有相互关联规则(属于公认的类型,称为依赖关系)。设计首先假设所有数据项都放置在一个巨大的表中,然后继续将该表分解为较小的表。在生成的表集中,需要连接来检索原始关系。 ER 建模方法和规范化方法最好由 DBA 来应用,该 DBA 对现实世界中的数据关系以及这些关系最终建模为关系表的方式有深入的直觉。这两种方法往往会导致相同的关系表设计,并且实际上在提供所需的直觉方面相互加强。我们不会试图区分两者谁更适用。
One of the major features of logical database design is the emphasis it places on rules of interrelationships between data
items. The naive user often sees a relational table as made up of a set of descriptive columns, one column much like another.
But this is far from accurate, because there are rules that limit possible relationships between values in the columns. For
example, a customers table, conceived as a relation, is a subset of the Cartesian product of four domains, CP = CID × CNAME × CITY × DISCNT. However, in any legal customers table, two rows with the same customer ID (cid) value cannot exist because cid is a unique identifier for a customers row. Here is a perfect example of the kind of rule we wish to take into account in our logical database design. A faithful
table representation enforces such a requirement by specifying that the cid column is a candidate key or the primary key for the customers table. A candidate key is a designated set of columns in a table such that two table rows can never be alike in all these
column values, and where no smaller subset of the key columns has this property. A primary key is a candidate key that has
been chosen by the DBA for external reference from other tables to unique rows in the table.
逻辑数据库设计的主要特征之一是强调数据项之间相互关系的规则。天真的用户经常将关系表视为由一组描述性列组成,其中一列与另一列非常相似。但这远不准确,因为有一些规则限制了列中值之间可能的关系。例如,一个 customers 表被视为一种关系,是四个域的笛卡尔积的子集, CP = CID × CNAME × CITY × DISCNT 。然而,在任何法律 customers 表中,两行具有相同的客户 ID ( cid ) 值不能存在,因为 cid 是一个唯一的标识符 customers 排。这是我们希望在逻辑数据库设计中考虑的规则类型的完美示例。忠实的表表示通过指定以下内容来强制执行这样的要求: cid 列是候选键或主键 customers 桌子。候选键是表中指定的一组列,使得两个表行在所有这些列值中永远不会相似,并且键列的较小子集不具有此属性。主键是 DBA 选择的候选键,用于从其他表外部引用表中的唯一行。
A faithful representation in a computerized database table of a candidate key or a primary key is provided when the table
is created with the SQL Create Table statement (see the syntax given in the declaration in Figure 2.1).
当使用 SQL Create Table 语句创建表时,会在计算机化数据库表中提供候选键或主键的忠实表示(请参见图 2.1中声明中给出的语法)。
FIGURE 2.1. SQL declaration of customers table with primary key cid and candidate key ssn.
图 2.1。 SQL 声明 customers 带主键的表 cid 和候选键 ssn 。
The fact that the ssn column is declared as not null unique in a Create Table statement simply means that in any permitted customers content, two rows cannot have the same ssn value, and thus it is a candidate key. When cid is declared as a primary key in the Create Table statement, this is a more far-reaching statement, making cid the identifier of customers rows that might be used by other tables. Following either of the table definitions of 2.1, a later SQL Insert or Update statement
that would duplicate a cid value or ssn value on two rows of the customers table is illegal and has no effect. Thus, a faithful representation of the table key is maintained by the database system.
事实是 ssn 列在 Create Table 语句中被声明为非空唯一,仅意味着在任何允许的情况下 customers 内容,两行不能相同 ssn 值,因此它是候选键。什么时候 cid 在Create Table语句中被声明为主键,这是一个影响更深远的语句,使得 cid 的标识符 customers 可能被其他表使用的行。遵循 2.1 的任一表定义,后面的 SQL Insert 或 Update 语句将复制 cid 值或 ssn 两行的值 customers 表是非法的并且没有任何作用。因此,数据库系统维护了表键的忠实表示。
Also a number of other clauses of the Create Table statement serve a comparable purpose of limiting possible table content,
and we refer to these as integrity constraints for a table. The interrelationships between columns in relational tables must be understood at a reasonably deep level in
order to properly appreciate some constraints. Although not all concepts of logical design can be faithfully represented in
the SQL of today, SQL is moving in the direction of modeling more and more such concepts. In any event, many of the ideas
of logical design can be useful as an aid to systematic database definition even in the absence of direct system support.
此外,Create Table 语句的许多其他子句也具有限制可能的表内容的类似目的,我们将这些称为表的完整性约束。必须相当深入地理解关系表中列之间的相互关系,以便正确理解某些约束。尽管并不是所有的逻辑设计概念都能在今天的 SQL 中忠实地表示,但 SQL 正在朝着对越来越多的此类概念进行建模的方向发展。无论如何,即使在没有直接系统支持的情况下,逻辑设计的许多思想也可以作为系统数据库定义的辅助手段。
In the following sections, we first introduce a number of definitions of the ER model. The process of normalization is introduced
after some ER intuition has been developed.
在下面的章节中,我们首先介绍 ER 模型的一些定义。在发展了一些 ER 直觉之后,引入了标准化过程。
2.1. Introduction to ER Concepts
2.1. ER 概念简介
The ER approach attempts to define a number of data classification objects; the database designer is then expected to classify
data items by intuitive recognition as belonging in some known classification. Three fundamental data classification objects
introduced in this section are entities, attributes, and relationships.
ER方法试图定义许多数据分类对象;然后,数据库设计者需要通过直观识别将数据项分类为属于某些已知的分类。本节介绍的三个基本数据分类对象是实体、属性和关系。
2.1.1. Entities, Attributes, and Simple ER Diagrams
2.1.1.实体、属性和简单 ER 图
We begin with a definition of the concept of entity.
我们首先定义实体概念。
Definition:Entity. 定义:实体。
An entity is a collection of distinguishable real-world objects with common properties.
实体是具有共同属性的可区分的现实世界对象的集合。
For example, in a college registration database we might have the following entities: Students, Instructors, Class_rooms, Courses, Course_sections, Class_periods, and so on. (Note that entity names are capitalized.) Clearly the set of classrooms in a college fits our definition of an
entity: individual classrooms in the entity Class_rooms are distinguishable (by location—i.e., room number) and have other common properties such as seating capacity (not common
values, but a common property). Class_periods is a somewhat surprising entity—is “MWF from 2:00 to 3:00 pm” a real-world object? However, the test here is that the registration process deals with these class periods as if they were
objects, assigning class periods in student schedules in the same sense that rooms are assigned.
例如,在大学注册数据库中,我们可能有以下实体: Students , Instructors , Class_rooms , Courses , Course_sections , Class_periods , 等等。 (请注意,实体名称大写。)显然,大学中的教室集合符合我们对实体的定义:实体中的各个教室 Class_rooms 是可区分的(通过位置,即房间号),并且具有其他共同属性,例如座位容量(不是共同值,而是共同属性)。 Class_periods 是一个有点令人惊讶的实体——“MWF from 2:00 to 3:00 pm ”是一个真实世界的对象吗?然而,这里的测试是,注册过程将这些课程时间视为对象,在学生时间表中分配课程时间,就像分配房间一样。
To give examples of entities that we have worked with a good deal in the CAP database, we have Customers, Agents, and Products. (Orders is also an entity, but there is some possibility for confusion in this, and we discuss it a bit later.) There is a foreshadowing
here of entities being mapped to relational tables. An entity such as Customers is usually mapped to an actual table, and each row of the table corresponds to one of the distinguishable real-world objects
that make up the entity, called an entity instance, or sometimes an entity occurrence.
为了给出我们在 CAP 数据库中大量合作的实体的例子,我们有 Customers , Agents , 和 Products 。 ( Orders 也是一个实体,但是这可能会造成混淆,我们稍后再讨论。)这里有一个实体被映射到关系表的伏笔。一个实体,例如 Customers 通常映射到一个实际的表,表的每一行对应于构成实体的可区分的现实世界对象之一,称为实体实例,有时也称为实体出现。
Note that we do not yet have a name for the properties by which we tell one entity occurrence from another, the analog to
column values to distinguish rows in a relational table. For now we simply refer to entity instances as being distinguishable,
in the same sense that we would think of the classrooms in a college as being distinguishable, without needing to understand
the room-labeling scheme used. In what follows we always write an entity name with an initial capital letter, but the name
becomes all lowercase when the entity is mapped to a relational table in SQL.
请注意,我们还没有一个属性名称,可以用来区分一个实体的出现与另一个实体的出现,类似于用于区分关系表中的行的列值。现在,我们简单地将实体实例称为可区分的,就像我们认为大学的教室是可区分的一样,无需了解所使用的房间标记方案。在下文中,我们总是以大写字母开头的实体名称,但是当实体映射到 SQL 中的关系表时,名称将变为全小写。
We have chosen an unusual notation by assigning plural entity names: Students, Instructors, Class_rooms, and so forth. More standard would be entities named Student, Instructor, and Class_room. Our plural usage is chosen to emphasize the fact that each represents a set of real-world objects, usually containing multiple elements, and carries over to our plural table names (also somewhat unusual),
which normally contain multiple rows. Entities are represented by rectangles in ER diagrams, as you can see by looking at
Figure 2.2.
我们通过指定复数实体名称来选择一种不寻常的表示法: Students , Instructors , Class_rooms ,等等。更标准的是实体命名 Student , Instructor , 和 Class_room 。我们选择复数用法是为了强调这样一个事实:每个对象代表一组现实世界的对象,通常包含多个元素,并延续到我们的复数表名称(也有点不寻常),它通常包含多行。实体在 ER 图中用矩形表示,如图2.2所示。
FIGURE 2.2. Example of ER diagrams with entities and attributes.
图 2.2。带有实体和属性的 ER 图示例。
Note that some other authors use the terminology entity set or entity type in referring to what we call an entity. Then to these authors, an entity is what we would refer to as an entity instance. We have also noticed occasional ambiguity within a specific author's writing, sometimes referring to an entity set and sometimes
to an entity; we assume that the object that is represented by a rectangle in an ER diagram is an entity, a collection of
real-world objects, and authors who identify such rectangles in the same way agree with our definition. It is unfortunate
that such ambiguity exists, but our notation will be consistent in what follows.
请注意,其他一些作者使用术语实体集或实体类型来指代我们所说的实体。对于这些作者来说,实体就是我们所说的实体实例。我们还注意到特定作者的写作中偶尔会出现歧义,有时指实体集,有时指实体;我们假设 ER 图中的矩形表示的对象是一个实体,是现实世界对象的集合,并且以相同方式识别此类矩形的作者同意我们的定义。不幸的是存在这种歧义,但我们的符号在下文中将保持一致。
In mathematical discussion, for purposes of definition, we usually represent an entity by a single capital letter, possibly
subscripted where several exist (e.g., E, E1, E2, etc.). An entity E is made up of a set of real-world objects, which we represent by subscripted lowercase letters: E = {e1, e2, … , en}. As mentioned above, each distinct representative ei of an entity E is called an entity instance or an entity occurrence.
在数学讨论中,出于定义的目的,我们通常用单个大写字母表示实体,可能在存在多个大写字母的情况下加上下标(例如,E、E 1 、E 2等)。实体 E 由一组现实世界的对象组成,我们用带下标的小写字母表示:E = {e 1 , e 2 , … , en }。如上所述,实体E的每个不同代表e i被称为实体实例或实体出现。
Definition:Attribute. 定义:属性。
An attribute is a data item that describes a property of an entity or a relationship (defined below).
属性是描述实体或关系(定义如下)的属性的数据项。
Recall from the definition of entity that all entity occurrences belonging to a given entity have common properties. In the ER model, these properties are known
as attributes. As we will see, there is no confusion in terminology between an attribute in the ER model and an attribute or column name
in the relational model, because when the ER design is translated into relational terms, the two correspond. A particular
instance of an entity is said to have attribute values for all attributes describing the entity (a null value is possible).
The reader should keep in mind that while we list distinct entity occurrences {e1, e2, … , en} of the entity E, we can't actually tell the occurrences apart without reference to attribute values.
回想一下实体的定义,属于给定实体的所有实体出现都具有公共属性。在 ER 模型中,这些属性称为属性。正如我们将看到的,ER 模型中的属性与关系模型中的属性或列名之间的术语不会混淆,因为当 ER 设计转换为关系术语时,两者是对应的。实体的特定实例被认为具有描述该实体的所有属性的属性值(可能为空值)。读者应该记住,虽然我们列出了实体 E 的不同实体出现次数 {e 1 , e 2 , … , en } ,但如果不参考属性值,我们实际上无法区分这些出现次数。
Each entity has an identifier, an attribute, or set of attributes that takes on unique values for each entity instance; this is the analog of the relational
concept of candidate key. For example, we define an identifier for the Customers entity to be the customer identifier, cid. There might be more than one identifier for a given entity, and when the DBA identifies a single key attribute to be the
universal method of identification for entity occurrences throughout the database, this is called a primary identifier for the entity. Other attributes, such as city for Customers, are not identifiers but descriptive attributes, known as descriptors. Most attributes take on simple values from a domain, as we have seen in the relational model, but a composite attribute is a group of simple attributes that together describe a property. For example, the attribute student_names for the Students entity might be composed of the simple attributes lname, fname, and midinitial. Note that an identifier for an entity is allowed to contain an attribute of composite type. Finally, we define a multivalued attribute to be one that can take on multiple values for a single entity instance. For example, the Employees entity might have an attached multivalued attribute named hobbies, which takes on multiple values provided by the employee asked to list any hobbies or interests. One employee might have
several hobbies, so this is a multivalued attribute.
每个实体都有一个标识符、一个属性或一组属性,它们对每个实体实例具有唯一的值;这类似于候选键的关系概念。例如,我们定义一个标识符 Customers 实体作为客户标识符, cid 。给定实体可能有多个标识符,并且当 DBA 将单个关键属性标识为识别整个数据库中实体出现的通用方法时,这称为该实体的主要标识符。其他属性,例如 city 为了 Customers 、 不是标识符,而是描述性属性,称为描述符。正如我们在关系模型中所看到的,大多数属性都采用域中的简单值,但复合属性是一组共同描述属性的简单属性。例如,属性 student_names 为 Students 实体可能由简单的属性组成 lname , fname , 和 midinitial 。请注意,实体的标识符允许包含复合类型的属性。最后,我们将多值属性定义为可以为单个实体实例采用多个值的属性。例如, Employees 实体可能有一个附加的多值属性,名为 hobbies ,它采用要求列出任何爱好或兴趣的员工提供的多个值。一名员工可能有多种爱好,因此这是一个多值属性。
As mentioned earlier, ER diagrams represent entities as rectangles. Figure 2.2 shows two simple ER diagrams. Simple, single-valued attributes are represented by ovals, attached by a straight line to the
entity. A composite attribute is also in an oval attached directly to the entity, while the simple attributes that make up
the composite are attached to the composite oval. A multivalued attribute is attached by a double line, rather than a single
line, to the entity it describes. The primary identifier attribute is underlined.
如前所述,ER 图将实体表示为矩形。图 2.2显示了两个简单的 ER 图。简单的单值属性由椭圆形表示,并通过直线连接到实体。复合属性也在直接附加到实体的椭圆形中,而构成复合的简单属性则附加到复合椭圆形。多值属性通过双线而不是单线附加到它所描述的实体。主标识符属性带有下划线。
2.1.2. Transforming Entities and Attributes to Relations
2.1.2.将实体和属性转换为关系
Our ultimate aim is to transform the ER design into a set of definitions for relational tables in a computerized database,
which we do through a set of transformation rules.
我们的最终目标是将 ER 设计转换为计算机化数据库中关系表的一组定义,这是通过一组转换规则来实现的。
- Transformation Rule 1. Each entity in an ER diagram is mapped to a single table in a relational database; the table is named after the entity. The
table's columns represent all the single-valued simple attributes attached to the entity (possibly through a composite attribute,
although a composite attribute itself does not become a column of the table). An identifier for an entity is mapped to a candidate
key for the table, as illustrated in Example 2.1, and a primary identifier is mapped to a primary key. Note that the primary identifier of an entity might be a composite
attribute, which therefore translates to a set of attributes in the relational table mapping. Entity occurrences are mapped
to the table's rows.▪
转换规则1. ER图中的每个实体都映射到关系数据库中的单个表;该表以实体命名。表的列表示附加到实体的所有单值简单属性(可能通过复合属性,尽管复合属性本身不会成为表的列)。实体的标识符映射到表的候选键,如示例 2.1所示,主标识符映射到主键。请注意,实体的主要标识符可能是复合属性,因此它会转换为关系表映射中的一组属性。实体出现映射到表的行。▪
EXAMPLE 2.1 例2.1
Here are the two tables, with one example row filled in, mapped from the Students and Employees entities in the ER diagrams of Figure 2.2. The primary key is underlined.
以下是两张表,其中填充了一个示例行,映射自 Students 和 Employees 图 2.2 ER 图中的实体。主键带有下划线。Table students 桌子 students
sid 席德 lname fname Midinitial 1134 Smith 史密斯 John 约翰 L. … … … … Table employees 桌子 employees
eid 开斋节 staddress city state zipcode 197 7 Beacon St 7 灯塔街 Boston 波士顿 MA 02122 … … … … …
- Transformation Rule 2. Given an entity E with primary identifier p, a multivalued attributed attached to E in an ER diagram is mapped to a table of its own; the table is named after the plural
multivalued attribute. The columns of this new table are named after p and a (either p or a might consist of several attributes), and rows of the table correspond to (p, a) value pairs, representing all pairings of attribute values of a associated with entity occurrences in E. The primary key attribute for this table is the set of columns in p and a.▪
转换规则 2。给定一个具有主标识符p的实体 E,ER 图中附加到 E 的多值属性将映射到其自己的表;该表以复数多值属性命名。这个新表的列以p和a命名( p或a可能包含多个属性),表的行对应于 ( p, a ) 值对,表示与实体关联的a的属性值的所有配对该表的主键属性是p和a中的列集。▪
EXAMPLE 2.2 例2.2
Here is an example database of two tables reflecting the ER diagram for the Employees entity and the attached multivalued attribute, hobbies, of Figure 2.2.
这是一个包含两个表的示例数据库,反映了 ER 图 Employees 实体和附加的多值属性, hobbies ,如图 2.2所示。Table employees 桌子 employees
eid 开斋节 staddress city state zipcode 197 7 Beacon St 7 灯塔街 Boston 波士顿 MA 02102 221 19 Brighton St 布莱顿街 19 号 Boston 波士顿 MA 02103 303 153 Mass Ave 大众大道153号 Cambridge 剑桥 MA 02123 … … … … … Table hobbies 桌子 hobbies
eid 开斋节 hobby 爱好 197 chess 棋 197 painting 绘画 197 science fiction 科幻小说 221 reading 阅读 303 bicycling 骑自行车 303 mysteries 谜团 … …
Definition:Relationship. 定义:关系。
Given an ordered list of m entities, E1, E2, … , Em (where the same entity may occur more than once in the list), a relationship R defines a rule of correspondence between the instances of these entities. Specifically, R represents a set of m-tuples, a subset of the Cartesian product of entity instances E1 × E2 × … × Em.
给定 m 个实体的有序列表 E 1 , E 2 , … , E m (其中同一实体可能在列表中出现多次),关系 R 定义这些实体的实例之间的对应规则。具体来说,R 表示 m 元组的集合,即实体实例 E 1 × E 2 × … × E m的笛卡尔积的子集。
2.1.3. Relationships among Entities
2.1.3.实体之间的关系
A particular occurrence of a relationship, corresponding to a tuple of entity occurrences (e1, e2, … , en), where ei is an instance of Ei in the ordered list of the definition, is called a relationship occurrence or relationship instance. The number of entities m in the defining list is called the degree of the relationship. A relationship between two entities is known as a binary relationship. For example, we define teaches to be a binary relationship between Instructors and Course_sections. We indicate that a relationship instance exists by saying that a particular instructor teaches a specific course section.
Another example of a relationship is works_on, defined to relate the two entities Employees and Projects in a large company: Employees works_on Projects.
关系的特定出现,对应于实体出现的元组 (e 1 , e 2 , … , e n ),其中 e i是定义的有序列表中 E i的实例,称为关系出现或关系实例。定义列表中实体的数量m称为关系度。两个实体之间的关系称为二元关系。例如,我们定义 teaches 之间存在二元关系 Instructors 和 Course_sections 。我们通过说特定教师教授特定课程部分来表明存在关系实例。关系的另一个例子是 works_on ,定义为关联两个实体 Employees 和 Projects 在一家大公司: Employees works_on Projects 。
A relationship can also have attached attributes. The relationship works_on might have the attribute percent, indicating the percent of work time during each week that the employee is assigned to work on each specific project (see
Figure 2.3). Note that this percent attribute attached to the works_on relationship would be multivalued if attached to either entity Employees or Projects; the percent attribute is only meaningful in describing a specific employee–project pair, and it is therefore a natural attribute of the
binary relationship works_on.
关系还可以具有附加属性。关系 works_on 可能有属性 percent ,表示每周分配给员工处理每个特定项目的工作时间百分比(见图2.3 )。请注意,这 percent 附加到的属性 works_on 如果附加到任一实体,关系将是多值的 Employees 或者 Projects ;这 percent 属性仅在描述特定的员工-项目对时才有意义,因此它是二元关系的自然属性 works_on 。
FIGURE 2.3. Examples of ER diagrams with relationships.
图 2.3。具有关系的 ER 图示例。
A binary relationship that relates an entity to itself (a subset of E1 × E1) is called a ring, or sometimes a recursive relationship. For example, the Employees entity is related to itself through the relationship manages, where we say that one employee manages another. Relationships are represented by diamonds in an ER diagram, with connecting
lines to the entities they relate. In the case of a ring, the connecting lines are often labeled with the names of the roles
played by the entity instances involved. In Figure 2.3 the two named roles are manager_of and reports_to.
将实体与其自身(E 1 × E 1的子集)相关的二元关系称为环,有时也称为递归关系。例如, Employees 实体通过关系与其自身相关 manages ,我们说一名员工管理另一名员工。关系在 ER 图中用菱形表示,并用连接线连接到它们相关的实体。在环的情况下,连接线通常标有所涉及的实体实例所扮演的角色的名称。在图 2.3中,两个命名角色是 manager_of 和 reports_to 。
Note that we often leave out attributes in an ER diagram to concentrate on relationships between entities without losing our
concentration in excessive detail.
请注意,我们经常在 ER 图中省略属性,以专注于实体之间的关系,同时又不会失去对过多细节的关注。
EXAMPLE 2.3 例2.3
The orders Table in CAP Does Not Represent a Relationship
这 orders CAP 中的表不代表关系Per the relationship definition, the orders table in the CAP database is not a relationship between Customers, Agents, and Products. This is because (cid, aid, pid) triples in the rows of the orders table do not identify a subset of the Cartesian product, Customers × Agents × Products, as required. Instead, some triples of (cid, aid, pid) values occur more than once, and no doubt clearly the designer's intention, since the same customer can order the same product from the same agent on two different occasions. Instead of a relationship, the orders table represents an entity in its own right, with identifier attribute ordno. This makes a good deal of sense, since we might commonly have reason to look up a row in the orders table for reasons unconnected to relating entity occurrences in Customers, Agents, and Products. For example, on request, we might need to check that a past order has been properly billed and shipped. Thus, the entity Orders occurrences are dealt with individually as objects in their own right.
根据关系定义, orders CAP数据库中的表之间没有关系 Customers , Agents , 和 Products 。这是因为 (cid, aid, pid) 的行中的三元组 orders 表不识别笛卡尔积的子集, Customers × Agents × Products ,根据需要。相反,一些三倍 (cid, aid, pid) 值出现不止一次,这无疑清楚地表明了设计师的意图,因为同一个客户可以在两个不同的场合从同一个代理商处订购相同的产品。而不是一种关系, orders 表本身代表一个实体,具有标识符属性 ordno 。这很有意义,因为我们通常可能有理由在 orders 表中的原因与相关实体的出现无关 Customers , Agents , 和 Products 。例如,根据要求,我们可能需要检查过去的订单是否已正确计费和发货。因此,实体 Orders 事件本身作为对象单独处理。
Although the orders table doesn't correspond directly to a relationship, it is clear that there are any number of possible relationships we could
define in terms of the orders table between the Customers, Agents, and Products entities.
虽然 orders 表并不直接对应于关系,很明显,我们可以根据以下关系定义任意数量的可能关系: orders 表之间的 Customers , Agents , 和 Products 实体。
EXAMPLE 2.4 例2.4
Assume that we are performing a study in which we commonly need to know total sales aggregated (summed) from the orders table by customers, agents, and products for the current year. We might do this, for example, to study sales volume relationships between agents and customers, as well as between customers and products, and how those relationships are affected by geographic factors (city values). However, as we begin to plan this application, we decide that it is too inefficient to always perform sums on the orders table to access the basic measures of our study, so we decide to create a new table called yearlies. We define this new table with the following SQL commands:
假设我们正在进行一项研究,其中我们通常需要了解从 orders 表格依据 customers , agents , 和 products 今年。例如,我们可能会这样做来研究之间的销量关系 agents 和 customers ,以及之间 customers 和 products ,以及这些关系如何受到地理因素的影响( city 值)。然而,当我们开始规划这个应用程序时,我们认为总是对 orders 表来访问我们研究的基本指标,因此我们决定创建一个名为 yearlies 。我们使用以下 SQL 命令定义这个新表:
- create table yearlies (cid char(4). aid char(3). pid char(3).
- totqty integer, totdoll float);
- insert into yearlies
- select cid, aid, pid, sum(qty), sum(dollars) from orders
- group by cid, aid, pid;
Once we have the new yearlies table, the totals can be kept up to date by application logic: As each new order is entered, the relevant yearlies row should be updated as well. Now the yearlies table is a relationship, since the (cid, aid, pid) triples in the rows of the table identify a subset of the Cartesian product, Customers × Agents × Products; that is to say, there are now no repeated triples in the yearlies table. Since these triples are unique, (cid, aid, pid) forms the primary key for the yearlies table.
一旦我们有了新的 yearlies 表中,总计可以通过应用程序逻辑保持最新:输入每个新订单时,相关的 yearlies 行也应该更新。现在的 yearlies 表是一种关系,因为 (cid, aid, pid) 表的行中的三元组标识笛卡尔积的子集, Customers × Agents × Products ;也就是说,现在已经没有重复的三元组了 yearlies 桌子。由于这些三元组是唯一的, (cid, aid, pid) 形成主键 yearlies 桌子。
A relationship on more than two entities is called an n-ary relationship. The yearlies relationship on three distinct entities is also known as a ternary relationship. An n-ary relationship with n > 2 can often be replaced by a number of distinct binary relationships in an ER diagram, and this is a good idea if the replacement expresses true binary relationships
for the system. Binary relationships are the ones that are familiar to most practitioners and are sufficient for almost all
applications. However, in some cases, a ternary relationship cannot be decomposed into expressive binary relationships. The
yearlies relationship of Example 2.4 expresses customer-agent-product ordering patterns over a year, a ternary relationship that cannot be decomposed (exactly)
into binary relationships. In converting an ER design to a relational one, a relationship is sometimes translated into a relational
table, and sometimes not. (We will have more to say about this in the next section.) For example, the yearlies relationship (a ternary relationship) is translated into a relational table named yearlies. However, the manages relationship between Employees and Employees, shown in Figure 2.3, does not translate into a table of its own. Instead, this relationship is usually translated into a column in employees
identifying the mgrid to whom the employee reports. This table is shown again in Figure 2.4.
两个以上实体上的关系称为n 元关系。这 yearlies 三个不同实体上的关系也称为三元关系。具有n > 2 的n元关系通常可以被 ER 图中的许多不同的二元关系替换,如果替换表达了系统的真实二元关系,那么这是一个好主意。二元关系是大多数从业者所熟悉的关系,并且足以满足几乎所有应用程序。然而,在某些情况下,三元关系不能分解为可表达的二元关系。这 yearlies 示例 2.4的关系表达了一年内的客户-代理-产品订购模式,这是一种不能(精确地)分解为二元关系的三元关系。在将 ER 设计转换为关系设计时,关系有时会转换为关系表,有时则不会。 (我们将在下一节中对此进行更多讨论。)例如, yearlies 关系(三元关系)被转换为名为的关系表 yearlies 。然而,管理之间的关系 Employees 和 Employees ,如图 2.3所示,不会转换为它自己的表。相反,这种关系通常被转化为员工中的一列,用于标识 mgrid 员工向谁报告。该表再次如图 2.4所示。
FIGURE 2.4. A table representing an entity, Employees, and a ring (recursive relationship), manages.
图 2.4。代表实体的表, Employees ,和一个环(递归关系), manages 。
Note the surprising fact that mgrid is not considered an attribute of the Employees entity, although it exists as a column in the employees table. The mgrid column is what is known as a foreign key in the relational model, and it corresponds to the actual manages relationship in the ER diagram of Figure 2.3. We deal more with this in the next section, after we have had an opportunity to consider some of the properties of relationships.
To summarize this section, Figure 2.5(a) and (b) lists the concepts introduced up to now.
请注意一个令人惊讶的事实: mgrid 不被视为属性 Employees 实体,尽管它作为列存在 employees 桌子。这 mgrid 列就是关系模型中的外键,它对应于实际的 manages 关系如图2.3的ER图。在我们有机会考虑关系的一些属性之后,我们将在下一节中详细讨论这个问题。为了总结本节,图 2.5(a) 和 (b)列出了到目前为止介绍的概念。
FIGURE 2.5. Basic ER concepts: (a) entities and attributes, and (b) relationships.
图 2.5。基本 ER 概念:(a) 实体和属性,以及 (b) 关系。
2.2. Further Details of ER Modeling
2.2. ER 建模的更多细节
Now that we’ve defined some fundamental means of classification, let's discuss properties of relationships in the ER method
of database design.
现在我们已经定义了一些基本的分类方法,接下来我们来讨论数据库设计的 ER 方法中关系的属性。
2.2.1. Cardinality of Entity Participation in a Relationship
2.2.1.实体参与关系的基数
Figure 2.6 illustrates the concepts of minimum and maximum cardinality with which an entity participates in a relationship. Figure 2.6(a), (b), and (c) represent entities E and F on the left and right, respectively, by two sets; elements of the two sets are connected by a
line exactly when a relationship R relates the two entity occurrences represented. Thus, the connecting lines themselves represent
instances of the relation R. Note that the diagrams of Figure 2.6 are not what we refer to as ER diagrams.
图 2.6说明了实体参与关系的最小和最大基数的概念。图2.6(a)、(b)、(c)分别代表左右两个集合的实体E和F;当关系 R 关联所表示的两个实体出现时,这两个集合的元素就通过一条线连接起来。因此,连接线本身代表关系 R 的实例。请注意,图 2.6中的图并不是我们所说的 ER 图。
FIGURE 2.6. Examples of relationships R between two entities E and F.
图 2.6。两个实体 E 和 F 之间关系 R 的示例。
The minimum cardinality with which an entity takes part in a relationship is the minimum number of lines that the DBA allows
to be connected to each entity instance. Note that the diagrams of Figure 2.6 would normally only give examples of relationships at a given moment, and the line connections might change, just as the row content of a table can change, until some entity instances have different numbers of lines connected. On the other
hand, the minimum and maximum cardinality properties of an entity are meant to represent rules laid down by the DBA for all
time, rules that cannot be broken by normal database changes affecting the relationship. In Figure 2.6(a), the DBA clearly permits both entity sets E and F to take part in relationship R with minimum cardinality 0; that is to say,
the DBA does not require a connecting line for each entity instance, since some elements of both sets have no lines connected to them. We symbolize
this by writing min-card(E, R) = 0 and min-card(F, R) = 0. The maximum cardinality with which E and F take part in R is not
obvious from Figure 2.6(a), however. No entity instance has more than one line connected to it, but from an example as of a given moment we have no
guarantee that the line connections won't change in the future so that some entity instances will have more than one line
connected. However, we will assume for purposes of simple explanation that the diagrams of this figure are meant to represent
exactly the cardinalities intended by the DBA. Thus, since no entity instance of E and F in Figure 2.6(a) has more than one incident connecting line, we record this fact using the notation max-card(E, R) = 1 and max-card(F, R)
= 1.
实体参与关系的最小基数是 DBA 允许连接到每个实体实例的最小行数。请注意,图 2.6的图通常仅给出给定时刻的关系示例,并且线路连接可能会发生变化,就像表的行内容可能会发生变化一样,直到某些实体实例具有不同数量的连接线路。另一方面,实体的最小和最大基数属性旨在表示 DBA 始终制定的规则,这些规则不会被影响关系的正常数据库更改所破坏。在图2.6(a)中,DBA明确允许实体集E和F都参与最小基数为0的关系R;也就是说,DBA不需要每个实体实例都有一条连接线,因为两个集合的某些元素都没有连接到它们的线。我们通过写 min-card(E, R) = 0 和 min-card(F, R) = 0 来表示这一点。然而,从图 2.6(a)中,E 和 F 参与 R 的最大基数并不明显。 。没有实体实例有超过一根线连接到它,但从给定时刻的示例来看,我们不能保证线连接将来不会改变,因此某些实体实例将有不止一根线连接。然而,出于简单解释的目的,我们假设该图的图表旨在准确地表示 DBA 想要的基数。因此,由于图 2.6(a)中 E 和 F 的实体实例都没有多于一条事件连接线,因此我们使用 max-card(E, R) = 1 和 max-card(F, R) 记号来记录这一事实= 1。
In Figure 2.6(b), assuming once again that this set of lines is representative of the designer's intention, we can write min-card(E, R) =
0, since not every element of E is connected to a line, but min-card(F, R) = 1, since at least one line is connected to every
element of F, and our assumption implies that this won't change. We also write max-card(E, R) = N, where N means “more than
one”; this means that the designer does not intend to limit to one the number of lines connected to each entity instance of
E. However, we write max-card(F, R) = 1, since every element of F has exactly one line leaving it. Note that the two meaningful
values for min-card are 0 and 1 (where 0 is not really a limitation at all, but 1 stands for the constraint “at least one”), and the two meaningful values for max-card are 1 and N (N is not really a limitation, but
1 represents the constraint “no more than one”). We don't try to differentiate numbers other than 0, 1, and many. Since max-card(E,
R) = N, there are multiple entity instances of F connected to one of E by the relationship. For this reason, F is called the
“many” side and E is called the “one” side in this many-to-one relationship.
在图 2.6(b)中,再次假设这组线代表了设计者的意图,我们可以写 min-card(E, R) = 0,因为并非 E 的每个元素都连接到一条线上,但 min-card(E, R) = 0 -card(F, R) = 1,因为至少有一条线连接到 F 的每个元素,并且我们的假设意味着这不会改变。我们还写 max-card(E, R) = N,其中 N 表示“不止一个”;这意味着设计者并不打算将连接到 E 的每个实体实例的线数限制为 1。但是,我们编写 max-card(F, R) = 1,因为 F 的每个元素都恰好有一条线离开它。请注意,min-card 的两个有意义的值是 0 和 1(其中 0 根本不是真正的限制,而是 1 代表“至少一个”约束),max-card 的两个有意义的值是 1 和N(N 并不是真正的限制,但 1 代表“不超过一个”的约束)。我们不会尝试区分 0、1 和许多以外的数字。由于 max-card(E, R) = N,因此 F 的多个实体实例通过关系连接到 E 之一。因此,在这种多对一关系中,F 称为“多”方,E 称为“一”方。
Note particularly that the “many” side in a many-to-one relationship is the side that has max-card value 1! In Figure 2.6(b), the entity F corresponds to the “many” side of the many-to-one relationship, even though it has min-card(F, R) = max-card(F,
R) = 1. As just explained, the “one” side of a many-to-one relationship is the side where some entity instances can participate
in multiple relationship instances, “shooting out multiple lines” to connect to many entity instances on the “many” side! Phrased this way the terminology makes sense, but this seems to be an easy idea to forget,
and forgetting it can lead to serious confusion.
特别注意,多对一关系中的“多”方是max-card 值为 1 的一方!在图 2.6(b)中,实体 F 对应于多对一关系的“多”方,尽管它的 min-card(F, R) = max-card(F, R) = 1。正如刚才所解释的,多对一关系的“一”侧是一些实体实例可以参与多个关系实例的一侧,“射出多条线”以连接到“多”侧的许多实体实例!以这种方式表述,术语是有道理的,但这似乎是一个很容易忘记的想法,并且忘记它可能会导致严重的混乱。
In Figure 2.6(c) we have min-card(E, R) = 0, min-card(F, R) = 0, max-card(E, R) = N, and max-card(F, R) = N. The meaning of the terms used
for the three diagrams—one-to-one relationship, many-to-one relationship, and many-to-many relationship—are defined later.
在图 2.6(c)中,我们有 min-card(E, R) = 0、min-card(F, R) = 0、max-card(E, R) = N 和 max-card(F, R) = N。这三个图所用术语的含义(一对一关系、多对一关系和多对多关系)稍后定义。
EXAMPLE 2.5 例2.5
In the relationship teaches of Figure 2.3, Instructors teaches Course_sections, the DBA would probably want to make a rule that each course section needs to have at least one instructor assigned to teach it by writing min-card(Course_sections, teaches) = 1. However, we need to be careful in making such a rule, since it means that we will not be able to create a new course section, enter it in the database, assign it a room and a class period, and allow students to register for it, while putting off the decision of who is going to teach it. The DBA might also make the rule that at most one instructor can be assigned to teach a course section by writing max-card(Course_sections, teaches) = 1. On the other hand, if more than one instructor were allowed to share the teaching of a course section, the DBA would write max-card(Course_sections, teaches) = N. This is clearly a significant difference. We probably don't want to make the rule that every instructor teaches some course section (written as min-card(Instructors, teaches) = 1), because an instructor might be on leave, so we settle on min-card(Instructors, teaches) = 0. And in most universities the course load per instructor is greater than one in any given term, so we would set max-card(Instructors, teaches) = N.
在关系中 teaches 图2.3 , Instructors teaches Course_sections ,DBA 可能希望制定一条规则,即每个课程部分都需要至少分配一名讲师通过编写 min-card( Course_sections , teaches ) = 1. 但是,我们在制定这样的规则时需要小心,因为这意味着我们将无法创建新的课程部分,将其输入数据库,为其分配房间和课时,并允许学生注册该课程,同时推迟决定由谁来教授该课程。 DBA 还可能制定规则,通过编写 max-card( Course_sections , teaches ) = 1。另一方面,如果允许多名讲师分享课程部分的教学,则 DBA 会编写 max-card( Course_sections , teaches ) = N。这显然是一个显着差异。我们可能不想制定每个讲师教授某些课程部分的规则(写为 min-card( Instructors , teaches ) = 1),因为教练可能休假,所以我们选择 min-card( Instructors , teaches ) = 0。在大多数大学中,每个教师的课程负担在任何给定学期都大于 1,因此我们将设置 max-card( Instructors , teaches ) = N。
Definition. 定义。
When an entity E takes part in a relationship R with min-card(E, R) = x (x is either 0 or 1) and max-card(E, R) = y (y is either 1 or N), then in the ER diagram the connecting line between E and R can be labeled with the ordered cardinality pair (x, y). We use a new notation to represent this minimum-maximum pair (x, y): card(E, R) = (x, y).
当实体 E 参与关系 R 且 min-card(E, R) = x(x 为 0 或 1)且 max-card(E, R) = y(y 为 1 或 N)时,则在 ER 图中,E 和 R 之间的连接线可以用有序基数对 (x, y) 标记。我们使用一种新的符号来表示这个最小-最大对 (x, y):card(E, R) = (x, y)。
According to the above definition and the assignments of Example 2.5, the edge connecting the entity Course_sections to the relationship teaches should be labeled with the pair (1, 1). In Figure 2.7 we repeat the ER diagrams of Figure 2.3, with the addit ion of ordered pairs (x, y) labeling line connections, to show the minimum and maximum cardinalities for
all ER pairs. The cardinality pair for the Instructors teaches Course_sections diagram follows the discussion of Example 2.5, and other diagrams are filled in with reasonable pair values. We make a number of decisions to arrive at the following rules:
Every employee must work on at least one project (but may work on many); a project might have no employees assigned during
some periods (waiting for staffing), and of course some projects will have a large number of employees working on them; an
employee who acts in the manager_of role (see discussion below) may be managing no other employees at a given time and still be called a manager; and an employee
reports to at most one manager, but may report to none (this possibility exists because there must always be a highest-level
employee in a hierarchy who has no manager).
根据上面的定义和例2.5的赋值,连接实体的边 Course_sections 对关系 teaches 应标有 (1, 1) 对。在图 2.7中,我们重复图 2.3的 ER 图,并添加有序对 (x, y) 标记线连接,以显示所有 ER 对的最小和最大基数。的基数对 Instructors teaches Course_sections 该图遵循例 2.5的讨论,其他图填充了合理的对值。我们做出一系列决定来达成以下规则: 每个员工必须至少参与一个项目(但也可以参与多个项目);一个项目可能在某些时期没有分配员工(等待人员配备),当然有些项目会有大量员工在工作;从事以下活动的雇员 manager_of 角色(参见下面的讨论)可能在给定时间不管理其他员工,但仍称为经理;一名员工最多向一名经理汇报工作,但也可能不向任何人汇报(这种可能性是存在的,因为在层级结构中必定始终存在一名没有经理的最高级别员工)。
FIGURE 2.7. An ER diagram with labels (x, y) on ER connections.
图 2.7。 ER 图,ER 连接上带有标签 (x, y)。
In the Employees-manages diagram shown in Figure 2.7, the normal notation, card(Employees, manages), would be ambiguous. We say that there are two different roles played by the Employees entity in the relationship: the manager_of role and the reports_to role. Each relationship instance in manages connects a managed employee (Employees instance in the reports_to role) to a manager employee (Employees instance in the manager_of role). We use the cardinality notation with entities having parenthesized roles to remove ambiguity.
在 Employees-manages 示意图如图2.7所示,正常表示法, card(Employees , manages) ,会产生歧义。我们说,有两种不同的角色 Employees 关系中的实体: manager_of 角色和 reports_to 角色。中的每个关系实例 manages 连接受管理员工( Employees 实例中的 reports_to 角色)到经理员工( Employees 实例中的 manager_of 角色)。我们对具有括号角色的实体使用基数表示法来消除歧义。
- card(Employees(reports_to). manages) = (0. 1)
- card(Employees(manager_of). manages) = (0. N)
And from these cardinalities we see that an employee who acts in the manager_of role may be managing no other employees at a given time and still be called a manager; and an employee reports to at most
one manager, but may report to none (because of the highest-level employee in a hierarchy who has no manager—if it weren't
for that single person, we could give the label (1, 1) to the reports_to branch of the Employees-manages edge).
从这些基数中我们可以看出,从事以下工作的员工 manager_of 角色可能在给定时间不管理其他员工,但仍称为经理;一名员工最多向一位经理汇报,但也可能不向任何人汇报(因为层次结构中最高级别的员工没有经理,如果不是那个人,我们可以给标签 (1, 1 )到 reports_to 的分支 Employees-manages 边缘)。
Definition. 定义。
When an entity E takes part in a relationship R with max-card(E, R) = 1, then E is said to have single-valued participation in the relationship R. If max-card(E, R) = N, then E is said to be multivalued in this relationship. A binary relationship R between entities E and F is said to be many-to-many, or N-N, if both entities E and F are multi-valued in the relationship. If both E and F are single-valued, the relationship is said to be one-to-one, or 1-1. If E is single-valued and F is multivalued, or the reverse, the relationship is said to be many-to-one, or N-1. (We do not normally speak of a 1-N relationship as distinct from an N-1 relationship.)
当实体 E 参与关系 R 且 max-card(E, R) = 1 时,则称 E 在关系 R 中单值参与。如果 max-card(E, R) = N,则E 在这种关系中被认为是多值的。如果实体 E 和 F 在关系中都是多值的,则实体 E 和 F 之间的二元关系 R 被称为多对多(即 NN)。如果 E 和 F 都是单值,则该关系称为一对一或 1-1。如果 E 是单值且 F 是多值(反之亦然),则该关系称为多对一或 N-1。 (我们通常不会将 1-N 关系与 N-1 关系区分开来。)
2.2.2. One-to-One, Many-to-Many, and Many-to-One Relationships
2.2.2.一对一、多对多和多对一关系
Recall that the “many” side in a many-to-one relationship is the side that has single-valued participation. This might be
better understood by considering the relationship in Figure 2.7, Instructors teaches Course_sections, where card(Course_sections, teaches) = (1, 1), and the Course_sections entity represents the “many” side of the relationship. This is because one instructor teaches “many” course sections, while
the reverse is not true.
回想一下,多对一关系中的“多”方是具有单值参与的一方。通过考虑图 2.7中的关系可能会更好地理解这一点, Instructors teaches Course_sections , 在哪里 card(Course_sections , teaches) = (1, 1),并且 Course_sections 实体代表关系的“多”方。这是因为一位讲师教授“许多”课程部分,而反之则不然。
In the last definition, we see that the values max-card(E, R) and max-card(F, R) determine whether a binary relationship is
many-to-many, many-to-one, or one-to-one. On the other hand, the values min-card(E, R) and min-card(F, R) are not mentioned,
and they are said to be independent of these characterizations. In particular, the fact that min-card(F, R) = 1 in Figure 2.6(b) is independent of the fact that that figure represents a many-to-one relationship. If there were additional elements in entity
F that were not connected by any lines to elements in E (but all current connections remained the same), this would mean that
min-card(F, R) = 0, but the change would not affect the fact that R is a many-to-one relationship. We would still see one
element of E (the second from the top) related to two elements of F; in this case, the entity F is the “many” side of the
relationship.
在最后一个定义中,我们看到值 max-card(E, R) 和 max-card(F, R) 确定二元关系是多对多、多对一还是一对一一。另一方面,没有提到值 min-card(E, R) 和 min-card(F, R),并且据说它们与这些特征无关。特别是,图 2.6(b)中 min-card(F, R) = 1 的事实与该图表示多对一关系的事实无关。如果实体 F 中还有其他元素没有通过任何线路连接到 E 中的元素(但所有当前连接保持不变),这将意味着 min-card(F, R) = 0,但更改不会影响R 是多对一关系这一事实。我们仍然会看到 E 的一个元素(从顶部数第二个)与 F 的两个元素相关;在这种情况下,实体 F 是关系的“多”方。
Although min-card(E, R) and min-card(F, R) have no bearing on whether a binary relationship R is many-to-many, many-to-one,
or one-to-one, a different characterization of entity participation in a relationship is determined by these quantities.
尽管 min-card(E, R) 和 min-card(F, R) 与二元关系 R 是多对多、多对一还是一对一无关,但不同的表征实体参与关系的程度由这些数量决定。
Definition. 定义。
When an entity E that participates in a relationship R has min-card(E, R) = 1, E is said to have mandatory participation in R, or is simply called mandatory in R. An entity E that is not mandatory in R is said to be optional, or to have optional participation.
当参与关系 R 的实体 E 的 min-card(E, R) = 1 时,称 E 在 R 中强制参与,或者简称为 R 中强制。在 R 中非强制的实体 E 为据说是可选的,或者有可选的参与。
2.2.3. Transforming Binary Relationships to Relations
2.2.3.将二元关系转变为关系
We are now prepared to give the transformation rule for a binary many-to-many relationship.
我们现在准备给出二元多对多关系的转换规则。
- Transformation Rule 3. N–N Relationships: When two entities E and F take part in a many-to-many binary relationship R, the relationship is mapped to a representative
table T in the related relational database design. The table contains columns for all attributes in the primary keys of both
tables transformed from entities E and F, and this set of columns forms the primary key for the table T. Table T also contains
columns for all attributes attached to the relationship. Relationship occurrences are represented by rows of the table, with
the related entity instances uniquely identified by their primary key values as rows.▪
转换规则3.N-N关系:当两个实体E和F参与多对多二元关系R时,该关系被映射到相关关系数据库设计中的代表表T。该表包含从实体 E 和 F 转换而来的两个表的主键中所有属性的列,并且这组列形成表 T 的主键。表 T 还包含附加到关系的所有属性的列。关系出现由表的行表示,相关实体实例由其主键值唯一标识为行。▪
EXAMPLE 2.6 例2.6
In Figure 2.7, the relationship works_on is many-to-many between the entities Employees and Projects. The relational design in Figure 2.8 follows Transformation Rule 1 to provide a table for the entity Employees (as specified in Example 2.2) and a table for the entity Projects; it also follows Transformation Rule 3, to provide a table for the relationship works_on.
在图2.7中,关系 works_on 实体之间是多对多 Employees 和 Projects 。图2.8中的关系设计遵循转换规则1,为实体提供一个表 Employees (如示例 2.2中指定)和实体的表 Projects ;它还遵循转换规则 3,为关系提供一个表格 works_on 。FIGURE 2.8. Relational design for Employees works_on Projects of Figure 2.7.
图 2.8。关系设计 Employees works_on Projects 图2.7 。We generally assume that the eid column in the employees table and prid column for the projects table cannot take on null values, since they are the primary keys for their tables and must differentiate all rows by unique values. Similarly, the (eid, prid) pair of columns in the works_on table cannot take on null values in either component, since each row must uniquely designate the employee–project pair related. Note that no primary key column of a relational table can take on null values. Note that although we refer to this as the entity integrity rule, it applies as well to tables arising out of the relationships in the ER model. Note also that the SQL Create Table command provides syntax to impose an integrity constraint on a table that guarantees this rule will not be broken, that no nulls will be assigned. For example, the SQL statement
我们一般假设 eid 栏目中的 employees 表和 prid 列为 projects 表不能采用空值,因为它们是表的主键,并且必须通过唯一值区分所有行。同样, (eid, prid) 中的一对列 works_on 表中的任何一个组件都不能采用空值,因为每一行必须唯一地指定相关的员工-项目对。请注意,关系表的主键列不能采用空值。请注意,虽然我们将此称为实体完整性规则,但它也适用于由 ER 模型中的关系产生的表。另请注意,SQL Create Table 命令提供了对表施加完整性约束的语法,以保证此规则不会被破坏,并且不会分配空值。例如,SQL 语句
- create table projects (prid char(3) not null...);
guarantees that the prid column of the projects table cannot take on null values as a result of later Insert, Delete, or Update statements. There are other constraints as well that have this effect.
保证 prid 的栏目 projects 表不能因后面的 Insert、Delete 或 Update 语句而采用空值。还有其他限制也会产生这种影响。
- Transformation Rule 4. N–1 Relationships: When two entities E and F take part in a many-to-one binary relationship R, the relationship will not be mapped to a table
of its own in a relational database design. Instead, if we assume that the entity F has max-card(F, R) = 1 and thus represents
the “many” side of the relationship, the relational table T transformed from the entity F should include columns constituting
the primary key for the table transformed from the entity E; this is known as a foreign key in T. Since max-card(F, R) = 1, each row of T is related by a foreign key value to at most one instance of the entity E.
If F has mandatory participation in R, then it must be related to exactly one instance of E, and this means that the foreign
key in T cannot take on null values. If F has optional participation in R, then each row of T that is not related can have
null values in all columns of the foreign key. ▪
转换规则 4. N-1 关系:当两个实体 E 和 F 参与多对一二元关系 R 时,在关系数据库设计中,该关系不会映射到其自己的表。相反,如果我们假设实体 F 具有 max-card(F, R) = 1 并因此代表关系的“多”方,则从实体 F 转换而来的关系表 T 应该包括构成实体 F 的主键的列。由实体E转化而来的表;这被称为 T 中的外键。由于 max-card(F, R) = 1,T 的每一行都通过外键值与实体 E 的最多一个实例相关。如果 F 强制参与 R ,那么它必须与 E 的一个实例相关,这意味着 T 中的外键不能采用空值。如果 F 可选参与 R,则 T 中不相关的每一行在外键的所有列中都可以具有空值。 ▪
EXAMPLE 2.7 例2.7
Figure 2.9 shows a relational transformation of the Instructors teaches Course_sections ER diagram of Figure 2.7. Recall that we made the rule that one instructor can teach multiple course sections, but each course section can have only one instructor. The insid column in the Course_sections table is a foreign key, relating a course_sections instance (row) to a unique instructors instance (row).
图 2.9显示了关系变换 Instructors teaches Course_sections ER图如图2.7 。回想一下,我们制定了规则,一名教师可以教授多个课程部分,但每个课程部分只能有一名教师。这 insid 栏目中的 Course_sections 表是一个外键,关联一个 course_sections 实例(行)到唯一 instructors 实例(行)。FIGURE 2.9. Relational design for Instructors teaches Course_sections of Figure 2.7.
图 2.9。关系设计 Instructors teaches Course_sections 图2.7 。The Create Table command in SQL can require a column not to take on null values; therefore, it is possible to guarantee a faithful representation for mandatory participation by the “many” side entity in a many-to-one relationship. Here we can create the course_sections table so no nulls are allowed in the insid column. What we mean by “faithful” is that it becomes impossible for a user to corrupt the data by a thoughtless update, because SQL does not allow a course_sections row with a null value for insid. SQL can also impose a constraint that the foreign key insid value in a row of the course_sections table actually exists as a value in the insid primary key column in the instructors table. This constraint is known as referential integrity.
SQL 中的“创建表”命令可以要求列不采用空值;因此,可以保证多对一关系中“多”方实体强制参与的忠实代表。在这里我们可以创建 course_sections 表中不允许有空值 insid 柱子。我们所说的“忠实”是指用户不可能通过轻率的更新来破坏数据,因为 SQL 不允许 course_sections 具有空值的行 insid 。 SQL 还可以施加一个约束,即外键 insid 的一行中的值 course_sections 表实际上作为值存在于 insid 中的主键列 instructors 桌子。此约束称为引用完整性。
Unfortunately, it is not possible in standard SQL to guarantee a mandatory participation by the “one” side of a many-to-one
relationship, or by either side of a many-to-many relationship. Thus, in Example 2.7 there would be no way to provide a faithful representation in an SQL table definition that would guarantee that every instructor teaches at least one
course.
不幸的是,在标准 SQL 中不可能保证多对一关系的“一”方或多对多关系的任何一方强制参与。因此,在示例 2.7中,无法在 SQL 表定义中提供忠实的表示,以保证每位教师至少教授一门课程。
Note that there are differences of opinion among texts on some of these ER transformation rules for relationships. Teorey
(1994) gives the equivalent to Transformation Rule 4 for N-1 relationships, but Batini et al. (1992) provides an alternate
transformation where the relationship is mapped onto a table of its own if the entity at the “many” side of the relationship
has an optional participation. The reason for this is to avoid possibly heavy use of null values in the foreign key (insid in course_sections in Example 2.7); but since there seems to be nothing wrong with using null values, we follow the transformation of Teorey (1994).
请注意,文本之间对于某些关系的 ER 转换规则存在不同意见。 Teorey (1994) 给出了 N-1 关系的变换规则 4 的等价物,但 Batini 等人。 (1992) 提供了一种替代转换,如果关系“多”方的实体具有可选参与权,则关系将映射到其自己的表上。这样做的原因是为了避免在外键中可能大量使用空值( insid 在 course_sections 例 2.7中);但由于使用空值似乎没有什么问题,所以我们遵循 Teorey (1994) 的转换。
- Transformation Rule 5.1-1 Relationships, Optional Participation: Given two entities E and F that take part in a one-to-one binary relationship R, where participation is optional on either
side, we wish to translate this situation into a relational design. To do this, we create a table S to represent the entity
E, following the prescription of Transformation Rule 1, and similarly a table T to represent the entity F. Then we adjoin
to the table T a set of columns (as a foreign key) constituting the primary key for table S. If we wish, we may also adjoin
to table S a foreign key set of columns referring to the primary key of table T. For any relationship instance in R, a unique
entity instance in E is related to a unique instance in F—in the corresponding rows of S and T, the foreign key column values
filled in to reference the row in the other table arising from the instances related by R.▪
转换规则 5.1-1 关系,可选参与:给定参与一对一二元关系 R 的两个实体 E 和 F,其中任何一方的参与都是可选的,我们希望将这种情况转换为关系设计。为此,我们按照转换规则 1 的规定创建一个表 S 来表示实体 E,并类似地创建一个表 T 来表示实体 F。然后,我们将一组列(作为外键)与表 T 相连) 构成表 S 的主键。如果我们愿意,我们还可以将引用表 T 的主键的外键列集与表 S 相邻。对于 R 中的任何关系实例,E 中的唯一实体实例都是相关的到 F 中的唯一实例 - 在 S 和 T 的相应行中,填充外键列值以引用由 R 相关实例产生的其他表中的行。 - Transformation Rule 6.1-1 Relationships, Mandatory Participation on Both Sides: In the case of a one-to-one relationship with mandatory participation on both sides, it is most appropriate to combine the
tables for the two entities into one, and in this way avoid any foreign keys.▪
转换规则6.1-1关系,双方强制参与:在一对一关系,双方强制参与的情况下,将两个实体的表合并为一个是最合适的,这样避免任何外键。▪
We do not present transformation rules for all possible n-ary relationships with n > 2. Usually such an n-ary relationship is transformed into a table of its own, but if all but one of the entities of the relationship participate
with max-card = 1, then it is possible to represent the relationship with n − 1 foreign keys in the one table that participates with greater cardinality.
我们不会为所有可能的n元关系提供n > 2 的转换规则。通常,这样的n元关系会转换为它自己的表,但如果关系中除一个实体之外的所有实体都参与 max-如果card = 1,则可以用参与基数更大的一张表中的n -1个外键来表示关系。
2.3. Additional ER Concepts
2.3.其他 ER 概念
In this section we introduce a number of additional concepts useful for ER modeling.
在本节中,我们将介绍一些对 ER 建模有用的附加概念。
2.3.1. Cardinality of Attributes
2.3.1.属性的基数
To begin with, we note that the min-card/max-card notation can be used to describe the cardinality of attributes attached
to entities.
首先,我们注意到最小卡/最大卡符号可用于描述附加到实体的属性的基数。
Definition. 定义。
Given an entity E and an attached attribute A, we write min-card(A, E) = 0 to indicate that the attribute A is optional, and min-card(A, E) = 1 to indicate that the attribute A is mandatory. An attribute that is mandatory should correspond to a column declared in the table representing the entity E with no nulls allowed. We write max-card(A, E) = 1 to indicate that the attribute is single valued, and max-card(A, E) = N to indicate that the attribute is multivalued. An attribute A is said to have card(A, E) = (x, y) when min-card(A, E) = x and max-card(A, E) = y. The (x, y) pair can be used to label an attribute–entity connection in an ER diagram to indicate the cardinality of the attribute.
给定实体 E 和附加属性 A,我们写 min-card(A, E) = 0 表示属性 A 是可选的,min-card(A, E) = 1 表示属性 A 是强制的。强制属性应对应于表中声明的表示实体 E 的列,不允许有空值。我们写 max-card(A, E) = 1 表示该属性是单值属性,写 max-card(A, E) = N 表示该属性是多值属性。当 min-card(A, E) = x 且 max-card(A, E) = y 时,属性 A 被称为具有 card(A, E) = (x, y)。 (x, y) 对可用于标记 ER 图中的属性-实体连接,以指示属性的基数。
Attributes that have unlabeled connectors in an ER diagram can be assumed to have cardinality (0, 1) if they are descriptor
attributes, and cardinality (1, 1) if they are identifier attributes. Figure 2.10 recapitulates Figure 2.2 with labeled attribute–entity connectors. (Note that these are not the default cardinalities only because of lack of notation.)
如果 ER 图中具有未标记连接器的属性是描述符属性,则可以假定其基数为 (0, 1);如果它们是标识符属性,则可以假定其基数为 (1, 1)。图 2.10概括了图 2.2,并带有标记的属性-实体连接器。 (请注意,这些不是默认基数,只是因为缺乏符号。)
FIGURE 2.10. ER diagrams with labeled attribute–entity connectors.
图 2.10。带有标记的属性-实体连接器的 ER 图。
In Figure 2.10 we note that the attribute midinitial is optional (some people don't have middle names). The composite attribute student_names is mandatory for Students, but emp_address is optional for Employees. However, given that emp_address exists, all four simple attributes making up the address are mandatory. Both sid and eid have cardinality (1, 1); this is always the case for entity identifiers. The multivalued hobbies attribute has max-card N, as we can also tell from the fact that it is connected to its entity by a double line. The fact that min-card(hobbies, Employees) = 1 is somewhat surprising and indicates that the employee must name at least one hobby for inclusion in the database.
在图 2.10中我们注意到该属性 midinitial 是可选的(有些人没有中间名)。复合属性 student_names 是强制性的 Students , 但 emp_address 是可选的 Employees 。然而,鉴于 emp_address 存在时,构成地址的所有四个简单属性都是强制性的。两个都 sid 和 eid 有基数 (1, 1);对于实体标识符来说总是如此。多值化 hobbies 属性具有最大卡数 N,我们还可以从它通过双线连接到其实体的事实看出。事实上,min-card( hobbies , Employees ) = 1 有点令人惊讶,它表明员工必须至少说出一项爱好才能包含在数据库中。
Definition:Weak Entity. 定义:弱实体。
A weak entity is an entity whose occurrences are dependent for their existence, through a relationship R, on the occurrence of another (strong) entity.
弱实体是这样的实体,其出现通过关系 R 依赖于另一个(强)实体的出现。
2.3.2. Weak Entities 2.3.2.弱实体
As an example, we have been assuming in our CAP design that an order specifies a customer, agent, product, quantity, and dollar
cost. A common design variant that allows multiple products to be ordered at once will create an orders table that relates to customers and agents rows, as well as a line_items table containing individual product purchases; a number of rows in the line_items table relate to one master orders occurrence. The design of this in the ER model is given in Figure 2.11.
例如,我们在 CAP 设计中假设订单指定了客户、代理商、产品、数量和美元成本。允许同时订购多种产品的常见设计变体将创建一个 orders 涉及到的表 customers 和 agents 行,以及 line_items 包含单个产品购买的表格;中的许多行 line_items 表与一位主控相关 orders 发生。 ER 模型中的设计如图 2.11所示。
FIGURE 2.11. A weak entity, Line_items, dependent on the entity Orders.
图 2.11。一个弱小的实体, Line_items ,取决于实体 Orders 。
As we see, the entity Orders is optional in its relationship to Line_items, since each order must start without any line items. Line_items is mandatory in the relationship, because a line-item order for a product cannot exist without a master order containing
it to specify the customer and agent for the order. If the Orders occurrence goes away (the customer cancels it), all occurrences of the weak entity Line_items will likewise disappear. A dead giveaway for a weak entity is the fact that the primary identifier for Line_items (lineno) is only meaningful within some order. In fact, what this implies is that the primary identifier for the weak entity Line_items must include the attributes in the primary identifier for the Orders entity. Attributes such as Line_items are known as external identifier attributes.
正如我们所看到的,实体 Orders 在其关系中是可选的 Line_items ,因为每个订单都必须在没有任何行项目的情况下开始。 Line_items 在关系中是强制性的,因为如果没有包含产品的行项目订单来指定订单的客户和代理,则产品的行项目订单就不可能存在。如果 Orders 事件消失(客户取消它),弱实体的所有事件 Line_items 同样也会消失。对于弱实体来说,一个致命的弱点是,其主要标识符 Line_items ( lineno )仅在一定顺序内才有意义。事实上,这意味着弱实体的主要标识符 Line_items 必须在主标识符中包含属性 Orders 实体。属性如 Line_items 称为外部标识符属性。
When the Line_items weak entity is mapped to a relational table line_items, an ordno column is included by Transformation Rule 4 to represent the N-1 has_item relationship; thus, the primary key for the line_items table is constructed from the external attribute ordno and the weak entity identifier lineno. Note that it is also sometimes difficult to distinguish between a weak entity and a multivalued attribute. For example,
hobbies in Example 2.2 could be identified as a weak entity Hobbies, with an identifier hobby_name. However, Figure 2.11 obviously implies Line_items is a weak entity rather than a multivalued attribute, since Line_items is separately related to another entity, Products.
当 Line_items 弱实体映射到关系表 line_items , 一个 ordno 列包含在转换规则 4 中以表示 N-1 has_item 关系;因此,主键为 line_items 表是根据外部属性构造的 ordno 和弱实体标识符 lineno 。请注意,有时也很难区分弱实体和多值属性。例如, hobbies 例2.2中的可以被识别为弱实体 Hobbies ,带有标识符 hobby_name 。然而,图2.11显然意味着 Line_items 是一个弱实体而不是一个多值属性,因为 Line_items 与另一个实体单独相关, Products 。
2.3.3. Generalization Hierarchies
2.3.3.泛化层次结构
Finally, we introduce the concept of a generalization hierarchy or generalization relationship. The idea is that several entities with common attributes can be generalized into a higher-level supertype entity, or, alternatively, a general entity can be decomposed into lower-level subtype entities. The purpose is to attach attributes at the proper level and thus avoid having attributes of a common entity that require
a large number of null values in each entity instance. For example, assume that we distinguish between Managers and Non_managers as subtype entities of the supertype Employees (see Figure 2.12). Then attributes such as expenseno (for expense reports) can be attached only to the Managers entity, while nonmanager attributes such as union status can be attached to Non_managers. Consultants might form another entity type sharing many properties with Employees, and we could create a new supertype entity named Persons to contain them both. An ER diagram showing a generalization hierarchy normally has arrows (unnamed) directed from the subtype
to the supertype entities.
最后,我们引入泛化层次或泛化关系的概念。这个想法是,具有共同属性的多个实体可以概括为更高级别的超类型实体,或者,通用实体可以分解为较低级别的子类型实体。目的是在适当的级别附加属性,从而避免在每个实体实例中具有需要大量空值的公共实体的属性。例如,假设我们区分 Managers 和 Non_managers 作为超类型的子类型实体 Employees (见图2.12 )。然后属性如 expenseno (用于费用报告)只能附加到 Managers 实体,而非管理者属性,例如 union status 可以附加到 Non_managers 。顾问可能会形成另一种实体类型,与它们共享许多属性 Employees ,我们可以创建一个名为的新超类型实体 Persons 来遏制他们两个。显示泛化层次结构的 ER 图通常具有从子类型指向超类型实体的箭头(未命名)。
FIGURE 2.12. A generalization hierarchy with examples of attributes attached.
图 2.12。带有附加属性示例的泛化层次结构。
The arrow relationship between the subtype entity and the supertype entity is often referred to as an is-a relationship, since a consultant is a person, a manager is an employee, and so forth. Object-relational database systems express these concepts using type inheritance, where objects (rows) of a given subtype contain specific attributes but inherit all attributes of their supertype. In particular, INFORMIX and SQL-99 support inheritance of object types.
子类型实体和父类型实体之间的箭头关系通常称为is-a 关系,因为顾问是人,经理是雇员,等等。对象关系数据库系统使用类型继承来表达这些概念,其中给定子类型的对象(行)包含特定属性,但继承其超类型的所有属性。特别是,INFORMIX 和 SQL-99 支持对象类型的继承。
The relational model provides no support for the concept of generalization hierarchy, so it is necessary to reconfigure such
a design element into simpler concepts. This can happen either prior to transformation into relational tables or as part of
the transformation. Here we give an idea of how to perform such a reconfiguration while remaining in the ER model, before
transformation into a relational representation. We consider one level of generalization hierarchy at a time and give two
alternatives.
关系模型不提供对泛化层次结构概念的支持,因此需要将这样的设计元素重新配置为更简单的概念。这可以在转换为关系表之前发生,也可以作为转换的一部分发生。在这里,我们给出了在转换为关系表示之前如何在保留 ER 模型的同时执行此类重新配置的想法。我们一次考虑一层泛化层次并给出两种选择。
- We can collapse a one-level generalization hierarchy of subtype and supertype entities into a single entity by adding all
attributes of the subtype entities to the supertype entity. An additional attribute must be added to this single entity, which
will discriminate among the various types. As an example, the Employees entity in Figure 2.12 could be augmented to represent managers and nonmanagers as well, by affixing the attributes union_no, expenseno, and emptype to the Employee entity. Now the union_no attribute will be null when emptype has value “Manager,” and similarly expenseno will be null when emptype is “Nonmanager.” The emptype attribute might also designate the supertype case, an important alternative when some entity instances in the supertype fall
in none of the named subtypes.
通过将子类型实体的所有属性添加到超类型实体,我们可以将子类型和超类型实体的一级泛化层次结构折叠为单个实体。必须向这个单一实体添加一个附加属性,这将区分各种类型。举个例子, Employees 通过附加属性,图 2.12中的实体也可以扩展为代表管理者和非管理者 union_no , expenseno , 和 emptype 到 Employee 实体。现在的 union_no 当以下情况时属性将为空 emptype 具有值“Manager”,类似地 expenseno 将为空,当 emptype 是“非经理”。这 emptype 属性还可以指定超类型情况,当超类型中的某些实体实例不属于任何指定子类型时,这是一个重要的替代方案。 - We can retain the supertype entity and all subtype entities as full entities and create explicit named relationships to represent
the is-a relationships.
我们可以将超类型实体和所有子类型实体保留为完整实体,并创建显式命名关系来表示 is-a 关系。
Alternative 2 is particularly useful when the various subtypes and supertype are quite different in attributes and are handled
differently by application logic.
当各种子类型和超类型的属性差异很大并且应用程序逻辑的处理方式不同时,替代方案 2 特别有用。
We do not investigate all concepts of the ER model in full depth here. See the references at the end of this chapter for a
list of texts devoted to complete coverage of the ER model and logical database design.
我们在这里不会全面深入研究 ER 模型的所有概念。请参阅本章末尾的参考资料,获取专门完整介绍 ER 模型和逻辑数据库设计的文本列表。
2.4. Case Study 2.4.案例研究
Let us try to perform an ER design from the beginning, ending up with a set of relational tables. Consider a simple airline
reservation database handling (only) outgoing flights from one airline terminal. We need to keep track of passengers, flights,
departure gates, and seat assignments. We could get almost arbitrarily complex in a real design, since a “flight” actually
brings together a flight crew and an airplane, serviced by a ground crew, slotted into a regularly scheduled departure time
with an assigned flight number on a specific date. But for simplicity, we will assume that we can represent flights with an
entity Flights, having primary identifier flightno (unique identifier values, not repeated on successive days) and descriptive attribute depart_time (actually made up of date and time); other details will be hidden from us. Passengers are represented by another entity,
Passengers, with primary identifier attribute ticketno; a passenger has no other attribute that we care about. We also need to keep track of seats for each flight. We assume that
each seat is an entity instance in its own right, an entity Seats, identified by a seat number, seatno, valid only for a specific flight (different flights might have different airplane seat layouts, and therefore different
sets of seat numbers). We see therefore that seat assignment is a relationship between Passengers and Seats, which we name seat_assign.
让我们尝试从头开始执行 ER 设计,最终得到一组关系表。考虑一个简单的航空公司预订数据库(仅)处理从一个航空公司航站楼出发的航班。我们需要跟踪乘客、航班、登机口和座位分配。在实际设计中,我们可以变得几乎任意复杂,因为“航班”实际上将机组人员和飞机聚集在一起,由地勤人员提供服务,并在特定日期分配到定期起飞时间和指定的航班号。但为了简单起见,我们假设我们可以用一个实体来表示航班 Flights ,具有主标识符 flightno (唯一标识符值,连续几天不重复)和描述性属性 depart_time (实际上由日期和时间组成);其他细节将对我们隐藏。乘客由另一个实体代表, Passengers ,具有主标识符属性 ticketno ;乘客没有我们关心的其他属性。我们还需要跟踪每个航班的座位。我们假设每个座位本身就是一个实体实例,一个实体 Seats ,由座位号标识, seatno ,仅对特定航班有效(不同航班可能有不同的飞机座位布局,因此座位号也不同)。因此我们看到座位分配是以下关系: Passengers 和 Seats ,我们命名为 seat_assign 。
Now think about this specification for a moment. The Passengers entity is easy to picture, and so is the Flights entity. The depart_time attribute for Flights is composite, consisting of simple attributes dtime and ddate. We can add another entity Gates, with primary identifier gateno. We have already defined a Seats entity, but the entity seems to be a little strange: The seatno primary identifier for Seats is only meaningful when related to a Flights instance. This is what is referred to in the previous section as a weak entity, and thus there must be a relationship between
Flights and Seats, which we name has_seat. The identifier for Seats is partially external, encompassing the identifier of the containing flight.
现在考虑一下这个规范。这 Passengers 实体很容易描绘,因此 Flights 实体。这 depart_time 属性为 Flights 是复合的,由简单的属性组成 dtime 和 ddate 。我们可以添加另一个实体 Gates ,带有主标识符 gateno 。我们已经定义了一个 Seats 实体,但实体似乎有点奇怪: seatno 主要标识符 Seats 仅当与某个相关时才有意义 Flights 实例。这就是上一节所说的弱实体,因此它们之间必定存在关系 Flights 和 Seats ,我们命名为 has_seat 。标识符为 Seats 部分是外部的,包含所在航班的标识符。
What other relationships do we have? If we draw the ER diagram for what we have named up to now, we notice that the Gates entity is off by itself. But clearly passengers go to a gate to meet a flight. We model this as two binary relationships
rather than as a ternary relationship: each passenger is related to a specific flight through the relationship Passengers travels_on Flights, and gates normally act as marshaling points for multiple flights (at different times) through the relationship Gates marshals Flights. Figure 2.13 shows the ER diagram so far. The arrow from seatno to flightno symbolizes the fact that the primary identifier for Seats includes the identifier for the master entity Flights.
我们还有哪些其他关系?如果我们为到目前为止所命名的内容绘制 ER 图,我们会注意到 Gates 实体自行关闭。但显然乘客会前往登机口迎接航班。我们将其建模为两个二元关系而不是三元关系:每个乘客通过该关系与特定航班相关 Passengers travels_on Flights ,并且登机口通常通过这种关系充当多个航班(在不同时间)的编组点 Gates marshals Flights 。图 2.13显示了到目前为止的 ER 图。箭头从 seatno 到 flightno 象征着以下事实:主要标识符 Seats 包括主实体的标识符 Flights 。
FIGURE 2.13. Early ER design for a simple airline reservation database.
图 2.13。早期 ER 设计用于简单的航班预订数据库。
Now we need to work out the cardinalities with which the various entities participate in their relationships. Considering
the marshals relationship first, clearly there is exactly one gate for each flight, so card(Flights, marshals) = (1, 1). A single gate might be used for multiple flights at different times, but there is no rule that a gate must be
used at all, so card(Gates, marshals) = (0, N). Now each passenger must travel on exactly one flight, so card(Passengers, travels_on) = (1, 1). A flight must have multiple passengers to fly (the flight will be canceled and the gate reassigned if there are
too few), but the database needs to hold information starting from no passengers, so we set a minimum of 0, and card(Flights, travels_on) = (0, N). A flight must have numerous seats for passengers, so card(Flights, has_seat) = (1, N), and each seat is on a unique flight, so card(Seats, has_seat) = (1, 1). Each passenger must have a seat, and only one, so card(Passengers, seat_assign) = (1, 1), and seats can be used by at most one passenger and may go empty, so card(Seats, seat_assign) = (0, 1). The ER diagram with these cardinality pairs added is pictured in Figure 2.14.
现在我们需要计算出各个实体参与其关系的基数。考虑到 marshals 首先是关系,显然每个航班都有一个登机口,所以卡( Flights , marshals ) = (1, 1)。一个登机口可能会在不同时间用于多个航班,但没有规定必须使用一个登机口,因此卡( Gates , marshals ) = (0, N)。现在每位乘客必须乘坐同一趟航班,因此卡( Passengers , travels_on ) = (1, 1)。一个航班必须有多名乘客飞行(如果乘客太少,航班将被取消,并重新分配登机口),但数据库需要保存从无乘客开始的信息,所以我们设置最小值为0,并且卡( Flights , travels_on ) = (0, N)。一个航班必须有很多乘客座位,所以卡( Flights , has_seat ) = (1, N),并且每个座位都在一个唯一的航班上,因此 card( Seats , has_seat ) = (1, 1)。每个乘客必须有座位,而且只有一个,所以卡( Passengers , seat_assign ) = (1, 1),并且座位最多可供一名乘客使用,并且可能会空,因此卡( Seats , seat_assign ) = (0, 1)。添加了这些基数对的 ER 图如图2.14所示。
FIGURE 2.14. ER design with cardinalities for a simple airline reservation database.
图 2.14。带有基数的 ER 设计用于简单的航空公司预订数据库。
Now the ER design is complete, and we need to transform the design into relational tables. We can begin by creating tables
to map entities, even though this means that we might overlook some attributes that will be needed to represent foreign keys
for relationships. We will simply have to return later when we consider the relationships and add attributes to these tables.
To begin with, we notice with the Flights entity that we don't have multivalued attributes in relational tables, so following the hint of Transformation Rule 1, we
create columns for ddate and dtime in the flights table. All other tables are easily mapped, except for the seats table, where we take the easy way out and use the single column seatno, even though this is not a complete key for the table. Here are the tables so far:
现在ER设计已经完成,我们需要将设计转化为关系表。我们可以从创建表来映射实体开始,尽管这意味着我们可能会忽略表示关系外键所需的一些属性。当我们考虑关系并向这些表添加属性时,我们只需稍后返回即可。首先,我们注意到 Flights 我们在关系表中没有多值属性的实体,因此根据转换规则 1 的提示,我们为以下对象创建列 ddate 和 dtime 在 flights 桌子。所有其他表都可以轻松映射,除了 seats 表,我们采取简单的方法并使用单列 seatno ,即使这不是表的完整键。以下是迄今为止的表格:
passengers | gates | flights | seats | ||
---|---|---|---|---|---|
ticketno 票号 | gateno 加特诺 | flightno 航班号 | ddate | dtime | seatno 座位号 |
… |
Now consider the relationship has_seat, which is N-1 in Figure 2.14, with Seats on the “many” side. By Transformation Rule 4, a foreign key in the seats table will connect each seats row to the appropriate flights row. This completes the primary key for the seats table, which represents a weak entity and therefore needs a foreign key to identify each row.
现在考虑一下关系 has_seat ,即图 2.14中的 N-1 ,其中 Seats 在“多”方面。根据转换规则 4,外键 seats 表将连接每个 seats 行到适当的 flights 排。这样就完成了主键 seats 表,它代表弱实体,因此需要一个外键来标识每一行。
passengers | gates | flights | seats | |||
---|---|---|---|---|---|---|
ticketno 票号 | gateno 加特诺 | flightno 航班号 | ddate | dtime | seatno 座位号 | flightno |
… |
The seat_assign relationship is 1-1, with optional participation by Seats, so by Transformation Rule 5 we can represent this by adjoining to the passengers table a foreign key for seats (this requires two additional columns). We don't expect that we will ever need to look up the
passenger for a given seat, so we place no additional foreign key on the seats table. The resulting table definitions are as follows:
这 seat_assign 关系是 1-1,可选择参与 Seats ,因此根据变换规则 5,我们可以通过邻接来表示这一点 passengers 表一个座位的外键(这需要两个额外的列)。我们不希望我们需要查找给定座位的乘客,因此我们没有在座位上放置额外的外键 seats 桌子。结果表定义如下:
passengers | gates | ||
---|---|---|---|
ticketno 票号 | seatno | flightno | gateno 加特诺 |
… |
flights | seats | |||
---|---|---|---|---|
flightno 航班号 | ddate | dtime | seatno 座位号 | flightno 航班号 |
Now consider the marshals relationship. This is N-1, with Flights on the “many” side, so by Transformation Rule 4 a foreign key in the flights table, gateno, will connect each flights row to the appropriate gates row:
现在考虑 marshals 关系。这是 N-1,其中 Flights 在“多”方面,因此根据转换规则 4,外键 flights 桌子, gateno ,将连接每个 flights 行到适当的 gates 排:
passengers | gates | ||
---|---|---|---|
ticketno 票号 | seatno | flightno | gateno 加特诺 |
… |
flights | seats | ||||
---|---|---|---|---|---|
flightno 航班号 | gateno | ddate | dtime | seatno 座位号 | flightno |
Similarly the travels_on relationship is N-1, with Passengers on the “many” side, so by Transformation Rule 4 a foreign key, flightno, in the passengers table will connect each passengers row to the appropriate flights row. This column already exists in the passengers table, however, so the relational table design is complete.
同样地 travels_on 关系为 N-1,其中 Passengers 在“多”方面,因此根据转换规则 4 外键, flightno ,在 passengers 表将连接每个 passengers 行到适当的 flights 排。该列已存在于 passengers table,但是,这样关系表的设计就完成了。
2.5. Normalization: Preliminaries
2.5.标准化:预备
Normalization is another approach to logical design of a relational database, which seems to share little with the ER model.
However, it will turn out that a relational design based on normalization and a careful ER design transformed into relational
form have nearly identical results, and in fact the two approaches reinforce each other. In the normalization approach, the
designer starts with a real-world situation to be modeled and lists the data items that are candidates to become column names
in relational tables, together with a list of rules about the relatedness of these data items. The aim is to represent all
these data items as attributes of tables that obey restrictive conditions associated with what we call normal forms. These normal form definitions limit the acceptable form of a table so that it has certain desirable properties, thus avoiding
various kinds of anomalous behavior. There is a series of normal form definitions, each more restrictive than the one before;
the forms covered in this chapter are first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd
normal form (BCNF). Other types of normalization, 4NF and 5NF, are less commonly considered and are not covered in detail
in this chapter.
规范化是关系数据库逻辑设计的另一种方法,它似乎与 ER 模型没有什么共同点。然而,事实证明,基于归一化的关系设计和转化为关系形式的仔细 ER 设计具有几乎相同的结果,而且事实上这两种方法是相辅相成的。在规范化方法中,设计者从要建模的真实情况开始,列出将成为关系表中的列名称的候选数据项,以及关于这些数据项的相关性的规则列表。目的是将所有这些数据项表示为表的属性,这些属性遵守与我们所谓的范式相关的限制条件。这些范式定义限制了表的可接受形式,使其具有某些所需的属性,从而避免各种异常行为。有一系列范式定义,每一个都比前一个更具限制性;本章涵盖的形式是第一范式(1NF)、第二范式(2NF)、第三范式(3NF)和Boyce-Codd范式(BCNF)。其他类型的归一化(4NF 和 5NF)不太常见,本章也没有详细介绍。
To begin with, a table in 1NF is simply one that has no multivalued (repeating) fields. SQL language accepts this rule as
basic. In what follows we assume that tables are in 1NF unless otherwise specified. 2NF turns out to be of mainly historical interest, since no sensible designer would leave a database in 2NF but would always
continue normalization until the more restrictive 3NF was reached. From an initial database containing data items that are
all in the same table (sometimes referred to as a universal table) and relatedness rules on these data items, there is a procedure to create an equivalent database with multiple tables, all
of which are in 3NF. (This is what we mean by having a database in 3NF—that all of its tables have 3NF form.) As we proceed
through this chapter we will find that any table that does not obey 3NF can be factored into distinct tables in such a way
that (1) each of the factored tables is in a valid 3NF, and (2) the join of all these factored tables contains exactly the
information in the table from which they were factored. The set of 3NF tables resulting from the initial universal table is known as a 3NF lossless decomposition of the database.
首先,1NF 中的表只是一个没有多值(重复)字段的表。 SQL 语言接受此规则作为基本规则。在下文中,除非另有说明,否则我们假设表为 1NF 。 2NF 事实证明主要具有历史意义,因为明智的设计者不会将数据库保留在 2NF 中,但始终会继续规范化,直到达到更具限制性的 3NF。从包含全部在同一个表(有时称为通用表)中的数据项以及这些数据项的相关性规则的初始数据库中,有一个过程可以创建具有多个表的等效数据库,所有这些表都在 3NF 中。 (这就是我们所说的 3NF 数据库的意思——它的所有表都具有 3NF 形式。)当我们继续阅读本章时,我们会发现任何不遵守 3NF 的表都可以通过这样的方式分解为不同的表(1) 每个分解表都位于有效的 3NF 中,并且 (2) 所有这些分解表的联接准确地包含分解它们的表中的信息。由初始通用表产生的 3NF 表集称为数据库的 3NF无损分解。
There is a third desirable property that we can always provide with a 3NF decomposition. Note that when a new row is added
to one of the tables in the 3NF decomposition (or an old row is updated), it is possible that an erroneous change might break
one of the rules of data item relatedness, mentioned earlier as part of the design input. We wish to impose a constraint on
Insert and Update operations so that such errors will not corrupt the data. The third property that we consider important
in a decomposition, then, is (3) when a table Insert or Update occurs, the possible relatedness rules that might be broken
can be tested by validating data items in the single table affected; there is no need to perform table joins in order to validate
these rules. A 3NF decomposition constructed to have the three desirable properties just mentioned is generally considered
an acceptable database design. It turns out that a further decomposition of tables in 3NF to the more restrictive BCNF is
often unnecessary (many real-world databases in 3NF are also in BCNF), but in cases where further decomposition results, property
(3) no longer holds in the result. Many database designers therefore settle on 3NF design.
我们总是可以通过 3NF 分解来提供第三个理想的属性。请注意,当在 3NF 分解中向其中一个表添加新行(或更新旧行)时,错误的更改可能会破坏数据项相关性的规则之一,正如前面作为 3NF 分解的一部分提到的那样。设计输入。我们希望对插入和更新操作施加限制,以便此类错误不会损坏数据。那么,我们认为在分解中重要的第三个属性是(3)当表插入或更新发生时,可以通过验证受影响的单个表中的数据项来测试可能被破坏的相关性规则;无需执行表连接即可验证这些规则。构建具有刚才提到的三个所需属性的 3NF 分解通常被认为是可接受的数据库设计。事实证明,将 3NF 中的表进一步分解为更具限制性的 BCNF 通常是不必要的(3NF 中的许多现实世界数据库也在 BCNF 中),但在进一步分解结果的情况下,属性 (3) 不再适用结果。因此,许多数据库设计者选择了 3NF 设计。
We will need a good deal of insight into the details of the normalization approach before we are able to properly deal with
some of these ideas. Let us begin to illustrate them with an example.
在我们能够正确处理其中一些想法之前,我们需要深入了解规范化方法的细节。让我们开始用一个例子来说明它们。
2.5.1. A Running Example: Employee Information
2.5.1.运行示例:员工信息
We need an example to clarify some of the definitions of database design that follow. Consider the data items listed in Figure 2.15, representing the employee information that must be modeled by the personnel department of a very large company.
我们需要一个例子来阐明下面的数据库设计的一些定义。考虑图 2.15中列出的数据项,代表一家非常大公司的人事部门必须建模的员工信息。
FIGURE 2.15. Unnormalized data items for employee information.
图 2.15。员工信息的非标准化数据项。
The data items beginning with emp_all represent attributes of what we would refer to in the ER approach as the entity Employees. Other entities underlying the data items of Figure 2.15 include Departments where employees in the company work and Skills that the various employees need to perform their jobs. In the normalization approach, we leave the entity concept unnamed
but reflect it in the data item interrelatedness rules that will be explained shortly, rules known as functional dependencies. The data item emp_id has been created to uniquely identify employees. Each employee works for some single department in the company, and the data
items beginning with dept_ describe the different departments; the data item dept_name uniquely identifies departments, and each department normally has a unique manager (also an employee) with a name given in
dept_mgrname. Finally, we assume that the various employees each possess some number of skills, such as typing or filing, and that data
items beginning with skill_ describe the skills that are tested and used for job assignment and salary determination by the company. The data item skill_id uniquely identifies the skill, which also has a name, skill_name. For each employee who possesses a particular skill, the skill_date describes the date when the skill was last tested, and skill_lvl describes the level of skill the employee displayed at that test.
数据项开头为 emp_all 表示我们在 ER 方法中称为实体的属性 Employees 。图 2.15数据项的其他实体包括 Departments 公司员工在哪里工作以及 Skills 各种员工需要完成他们的工作。在规范化方法中,我们保留实体概念未命名,但将其反映在稍后将解释的数据项相互关联性规则中,这些规则称为函数依赖关系。数据项 emp_id 是为了唯一地识别员工而创建的。每个员工都在公司的某个部门工作,数据项以 dept_ 描述不同的部门;数据项 dept_name 唯一标识部门,每个部门通常有一个唯一的经理(也是员工),其名称为 dept_mgrname 。最后,我们假设各个员工都拥有一定数量的技能,例如打字或归档,并且以 skill_ 描述公司测试和用于工作分配和薪资确定的技能。数据项 skill_id 唯一标识该技能,该技能也有一个名称, skill_name 。对于每个拥有特定技能的员工来说, skill_date 描述上次测试技能的日期,以及 skill_lvl 描述员工在该测试中表现出的技能水平。
Figure 2.16 provides a universal table, emp_info, containing all the data items of employee information from Figure 2.15. Because of 1NF, there can only be atomic values in each row and column position of a table. This poses a difficulty, because
each individual employee might have any number of skills. It is inappropriate to design a table with unique rows for each
emp_id and a distinct column for each piece of skill information—we don't even know the maximum number of skills for an employee,
so we don't know how many columns we should use for skill_id-1, … , skill_id-n. The only solution that will work in a single (universal) table is to give up on having a unique row for each employee and replicate information about the employee, pairing
the employee with different skills on different rows.
图2.16提供了一个通用表, emp_info ,包含图2.15中员工信息的所有数据项。由于1NF,表的每一行和列位置只能有原子值。这带来了困难,因为每个员工可能拥有多种技能。为每个表设计一个具有唯一行的表是不合适的 emp_id 每条技能信息都有一个不同的列 - 我们甚至不知道员工的最大技能数,因此我们不知道应该使用多少列 skill_id-1 , … , skill_id-n 。在单个(通用)表中有效的唯一解决方案是放弃为每个员工提供唯一的行,并复制有关该员工的信息,将具有不同技能的员工配对到不同的行上。
FIGURE 2.16. Single employee information table, emp_info, in 1NF.
图 2.16。单个员工信息表, emp_info ,在 1NF 中。
The intention of the database designer in the emp_info table of Figure 2.16 is that there is a row for every employee–skill pair existing in the company. From this, it should be clear that there cannot
be two rows with the same values for the pair of attributes emp_id and skill_id. The table emp_info has a (candidate) key consisting of the set (pair) of attributes emp_id and skill_id. We confirm that these attributes form a key by noting that the values they take on distinguish any pair of rows in any permissible
content of the table (i.e., for any rows u and v, either u(emp_id) ≠ v(emp_id) or u(skill_id) ≠ v(skill_id)), and that no subset of this set of attributes does the same (there can be two rows u and v such that u(emp_id) = v(emp_id), and there can be two rows r and s such that r(skill_id) = s(skill_id)). We assume in what follows that emp_id and skill_id is the primary key for the emp_info table.
数据库设计者的意图 emp_info 图 2.16的表中,公司中存在的每个员工技能对都有一行。由此可见,不能有两行的属性对具有相同的值 emp_id 和 skill_id 。桌子 emp_info 有一个由属性集(对)组成的(候选)键 emp_id 和 skill_id 。我们通过注意到这些属性所采用的值区分表中任何允许内容中的任何一对行(即,对于任何行 u 和 v,u( emp_id ) ≠ v( emp_id ) 或 u( skill_id ) ≠ v( skill_id )),并且这组属性的子集没有做同样的事情(可以有两行 u 和 v 使得 u( emp_id ) = v( emp_id ),并且可以有两行 r 和 s 使得 r( skill_id ) = s( skill_id ))。我们假设在下文中 emp_id 和 skill_id 是主键 emp_info 桌子。
It turns out that the database design of Figure 2.16 is a bad one, because it is subject to certain anomalies that can corrupt the data when data manipulation statements are
used to update the table.
事实证明,图 2.16的数据库设计是一个糟糕的设计,因为它会受到某些异常的影响,当使用数据操作语句更新表时,这些异常可能会损坏数据。
2.5.2. Anomalies of a Bad Database Design
2.5.2.不良数据库设计的异常情况
It appears that there might be a problem with the emp_info table of Figure 2.16 because there is replication of employee data on different rows. It seems more natural, with the experience we have had up
to now, to have a unique row for each distinct employee. Do we have a good reason for our feeling? Let us look at the behavior
of this table as SQL updates are applied.
看起来可能有问题 emp_info 图 2.16中的表是这样的,因为不同行上存在员工数据的复制。根据我们迄今为止的经验,为每个不同的员工设置一个独特的行似乎更自然。我们的感觉有充分的理由吗?让我们看看应用 SQL 更新时该表的行为。
If some employee were to get a new phone number, we would have to update multiple rows (all rows with different skills for
that employee) in order to change the emp_phone value in a uniform way. If we were to update the phone number of only one row, we might corrupt the data, leaving some rows for that employee with different phone numbers than others. This is commonly known as an update anomaly, and it arises because of data redundancy, duplication of employee phone numbers and other employee attributes on multiple rows of emp_info. Calling this an “anomaly,” with the implication of irregular behavior under update, may seem a bit extreme, since the SQL
language is perfectly capable of updating several rows at once with a Searched Update statement such as:
如果某个员工要获得新的电话号码,我们必须更新多行(该员工具有不同技能的所有行)才能更改 emp_phone 以统一的方式取值。如果我们只更新一行的电话号码,我们可能会损坏数据,为该员工留下一些与其他行不同的电话号码。这通常称为更新异常,它是由于多行数据冗余、员工电话号码和其他员工属性重复而引起的。 emp_info 。称其为“异常”,意味着更新时的不规则行为,可能看起来有点极端,因为 SQL 语言完全能够使用 Searched Update 语句一次更新多行,例如:
- update emp_info set emp_phone = :newphone where emp_id = :eidval;
In fact, the consideration that several rows will be updated is not even apparent from this syntax—the same Searched Update
statement would be used if the table had a unique row for each emp_id value. However, with this replication of phone numbers on different rows, a problem can still arise in performing an update
with a Positioned Update statement. If we encountered a row of the emp_info table in fetching rows from a cursor created for an entirely different purpose, the program might execute the following statement to allow the user to correct an invalid phone number:
事实上,从该语法来看,要更新多行的考虑甚至不明显 - 如果表中的每行都有唯一的行,则将使用相同的 Searched Update 语句 emp_id 价值。但是,通过在不同行上复制电话号码,在使用定位更新语句执行更新时仍然可能会出现问题。如果我们遇到一行 emp_info 当从为完全不同目的而创建的游标中获取行时,程序可能会执行以下语句以允许用户更正无效的电话号码:
- update emp_info set emp_phone = :newphone
- where current of cursor_name;
This would be a programming error, since an experienced programmer would realize that multiple rows need to be updated in order to change an employee phone
number. Still, it is the kind of error that could easily occur in practice, and we would like to be able to create a constraint on the table that makes such an erroneous update impossible. It turns out that the best way to provide such a constraint
is to reconfigure the data items into different tables so as to eliminate the redundant copies of information. This is exactly
what is achieved during the process of normalization. We sum up the idea of an update anomaly in a definition that makes reference
to our intuitive understanding of the ER model.
这将是一个编程错误,因为经验丰富的程序员会意识到需要更新多行才能更改员工电话号码。尽管如此,这种错误在实践中很容易发生,我们希望能够在表上创建一个约束,使这种错误的更新不可能发生。事实证明,提供这种约束的最佳方法是将数据项重新配置到不同的表中,从而消除信息的冗余副本。这正是正常化过程中所实现的目标。我们在定义中总结了更新异常的概念,该定义参考了我们对 ER 模型的直观理解。
Definition:Update Anomaly.
定义:更新异常。A table T is subject to an update anomaly when changing a single attribute value for an entity instance or relationship instance represented in the table that may require that several rows of T be updated.
当更改表中表示的实体实例或关系实例的单个属性值时,表 T 会遇到更新异常,这可能需要更新 T 的几行。
A different sort of problem, known as the delete anomaly, is reflected by the following definition.
以下定义反映了一种不同类型的问题,称为删除异常。
Definition:Delete Anomaly, Insert Anomaly.
定义:删除异常,插入异常。A table T is subject to a delete anomaly when deleting some row of the table to reflect the disappearance of some instance of an entity or relationship that can cause us to lose information about some instance of a different entity or relationship that we do not wish to forget. The insert anomaly is the other face of this problem for inserts, where we cannot represent information about some entity or instance without including information about some other instance of an entity or relationship that does not exist.
当删除表的某些行以反映实体或关系的某些实例的消失时,表 T 会出现删除异常,这可能导致我们丢失我们不希望丢失的有关不同实体或关系的某些实例的信息忘记。插入异常是插入问题的另一面,在这种情况下,如果不包含有关不存在的实体或关系的其他实例的信息,我们就无法表示有关某些实体或实例的信息。
For example, assume that a skill possessed by an employee must be retested after five years to remain current for that employee.
If the employee fails to have the skill retested (and the skill_date column updated), the skill will drop off the emp_info list (an automatic process deletes the row with this emp_id and skill_id). Now consider what happens if the number of skills for some employee goes to zero in the emp_info table with columns of Figure 2.16: No row of any kind will remain for the employee! We have lost the phone number and the department the employee works in because of this delete! This is clearly inappropriate
design. The insert anomaly exists in the emp_info table because we cannot enter a new employee into the table until the employee has acquired some skill; thus it becomes impossible
to hire an employee trainee. Clearly this is just the other face of the delete anomaly, where information about an employee
is lost when the employee loses his or her last skill.
例如,假设员工拥有的一项技能必须在五年后重新测试才能保持该员工的最新技能。如果员工未能重新测试技能(并且 skill_date 栏已更新),该技能将下降 emp_info 列表(自动过程删除带有此的行 emp_id 和 skill_id )。现在考虑一下,如果某些员工的技能数量在 emp_info 包含图 2.16的列的表:不会为员工保留任何类型的行!由于这次删除,我们丢失了该员工的电话号码和所在部门!这显然是不恰当的设计。插入异常存在于 emp_info 表,因为在新员工获得某种技能之前我们无法将其输入表中;因此不可能雇用实习生。显然,这只是删除异常的另一面,当员工失去最后一项技能时,有关员工的信息就会丢失。
Let us jump ahead to a solution for some of the problems mentioned so far. We simply factor the emp_info table and form two tables, the emps table and the skills table, whose column names are listed in Figure 2.17. Notice that the emps table has a unique row for each emp_id (and emp_id is the key for this table), while the skills table has a unique row for each emp_id and skill_id pair, and this pair forms a key for the table. Since there are multiple skills associated with each employee, the emp_id column that we have included in the skills table acts as a foreign key, relating skills back to employees. When we form the natural join of these two tables, the result
is exactly the emp_info table we started with. (We will need to demonstrate this fact in what follows, but for now you should take it on faith.)
However, the delete anomaly is no longer a problem, since if we delete all rows corresponding to skills for any individual
employee, this merely deletes rows in the skills table; the emps table still contains the information we want to retain about the employee, such as emp_phone, dept_name, and the like.
让我们直接找到目前提到的一些问题的解决方案。我们简单地考虑一下 emp_info 表并形成两个表, emps 表和 skills 表,其列名如图2.17所示。请注意, emps 表中的每一个都有一个唯一的行 emp_id (和 emp_id 是该表的关键),而 skills 表中的每一个都有一个唯一的行 emp_id 和 skill_id 对,这对形成表的键。由于每个员工都有多种技能, emp_id 我们已包含在 skills 表充当外键,将技能与员工联系起来。当我们形成这两个表的自然连接时,结果正是 emp_info 我们开始的表。 (我们需要在下面证明这一事实,但现在您应该相信它。)但是,删除异常不再是问题,因为如果我们删除与任何单个员工的技能相对应的所有行,这仅仅删除中的行 skills 桌子;这 emps 表仍然包含我们想要保留的有关员工的信息,例如 emp_phone , dept_name ,等等。
FIGURE 2.17. The emp_info database with two tables.
图 2.17。这 emp_info 有两个表的数据库。
In the sections that follow we will learn how to perform normalization, to factor tables so that all anomalies are removed
from our representation. Note that we haven't yet achieved this with the tables of Figure 2.17; as we will see shortly, a number of anomalies still exist. We will need a good deal of insight into the details of the normalization
approach before we are able to properly deal with fundamental normalization concepts. In the following sections we present
some needed mathematical preliminaries to database normalization. Because it is not always possible to show a real-life application
for all these concepts as they are introduced, we ask the reader to be patient. The value of the concepts will become clear
in the end.
在接下来的部分中,我们将学习如何执行标准化、因子表,以便从我们的表示中删除所有异常。请注意,我们尚未通过图 2.17的表格实现这一目标;正如我们很快就会看到的,许多异常现象仍然存在。在我们能够正确处理基本的标准化概念之前,我们需要深入了解标准化方法的细节。在以下各节中,我们将介绍数据库规范化所需的一些数学基础知识。因为在介绍这些概念时并不总是能够展示它们在现实生活中的应用,所以我们要求读者耐心等待。这些概念的价值最终将变得清晰。
2.6. Functional Dependencies
2.6。功能依赖
A functional dependency (FD) defines the most commonly encountered type of relatedness property between data items of a database. We usually only
need to consider relatedness between column attributes of a single relational table, and our definition reflects this. We
represent rows of a table T by the notation r1, r2, … , and follow standard convention by referring to attributes, rather than columns, of the table T. Individual attributes
of a table will be represented by letters such as A, B, … , and the letters X, Y, … will refer to subsets of attributes. We
follow the notation that ri(A) represents the value of row ri at attribute A.
函数依赖(FD) 定义数据库数据项之间最常遇到的相关性属性类型。我们通常只需要考虑单个关系表的列属性之间的相关性,我们的定义反映了这一点。我们用符号 r 1 , r 2 , … 来表示表 T 的行,并通过引用表 T 的属性而不是列来遵循标准约定。表的各个属性将由字母表示,例如 A、 B, … 和字母 X, Y, … 表示属性的子集。我们遵循 r i (A) 表示属性 A 处 r i行的值的表示法。
Definition. 定义。
Given a table T containing at least two attributes designated by A and B, we say that A → B (read “A functionally determines B” or “B is functionally dependent on A”), if and only if it is the intent of the designer, for any set of rows that might exist in the table, that two rows in T cannot agree in value for A and disagree in value for B. A more formal way of saying this is: Given two rows r1 and r2 in T, if r1(A) = r2(A), then r1(B) = r2(B). We will usually try to use the less formal statement in what follows.
给定一个包含至少两个由 A 和 B 指定的属性的表 T,我们说 A → B(读作“A 在功能上决定 B”或“B 在功能上依赖于 A”),当且仅当这是设计者认为,对于表中可能存在的任何行集,T 中的两行 A 的值不能一致,B 的值不能一致。更正式的说法是:给定两行 r 1和 r 2 T,如果r 1 (A) = r 2 (A),则r 1 (B) = r 2 (B)。我们通常会在下文中尝试使用不太正式的陈述。
This definition is comparable to the definition of a function in mathematics: For every element in attribute A (which appears on some row), there is a unique corresponding element (on
the same row) in attribute B. See Figure 2.18 for a graphical representation of the functional dependency concept.
这个定义类似于数学中函数的定义:对于属性 A 中的每个元素(出现在某行上),属性 B 中都有一个唯一对应的元素(在同一行上)。参见图 2.18 的图形表示函数依赖的概念。
FIGURE 2.18. Graphical depiction of functional dependency.
图 2.18。函数依赖性的图形描述。
EXAMPLE 2.8 例2.8
In the emp_info table of Figure 2.16, the following functional dependencies hold:
在 emp_info 如图 2.16所示,存在以下函数依赖关系:
- emp_id → emp_name
- emp_id → emp_phone
- emp_id → dept_name
In ER terms, we know this is true because emp_id is an identifier for the Employee entity, and the other data items simply represent other attributes of the entity; once the entity is identified, all the other attributes follow. But we also recognize these facts intuitively.
用急诊室术语来说,我们知道这是真的,因为 emp_id 是一个标识符 Employee 实体,其他数据项仅代表该实体的其他属性;一旦实体被识别,所有其他属性都会随之而来。但我们也凭直觉认识到这些事实。If we saw two rows in the single table emp_info design of Figure 2.16 with the same emp_id value and different emp_phone values, we would believe that the data are corrupted (assuming that every employee has a unique phone), but if we saw two rows with the same emp_phone value and different emp_id values, our first thought would be that they represented different employees who shared a phone. But the two situations are symmetric; it is simply our understanding of the data that makes the first one seem to imply corrupted data. We look to emp_id to break ties and uniquely identify employees. Note that what we are saying implies that, while emp_id functionally determines emp_phone, emp_phone does not functionally determine emp_id. We sometimes express this second fact with this notation:
如果我们在单个表中看到两行 emp_info 与图2.16的设计相同 emp_id 价值和不同 emp_phone 值,我们会认为数据已损坏(假设每个员工都有唯一的电话),但如果我们看到两行具有相同的值 emp_phone 价值和不同 emp_id 值,我们的第一个想法是它们代表共享一部手机的不同员工。但这两种情况是对称的;只是我们对数据的理解使得第一个数据似乎暗示着损坏的数据。我们期待 emp_id 打破联系并唯一识别员工。请注意,我们所说的意味着,虽然 emp_id 功能上决定 emp_phone , emp_phone 不能从功能上决定 emp_id 。我们有时用这样的符号来表达第二个事实:
EXAMPLE 2.9 例2.9
Following are three tables to investigate for functional dependencies between attributes (note that some of the tables break the unique row rule, but we accept them as valid tables for purposes of illustration). In these tables we assume that it is the intent of the designer that exactly this set of rows should lie in each table—no changes will ever occur in the tables. Thus, we can determine what functional dependencies exist by examining the data. This is a very unusual situation. Normally we determine functional dependencies from understanding the data items and rules of the enterprise (e.g., each employee has a single phone number, employees can share a phone, etc.), as in Example 2.8. These rules exist before any data have been placed in the tables.
以下是用于调查属性之间功能依赖关系的三个表(请注意,某些表违反了唯一行规则,但出于说明目的,我们接受它们作为有效表)。在这些表中,我们假设设计者的意图是这组行应该准确地位于每个表中,表中永远不会发生任何更改。因此,我们可以通过检查数据来确定存在哪些函数依赖关系。这是一个非常不寻常的情况。通常我们通过了解企业的数据项和规则来确定功能依赖性(例如,每个员工都有一个电话号码,员工可以共享一部电话等),如示例 2.8所示。这些规则在任何数据放入表之前就存在。
T1 T2 T3 Row # 排 # A B A B A B 1 x1 y1 x1 y1 x1 y1 2 x2 y2 x2 y4 x2 y4 3 x3 y1 x1 y1 x1 y1 4 x4 y1 x3 y2 X3 y2 5 x5 y2 x2 y4 X2 y4 6 x6 y2 x4 y3 X4 y4 In table T1 we can easily see that A → B; we merely need to check that for every pair of rows r1 and r2, if r1(A) = r2(A), then r1(B) = r2(B). However, there is no pair of rows in T1 with equal values for column A, so the condition is trivially satisfied. At the same time, in T1, B ↛ A (read “column B does not functionally determine column A”), since, for example, if r1 is row 1 and r2 is row 3, then r1(B) = r2(B) = y1, but r1(A) = x1 ≠ r2(A) = x3. In table T2, we have A → B (we just need to check that rows 1 and 3, which have matching pairs of A values, also have matching B values, and similarly check rows 2 and 5), and B → A. Finally, in table T3, A → B but B ↛ A (note that if r1 is row 2 and r2 is row 6, then r1(B) = r2(B) = y4, but r1(A) = x2 ≠ r2(A) = x4).
在表T1中我们很容易看出A→B;我们只需要检查每对行 r 1和 r 2 ,如果 r 1 (A) = r 2 (A),则 r 1 (B) = r 2 (B)。然而,T1 中不存在 A 列具有相同值的行对,因此条件基本满足。同时,在 T1 中,B ↛ A(读作“B 列在功能上不能确定 A 列”),因为,例如,如果 r 1是第 1 行,r 2是第 3 行,则 r 1 (B) = r 2 (B) = y1,但 r 1 (A) = x1 ≠ r 2 (A) = x3。在表 T2 中,我们有 A → B(我们只需要检查第 1 行和第 3 行是否有匹配的 A 值对,也有匹配的 B 值,并类似地检查第 2 行和第 5 行),以及 B → A。 ,在表 T3 中,A → B 但 B ↛ A(请注意,如果 r 1为第 2 行且 r 2为第 6 行,则 r 1 (B) = r 2 (B) = y4,但 r 1 (A) = x2 ≠ r 2 (A) = x4)。
It is obvious how to extend the definition for functional dependency to its full generality, dealing with sets of attributes.
显然如何将函数依赖的定义扩展至其完整的通用性,处理属性集。
Definition. 定义。
We are given a table T with two sets of attributes, designated by X = A1 A2, …, Ak and Y = B1 B2, …, Bm, where some of the attributes from X may overlap with some of the attributes from Y. We say that X → Y (read “X functionally determines Y” or “Y is functionally dependent on X”), if and only if it is the intent of the designer, for any set of rows that might exist in the table, that two rows in T cannot agree in value on the attributes of X and simultaneously disagree in value on the attributes of Y. Note that two rows agree in value on the attributes of X if they agree on all of the attributes of X, and they disagree in value on the attributes of Y if they disagree on any of the attributes of Y. More formally, given any two rows r1 and r2 in T, if r1(Ai) = r2(Ai) for every Ai in X, then r1(Bj) = r2(Bj) for every Bj in Y.
给定一个包含两组属性的表 T,指定为 X = A 1 A 2 , …, A k和 Y = B 1 B 2 , …, B m ,其中 X 中的一些属性可能与 X 中的一些属性重叠。来自 Y 的属性。当且仅当它是设计者的意图时,对于可能存在的任何行集,我们说 X → Y(读作“X 在功能上决定 Y”或“Y 在功能上依赖于 X”)在表中,T 中的两行不能在 X 的属性上的值一致,同时在 Y 的属性上的值不一致。请注意,如果两行在 X 的属性上的值一致,则它们在 X 的属性上一致。 X,如果他们对 Y 的任何属性有不同意见,那么他们对 Y 属性的值也有不同意见。更正式地讲,给定 T 中的任意两行 r 1和 r 2 ,如果 r 1 (A i ) = r 2 (A i ) 对于 X 中的每个 A i ,则对于 Y 中的每个 B j ,r 1 (B j ) = r 2 (B j )。
EXAMPLE 2.10 例2.10
We list here what we claim are all the functional dependencies for the emp_info table of Figure 2.16 (with missing attributes in Figure 2.15). With this FD list, all the information needed for the normalization procedure has been provided.
我们在这里列出了我们声称的所有功能依赖项 emp_info 图 2.16的表(图 2.15中缺少属性)。通过这个 FD 列表,已经提供了标准化过程所需的所有信息。
- emp_id → emp_name emp_phone dept_name
- dept_name → dept_phone dept_mgrname
- skill_id → skill_name
- emp_id skill_id → skill_date skill_lvl
You should be able to interpret each of these functional dependencies and see if you agree with them. For example, FD 1 states that if we know the emp_id, then the emp_name, emp_phone, and dept_name are determined. Note that FD 1 is just another way of stating the FDs of Example 2.8. That is, if we know the FDs given there,
您应该能够解释每个功能依赖项并查看您是否同意它们。例如,FD 1 指出如果我们知道 emp_id ,那么 emp_name , emp_phone , 和 dept_name 已确定。请注意,FD 1 只是说明例 2.8的 FD 的另一种方式。也就是说,如果我们知道那里给出的 FD,
- emp_id → emp_name, emp_id → emp_phone, and emp_id → dept_name,
we can conclude that FD 1 holds.
我们可以得出结论 FD 1 成立。To say this in yet another way, the three FDs of Example 2.8 together imply FD 1. Similarly, from FD 1 we can conclude that the three FDs of Example 2.8 hold. A simple rule of FD implication is used to arrive at these conclusions, based on the FD definition. We will learn more about such rules shortly.
换句话说,例 2.8的三个 FD 一起意味着 FD 1。同样,从 FD 1 我们可以得出例 2.8的三个 FD 成立的结论。基于 FD 定义,使用 FD 蕴涵的简单规则得出这些结论。我们很快就会了解有关此类规则的更多信息。Because the FDs given in 1–4 are all the FDs for the emp_info table, we can conclude, for example, that the designer does not intend that skill_name be unique for a specific skill. Since skill_id is a unique identifier for the skill, to have a unique skill_name would presumably mean that skill_name → skill_id, the reverse of FD 3. However, this FD does not exist in the set, nor is it implied. (A quick test to see that it isn't implied is to note that skill_name does not occur on the left side of any FD in the set.) We also note that we do not have the FD dept_mgrname → dept_name, which presumably means that although each department has a unique manager, one manager might simultaneously manage more than one department. Finally, note that skill_lvl and skill_date are only meaningful as attributes of the relationship between an Employee entity and a Skill entity. If we said that a given employee had a skill level of 9, it would be necessary to ask, “For what skill?”; and if we said that we know there is a skill level of 9 for “typing,” we would wonder, “What employee?” Thus, we need to name both the emp_id and the skill_id to determine these attributes.
因为1-4中给出的FD都是针对 emp_info 例如,我们可以得出结论,设计者无意 skill_name 对于特定技能来说是独一无二的。自从 skill_id 是技能的唯一标识符,具有唯一的 skill_name 大概意味着 skill_name → skill_id ,与FD 3相反。但是,该FD在集合中不存在,也不是隐含的。 (快速测试一下是否暗示这一点是要注意的是 skill_name 不会出现在集合中任何 FD 的左侧。)我们还注意到我们没有 FD dept_mgrname → dept_name ,这可能意味着虽然每个部门都有一位唯一的经理,但一名经理可能同时管理多个部门。最后,请注意 skill_lvl 和 skill_date 仅作为对象之间关系的属性才有意义 Employee 实体和一个 Skill 实体。如果我们说某位员工的技能级别为 9,则有必要问“什么技能?”;如果我们说我们知道“打字”的技能水平为 9,我们会想,“哪个员工?”因此,我们需要命名 emp_id 和 skill_id 来确定这些属性。
2.6.1. Logical Implications among Functional Dependencies
2.6.1.函数依赖关系之间的逻辑含义
In Example 2.10 a number of conclusions were drawn that depended on understanding implications among functional dependencies. In what follows,
we will derive certain rules of implication among FDs that follow directly from previous definition. The reader needs to understand
many such rules at both a rigorous and an intuitive level to properly appreciate some of the techniques of normalization that
are presented in later sections. We begin with a very basic rule.
在示例 2.10中,得出了许多结论,这些结论取决于对函数依赖关系之间含义的理解。接下来,我们将直接从先前的定义推导出 FD 之间的某些蕴涵规则。读者需要在严格和直观的层面上理解许多这样的规则,才能正确理解后面几节中介绍的一些规范化技术。我们从一个非常基本的规则开始。
Theorem 2.1:Inclusion Rule.
定理2.1:包含规则。We are given a table T with a specified heading (set of attributes), Head (T). If X and Y are sets of attributes contained in Head(T), and Y ⊆ X, then X → Y.
我们得到一个带有指定标题(属性集)Head (T) 的表 T。如果 X 和 Y 是 Head(T) 中包含的属性集,且 Y ⊆ X,则 X → Y。
- PROOF. To show that X → Y, we need only demonstrate that there is no pair of rows u and v that agree in value on the attributes of X and simultaneously disagree in value on the attributes of Y. But this is obvious, since two rows can never agree in value on the attributes of X and simultaneously disagree on a subset of those attributes.▪
证明。为了证明 X → Y,我们只需要证明不存在一对行 u 和 v 在 X 的属性上的值一致,同时在 Y 的属性上的值不一致。但这很明显,因为两行可以永远不会在 X 的属性值上达成一致,同时也不同意这些属性的子集。▪
The inclusion rule provides us with a large number of FDs that are true for any table of attributes, irrespective of the intended
content.
包含规则为我们提供了大量适用于任何属性表的 FD,无论预期内容如何。
Definition:Trivial Dependency.
定义:微不足道的依赖。A trivial dependency is an FD of the form X → Y, in a table T where X ∪ Y ⊆ Head(T), that will hold for any possible content of the table T.
平凡依赖是表 T 中 X → Y 形式的 FD,其中 X ∪ Y ⊆ Head(T),它将适用于表 T 的任何可能内容。
We can prove that trivial dependencies always arise as a result of the inclusion rule.
我们可以证明,平凡的依赖关系总是由于包含规则而出现。
Theorem 2.2. 定理2.2。
Given a trivial dependency X → Y in T, it must be the case that Y ⊆ X.
给定 T 中的平凡依赖 X → Y,则必定是 Y ⊆ X。
- PROOF. Given the table T with a heading containing the attributes in X ∪ Y, consider the set of attributes Y – X (attributes in Y that are not in X). Since X → Y is a trivial dependency, it must hold for any possible content of the table T. We will assume Y – X is nonempty and reach a contradiction. If the set Y – X is nonempty, let A be an attribute contained in Y – X. Since A ∉ X, it is possible to construct two rows, u and v, in the table T, alike in values for all attributes in X, but having different values for the attribute A. But with these two rows in T, the dependency X → Y does not hold, since rows u and v agree in value on attributes of X and disagree on attributes of Y (because A ∈ Y). Since a trivial dependency is supposed to hold for any possible content of the table T, we have created a contradiction, and from this we conclude that the set Y – X cannot contain an attribute A, and therefore Y ⊆ X.▪
证明。给定表 T,其标题包含 X ∪ Y 中的属性,请考虑属性集 Y – X(Y 中不存在于 X 中的属性)。由于 X → Y 是一个平凡的依赖关系,因此它对于表 T 的任何可能内容都必须成立。我们将假设 Y – X 是非空的并得出矛盾。如果集合 Y – X 非空,则设 A 为 Y – X 中包含的属性。由于 A ∉ X,因此可以在表 T 中构造两行 u 和 v,X 中所有属性的值都相同,但属性 A 具有不同的值。但是对于 T 中的这两行,依赖性 X → Y 不成立,因为行 u 和 v 在 X 的属性上一致,但在 Y 的属性上不一致(因为 A ∈ Y )。由于表 T 的任何可能内容都应该存在平凡依赖性,因此我们创建了一个矛盾,由此我们得出结论:集合 Y – X 不能包含属性 A,因此 Y ⊆ X。
2.6.2. Armstrong's Axioms
2.6.2.阿姆斯特朗公理
The inclusion rule is one rule of implication by which FDs can be generated that are guaranteed to hold for all possible tables.
It turns out that from a small set of basic rules of implication, we can derive all others. We list here three basic rules
that we call Armstrong's Axioms (Figure 2.19).
包含规则是一种暗示规则,通过该规则可以生成保证适用于所有可能表的 FD。事实证明,从一小组基本蕴涵规则中,我们可以推导出所有其他规则。我们在此列出了三个基本规则,我们称之为阿姆斯特朗公理(图 2.19 )。
FIGURE 2.19. Armstrong's Axioms.
图 2.19。阿姆斯特朗公理。
Definition:Armstrong's Axioms.
定义:阿姆斯特朗公理。Assume in what follows that we are given a table T, and that all sets of attributes X, Y, Z are contained in Head(T). Then we have the following rules of implication.
假设下面给定一个表 T,并且所有属性集 X、Y、Z 都包含在 Head(T) 中。那么我们有以下蕴涵规则。
- Inclusion rule: If Y ⊆ X, then X → Y.
包含规则:如果 Y ⊆ X,则 X → Y。- Transitivity rule: If X → Y and Y → Z, then X → Z.
传递性规则:如果 X → Y 且 Y → Z,则 X → Z。- Augmentation rule: If X → Y, then X Z → Y Z.
增广规则:如果X → Y,则XZ → Y Z。Just as we list attributes with spaces between them in a functional dependency to represent a set containing those attributes, two sets of attributes in sequence imply a union operation. Thus, the augmentation rule could be rewritten: If X → Y, then X ∪ Z → Y ∪ Z.
正如我们在函数依赖中列出属性之间有空格来表示包含这些属性的集合一样,按顺序排列的两组属性意味着并集操作。因此,增广规则可以重写:如果 X → Y,则 X ∪ Z → Y ∪ Z。
We have already proved the inclusion rule, in Theorem 2.1, so let us prove the augmentation rule now in Theorem 2.3.
我们已经在定理 2.1 中证明了包含规则,所以现在让我们在定理 2.3 中证明增广规则。
Theorem 2.3:Augmentation Rule.
定理2.3:增广规则。We wish to show that if X → Y, then X Z → Y Z. Assume that X → Y, and consider any two rows u and v in T that agree on the attributes of X Z (i.e., X ∪ Z). We need merely show that u and v cannot disagree on the attributes of Y Z. But since u and v agree on all attributes of X Z, they certainly agree on all attributes of X; and since we are assuming that X → Y, then u and v must agree on all attributes of Y. Similarly, since u and v agree on all attributes of X Z, they certainly agree on all attributes of Z. Therefore, u and v agree on all attributes of Y and all attributes of Z, and the proof is complete.▪
我们希望证明,如果 X → Y,则 XZ → Y Z。假设 X → Y,并考虑 T 中符合 XZ 属性的任意两行 u 和 v(即 X ∪ Z)。我们只需要证明 u 和 v 不能对 Y Z 的属性有不同意见。但是,由于 u 和 v 对 XZ 的所有属性都一致,所以它们当然对 X 的所有属性都一致;并且由于我们假设 X → Y,那么 u 和 v 必须在 Y 的所有属性上达成一致。同样,由于 u 和 v 在 XZ 的所有属性上达成一致,因此它们当然在 Z 的所有属性上达成一致。因此,u 和 v 一致Y 的所有属性和 Z 的所有属性,证明完成。▪
From Armstrong's Axioms we can prove a number of other rules of implication among FDs. Furthermore, we can do this without
any further recourse to the FD definition, using only the axioms themselves.
从阿姆斯特朗公理我们可以证明 FD 之间的许多其他蕴涵规则。此外,我们可以在不进一步求助于 FD 定义的情况下,仅使用公理本身来做到这一点。
Theorem 2.4:Some Implications of Armstrong's Axioms.
定理 2.4:阿姆斯特朗公理的一些含义。Again we assume that all sets of attributes below—W, X, Y, and Z—are contained in the heading of a table T.
我们再次假设以下所有属性集(W、X、Y 和 Z)都包含在表 T 的标题中。
- Union rule: If X → Y and X → Z, then X → Y Z.
并集规则:如果 X → Y 且 X → Z,则 X → Y Z。- Decomposition rule: If X → Y Z, then X → Y and X → Z.
分解规则:如果X→YZ,则X→Y且X→Z。- Pseudotransitivity rule: If X → Y and W Y → Z, then X W → Z.
伪及物性规则:如果 X → Y 且 WY → Z,则 XW → Z。- Set accumulation rule: If X → Y Z and Z → W, then X → Y Z W.
设定累加规则:若X→YZ且Z→W,则X→YZ W。
- PROOF. We prove only 2 and 4 here. For 2, note that Y Z = Y ∪ Z. Thus, Y Z → Y by the inclusion rule (axiom 1). By transitivity (axiom 2), X → Y Z and Y Z → Y implies X → Y. Similarly, we can show X → Z, and the decomposition rule (2) has been demonstrated. For 4, we are given that (a) X → Y Z and (b) Z → W. Using axiom 3, we augment (b) with Y Z to obtain Y Z Z → Y Z W. Since Z Z = Z, we have (c) Y Z → Y Z W. Finally, by transitivity, using (a) and (c), we have X → Y Z W, and the set accumulation rule (4) has been demonstrated.▪
证明。这里我们只证明2和4。对于 2,请注意 YZ = Y ∪ Z。因此,根据包含规则(公理 1),YZ → Y。根据传递性(公理2),X→YZ和YZ→Y蕴涵X→Y。类似地,我们可以证明X→Z,分解规则(2)已得到证明。对于 4,我们给出 (a) X → YZ 和 (b) Z → W。使用公理 3,我们用 YZ 增强 (b) 以获得 YZZ → YZ W。由于 ZZ = Z,我们有 (c) YZ → YZ W。最后,通过传递性,利用(a)和(c),我们有X → YZW,并证明了集合累加规则(4)。▪
We state without proof the rather startling result that all valid rules of implication among FDs can be derived from Armstrong's Axioms. In fact, if F is any set of FDs, and X → Y is
an FD that cannot be shown by Armstrong's Axioms to be implied by F, then there must be a table T in which all of the FDs in F hold but X → Y is false. Because of this result, Armstrong's
Axioms are often referred to as being complete, meaning that no other rule of implication can be added to increase their effectiveness.
我们在没有证明的情况下陈述了相当惊人的结果,即 FD 中所有有效的蕴涵规则都可以从阿姆斯特朗公理中导出。事实上,如果 F 是任意 FD 集合,并且 X → Y 是一个不能被阿姆斯特朗公理证明为 F 所隐含的 FD,那么一定存在一个表 T,其中 F 中的所有 FD 都成立,但 X → Y 为假。由于这个结果,阿姆斯特朗公理通常被称为完整公理,这意味着不能添加其他暗示规则来提高其有效性。
Recall that in Example 2.10 we pointed out that the three FDs from Example 2.8,
回想一下,在示例 2.10中,我们指出示例 2.8中的三个 FD,
- emp_id → emp_name, emp_id → emp_phone, and emp_id → dept_name,
allowed us to conclude that FD 1 holds:
让我们得出结论 FD 1 成立:
- emp_id → emp_name emp_phone dept_name.
This fact follows from two applications of the union rule of Theorem 2.4. The inverse implication, that FD 1 implies the first
three, follows from two applications of the decomposition rule in the same theorem. Whenever we have some set of attributes
X on the left of a set of FDs, we can take a union of all sets of attributes on the right of these FDs and combine the FDs
into one. For example, assume we have the attributes A, B, C, D, E, F, and G in the heading of a table T, and we know that
the following FDs hold:
这一事实源自定理 2.4 并集规则的两个应用。 FD 1 暗示前三个的逆蕴涵是从同一定理中分解规则的两个应用得出的。每当我们在一组 FD 的左侧有一些属性 X 时,我们就可以将这些 FD 右侧的所有属性集进行并集,并将这些 FD 合并为一个。例如,假设表 T 的标题中有属性 A、B、C、D、E、F 和 G,并且我们知道以下 FD 成立:
Then we can combine these FDs into one by successive applications of the union rule:
然后我们可以通过连续应用并集规则将这些 FD 合并为一个:
As a matter of fact, we can add the trivial dependency B D → B D and conclude
事实上,我们可以添加简单的依赖关系 BD → BD 并得出结论
However, we normally try to avoid including information in a set of dependencies that can be derived using Armstrong's Axioms
from a more fundamental set. Thus, we might want to return to the FD form of B D → A C E F G. Note that if we had another
attribute H in the heading of the table T not mentioned in any FD, we could conclude that in addition to this FD, the following
FD holds:
然而,我们通常会尽量避免将信息包含在一组依赖关系中,这些依赖关系可以使用阿姆斯特朗公理从更基本的集合中派生出来。因此,我们可能想返回 BD → ACEF G 的 FD 形式。请注意,如果表 T 的标题中还有任何 FD 中未提及的另一个属性 H,我们可以得出结论,除了该 FD 之外,还有以下属性: FD 认为:
But since this FD can be derived from B D → A C E F G by using the augmentation rule, we would once again prefer this shorter
FD.
但由于这个 FD 可以通过使用增广规则从 BD → ACEFG 导出,所以我们再次更喜欢这个较短的 FD。
EXAMPLE 2.11 例2.11
List a minimal set of functional dependencies satisfied by the following table T, where we assume that it is the intent of the designer that exactly this set of rows lies in the table. Once again, we point out that it is unusual to derive FDs from the content of a table. Normally we determine functional dependencies from understanding the data items and rules of the enterprise. Note that we do not yet have a rigorous definition of a minimal set of FDs, so we simply try to arrive at a minimal set in an intuitive way.
列出下表 T 所满足的最小功能依赖性集,其中我们假设设计者的意图正是这组行位于表中。我们再次指出,从表的内容中派生 FD 是不常见的。通常我们通过了解企业的数据项和规则来确定功能依赖关系。请注意,我们还没有 FD 的最小集合的严格定义,因此我们只是尝试以直观的方式得出最小集合。
T Row # 排 # A B C D 1 a1 b1 c1 d1 2 a1 b1 c2 d2 3 a2 b1 c1 d3 4 a2 b1 c3 d4 Analysis 分析
Let us start by considering FDs with a single attribute on the left. Clearly we always have the trivial FDs, A → A, B → B, C → C, and D → D, but we are asking for a minimal set of dependencies, so we won't list them. From the specific content of the table we are able to derive the following. (a) All values of the B attribute are the same, so it can never happen for any other attribute P (i.e., where P represents A, C, or D) that r1(P) = r2(P), while r1(B) ≠ r2(B); thus, we see that A → B, C → B, and D → B. At the same time no other attributes P are functionally dependent on B, since they all have at least two distinct values, and so there are always two rows r1 and r2 such that r1(P) ≠ r2(P), while r1(B) = r2(B); thus, B ↛ A, B ↛ C, and B ↛ D. (b) Because the D values are all different, in addition to D → B of part (a), we also have D → A and D → C; at the same time D is not functionally dependent on anything else since all other attributes have at least two duplicate values. So in addition to B ↛ D of part (a), we have A ↛ D and C ↛ D. (c) We have A ↛ C (because of rows 1 and 2) and C ↛ A (because of rows 1 and 3). Therefore, we can list all FDs (and failed FDs) with a single attribute on the left. (Letters in parentheses are keyed to the parts above that give us each fact.)
让我们首先考虑左侧具有单个属性的 FD。显然,我们总是有简单的 FD,A → A、B → B、C → C 和 D → D,但我们要求的是最小的依赖关系集,因此我们不会列出它们。从表中的具体内容我们可以得出以下结论。 (a) B 属性的所有值都相同,因此对于任何其他属性 P(即,其中 P 代表 A、C 或 D),永远不会发生 r 1 (P) = r 2 (P),而r 1 (B) ≠ r 2 (B);因此,我们看到 A → B、C → B 和 D → B。同时,没有其他属性 P 在功能上依赖于 B,因为它们都至少有两个不同的值,因此总是有两行 r 1和r 2使得r 1 (P)≠r 2 (P),而r 1 (B)=r 2 (B); (b) 由于 D 值不同,除了 (a) 部分的 D → B 之外,还有 D → A 和 D → C;同时 D 在功能上不依赖于任何其他属性,因为所有其他属性都至少有两个重复值。因此,除了 (a) 部分的 B ↛ D 之外,我们还有 A ↛ D 和 C ↛ D。 (c) 我们有 A ↛ C (因为第 1 行和第 2 行)和 C ↛ A (因为第 1 行和第 3 行) )。因此,我们可以在左侧列出具有单个属性的所有 FD(以及失败的 FD)。 (括号中的字母是上面为我们提供每个事实的部分的关键。)
- (a) A → B (a) B ↛ A (c) C ↛ A (b) D → A
- (c) A ↛ C (a) B ↛ C (a) C → B (a) D → B
- (b) A ↛ D (a) BvD (b) C ↛ D (b) D → C
By the union rule, whenever a single attribute on the left functionally determines several other attributes, as with D above, we can combine the attributes on the right: D → A B C. From the analysis so far, we have the following set of FDs (which we believe to be minimal):
根据并集规则,每当左侧的单个属性在功能上决定了其他几个属性时,就像上面的 D 一样,我们可以将右侧的属性组合起来:D → AB C。从到目前为止的分析来看,我们有以下一组 FD (我们认为这是最小的):
- 1. A → B 2. C → B 3. D → A B C
1.A→B 2.C→B 3.D→ABCNow consider FDs with pairs of attributes on the left. (d) Any pair containing D determines all other attributes, by FD 3 above and the augmentation rule, so there is no new FD with D on the left that is not already implied. (e) The attribute B, combined with any other attribute P on the left, still functionally determines only those attributes already determined by P, as we see by the following argument. If P ↛ Q, this means there are rows r1 and r2 such that r1(Q) ≠ r2(Q), while r1(P) = r2(P). But because B has equal values on all rows, we know that r1(B P) = r2(B P) as well, so B P ↛ Q. Thus, we get no new FDs with B on the left.
现在考虑左侧带有属性对的 FD。 (d) 任何包含 D 的对通过上面的 FD 3 和增强规则确定所有其他属性,因此不存在左侧带有 D 的新 FD 尚未隐含。 (e) 属性 B 与左侧的任何其他属性 P 相结合,仍然在功能上仅确定那些已经由 P 确定的属性,正如我们通过以下参数看到的那样。如果 P ↛ Q,这意味着存在 r 1和 r 2行,使得 r 1 (Q) ≠ r 2 (Q),而 r 1 (P) = r 2 (P)。但因为 B 在所有行上具有相同的值,所以我们也知道 r 1 (BP) = r 2 (BP),因此 BP ↛ Q。因此,我们不会得到左侧有 B 的新 FD。(f) Now the only pair of attributes that does not contain B or D is A C, and since A C has distinct values on each row (examine table T again!), we know that A C → A B C D. This is new. We can show most of this by inference rules: It is trivial that A C → A and AC → C, by inclusion, and we already knew that A → B, so it is easy to show that A C → B. Thus, the only new fact we get from A C → A B C D is that A C → D, and we are searching for a minimal set of FDs, so that is all we include as FD 4 in the list below. If we now consider looking for FDs with triples of attributes on the left, we see that we can derive from the FDs we already have that any triple functionally determines all other attributes. Any triple that contains D clearly does, and the only triple not containing D is A B C, where A C alone functionally determines all other attributes. Clearly the same holds for any set of four attributes on the left.
(f) 现在唯一不包含 B 或 D 的属性对是 AC,并且由于 AC 每行都有不同的值(再次检查表 T!),我们知道 AC → ABC D。这是新的。我们可以通过推理规则来证明其中的大部分内容:通过包含,AC → A 和 AC → C 是微不足道的,并且我们已经知道 A → B,因此很容易证明 AC → B。因此,唯一新的事实上,我们从 AC → ABCD 得到的是 AC → D,并且我们正在搜索 FD 的最小集合,因此这就是我们在下面的列表中作为 FD 4 包含的全部内容。如果我们现在考虑寻找左侧具有三元组属性的 FD,我们会发现我们可以从已有的 FD 中推导出任何三元组在功能上决定所有其他属性。任何包含 D 的三元组显然都是如此,唯一不包含 D 的三元组是 ABC,其中 AC 单独在功能上决定了所有其他属性。显然,这对于左侧的任何四个属性组都适用。The complete set of FDs implicit in table T is therefore the following:
因此,表 T 中隐含的完整 FD 集如下:
- 1. A → B 2. C → B 3. D → A B C 4. A C → D
1.A→B 2.C→B 3.D→ABC 4.AC→DThe first three FDs come from the earlier list of FDs with single attributes on the left, while the last FD, A C → D, is the new one generated with two attributes listed on the left. It will turn out that this set of FDs is not quite minimal, despite all our efforts to derive a minimal set. We will see this after we have had a chance to define what we mean by a minimal set of FDs.
前三个 FD 来自左侧具有单个属性的早期 FD 列表,而最后一个 FD AC → D 是左侧列出的两个属性生成的新 FD。事实证明,尽管我们竭尽全力导出一个最小集合,但这组 FD 并不是最小的。在我们有机会定义最小 FD 集的含义之后,我们将会看到这一点。
2.6.3. Closure, Cover, and Minimal Cover
2.6.3.闭合、覆盖和最小覆盖
The implication rules for FDs derived from Armstrong's Axioms mean that whenever a set F of functional dependencies is given,
a much larger set may be implied.
从阿姆斯特朗公理导出的 FD 蕴含规则意味着,只要给出一组 F 函数依赖关系,就可能隐含一个更大的集合。
Definition: Closure of a Set of FDs.
定义:一组 FD 的闭包。Given a set F of FDs on attributes of a table T, we define the closure of F, symbolized by F+, to be the set of all FDs implied by F.
给定表 T 的属性上的 FD 集合 F,我们定义 F 的闭包(用 F +表示)为 F 隐含的所有 FD 的集合。
EXAMPLE 2.12 例2.12
Consider the set F of FDs given by
考虑由下式给出的 FD 集合 FBy the transitivity rule, A → B and B → C together imply A → C, which must be included in F+. Also, B → C and C → D imply B → D. Indeed, every single attribute appearing prior to the terminal one in the sequence A B C D E F G H can be shown by transitivity to functionally determine every single attribute on its right in the sequence. We also have trivial FDs such as A → A. Next, using the union rule, we can generate other FDs, such as A → A B C D E F G H. In fact, by using the union rule in different combinations, we can show A → (any nonempty subset of A B C D E F G H). There are 28 – 1 = 255 such nonempty subsets. All FDs we have just derived are contained in F+.
根据及物性规则,A → B 和 B → C 一起意味着 A → C,它必须包含在 F +中。此外,B → C 和 C → D 意味着 B → D。事实上,序列 ABCDEFGH 中出现在末端属性之前的每个单个属性都可以通过传递性来显示,以在功能上确定序列中其右侧的每个单个属性。我们还有简单的 FD,例如 A → A。接下来,使用并集规则,我们可以生成其他 FD,例如 A → ABCDEFG H。事实上,通过在不同的组合中使用并集规则,我们可以显示 A →(任意ABCDEFGH 的非空子集)。有 2 8 – 1 = 255 个这样的非空子集。我们刚刚导出的所有 FD 都包含在 F +中。
Functional dependencies usually arise in creating a database out of commonsense rules. In terms of ER concepts, it is clear
that data items corresponding to identifiers of entities functionally determine all other attributes of that entity. Similarly,
attributes of relationships are uniquely determined by the identifiers of entities that take part in the relationship. We
would normally expect to start with a manageable set F of FDs in our design, but as Example 2.12 shows, the set of FDs that is implied by F could conceivably grow exponentially. In what follows, we try to find a way to
speak of what is implied by a set F of FDs without this kind of exponential explosion. What we are leading up to is a way
to determine a minimal set of FDs that is equivalent to a given set F. We will also provide an algorithm to derive such a minimal set in a reasonable
length of time.
功能依赖性通常出现在根据常识规则创建数据库时。就 ER 概念而言,很明显,与实体标识符相对应的数据项在功能上决定了该实体的所有其他属性。类似地,关系的属性由参与关系的实体的标识符唯一确定。我们通常期望在设计中从可管理的 FD 集合 F 开始,但如示例 2.12所示,F 隐含的 FD 集合可以想象成指数增长。接下来,我们尝试找到一种方法来表达 FD 集合 F 所隐含的内容,而无需这种指数爆炸。我们正在寻找一种方法来确定与给定集合 F 等价的最小 FD 集合。我们还将提供一种算法来在合理的时间长度内导出这样的最小集合。
Definition: FD Set Cover.
定义:FD 套盖。A set F of FDs on a table T is said to cover another set G of FDs on T, if the set G of FDs can be derived by implication rules from the set F, or in other words, if G ⊆ F+. If F covers G and G covers F, then the two sets of FDs are said to be equivalent, and we write F ≡ G.
如果 FD 集合 G 可以通过集合 F 的蕴涵规则导出,或者换句话说,如果 G ⊆ F + ,则称表 T 上的 FD 集合 F 覆盖 T 上的另一 FD 集合 G。如果 F 覆盖 G 并且 G 覆盖 F,则称这两组 FD 是等价的,我们写为 F ≠ G。
EXAMPLE 2.13 例2.13
Consider the two sets of FDs on the set of attributes A B C D E:
考虑属性集 ABCDE 上的两组 FD:We will demonstrate that F covers G, by showing how all FDs in G are implied by FDs in F. In what follows we derive implications of FDs in F using the various inference rules from previous definitions and Theorem 2.4. Since in F we have (a) B → C D and (b) B → A, by the union rule we see that (c) B → A C D. The trivial functional dependency B → B clearly holds, and in union with (c), we get (d) B → A B C D. By the decomposition rule, B → A B C D implies (e) B → A D, and since (f) A D → E is in F, by transitivity we conclude (g) B → E. This, in union with (d), gives us B → A B C D E. From this, by decomposition we can derive the initial two FDs of the set G, and the third one also exists in F. This demonstrates that F covers G.
我们将通过展示 G 中的所有 FD 如何被 F 中的 FD 隐含来证明 F 覆盖 G。接下来,我们使用前面的定义和定理 2.4 中的各种推理规则来推导 F 中的 FD 的含义。由于在 F 中我们有 (a) B → CD 和 (b) B → A,根据并集规则我们可以看到 (c) B → AC D。平凡的函数依赖 B → B 显然成立,并且与 (c) 并集),我们得到(d)B→ABCD。根据分解规则,B→ABCD意味着(e)B→AD,并且由于(f)AD→E在F中,根据传递性我们得出(g)B→E与 (d) 结合,得到 B → ABCD E。由此,通过分解我们可以导出集合 G 的最初两个 FD,并且第三个 FD 也存在于 F 中。这证明 F 覆盖 G。
In Example 2.8 a technique was used to find all the attributes functionally determined by the attribute B under the set F of FDs. (This turned out to be all the attributes
there were.) In general, we can do this for any set X of attributes on the left, finding all attributes functionally determined
by the set X.
在示例 2.8中,使用了一种技术来查找由 FD 集合 F 下的属性 B 在功能上确定的所有属性。 (结果证明这是所有属性。)一般来说,我们可以对左侧的任何属性集 X 执行此操作,找到由该集 X 函数确定的所有属性。
Definition: Closure of a Set of Attributes.
定义:一组属性的闭包。Given a set F of FDs on a table T and a set X of attributes contained in T, we define the closure of the set X, denoted by X+, as the largest set Y of attributes functionally determined by X, the largest set Y such that X → Y is in F+. Note that the set Y contains all the attributes of X, by the inclusion rule, and might not contain any other attributes.
给定表 T 上的 FD 集合 F 和 T 中包含的属性集合 X,我们定义集合 X 的闭包,用 X +表示,作为由 X 函数确定的最大属性集合 Y,最大集合 Y使得 X → Y 在 F +中。请注意,根据包含规则,集合 Y 包含 X 的所有属性,并且可能不包含任何其他属性。
Here is an algorithm for determining the closure of any set of attributes X.
这是一个用于确定任意属性集 X 的闭包的算法。
ALGORITHM 2.1: Set Closure.
算法 2.1:设置闭包。This algorithm determines X+, the closure of a given set of attributes X, under a given set F of FDs.
该算法确定 X + ,即在给定 FD 集合 F 下给定属性集合 X 的闭包。
I = 0; X[0] = X; /* integer I, attribute set X[0] */ REPEAT /* loop to find larger X[I] */ I = I + 1; /* new I */ X[I] = X[I-1]; /* initialize new X[I] */ FOR ALL Z → W in F /* loop on all FDs Z → W in F */ IF Z ⊆ X[I]
如果 Z ⊆ X[I]/* if Z contained in X[I] */ THEN X[I] = X[I] ∪ W; /* add attributes in W to X[I] */ END FOR /* end loop on FDs */ UNTIL X[I] = X[I-1]; /* loop until no new attributes */ RETURN X+= X[I]; /* return closure of X */ Note that the step in this algorithm that adds attributes to X[I] is based on the set accumulation rule, proved in Theorem 2.4: If X → Y Z and Z → W, then X → Y Z W.
请注意,该算法中向 X[I] 添加属性的步骤基于定理 2.4 中证明的集合累积规则:如果 X → YZ 且 Z → W,则 X → YZ W。In our algorithm we are saying that since X → X[I] (our induction hypothesis) and after finding Z → W in F with Z ⊆ X[I], X[I] can be represented as Y Z (Y = X[I] − Z), so we can write X → X[I] as X → Y Z. Now since F contains Z → W, we conclude by the set accumulation rule that X → Y Z W, or in other words, X → X[I] ∪ W, and our induction hypothesis is maintained.
在我们的算法中,我们说,由于 X → X[I] (我们的归纳假设)并且在 F 中找到 Z → W 且 Z ⊆ X[I] 后,X[I] 可以表示为 YZ (Y = X[I] ] − Z),所以我们可以将 X → X[I] 写为 X → Y Z。现在由于 F 包含 Z → W,我们根据集合累加规则得出 X → YZW,或者换句话说,X → X[ I] ∪ W,我们的归纳假设得到维持。Set closure is an important milestone in our development. It gives us a general way of deciding whether a given FD is implied by a set F of FDs, without worrying about the exponential explosion that Example 2.12 showed could occur in calculating F+. For example, suppose we need to know if the functional dependency X → A is implied by set F of FDs. We simply calculate X+ under F by the set closure in Algorithm 2.1, and determine if it contains A: If so, X → A is in F+; that is, it is implied by F.
集合闭合是我们发展中的一个重要里程碑。它为我们提供了一种通用方法来确定给定的 FD 是否由一组 FD 隐含,而不必担心例 2.12所示的在计算 F +时可能发生的指数爆炸。例如,假设我们需要知道函数依赖 X → A 是否由 FD 的集合 F 隐含。我们简单地通过算法2.1中的集合闭包计算F下的X + ,并判断其是否包含A:如果是,则X→A在F +中;也就是说,它是由 F 隐含的。We will see that a key for a table is just a minimal set of attributes that functionally determines all the attributes of the table. To determine if X is a key, we just compute X+ under F, the set of FDs for the table's attributes, and see if it includes all of them, then check that no subset of X does the same.▪
我们将看到表的键只是功能上确定表的所有属性的最小属性集。为了确定 X 是否是键,我们只需计算 F(表属性的 FD 集合)下的 X + ,并查看它是否包含所有这些,然后检查 X 的子集是否没有执行相同操作。▪
EXAMPLE 2.14 例2.14
Set Closure and a Compact Derivational Notation for It
集合闭包及其紧凑的导数表示法In Example 2.13 we were given a set F of FDs, which we number:
在例 2.13中,我们得到了一组 F 的 FD,我们将其编号为:
- F: 1. B → C D 2. A D → E 3. B → A
F:1.B→CD 2.AD→E 3.B→AGiven X = B, we determined that X+ = A B C D E. Using Algorithm 2.1, we start with X[0] = B. Then X[1] = B, and we begin to loop through the FDs. Because of (1) B → CD, we get X[1] = B C D. As a notational device to show that C and D were added after B because of FD 1, we write this as B C D (1). The next FD, (2) A D → E, does not apply at this time, since A D is not a subset of X[1]. Next, from (3) B → A, we get X[1] = A B C D (or, in our notation to reflect derivation order, B C D (1) A (3)). Now X[0] is strictly contained in X[1] (i.e., X[1 – 1] ⊂ X[1]), so X[1 – 1] ≠ X[1].
给定 X = B,我们确定 X + = ABCD E。使用算法 2.1,我们从 X[0] = B 开始。然后 X[1] = B,我们开始循环 FD。由于 (1) B → CD,我们得到 X[1] = BC D。作为表示 C 和 D 由于 FD 1 而添加在 B 之后的符号装置,我们将其写为 BCD (1)。下一个 FD (2) AD → E 此时不适用,因为 AD 不是 X[1] 的子集。接下来,从 (3) B → A,我们得到 X[1] = ABCD(或者,用我们的表示法来反映推导顺序,BCD (1) A (3))。现在X[0]严格包含在X[1]中(即X[1 – 1] ⊂ X[1]),因此X[1 – 1] ≠ X[1]。Thus, we’ve made progress in the prior pass of the loop and go on to a new pass, setting X[2] = X[1] = A B C D (i.e., B C D (1) A (3)). Looping through the FDs again, we see all of them can be applied (but we skip the ones that have been applied before, since they will have no new effect), with the only new FD, (2) A D → E, giving us X[2] = A B C D E, or in the derivational notation, B C D (2) A (3) E (2). At the end of this loop, the algorithm notes that X[1] ⊂ X[2]. Progress has been made, so we go on to create X[3] and loop through the FDs again, ending up this pass with X[3] = X[2]. Since all of the FDs had been applied already, we could omit this pass by noting that fact. Note that a different ordering of the FDs in F can change the details of execution for this algorithm. In exercises where the derivational notation is requested to demonstrate that the proper derivation was determined, the order is crucial; for example, the derivation above yields the compact notation
因此,我们在循环的前一遍中取得了进展,并继续进行新的一遍,设置 X[2] = X[1] = ABCD(即 BCD (1) A (3))。再次循环遍历 FD,我们看到它们都可以应用(但我们跳过之前应用过的那些,因为它们不会产生新的效果),只有一个新的 FD,(2) AD → E,给我们X[2] = ABCDE,或者用导数表示法来说,BCD (2) A (3) E (2)。在此循环结束时,算法注意到 X[1] ⊂ X[2]。已经取得了进展,所以我们继续创建 X[3] 并再次循环 FD,以 X[3] = X[2] 结束此遍。由于所有 FD 均已应用,因此我们可以通过注意到这一事实来省略此遍。请注意,F 中 FD 的不同顺序可以更改该算法的执行细节。在要求导数符号来证明正确的导数已确定的练习中,顺序至关重要;例如,上面的推导产生紧凑的符号
Given a set F of FDs on a table T, we use the following algorithm to determine a minimal set of dependencies M that covers
F. The set M will be minimal in the sense that none of its FDs can be dropped in their entirety or changed by dropping any
attributes on the left-hand side, without losing the property that it covers F.
给定表 T 上的一组 FD,我们使用以下算法来确定覆盖 F 的最小依赖集 M。集合 M 将是最小的,因为它的任何 FD 都不能完全删除或更改通过删除左侧的任何属性,而不丢失它覆盖 F 的属性。
ALGORITHM 2.2:Minimal Cover.
算法 2.2:最小覆盖。This algorithm constructs a minimal set M of FDs that covers a given set F of FDs. M is known as the minimal cover of F—or, in some texts, as the canonical cover of F.
该算法构建了覆盖给定 FD 集合 F 的最小 FD 集合 M。 M 被称为 F 的最小覆盖,或者在某些文本中,被称为 F 的规范覆盖。Step 1. 步骤1。
From the set F of FDs, we create an equivalent set H of FDs, with only single attributes on the right side.
从 FD 集合 F 中,我们创建等效的 FD 集合 H,右侧仅包含单个属性。
H = Ø; /* initialize H to null set */ FOR ALL X → Y in F /* loop on FDs in F */ FOR ALL A IN Y /* loop on attributes in Y */ H = H ∪ {X → A}; /* add FD to H */ END FOR /* end loop on attributes in Y */ END FOR /* end loop on FDs in F */ Since step 1 derives H by successive applications of the decomposition rule, and F can be reconstructed from H by successive applications of the union rule, it is obvious that F ≡ H.
由于步骤 1 通过连续应用分解规则导出 H,并且可以通过连续应用并集规则从 H 重构 F,因此显然 F ≠ H。Step 2. 步骤2。
From the set H of FDs, successively remove individual FDs that are inessential in H. An FD X → Y is inessential in a set H of FDs, if X → Y can be removed from H, with result J, so that H+ = J+, or H ≡ J. That is, removal of the FD from H has no effect on H+. See Figure 2.20 for an example of an inessential FD.
从 FD 集合 H 中,依次删除 H 中不重要的各个 FD。 FD X → Y 在 FD 集合 H 中是不重要的,如果 X → Y 可以从 H 中删除,结果为 J,因此 H + = J + ,或 H ≠ J。也就是说,从 H 中去除 FD 对 H +没有影响。有关非必要 FD 的示例,请参见图 2.20 。FIGURE 2.20. Example of an inessential FD: X → A.
图 2.20。不重要的 FD 示例:X → A。
FOR ALL X → A in H /* loop on FDs in H */ J = H – {X → A}; /* try removing this FD */ DETERMINE X+ UNDER J; /* set closure algorithm 2.6.12 */ IF A ∈ X+ /* X → A is still implied by J */ H = H – {X → A); /* . . . so it is inessential in H */ END FOR /* end loop on FDs in H */ Each time an FD is removed from H in step 2, the resulting set is equivalent to the previous, larger H. It is clear from this that the final resulting H is equivalent to the original. However, a number of FDs might have been removed.
每次在步骤 2 中从 H 中删除 FD 时,结果集就相当于之前的较大的 H。由此可以清楚地看出,最终得到的 H 相当于原始的 H。然而,许多 FD 可能已被删除。Step 3. 步骤 3.
From the set H of FDs, successively replace individual FDs with FDs that have a smaller number of attributes on the left-hand side, as long as the result does not change H+. See Figure 2.21 for an example of an FD that can be simplified in this manner.
从 FD 集合 H 中,依次将各个 FD 替换为左侧属性数量较少的 FD,只要结果不改变 H +即可。请参阅图 2.21 ,了解可以通过这种方式简化的 FD 示例。FIGURE 2.21. Example of a functional dependency X → A, where B can be dropped from the left-hand side.
图 2.21。函数依赖 X → A 的示例,其中 B 可以从左侧删除。
HO = H /* save original H */ FOR ALL X → A in H with #X > 1 /* loop on FDs with multiple attribute lhs */ FOR ALL B ∈ X /* loop on attributes in X */ Y = X – {B} /* try removing one attribute */ J = (H – {X → A} ∪ {Y → A}; /* left-reduced FD */ GENERATE Y+ UNDER J, Y+ UNDER H; /* set closure algorithm 2.6.12 */ IF Y+ UNDER H = Y+ UNDER J /* if Y+ is unchanged */ UPDATE CURRENT X → A in H /* this is X → A in outer loop */ SET X = Y; /* change X, continue outer loop */ END FOR /* end loop of attributes in X */ END FOR /* end loop on FDs in H */ IF H <>HO /* if FD set changed in Step 3 */ REPEAT STEP 2 AND THEN GOTO STEP 4 /* retest: some FDs may be inessential now */ Step 4. 步骤 4。
From the remaining set of FDs, gather all FDs with equal left-hand sides and use the union rule to create an equivalent set of FDs M where all left-hand sides are unique.
从剩余的 FD 集合中,收集左侧相等的所有 FD,并使用并集规则创建等效的 FD 集合 M,其中所有左侧都是唯一的。
M = Ø; /* initialize M to null set */ FOR ALL X → A in H /* loop on FDs in H */ IF THIS FD IS FLAGGED, CONTINUE; /* if already dealt with, loop */ FLAG CURRENT FD; /* deal with FDs with X on left */ Y = {A}; /* start with right-hand side A */ FOR ALL SUCCESSIVE X → B in H /* nested loop */ FLAG CURRENT FD; /* deal with all FDs, X on left */ Y = Y ∪ {B}; /* gather attributes on right */ END FOR /* gathering complete */ M = M ∪ {X → Y}; /* combine right sides of X → ? */ END FOR /* end outer loop on FDs in H */ We state without proof that this algorithm grows in execution time only as a polynomial in n, the number of attributes in the listed FDs of F (counting repetitions). Step 3 is the most costly, since we need to perform the set closure algorithm once for each attribute on the left-hand side of some FD in H. Note that if we performed step 3 before step 2, we would not have to go back and repeat step 2, as prescribed at the end of step 3 above; however, in general it would take more work for the costly step 3 without the cleanup action of step 2 occurring first.▪
我们在没有证据的情况下声明,该算法的执行时间仅随着 n 的多项式而增长,n 是 F 列出的 FD 中的属性数量(计算重复次数)。步骤 3 的成本最高,因为我们需要对 H 中某个 FD 左侧的每个属性执行一次集合闭包算法。请注意,如果我们在步骤 2 之前执行步骤 3,则不必返回并重复步骤 2,如上述步骤 3 结束时所述;然而,一般来说,如果不首先进行步骤 2 的清理操作,则成本高昂的步骤 3 需要做更多的工作。▪
EXAMPLE 2.15 例2.15
Construct the minimal cover M for the set F of FDs, which we number and list:
为 FD 集合 F 构造最小覆盖 M,我们对其进行编号并列出:
- F: 1. A B D → A C, 2. C → B E, 3. A D → B F, 4. B → E
F:1.ABD→AC,2.C→BE,3.AD→BF,4.B→ENote that it is important to rewrite the set of FDs as you begin each new step where the FDs have changed, so you can refer to individual ones easily in the next step.
请注意,当您开始 FD 已更改的每个新步骤时,重写FD 集非常重要,这样您就可以在下一步中轻松引用各个 FD。
- We apply the decomposition rule to FDs in F, to create an equivalent set with singleton attributes on the right-hand sides (rhs) of all FDs. H =
我们将分解规则应用于 F 中的 FD,以创建右侧具有单例属性的等效集合 (右)所有 FD。 H =
- 1. A B D → A, 2. A B D → C, 3. C → B, 4. C → E
1.ABD→A,2.ABD→C,3.C→B,4.C→E- 5. A D → B, 6. A D → F, 7. B → E
5. AD → B, 6. AD → F, 7. B → E- We consider cases corresponding to the seven numbered FDs in H.
我们考虑与 H 中七个编号的 FD 相对应的情况。
- A B D → A is trivial and thus clearly inessential (since A B D+ contains A), so it can be removed. The FDs remaining in H are (2) A B D → C, (3) C → B, (4) C → E, (5) A D → B, (6) A D → F, and (7) B → E.
ABD → A 是微不足道的,因此显然是无关紧要的(因为 ABD +包含 A),因此可以将其删除。 H中剩余的FD为(2)ABD→C、(3)C→B、(4)C→E、(5)AD→B、(6)AD→F、(7)B→E。- A B D → C cannot be derived from the other FDs in H by the set closure algorithm, Algorithm 2.1, because there is no other FD with C on the right-hand side. (A B D+, with FD (2) missing, will not contain C. We could also go through the steps of Algorithm 2.1 to demonstrate this fact. See also substep 6, below.)
ABD → C 不能通过集合闭包算法(算法 2.1)从 H 中的其他 FD 导出,因为在右侧不存在带有 C 的其他 FD。 (ABD +缺少 FD (2),将不包含 C。我们也可以通过算法 2.1 的步骤来证明这一事实。另请参见下面的子步骤 6。)- Is C → B inessential? Is it implied under the set of other FDs that would remain if this were taken out: {(2) A B D → C, (4) C → E, (5) A D → B, (6) A D → F, (7) B → E}? To see if C → B is inessential, we generate C+ under this smaller set of FDs. (We use the set closure Algorithm, 2.1, to generate X+ in what follows, and use the derivational notation introduced in Example 2.14.) Starting with C+ = C, FD (4) gives us C+ = CE. To indicate the use of FD (4) notationally, we write C+ = CE (4). Now with FD (3) removed, no other left side of an FD is contained in the set C E, so we have reached the full closure of the attribute C. Since C+ doesn't contain B, (3) C → B is essential and remains in H.
C → B 是无关紧要的吗?如果将其删除,这是否暗示在将保留的其他 FD 集合中:{(2) ABD → C, (4) C → E, (5) AD → B, (6) AD → F, (7) B→E}?为了查看 C→B 是否是无关紧要的,我们在这个较小的 FD 集合下生成 C + 。 (我们使用集合闭包算法 2.1 来生成下面的 X + ,并使用例 2.14中引入的导数符号。)从 C + = C 开始,FD (4) 给出 C + = CE。为了在符号上表示FD(4)的使用,我们写成C + =CE(4)。现在去掉 FD (3),集合 CE 中不再包含 FD 的其他左侧,因此我们达到了属性 C 的完全闭包。由于 C +不包含 B,因此 (3) C → B 为必不可少并保留在 H.- C → E is inessential as shown by the working out of set closure on C with FD (4) missing. We get C+ = C B (3) E (7). Thus, since E is in C+ after FD (4) is removed, we can drop FD (4). The FDs remaining in H are (2) A B D → C, (3) C → B, (5) A D → B, (6) A D → F, and (7) B → E.
C → E 是无关紧要的,如在缺少 FD (4) 的情况下 C 上的集合闭包的计算所示。我们得到C + =CB(3)E(7)。因此,由于去掉FD(4)后E在C +中,所以我们可以去掉FD(4)。 H中剩余的FD为(2)ABD→C、(3)C→B、(5)AD→B、(6)AD→F、(7)B→E。- Is A D → B inessential under the set of FDs that remain with (5) missing: {(2) A B D → C, (3) C → B, (6) A D → F, and (7) B → E}? In the set closure algorithm, A D+ = A D F (6) and nothing more. So FD (5) is essential and cannot be removed.
在 (5) 缺失的 FD 集合下,AD → B 是否是无关紧要的:{(2) ABD → C,(3) C → B,(6) AD → F,和 (7) B → E}?在集合闭包算法中,AD + =ADF(6)仅此而已。所以FD(5)是必不可少的,不能去掉。- Is A D → F inessential given the set of other FDs that would remain: {(2) A B D → C, (3) C → B, (5) A D → B, (7) B → E}? Clearly with this set of FDs we can derive A D+ to contain A D B (3) C (2) E (7), all the attributes there are on the right except F, so we cannot derive A D → F without FD (6). Another way to say this is that with FD (6) removed, no FD has F on its right-hand side, so A D → F cannot be implied.
考虑到剩下的其他 FD 集合:{(2) ABD → C, (3) C → B, (5) AD → B, (7) B → E},AD → F 是否无关紧要?显然,通过这组 FD,我们可以推导出 AD +包含 ADB (3) C (2) E (7),除了 F 之外的所有属性都在右侧,因此我们不能在没有 FD (6) 的情况下推导出 AD → F。另一种说法是,去掉 FD (6) 后,没有 FD 的右侧有 F,因此不能暗示 AD → F。- Is B → E inessential under the set of other FDs that remains: {(2) A B D → C, (3) C → B, (5) A D → B, (6) A D → F}? The answer is no, since deriving B+ with this set of FDs gives only B.
B → E 在剩下的其他 FD 集合下是否是无关紧要的:{(2) ABD → C, (3) C → B, (5) AD → B, (6) AD → F}?答案是否定的,因为用这组 FD 导出 B +只能得到 B。We end step 2 with the set H = {(2) A B D → C, (3) C → B, (5) A D → B, (6) A D → F, (7) B → E}, which should be renumbered for ease of reference in step 3.
我们以集合 H = {(2) ABD → C, (3) C → B, (5) AD → B, (6) AD → F, (7) B → E} 结束步骤 2,应重新编号以便于在步骤 3 中参考。
- H = 1. A B D → C, 2. C → B, 3. A D → B, 4. A D → F, 5. B → E
H = 1.ABD→C,2.C→B,3.AD→B,4.AD→F,5.B→E
- We start with FD (1) and note that there are multiple attributes on the left-hand side; we call this set on the left side of FD (1), X = A B D. Therefore, we need to try to reduce this set X by removing any single attributes and creating a new set J of FDs each time.
我们从FD(1)开始,注意到左边有多个属性;我们称这个集合为左侧 FD (1) 中,X = AB D。因此,我们需要尝试通过删除任何单个属性并创建新的属性来减少这个集合 X 每次设置J个FD。
- Drop A? We try to do away with the attribute A in FD (1), so the new set J is given by: (1) B D → C, (2) C → B, (3) A D → B, (4) A D → F, (5) B → E. To show that this reduction gives an equivalent set of FDs, we need to show that B D+ (under H) is the same as B D+ (under J). The risk here is that B D+ under J will functionally determine more than B D+ under H, since J has an FD with only B D on the left that H does not. We claim that the two sets H and J are equivalent FD sets if and only if B D+ (under H) is the same as B D+ (under J). So we calculate B D+ under H to be B D E (5), and that's all. Under J, B D+ is B D C (1) E (5). Since these are different, we can't replace (1) A B D → C with (1) B D → C.
掉A?我们尝试去掉FD(1)中的属性A,因此新的集合J由下式给出:(1)BD→C,(2)C→B,(3)AD→B,(4)AD→ F, (5) B → E。为了证明这种约简给出了一组等价的 FD,我们需要证明 BD + (在 H 下)与 BD + (在 J 下)相同。这里的风险是,J 下的 B D+ 在功能上会比 H 下的 BD +确定更多,因为 J 的 FD 左侧只有 BD,而 H 没有。我们声称两个集合 H 和 J 是等价的 FD 集合当且仅当 BD + (在 H 下)与 BD + (在 J 下)相同。所以我们计算H下的BD +为BDE(5),仅此而已。在 J 下,B D+ 是 BDC (1) E (5)。由于它们不同,我们不能将 (1) ABD → C 替换为 (1) BD → C。- Drop B? We repeat the method. Now J contains (1) A D → C, (2) C → B, (3) A D → B, (4) A D → F, (5) B → E, and A D+ under J is A D C (1) B (2) F (4) E (5). But under H, A D+ = A D B (3) F (4) E (5) C (1). These are the same sets, but note the different order of generation. You need to use derivational notation with the proper order to show that the set closure algorithm is being applied on the proper FD set. Under H, FD (3) is the first one that expands the A D+ closure, and FD (1) comes in the second pass. Under J, we can use each FD as we come to it in order on the first pass. In any event, since A D+ under H is the same as A D+ under J, we can reduce FD (1) to A D on the left-hand side, and the FD set H is now
丢B?我们重复这个方法。现在J包含(1)AD→C、(2)C→B、(3)AD→B、(4)AD→F、(5)B→E,J下的AD +为ADC(1)B( 2)F(4)E(5)。但在H下,AD + =ADB(3)F(4)E(5)C(1)。这些是相同的集合,但请注意生成顺序不同。你需要使用导数 具有正确顺序的符号表明集合闭合算法正在正确的 FD 集合上应用。 H、FD 下 (3) 是第一个扩展AD +闭包的,FD(1)出现在第二遍。在 J 下,我们可以在第一次通过时按顺序使用每个 FD。在 任何事件,由于 H 下的 AD +与 J 下的 AD +相同,我们可以将 FD (1) 约简为左侧的 AD,并且 FD 集合 H 现在为
- H = 1. A D → C, 2. C → B, 3. A D → B, 4. A D → F, 5. B → E
H=1.AD→C,2.C→B,3.AD→B,4.AD→F,5.B→E- Drop D? We have already considered dropping A from the left side of FD (1), A B D → C, and we don't need to repeat this now that B is dropped. But we must consider dropping D. Now J will contain: (1) A → C, (2) C → B, (3) A D → B, (4) A D → F, and (5) B → E, and we need to consider taking A+ under H(A+ = A) and under J(A+ = A C (1) B (2) E (5)). Since they are different, we cannot remove D from FD (1).
掉D?我们已经考虑过从 FD (1) 的左侧删除 A,ABD → C,既然 B 被删除了,我们就不需要重复这个操作了。但我们必须考虑删除D。现在J将包含:(1)A→C,(2)C→B,(3)AD→B,(4)AD→F,(5)B→E,我们需要考虑在 H(A + = A) 和 J(A + = AC (1) B (2) E (5)) 下取 A +。由于它们不同,我们无法从 FD (1) 中删除 D。We note FD (2), C → B, cannot be reduced on the left side, and (3) A D → B also cannot be reduced on the left side, since A+ and D+ under H will contain only these attributes, whereas under the relevant J the closures will contain B. The argument that (4) A D → F cannot be reduced is similar.
我们注意到 FD (2), C → B 不能在左侧约简,并且 (3) AD → B 也不能在左侧约约,因为 H 下的 A +和 D +将仅包含这些属性,而在相关的 J 下,闭包将包含 B。 (4) AD → F 不能约简的论点是类似的。Now since the set of FDs in H has changed in this pass through step 3, we need to return to step 2. When we reach FD (3) and consider dropping it, (3) A D → B, we find now that A D+ under {(1) A D → C, (2) C → B, (4) A D → F, (5) B → E} gives A D C (1) B (2), so since A D+ contains B with FD 3 missing, this FD is inessential and may be dropped. (Surprised? Repeating step 2 is a crucial step!) The final answer out of step 3 is
现在,由于 H 中的 FD 集合在步骤 3 中发生了变化,因此我们需要返回到步骤 2。当我们到达 FD (3) 并考虑删除它时,(3) AD → B,我们现在发现 AD +在 {(1) AD → C, (2) C → B, (4) AD → F, (5) B → E} 下给出 ADC (1) B (2),因此由于 AD +包含 B,且缺少 FD 3 ,这个 FD 是无关紧要的,可以被丢弃。 (惊讶吗?重复步骤2是关键的一步!)步骤3的最终答案是
- H = 1. A D → C, 2. C → B, 3. A D → F, 4. B → E
H = 1. AD → C, 2. C → B, 3. AD → F, 4. B → EThis set is minimal. If you wish, you can perform step 2 and step 3 a final time to assure yourself there are no other changes.
这个集合是最小的。如果您愿意,您可以最后一次执行步骤 2 和步骤 3,以确保没有其他更改。Finally, step 4 leads to the final set of FDs.
最后,第 4 步得出最后一组 FD。
- H = 1. A D → C F, 2. C → B, 3. B → E
H=1.AD→CF,2.C→B,3.B→E
To understand what we have accomplished, you might go through Example 2.15 and think about each change that was made in the set of FDs, then try to use Armstrong's Axioms to demonstrate that each
change that was actually performed will result in the same FD closure. (Don't duplicate the set closure argument, but instead
find a direct proof that the change is legal.)
为了理解我们所取得的成就,您可以浏览示例 2.15并思考在 FD 集合中所做的每个更改,然后尝试使用阿姆斯特朗公理来证明实际执行的每个更改都会导致相同的 FD 闭包。 (不要重复 set 闭包参数,而是找到一个直接证明更改是合法的。)
EXAMPLE 2.16 例2.16
The set of functional dependencies stated in Example 2.10 for the emp_info database,
例 2.10中所述的函数依赖集 emp_info 数据库,
- emp_id → emp_name emp_phone dept_name
- dept_name → dept_phone dept_mgrname
- skill_id → skill_name
- emp_id skill_id → skill_date skill_lvl
already forms a minimal set; that is, the minimal cover Algorithm 2.2 will not reduce it further. We leave this derivation as an exercise.
已经形成一个最小集合;也就是说,最小覆盖算法2.2不会进一步减少它。我们将这个推导作为练习。
The algorithm for finding a minimal cover of a set F of FDs will be crucial in later sections for algorithms to perform appropriate
design by the method of normalization.
在后面的章节中,找到 F 组 FD 的最小覆盖的算法对于算法通过归一化方法执行适当的设计至关重要。
2.7. Lossless Decompositions
2.7.无损分解
The process of normalization depends on being able to factor or decompose a table into two or more smaller tables, in such a way that we can recapture the precise content of the original table by
joining the decomposed parts.
规范化的过程取决于能否将一个表分解或分解为两个或多个较小的表,这样我们就可以通过连接分解的部分来重新捕获原始表的精确内容。
Definition: Lossless Decomposition.
定义:无损分解。For any table T with an associated set of functional dependencies F, a decomposition of T into k tables is a set of tables {T1, T2, … , Tk} with two properties: (1) for every table Ti in the set, Head(Ti) is a proper subset of Head(T); (2) Head(T) = Head(T1) ∪ Head(T2) ∪ … ∪ Head(Tk).
对于任何具有相关函数依赖集 F 集的表 T,将 T分解为 k 个表是一组表 {T 1 , T 2 , … , T k } ,具有两个属性: (1) 对于集合 Head(T i ) 是 Head(T) 的真子集; (2) 头(T) = 头(T 1 ) ∪ 头(T 2 ) ∪ … ∪ 头(T k )。Given any specific content of T, the rows of T are projected onto the columns of each Ti as a result of the decomposition. A decomposition of a table T with an associated set F of FDs is said to be a lossless decomposition, or sometimes a lossless-join decomposition if, for any possible future content of T, the FDs in F guarantee that the following relationship will hold:
给定 T 的任何特定内容,作为分解的结果,T 的行将投影到每个 T i的列上。如果对于 T 的任何可能的未来内容,F 中的 FD 保证以下关系成立,则具有关联的 FD 集合 F 的表 T 的分解被称为无损分解,或者有时是无损连接分解:Figure 数字
When a table T is decomposed, it is sometimes not possible to recover all the information that was originally present in some
specific content of table T by joining the tables of the decomposition, not because we don't get back all the rows we had
before, but because we get back other rows that were not originally present.
当表T被分解时,有时不可能通过连接分解的表来恢复表T的某些特定内容中最初存在的所有信息,而不是因为我们没有取回之前拥有的所有行,但是因为我们取回了最初不存在的其他行。
EXAMPLE 2.17 例2.17
A Lossy Decomposition 有损分解
Consider the following table ABC:
考虑下表 ABC:
ABC A B C a1 100 c1 a2 200 c2 a3 300 c3 a4 200 c4 If we factor this table into two parts, AB and BC, we get the following table contents:
如果我们将该表分解为AB和BC两部分,我们会得到以下表内容:
AB BC A B B C a1 100 100 c1 a2 200 200 c2 a3 300 300 c3 a4 200 200 c4 However, the result of joining these two tables is
然而,连接这两个表的结果是
AB Join BC AB 加入 BC A B C a1 100 c1 a2 200 c2 a2 200 c4 a3 300 c3 a4 200 c2 a4 200 c4 This is not the original table content for ABC! Note that the same decomposed tables AB and BC would have resulted if the table we had started with was ABCX, with content equal to AB Join BC above, or either of two other tables, ABCY or ABCZ.
这不是ABC 的原始表格内容!请注意,如果我们开始使用的表是 ABCX(其内容等于上面的 AB Join BC),或者是其他两个表(ABCY 或 ABCZ)中的任何一个,则将产生相同的分解表 AB 和 BC。
ABCY ABCZ A B C A B C a1 100 c1 a1 100 c1 a2 200 c2 a2 200 c2 a2 200 c4 a3 300 c3 a3 300 c3 a4 200 c2 a4 200 c4 a4 200 c4 Since we can't tell what table content we started from, information has been lost by this decomposition and the subsequent join. This is known as a lossy decomposition, or sometimes a lossy-join decomposition.
由于我们无法判断从什么表内容开始,因此这种分解和随后的连接会丢失信息。这称为有损分解,有时称为有损连接分解。
The reason we lost information in the decomposition of Example 2.17 is that attribute B has duplicate values (200) on distinct rows of the factored tables (with a2 and a4 in table AB and with
c2 and c4 in table BC). When these factored tables are joined again, we get cross-product rows that did not (or might not)
exist in the original:
我们在示例 2.17的分解中丢失信息的原因是属性 B 在分解表的不同行上有重复值 (200)(表 AB 中具有 a2 和 a4,表 BC 中具有 c2 和 c4)。当这些因子表再次连接时,我们会得到原始表中不存在(或可能不存在)的叉积行:
a2 | 200 | c4 |
a4 | 200 | c2 |
EXAMPLE 2.18 例2.18
A Different Content for Table ABC
表 ABC 的不同内容Now let's say that table ABC started with a different content, one that had no duplicate values in column B.
现在假设表 ABC 以不同的内容开始,其中 B 列中没有重复值。
ABC A B C A1 100 c1 A2 200 c2 A3 300 c3 The question is this: If we decompose this table ABC into the two tables AB and BC as we did in Example 2.17, is the resulting decomposition lossless? The answer is no, because the definition of a lossless decomposition requires that the join of the factored tables recapture the original information for any possible future content of the original table. But the table ABC content we have just shown could change with the insert of a single row to give the content of Example 2.17. There doesn't seem to be any rule that would keep this from happening.
问题是这样的:如果我们像例 2.17中那样将表 ABC 分解为两个表 AB 和 BC,那么分解后的结果是无损的吗?答案是否定的,因为无损分解的定义要求分解表的连接重新捕获原始表的任何可能的未来内容的原始信息。但是我们刚刚显示的表 ABC 内容可能会随着单行的插入而改变,以给出示例 2.17的内容。似乎没有任何规则可以阻止这种情况发生。
What sort of rule would we need to limit all possible future content for table ABC so that the decomposition into tables AB
and BC would be lossless? Of course, functional dependencies spring to mind, because they represent rules that govern future
content of a table. Notice that in the previous definition of a lossless decomposition, a set F of FDs is considered to be
part of the table T definition.
我们需要什么样的规则来限制表 ABC 未来所有可能的内容,以便无损地分解为表 AB 和 BC?当然,我会想到函数依赖性,因为它们代表了管理表的未来内容的规则。请注意,在前面的无损分解定义中,FD 的集合 F 被视为表 T 定义的一部分。
Definition:Database Schema.
定义:数据库模式。A database schema is the set of headings of all tables in a database, together with the set of all FDs that the designer wishes to hold on the join of those tables.
数据库模式是数据库中所有表的标题集,以及设计者希望在这些表的连接上保留的所有 FD 集。
EXAMPLE 2.19 例2.19
Table ABC with a Functional Dependency
具有函数依赖性的表 ABCAssume that table ABC is defined, which obeys the functional dependency B → C. Now the table content of Example 2.18 is perfectly legal:
假设定义了表 ABC,它遵循函数依赖 B → C。现在例 2.18的表内容完全合法:
ABC A B C a1 100 c1 a2 200 c2 a3 300 c3 But if we tried to insert a fourth row to achieve the content of Example 2.17,
但如果我们尝试插入第四行来实现例2.17的内容,
a4 200 c4 this insert would fail because it would break the functional dependency B → C. A new row with a duplicate value for B must also have a duplicate value for C in order for B → C to remain true:
此插入会失败,因为它会破坏函数依赖性 B → C。具有 B 重复值的新行也必须具有 C 重复值,以便 B → C 保持 true:
a4 200 c2 Is it true, then, that this new content for ABC can be decomposed and then rejoined losslessly? The answer is yes. Starting with:
那么,ABC 的新内容真的可以分解然后无损地重新连接吗?答案是肯定的。开始于:
ABC A B C a1 100 c1 a2 200 c2 a3 300 c3 a4 200 c2 if we factor this table into two parts, AB and BC, we get the following table contents:
如果我们将该表分解为两部分,AB 和 BC,我们得到以下表内容:
AB BC A B B C a1 100 100 c1 a2 200 200 c2 a3 300 300 c3 a4 200 Note that four rows are projected onto three in table BC because of duplicate values. Now when these two tables are joined again, the original table ABC with the functional dependency B → C results.
请注意,由于重复值,四行被投影到表 BC 中的三行。现在,当这两个表再次连接时,就会生成具有函数依赖关系 B → C 的原始表 ABC。
Because of the functional dependency B → C in table ABC of Example 2.19, the projection of ABC on BC will always have unique values for attribute B. Recall that this means attribute B is a key for table BC. The reason that the decomposition of ABC into AB and BC is lossless is that no cross terms can ever arise in
joining them: Although duplicate values for column B can occur in table AB, every row in table AB joins with a unique row in table BC (assuming that this B value exists in table BC, as it always would in an initial decomposition that projects
rows from ABC). This is reminiscent of what happened with our CAP database when we joined orders with customers. We simply extended rows of orders with more information about individual customers. Although duplicate values can exist in the cid column of the orders table, the cid values in the customers table are unique, so every row in orders joins to exactly one row in customers.
由于示例 2.19的表 ABC 中的函数依赖关系 B → C,ABC 在 BC 上的投影将始终具有属性 B 的唯一值。回想一下,这意味着属性 B 是表 BC 的键。将 ABC 分解为 AB 和 BC 的原因是无损的,因为在连接它们时不会出现交叉项:虽然表 AB 中可能出现 B 列的重复值,但表 AB 中的每一行都与表 BC 中的唯一行连接(假设这个 B 值存在于表 BC 中,就像在从 ABC 投影行的初始分解中一样)。这让人想起我们加入时 CAP 数据库发生的情况 orders 和 customers 。我们简单地扩展了行 orders 有关个人客户的更多信息。尽管重复值可能存在于 cid 的栏目 orders 表,该 cid 中的值 customers 表是唯一的,所以每一行 orders 恰好连接到其中的一行 customers 。
We generalize the preceding discussion somewhat to deal with sets of attributes.
我们在某种程度上概括前面的讨论以处理属性集。
Theorem 2.5. 定理2.5。
Given a table T and a set of attributes X ⊆ Head(T), the following two statements are equivalent: (1) X is a superkey of T; (2) X → Head(T); that is, the set of attributes X functionally determines all attributes in T. Equivalently: X+ = Head(T).
给定一个表 T 和一组属性 X ⊆ Head(T),以下两个语句是等效的: (1) X 是 T 的超键; (2) X → 头(T);也就是说,属性集 X 在功能上决定了 T 中的所有属性。等价地:X + = Head(T)。
- PROOF. (1) implies (2). If X is a superkey of table T, then for any content of table T, two distinct rows of T must always disagree on X; that is, distinct rows cannot agree in value on all attributes of X. But from this it is clear that two rows u and v cannot agree on X and disagree on some other column in Head(T) (since if two rows agree in X, then they both represent the same row), and this means that X → Head(T). (2) implies (1). Similarly if X → Head(T), then for any possible content of T, two rows in T cannot agree in value on X and simultaneously disagree on Head(T). But if the two rows u and v don't disagree on any attributes of Head(T), then they must be the same row. Therefore, this argument has shown that two distinct rows cannot agree in value on X, and therefore X is a superkey for T.▪
证明。 (1) 意味着 (2) 。如果 X 是表 T 的超键,则对于表 T 的任何内容,T 的两个不同行在 X 上必须始终不一致;也就是说,不同的行不能在 X 的所有属性上达成一致。但是从这里可以清楚地看出,两行 u 和 v 不能在 X 上达成一致,并且在 Head(T) 中的某些其他列上达成一致(因为如果两行在 X 上达成一致) ,那么它们都代表同一行),这意味着 X → Head(T)。 (2) 意味着 (1) 。类似地,如果 X → Head(T),则对于 T 的任何可能内容,T 中的两行不能在 X 上的值一致,同时在 Head(T) 上不一致。但如果两行 u 和 v 在 Head(T) 的任何属性上没有不一致,那么它们一定是同一行。因此,该论证表明两个不同的行在 X 上的值不能一致,因此 X 是 T 的超级键。▪
We have reached a point where we can give a general rule for the kind of lossless decomposition we will need in performing
normalization.
我们已经达到了可以为执行归一化所需的无损分解类型给出一般规则的程度。
Theorem 2.6. 定理2.6。
Given a table T with an associated set F of functional dependencies valid on T, a decomposition of T into two tables {T1, T2} is a lossless decomposition of T if and only if Head(T1) and Head(T2) are both proper subsets of Head(T), Head(T) = Head(T1) ∪ Head(T2) (i.e., all attributes of T are duplicated either in T1 or T2), and one of the following functional dependencies is implied by F:
给定一个表 T 以及对 T 有效的相关函数依赖集 F,将 T 分解为两个表 {T 1 , T 2 } 是 T 的无损分解当且仅当 Head(T 1 ) 和 Head(T 2 ) 都是 Head(T) 的真子集,Head(T) = Head(T 1 ) ∪ Head(T 2 ) (即 T 的所有属性在 T 1或 T 2中重复),并且以下之一函数依赖由 F 隐含:
- Head(T1) ∩ Head(T2) → Head(T1)
头部(T 1 ) ∩ 头部(T 2 ) → 头部(T 1 )
- Head(T1) ∩ Head(T2) → Head(T2).
头部(T 1 ) ∩ 头部(T 2 ) → 头部(T 2 )。
- PROOF. We take as given table T, its decomposition into T1 and T2, and FD 1, Head(T1) ∩ Head(T2) → Head(T2). (The case with FD 2 is proven similarly.) In what follows, we denote by X the set of attributes Head(T1) ∩ Head(T2); Y is the set of attributes in Head(T1) – Head(T2), and Z is the set of attributes in Head(T2) – Head(T1). To begin, we note by the definition of decomposition that T1 and T2 are projections of T, and Head(T1) ∪ Head(T2) = Head(T). From this we can demonstrate that T ⊆ T1 ⋈ T2. Every column of T appears in T1 ⋈ T2, and if u is a row in T, we say that the projection of u on Head(T1) is given by y1x1, a concatenation of attribute values, where y1 represents values for attributes in Y and x1 represents values for attributes in X; similarly, x1z1 is the projection of u on Head(T2). Clearly the projection of u on Head(T1) has the same values as the projection of u on Head(T2) on all attributes in X = Head(T1) ∩ Head(T2), and by the definition of join, row u, a concatenation y1x1z1, will appear in T1 ⋈ T2.
证明。我们以给定的表 T 为例,将其分解为 T 1和 T 2 ,以及 FD 1,Head(T 1 ) ∩ Head(T 2 ) → Head(T 2 )。 (FD 2 的情况也得到类似证明。)接下来,我们用 X 表示属性集 Head(T 1 ) ∩ Head(T 2 ); Y 是 Head(T 1 ) – Head(T 2 ) 中的属性集,Z 是 Head(T 2 ) – Head(T 1 ) 中的属性集。首先,我们根据分解的定义注意到 T 1和 T 2是 T 的投影,并且 Head(T 1 ) ∪ Head(T 2 ) = Head(T)。由此我们可以证明 T ⊆ T 1 ⋈ T 2 。 T 的每一列都出现在 T 1 ⋈ T 2中,如果 u 是 T 中的一行,我们说 u 在 Head(T 1 ) 上的投影由 y 1 x 1给出,y 1 x 1 是属性值的串联,其中 y 1代表 Y 和 x 中的属性值1代表 X 中的属性值;类似地,x 1 z 1是u 在Head(T 2 )上的投影。显然,在 X = Head(T 1 ) ∩ Head(T 2 ) 中的所有属性上,u 在 Head(T 1 ) 上的投影与 u 在 Head(T 2 ) 上的投影具有相同的值,并且根据连接的定义,行 u,串联 y 1 x 1 z 1 ,将出现在 T 1 ⋈ T 2中。- Now we show under the given assumptions that T1 ⋈ T2 ⊆ T. Assume that from row u in T, we get by projection a row y1x1 in T1. Similarly assume that from row v in T, we get row x2z2 in T2, with x2 representing values for attributes in X. Now assume that the two rows y1x1 and x2z2 in T1 and T2 are joinable so that x1 is identical in all attribute values to x2, and y1x1z2 is in T1 ⋈ T2. This is the most general possible form for a row in T1 ⋈ T2, and we have only to show that the row is also in T. We denote the additional attribute values of u that project on y1x1 in T by z1, so that u = y1x1z1, and claim that z1 = z2. This is because the row u is identical to v in the attributes of X, and X → Head(T2), so in particular X → Head(T2) – Head(T1) = Z, and since u and v are alike on X, they must be alike on attributes of Z. Thus, z1 = z2 and row y1x1z2 that is in T1 ⋈ T2 is identical to row y1x1z2 in T.▪
现在我们在给定的假设下证明 T 1 ⋈ T 2 ⊆ T。假设从 T 中的行 u ,我们通过投影得到 T 1中的行 y 1 x 1 。类似地,假设从 T 中的 v 行,我们得到 T 2中的行 x 2 z 2 ,其中 x 2表示 X 中属性的值。现在假设 T 1和 T 中的两行 y 1 x 1和 x 2 z 2 2是可连接的,因此 x 1在所有属性值中都与 x 2相同,并且 y 1 x 1 z 2在 T 1 ⋈ T 2中。这是 T 1 ⋈ T 2中的行最常见的可能形式,我们只需证明该行也在 T 中。我们用 z 表示投影在 T 中 y 1 x 1上的 u 的附加属性值1 ,使得 u = y 1 x 1 z 1 ,并声称 z 1 = z 2 。这是因为行 u 在 X 的属性中与 v 相同,并且 X → Head(T 2 ),因此特别是 X → Head(T 2 ) – Head(T 1 ) = Z,并且由于 u 和 v 是在 X 上相似,它们在 Z 的属性上也必定相似。因此,z 1 = z 2并且 T 1 ⋈ T 2中的行 y 1 x 1 z 2与 T 中的行 y 1 x 1 z 2相同。▪
EXAMPLE 2.20 例2.20
In Example 2.19, we demonstrated a decomposition of table T with heading A B C and functional dependency B → C, into two tables T1 and T2 with Head(T1) = A B and Head(T2) = B C. If we apply Theorem 2.6, we have Head(T1) ∩ Head(T2) → Head(T2); that is, A B ∩ B C → B C, or B → B C, which is clear from B → C.
在例 2.19中,我们演示了将标题为 ABC 且函数依赖为 B → C 的表 T 分解为两个表 T 1和 T 2 ,其中 Head(T 1 ) = AB 和 Head(T 2 ) = B C。定理2.6,我们有 Head(T 1 ) ∩ Head(T 2 ) → Head(T 2 );即AB∩BC→BC,或B→BC,由B→C可知。
EXAMPLE 2.21 例2.21
Consider the table custords from Example 2.18, created by joining customers with orders. Clearly ordno is a key for custords, since it has unique values, and the reader can also verify that we have the FD cid → Head(customers). Now we note that Head(customers) ∩ Head(orders) = cid, the key for customers, so Head(customers) ∩ Head(orders) → Head(customers). Thus, by Theorem 2.6, custords has a lossless join decomposition into custs and ords, with the same headings as customers and orders, respectively (we would need to verify that the rows projected from custords onto custs and ords give the same rows that we're used to in customers and orders). The reason that this decomposition seems intuitive is that by joining customers and orders, we extend each of the rows in the orders table with columns from customers associated with the unique cid value in that row. It seems clear, therefore, that we don't lose any information by decomposing the join back onto the headings of customers and orders. Of course, we might have lost some information originally in creating custords—if there were some customers who didn't place any orders, for example. But our lossless decomposition starts with the table custords and guarantees that no information is lost in the decomposition.
考虑一下表格 custords 来自示例 2.18 ,通过连接创建 customers 和 orders 。清楚地 ordno 是一个关键 custords ,因为它具有唯一的值,读者也可以验证我们是否有FD cid → 头( customers )。现在我们注意到 Head( customers ) ∩ 头( orders )= cid ,关键为 customers ,所以头( customers ) ∩ 头( orders ) → 头( customers )。因此,根据定理 2.6, custords 无损连接分解为 custs 和 ords ,具有相同的标题 customers 和 orders 分别(我们需要验证从 custords 到 custs 和 ords 给出与我们习惯的相同的行 customers 和 orders )。这种分解看起来很直观的原因是通过加入 customers 和 orders ,我们扩展中的每一行 orders 表中包含与唯一相关联的客户的列 cid 该行中的值。因此,很明显,通过将连接分解回标题,我们不会丢失任何信息。 customers 和 orders 。当然,我们在创建的过程中可能会丢失一些信息。 custords ——例如,如果有一些客户没有下任何订单。但我们的无损分解是从表格开始的 custords 并保证分解过程中不会丢失任何信息。
Theorem 2.6 shows how to demonstrate that a decomposition of a table T into two tables {T1, T2} is a lossless decomposition. In cases where three or more tables exist in the decomposition, {T1, T2, … , Tk}, with k ≥ 3, we can demonstrate losslessness by using the two-table result in a recursive manner.
定理 2.6 显示了如何证明将表 T 分解为两个表 {T 1 , T 2 } 是无损分解。在分解中存在三个或更多表{T 1 , T 2 , … , T k } 且k ≥ 3 的情况下,我们可以通过以递归方式使用两个表结果来证明无损性。
EXAMPLE 2.22 例2.22
Lossless Join Decomposition with Multiple Tables
多个表的无损连接分解Assume that we are given the table T with Head(T) = A B C D E F and the FD set given by (1) A B → C, (2) A → D, and (3) B → E. Notice there is no FD for the attribute F, but A B forms a key for A B C D E, since its closure includes all these attributes. Therefore, the key for table T must be A B F, since the key must functionally determine everything in Head(T). A perfectly acceptable lossless decomposition of T is {T1, T2, T3, T4}, where Head(T1) = A B C (the keys for these tables are underlined), Head(T2) = A D, Head(T3) = B E, and Head(T4) = A B F. The union of these tables contains all the attributes in T, so we merely need to demonstrate losslessness. Note that if we join tables in the following order by pairs, each parenthesized table join so far will ensure a lossless decomposition with the table that is joined next by Theorem 2.6.
假设我们得到了表 T,其中 Head(T) = ABCDEF 和 FD 集由 (1) AB → C、(2) A → D 和 (3) B → E 给出。注意,没有 FD属性 F,但 AB 形成 ABCDE 的键,因为它的闭包包含所有这些属性。因此,表 T 的键必须是 ABF,因为该键必须在功能上确定 Head(T) 中的所有内容。 T 的完全可接受的无损分解是 {T 1 , T 2 , T 3 , T 4 },其中 Head(T 1 ) = AB C (这些表的键带有下划线),Head(T 2 ) = A D,头(T 3 ) = B E ,且头(T 4 ) = ABF 。这些表的并集包含了T中的所有属性,因此我们只需要证明无损性即可。请注意,如果我们按以下顺序成对连接表,则到目前为止每个带括号的表连接将确保与定理 2.6 接下来连接的表进行无损分解。Figure 数字
We note that Head(T1) = A B C, Head(T2) = A D, Head(T1 ⋈ T2) = A B C D, Head(T3) = B E, and Head((T1 ⋈ T2) ⋈ T3) = A B C D E. Thus, the following FDs yield losslessness for the multitable join desired.
我们注意到 Head(T 1 ) = ABC、Head(T 2 ) = AD、Head(T 1 ⋈ T 2 ) = ABCD、Head(T 3 ) = BE 和 Head((T 1 ⋈ T 2 ) ⋈ T 3 ) = ABCD E。因此,以下 FD 会产生所需的多表连接的无损结果。
- Head(T1) ∩ Head(T2) = A → Head(T2) = A D, because of (2) A → D
Head(T 1 ) ∩ Head(T 2 ) = A → Head(T 2 ) = AD,因为 (2) A → D- Head(T1 ⋈ T2) ∩ Head(T3) = B → Head(T3) = B E, because of (3) B → E
Head(T 1 ⋈ T 2 ) ∩ Head(T 3 ) = B → Head(T 3 ) = BE,因为 (3) B → E- Head((T1 ⋈ T2) ⋈ T3) ∩ Head(T4) = A B → Head(T1) = A B C, because of (1) A B → C
Head((T 1 ⋈ T 2 ) ⋈ T 3 ) ∩ Head(T 4 ) = AB → Head(T 1 ) = ABC,因为 (1) AB → CSince the join operator is associative, losslessness does not require a specific order of join and we can remove the parentheses in the expression ((T1 ⋈ T2) ⋈ T3) ⋈ T4.
由于连接运算符是关联的,因此无损不需要特定的连接顺序,我们可以删除表达式 ((T 1 ⋈ T 2 ) ⋈ T 3 ) ⋈ T 4中的括号。
In the last few sections we have developed algorithms to determine a minimal set of FDs for a given set F and defined what
is meant by a lossless decomposition. In the coming section, we learn how a minimal set of FDs helps us create an appropriate
normal form decomposition for a database.
在最后几节中,我们开发了算法来确定给定集合 F 的最小 FD 集合,并定义了无损分解的含义。在接下来的部分中,我们将了解最小的 FD 集如何帮助我们为数据库创建适当的范式分解。
2.8. Normal Forms 2.8.范式
Let us return now to the example of bad database design from Section 2.5 that motivated the long mathematical digression of the last two sections. Recall that we wish to create a database on a set
of data items given in Figure 2.15, with rules of interrelatedness stated in the set of functional dependencies in Example 2.1. We repeat these here as Figure 2.22.
现在让我们回到2.5 节中糟糕的数据库设计的例子,正是这个例子引发了最后两节的长篇数学题外话。回想一下,我们希望在图 2.15中给出的一组数据项上创建一个数据库,并使用示例 2.1中的函数依赖集中规定的相互关联规则。我们在这里重复这些,如图2.22所示。
FIGURE 2.22. Data items and FDs for the employee information database.
图 2.22。员工信息数据库的数据项和FD。
We started with a 1NF table, emp_info, that combined all these data items (see Figure 2.16) and noted a number of design problems, referred to as anomalies. In the following section, we perform a sequence of table factorizations, which are in fact lossless decompositions, to eliminate
redundancies from the employee information database.
我们从 1NF 表开始, emp_info ,结合了所有这些数据项(见图2.16 )并指出了一些设计问题,称为异常。在下一节中,我们执行一系列表分解,这实际上是无损分解,以消除员工信息数据库中的冗余。
As explained earlier, a database schema is the set of headings of all tables in a database together with a set of all FDs
intended by the designer. The emp_info table in Figure 2.23, together with the FDs given, make up such a database schema.
如前所述,数据库模式是数据库中所有表的标题集以及设计者预期的所有 FD 集。这 emp_info 图2.23中的表与给定的FD一起构成了这样的数据库模式。
FIGURE 2.23. Employee information schema with a single table, emp_info.
图 2.23。具有单个表的员工信息架构, emp_info 。
2.8.1. A Succession of Decompositions to Eliminate Anomalies
2.8.1.一系列分解以消除异常
One anomaly of the database represented in Figure 2.23 is that if the number of skills for some employee goes to zero in the emp_info table, no row of any kind will remain for the employee. We have lost the phone number and the department the employee works
in because of deleting this skill. At the end of Section 2.5, we proposed a solution for this anomaly by factoring the emp_info table into two tables, the emps table and the skills table, whose column names were given in Figure 2.17 and are repeated in Figure 2.24.
图 2.23所示数据库的一个异常现象是,如果某个员工的技能数量在 emp_info 表中,不会为该员工保留任何类型的行。由于删除了该技能,我们丢失了该员工的电话号码和所在部门。在第 2.5 节的末尾,我们通过分解 emp_info 表分成两个表, emps 表和 skills 表,其列名在图 2.17中给出,并在图 2.24中重复。
FIGURE 2.24. Employee information schema with two tables, emps and skills.
图 2.24。具有两个表的员工信息架构, emps 和 skills 。
When the emps and skills tables were originally proposed, a number of features of this factorization were mentioned without justification. We are
now in a position to demonstrate these points.
当 emps 和 skills 最初提出表格时,没有合理地提到了这种因式分解的许多特征。我们现在能够证明这些观点。
Proposition 2.1. 命题2.1。
The key for the emp_info table is the attribute set emp_id and skill_id. This is also key for the skills table, but the emps table has a key consisting of the single attribute emp_id.
的关键是 emp_info 表是属性集 emp_id 和 skill_id 。这也是关键 skills 表,但是 emps 表有一个由单个属性组成的键 emp_id 。
- PROOF. By Theorem 2.5 we can determine a superkey for a table T by finding a set of attributes X ⊆ Head(T) such that X → Head(T). Then, to show the set X is a key, we need merely show that no properly contained subset Y of X has this property. We start our search by finding the set closure of X for all attribute sets X found on the left-hand side of any of the FDs in Figure 2.23, repeated here.
证明。根据定理 2.5,我们可以通过找到一组属性 X ⊆ Head(T) 使得 X → Head(T) 来确定表 T 的超键。 然后,为了证明集合 X 是一个键,我们只需证明 X 的正确包含的子集 Y 不具有此属性。我们开始 我们的搜索是通过找到在图 2.23中任何 FD 的左侧找到的所有属性集 X 的 X 的集合闭包来进行的,此处重复。
- emp_id → emp_name emp_phone dept_name
- dept_name → dept_phone dept_mgrname
- skill_id → skill_name
- emp_id skill_id → skill_date skill_lvl▪
Starting with X = emp_id skill_id (the left side of FD 4 above), we use Algorithm 2.1 and the FD set F given to determine X+. Starting from X+ = emp_id skill_id and applying FD 4, we get X+ = emp_id skill_id skill_date skill_lvl. Next, applying FD 3, since skill_id is in X+, we add skill_name to X+. Applying FD 1, since emp_id is in X+, we add the right-hand side of FD 1 to get X+ = emp_id skill_id skill_date skill_lvl skill_name emp_name emp_phone dept_name. Finally, we apply FD 2, and since dept_name is now in X+, we add the right-hand side of FD 2 to get X+ = emp_id skill_id skill_date skill_lvl skill_name emp_name emp_phone dept_name dept_phone dept_mgrname. This final list contains all the attributes in emp_info—that is, Head(emp_info). By the definition of X+, this means that
以 X 开头 = emp_id skill_id (上面FD 4的左侧),我们使用算法2.1和给出的FD集合F来确定X + 。从 X + = 开始 emp_id skill_id 应用 FD 4,我们得到 X + = emp_id skill_id skill_date skill_lvl 。接下来,应用 FD 3,因为 skill_id 在 X +中,我们添加 skill_name 至 X + 。应用 FD 1,因为 emp_id 在 X +中,我们将 FD 1 的右侧相加得到 X + = emp_id skill_id skill_date skill_lvl skill_name emp_name emp_phone dept_name 。最后,我们应用 FD 2,因为 dept_name 现在在 X +中,我们将 FD 2 的右侧相加得到 X + = emp_id skill_id skill_date skill_lvl skill_name emp_name emp_phone dept_name dept_phone dept_mgrname 。最终列表包含以下所有属性 emp_info ——也就是说,头( emp_info )。根据 X +的定义,这意味着
- (2.1) emp_id skill_id → Head(emp_info)
By Theorem 2.75 then, emp_id skill_id is a superkey for emp_info.
那么根据定理2.75, emp_id skill_id 是一个超级键 emp_info 。
To show that emp_id skill_id is in fact a key for emp_info, we need only show that no subset (either emp_id or skill_id alone) functionally determines all these attributes. Let us take the closure of the set emp_id to find what attributes are functionally determined. We can immediately apply FD 1 to get emp_id → emp_id emp_name emp_phone dept_name. Next we can apply FD 2, and derive
为了表明 emp_id skill_id 实际上是一个关键 emp_info ,我们只需要证明没有子集(或者 emp_id 或者 skill_id 单独)在功能上决定了所有这些属性。让我们来看看集合的闭包 emp_id 找出哪些属性是由功能决定的。我们可以立即应用 FD 1 来得到 emp_id → emp_id emp_name emp_phone dept_name 。接下来我们可以应用FD 2,并推导
- (2.2) emp_id → emp_id emp_name emp_phone dept_name dept_phone dept_mgrname
Since skill_id is not in the right-hand set of (2.2), no other FDs can be applied, so this is the maximum right-hand set that is functionally
determined by emp_id.
自从 skill_id 不在 (2.2) 的右手集中,不能应用其他 FD,因此这是由下式函数确定的最大右手集 emp_id 。
Finally, starting with skill_id alone in the set X to be closed, FD 3 is the only one that can be applied, and we see that the maximum right-hand set functionally
determined by skill_id is given as
最后,从 skill_id 单独在要闭合的集合 X 中,FD 3 是唯一可以应用的,我们看到最大右手集合在函数上由下式确定: skill_id 给出为
- (2.3) skill_id → skill_id skill_name
(2.3) skill_id → skill_id skill_name
Neither (2.2) nor (2.3) contains all attributes of emp_info, and thus we can conclude from (2.1) that
(2.2) 和 (2.3) 都不包含 emp_info ,因此我们可以从(2.1)得出结论
- (2.4) emp_id skill_id is a key for the emp_info table
(2.4) emp_id skill_id 是一个关键 emp_info 桌子
In addition, we note from (2.2) that emp_id functionally determines all attributes in the emps table of Figure 2.24, and since no subset of a singleton set can be on the left side of an FD,
此外,我们从(2.2)中注意到 emp_id 功能上决定了所有属性 emps 图 2.24的表,并且由于单例集的子集不能位于 FD 的左侧,
Finally, we note that the skills table has attributes that are not functionally determined by either emp_id or skill_id individually, skill_lvl is not on the right-hand side in either (2.2) or (2.3), and therefore the only possible key for the skills table is emp_id skill_id:
最后,我们注意到 skills 表具有在功能上不由任何一个确定的属性 emp_id 或者 skill_id 单独地, skill_lvl 不在 (2.2) 或 (2.3) 的右侧,因此是唯一可能的键 skills 表是 emp_id skill_id :
- (2.6) emp_id skill_id is a key for the skills table
(2.6) emp_id skill_id 是一个关键 skills 桌子
Proposition 2.2. 命题2.2。
The factorization of the emp_info table into the emps and skills tables is a true lossless decomposition.
因式分解 emp_info 表入 emps 和 skills 表是真正的无损分解。
- PROOF. To see that this is a valid decomposition, we note that Head(emps) ∪ Head(skills) = Head(emp_info). Furthermore, Head(emps) ∩ Head(skills) = emp_id, and since functional dependency (2.2) shows that emp_id → Head(emps), by Theorem 2.6, the decomposition is lossless.▪
证明。为了证明这是一个有效的分解,我们注意到 Head( emps ) ∪ 头( skills ) = 头( emp_info )。此外,头( emps ) ∩ 头( skills )= emp_id ,并且由于函数依赖 (2.2) 表明 emp_id → 头( emps ),根据定理 2.6,分解是无损的。▪
From Proposition 2.2, we see that the decomposition that brings us from the emp_info table of Figure 2.23 to the emps and skills tables of Figure 2.24 will always allow us to recapture any content of emp_info by a join of the two factored tables. But the real motivation for this decomposition was to deal with the various anomalies
mentioned earlier.
从命题 2.2 中,我们看到分解使我们从 emp_info 图2.23的表 emps 和 skills 图 2.24的表格将始终允许我们重新捕获 emp_info 通过两个分解表的联接。但这种分解的真正动机是为了处理前面提到的各种异常情况。
How did the delete anomaly mentioned in Section 2.5 arise in the emp_info table of Figure 2.23? The basic reason is that the pair of attributes emp_id skill_id form the key for that table, but there are attributes that we wish to keep track of that are functionally determined by a
single one of those two attributes, emp_id. If we delete the last skill_id value for some specific emp_id, we no longer have any (emp_id skill_id) pairs with that specific emp_id, but we still have information that is dependent only on emp_id, which we don't want to lose! Putting this in terms of the ER model, employees are real entities whose attributes we want to keep track of (and so the
employee identifier, emp_id, shows up on the left of a functional dependency).
2.5节提到的删除异常是如何出现的 emp_info 图 2.23的表格?根本原因是这对属性 emp_id skill_id 形成该表的键,但我们希望跟踪一些属性,这些属性在功能上由这两个属性中的一个决定, emp_id 。如果我们删除最后一个 skill_id 对于某些特定的值 emp_id ,我们不再有任何( emp_id skill_id ) 与特定的配对 emp_id ,但我们仍然拥有仅依赖于 emp_id ,我们不想失去!就 ER 模型而言,员工是真实的实体,我们想要跟踪其属性(因此员工标识符, emp_id ,显示在函数依赖的左侧)。
In the decomposition of Figure 2.24, we factored the emps table out of the emp_info table so that we wouldn't lose information in this way. With this new schema, we can keep a row for a given employee in the
emps table even if the employee has no skills. Recall that the insert anomaly is the inverse face of the delete anomaly, making
it impossible to insert a new employee without skills—a trainee—into the emp_info table. As before, this problem is solved by factoring out the emps table, since a new row can be inserted into emps that doesn't have any join to a row of the skills table. As far as the update anomaly is concerned, this problem arises in the emp_info table once again because attributes dependent only on emp_id are in a table with key emp_id skill_id; we can therefore have multiple rows with the same employee phone number in this table that must all be updated at once.
Once again, factoring out the emps table solves this problem, because each employee is now represented by a single row.
在图 2.24的分解中,我们将 emps 表出的 emp_info 表,这样我们就不会以这种方式丢失信息。通过这个新模式,我们可以在 emps 即使员工没有技能。回想一下,插入异常是删除异常的反面,因此不可能将没有技能的新员工(实习生)插入到系统中。 emp_info 桌子。和之前一样,这个问题是通过分解出 emps 表,因为可以将新行插入到 emps 没有任何连接到的行 skills 桌子。就更新异常而言,这个问题出现在 emp_info 再次表,因为属性仅依赖于 emp_id 位于带有键的表中 emp_id skill_id ;因此,我们可以在此表中包含具有相同员工电话号码的多行,并且必须同时更新这些行。再次,排除 emps 表解决了这个问题,因为每个员工现在都由一行表示。
The question now is this: Are there any more anomalies remaining in the database schema of Figure 2.24? The answer, perhaps unsurprisingly, is yes. There is another anomaly of the kind we have just analyzed in the skills table. This table has the primary key (skill_id emp_id), and we recall FD 3 of Figure 2.22:
现在的问题是:图 2.24的数据库模式中是否还存在任何异常情况?也许毫不奇怪,答案是肯定的。还有我们刚刚分析过的另一种异常现象 skills 桌子。该表有主键( skill_id emp_id ),我们回想一下图 2.22的 FD 3 :
- (2.7) skill_id → skill_name
(2.7) skill_id → skill_name
What this FD seems to be saying is that skills is an entity in its own right, that skill_id is an identifier for the entity, and that skill_name is a descriptor. (There might be two distinct skills with different skill_id values but the same skill_name, since skill_name → skill_id is not an FD that is implied by the list we presented.) But recall that the key we have discovered for the skills table is
emp_id skill_id. This situation seems to be symmetric with the one that caused us to factor out the table emps from emp_info. Can we construct (for example) a delete anomaly of the kind that led to this step? The answer is yes, for if we assume that
some skill is rare and difficult to master, and we suddenly lose the last employee who had it, we would no longer have any
information about the skill at all, neither the skill_id nor the skill_name. We therefore need to factor out another table to solve this anomaly, and we see the result in Figure 2.25.
这个FD似乎想说的是 skills 本身就是一个实体,即 skill_id 是实体的标识符,并且 skill_name 是一个描述符。 (可能有两种不同的技能 skill_id 值但相同 skill_name , 自从 skill_name → skill_id 不是我们提供的列表所暗示的 FD。)但是请记住,我们发现的技能表的关键是 emp_id skill_id 。这种情况似乎与导致我们分解表格的情况对称 emps 从 emp_info 。我们可以构建(例如)导致此步骤的删除异常吗?答案是肯定的,因为如果我们假设某种技能是稀有且难以掌握的,而我们突然失去了最后一位拥有该技能的员工,那么我们将不再拥有有关该技能的任何信息,也不会再拥有该技能。 skill_id 也不 skill_name 。因此,我们需要分解另一个表来解决这个异常,结果如图 2.25所示。
FIGURE 2.25. Employee information schema with three tables.
图 2.25。具有三个表的员工信息架构。
From examination of the new emp_skills table and skills table of Figure 2.25, it should be clear that these two tables form a lossless decomposition of the skills table of Figure 2.24. Indeed, the three tables of Figure 2.25 form a lossless decomposition of the single emp_info table we started with in Figure 2.23. Most importantly, we have dealt with the anomalies that arise from keeping attributes of skills entities in a table with
a key of two attributes. In terms of the ER model, what we have just done is to factor out the relationship emp_skills from the two entities Emps and Skills.
从新检 emp_skills 表和 skills 如图 2.25所示,应该清楚这两个表构成了 skills 表2.24 。事实上,图 2.25的三个表形成了单个 emp_info 我们从图 2.23开始。最重要的是,我们已经处理了由于将技能实体的属性保存在具有两个属性的键的表中而产生的异常情况。就ER模型而言,我们刚才所做的就是分解出关系 emp_skills 来自两个实体 Emps 和 Skills 。
Consider now the three tables of Figure 2.25. Everything in the emps table, as we showed earlier in Proposition 2.1, is functionally determined by the singleton attribute emp_id; a similar situation holds with the skills table, as we see from the FD in (2.7); in the emp_skills table, a glance at (2.2) and (2.3) makes it clear that no remaining attributes in this table are dependent on a subset of
the (emp_id skill_id) key. We ask then if any further anomalies can remain in these tables. Once more, the answer is yes!
现在考虑图 2.25的三个表。一切都在 emps 正如我们前面在命题 2.1 中所示,表在功能上是由单例属性决定的 emp_id ;类似的情况也发生在 skills 表,正如我们从(2.7)中的FD看到的;在 emp_skills 表中,浏览一下 (2.2) 和 (2.3) 就可以清楚地看出,该表中的任何剩余属性都不依赖于 ( emp_id skill_id ) 钥匙。然后我们询问这些表中是否可以保留任何进一步的异常情况。再一次,答案是肯定的!
To see how this is possible, consider what would happen if we had a large reorganization in the company, so that every employee
in one department are to be transferred to other departments (even the manager will be transferred—presumably, at some later
time, different employees will take their place in the department that has just been emptied). Now notice that when the last
employee is removed, there remains no row in the emps table containing information about the department: We have lost even the phone number of the department and the name it goes
under! The solution to this problem is obvious: We must factor out a separate table for departments. This will result in the
emp_info database of Figure 2.26; this database is in 3NF, or equivalently in this case, in BCNF. We will give definitions for these normal forms shortly.
为了看看这是怎么可能的,考虑一下如果我们公司进行一次大规模的重组,一个部门的每个员工都被调到其他部门(甚至经理也会被调——大概是在稍后的某个时间,会发生什么,不同的员工将在刚刚空出的部门中就职)。现在请注意,当最后一个员工被删除时,表中不再有任何行。 emps 包含有关部门信息的表格:我们甚至丢失了该部门的电话号码及其名称!这个问题的解决方案很明显:我们必须为部门制定一个单独的表。这将导致 emp_info 图2.26的数据库;该数据库采用 3NF 格式,或者在本例中等效地采用 BCNF 格式。我们很快就会给出这些范式的定义。
FIGURE 2.26. Employee information database schema in 3NF (also in BCNF).
图 2.26。 3NF 中的员工信息数据库架构(也在 BCNF 中)。
With the factorization of the depts table of Figure 2.26, the update anomaly relating to department information will no longer trouble us. In terms of the ER model, what we have
done is to differentiate between the two entities Emps and Depts, between which there is a many-to-one relationship (represented by the foreign key dept_name in the emps table).
随着因式分解 depts 如图2.26所示,部门信息更新异常不再困扰我们。就ER模型而言,我们所做的就是区分两个实体 Emps 和 Depts ,它们之间存在多对一的关系(用外键表示 dept_name 在 emps 桌子)。
At this point, we claim that the database schema of Figure 2.26 is in some sense a final result—no anomalies remain in the representation to trouble us. For the rationale to justify this statement, we look to the four FDs listed that must be maintained in the database, which we refer
to in what follows as the set F of FDs. In every case where we have noted an anomaly in earlier schemas, the underlying reason
for the anomaly has turned out to hinge on the fact that some attribute (it could have been a set of attributes in a different
schema) on the left-hand side of an FD in F might have multiple duplicate occurrences (or possibly zero occurrences) in the
table where it appeared. The solution was to create a separate table, placing the attributes on the left-hand side of this
FD, together with all attributes on the right-hand side in that table, while the attributes on the right-hand side were removed
from the table where they previously appeared. Look carefully at the successive decompositions presented in Figures 2.23 through 2.26 to see that this is an accurate description of what was done. Since the attributes on the left-hand side of the FD are in
both the old and the new tables and determine all other attributes in the new table, the decomposition is lossless. Thus,
FD 1 generates the emps table, FD 2 the depts table, FD 3 the skills table, and FD 4 the emp_skills table. Since no more FDs exist in F, we maintain that no more anomalies will arise, and therefore no further decomposition
is necessary. Thus, we have reached a final form.
此时,我们声称图 2.26的数据库模式在某种意义上是最终结果——表示中不再有任何异常情况来困扰我们。为了证明这一说法的合理性,我们查看了必须在数据库中维护的列出的四个 FD,我们在下文中将其称为 FD 集合 F。在我们注意到早期模式中存在异常的每种情况下,异常的根本原因都取决于左侧的某些属性(它可能是不同模式中的一组属性) F 中的 FD 在其出现的表中可能会出现多次重复出现(或者可能出现零次)。解决方案是创建一个单独的表,将属性放在该 FD 的左侧,以及该表中右侧的所有属性,同时从表中删除右侧的属性他们之前出现的地方。仔细观察图 2.23 到 2.26中呈现的连续分解,您会发现这是对所做工作的准确描述。由于FD左侧的属性在旧表和新表中都存在,并且决定了新表中的所有其他属性,因此分解是无损的。因此,FD 1 生成 emps 表,FD 2 depts 表,FD 3 skills 表,FD 4 emp_skills 桌子。由于 F 中不再存在 FD,我们认为不会再出现任何异常,因此不需要进一步分解。这样,我们就达到了最终的形式。
2.8.2. Normal Forms: BCNF, 3NF, and 2NF
2.8.2.范式:BCNF、3NF 和 2NF
The tables in the final schema of Figure 2.26 each have unique candidate keys, which we may think of as primary keys for the tables. One way to characterize why no further
decomposition is needed to address anomalies in these tables is to say that all functional dependencies involving attributes
of any single table in this schema arise from the table keys alone. We provide definitions to make this idea precise.
图 2.26最终模式中的表每个都有唯一的候选键,我们可以将其视为表的主键。描述为什么不需要进一步分解来解决这些表中的异常的一种方法是,涉及此模式中任何单个表的属性的所有函数依赖性仅源自表键。我们提供定义以使这个想法更加精确。
Definition. 定义。
Given a database schema with a universal table T and a set of functional dependencies F, let {T1, T2, … , Tk} be a lossless decomposition of T. Then a functional dependency X → Y of F is said to be preserved in the decomposition of T, or alternatively the decomposition of T preserves the functional dependency X → Y, if for some table Ti of the decomposition, X ∪ Y ⊆ Head(Ti). When this is the case, we also say that the FD X → Y is preserved in Ti or that it lies in Ti or is in Ti.
给定一个具有通用表 T 和一组函数依赖项 F 的数据库模式,令 {T 1 , T 2 , … , T k } 是 T 的无损分解。那么 F 的函数依赖项 X → Y 被称为保留在 T 的分解中,或者 T 的分解保留函数依赖性 X → Y,如果对于分解的某个表 T i , X ∪ Y ⊆ Head(T i )。在这种情况下,我们也说 FD X → Y 保留在Ti中,或者说它位于Ti中或位于Ti中。
EXAMPLE 2.23 例2.23
We have derived a number of successive decompositions of the employee information schema of Figure 2.23 with a universal table and a set F of FDs: a decomposition with two tables (Figure 2.24), three tables (Figure 2.25), and four tables (Figure 2.26). Each of these decompositions preserves all dependencies in F. For example, in the four-table decomposition of Figure 2.26, FD 1 lies in the emps table, FD 2 lies in the depts table, FD 3 lies in the skills table, and FD 4 lies in the emp_skills table.
我们使用通用表和 F 组 FD 导出了图 2.23的员工信息模式的许多连续分解:具有两个表(图 2.24 )、三个表(图 2.25 )和四个表(图 2.26 )的分解)。这些分解中的每一个都保留了 F 中的所有依赖关系。例如,在图 2.26的四表分解中,FD 1 位于 emps 表中,FD 2 位于 depts 表中,FD 3 位于 skills 表,FD 4 位于 emp_skills 桌子。
Because every FD in F is preserved in one of the four tables of Figure 2.26, whenever any single table in the schema is updated, it is possible to verify that any FD affected by the update is still
valid by testing its validity in that single table, without any need for a join. This is the motivation for seeking to preserve
functional dependencies in a decomposition.
因为 F 中的每个 FD 都保存在图 2.26的四个表之一中,所以每当更新模式中的任何单个表时,都可以通过测试该单个表中的有效性来验证受更新影响的任何 FD 是否仍然有效,无需加入任何内容。这是寻求在分解中保留函数依赖关系的动机。
Definition:Boyce-Codd Normal Form.
定义:Boyce-Codd范式。A table T in a database schema with FD set F is said to be in Boyce-Codd normal form (BCNF) when the following property holds. For any functional dependency X → A implied by F that lies in T, where A is a single attribute that is not in X, X must be a superkey for T. A database schema is in BCNF when all the tables it contains are in BCNF.
当以下属性成立时,具有 FD 集 F 的数据库模式中的表 T 被称为 Boyce-Codd 范式 (BCNF)。对于位于 T 中的 F 隐含的任何函数依赖关系 X → A,其中 A 是不在 X 中的单个属性,X 必须是 T 的超键。当数据库模式包含的所有表都在 BCNF 中时,该数据库模式在 BCNF 中。
Consider a table T, and let X → A be a functional dependency in T. If the BCNF property holds for this case, then X is a superkey,
so for some set K of attributes representing a key for T, K ⊆ X. (Note that there might be a number of different sets K1, K2, … that are candidate keys for T, as we consider in Example 2.26 below.) If the BCNF property fails, then X does not contain a key set K, and K − X is nonempty for all K. Then two cases
are possible: either (1) X − K is empty for some K—that is, X ⊂ K, and we say that some attributes of T are functionally determined
by a proper subset X of a key K; or (2) X − K is nonempty for all K, so some attributes are determined by a set X at least partially outside
each K. In the second case, we say that some attributes of T are functionally determined by a different set of attributes that does not contain and is not contained in any key set.
考虑一个表 T,并让 X → A 成为 T 中的函数依赖。如果 BCNF 属性适用于这种情况,则 X 是一个超键,因此对于表示 T 的键的某些属性集 K,K ⊆ X。(请注意,可能有许多不同的集合 K 1 、 K 2 、…,它们是 T 的候选键,正如我们在下面的示例 2.26中所考虑的那样。)如果 BCNF 属性失败,则 X 不包含键集 K,并且K − X 对于所有 K 都是非空的。那么可能有两种情况: (1) X − K 对于某些 K 来说是空的,即 X ⊂ K,并且我们说 T 的某些属性在功能上由真子集确定钥匙K的X;或 (2) X − K 对于所有 K 都是非空的,因此某些属性由至少部分位于每个 K 之外的集合 X 确定。在第二种情况下,我们说 T 的某些属性在功能上由不同的属性集确定不包含且不包含在任何密钥集中。
EXAMPLE 2.24 例2.24
In the emp_skills table of Figure 2.26, the only key consists of the set emp_id skill_id, as we can easily demonstrate by set closure arguments: Any set of attributes that functionally determine all attributes in the emp_skills table must contain both of these attributes. We claim that the table is in BCNF and will demonstrate this in Example 2.25. As we just pointed out, the BCNF property implies that no attributes of this table are functionally determined by any subset of this key set, or any different set of attributes that does not contain this key set.
在 emp_skills 图2.26的表中,唯一的键由集合组成 emp_id skill_id ,正如我们可以通过设置闭包参数轻松演示的那样:在功能上确定所有属性的任何属性集 emp_skills 表必须包含这两个属性。我们声称该表采用 BCNF 格式,并将在示例 2.25中演示这一点。正如我们刚才指出的,BCNF 属性意味着该表的任何属性都不是由该键集的任何子集或不包含该键集的任何不同属性集在功能上确定的。In the skills table of Figure 2.24, the unique key for this table consists of the two attributes emp_id skill_id, while the FD skill_id → skill_name also lies in the table. Clearly the left-hand side of this FD is a subset of the key emp_id skill_id. Because of this, the BCNF property fails for this table (and we pointed out that an anomaly arose requiring us to perform further decomposition).
在 skills 图2.24的表,该表的唯一键由两个属性组成 emp_id skill_id ,而FD skill_id → skill_name 也在于表中。显然,该 FD 的左侧是密钥的子集 emp_id skill_id 。因此,该表的 BCNF 属性失败(我们指出出现了异常,需要我们执行进一步的分解)。In the emps table of Figure 2.24 (identical to the emps table in Figure 2.25), the unique key for the table consists of the attribute emp_id, while the FD dept_name → dept_phone is implied by FD 2 of F and lies in the table. Since the left side of this FD is different from the key set (neither a subset nor a superset), the BCNF property fails, and further decomposition is necessary. Note, by the way, that a table emps2 containing all the attributes of emps except the attribute dept_phone would still not obey the BCNF property. Although the FD dept_name → dept_phone does not lie in the table emps2, the FD dept_name → dept_mgrname, which is also implied by FD 2, does lie in emps2.
在 emps 图 2.24的表格(与 emps 图 2.25中的表),表的唯一键由属性组成 emp_id ,而FD dept_name → dept_phone 由 F 的 FD 2 隐含并位于表中。由于该 FD 的左侧与密钥集不同(既不是子集也不是超集),因此 BCNF 性质失效,需要进一步分解。顺便说一下,请注意,有一个表 emps2 包含所有属性 emps 除了属性 dept_phone 仍然不遵守 BCNF 属性。虽然FD dept_name → dept_phone 不在表中 emps2 , FD dept_name → dept_mgrname ,这也由 FD 2 隐含,确实在于 emps2 。
EXAMPLE 2.25 例2.25
We claim that the database schema of Figure 2.26 is in BCNF. We need to show that for any functional dependency X → A implied by F that lies in one of the tables of Figure 2.26, where A is an attribute not in X, then X contains a key for that table. We have shown in Example 2.23 that for the set of tables in Figure 2.26, one FD of F lies in each table, and this FD has as its left side the key for the table. This does not quite conclude the issue, however, because we also need to consider all FDs that are implied by F; that is, all FDs that are true in the schema. In Proposition 2.1, FDs (2.1), (2.2), and (2.3), we determined the closure of all sets X of attributes that fall on the left side of three FDs of F, and showed that these three sets form keys for three of the tables. For the fourth FD, we need merely take the closure of dept_name, which is easily seen to consist of the set dept_name, dept_phone, dept_mgrname, or Head(depts):
我们声称图 2.26的数据库模式采用 BCNF 格式。我们需要证明,对于位于图 2.26的表之一中的 F 隐含的任何函数依赖项 X → A,其中 A 是不在 X 中的属性,则 X 包含该表的键。我们在例 2.23中已经表明,对于图 2.26中的一组表,每个表中都有一个 F 的 FD,并且该 FD 的左侧是该表的键。然而,这并不能完全结束问题,因为我们还需要考虑 F 隐含的所有 FD;也就是说,模式中所有为真的 FD。在命题 2.1、FD (2.1)、(2.2) 和 (2.3) 中,我们确定了落在 F 的三个 FD 左侧的所有属性集合 X 的闭包,并表明这三个集合形成了三个属性的键。的表。对于第四个FD,我们只需取闭包 dept_name ,很容易看出它由集合组成 dept_name , dept_phone , dept_mgrname ,或头( depts ):
- (2.8) dept_name → Head(depts)
(2.8) dept_name → 头( depts )Now we claim that all attribute sets Z that do not contain one of the sets X, the left side of an FD in F and therefore a key for one of the Figure 2.26 tables, must have trivial closure Z+ = Z. This follows from the fact that no FDs of the form X → Y exist with X ⊆ Z, and by Algorithm 2.1, no attributes will ever be added to Z's set closure.
现在我们声称,所有不包含集合 X 之一(即 F 中 FD 的左侧,因此不包含图 2.26表之一的键)的所有属性集 Z 必须具有平凡闭包 Z + = Z。这可以从事实上,当 X ⊆ Z 时,不存在 X → Y 形式的 FD,并且根据算法 2.1,不会将任何属性添加到 Z 的集合闭包中。From this we can easily see that all of the tables in Figure 2.26 are BCNF, because if X → A holds and A is an attribute not contained in the attribute set X, then X → A X, and therefore X+ is not identical to X. But we have just shown that any attribute set that does not contain a table key has a trivial closure, and this must mean that X contains some table key K. In that table, we have also included all attributes functionally determined by K, and therefore A is in that table as well.
由此我们不难看出,图2.26中的所有表都是BCNF,因为如果X→A成立且A是不包含在属性集X中的属性,则X→AX,因此X +不等于X但是我们刚刚表明,任何不包含表键的属性集都有一个平凡的闭包,这必定意味着 X 包含某个表键 K。在该表中,我们还包含了由 K 功能确定的所有属性,并且。因此 A 也在该表中。
EXAMPLE 2.26 例2.26
Suppose we changed the rules in the employee information database so that dept_mgrname was a second identifier of the Departments entity duplicating the effect of dept_name. This would add a new FD to the set F: dept_mgrname → dept_name; by transitivity, since dept_name is a key for the depts table in Figure 2.26, dept_mgrname would also be a key. The question now is whether the depts table is still in BCNF. And the answer is yes, because the BCNF property was specially constructed not to require a unique key for the table. The only thing that has changed in the depts table is that there are now two keys, but any FD of the form X → Y in this table has the necessary property that X contains dept_mgrname or X contains dept_name.
假设我们更改了员工信息数据库中的规则,以便 dept_mgrname 是第二个标识符 Departments 实体复制效果 dept_na 我。这将向集合 F 添加一个新的 FD: dept_mgrname → dept_name ;通过传递性,因为 dept_name 是一个关键 depts 图2.26中的表格, dept_mgrname 也将是一把钥匙。现在的问题是是否 depts 表仍在 BCNF 中。答案是肯定的,因为 BCNF 属性是专门构建的,不需要表的唯一键。唯一改变的是 depts 表的优点是现在有两个键,但是该表中任何 X → Y 形式的 FD 都具有 X 包含的必要属性 dept_mgrname 或 X 包含 dept_name 。
Recall that every FD in F is preserved in one of the four tables of Figure 2.26, so that whenever any table in the schema is updated, it is possible to verify that an affected FD still holds by testing
data items in that table alone. We would like to be able to guarantee that this property, preservation of FDs, can always
be achieved starting from a universal table and proceeding to a lossless decomposition into BCNF. Unfortunately this is not
true, because the BCNF criterion for a table is too strict.
回想一下,F 中的每个 FD 都保存在图 2.26的四个表之一中,因此每当更新模式中的任何表时,都可以通过单独测试该表中的数据项来验证受影响的 FD 是否仍然有效。我们希望能够保证这个属性,FD 的保存,总是可以从通用表开始并继续无损分解为 BCNF 来实现。不幸的是,事实并非如此,因为表的 BCNF 标准过于严格。
EXAMPLE 2.27 例2.27
We wish to add a number of attributes to the employee information database of Figure 2.22 to keep track of the full addresses of all employees (assumed to be living in the United States):
我们希望向图 2.22的员工信息数据库添加一些属性,以跟踪所有员工的完整地址(假设居住在美国):
- emp_cityst, emp_straddr, emp_zip
Here emp_cityst reflects the city and state, emp_zip the zip code, and emp_straddr the street name, number, and apartment, if any. We find that when we reach the decomposition of Figure 2.26, the emps table contains all of these attributes in addition to the ones that are already there, as we see in Figure 2.27.
这里 emp_cityst 反映城市和州, emp_zip 邮政编码,以及 emp_straddr 街道名称、门牌号和公寓(如果有)。我们发现,当我们进行图 2.26的分解时, emps 除了已有的属性之外,table 还包含所有这些属性,如图2.27所示。FIGURE 2.27. The emps table extended to contain employee addresses.
图 2.27。这 emps 表扩展为包含员工地址。We assume that each employee is required to provide a single address, so it is clear that the emp_id value functionally determines all these new attributes, and FD (1) is modified accordingly:
我们假设每个员工都需要提供一个地址,因此很明显 emp_id value 在功能上决定了所有这些新属性,并且 FD (1) 相应地修改:
- emp_id → emp_name emp_phone dept_name emp_straddr emp_cityst emp_zip
No keys for any other tables of Figure 2.26 are affected, and the key for the emps table is still emp_id.
图 2.26的任何其他表的键都不会受到影响,并且 emps 桌子还在 emp_id 。But the post office has assigned zip codes to cover regions of a city (determined by street address) and never to cross city boundaries, so we also have the following new FDs to add to the set F:
但是邮局已经分配了邮政编码来覆盖城市的区域(由街道地址确定)并且永远不会跨越城市边界,因此我们还有以下新的 FD 添加到集合 F 中:
- emp_cityst emp_straddr → emp_zip regions of city determine the zip code
- emp_zip → emp_cityst zip codes never cross city boundaries
Since the left side of FD 5, emp_cityst emp_straddr, is not a superkey of the emps table, we need to perform a further decomposition to achieve the BCNF property. If we did not do this and deleted the last employee in some zip code, we would lose what information we have about that zip code—namely, what city and state it is associated with. After the prescription explained in the discussion following Figure 2.26, we place the attributes on the left side of FD 5, together with all attributes on the right side of this FD in a separate table (empadds), while the attributes on the right side are removed from the table where they previously appeared (emps). The result is given in Figure 2.28. This is a perfectly reasonable lossless decomposition of the previous table (lossless because of Theorem 2.6, since emp_cityst emp_straddr is a key for empadds, and this is also the intersection of the headings of the two tables). We note too that emp_zip emp_straddr is an alternate candidate key for empadds, since taking the closure of this set, we get emp_cityst by FD 6. Thus, we have the new derived FD 7. It is easy to see by closure that no other candidate keys exist for empadds.
从FD 5的左侧开始, emp_cityst emp_straddr , 不是 的超级键 emps 表中,我们需要进行进一步的分解来实现BCNF性质。如果我们不这样做并删除了某个邮政编码中的最后一个员工,我们将丢失有关该邮政编码的信息,即它与哪个城市和州相关联。在图 2.26的讨论中解释了规定之后,我们将 FD 5 左侧的属性以及该 FD 右侧的所有属性放在一个单独的表中( empadds ),而右侧的属性将从之前出现的表中删除( emps )。结果如图2.28所示。这是上表的完全合理的无损分解(由于定理 2.6 是无损的,因为 emp_cityst emp_straddr 是一个关键 empadds ,这也是两个表标题的交集)。我们也注意到 emp_zip emp_straddr 是一个备用候选键 empadds ,自从采用该集合的闭包以来,我们得到 emp_cityst 通过 FD 6。因此,我们有新的派生 FD 7。通过闭包很容易看出 empadds 不存在其他候选键。FIGURE 2.28. A 3NF decomposition of Figure 2.27.
图 2.28。图 2.27的 3NF 分解。
- emp_zip emp_straddr → emp_cityst (FD derived from (5) and (6))
The emps table of Figure 2.28 is now in BCNF, since neither FD 5 nor 6 lies in emps, and the only remaining FD, FD 1, requires that any superkey for the table will contain emp_id. This decomposition also preserves FDs 5 and 6, which both lie entirely in the empadds table. However, at this point FD 6 forces us to perform further decomposition of empadds to achieve BCNF, since emp_zip → emp_cityst, and emp_zip does not contain either candidate key of empadds. Clearly this new decomposition must contain at most two attributes in each table, and we require one table (zipcit in Figure 2.29) to have heading emp_zip emp_cityst to contain FD 6. The other table only has two possible pairs of attributes for a heading, the left-hand side of FD 5 or the left-hand side of FD 7, both keys for the empadds table. Choosing the left-hand side of FD 5, emp_cityst emp_straddr, would not result in a lossless decomposition, since the only attribute it would have in common with zipcit is emp_cityst, and this wouldn't contain a key for either table. We therefore choose BCNF decomposition shown in the figure.
这 emps 图 2.28的表现在位于 BCNF 中,因为 FD 5 和 6 都不位于 emps ,唯一剩下的 FD,FD 1,要求该表的任何超级键都将包含 emp_id 。这种分解还保留了 FD 5 和 6,它们都完全位于 empadds 桌子。然而,此时 FD 6 迫使我们进一步分解 empadds 达到 BCNF,因为 emp_zip → emp_cityst , 和 emp_zip 不包含以下任一候选键 empadds 。显然,这个新的分解必须在每个表中最多包含两个属性,并且我们需要一个表( zipcit 在图 2.29中)有标题 emp_zip emp_cityst 包含 FD 6。另一个表只有两个可能的标题属性对,即 FD 5 的左侧或 FD 7 的左侧,这两个键都用于 empadds 桌子。选择FD 5的左侧, emp_cityst emp_straddr ,不会导致无损分解,因为它的唯一共同属性是 zipcit 是 emp_cityst ,并且这不会包含任一表的键。因此我们选择如图所示的BCNF分解。FIGURE 2.29. A BCNF decomposition of Figure 2.28.
图 2.29。图2.28的BCNF分解。The decomposition of Figure 2.29 is lossless for the following reasons: emp_zip is the key for the zipcit table, the intersection of Head(zipstr) and Head(zipcit), so that join is lossless; the union of the zipstr and zipcit table headings contains all the attributes of the previous empadds table, so zipstr and zipcit join to form the empadds table of Figure 2.28; the empadds table formed a lossless join with the emps table, so the three tables join losslessly. Furthermore, both new tables in Figure 2.29 are in BCNF form. The only FD in the zipcit table is FD 6, and emp_zip is the key. The zipstr table has no FD in it, so the unique key includes both attributes, emp_zip emp_straddr, which was also an alternate candidate key for the empadds table of Figure 2.28.
图2.29的分解是无损的,原因如下: emp_zip 是关键 zipcit 表,Head( zipstr )和头( zipcit ),这样连接是无损的;的联盟 zipstr 和 zipcit 表标题包含前面的所有属性 empadds 表,所以 zipstr 和 zipcit 加入以形成 empadds 图2.28的表格;这 empadds 表与 emps 表,因此三个表无损连接。此外,图 2.29中的两个新表都是 BCNF 形式。唯一的FD zipcit 表是 FD 6,并且 emp_zip 是关键。这 zipstr 表中没有 FD,因此唯一键包括这两个属性, emp_zip emp_straddr ,这也是 empadds 表2.28 。But the decomposition in Figure 2.29 does not preserve dependencies of the extended set F, since FD 5 does not lie in either table. This can have an unfortunate effect, in that we must perform programmatic checking to ensure that a given street address, city, state, and zip code being entered conform with the post office assignment.
但图 2.29中的分解并没有保留扩展集 F 的依赖性,因为 FD 5 并不位于这两个表中。这可能会产生不幸的影响,因为我们必须执行编程检查以确保输入的给定街道地址、城市、州和邮政编码符合邮局分配。
It seems that we have gone too far in decomposition if we really want to preserve functional dependencies. What we'd like
is a definition for normal form that allows us to stop at Figure 2.28 and not press on to Figure 2.29. In order to do this, we have to come up with a new definition for normal form (3NF, as it turns out). We achieve this with
the following definitions.
如果我们真的想保留函数依赖关系,那么我们似乎在分解方面走得太远了。我们想要的是范式的定义,它允许我们停在图 2.28上,而不是继续看图 2.29 。为了做到这一点,我们必须为范式(3NF,事实证明)提出一个新的定义。我们通过以下定义来实现这一点。
Definition:Prime Attribute.
定义:主要属性。In a table T, an attribute A is said to be prime if and only if the attribute A exists in some key K for the table.
在表 T 中,当且仅当属性 A 存在于表的某个键 K 中时,属性 A 才被称为素数。
Definition:Third Normal Form.
定义:第三范式。A table T in a database schema with FD set F is said to be in third normal form (3NF) under the following condition. For any functional dependency X → A implied by F that lies in T, where A is a single attribute that is not in X, one of the two following properties must hold: either (1) X is a superkey for T; or (2) A is a prime attribute in T. A database schema is in 3NF when all the tables it contains are in 3NF.
在以下条件下,具有 FD 集 F 的数据库模式中的表 T 被称为第三范式 (3NF)。对于位于 T 中的 F 隐含的任何函数依赖性 X → A,其中 A 是不在 X 中的单个属性,必须满足以下两个属性之一: (1) X 是 T 的超键; (2) A 是 T 中的主要属性。当数据库模式包含的所有表都属于 3NF 时,该数据库模式属于 3NF。
EXAMPLE 2.28 例2.28
Consider the database schema of Figure 2.29. Each of the tables in this schema is in BCNF, and therefore in 3NF. The BCNF prescription for a table requires that the table has property 1 of the 3NF definition, and it doesn't permit the “escape clause” of property 2. Therefore, any table in BCNF is also in 3NF, but the reverse doesn't hold.
考虑图 2.29的数据库模式。此模式中的每个表都采用 BCNF,因此采用 3NF。表的 BCNF 规定要求该表具有 3NF 定义的属性 1,并且不允许属性 2 的“转义子句”。因此,BCNF 中的任何表也是 3NF 中的,但反之则不然抓住。
EXAMPLE 2.29 例2.29
Consider the empadds table of Figure 2.28. This table is in 3NF but not in BCNF. The reason we required a further decomposition of this table was that the empadds table of Figure 2.28 had as a key the attributes emp_cityst emp_straddr, and at the same time FD 6, emp_zip → emp_cityst lies in the table. This is an FD whose left-hand side does not contain a key of empadds, so the FD does not fulfill the BCNF property. However, we note that the attribute on the right of this FD does lie in some key and is therefore prime. Thus, the FD does fulfill property 2 of the 3NF definition.
考虑 empadds 表2.28 。该表属于 3NF,但不属于 BCNF。我们需要进一步分解该表的原因是 empadds 图 2.28的表将属性作为键 emp_cityst emp_straddr ,同时FD 6, emp_zip → emp_cityst 位于表中。这是一个 FD,其左侧不包含以下键: empadds ,所以FD不满足BCNF性质。然而,我们注意到该 FD 右侧的属性确实位于某个键中,因此是素数。因此,FD 确实满足 3NF 定义的属性 2。
EXAMPLE 2.30 例2.30
We are given a table T with Head(T) = A B C D, and FD set F as follows:
给定一个表 T,其中 Head(T) = ABCD,FD 集 F 如下:
- 1. A B → C D, 2. D → B
1.AB→CD,2.D→BClearly A B is a candidate key for T, and we see by closure that A D is another one: A D+ = A D B (2) C (1). It is easy to confirm that there are no others. Now we maintain that table T is already in 3NF, because the only FD implied by F that does not contain A B on the left-hand side (and is not trivial) depends on FD 2, D → B, and since B is a prime attribute, table T is 3NF by the “escape clause” of property 2.
显然AB是T的候选键,并且通过闭包我们看到AD是另一个:AD + = ADB (2) C (1)。很容易确认没有其他人。现在我们认为表 T 已经在 3NF 中,因为 F 隐含的唯一左侧不包含 AB 的 FD(并且不是平凡的)取决于 FD 2, D → B,并且因为 B 是素数属性,表 T 通过属性 2 的“转义子句”是 3NF。If we did wish to decompose T losslessly to a BCNF form, we would want to start by projecting on a table that contains FD 2—that is, table T2 with Head(T2) = B D. Then we will want table T1 to contain a candidate key for T as well as the attribute C, but if we create T1 with Head(T1) = A B C, then the headings of T1 and T2 won't intersect in D and therefore won't join losslessly. Thus, we must create table T1 with Head(T1) = A D C. And so we have the BCNF decomposition {A D C, B D}.
如果我们确实希望将 T 无损分解为 BCNF 形式,我们需要首先投影到包含 FD 2 的表,即表 T 2且 Head(T 2 ) = B D。然后我们需要表 T 1包含 T 的候选键以及属性 C,但如果我们使用 Head(T 1 ) = ABC 创建 T 1 ,则 T 1和 T 2的标题不会在 D 中相交,因此也不会相交无损加入。因此,我们必须创建表 T 1 ,其中 Head(T 1 ) = AD C。因此我们有 BCNF 分解{ADC, BD}。
In decomposing a database with a given set of functional dependencies F to achieve a normal form, the BCNF and 3NF forms are
often identical, as we saw in Example 2.28. They differ exactly when there exist two nontrivial FDs implied by F, X → Y and Z → B, where Z ⊂ X ∪ Y and B ∈ X. In Example 2.30, FD 1 gave us A B → C D and FD 2 gave us D → B, with D ⊂ C D and B ∈ A B. Further decomposition to achieve BCNF will cause
dependencies not to be preserved. Many database designers aim for a 3NF design that preserves dependencies.
在使用给定的函数依赖集 F 分解数据库以实现范式时,BCNF 和 3NF 形式通常是相同的,如示例 2.28中所示。当存在由 F、X → Y 和 Z → B 隐含的两个非平凡 FD 时,它们完全不同,其中 Z ⊂ X ∪ Y 和 B ∈ X。在示例 2.30中,FD 1 给我们 AB → CD ,FD 2 给我们 D → B,其中 D ⊂ CD 且 B ∈ A B。进一步分解以实现 BCNF 将导致依赖关系无法保留。许多数据库设计者的目标是保留依赖关系的 3NF 设计。
Another definition for a table property, known as second normal form (2NF), is weaker than 3NF and of mainly historical interest, since no advantage arises from stopping short of 3NF. When a
table fails to be 3NF, it must contain a valid nontrivial functional dependency X → A, where A is nonprime and X is not a
superkey for T. Recall from the previous discussion of BCNF that if X is not a superkey for T, then two cases are possible:
Either X ⊂ K for some K and we say that some attributes of T are functionally determined by a proper subset X of a key K,
or else X − K is nonempty for all keys K in T, and we say that some attributes of T are functionally determined by a different
set of attributes that does not contain and is not contained in any key set. This latter case is also known as a transitive dependency, since we have K → X for any key K, and given X → A, the functional dependency K → A is implied by transitivity. A table
in 2NF is not allowed to have attributes that are functionally determined by a proper subset of a key K, but it may still
have transitive dependencies.
表属性的另一个定义称为第二范式(2NF),它比 3NF 弱并且主要具有历史意义,因为不使用 3NF 不会产生任何优势。当表不满足 3NF 时,它必须包含有效的非平凡函数依赖 X → A,其中 A 是非素数,并且 X 不是 T 的超级键。回想一下之前对 BCNF 的讨论,如果 X 不是 T 的超级键,那么有两种情况是可能的:对于某个 K,X ⊂ K,并且我们说 T 的某些属性在功能上是由密钥 K 的真子集 X 确定的,或者对于 T 中的所有密钥 K,X − K 都是非空的,并且我们假设 T 的某些属性在功能上由一组不同的属性确定,该属性不包含且不包含在任何键集中。后一种情况也称为传递依赖,因为对于任何键 K,我们都有 K → X,并且给定 X → A,函数依赖 K → A 由传递性隐含。 2NF 中的表不允许具有在功能上由键 K 的真子集确定的属性,但它仍然可能具有传递依赖性。
Definition: Second Normal Form.
定义:第二范式。A table T in a database schema with FD set F is said to be in second normal form (2NF) under the following condition: For any functional dependency X → A implied by F that lies in T, where A is a single attribute that is not in X and is nonprime, X is not a proper subset of any key K of T. A database schema is in 2NF when all the tables it contains are in 2NF.
具有 FD 集 F 的数据库模式中的表 T 在以下条件下被称为第二范式 (2NF):对于位于 T 中的 F 隐含的任何函数依赖项 X → A,其中 A 是单个属性,即不在 X 中并且是非素数,X 不是 T 的任何键 K 的真子集。当数据库模式包含的所有表都在 2NF 中时,它就是 2NF。
2.8.3. An Algorithm to Achieve Well-Behaved 3NF Decomposition
2.8.3.一种实现良好 3NF 分解的算法
For a number of technical reasons it turns out that the approach of successive decompositions to achieve a 3NF lossless join
decomposition preserving functional dependencies is distrusted by many practitioners. This is the only approach we have seen,
used in Figures 2.23 through 2.26. Problems can arise because the set F of functional dependencies used in the successive decompositions has not been carefully
defined, and as we saw in Section 2.6, numerous equivalent sets F are possible. Algorithm 2.3 provides a straightforward method to create the desired decomposition.
由于多种技术原因,许多实践者不信任通过连续分解来实现 3NF 无损连接分解并保留函数依赖性的方法。这是我们在图 2.23 到 2.26中看到的唯一方法。可能会出现问题,因为连续分解中使用的函数依赖集 F 尚未仔细定义,并且正如我们在第 2.6 节中看到的,许多等效集 F 是可能的。算法 2.3 提供了一种简单的方法来创建所需的分解。
ALGORITHM 2.3 算法2.3
This algorithm, given a universal table T and set F of FDs, generates a lossless join decomposition of T that is in 3NF and preserves all FDs of F. The output is a set S of headings (sets of attributes) for tables in the final database schema.
该算法给定通用表 T 和 FD 集合 F,生成 3NF 形式的 T 的无损连接分解,并保留 F 的所有 FD。输出是最终表中表的标题集(属性集)集合 S数据库架构。
REPLACE F WITH MINIMAL COVER OF F; /* use algorithm 2.6.13 */ S = Ø; S = Ø ; /* initialize S to null set */ FOR ALL X → Y in F /* loop on FDs found in F */ IF. FOR ALL Z∈S, X∪Y⊈Z
IF. FOR ALL Z ε S, X ∪ Y ⊈ Z/* no table contains X → Y */ THEN S = S∪Heading (X∪Y);
THEN S = S ∪ Heading (X ∪ Y);/* add new table Heading to S */ END FOR /* end loop on FDs */ IF, FOR ALL CANDIDATE KEYS K FOR T /* if no candidate Keys of T */ FOR ALL Z∈S, K⊈Z
FOR ALL Z ε S, K ⊈ Z/* are contained in any table */ THEN CHOOSE A CANDIDATE KEY K AND /* choose a candidate key */ SET S = S∪Heading(K); SET S = S ∪ Heading(K); /* and add new table to S */ Note that the function Heading(K) generates a singleton set containing the set K of attributes, which can then be added to the set S, which is a set of sets of attributes.▪
请注意,函数 Heading(K) 会生成一个包含属性集 K 的单例集,然后可以将其添加到集合 S(属性集的集合)中。▪
EXAMPLE 2.31 例2.31
To see why the choice of a candidate key might sometimes be necessary in Algorithm 2.3, consider the following small school database. We are given a universal table T with heading.
要了解为什么在算法 2.3 中有时可能需要选择候选键,请考虑以下小型学校数据库。我们得到了一个带有标题的通用表 T。
- Head(T) = instructor class_no class_room text
and FD set F given by
和 FD 集 F 由下式给出
- F = {class_no → class_room text}
In ER terms, there is an entity Classes, identified by class_no, and the actual class holds all its meetings in the same classroom with a unique text. Whether or not there is an entity Class_rooms with identifier class_room is a matter of opinion. Since there is no FD with class_room on the left, such an entity would have no descriptor attributes, and so no table exists for it in the relational model; thus, we can think of class_room as a descriptor attribute for Classes if we like. The same argument can be applied to the text attribute in Head(T). But the instructor attribute in Head(T) is a different situation. Since the instructor attribute is not functionally determined by class_no, there can be several instructors for the same class, and since instructor does not determine class_no, this means that one instructor might teach several classes. From this it is clear that instructors have independent existence from classes and in fact represent an entity, Instructors. Indeed, table T contains a relationship between Instructors and Classes.
用 ER 术语来说,有一个实体 Classes , 识别为 class_no ,并且实际班级在同一间教室中举行所有会议,并使用独特的文本。是否存在实体 Class_rooms 带标识符 class_room 这是一个见仁见智的问题。由于没有 FD class_room 在左边,这样的实体没有描述符属性,因此关系模型中不存在它的表;因此,我们可以想到 class_room 作为描述符属性 Classes 如果我们喜欢的话。同样的论证可以应用到 text Head(T) 中的属性。但是 instructor Head(T) 中的属性是一种不同的情况。自从 instructor 属性不是由函数决定的 class_no ,同一个班级可以有多名讲师,并且由于 instructo r 不确定 class_no ,这意味着一位讲师可能教授多门课程。由此可见,教师是独立于班级而存在的,实际上是一个实体, Instructors 。事实上,表 T 包含以下关系: Instructors 和 Classes 。By standard BCNF/3NF normalization, since the attributes class_room and text depend on class_no alone in table T, we need to factor T into two tables, T1 and T2, with
通过标准 BCNF/3NF 标准化,由于属性 class_room 和 text 依赖于 class_no 仅在表 T 中,我们需要将 T 分解为两个表 T 1和 T 2 ,其中
- Head(T1) = class_no class_room text
- Head(T2) = instructor class_no
But in Algorithm 2.3, only table T1 will be created in the initial loop on FDs, since the instructor attribute does not figure in any FDs of F. However, it is clear from the standard set closure approach that the unique candidate key for T is class_no instructor. Therefore, the loop on candidate keys in Algorithm 2.3 is necessary to create table T2 for set S.
但在算法 2.3 中,由于 F 的任何 FD 中都没有出现讲师属性,因此在 FD 的初始循环中只会创建表 T 1 。然而,从标准集合闭包方法可以清楚地看出,T 的唯一候选键是 class_no 讲师。因此,算法 2.3 中候选键的循环对于为集合 S 创建表 T 2是必要的。
It is commonly said that the normalization approach and the ER approach reinforce one another. Example 2.31 gives an example of this. Without considering functional dependencies, it is not clear why the instructor data item must represent an entity but the class_room data item might not. On the other hand, the ER approach gives the motivation for why the loop on candidate keys in Algorithm
2.3 is appropriate to create table T2. We need table T2 to represent the relationship between the Instructors and Classes entities.
人们普遍认为标准化方法和 ER 方法是相辅相成的。例 2.31给出了一个这样的例子。在不考虑功能依赖性的情况下,尚不清楚为什么 instructor 数据项必须代表一个实体,但 class_room 数据项可能不会。另一方面,ER 方法给出了为什么算法 2.3 中的候选键循环适合创建表 T 2 的动机。我们需要表 T 2来表示之间的关系 Instructors 和 Classes 实体。
2.8.4. A Review of Normalization
2.8.4.标准化回顾
In the normalization approach to database design, we start out with a set of data items and a set F of functional dependencies
that the designer wishes to see maintained by the database system for any future content of the database. The data items are
all placed in a single universal table T, and the set F is replaced by an equivalent minimal cover; then the designer determines
a decomposition of this table into a set of smaller tables {T1, T2, … , Tk}, with a number of good properties, as follows.
在数据库设计的规范化方法中,我们从一组数据项和一组 F 功能依赖性开始,设计者希望看到数据库系统为数据库的任何未来内容维护这些数据项和功能依赖性。数据项全部放在一个通用表T中,集合F用等价的最小覆盖来代替;然后设计者将该表分解为一组较小的表{T 1 , T 2 , … , T k },具有许多良好的属性,如下所示。
- The decomposition is lossless, so that T = T1 ⋈ T2 ⋈ … ⋈ Tn.
分解是无损的,因此 T = T 1 ⋈ T 2 ⋈ … ⋈ T n 。 - To the greatest extent possible, the only FDs X → Y in table Ti arise because X contains some key K in Ti; this is the thrust of the BCNF/3NF definitions.
在最大程度上,表Ti中唯一的 FD X → Y 出现,因为 X 包含Ti中的某个键 K;这是 BCNF/3NF 定义的主旨。 - All FDs in F of the form X → Y are preserved in tables of the decomposition.
F 中 X → Y 形式的所有 FD 都保存在分解表中。
The value of property 2 is that we can avoid the various anomalies defined in Section 2.5. It is also important that with these normal forms we can guarantee that functional dependency will not be broken, so long as we guarantee the uniqueness of all keys for a table. The Create
Table statement of SQL gives us a way to define such keys K for a table, and the uniqueness of these keys will then be guaranteed
by the system for all SQL table Update statements that follow (an update that breaks such a uniqueness constraint will result
in an error). As we will see a bit later, such a uniqueness condition is a particularly easy condition to check with an index
on the key columns involved, whereas a general functional dependency X → Y in a table Ti, where multiple rows with the same value for X can exist, is more difficult. Standard SQL does not provide a constraint to
guarantee such general dependencies against update errors.
属性2的价值在于我们可以避免2.5节中定义的各种异常。同样重要的是,通过这些范式,只要我们保证表的所有键的唯一性,我们就可以保证函数依赖性不会被破坏。 SQL 的 Create Table 语句为我们提供了一种为表定义这样的键 K 的方法,然后系统将在后面的所有 SQL 表 Update 语句中保证这些键的唯一性(打破这种唯一性约束的更新将导致错误)。正如我们稍后将看到的,这样的唯一性条件是一个特别容易检查所涉及的关键列上的索引的条件,而表 T i中的一般函数依赖性 X → Y ,其中多行具有相同的值X能否存在,是比较困难的。标准 SQL 不提供约束来保证此类一般依赖性免受更新错误的影响。
The value of property 3 should also be clear, since we want to guarantee that all functional dependencies provided by the
designer hold for any possible content of the database. Property 3 means that FDs won't cross tables in the final database
schema, so that if an update of one table occurs, only FDs in that table need to be tested by the system. On the other hand,
the very decomposition we are providing does result in a certain amount of join testing, since the standard lossless join
decomposition into tables T1 and T2 leads to a key for one table with attributes in both—that is, a key consisting of (Head(T1) ∩ Head(T2)). Standard SQL provides a constraint, known as referential integrity, that can be imposed with the Create Table statement to guarantee that these attribute values continue to make sense between
the two tables they join, a constraint also known as a foreign key condition.
属性 3 的值也应该明确,因为我们希望保证设计者提供的所有功能依赖关系适用于数据库的任何可能内容。属性3意味着FD不会跨最终数据库模式中的表,因此如果发生一张表的更新,系统只需要测试该表中的FD。另一方面,我们提供的分解确实会导致一定量的连接测试,因为表 T 1和 T 2的标准无损连接分解会导致一个表的键同时具有两个表的属性,即键由 (Head(T 1 ) ∩ Head(T 2 )) 组成。标准 SQL 提供了称为引用完整性的约束,可以通过 Create Table 语句施加该约束,以保证这些属性值在它们连接的两个表之间继续有意义,该约束也称为外键条件。
To sum up, the standard 3NF decomposition eliminates most anomalies and makes it possible to verify efficiently that desired
functional dependencies remain valid when the database is updated.
总而言之,标准 3NF 分解消除了大多数异常情况,并且可以有效地验证数据库更新时所需的功能依赖关系是否仍然有效。
Additional normal forms exist that are not covered here, 4NF and 5NF. In particular, fourth normal form, or 4NF, is based
on an entirely new type of dependency, known as a multivalued dependency. You are referred to Teorey (1994) and Ullman (1988) for good descriptions of these.
还存在此处未涵盖的其他范式,即 4NF 和 5NF。特别是,第四范式(或 4NF)基于一种全新的依赖类型,称为多值依赖。有关这些内容的详细描述,请参阅 Teorey (1994) 和 Ullman (1988)。
We should mention at this point that overnormalization, factoring a database into more tables than are required in order to reach 3NF when this is the goal, is considered a bad
practice. For example, if we factored the depts table into two tables, one with dept_name and dept_phone and a second with dept_name and dept_mgrname, we would certainly still have a 3NF database, but we would have gone further than necessary in decomposition. Unnecessary
inefficiencies would arise in retrieving all department information together, because of the join that would now be required.
我们应该在这一点上提到,过度规范化(当目标是达到 3NF 时,将数据库分解为超出所需数量的表)被认为是一种不好的做法。例如,如果我们将 depts 表分成两张表,一张带有 dept_name 和 dept_phone 第二个是 dept_name 和 dept_mgrname ,我们当然仍然会有一个 3NF 数据库,但我们在分解方面会走得更远。由于现在需要连接,因此在一起检索所有部门信息时会出现不必要的低效率。
2.9. Additional Design Considerations
2.9.其他设计考虑因素
The ER and normalization approaches both have weaknesses. The ER approach, as it is usually presented, is extremely dependent
on intuition, but if intuition fails there is little fallback. As we saw in Example 2.31, it can be difficult on the basis of intuition alone to determine whether a data item represents an entity or not. It helps to have the concept of functional
dependency from normalization. Normalization is more mathematically based and mechanical in its application, but the idea
that you can write down a complete set of FDs as a first step of logical database design is often a delusion; it may be found
later that some have been missed. The intuitive exercise of trying to discover entities and relationships and weak entities
and so on aids the designer in discovering FDs that might otherwise be overlooked.
ER 和标准化方法都有弱点。正如通常所提出的那样,ER 方法极其依赖于直觉,但如果直觉失败,则几乎没有后退办法。正如我们在示例 2.31中看到的,仅凭直觉很难确定数据项是否代表实体。它有助于从规范化中获得函数依赖的概念。规范化在应用中更加基于数学和机械性,但是可以写出完整的 FD 集作为逻辑数据库设计的第一步的想法通常是一种错觉;稍后可能会发现有些被遗漏了。尝试发现实体、关系以及弱实体等的直观练习有助于设计者发现否则可能被忽视的 FD。
Another factor affecting the normalization approach is that a certain amount of judgment might be needed to decide whether
a particular functional dependency should be reflected in a final design. Consider the CAP database schema we discussed earlier
in the chapter. It might seem that all functional dependencies that hold for the database are reflections of the table key
dependencies, so that all the tables are in BCNF. However, there is a rather unexpected FD of the following form:
影响标准化方法的另一个因素是,可能需要一定量的判断来决定是否应在最终设计中反映特定的功能依赖性。考虑我们在本章前面讨论的 CAP 数据库模式。看起来数据库的所有函数依赖关系都是表键依赖关系的反映,因此所有表都在 BCNF 中。然而,有一个相当意想不到的 FD,其形式如下:
- (2.9) qty price discnt → dollars
That is, for each order, from the order quantity, product price, and customer discount we can calculate the dollars charge
for the order. This is expressed in the following SQL Insert statement (2.10). Clearly the FD as it stands crosses tables,
and therefore the decomposition does not preserve dependencies. Note that we can create another table, ddollars, that contains all the attributes on both sides of FD (2.9), qty price discnt dollars, and simultaneously remove the dollars attribute from orders. The result, a five-table schema for CAP including the ddollars table, is a 3NF design that would be arrived at by Algorithm 2.3. The unique key for the ddollars table is qty price discnt, and the only FD is given in (2.9). There is a problem with this design, however. Whenever we want to retrieve the dollar
cost for an order, we have to perform a join with products to get price, customers to get discnt, and ddollars to read off the dollars value for the given qty, price, and discnt. Is all this really necessary?
也就是说,对于每个订单,根据订单数量、产品价格和客户折扣,我们可以计算出该订单的美元费用。这在下面的 SQL Insert 语句 (2.10) 中表达。显然,FD 是跨表的,因此分解不会保留依赖关系。请注意,我们可以创建另一个表, ddollars ,包含 FD (2.9) 两边的所有属性, qty price discnt dollars ,并同时删除 dollars 属性来自 orders 。结果是 CAP 的五表模式,包括 ddollars 表是由算法 2.3 得出的 3NF 设计。的唯一密钥 ddollars 表是 qty price discnt ,唯一的 FD 在(2.9)中给出。然而,这种设计有一个问题。每当我们想要检索订单的美元成本时,我们必须执行与 products 要得到 price , customers 要得到 discnt , 和 ddollars 读出 dollars 给定值 qty , price , 和 discnt 。这一切真的有必要吗?
If we consider the original motivations for a decomposition such as this, we have two: to remove anomalies and to validate
all FDs whenever changes are made in the data. But do we really want to validate this FD by a unique key constraint in normal
form? Presumably when a new order is inserted, the program logic does a calculation of the dollars amount to store, something
like this:
如果我们考虑这样的分解的原始动机,我们有两个:消除异常并在数据发生更改时验证所有 FD。但是我们真的想通过正常形式的唯一键约束来验证这个 FD 吗?据推测,当插入新订单时,程序逻辑会计算要存储的美元金额,如下所示:
- (2.10) exec sql insert into orders
- values (:ordno, :month, :cid, :aid, :pid, :qty,
- qty*:price – .01*:discnt*:qty*:price:
With this Insert statement we guarantee the FD (2.9); and more than that, we guarantee an exact numerical relationship that
an FD is incapable of representing. The only validation that the ddollars table is capable of providing is this: If a previous row exists with a given qty, price, and discnt, then the calculated dollars value will be identical. This seems like a rather strange validation, since if there are a lot of products and customers, with real variation in order sizes and some limit on the number of orders tracked,
we can expect to be adding many (qty, price, discnt) triples for the first time. Thus, the unique key constraint offers no real value in verification: There is no old row with
the same key to compare with it. You would much rather depend on the Insert statement (2.10) to perform the correct calculation.
In this regard, it certainly makes sense to provide this insert in a tested function that must be used by all logic-performing
inserts of new orders.
通过这个 Insert 语句,我们保证 FD (2.9);更重要的是,我们保证了 FD 无法表示的精确数值关系。唯一验证的是 ddollars 表能够提供的是:如果前一行存在给定的 qty , price , 和 discnt ,那么计算出的 dollars 值将相同。这似乎是一个相当奇怪的验证,因为如果有很多产品和客户,订单大小存在实际变化并且跟踪订单数量有一定限制,我们可以预期会添加许多 (qty, price, discnt) 第一次三倍。因此,唯一键约束在验证中没有提供实际价值:没有具有相同键的旧行可以与之比较。您更愿意依赖 Insert 语句 (2.10) 来执行正确的计算。在这方面,在经过测试的函数中提供此插入当然是有意义的,该函数必须由新订单的所有逻辑执行插入使用。
Now the delete and insert anomalies amount to saying that we don't want to lose track of any (qty, price, discnt) triples, but this is a questionable proposition given that we don't really value this method of validation. As for the update
anomaly, we consider the case of needing to update all dollars values at once for a given (qty, price, discnt). Presumably this might happen if the price or discnt value needed to be changed for orders that were previously entered, perhaps because that value was originally entered erroneously
and now has to be corrected. But this change would be so unusual and have such major ramifications for a wholesale business
that it is unreasonable to assume that an inexperienced programmer might write code to correct a single row in orders by mistake. Indeed many designers would model the dollars column as an insert-only quantity that should not be updated at all (except to correct input errors). We are therefore willing
to forgo the protection from the update anomaly.
现在删除和插入异常相当于说我们不想丢失任何异常 (qty, price, discnt) 三元组,但这是一个有问题的命题,因为我们并不真正重视这种验证方法。至于更新异常,我们考虑需要一次更新给定的所有美元值的情况 (qty, price, discnt) 。假设这可能会发生,如果 price 或者 discnt 之前输入的订单的值需要更改,可能是因为该值最初输入错误,现在必须更正。但这种变化是如此不寻常,并且对批发业务产生如此重大的影响,以至于假设一个没有经验的程序员可能会编写代码来纠正其中的单行是不合理的。 orders 误。事实上,许多设计师都会模仿 dollars 列作为仅插入数量,根本不应更新(纠正输入错误除外)。因此,我们愿意放弃对更新异常的保护。
We have gone into detail here to exemplify a type of situation that arises with some frequency in commercial applications,
a need for denormalization to improve performance. Most design practitioners will agree that there is frequently a need for this.
我们在这里详细举例说明了商业应用中经常出现的一种情况,即需要非规范化来提高性能。大多数设计从业者都会同意经常需要这样做。
2.9.1. Database Design Tools
2.9.1.数据库设计工具
A number of commercial products are aimed at providing environments to support the DBA in performing database design. These
environments are provided by database design tools, or sometimes as part of a more general class of products known as computer-aided software engineering (CASE) tools. Such tools usually have a number of components, chosen from the following kinds. It would be rare for a single
product to offer all these capabilities.
许多商业产品旨在提供支持 DBA 执行数据库设计的环境。这些环境由数据库设计工具提供,有时作为称为计算机辅助软件工程(CASE) 工具的更通用产品类别的一部分。此类工具通常具有许多组件,可从以下类型中选择。单一产品能够提供所有这些功能是很少见的。
ER Design Editor ER 设计编辑
A common component is an interface in which a designer can construct ER diagrams, editing and making changes to the diagrams
using the graphical drag-and-drop methods common to products such as the Apple Macintosh and Microsoft Windows.
通用组件是一个界面,设计人员可以在其中构建 ER 图,使用 Apple Macintosh 和 Microsoft Windows 等产品通用的图形拖放方法来编辑和更改图表。
ER to Relational Design Transformer
ER 到关系设计转换器
Another common component of such tools is a transformer that automatically performs a transformation of an ER design to a
set of relational table definitions, following the steps outlined in Section 2.3 and exemplified in the case study of Section 2.4.
此类工具的另一个常见组件是转换器,它按照第 2.3 节中概述的步骤以及第 2.4 节案例研究中的示例自动执行 ER 设计到一组关系表定义的转换。
With database design tools, the flow of development usually starts with ER design and proceeds to a relational table definition.
However, a number of products deal with functional dependencies. One tool advises loading a small universal table and abstracts
from these data the possible functional dependencies that might hold for the data. A transformation to BCNF/3NF for this set
of FDs can then be automatically generated.
使用数据库设计工具,开发流程通常从 ER 设计开始,然后进行关系表定义。然而,许多产品处理功能依赖性。一种工具建议加载一个小型通用表,并从这些数据中提取可能包含该数据的函数依赖关系。然后可以自动生成这组 FD 到 BCNF/3NF 的转换。
FD to ER Design Transformer
FD 到 ER 设计变压器
Another type of component that is sometimes offered takes a set of FDs for the database and generates a valid ER diagram to
reflect the rules of the data.
有时提供的另一种类型的组件采用数据库的一组 FD 并生成有效的 ER 图来反映数据的规则。
As indicated in the previous section, a design that is theoretically perfect may also be inefficient in terms of performance.
Thus, a good design tool tries to analyze the performance implications of a design and accepts designer decisions to perform
certain kinds of denormalization to improve performance. In addition, a tool must be forgiving of errors and omissions in
FDs and entity classifications, in order to produce some kind of best guess at a design that the designer can picture while
making corrections. This brings up another kind of standard tool component.
如上一节所述,理论上完美的设计在性能方面也可能效率低下。因此,一个好的设计工具会尝试分析设计的性能影响,并接受设计者的决定来执行某些类型的非规范化以提高性能。此外,工具必须能够容忍 FD 和实体分类中的错误和遗漏,以便对设计人员在进行更正时可以描绘的设计产生某种最佳猜测。这就带来了另一种标准工具组件。
Design Analyzers 设计分析仪
These components analyze design in the current stage and produce reports that might help the DBA to correct errors of various
kinds.
这些组件分析当前阶段的设计并生成可以帮助 DBA 纠正各种错误的报告。
For an excellent overview of database design tools, you are referred to the last chapter in Batini, Ceri, and Navathe (1992).
有关数据库设计工具的精彩概述,请参阅 Batini、Ceri 和 Navathe (1992) 的最后一章。
2.10 Suggestions for Further Reading
2.10 进一步阅读的建议
Many variations in terminology are prevalent in the field of logical database design. The ER approach is sometimes referred
to as semantic modeling. The real-world objects known as entity occurrences in our notation are often referred to in the literature as entities, and the entity in our notation that makes up a category of entity occurrences then becomes an entity type. Attributes are also sometimes called properties.
在逻辑数据库设计领域中,术语的许多变化很普遍。 ER 方法有时称为语义建模。在我们的表示法中被称为实体出现的现实世界对象通常在文献中被称为实体,而在我们的表示法中构成实体出现类别的实体则成为实体类型。属性有时也称为特性。
Let us try to give an idea of what is meant by semantic modeling. In a programming language, the syntax of the language specifies how the statements are formed out of basic textual elements. The syntax does not associate any
meaning with the statements, however. A specification of how programming language statements act under all possible conditions,
what the statements mean in terms of their effect, is known as the semantics of the language. The term semantic modeling implies that in the ER approach, we are getting into the topic of what data items really mean in order to model their behavior in terms of database structures such as relational tables.
让我们尝试解释一下语义建模的含义。在编程语言中,语言的语法指定如何从基本文本元素形成语句。然而,语法并不将任何含义与语句关联起来。编程语言语句在所有可能的条件下如何运行、语句的效果意味着什么的规范被称为语言的语义。术语“语义建模”意味着在 ER 方法中,我们正在深入了解数据项的真正含义,以便根据数据库结构(例如关系表)对其行为进行建模。
References 1, 2, and 4 cover the topic of logical database design. Reference 1 also contains as its final section an article by David Reiner on commercial products used for database design, known as database
design tools. References 3 and 5 also contain excellent coverage of many normalization concepts that are not covered in the current chapter. Reference 3, in particular, is extremely advanced and represents the state of the art in this field. Reference 6 is at the same level as this text and covers both entity–relationship and normalization.
参考文献1、2 和 4涵盖了逻辑数据库设计主题。参考文献1的最后一部分还包含 David Reiner 撰写的一篇关于用于数据库设计的商业产品(称为数据库设计工具)的文章。参考文献3 和参考文献 5还很好地涵盖了当前章节中未涉及的许多规范化概念。特别是参考文献3非常先进,代表了该领域的最新技术水平。参考文献6与本文处于同一级别,涵盖实体关系和规范化。
1. C. Batini, S. Ceri, S.B. Navathe, Conceptual Database Design 1992 Benjamin-Cummings
1. C. Batini 、 S. Ceri 、 SB Navathe ,概念数据库设计1992 本杰明-卡明斯
2. C.J. Date, An Introduction to Database Systems 6th ed 1995 Addison-Wesley
2. CJ Date ,数据库系统简介,第 6 版,1995 年 Addison-Wesley
3. David Maier, The Theory of Relational Databases 1983 Computer Science Press
3. David Maier ,关系数据库理论1983 计算机科学出版社
4. Toby J. Teorey, Database Modeling and Design: The Fundamental Principles 2nd ed 1994 Morgan Kaufmann
4. Toby J. Teorey ,数据库建模和设计:基本原则,第 2 版,1994 年 Morgan Kaufmann
5. Jeffrey D. Ullman, Database and Knowledge-Base Systems Volume 1 1988 Computer Science Press
5. Jeffrey D. Ullman , 《数据库和知识库系统》第 1 卷,1988 年计算机科学出版社
6. Jeffrey D. Ullman, Jennifer Widom, A First Course in Database Systems 1997 Prentice-Hall
6. Jeffrey D. Ullman 、 Jennifer Widom , 《数据库系统第一门课程》 1997 年 Prentice-Hall