Chapter 2. Entity–Relationship Concepts
第 2 章实体关系概念
Until now we have dealt with databases made up of a number of distinct tables, without concerning ourselves very much with
how the tables and their constituent columns were originally generated. Logical database design, also known simply as database design or database modeling, studies basic properties and interrelationships among data items, with the aim of providing faithful representations of
such items in the basic data structures of a database. Databases with different data models have different structures for
representing data; in relational databases the fundamental structures for representing data are what we have been calling
relational tables. We concentrate on relational databases in this chapter because design for the object-relational model is still in its infancy.
到目前为止,我们已经处理了由许多不同表组成的数据库,而不太关心这些表及其组成列最初是如何生成的。逻辑数据库设计,也简称为数据库设计或数据库建模,研究数据项之间的基本属性和相互关系,目的是在数据库的基本数据结构中提供这些项的忠实表示。不同数据模型的数据库有不同的数据表示结构;在关系数据库中,表示数据的基本结构就是我们所说的关系表。本章我们主要关注关系数据库,因为对象关系模型的设计仍处于起步阶段。
It is the responsibility of the database administrator (DBA) to perform this logical database design, assigning the related
data items of the database to columns of tables in a manner that preserves desirable properties. The most important test of
logical design is that the tables and attributes faithfully reflect interrelationships among objects in the real world and
that this remains true after all likely database updates in the future.
数据库管理员(DBA)有责任执行这种逻辑数据库设计,以保留所需属性的方式将数据库的相关数据项分配给表的列。逻辑设计最重要的测试是表和属性是否忠实地反映了现实世界中对象之间的相互关系,并且在未来所有可能的数据库更新之后仍然如此。
The DBA starts by studying some real-world enterprise, such as a wholesale order business, a company personnel office, or
a college registration department, whose operation needs to be supported on a computerized database system. Often working
with someone who has great expertise about the details of the enterprise, the DBA comes up with a list of data items and underlying
data objects that must be kept track of (in college student registration, this list might include student_names, courses, course_sections, class_rooms, class_periods, etc.), together with a number of rules, or constraints, concerning the interrelatedness of these data items. Typical rules for student registration are the following:
DBA首先研究一些现实世界的企业,例如批发订单业务、公司人事办公室或大学注册部门,其运作需要计算机化数据库系统的支持。 DBA 经常与对企业细节具有丰富专业知识的人合作,提出必须跟踪的数据项和底层数据对象的列表(在大学生注册中,此列表可能包括 student_names , courses , course_sections , class_rooms , class_periods 等),以及有关这些数据项的相互关联性的许多规则或约束。学生注册的典型规则如下:
- Every registered student has a unique student ID number (which we name sid).
每个注册的学生都有一个唯一的学生 ID 号(我们将其命名为 sid )。 - A student can be registered for at most one course section for a given class period.
学生在给定的课程期间最多可以注册一个课程部分。 - A classroom can house at most one course section for a given class period.
在给定的上课时间内,一间教室最多可以容纳一个课程部分。 - And so on. 等等。
From these data items and constraints, the DBA is expected to perform the logical design of the database. Two common techniques
covered in this chapter are used to perform the task of database design. The first is known as the entity–relationship approach (or ER approach), and the second is the normalization approach. The ER approach attempts to provide a taxonomy of data items to allow a DBA to intuitively recognize different
types of data classification objects (entities, weak entities, attributes, relationships, etc.) to classify the listed data
items and their relationships. After creating an ER diagram that illustrates these objects, a relatively straightforward procedure
allows the DBA to translate the design into relational tables and integrity constraints in the database system. The normalization
approach seems entirely different, and perhaps less dependent on intuition: all the data items are listed, and then all interrelatedness
rules (of a recognized kind, known as dependencies) are identified. Design starts with the assumption that all data items are placed in a single huge table and then proceeds
to break down the table into smaller tables. In the resulting set of tables, joins are needed to retrieve the original relationships.
Both the ER modeling approach and the normalization approach are best applied by a DBA with a developed intuition about data
relationships in the real world and about the way those relationships are ultimately modeled as relational tables. The two
approaches tend to lead to identical relational table designs and in fact reinforce one another in providing the needed intuition.
We will not attempt to discriminate between the two in terms of which is more applicable.
DBA 需要根据这些数据项和约束来执行数据库的逻辑设计。本章介绍的两种常见技术用于执行数据库设计任务。第一个称为实体关系方法(或ER方法),第二个是标准化方法。 ER 方法试图提供数据项的分类法,使 DBA 能够直观地识别不同类型的数据分类对象(实体、弱实体、属性、关系等),从而对列出的数据项及其关系进行分类。创建说明这些对象的 ER 图后,DBA 可以通过一个相对简单的过程将设计转换为数据库系统中的关系表和完整性约束。规范化方法似乎完全不同,并且可能不太依赖直觉:列出所有数据项,然后识别所有相互关联规则(属于公认的类型,称为依赖关系)。设计首先假设所有数据项都放置在一个巨大的表中,然后继续将该表分解为较小的表。在生成的表集中,需要连接来检索原始关系。 ER 建模方法和规范化方法最好由 DBA 来应用,该 DBA 对现实世界中的数据关系以及这些关系最终建模为关系表的方式有深入的直觉。这两种方法往往会导致相同的关系表设计,并且实际上在提供所需的直觉方面相互加强。我们不会试图区分两者谁更适用。
One of the major features of logical database design is the emphasis it places on rules of interrelationships between data
items. The naive user often sees a relational table as made up of a set of descriptive columns, one column much like another.
But this is far from accurate, because there are rules that limit possible relationships between values in the columns. For
example, a customers table, conceived as a relation, is a subset of the Cartesian product of four domains, CP = CID × CNAME × CITY × DISCNT. However, in any legal customers table, two rows with the same customer ID (cid) value cannot exist because cid is a unique identifier for a customers row. Here is a perfect example of the kind of rule we wish to take into account in our logical database design. A faithful
table representation enforces such a requirement by specifying that the cid column is a candidate key or the primary key for the customers table. A candidate key is a designated set of columns in a table such that two table rows can never be alike in all these
column values, and where no smaller subset of the key columns has this property. A primary key is a candidate key that has
been chosen by the DBA for external reference from other tables to unique rows in the table.
逻辑数据库设计的主要特征之一是强调数据项之间相互关系的规则。天真的用户经常将关系表视为由一组描述性列组成,其中一列与另一列非常相似。但这远不准确,因为有一些规则限制了列中值之间可能的关系。例如,一个 customers 表被视为一种关系,是四个域的笛卡尔积的子集, CP = CID × CNAME × CITY × DISCNT 。然而,在任何法律 customers 表中,两行具有相同的客户 ID ( cid ) 值不能存在,因为 cid 是一个唯一的标识符 customers 排。这是我们希望在逻辑数据库设计中考虑的规则类型的完美示例。忠实的表表示通过指定以下内容来强制执行这样的要求: cid 列是候选键或主键 customers 桌子。候选键是表中指定的一组列,使得两个表行在所有这些列值中永远不会相似,并且键列的较小子集不具有此属性。主键是 DBA 选择的候选键,用于从其他表外部引用表中的唯一行。
A faithful representation in a computerized database table of a candidate key or a primary key is provided when the table
is created with the SQL Create Table statement (see the syntax given in the declaration in Figure 2.1).
当使用 SQL Create Table 语句创建表时,会在计算机化数据库表中提供候选键或主键的忠实表示(请参见图 2.1中声明中给出的语法)。
FIGURE 2.1. SQL declaration of customers table with primary key cid and candidate key ssn.
图 2.1。 SQL 声明 customers 带主键的表 cid 和候选键 ssn 。
The fact that the ssn column is declared as not null unique in a Create Table statement simply means that in any permitted customers content, two rows cannot have the same ssn value, and thus it is a candidate key. When cid is declared as a primary key in the Create Table statement, this is a more far-reaching statement, making cid the identifier of customers rows that might be used by other tables. Following either of the table definitions of 2.1, a later SQL Insert or Update statement
that would duplicate a cid value or ssn value on two rows of the customers table is illegal and has no effect. Thus, a faithful representation of the table key is maintained by the database system.
事实是 ssn 列在 Create Table 语句中被声明为非空唯一,仅意味着在任何允许的情况下 customers 内容,两行不能相同 ssn 值,因此它是候选键。什么时候 cid 在Create Table语句中被声明为主键,这是一个影响更深远的语句,使得 cid 的标识符 customers 可能被其他表使用的行。遵循 2.1 的任一表定义,后面的 SQL Insert 或 Update 语句将复制 cid 值或 ssn 两行的值 customers 表是非法的并且没有任何作用。因此,数据库系统维护了表键的忠实表示。
Also a number of other clauses of the Create Table statement serve a comparable purpose of limiting possible table content,
and we refer to these as integrity constraints for a table. The interrelationships between columns in relational tables must be understood at a reasonably deep level in
order to properly appreciate some constraints. Although not all concepts of logical design can be faithfully represented in
the SQL of today, SQL is moving in the direction of modeling more and more such concepts. In any event, many of the ideas
of logical design can be useful as an aid to systematic database definition even in the absence of direct system support.
此外,Create Table 语句的许多其他子句也具有限制可能的表内容的类似目的,我们将这些称为表的完整性约束。必须相当深入地理解关系表中列之间的相互关系,以便正确理解某些约束。尽管并不是所有的逻辑设计概念都能在今天的 SQL 中忠实地表示,但 SQL 正在朝着对越来越多的此类概念进行建模的方向发展。无论如何,即使在没有直接系统支持的情况下,逻辑设计的许多思想也可以作为系统数据库定义的辅助手段。
In the following sections, we first introduce a number of definitions of the ER model. The process of normalization is introduced
after some ER intuition has been developed.
在下面的章节中,我们首先介绍 ER 模型的一些定义。在发展了一些 ER 直觉之后,引入了标准化过程。
2.1. Introduction to ER Concepts
2.1. ER 概念简介
The ER approach attempts to define a number of data classification objects; the database designer is then expected to classify
data items by intuitive recognition as belonging in some known classification. Three fundamental data classification objects
introduced in this section are entities, attributes, and relationships.
ER方法试图定义许多数据分类对象;然后,数据库设计者需要通过直观识别将数据项分类为属于某些已知的分类。本节介绍的三个基本数据分类对象是实体、属性和关系。
2.1.1. Entities, Attributes, and Simple ER Diagrams
2.1.1.实体、属性和简单 ER 图
We begin with a definition of the concept of entity.
我们首先定义实体概念。
Definition:Entity. 定义:实体。
An entity is a collection of distinguishable real-world objects with common properties.
实体是具有共同属性的可区分的现实世界对象的集合。
For example, in a college registration database we might have the following entities: Students, Instructors, Class_rooms, Courses, Course_sections, Class_periods, and so on. (Note that entity names are capitalized.) Clearly the set of classrooms in a college fits our definition of an
entity: individual classrooms in the entity Class_rooms are distinguishable (by location—i.e., room number) and have other common properties such as seating capacity (not common
values, but a common property). Class_periods is a somewhat surprising entity—is “MWF from 2:00 to 3:00 pm” a real-world object? However, the test here is that the registration process deals with these class periods as if they were
objects, assigning class periods in student schedules in the same sense that rooms are assigned.
例如,在大学注册数据库中,我们可能有以下实体: Students , Instructors , Class_rooms , Courses , Course_sections , Class_periods , 等等。 (请注意,实体名称大写。)显然,大学中的教室集合符合我们对实体的定义:实体中的各个教室 Class_rooms 是可区分的(通过位置,即房间号),并且具有其他共同属性,例如座位容量(不是共同值,而是共同属性)。 Class_periods 是一个有点令人惊讶的实体——“MWF from 2:00 to 3:00 pm ”是一个真实世界的对象吗?然而,这里的测试是,注册过程将这些课程时间视为对象,在学生时间表中分配课程时间,就像分配房间一样。
To give examples of entities that we have worked with a good deal in the CAP database, we have Customers, Agents, and Products. (Orders is also an entity, but there is some possibility for confusion in this, and we discuss it a bit later.) There is a foreshadowing
here of entities being mapped to relational tables. An entity such as Customers is usually mapped to an actual table, and each row of the table corresponds to one of the distinguishable real-world objects
that make up the entity, called an entity instance, or sometimes an entity occurrence.
为了给出我们在 CAP 数据库中大量合作的实体的例子,我们有 Customers , Agents , 和 Products 。 ( Orders 也是一个实体,但是这可能会造成混淆,我们稍后再讨论。)这里有一个实体被映射到关系表的伏笔。一个实体,例如 Customers 通常映射到一个实际的表,表的每一行对应于构成实体的可区分的现实世界对象之一,称为实体实例,有时也称为实体出现。
Note that we do not yet have a name for the properties by which we tell one entity occurrence from another, the analog to
column values to distinguish rows in a relational table. For now we simply refer to entity instances as being distinguishable,
in the same sense that we would think of the classrooms in a college as being distinguishable, without needing to understand
the room-labeling scheme used. In what follows we always write an entity name with an initial capital letter, but the name
becomes all lowercase when the entity is mapped to a relational table in SQL.
请注意,我们还没有一个属性名称,可以用来区分一个实体的出现与另一个实体的出现,类似于用于区分关系表中的行的列值。现在,我们简单地将实体实例称为可区分的,就像我们认为大学的教室是可区分的一样,无需了解所使用的房间标记方案。在下文中,我们总是以大写字母开头的实体名称,但是当实体映射到 SQL 中的关系表时,名称将变为全小写。
We have chosen an unusual notation by assigning plural entity names: Students, Instructors, Class_rooms, and so forth. More standard would be entities named Student, Instructor, and Class_room. Our plural usage is chosen to emphasize the fact that each represents a set of real-world objects, usually containing multiple elements, and carries over to our plural table names (also somewhat unusual),
which normally contain multiple rows. Entities are represented by rectangles in ER diagrams, as you can see by looking at
Figure 2.2.
我们通过指定复数实体名称来选择一种不寻常的表示法: Students , Instructors , Class_rooms ,等等。更标准的是实体命名 Student , Instructor , 和 Class_room 。我们选择复数用法是为了强调这样一个事实:每个对象代表一组现实世界的对象,通常包含多个元素,并延续到我们的复数表名称(也有点不寻常),它通常包含多行。实体在 ER 图中用矩形表示,如图2.2所示。
FIGURE 2.2. Example of ER diagrams with entities and attributes.
图 2.2。带有实体和属性的 ER 图示例。
Note that some other authors use the terminology entity set or entity type in referring to what we call an entity. Then to these authors, an entity is what we would refer to as an entity instance. We have also noticed occasional ambiguity within a specific author's writing, sometimes referring to an entity set and sometimes
to an entity; we assume that the object that is represented by a rectangle in an ER diagram is an entity, a collection of
real-world objects, and authors who identify such rectangles in the same way agree with our definition. It is unfortunate
that such ambiguity exists, but our notation will be consistent in what follows.
请注意,其他一些作者使用术语实体集或实体类型来指代我们所说的实体。对于这些作者来说,实体就是我们所说的实体实例。我们还注意到特定作者的写作中偶尔会出现歧义,有时指实体集,有时指实体;我们假设 ER 图中的矩形表示的对象是一个实体,是现实世界对象的集合,并且以相同方式识别此类矩形的作者同意我们的定义。不幸的是存在这种歧义,但我们的符号在下文中将保持一致。
In mathematical discussion, for purposes of definition, we usually represent an entity by a single capital letter, possibly
subscripted where several exist (e.g., E, E1, E2, etc.). An entity E is made up of a set of real-world objects, which we represent by subscripted lowercase letters: E = {e1, e2, … , en}. As mentioned above, each distinct representative ei of an entity E is called an entity instance or an entity occurrence.
在数学讨论中,出于定义的目的,我们通常用单个大写字母表示实体,可能在存在多个大写字母的情况下加上下标(例如,E、E 1 、E 2等)。实体 E 由一组现实世界的对象组成,我们用带下标的小写字母表示:E = {e 1 , e 2 , … , en }。如上所述,实体E的每个不同代表e i被称为实体实例或实体出现。
Definition:Attribute. 定义:属性。
An attribute is a data item that describes a property of an entity or a relationship (defined below).
属性是描述实体或关系(定义如下)的属性的数据项。
Recall from the definition of entity that all entity occurrences belonging to a given entity have common properties. In the ER model, these properties are known
as attributes. As we will see, there is no confusion in terminology between an attribute in the ER model and an attribute or column name
in the relational model, because when the ER design is translated into relational terms, the two correspond. A particular
instance of an entity is said to have attribute values for all attributes describing the entity (a null value is possible).
The reader should keep in mind that while we list distinct entity occurrences {e1, e2, … , en} of the entity E, we can't actually tell the occurrences apart without reference to attribute values.
回想一下实体的定义,属于给定实体的所有实体出现都具有公共属性。在 ER 模型中,这些属性称为属性。正如我们将看到的,ER 模型中的属性与关系模型中的属性或列名之间的术语不会混淆,因为当 ER 设计转换为关系术语时,两者是对应的。实体的特定实例被认为具有描述该实体的所有属性的属性值(可能为空值)。读者应该记住,虽然我们列出了实体 E 的不同实体出现次数 {e 1 , e 2 , … , en } ,但如果不参考属性值,我们实际上无法区分这些出现次数。
Each entity has an identifier, an attribute, or set of attributes that takes on unique values for each entity instance; this is the analog of the relational
concept of candidate key. For example, we define an identifier for the Customers entity to be the customer identifier, cid. There might be more than one identifier for a given entity, and when the DBA identifies a single key attribute to be the
universal method of identification for entity occurrences throughout the database, this is called a primary identifier for the entity. Other attributes, such as city for Customers, are not identifiers but descriptive attributes, known as descriptors. Most attributes take on simple values from a domain, as we have seen in the relational model, but a composite attribute is a group of simple attributes that together describe a property. For example, the attribute student_names for the Students entity might be composed of the simple attributes lname, fname, and midinitial. Note that an identifier for an entity is allowed to contain an attribute of composite type. Finally, we define a multivalued attribute to be one that can take on multiple values for a single entity instance. For example, the Employees entity might have an attached multivalued attribute named hobbies, which takes on multiple values provided by the employee asked to list any hobbies or interests. One employee might have
several hobbies, so this is a multivalued attribute.
每个实体都有一个标识符、一个属性或一组属性,它们对每个实体实例具有唯一的值;这类似于候选键的关系概念。例如,我们定义一个标识符 Customers 实体作为客户标识符, cid 。给定实体可能有多个标识符,并且当 DBA 将单个关键属性标识为识别整个数据库中实体出现的通用方法时,这称为该实体的主要标识符。其他属性,例如 city 为了 Customers 、 不是标识符,而是描述性属性,称为描述符。正如我们在关系模型中所看到的,大多数属性都采用域中的简单值,但复合属性是一组共同描述属性的简单属性。例如,属性 student_names 为 Students 实体可能由简单的属性组成 lname , fname , 和 midinitial 。请注意,实体的标识符允许包含复合类型的属性。最后,我们将多值属性定义为可以为单个实体实例采用多个值的属性。例如, Employees 实体可能有一个附加的多值属性,名为 hobbies ,它采用要求列出任何爱好或兴趣的员工提供的多个值。一名员工可能有多种爱好,因此这是一个多值属性。
As mentioned earlier, ER diagrams represent entities as rectangles. Figure 2.2 shows two simple ER diagrams. Simple, single-valued attributes are represented by ovals, attached by a straight line to the
entity. A composite attribute is also in an oval attached directly to the entity, while the simple attributes that make up
the composite are attached to the composite oval. A multivalued attribute is attached by a double line, rather than a single
line, to the entity it describes. The primary identifier attribute is underlined.
如前所述,ER 图将实体表示为矩形。图 2.2显示了两个简单的 ER 图。简单的单值属性由椭圆形表示,并通过直线连接到实体。复合属性也在直接附加到实体的椭圆形中,而构成复合的简单属性则附加到复合椭圆形。多值属性通过双线而不是单线附加到它所描述的实体。主标识符属性带有下划线。
2.1.2. Transforming Entities and Attributes to Relations
2.1.2.将实体和属性转换为关系
Our ultimate aim is to transform the ER design into a set of definitions for relational tables in a computerized database,
which we do through a set of transformation rules.
我们的最终目标是将 ER 设计转换为计算机化数据库中关系表的一组定义,这是通过一组转换规则来实现的。
- Transformation Rule 1. Each entity in an ER diagram is mapped to a single table in a relational database; the table is named after the entity. The
table's columns represent all the single-valued simple attributes attached to the entity (possibly through a composite attribute,
although a composite attribute itself does not become a column of the table). An identifier for an entity is mapped to a candidate
key for the table, as illustrated in Example 2.1, and a primary identifier is mapped to a primary key. Note that the primary identifier of an entity might be a composite
attribute, which therefore translates to a set of attributes in the relational table mapping. Entity occurrences are mapped
to the table's rows.▪
转换规则1. ER图中的每个实体都映射到关系数据库中的单个表;该表以实体命名。表的列表示附加到实体的所有单值简单属性(可能通过复合属性,尽管复合属性本身不会成为表的列)。实体的标识符映射到表的候选键,如示例 2.1所示,主标识符映射到主键。请注意,实体的主要标识符可能是复合属性,因此它会转换为关系表映射中的一组属性。实体出现映射到表的行。▪
EXAMPLE 2.1 例2.1
Here are the two tables, with one example row filled in, mapped from the Students and Employees entities in the ER diagrams of Figure 2.2. The primary key is underlined.
以下是两张表,其中填充了一个示例行,映射自 Students 和 Employees 图 2.2 ER 图中的实体。主键带有下划线。Table students 桌子 students
sid 席德 lname fname Midinitial 1134 Smith 史密斯 John 约翰 L. … … … … Table employees 桌子 employees
eid 开斋节 staddress city state zipcode 197 7 Beacon St 7 灯塔街 Boston 波士顿 MA 02122 … … … … …
- Transformation Rule 2. Given an entity E with primary identifier p, a multivalued attributed attached to E in an ER diagram is mapped to a table of its own; the table is named after the plural
multivalued attribute. The columns of this new table are named after p and a (either p or a might consist of several attributes), and rows of the table correspond to (p, a) value pairs, representing all pairings of attribute values of a associated with entity occurrences in E. The primary key attribute for this table is the set of columns in p and a.▪
转换规则 2。给定一个具有主标识符p的实体 E,ER 图中附加到 E 的多值属性将映射到其自己的表;该表以复数多值属性命名。这个新表的列以p和a命名( p或a可能包含多个属性),表的行对应于 ( p, a ) 值对,表示与实体关联的a的属性值的所有配对该表的主键属性是p和a中的列集。▪
EXAMPLE 2.2 例2.2
Here is an example database of two tables reflecting the ER diagram for the Employees entity and the attached multivalued attribute, hobbies, of Figure 2.2.
这是一个包含两个表的示例数据库,反映了 ER 图 Employees 实体和附加的多值属性, hobbies ,如图 2.2所示。Table employees 桌子 employees
eid 开斋节 staddress city state zipcode 197 7 Beacon St 7 灯塔街 Boston 波士顿 MA 02102 221 19 Brighton St 布莱顿街 19 号 Boston 波士顿 MA 02103 303 153 Mass Ave 大众大道153号 Cambridge 剑桥 MA 02123 … … … … … Table hobbies 桌子 hobbies
eid 开斋节 hobby 爱好 197 chess 棋 197 painting 绘画 197 science fiction 科幻小说 221 reading 阅读 303 bicycling 骑自行车 303 mysteries 谜团 … …
Definition:Relationship. 定义:关系。
Given an ordered list of m entities, E1, E2, … , Em (where the same entity may occur more than once in the list), a relationship R defines a rule of correspondence between the instances of these entities. Specifically, R represents a set of m-tuples, a subset of the Cartesian product of entity instances E1 × E2 × … × Em.
给定 m 个实体的有序列表 E 1 , E 2 , … , E m (其中同一实体可能在列表中出现多次),关系 R 定义这些实体的实例之间的对应规则。具体来说,R 表示 m 元组的集合,即实体实例 E 1 × E 2 × … × E m的笛卡尔积的子集。
2.1.3. Relationships among Entities
2.1.3.实体之间的关系
A particular occurrence of a relationship, corresponding to a tuple of entity occurrences (e1, e2, … , en), where ei is an instance of Ei in the ordered list of the definition, is called a relationship occurrence or relationship instance. The number of entities m in the defining list is called the degree of the relationship. A relationship between two entities is known as a binary relationship. For example, we define teaches to be a binary relationship between Instructors and Course_sections. We indicate that a relationship instance exists by saying that a particular instructor teaches a specific course section.
Another example of a relationship is works_on, defined to relate the two entities Employees and Projects in a large company: Employees works_on Projects.
关系的特定出现,对应于实体出现的元组 (e 1 , e 2 , … , e n ),其中 e i是定义的有序列表中 E i的实例,称为关系出现或关系实例。定义列表中实体的数量m称为关系度。两个实体之间的关系称为二元关系。例如,我们定义 teaches 之间存在二元关系 Instructors 和 Course_sections 。我们通过说特定教师教授特定课程部分来表明存在关系实例。关系的另一个例子是 works_on ,定义为关联两个实体 Employees 和 Projects 在一家大公司: Employees works_on Projects 。
A relationship can also have attached attributes. The relationship works_on might have the attribute percent, indicating the percent of work time during each week that the employee is assigned to work on each specific project (see
Figure 2.3). Note that this percent attribute attached to the works_on relationship would be multivalued if attached to either entity Employees or Projects; the percent attribute is only meaningful in describing a specific employee–project pair, and it is therefore a natural attribute of the
binary relationship works_on.
关系还可以具有附加属性。关系 works_on 可能有属性 percent ,表示每周分配给员工处理每个特定项目的工作时间百分比(见图2.3 )。请注意,这 percent 附加到的属性 works_on 如果附加到任一实体,关系将是多值的 Employees 或者 Projects ;这 percent 属性仅在描述特定的员工-项目对时才有意义,因此它是二元关系的自然属性 works_on 。
FIGURE 2.3. Examples of ER diagrams with relationships.
图 2.3。具有关系的 ER 图示例。
A binary relationship that relates an entity to itself (a subset of E1 × E1) is called a ring, or sometimes a recursive relationship. For example, the Employees entity is related to itself through the relationship manages, where we say that one employee manages another. Relationships are represented by diamonds in an ER diagram, with connecting
lines to the entities they relate. In the case of a ring, the connecting lines are often labeled with the names of the roles
played by the entity instances involved. In Figure 2.3 the two named roles are manager_of and reports_to.
将实体与其自身(E 1 × E 1的子集)相关的二元关系称为环,有时也称为递归关系。例如, Employees 实体通过关系与其自身相关 manages ,我们说一名员工管理另一名员工。关系在 ER 图中用菱形表示,并用连接线连接到它们相关的实体。在环的情况下,连接线通常标有所涉及的实体实例所扮演的角色的名称。在图 2.3中,两个命名角色是 manager_of 和 reports_to 。
Note that we often leave out attributes in an ER diagram to concentrate on relationships between entities without losing our
concentration in excessive detail.
请注意,我们经常在 ER 图中省略属性,以专注于实体之间的关系,同时又不会失去对过多细节的关注。
EXAMPLE 2.3 例2.3
The orders Table in CAP Does Not Represent a Relationship
这 orders CAP 中的表不代表关系Per the relationship definition, the orders table in the CAP database is not a relationship between Customers, Agents, and Products. This is because (cid, aid, pid) triples in the rows of the orders table do not identify a subset of the Cartesian product, Customers × Agents × Products, as required. Instead, some triples of (cid, aid, pid) values occur more than once, and no doubt clearly the designer's intention, since the same customer can order the same product from the same agent on two different occasions. Instead of a relationship, the orders table represents an entity in its own right, with identifier attribute ordno. This makes a good deal of sense, since we might commonly have reason to look up a row in the orders table for reasons unconnected to relating entity occurrences in Customers, Agents, and Products. For example, on request, we might need to check that a past order has been properly billed and shipped. Thus, the entity Orders occurrences are dealt with individually as objects in their own right.
根据关系定义, orders CAP数据库中的表之间没有关系 Customers , Agents , 和 Products 。这是因为 (cid, aid, pid) 的行中的三元组 orders 表不识别笛卡尔积的子集, Customers × Agents × Products ,根据需要。相反,一些三倍 (cid, aid, pid) 值出现不止一次,这无疑清楚地表明了设计师的意图,因为同一个客户可以在两个不同的场合从同一个代理商处订购相同的产品。而不是一种关系, orders 表本身代表一个实体,具有标识符属性 ordno 。这很有意义,因为我们通常可能有理由在 orders 表中的原因与相关实体的出现无关 Customers , Agents , 和 Products 。例如,根据要求,我们可能需要检查过去的订单是否已正确计费和发货。因此,实体 Orders 事件本身作为对象单独处理。
Although the orders table doesn't correspond directly to a relationship, it is clear that there are any number of possible relationships we could
define in terms of the orders table between the Customers, Agents, and Products entities.
虽然 orders 表并不直接对应于关系,很明显,我们可以根据以下关系定义任意数量的可能关系: orders 表之间的 Customers , Agents , 和 Products 实体。
EXAMPLE 2.4 例2.4
Assume that we are performing a study in which we commonly need to know total sales aggregated (summed) from the orders table by customers, agents, and products for the current year. We might do this, for example, to study sales volume relationships between agents and customers, as well as between customers and products, and how those relationships are affected by geographic factors (city values). However, as we begin to plan this application, we decide that it is too inefficient to always perform sums on the orders table to access the basic measures of our study, so we decide to create a new table called yearlies. We define this new table with the following SQL commands:
假设我们正在进行一项研究,其中我们通常需要了解从 orders 表格依据 customers , agents , 和 products 今年。例如,我们可能会这样做来研究之间的销量关系 agents 和 customers ,以及之间 customers 和 products ,以及这些关系如何受到地理因素的影响( city 值)。然而,当我们开始规划这个应用程序时,我们认为总是对 orders 表来访问我们研究的基本指标,因此我们决定创建一个名为 yearlies 。我们使用以下 SQL 命令定义这个新表:
- create table yearlies (cid char(4). aid char(3). pid char(3).
- totqty integer, totdoll float);
- insert into yearlies
- select cid, aid, pid, sum(qty), sum(dollars) from orders
- group by cid, aid, pid;
Once we have the new yearlies table, the totals can be kept up to date by application logic: As each new order is entered, the relevant yearlies row should be updated as well. Now the yearlies table is a relationship, since the (cid, aid, pid) triples in the rows of the table identify a subset of the Cartesian product, Customers × Agents × Products; that is to say, there are now no repeated triples in the yearlies table. Since these triples are unique, (cid, aid, pid) forms the primary key for the yearlies table.
一旦我们有了新的 yearlies 表中,总计可以通过应用程序逻辑保持最新:输入每个新订单时,相关的 yearlies 行也应该更新。现在的 yearlies 表是一种关系,因为 (cid, aid, pid) 表的行中的三元组标识笛卡尔积的子集, Customers × Agents × Products ;也就是说,现在已经没有重复的三元组了 yearlies 桌子。由于这些三元组是唯一的, (cid, aid, pid) 形成主键 yearlies 桌子。
A relationship on more than two entities is called an n-ary relationship. The yearlies relationship on three distinct entities is also known as a ternary relationship. An n-ary relationship with n > 2 can often be replaced by a number of distinct binary relationships in an ER diagram, and this is a good idea if the replacement expresses true binary relationships
for the system. Binary relationships are the ones that are familiar to most practitioners and are sufficient for almost all
applications. However, in some cases, a ternary relationship cannot be decomposed into expressive binary relationships. The
yearlies relationship of Example 2.4 expresses customer-agent-product ordering patterns over a year, a ternary relationship that cannot be decomposed (exactly)
into binary relationships. In converting an ER design to a relational one, a relationship is sometimes translated into a relational
table, and sometimes not. (We will have more to say about this in the next section.) For example, the yearlies relationship (a ternary relationship) is translated into a relational table named yearlies. However, the manages relationship between Employees and Employees, shown in Figure 2.3, does not translate into a table of its own. Instead, this relationship is usually translated into a column in employees
identifying the mgrid to whom the employee reports. This table is shown again in Figure 2.4.
两个以上实体上的关系称为n 元关系。这 yearlies 三个不同实体上的关系也称为三元关系。具有n > 2 的n元关系通常可以被 ER 图中的许多不同的二元关系替换,如果替换表达了系统的真实二元关系,那么这是一个好主意。二元关系是大多数从业者所熟悉的关系,并且足以满足几乎所有应用程序。然而,在某些情况下,三元关系不能分解为可表达的二元关系。这 yearlies 示例 2.4的关系表达了一年内的客户-代理-产品订购模式,这是一种不能(精确地)分解为二元关系的三元关系。在将 ER 设计转换为关系设计时,关系有时会转换为关系表,有时则不会。 (我们将在下一节中对此进行更多讨论。)例如, yearlies 关系(三元关系)被转换为名为的关系表 yearlies 。然而,管理之间的关系 Employees 和 Employees ,如图 2.3所示,不会转换为它自己的表。相反,这种关系通常被转化为员工中的一列,用于标识 mgrid 员工向谁报告。该表再次如图 2.4所示。
FIGURE 2.4. A table representing an entity, Employees, and a ring (recursive relationship), manages.
图 2.4。代表实体的表, Employees ,和一个环(递归关系), manages 。
Note the surprising fact that mgrid is not considered an attribute of the Employees entity, although it exists as a column in the employees table. The mgrid column is what is known as a foreign key in the relational model, and it corresponds to the actual manages relationship in the ER diagram of Figure 2.3. We deal more with this in the next section, after we have had an opportunity to consider some of the properties of relationships.
To summarize this section, Figure 2.5(a) and (b) lists the concepts introduced up to now.
请注意一个令人惊讶的事实: mgrid 不被视为属性 Employees 实体,尽管它作为列存在 employees 桌子。这 mgrid 列就是关系模型中的外键,它对应于实际的 manages 关系如图2.3的ER图。在我们有机会考虑关系的一些属性之后,我们将在下一节中详细讨论这个问题。为了总结本节,图 2.5(a) 和 (b)列出了到目前为止介绍的概念。
FIGURE 2.5. Basic ER concepts: (a) entities and attributes, and (b) relationships.
图 2.5。基本 ER 概念:(a) 实体和属性,以及 (b) 关系。
2.2. Further Details of ER Modeling
2.2. ER 建模的更多细节
Now that we’ve defined some fundamental means of classification, let's discuss properties of relationships in the ER method
of database design.
现在我们已经定义了一些基本的分类方法,接下来我们来讨论数据库设计的 ER 方法中关系的属性。
2.2.1. Cardinality of Entity Participation in a Relationship
2.2.1.实体参与关系的基数
Figure 2.6 illustrates the concepts of minimum and maximum cardinality with which an entity participates in a relationship. Figure 2.6(a), (b), and (c) represent entities E and F on the left and right, respectively, by two sets; elements of the two sets are connected by a
line exactly when a relationship R relates the two entity occurrences represented. Thus, the connecting lines themselves represent
instances of the relation R. Note that the diagrams of Figure 2.6 are not what we refer to as ER diagrams.
图 2.6说明了实体参与关系的最小和最大基数的概念。图2.6(a)、(b)、(c)分别代表左右两个集合的实体E和F;当关系 R 关联所表示的两个实体出现时,这两个集合的元素就通过一条线连接起来。因此,连接线本身代表关系 R 的实例。请注意,图 2.6中的图并不是我们所说的 ER 图。
FIGURE 2.6. Examples of relationships R between two entities E and F.
图 2.6。两个实体 E 和 F 之间关系 R 的示例。
The minimum cardinality with which an entity takes part in a relationship is the minimum number of lines that the DBA allows
to be connected to each entity instance. Note that the diagrams of Figure 2.6 would normally only give examples of relationships at a given moment, and the line connections might change, just as the row content of a table can change, until some entity instances have different numbers of lines connected. On the other
hand, the minimum and maximum cardinality properties of an entity are meant to represent rules laid down by the DBA for all
time, rules that cannot be broken by normal database changes affecting the relationship. In Figure 2.6(a), the DBA clearly permits both entity sets E and F to take part in relationship R with minimum cardinality 0; that is to say,
the DBA does not require a connecting line for each entity instance, since some elements of both sets have no lines connected to them. We symbolize
this by writing min-card(E, R) = 0 and min-card(F, R) = 0. The maximum cardinality with which E and F take part in R is not
obvious from Figure 2.6(a), however. No entity instance has more than one line connected to it, but from an example as of a given moment we have no
guarantee that the line connections won't change in the future so that some entity instances will have more than one line
connected. However, we will assume for purposes of simple explanation that the diagrams of this figure are meant to represent
exactly the cardinalities intended by the DBA. Thus, since no entity instance of E and F in Figure 2.6(a) has more than one incident connecting line, we record this fact using the notation max-card(E, R) = 1 and max-card(F, R)
= 1.
实体参与关系的最小基数是 DBA 允许连接到每个实体实例的最小行数。请注意,图 2.6的图通常仅给出给定时刻的关系示例,并且线路连接可能会发生变化,就像表的行内容可能会发生变化一样,直到某些实体实例具有不同数量的连接线路。另一方面,实体的最小和最大基数属性旨在表示 DBA 始终制定的规则,这些规则不会被影响关系的正常数据库更改所破坏。在图2.6(a)中,DBA明确允许实体集E和F都参与最小基数为0的关系R;也就是说,DBA不需要每个实体实例都有一条连接线,因为两个集合的某些元素都没有连接到它们的线。我们通过写 min-card(E, R) = 0 和 min-card(F, R) = 0 来表示这一点。然而,从图 2.6(a)中,E 和 F 参与 R 的最大基数并不明显。 。没有实体实例有超过一根线连接到它,但从给定时刻的示例来看,我们不能保证线连接将来不会改变,因此某些实体实例将有不止一根线连接。然而,出于简单解释的目的,我们假设该图的图表旨在准确地表示 DBA 想要的基数。因此,由于图 2.6(a)中 E 和 F 的实体实例都没有多于一条事件连接线,因此我们使用 max-card(E, R) = 1 和 max-card(F, R) 记号来记录这一事实= 1。
In Figure 2.6(b), assuming once again that this set of lines is representative of the designer's intention, we can write min-card(E, R) =
0, since not every element of E is connected to a line, but min-card(F, R) = 1, since at least one line is connected to every
element of F, and our assumption implies that this won't change. We also write max-card(E, R) = N, where N means “more than
one”; this means that the designer does not intend to limit to one the number of lines connected to each entity instance of
E. However, we write max-card(F, R) = 1, since every element of F has exactly one line leaving it. Note that the two meaningful
values for min-card are 0 and 1 (where 0 is not really a limitation at all, but 1 stands for the constraint “at least one”), and the two meaningful values for max-card are 1 and N (N is not really a limitation, but
1 represents the constraint “no more than one”). We don't try to differentiate numbers other than 0, 1, and many. Since max-card(E,
R) = N, there are multiple entity instances of F connected to one of E by the relationship. For this reason, F is called the
“many” side and E is called the “one” side in this many-to-one relationship.
在图 2.6(b)中,再次假设这组线代表了设计者的意图,我们可以写 min-card(E, R) = 0,因为并非 E 的每个元素都连接到一条线上,但 min-card(E, R) = 0 -card(F, R) = 1,因为至少有一条线连接到 F 的每个元素,并且我们的假设意味着这不会改变。我们还写 max-card(E, R) = N,其中 N 表示“不止一个”;这意味着设计者并不打算将连接到 E 的每个实体实例的线数限制为 1。但是,我们编写 max-card(F, R) = 1,因为 F 的每个元素都恰好有一条线离开它。请注意,min-card 的两个有意义的值是 0 和 1(其中 0 根本不是真正的限制,而是 1 代表“至少一个”约束),max-card 的两个有意义的值是 1 和N(N 并不是真正的限制,但 1 代表“不超过一个”的约束)。我们不会尝试区分 0、1 和许多以外的数字。由于 max-card(E, R) = N,因此 F 的多个实体实例通过关系连接到 E 之一。因此,在这种多对一关系中,F 称为“多”方,E 称为“一”方。
Note particularly that the “many” side in a many-to-one relationship is the side that has max-card value 1! In Figure 2.6(b), the entity F corresponds to the “many” side of the many-to-one relationship, even though it has min-card(F, R) = max-card(F,
R) = 1. As just explained, the “one” side of a many-to-one relationship is the side where some entity instances can participate
in multiple relationship instances, “shooting out multiple lines” to connect to many entity instances on the “many” side! Phrased this way the terminology makes sense, but this seems to be an easy idea to forget,
and forgetting it can lead to serious confusion.
特别注意,多对一关系中的“多”方是max-card 值为 1 的一方!在图 2.6(b)中,实体 F 对应于多对一关系的“多”方,尽管它的 min-card(F, R) = max-card(F, R) = 1。正如刚才所解释的,多对一关系的“一”侧是一些实体实例可以参与多个关系实例的一侧,“射出多条线”以连接到“多”侧的许多实体实例!以这种方式表述,术语是有道理的,但这似乎是一个很容易忘记的想法,并且忘记它可能会导致严重的混乱。
In Figure 2.6(c) we have min-card(E, R) = 0, min-card(F, R) = 0, max-card(E, R) = N, and max-card(F, R) = N. The meaning of the terms used
for the three diagrams—one-to-one relationship, many-to-one relationship, and many-to-many relationship—are defined later.
在图 2.6(c)中,我们有 min-card(E, R) = 0、min-card(F, R) = 0、max-card(E, R) = N 和 max-card(F, R) = N。这三个图所用术语的含义(一对一关系、多对一关系和多对多关系)稍后定义。
EXAMPLE 2.5 例2.5
In the relationship teaches of Figure 2.3, Instructors teaches Course_sections, the DBA would probably want to make a rule that each course section needs to have at least one instructor assigned to teach it by writing min-card(Course_sections, teaches) = 1. However, we need to be careful in making such a rule, since it means that we will not be able to create a new course section, enter it in the database, assign it a room and a class period, and allow students to register for it, while putting off the decision of who is going to teach it. The DBA might also make the rule that at most one instructor can be assigned to teach a course section by writing max-card(Course_sections, teaches) = 1. On the other hand, if more than one instructor were allowed to share the teaching of a course section, the DBA would write max-card(Course_sections, teaches) = N. This is clearly a significant difference. We probably don't want to make the rule that every instructor teaches some course section (written as min-card(Instructors, teaches) = 1), because an instructor might be on leave, so we settle on min-card(Instructors, teaches) = 0. And in most universities the course load per instructor is greater than one in any given term, so we would set max-card(Instructors, teaches) = N.
在关系中 teaches 图2.3 , Instructors teaches Course_sections ,DBA 可能希望制定一条规则,即每个课程部分都需要至少分配一名讲师通过编写 min-card( Course_sections , teaches ) = 1. 但是,我们在制定这样的规则时需要小心,因为这意味着我们将无法创建新的课程部分,将其输入数据库,为其分配房间和课时,并允许学生注册该课程,同时推迟决定由谁来教授该课程。 DBA 还可能制定规则,通过编写 max-card( Course_sections , teaches ) = 1。另一方面,如果允许多名讲师分享课程部分的教学,则 DBA 会编写 max-card( Course_sections , teaches ) = N。这显然是一个显着差异。我们可能不想制定每个讲师教授某些课程部分的规则(写为 min-card( Instructors , teaches ) = 1),因为教练可能休假,所以我们选择 min-card( Instructors , teaches ) = 0。在大多数大学中,每个教师的课程负担在任何给定学期都大于 1,因此我们将设置 max-card( Instructors , teaches ) = N。
Definition. 定义。
When an entity E takes part in a relationship R with min-card(E, R) = x (x is either 0 or 1) and max-card(E, R) = y (y is either 1 or N), then in the ER diagram the connecting line between E and R can be labeled with the ordered cardinality pair (x, y). We use a new notation to represent this minimum-maximum pair (x, y): card(E, R) = (x, y).
当实体 E 参与关系 R 且 min-card(E, R) = x(x 为 0 或 1)且 max-card(E, R) = y(y 为 1 或 N)时,则在 ER 图中,E 和 R 之间的连接线可以用有序基数对 (x, y) 标记。我们使用一种新的符号来表示这个最小-最大对 (x, y):card(E, R) = (x, y)。
According to the above definition and the assignments of Example 2.5, the edge connecting the entity Course_sections to the relationship teaches should be labeled with the pair (1, 1). In Figure 2.7 we repeat the ER diagrams of Figure 2.3, with the addit ion of ordered pairs (x, y) labeling line connections, to show the minimum and maximum cardinalities for
all ER pairs. The cardinality pair for the Instructors teaches Course_sections diagram follows the discussion of Example 2.5, and other diagrams are filled in with reasonable pair values. We make a number of decisions to arrive at the following rules:
Every employee must work on at least one project (but may work on many); a project might have no employees assigned during
some periods (waiting for staffing), and of course some projects will have a large number of employees working on them; an
employee who acts in the manager_of role (see discussion below) may be managing no other employees at a given time and still be called a manager; and an employee
reports to at most one manager, but may report to none (this possibility exists because there must always be a highest-level
employee in a hierarchy who has no manager).
根据上面的定义和例2.5的赋值,连接实体的边 Course_sections 对关系 teaches 应标有 (1, 1) 对。在图 2.7中,我们重复图 2.3的 ER 图,并添加有序对 (x, y) 标记线连接,以显示所有 ER 对的最小和最大基数。的基数对 Instructors teaches Course_sections 该图遵循例 2.5的讨论,其他图填充了合理的对值。我们做出一系列决定来达成以下规则: 每个员工必须至少参与一个项目(但也可以参与多个项目);一个项目可能在某些时期没有分配员工(等待人员配备),当然有些项目会有大量员工在工作;从事以下活动的雇员 manager_of 角色(参见下面的讨论)可能在给定时间不管理其他员工,但仍称为经理;一名员工最多向一名经理汇报工作,但也可能不向任何人汇报(这种可能性是存在的,因为在层级结构中必定始终存在一名没有经理的最高级别员工)。
FIGURE 2.7. An ER diagram with labels (x, y) on ER connections.
图 2.7。 ER 图,ER 连接上带有标签 (x, y)。
In the Employees-manages diagram shown in Figure 2.7, the normal notation, card(Employees, manages), would be ambiguous. We say that there are two different roles played by the Employees entity in the relationship: the manager_of role and the reports_to role. Each relationship instance in manages connects a managed employee (Employees instance in the reports_to role) to a manager employee (Employees instance in the manager_of role). We use the cardinality notation with entities having parenthesized roles to remove ambiguity.
在 Employees-manages 示意图如图2.7所示,正常表示法, card(Employees , manages) ,会产生歧义。我们说,有两种不同的角色 Employees 关系中的实体: manager_of 角色和 reports_to 角色。中的每个关系实例 manages 连接受管理员工( Employees 实例中的 reports_to 角色)到经理员工( Employees 实例中的 manager_of 角色)。我们对具有括号角色的实体使用基数表示法来消除歧义。
- card(Employees(reports_to). manages) = (0. 1)
- card(Employees(manager_of). manages) = (0. N)
And from these cardinalities we see that an employee who acts in the manager_of role may be managing no other employees at a given time and still be called a manager; and an employee reports to at most
one manager, but may report to none (because of the highest-level employee in a hierarchy who has no manager—if it weren't
for that single person, we could give the label (1, 1) to the reports_to branch of the Employees-manages edge).
从这些基数中我们可以看出,从事以下工作的员工 manager_of 角色可能在给定时间不管理其他员工,但仍称为经理;一名员工最多向一位经理汇报,但也可能不向任何人汇报(因为层次结构中最高级别的员工没有经理,如果不是那个人,我们可以给标签 (1, 1 )到 reports_to 的分支 Employees-manages 边缘)。
Definition. 定义。
When an entity E takes part in a relationship R with max-card(E, R) = 1, then E is said to have single-valued participation in the relationship R. If max-card(E, R) = N, then E is said to be multivalued in this relationship. A binary relationship R between entities E and F is said to be many-to-many, or N-N, if both entities E and F are multi-valued in the relationship. If both E and F are single-valued, the relationship is said to be one-to-one, or 1-1. If E is single-valued and F is multivalued, or the reverse, the relationship is said to be many-to-one, or N-1. (We do not normally speak of a 1-N relationship as distinct from an N-1 relationship.)
当实体 E 参与关系 R 且 max-card(E, R) = 1 时,则称 E 在关系 R 中单值参与。如果 max-card(E, R) = N,则E 在这种关系中被认为是多值的。如果实体 E 和 F 在关系中都是多值的,则实体 E 和 F 之间的二元关系 R 被称为多对多(即 NN)。如果 E 和 F 都是单值,则该关系称为一对一或 1-1。如果 E 是单值且 F 是多值(反之亦然),则该关系称为多对一或 N-1。 (我们通常不会将 1-N 关系与 N-1 关系区分开来。)
2.2.2. One-to-One, Many-to-Many, and Many-to-One Relationships
2.2.2.一对一、多对多和多对一关系
Recall that the “many” side in a many-to-one relationship is the side that has single-valued participation. This might be
better understood by considering the relationship in Figure 2.7, Instructors teaches Course_sections, where card(Course_sections, teaches) = (1, 1), and the Course_sections entity represents the “many” side of the relationship. This is because one instructor teaches “many” course sections, while
the reverse is not true.
回想一下,多对一关系中的“多”方是具有单值参与的一方。通过考虑图 2.7中的关系可能会更好地理解这一点, Instructors teaches Course_sections , 在哪里 card(Course_sections , teaches) = (1, 1),并且 Course_sections 实体代表关系的“多”方。这是因为一位讲师教授“许多”课程部分,而反之则不然。
In the last definition, we see that the values max-card(E, R) and max-card(F, R) determine whether a binary relationship is
many-to-many, many-to-one, or one-to-one. On the other hand, the values min-card(E, R) and min-card(F, R) are not mentioned,
and they are said to be independent of these characterizations. In particular, the fact that min-card(F, R) = 1 in Figure 2.6(b) is independent of the fact that that figure represents a many-to-one relationship. If there were additional elements in entity
F that were not connected by any lines to elements in E (but all current connections remained the same), this would mean that
min-card(F, R) = 0, but the change would not affect the fact that R is a many-to-one relationship. We would still see one
element of E (the second from the top) related to two elements of F; in this case, the entity F is the “many” side of the
relationship.
在最后一个定义中,我们看到值 max-card(E, R) 和 max-card(F, R) 确定二元关系是多对多、多对一还是一对一一。另一方面,没有提到值 min-card(E, R) 和 min-card(F, R),并且据说它们与这些特征无关。特别是,图 2.6(b)中 min-card(F, R) = 1 的事实与该图表示多对一关系的事实无关。如果实体 F 中还有其他元素没有通过任何线路连接到 E 中的元素(但所有当前连接保持不变),这将意味着 min-card(F, R) = 0,但更改不会影响R 是多对一关系这一事实。我们仍然会看到 E 的一个元素(从顶部数第二个)与 F 的两个元素相关;在这种情况下,实体 F 是关系的“多”方。
Although min-card(E, R) and min-card(F, R) have no bearing on whether a binary relationship R is many-to-many, many-to-one,
or one-to-one, a different characterization of entity participation in a relationship is determined by these quantities.
尽管 min-card(E, R) 和 min-card(F, R) 与二元关系 R 是多对多、多对一还是一对一无关,但不同的表征实体参与关系的程度由这些数量决定。
Definition. 定义。
When an entity E that participates in a relationship R has min-card(E, R) = 1, E is said to have mandatory participation in R, or is simply called mandatory in R. An entity E that is not mandatory in R is said to be optional, or to have optional participation.
当参与关系 R 的实体 E 的 min-card(E, R) = 1 时,称 E 在 R 中强制参与,或者简称为 R 中强制。在 R 中非强制的实体 E 为据说是可选的,或者有可选的参与。
2.2.3. Transforming Binary Relationships to Relations
2.2.3.将二元关系转变为关系
We are now prepared to give the transformation rule for a binary many-to-many relationship.
我们现在准备给出二元多对多关系的转换规则。
- Transformation Rule 3. N–N Relationships: When two entities E and F take part in a many-to-many binary relationship R, the relationship is mapped to a representative
table T in the related relational database design. The table contains columns for all attributes in the primary keys of both
tables transformed from entities E and F, and this set of columns forms the primary key for the table T. Table T also contains
columns for all attributes attached to the relationship. Relationship occurrences are represented by rows of the table, with
the related entity instances uniquely identified by their primary key values as rows.▪
转换规则3.N-N关系:当两个实体E和F参与多对多二元关系R时,该关系被映射到相关关系数据库设计中的代表表T。该表包含从实体 E 和 F 转换而来的两个表的主键中所有属性的列,并且这组列形成表 T 的主键。表 T 还包含附加到关系的所有属性的列。关系出现由表的行表示,相关实体实例由其主键值唯一标识为行。▪
EXAMPLE 2.6 例2.6
In Figure 2.7, the relationship works_on is many-to-many between the entities Employees and Projects. The relational design in Figure 2.8 follows Transformation Rule 1 to provide a table for the entity Employees (as specified in Example 2.2) and a table for the entity Projects; it also follows Transformation Rule 3, to provide a table for the relationship works_on.
在图2.7中,关系 works_on 实体之间是多对多 Employees 和 Projects 。图2.8中的关系设计遵循转换规则1,为实体提供一个表 Employees (如示例 2.2中指定)和实体的表 Projects ;它还遵循转换规则 3,为关系提供一个表格 works_on 。FIGURE 2.8. Relational design for Employees works_on Projects of Figure 2.7.
图 2.8。关系设计 Employees works_on Projects 图2.7 。We generally assume that the eid column in the employees table and prid column for the projects table cannot take on null values, since they are the primary keys for their tables and must differentiate all rows by unique values. Similarly, the (eid, prid) pair of columns in the works_on table cannot take on null values in either component, since each row must uniquely designate the employee–project pair related. Note that no primary key column of a relational table can take on null values. Note that although we refer to this as the entity integrity rule, it applies as well to tables arising out of the relationships in the ER model. Note also that the SQL Create Table command provides syntax to impose an integrity constraint on a table that guarantees this rule will not be broken, that no nulls will be assigned. For example, the SQL statement
我们一般假设 eid 栏目中的 employees 表和 prid 列为 projects 表不能采用空值,因为它们是表的主键,并且必须通过唯一值区分所有行。同样, (eid, prid) 中的一对列 works_on 表中的任何一个组件都不能采用空值,因为每一行必须唯一地指定相关的员工-项目对。请注意,关系表的主键列不能采用空值。请注意,虽然我们将此称为实体完整性规则,但它也适用于由 ER 模型中的关系产生的表。另请注意,SQL Create Table 命令提供了对表施加完整性约束的语法,以保证此规则不会被破坏,并且不会分配空值。例如,SQL 语句
- create table projects (prid char(3) not null...);
guarantees that the prid column of the projects table cannot take on null values as a result of later Insert, Delete, or Update statements. There are other constraints as well that have this effect.
保证 prid 的栏目 projects 表不能因后面的 Insert、Delete 或 Update 语句而采用空值。还有其他限制也会产生这种影响。
- Transformation Rule 4. N–1 Relationships: When two entities E and F take part in a many-to-one binary relationship R, the relationship will not be mapped to a table
of its own in a relational database design. Instead, if we assume that the entity F has max-card(F, R) = 1 and thus represents
the “many” side of the relationship, the relational table T transformed from the entity F should include columns constituting
the primary key for the table transformed from the entity E; this is known as a foreign key in T. Since max-card(F, R) = 1, each row of T is related by a foreign key value to at most one instance of the entity E.
If F has mandatory participation in R, then it must be related to exactly one instance of E, and this means that the foreign
key in T cannot take on null values. If F has optional participation in R, then each row of T that is not related can have
null values in all columns of the foreign key. ▪
转换规则 4. N-1 关系:当两个实体 E 和 F 参与多对一二元关系 R 时,在关系数据库设计中,该关系不会映射到其自己的表。相反,如果我们假设实体 F 具有 max-card(F, R) = 1 并因此代表关系的“多”方,则从实体 F 转换而来的关系表 T 应该包括构成实体 F 的主键的列。由实体E转化而来的表;这被称为 T 中的外键。由于 max-card(F, R) = 1,T 的每一行都通过外键值与实体 E 的最多一个实例相关。如果 F 强制参与 R ,那么它必须与 E 的一个实例相关,这意味着 T 中的外键不能采用空值。如果 F 可选参与 R,则 T 中不相关的每一行在外键的所有列中都可以具有空值。 ▪
EXAMPLE 2.7 例2.7
Figure 2.9 shows a relational transformation of the Instructors teaches Course_sections ER diagram of Figure 2.7. Recall that we made the rule that one instructor can teach multiple course sections, but each course section can have only one instructor. The insid column in the Course_sections table is a foreign key, relating a course_sections instance (row) to a unique instructors instance (row).
图 2.9显示了关系变换 Instructors teaches Course_sections ER图如图2.7 。回想一下,我们制定了规则,一名教师可以教授多个课程部分,但每个课程部分只能有一名教师。这 insid 栏目中的 Course_sections 表是一个外键,关联一个 course_sections 实例(行)到唯一 instructors 实例(行)。FIGURE 2.9. Relational design for Instructors teaches Course_sections of Figure 2.7.
图 2.9。关系设计 Instructors teaches Course_sections 图2.7 。The Create Table command in SQL can require a column not to take on null values; therefore, it is possible to guarantee a faithful representation for mandatory participation by the “many” side entity in a many-to-one relationship. Here we can create the course_sections table so no nulls are allowed in the insid column. What we mean by “faithful” is that it becomes impossible for a user to corrupt the data by a thoughtless update, because SQL does not allow a course_sections row with a null value for insid. SQL can also impose a constraint that the foreign key insid value in a row of the course_sections table actually exists as a value in the insid primary key column in the instructors table. This constraint is known as referential integrity.
SQL 中的“创建表”命令可以要求列不采用空值;因此,可以保证多对一关系中“多”方实体强制参与的忠实代表。在这里我们可以创建 course_sections 表中不允许有空值 insid 柱子。我们所说的“忠实”是指用户不可能通过轻率的更新来破坏数据,因为 SQL 不允许 course_sections 具有空值的行 insid 。 SQL 还可以施加一个约束,即外键 insid 的一行中的值 course_sections 表实际上作为值存在于 insid 中的主键列 instructors 桌子。此约束称为引用完整性。
Unfortunately, it is not possible in standard SQL to guarantee a mandatory participation by the “one” side of a many-to-one
relationship, or by either side of a many-to-many relationship. Thus, in Example 2.7 there would be no way to provide a faithful representation in an SQL table definition that would guarantee that every instructor teaches at least one
course.
不幸的是,在标准 SQL 中不可能保证多对一关系的“一”方或多对多关系的任何一方强制参与。因此,在示例 2.7中,无法在 SQL 表定义中提供忠实的表示,以保证每位教师至少教授一门课程。
Note that there are differences of opinion among texts on some of these ER transformation rules for relationships. Teorey
(1994) gives the equivalent to Transformation Rule 4 for N-1 relationships, but Batini et al. (1992) provides an alternate
transformation where the relationship is mapped onto a table of its own if the entity at the “many” side of the relationship
has an optional participation. The reason for this is to avoid possibly heavy use of null values in the foreign key (insid in course_sections in Example 2.7); but since there seems to be nothing wrong with using null values, we follow the transformation of Teorey (1994).
请注意,文本之间对于某些关系的 ER 转换规则存在不同意见。 Teorey (1994) 给出了 N-1 关系的变换规则 4 的等价物,但 Batini 等人。 (1992) 提供了一种替代转换,如果关系“多”方的实体具有可选参与权,则关系将映射到其自己的表上。这样做的原因是为了避免在外键中可能大量使用空值( insid 在 course_sections 例 2.7中);但由于使用空值似乎没有什么问题,所以我们遵循 Teorey (1994) 的转换。
- Transformation Rule 5.1-1 Relationships, Optional Participation: Given two entities E and F that take part in a one-to-one binary relationship R, where participation is optional on either
side, we wish to translate this situation into a relational design. To do this, we create a table S to represent the entity
E, following the prescription of Transformation Rule 1, and similarly a table T to represent the entity F. Then we adjoin
to the table T a set of columns (as a foreign key) constituting the primary key for table S. If we wish, we may also adjoin
to table S a foreign key set of columns referring to the primary key of table T. For any relationship instance in R, a unique
entity instance in E is related to a unique instance in F—in the corresponding rows of S and T, the foreign key column values
filled in to reference the row in the other table arising from the instances related by R.▪
转换规则 5.1-1 关系,可选参与:给定参与一对一二元关系 R 的两个实体 E 和 F,其中任何一方的参与都是可选的,我们希望将这种情况转换为关系设计。为此,我们按照转换规则 1 的规定创建一个表 S 来表示实体 E,并类似地创建一个表 T 来表示实体 F。然后,我们将一组列(作为外键)与表 T 相连) 构成表 S 的主键。如果我们愿意,我们还可以将引用表 T 的主键的外键列集与表 S 相邻。对于 R 中的任何关系实例,E 中的唯一实体实例都是相关的到 F 中的唯一实例 - 在 S 和 T 的相应行中,填充外键列值以引用由 R 相关实例产生的其他表中的行。 - Transformation Rule 6.1-1 Relationships, Mandatory Participation on Both Sides: In the case of a one-to-one relationship with mandatory participation on both sides, it is most appropriate to combine the
tables for the two entities into one, and in this way avoid any foreign keys.▪
转换规则6.1-1关系,双方强制参与:在一对一关系,双方强制参与的情况下,将两个实体的表合并为一个是最合适的,这样避免任何外键。▪
We do not present transformation rules for all possible n-ary relationships with n > 2. Usually such an n-ary relationship is transformed into a table of its own, but if all but one of the entities of the relationship participate
with max-card = 1, then it is possible to represent the relationship with n − 1 foreign keys in the one table that participates with greater cardinality.
我们不会为所有可能的n元关系提供n > 2 的转换规则。通常,这样的n元关系会转换为它自己的表,但如果关系中除一个实体之外的所有实体都参与 max-如果card = 1,则可以用参与基数更大的一张表中的n -1个外键来表示关系。
2.3. Additional ER Concepts
2.3.其他 ER 概念
In this section we introduce a number of additional concepts useful for ER modeling.
在本节中,我们将介绍一些对 ER 建模有用的附加概念。
2.3.1. Cardinality of Attributes
2.3.1.属性的基数
To begin with, we note that the min-card/max-card notation can be used to describe the cardinality of attributes attached
to entities.
首先,我们注意到最小卡/最大卡符号可用于描述附加到实体的属性的基数。
Definition. 定义。
Given an entity E and an attached attribute A, we write min-card(A, E) = 0 to indicate that the attribute A is optional, and min-card(A, E) = 1 to indicate that the attribute A is mandatory. An attribute that is mandatory should correspond to a column declared in the table representing the entity E with no nulls allowed. We write max-card(A, E) = 1 to indicate that the attribute is single valued, and max-card(A, E) = N to indicate that the attribute is multivalued. An attribute A is said to have card(A, E) = (x, y) when min-card(A, E) = x and max-card(A, E) = y. The (x, y) pair can be used to label an attribute–entity connection in an ER diagram to indicate the cardinality of the attribute.
给定实体 E 和附加属性 A,我们写 min-card(A, E) = 0 表示属性 A 是可选的,min-card(A, E) = 1 表示属性 A 是强制的。强制属性应对应于表中声明的表示实体 E 的列,不允许有空值。我们写 max-card(A, E) = 1 表示该属性是单值属性,写 max-card(A, E) = N 表示该属性是多值属性。当 min-card(A, E) = x 且 max-card(A, E) = y 时,属性 A 被称为具有 card(A, E) = (x, y)。 (x, y) 对可用于标记 ER 图中的属性-实体连接,以指示属性的基数。
Attributes that have unlabeled connectors in an ER diagram can be assumed to have cardinality (0, 1) if they are descriptor
attributes, and cardinality (1, 1) if they are identifier attributes. Figure 2.10 recapitulates Figure 2.2 with labeled attribute–entity connectors. (Note that these are not the default cardinalities only because of lack of notation.)
如果 ER 图中具有未标记连接器的属性是描述符属性,则可以假定其基数为 (0, 1);如果它们是标识符属性,则可以假定其基数为 (1, 1)。图 2.10概括了图 2.2,并带有标记的属性-实体连接器。 (请注意,这些不是默认基数,只是因为缺乏符号。)
FIGURE 2.10. ER diagrams with labeled attribute–entity connectors.
图 2.10。带有标记的属性-实体连接器的 ER 图。
In Figure 2.10 we note that the attribute midinitial is optional (some people don't have middle names). The composite attribute student_names is mandatory for Students, but emp_address is optional for Employees. However, given that emp_address exists, all four simple attributes making up the address are mandatory. Both sid and eid have cardinality (1, 1); this is always the case for entity identifiers. The multivalued hobbies attribute has max-card N, as we can also tell from the fact that it is connected to its entity by a double line. The fact that min-card(hobbies, Employees) = 1 is somewhat surprising and indicates that the employee must name at least one hobby for inclusion in the database.
在图 2.10中我们注意到该属性 midinitial 是可选的(有些人没有中间名)。复合属性 student_names 是强制性的 Students , 但 emp_address 是可选的 Employees 。然而,鉴于 emp_address 存在时,构成地址的所有四个简单属性都是强制性的。两个都 sid 和 eid 有基数 (1, 1);对于实体标识符来说总是如此。多值化 hobbies 属性具有最大卡数 N,我们还可以从它通过双线连接到其实体的事实看出。事实上,min-card( hobbies , Employees ) = 1 有点令人惊讶,它表明员工必须至少说出一项爱好才能包含在数据库中。
Definition:Weak Entity. 定义:弱实体。
A weak entity is an entity whose occurrences are dependent for their existence, through a relationship R, on the occurrence of another (strong) entity.
弱实体是这样的实体,其出现通过关系 R 依赖于另一个(强)实体的出现。
2.3.2. Weak Entities 2.3.2.弱实体
As an example, we have been assuming in our CAP design that an order specifies a customer, agent, product, quantity, and dollar
cost. A common design variant that allows multiple products to be ordered at once will create an orders table that relates to customers and agents rows, as well as a line_items table containing individual product purchases; a number of rows in the line_items table relate to one master orders occurrence. The design of this in the ER model is given in Figure 2.11.
例如,我们在 CAP 设计中假设订单指定了客户、代理商、产品、数量和美元成本。允许同时订购多种产品的常见设计变体将创建一个 orders 涉及到的表 customers 和 agents 行,以及 line_items 包含单个产品购买的表格;中的许多行 line_items 表与一位主控相关 orders 发生。 ER 模型中的设计如图 2.11所示。
FIGURE 2.11. A weak entity, Line_items, dependent on the entity Orders.
图 2.11。一个弱小的实体, Line_items ,取决于实体 Orders 。
As we see, the entity Orders is optional in its relationship to Line_items, since each order must start without any line items. Line_items is mandatory in the relationship, because a line-item order for a product cannot exist without a master order containing
it to specify the customer and agent for the order. If the Orders occurrence goes away (the customer cancels it), all occurrences of the weak entity Line_items will likewise disappear. A dead giveaway for a weak entity is the fact that the primary identifier for Line_items (lineno) is only meaningful within some order. In fact, what this implies is that the primary identifier for the weak entity Line_items must include the attributes in the primary identifier for the Orders entity. Attributes such as Line_items are known as external identifier attributes.
正如我们所看到的,实体 Orders 在其关系中是可选的 Line_items ,因为每个订单都必须在没有任何行项目的情况下开始。 Line_items 在关系中是强制性的,因为如果没有包含产品的行项目订单来指定订单的客户和代理,则产品的行项目订单就不可能存在。如果 Orders 事件消失(客户取消它),弱实体的所有事件 Line_items 同样也会消失。对于弱实体来说,一个致命的弱点是,其主要标识符 Line_items ( lineno )仅在一定顺序内才有意义。事实上,这意味着弱实体的主要标识符 Line_items 必须在主标识符中包含属性 Orders 实体。属性如 Line_items 称为外部标识符属性。
When the Line_items weak entity is mapped to a relational table line_items, an ordno column is included by Transformation Rule 4 to represent the N-1 has_item relationship; thus, the primary key for the line_items table is constructed from the external attribute ordno and the weak entity identifier lineno. Note that it is also sometimes difficult to distinguish between a weak entity and a multivalued attribute. For example,
hobbies in Example 2.2 could be identified as a weak entity Hobbies, with an identifier hobby_name. However, Figure 2.11 obviously implies Line_items is a weak entity rather than a multivalued attribute, since Line_items is separately related to another entity, Products.
当 Line_items 弱实体映射到关系表 line_items , 一个 ordno 列包含在转换规则 4 中以表示 N-1 has_item 关系;因此,主键为 line_items 表是根据外部属性构造的 ordno 和弱实体标识符 lineno 。请注意,有时也很难区分弱实体和多值属性。例如, hobbies 例2.2中的可以被识别为弱实体 Hobbies ,带有标识符 hobby_name 。然而,图2.11显然意味着 Line_items 是一个弱实体而不是一个多值属性,因为 Line_items 与另一个实体单独相关, Products 。
2.3.3. Generalization Hierarchies
2.3.3.泛化层次结构
Finally, we introduce the concept of a generalization hierarchy or generalization relationship. The idea is that several entities with common attributes can be generalized into a higher-level supertype entity, or, alternatively, a general entity can be decomposed into lower-level subtype entities. The purpose is to attach attributes at the proper level and thus avoid having attributes of a common entity that require
a large number of null values in each entity instance. For example, assume that we distinguish between Managers and Non_managers as subtype entities of the supertype Employees (see Figure 2.12). Then attributes such as expenseno (for expense reports) can be attached only to the Managers entity, while nonmanager attributes such as union status can be attached to Non_managers. Consultants might form another entity type sharing many properties with Employees, and we could create a new supertype entity named Persons to contain them both. An ER diagram showing a generalization hierarchy normally has arrows (unnamed) directed from the subtype
to the supertype entities.
最后,我们引入泛化层次或泛化关系的概念。这个想法是,具有共同属性的多个实体可以概括为更高级别的超类型实体,或者,通用实体可以分解为较低级别的子类型实体。目的是在适当的级别附加属性,从而避免在每个实体实例中具有需要大量空值的公共实体的属性。例如,假设我们区分 Managers 和 Non_managers 作为超类型的子类型实体 Employees (见图2.12 )。然后属性如 expenseno (用于费用报告)只能附加到 Managers 实体,而非管理者属性,例如 union status 可以附加到 Non_managers 。顾问可能会形成另一种实体类型,与它们共享许多属性 Employees ,我们可以创建一个名为的新超类型实体 Persons 来遏制他们两个。显示泛化层次结构的 ER 图通常具有从子类型指向超类型实体的箭头(未命名)。
FIGURE 2.12. A generalization hierarchy with examples of attributes attached.
图 2.12。带有附加属性示例的泛化层次结构。
The arrow relationship between the subtype entity and the supertype entity is often referred to as an is-a relationship, since a consultant is a person, a manager is an employee, and so forth. Object-relational database systems express these concepts using type inheritance, where objects (rows) of a given subtype contain specific attributes but inherit all attributes of their supertype. In particular, INFORMIX and SQL-99 support inheritance of object types.
子类型实体和父类型实体之间的箭头关系通常称为is-a 关系,因为顾问是人,经理是雇员,等等。对象关系数据库系统使用类型继承来表达这些概念,其中给定子类型的对象(行)包含特定属性,但继承其超类型的所有属性。特别是,INFORMIX 和 SQL-99 支持对象类型的继承。
The relational model provides no support for the concept of generalization hierarchy, so it is necessary to reconfigure such
a design element into simpler concepts. This can happen either prior to transformation into relational tables or as part of
the transformation. Here we give an idea of how to perform such a reconfiguration while remaining in the ER model, before
transformation into a relational representation. We consider one level of generalization hierarchy at a time and give two
alternatives.
关系模型不提供对泛化层次结构概念的支持,因此需要将这样的设计元素重新配置为更简单的概念。这可以在转换为关系表之前发生,也可以作为转换的一部分发生。在这里,我们给出了在转换为关系表示之前如何在保留 ER 模型的同时执行此类重新配置的想法。我们一次考虑一层泛化层次并给出两种选择。
- We can collapse a one-level generalization hierarchy of subtype and supertype entities into a single entity by adding all
attributes of the subtype entities to the supertype entity. An additional attribute must be added to this single entity, which
will discriminate among the various types. As an example, the Employees entity in Figure 2.12 could be augmented to represent managers and nonmanagers as well, by affixing the attributes union_no, expenseno, and emptype to the Employee entity. Now the union_no attribute will be null when emptype has value “Manager,” and similarly expenseno will be null when emptype is “Nonmanager.” The emptype attribute might also designate the supertype case, an important alternative when some entity instances in the supertype fall
in none of the named subtypes.
通过将子类型实体的所有属性添加到超类型实体,我们可以将子类型和超类型实体的一级泛化层次结构折叠为单个实体。必须向这个单一实体添加一个附加属性,这将区分各种类型。举个例子, Employees 通过附加属性,图 2.12中的实体也可以扩展为代表管理者和非管理者 union_no , expenseno , 和 emptype 到 Employee 实体。现在的 union_no 当以下情况时属性将为空 emptype 具有值“Manager”,类似地 expenseno 将为空,当 emptype 是“非经理”。这 emptype 属性还可以指定超类型情况,当超类型中的某些实体实例不属于任何指定子类型时,这是一个重要的替代方案。 - We can retain the supertype entity and all subtype entities as full entities and create explicit named relationships to represent
the is-a relationships.
我们可以将超类型实体和所有子类型实体保留为完整实体,并创建显式命名关系来表示 is-a 关系。
Alternative 2 is particularly useful when the various subtypes and supertype are quite different in attributes and are handled
differently by application logic.
当各种子类型和超类型的属性差异很大并且应用程序逻辑的处理方式不同时,替代方案 2 特别有用。
We do not investigate all concepts of the ER model in full depth here. See the references at the end of this chapter for a
list of texts devoted to complete coverage of the ER model and logical database design.
我们在这里不会全面深入研究 ER 模型的所有概念。请参阅本章末尾的参考资料,获取专门完整介绍 ER 模型和逻辑数据库设计的文本列表。
2.4. Case Study 2.4.案例研究
Let us try to perform an ER design from the beginning, ending up with a set of relational tables. Consider a simple airline
reservation database handling (only) outgoing flights from one airline terminal. We need to keep track of passengers, flights,
departure gates, and seat assignments. We could get almost arbitrarily complex in a real design, since a “flight” actually
brings together a flight crew and an airplane, serviced by a ground crew, slotted into a regularly scheduled departure time
with an assigned flight number on a specific date. But for simplicity, we will assume that we can represent flights with an
entity Flights, having primary identifier flightno (unique identifier values, not repeated on successive days) and descriptive attribute depart_time (actually made up of date and time); other details will be hidden from us. Passengers are represented by another entity,
Passengers, with primary identifier attribute ticketno; a passenger has no other attribute that we care about. We also need to keep track of seats for each flight. We assume that
each seat is an entity instance in its own right, an entity Seats, identified by a seat number, seatno, valid only for a specific flight (different flights might have different airplane seat layouts, and therefore different
sets of seat numbers). We see therefore that seat assignment is a relationship between Passengers and Seats, which we name seat_assign.
让我们尝试从头开始执行 ER 设计,最终得到一组关系表。考虑一个简单的航空公司预订数据库(仅)处理从一个航空公司航站楼出发的航班。我们需要跟踪乘客、航班、登机口和座位分配。在实际设计中,我们可以变得几乎任意复杂,因为“航班”实际上将机组人员和飞机聚集在一起,由地勤人员提供服务,并在特定日期分配到定期起飞时间和指定的航班号。但为了简单起见,我们假设我们可以用一个实体来表示航班 Flights ,具有主标识符 flightno (唯一标识符值,连续几天不重复)和描述性属性 depart_time (实际上由日期和时间组成);其他细节将对我们隐藏。乘客由另一个实体代表, Passengers ,具有主标识符属性 ticketno ;乘客没有我们关心的其他属性。我们还需要跟踪每个航班的座位。我们假设每个座位本身就是一个实体实例,一个实体 Seats ,由座位号标识, seatno ,仅对特定航班有效(不同航班可能有不同的飞机座位布局,因此座位号也不同)。因此我们看到座位分配是以下关系: Passengers 和 Seats ,我们命名为 seat_assign 。
Now think about this specification for a moment. The Passengers entity is easy to picture, and so is the Flights entity. The depart_time attribute for Flights is composite, consisting of simple attributes dtime and ddate. We can add another entity Gates, with primary identifier gateno. We have already defined a Seats entity, but the entity seems to be a little strange: The seatno primary identifier for Seats is only meaningful when related to a Flights instance. This is what is referred to in the previous section as a weak entity, and thus there must be a relationship between
Flights and Seats, which we name has_seat. The identifier for Seats is partially external, encompassing the identifier of the containing flight.
现在考虑一下这个规范。这 Passengers 实体很容易描绘,因此 Flights 实体。这 depart_time 属性为 Flights 是复合的,由简单的属性组成 dtime 和 ddate 。我们可以添加另一个实体 Gates ,带有主标识符 gateno 。我们已经定义了一个 Seats 实体,但实体似乎有点奇怪: seatno 主要标识符 Seats 仅当与某个相关时才有意义 Flights 实例。这就是上一节所说的弱实体,因此它们之间必定存在关系 Flights 和 Seats ,我们命名为 has_seat 。标识符为 Seats 部分是外部的,包含所在航班的标识符。
What other relationships do we have? If we draw the ER diagram for what we have named up to now, we notice that the Gates entity is off by itself. But clearly passengers go to a gate to meet a flight. We model this as two binary relationships
rather than as a ternary relationship: each passenger is related to a specific flight through the relationship Passengers travels_on Flights, and gates normally act as marshaling points for multiple flights (at different times) through the relationship Gates marshals Flights. Figure 2.13 shows the ER diagram so far. The arrow from seatno to flightno symbolizes the fact that the primary identifier for Seats includes the identifier for the master entity Flights.
我们还有哪些其他关系?如果我们为到目前为止所命名的内容绘制 ER 图,我们会注意到 Gates 实体自行关闭。但显然乘客会前往登机口迎接航班。我们将其建模为两个二元关系而不是三元关系:每个乘客通过该关系与特定航班相关 Passengers travels_on Flights ,并且登机口通常通过这种关系充当多个航班(在不同时间)的编组点 Gates marshals Flights 。图 2.13显示了到目前为止的 ER 图。箭头从 seatno 到 flightno 象征着以下事实:主要标识符 Seats 包括主实体的标识符 Flights 。
FIGURE 2.13. Early ER design for a simple airline reservation database.
图 2.13。早期 ER 设计用于简单的航班预订数据库。
Now we need to work out the cardinalities with which the various entities participate in their relationships. Considering
the marshals relationship first, clearly there is exactly one gate for each flight, so card(Flights, marshals) = (1, 1). A single gate might be used for multiple flights at different times, but there is no rule that a gate must be
used at all, so card(Gates, marshals) = (0, N). Now each passenger must travel on exactly one flight, so card(Passengers, travels_on) = (1, 1). A flight must have multiple passengers to fly (the flight will be canceled and the gate reassigned if there are
too few), but the database needs to hold information starting from no passengers, so we set a minimum of 0, and card(Flights, travels_on) = (0, N). A flight must have numerous seats for passengers, so card(Flights, has_seat) = (1, N), and each seat is on a unique flight, so card(Seats, has_seat) = (1, 1). Each passenger must have a seat, and only one, so card(Passengers, seat_assign) = (1, 1), and seats can be used by at most one passenger and may go empty, so card(Seats, seat_assign) = (0, 1). The ER diagram with these cardinality pairs added is pictured in Figure 2.14.
现在我们需要计算出各个实体参与其关系的基数。考虑到 marshals 首先是关系,显然每个航班都有一个登机口,所以卡( Flights , marshals ) = (1, 1)。一个登机口可能会在不同时间用于多个航班,但没有规定必须使用一个登机口,因此卡( Gates , marshals ) = (0, N)。现在每位乘客必须乘坐同一趟航班,因此卡( Passengers , travels_on ) = (1, 1)。一个航班必须有多名乘客飞行(如果乘客太少,航班将被取消,并重新分配登机口),但数据库需要保存从无乘客开始的信息,所以我们设置最小值为0,并且卡( Flights , travels_on ) = (0, N)。一个航班必须有很多乘客座位,所以卡( Flights , has_seat ) = (1, N),并且每个座位都在一个唯一的航班上,因此 card( Seats , has_seat ) = (1, 1)。每个乘客必须有座位,而且只有一个,所以卡( Passengers , seat_assign ) = (1, 1),并且座位最多可供一名乘客使用,并且可能会空,因此卡( Seats , seat_assign ) = (0, 1)。添加了这些基数对的 ER 图如图2.14所示。
FIGURE 2.14. ER design with cardinalities for a simple airline reservation database.
图 2.14。带有基数的 ER 设计用于简单的航空公司预订数据库。
Now the ER design is complete, and we need to transform the design into relational tables. We can begin by creating tables
to map entities, even though this means that we might overlook some attributes that will be needed to represent foreign keys
for relationships. We will simply have to return later when we consider the relationships and add attributes to these tables.
To begin with, we notice with the Flights entity that we don't have multivalued attributes in relational tables, so following the hint of Transformation Rule 1, we
create columns for ddate and dtime in the flights table. All other tables are easily mapped, except for the seats table, where we take the easy way out and use the single column seatno, even though this is not a complete key for the table. Here are the tables so far:
现在ER设计已经完成,我们需要将设计转化为关系表。我们可以从创建表来映射实体开始,尽管这意味着我们可能会忽略表示关系外键所需的一些属性。当我们考虑关系并向这些表添加属性时,我们只需稍后返回即可。首先,我们注意到 Flights 我们在关系表中没有多值属性的实体,因此根据转换规则 1 的提示,我们为以下对象创建列 ddate 和 dtime 在 flights 桌子。所有其他表都可以轻松映射,除了 seats 表,我们采取简单的方法并使用单列 seatno ,即使这不是表的完整键。以下是迄今为止的表格:
passengers | gates | flights | seats | ||
---|---|---|---|---|---|
ticketno 票号 | gateno 加特诺 | flightno 航班号 | ddate | dtime | seatno 座位号 |
… |
Now consider the relationship has_seat, which is N-1 in Figure 2.14, with Seats on the “many” side. By Transformation Rule 4, a foreign key in the seats table will connect each seats row to the appropriate flights row. This completes the primary key for the seats table, which represents a weak entity and therefore needs a foreign key to identify each row.
现在考虑一下关系 has_seat ,即图 2.14中的 N-1 ,其中 Seats 在“多”方面。根据转换规则 4,外键 seats 表将连接每个 seats 行到适当的 flights 排。这样就完成了主键 seats 表,它代表弱实体,因此需要一个外键来标识每一行。
passengers | gates | flights | seats | |||
---|---|---|---|---|---|---|
ticketno 票号 | gateno 加特诺 | flightno 航班号 | ddate | dtime | seatno 座位号 | flightno |
… |
The seat_assign relationship is 1-1, with optional participation by Seats, so by Transformation Rule 5 we can represent this by adjoining to the passengers table a foreign key for seats (this requires two additional columns). We don't expect that we will ever need to look up the
passenger for a given seat, so we place no additional foreign key on the seats table. The resulting table definitions are as follows:
这 seat_assign 关系是 1-1,可选择参与 Seats ,因此根据变换规则 5,我们可以通过邻接来表示这一点 passengers 表一个座位的外键(这需要两个额外的列)。我们不希望我们需要查找给定座位的乘客,因此我们没有在座位上放置额外的外键 seats 桌子。结果表定义如下:
passengers | gates | ||
---|---|---|---|
ticketno 票号 | seatno | flightno | gateno 加特诺 |
… |
flights | seats | |||
---|---|---|---|---|
flightno 航班号 | ddate | dtime | seatno 座位号 | flightno 航班号 |
Now consider the marshals relationship. This is N-1, with Flights on the “many” side, so by Transformation Rule 4 a foreign key in the flights table, gateno, will connect each flights row to the appropriate gates row:
现在考虑 marshals 关系。这是 N-1,其中 Flights 在“多”方面,因此根据转换规则 4,外键 flights 桌子, gateno ,将连接每个 flights 行到适当的 gates 排:
passengers | gates | ||
---|---|---|---|
ticketno 票号 | seatno | flightno | gateno 加特诺 |
… |
flights | seats | ||||
---|---|---|---|---|---|
flightno 航班号 | gateno | ddate | dtime | seatno 座位号 | flightno |
Similarly the travels_on relationship is N-1, with Passengers on the “many” side, so by Transformation Rule 4 a foreign key, flightno, in the passengers table will connect each passengers row to the appropriate flights row. This column already exists in the passengers table, however, so the relational table design is complete.
同样地 travels_on 关系为 N-1,其中 Passengers 在“多”方面,因此根据转换规则 4 外键, flightno ,在 passengers 表将连接每个 passengers 行到适当的 flights 排。该列已存在于 passengers table,但是,这样关系表的设计就完成了。
2.5. Normalization: Preliminaries
2.5.标准化:预备
Normalization is another approach to logical design of a relational database, which seems to share little with the ER model.
However, it will turn out that a relational design based on normalization and a careful ER design transformed into relational
form have nearly identical results, and in fact the two approaches reinforce each other. In the normalization approach, the
designer starts with a real-world situation to be modeled and lists the data items that are candidates to become column names
in relational tables, together with a list of rules about the relatedness of these data items. The aim is to represent all
these data items as attributes of tables that obey restrictive conditions associated with what we call normal forms. These normal form definitions limit the acceptable form of a table so that it has certain desirable properties, thus avoiding
various kinds of anomalous behavior. There is a series of normal form definitions, each more restrictive than the one before;
the forms covered in this chapter are first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd
normal form (BCNF). Other types of normalization, 4NF and 5NF, are less commonly considered and are not covered in detail
in this chapter.
规范化是关系数据库逻辑设计的另一种方法,它似乎与 ER 模型没有什么共同点。然而,事实证明,基于归一化的关系设计和转化为关系形式的仔细 ER 设计具有几乎相同的结果,而且事实上这两种方法是相辅相成的。在规范化方法中,设计者从要建模的真实情况开始,列出将成为关系表中的列名称的候选数据项,以及关于这些数据项的相关性的规则列表。目的是将所有这些数据项表示为表的属性,这些属性遵守与我们所谓的范式相关的限制条件。这些范式定义限制了表的可接受形式,使其具有某些所需的属性,从而避免各种异常行为。有一系列范式定义,每一个都比前一个更具限制性;本章涵盖的形式是第一范式(1NF)、第二范式(2NF)、第三范式(3NF)和Boyce-Codd范式(BCNF)。其他类型的归一化(4NF 和 5NF)不太常见,本章也没有详细介绍。
To begin with, a table in 1NF is simply one that has no multivalued (repeating) fields. SQL language accepts this rule as
basic. In what follows we assume that tables are in 1NF unless otherwise specified. 2NF turns out to be of mainly historical interest, since no sensible designer would leave a database in 2NF but would always
continue normalization until the more restrictive 3NF was reached. From an initial database containing data items that are
all in the same table (sometimes referred to as a universal table) and relatedness rules on these data items, there is a procedure to create an equivalent database with multiple tables, all
of which are in 3NF. (This is what we mean by having a database in 3NF—that all of its tables have 3NF form.) As we proceed
through this chapter we will find that any table that does not obey 3NF can be factored into distinct tables in such a way
that (1) each of the factored tables is in a valid 3NF, and (2) the join of all these factored tables contains exactly the
information in the table from which they were factored. The set of 3NF tables resulting from the initial universal table is known as a 3NF lossless decomposition of the database.
首先,1NF 中的表只是一个没有多值(重复)字段的表。 SQL 语言接受此规则作为基本规则。在下文中,除非另有说明,否则我们假设表为 1NF 。 2NF 事实证明主要具有历史意义,因为明智的设计者不会将数据库保留在 2NF 中,但始终会继续规范化,直到达到更具限制性的 3NF。从包含全部在同一个表(有时称为通用表)中的数据项以及这些数据项的相关性规则的初始数据库中,有一个过程可以创建具有多个表的等效数据库,所有这些表都在 3NF 中。 (这就是我们所说的 3NF 数据库的意思——它的所有表都具有 3NF 形式。)当我们继续阅读本章时,我们会发现任何不遵守 3NF 的表都可以通过这样的方式分解为不同的表(1) 每个分解表都位于有效的 3NF 中,并且 (2) 所有这些分解表的联接准确地包含分解它们的表中的信息。由初始通用表产生的 3NF 表集称为数据库的 3NF无损分解。
There is a third desirable property that we can always provide with a 3NF decomposition. Note that when a new row is added
to one of the tables in the 3NF decomposition (or an old row is updated), it is possible that an erroneous change might break
one of the rules of data item relatedness, mentioned earlier as part of the design input. We wish to impose a constraint on
Insert and Update operations so that such errors will not corrupt the data. The third property that we consider important
in a decomposition, then, is (3) when a table Insert or Update occurs, the possible relatedness rules that might be broken
can be tested by validating data items in the single table affected; there is no need to perform table joins in order to validate
these rules. A 3NF decomposition constructed to have the three desirable properties just mentioned is generally considered
an acceptable database design. It turns out that a further decomposition of tables in 3NF to the more restrictive BCNF is
often unnecessary (many real-world databases in 3NF are also in BCNF), but in cases where further decomposition results, property
(3) no longer holds in the result. Many database designers therefore settle on 3NF design.
我们总是可以通过 3NF 分解来提供第三个理想的属性。请注意,当在 3NF 分解中向其中一个表添加新行(或更新旧行)时,错误的更改可能会破坏数据项相关性的规则之一,正如前面作为 3NF 分解的一部分提到的那样。设计输入。我们希望对插入和更新操作施加限制,以便此类错误不会损坏数据。那么,我们认为在分解中重要的第三个属性是(3)当表插入或更新发生时,可以通过验证受影响的单个表中的数据项来测试可能被破坏的相关性规则;无需执行表连接即可验证这些规则。构建具有刚才提到的三个所需属性的 3NF 分解通常被认为是可接受的数据库设计。事实证明,将 3NF 中的表进一步分解为更具限制性的 BCNF 通常是不必要的(3NF 中的许多现实世界数据库也在 BCNF 中),但在进一步分解结果的情况下,属性 (3) 不再适用结果。因此,许多数据库设计者选择了 3NF 设计。
We will need a good deal of insight into the details of the normalization approach before we are able to properly deal with
some of these ideas. Let us begin to illustrate them with an example.
在我们能够正确处理其中一些想法之前,我们需要深入了解规范化方法的细节。让我们开始用一个例子来说明它们。
2.5.1. A Running Example: Employee Information
2.5.1.运行示例:员工信息
We need an example to clarify some of the definitions of database design that follow. Consider the data items listed in Figure 2.15, representing the employee information that must be modeled by the personnel department of a very large company.
我们需要一个例子来阐明下面的数据库设计的一些定义。考虑图 2.15中列出的数据项,代表一家非常大公司的人事部门必须建模的员工信息。
FIGURE 2.15. Unnormalized data items for employee information.
图 2.15。员工信息的非标准化数据项。
The data items beginning with emp_all represent attributes of what we would refer to in the ER approach as the entity Employees. Other entities underlying the data items of Figure 2.15 include Departments where employees in the company work and Skills that the various employees need to perform their jobs. In the normalization approach, we leave the entity concept unnamed
but reflect it in the data item interrelatedness rules that will be explained shortly, rules known as functional dependencies. The data item emp_id has been created to uniquely identify employees. Each employee works for some single department in the company, and the data
items beginning with dept_ describe the different departments; the data item dept_name uniquely identifies departments, and each department normally has a unique manager (also an employee) with a name given in
dept_mgrname. Finally, we assume that the various employees each possess some number of skills, such as typing or filing, and that data
items beginning with skill_ describe the skills that are tested and used for job assignment and salary determination by the company. The data item skill_id uniquely identifies the skill, which also has a name, skill_name. For each employee who possesses a particular skill, the skill_date describes the date when the skill was last tested, and skill_lvl describes the level of skill the employee displayed at that test.
数据项开头为 emp_all 表示我们在 ER 方法中称为实体的属性 Employees 。图 2.15数据项的其他实体包括 Departments 公司员工在哪里工作以及 Skills 各种员工需要完成他们的工作。在规范化方法中,我们保留实体概念未命名,但将其反映在稍后将解释的数据项相互关联性规则中,这些规则称为函数依赖关系。数据项 emp_id 是为了唯一地识别员工而创建的。每个员工都在公司的某个部门工作,数据项以 dept_ 描述不同的部门;数据项 dept_name 唯一标识部门,每个部门通常有一个唯一的经理(也是员工),其名称为 dept_mgrname 。最后,我们假设各个员工都拥有一定数量的技能,例如打字或归档,并且以 skill_ 描述公司测试和用于工作分配和薪资确定的技能。数据项 skill_id 唯一标识该技能,该技能也有一个名称, skill_name 。对于每个拥有特定技能的员工来说, skill_date 描述上次测试技能的日期,以及 skill_lvl 描述员工在该测试中表现出的技能水平。
Figure 2.16 provides a universal table, emp_info, containing all the data items of employee information from Figure 2.15. Because of 1NF, there can only be atomic values in each row and column position of a table. This poses a difficulty, because
each individual employee might have any number of skills. It is inappropriate to design a table with unique rows for each
emp_id and a distinct column for each piece of skill information—we don't even know the maximum number of skills for an employee,
so we don't know how many columns we should use for skill_id-1, … , skill_id-n. The only solution that will work in a single (universal) table is to give up on having a unique row for each employee and replicate information about the employee, pairing
the employee with different skills on different rows.
图2.16提供了一个通用表, emp_info ,包含图2.15中员工信息的所有数据项。由于1NF,表的每一行和列位置只能有原子值。这带来了困难,因为每个员工可能拥有多种技能。为每个表设计一个具有唯一行的表是不合适的 emp_id 每条技能信息都有一个不同的列 - 我们甚至不知道员工的最大技能数,因此我们不知道应该使用多少列 skill_id-1 , … , skill_id-n 。在单个(通用)表中有效的唯一解决方案是放弃为每个员工提供唯一的行,并复制有关该员工的信息,将具有不同技能的员工配对到不同的行上。
FIGURE 2.16. Single employee information table, emp_info, in 1NF.
图 2.16。单个员工信息表, emp_info ,在 1NF 中。
The intention of the database designer in the emp_info table of Figure 2.16 is that there is a row for every employee–skill pair existing in the company. From this, it should be clear that there cannot
be two rows with the same values for the pair of attributes emp_id and skill_id. The table emp_info has a (candidate) key consisting of the set (pair) of attributes emp_id and skill_id. We confirm that these attributes form a key by noting that the values they take on distinguish any pair of rows in any permissible
content of the table (i.e., for any rows u and v, either u(emp_id) ≠ v(emp_id) or u(skill_id) ≠ v(skill_id)), and that no subset of this set of attributes does the same (there can be two rows u and v such that u(emp_id) = v(emp_id), and there can be two rows r and s such that r(skill_id) = s(skill_id)). We assume in what follows that emp_id and skill_id is the primary key for the emp_info table.
数据库设计者的意图 emp_info 图 2.16的表中,公司中存在的每个员工技能对都有一行。由此可见,不能有两行的属性对具有相同的值 emp_id 和 skill_id 。桌子 emp_info 有一个由属性集(对)组成的(候选)键 emp_id 和 skill_id 。我们通过注意到这些属性所采用的值区分表中任何允许内容中的任何一对行(即,对于任何行 u 和 v,u( emp_id ) ≠ v( emp_id ) 或 u( skill_id ) ≠ v( skill_id )),并且这组属性的子集没有做同样的事情(可以有两行 u 和 v 使得 u( emp_id ) = v( emp_id ),并且可以有两行 r 和 s 使得 r( skill_id ) = s( skill_id ))。我们假设在下文中 emp_id 和 skill_id 是主键 emp_info 桌子。
It turns out that the database design of Figure 2.16 is a bad one, because it is subject to certain anomalies that can corrupt the data when data manipulation statements are
used to update the table.
事实证明,图 2.16的数据库设计是一个糟糕的设计,因为它会受到某些异常的影响,当使用数据操作语句更新表时,这些异常可能会损坏数据。
2.5.2. Anomalies of a Bad Database Design
2.5.2.不良数据库设计的异常情况
It appears that there might be a problem with the emp_info table of Figure 2.16 because there is replication of employee data on different rows. It seems more natural, with the experience we have had up
to now, to have a unique row for each distinct employee. Do we have a good reason for our feeling? Let us look at the behavior
of this table as SQL updates are applied.
看起来可能有问题 emp_info 图 2.16中的表是这样的,因为不同行上存在员工数据的复制。根据我们迄今为止的经验,为每个不同的员工设置一个独特的行似乎更自然。我们的感觉有充分的理由吗?让我们看看应用 SQL 更新时该表的行为。
If some employee were to get a new phone number, we would have to update multiple rows (all rows with different skills for
that employee) in order to change the emp_phone value in a uniform way. If we were to update the phone number of only one row, we might corrupt the data, leaving some rows for that employee with different phone numbers than others. This is commonly known as an update anomaly, and it arises because of data redundancy, duplication of employee phone numbers and other employee attributes on multiple rows of emp_info. Calling this an “anomaly,” with the implication of irregular behavior under update, may seem a bit extreme, since the SQL
language is perfectly capable of updating several rows at once with a Searched Update statement such as:
如果某个员工要获得新的电话号码,我们必须更新多行(该员工具有不同技能的所有行)才能更改 emp_phone 以统一的方式取值。如果我们只更新一行的电话号码,我们可能会损坏数据,为该员工留下一些与其他行不同的电话号码。这通常称为更新异常,它是由于多行数据冗余、员工电话号码和其他员工属性重复而引起的。 emp_info 。称其为“异常”,意味着更新时的不规则行为,可能看起来有点极端,因为 SQL 语言完全能够使用 Searched Update 语句一次更新多行,例如:
- update emp_info set emp_phone = :newphone where emp_id = :eidval;
In fact, the consideration that several rows will be updated is not even apparent from this syntax—the same Searched Update
statement would be used if the table had a unique row for each emp_id value. However, with this replication of phone numbers on different rows, a problem can still arise in performing an update
with a Positioned Update statement. If we encountered a row of the emp_info table in fetching rows from a cursor created for an entirely different purpose, the program might execute the following statement to allow the user to correct an invalid phone number:
事实上,从该语法来看,要更新多行的考虑甚至不明显 - 如果表中的每行都有唯一的行,则将使用相同的 Searched Update 语句 emp_id 价值。但是,通过在不同行上复制电话号码,在使用定位更新语句执行更新时仍然可能会出现问题。如果我们遇到一行 emp_info 当从为完全不同目的而创建的游标中获取行时,程序可能会执行以下语句以允许用户更正无效的电话号码:
- update emp_info set emp_phone = :newphone
- where current of cursor_name;
This would be a programming error, since an experienced programmer would realize that multiple rows need to be updated in order to change an employee phone
number. Still, it is the kind of error that could easily occur in practice, and we would like to be able to create a constraint on the table that makes such an erroneous update impossible. It turns out that the best way to provide such a constraint
is to reconfigure the data items into different tables so as to eliminate the redundant copies of information. This is exactly
what is achieved during the process of normalization. We sum up the idea of an update anomaly in a definition that makes reference
to our intuitive understanding of the ER model.
这将是一个编程错误,因为经验丰富的程序员会意识到需要更新多行才能更改员工电话号码。尽管如此,这种错误在实践中很容易发生,我们希望能够在表上创建一个约束,使这种错误的更新不可能发生。事实证明,提供这种约束的最佳方法是将数据项重新配置到不同的表中,从而消除信息的冗余副本。这正是正常化过程中所实现的目标。我们在定义中总结了更新异常的概念,该定义参考了我们对 ER 模型的直观理解。
Definition:Update Anomaly.
定义:更新异常。A table T is subject to an update anomaly when changing a single attribute value for an entity instance or relationship instance represented in the table that may require that several rows of T be updated.
当更改表中表示的实体实例或关系实例的单个属性值时,表 T 会遇到更新异常,这可能需要更新 T 的几行。
A different sort of problem, known as the delete anomaly, is reflected by the following definition.
以下定义反映了一种不同类型的问题,称为删除异常。
Definition:Delete Anomaly, Insert Anomaly.
定义:删除异常,插入异常。A table T is subject to a delete anomaly when deleting some row of the table to reflect the disappearance of some instance of an entity or relationship that can cause us to lose information about some instance of a different entity or relationship that we do not wish to forget. The insert anomaly is the other face of this problem for inserts, where we cannot represent information about some entity or instance without including information about some other instance of an entity or relationship that does not exist.
当删除表的某些行以反映实体或关系的某些实例的消失时,表 T 会出现删除异常,这可能导致我们丢失我们不希望丢失的有关不同实体或关系的某些实例的信息忘记。插入异常是插入问题的另一面,在这种情况下,如果不包含有关不存在的实体或关系的其他实例的信息,我们就无法表示有关某些实体或实例的信息。
For example, assume that a skill possessed by an employee must be retested after five years to remain current for that employee.
If the employee fails to have the skill retested (and the skill_date column updated), the skill will drop off the emp_info list (an automatic process deletes the row with this emp_id and skill_id). Now consider what happens if the number of skills for some employee goes to zero in the emp_info table with columns of Figure 2.16: No row of any kind will remain for the employee! We have lost the phone number and the department the employee works in because of this delete! This is clearly inappropriate
design. The insert anomaly exists in the emp_info table because we cannot enter a new employee into the table until the employee has acquired some skill; thus it becomes impossible
to hire an employee trainee. Clearly this is just the other face of the delete anomaly, where information about an employee
is lost when the employee loses his or her last skill.
例如,假设员工拥有的一项技能必须在五年后重新测试才能保持该员工的最新技能。如果员工未能重新测试技能(并且 skill_date 栏已更新),该技能将下降 emp_info 列表(自动过程删除带有此的行 emp_id 和 skill_id )。现在考虑一下,如果某些员工的技能数量在 emp_info 包含图 2.16的列的表:不会为员工保留任何类型的行!由于这次删除,我们丢失了该员工的电话号码和所在部门!这显然是不恰当的设计。插入异常存在于 emp_info 表,因为在新员工获得某种技能之前我们无法将其输入表中;因此不可能雇用实习生。显然,这只是删除异常的另一面,当员工失去最后一项技能时,有关员工的信息就会丢失。
Let us jump ahead to a solution for some of the problems mentioned so far. We simply factor the emp_info table and form two tables, the emps table and the skills table, whose column names are listed in Figure 2.17. Notice that the emps table has a unique row for each emp_id (and emp_id is the key for this table), while the skills table has a unique row for each emp_id and skill_id pair, and this pair forms a key for the table. Since there are multiple skills associated with each employee, the emp_id column that we have included in the skills table acts as a foreign key, relating skills back to employees. When we form the natural join of these two tables, the result
is exactly the emp_info table we started with. (We will need to demonstrate this fact in what follows, but for now you should take it on faith.)
However, the delete anomaly is no longer a problem, since if we delete all rows corresponding to skills for any individual
employee, this merely deletes rows in the skills table; the emps table still contains the information we want to retain about the employee, such as emp_phone, dept_name, and the like.
让我们直接找到目前提到的一些问题的解决方案。我们简单地考虑一下 emp_info 表并形成两个表, emps 表和 skills 表,其列名如图2.17所示。请注意, emps 表中的每一个都有一个唯一的行 emp_id (和 emp_id 是该表的关键),而 skills 表中的每一个都有一个唯一的行 emp_id 和 skill_id 对,这对形成表的键。由于每个员工都有多种技能, emp_id 我们已包含在 skills 表充当外键,将技能与员工联系起来。当我们形成这两个表的自然连接时,结果正是 emp_info 我们开始的表。 (我们需要在下面证明这一事实,但现在您应该相信它。)但是,删除异常不再是问题,因为如果我们删除与任何单个员工的技能相对应的所有行,这仅仅删除中的行 skills 桌子;这 emps 表仍然包含我们想要保留的有关员工的信息,例如 emp_phone , dept_name ,等等。
FIGURE 2.17. The emp_info database with two tables.
图 2.17。这 emp_info 有两个表的数据库。
In the sections that follow we will learn how to perform normalization, to factor tables so that all anomalies are removed
from our representation. Note that we haven't yet achieved this with the tables of Figure 2.17; as we will see shortly, a number of anomalies still exist. We will need a good deal of insight into the details of the normalization
approach before we are able to properly deal with fundamental normalization concepts. In the following sections we present
some needed mathematical preliminaries to database normalization. Because it is not always possible to show a real-life application
for all these concepts as they are introduced, we ask the reader to be patient. The value of the concepts will become clear
in the end.
在接下来的部分中,我们将学习如何执行标准化、因子表,以便从我们的表示中删除所有异常。请注意,我们尚未通过图 2.17的表格实现这一目标;正如我们很快就会看到的,许多异常现象仍然存在。在我们能够正确处理基本的标准化概念之前,我们需要深入了解标准化方法的细节。在以下各节中,我们将介绍数据库规范化所需的一些数学基础知识。因为在介绍这些概念时并不总是能够展示它们在现实生活中的应用,所以我们要求读者耐心等待。这些概念的价值最终将变得清晰。
2.6. Functional Dependencies
2.6。功能依赖
A functional dependency (FD) defines the most commonly encountered type of relatedness property between data items of a database. We usually only
need to consider relatedness between column attributes of a single relational table, and our definition reflects this. We
represent rows of a table T by the notation r1, r2, … , and follow standard convention by referring to attributes, rather than columns, of the table T. Individual attributes
of a table will be represented by letters such as A, B, … , and the letters X, Y, … will refer to subsets of attributes. We
follow the notation that ri(A) represents the value of row ri at attribute A.
函数依赖(FD) 定义数据库数据项之间最常遇到的相关性属性类型。我们通常只需要考虑单个关系表的列属性之间的相关性,我们的定义反映了这一点。我们用符号 r 1 , r 2 , … 来表示表 T 的行,并通过引用表 T 的属性而不是列来遵循标准约定。表的各个属性将由字母表示,例如 A、 B, … 和字母 X, Y, … 表示属性的子集。我们遵循 r i (A) 表示属性 A 处 r i行的值的表示法。
Definition. 定义。
Given a table T containing at least two attributes designated by A and B, we say that A → B (read “A functionally determines B” or “B is functionally dependent on A”), if and only if it is the intent of the designer, for any set of rows that might exist in the table, that two rows in T cannot agree in value for A and disagree in value for B. A more formal way of saying this is: Given two rows r1 and r2 in T, if r1(A) = r2(A), then r1(B) = r2(B). We will usually try to use the less formal statement in what follows.
给定一个包含至少两个由 A 和 B 指定的属性的表 T,我们说 A → B(读作“A 在功能上决定 B”或“B 在功能上依赖于 A”),当且仅当这是设计者认为,对于表中可能存在的任何行集,T 中的两行 A 的值不能一致,B 的值不能一致。更正式的说法是:给定两行 r 1和 r 2 T,如果r 1 (A) = r 2 (A),则r 1 (B) = r 2 (B)。我们通常会在下文中尝试使用不太正式的陈述。
This definition is comparable to the definition of a function in mathematics: For every element in attribute A (which appears on some row), there is a unique corresponding element (on
the same row) in attribute B. See Figure 2.18 for a graphical representation of the functional dependency concept.
这个定义类似于数学中函数的定义:对于属性 A 中的每个元素(出现在某行上),属性 B 中都有一个唯一对应的元素(在同一行上)。参见图 2.18 的图形表示函数依赖的概念。
FIGURE 2.18. Graphical depiction of functional dependency.
图 2.18。函数依赖性的图形描述。
EXAMPLE 2.8 例2.8
In the emp_info table of Figure 2.16, the following functional dependencies hold:
在 emp_info 如图 2.16所示,存在以下函数依赖关系:
- emp_id → emp_name
- emp_id → emp_phone
- emp_id → dept_name
In ER terms, we know this is true because emp_id is an identifier for the Employee entity, and the other data items simply represent other attributes of the entity; once the entity is identified, all the other attributes follow. But we also recognize these facts intuitively.
用急诊室术语来说,我们知道这是真的,因为 emp_id 是一个标识符 Employee 实体,其他数据项仅代表该实体的其他属性;一旦实体被识别,所有其他属性都会随之而来。但我们也凭直觉认识到这些事实。If we saw two rows in the single table emp_info design of Figure 2.16 with the same emp_id value and different emp_phone values, we would believe that the data are corrupted (assuming that every employee has a unique phone), but if we saw two rows with the same emp_phone value and different emp_id values, our first thought would be that they represented different employees who shared a phone. But the two situations are symmetric; it is simply our understanding of the data that makes the first one seem to imply corrupted data. We look to emp_id to break ties and uniquely identify employees. Note that what we are saying implies that, while emp_id functionally determines emp_phone, emp_phone does not functionally determine emp_id. We sometimes express this second fact with this notation:
如果我们在单个表中看到两行 emp_info 与图2.16的设计相同 emp_id 价值和不同 emp_phone 值,我们会认为数据已损坏(假设每个员工都有唯一的电话),但如果我们看到两行具有相同的值 emp_phone 价值和不同 emp_id 值,我们的第一个想法是它们代表共享一部手机的不同员工。但这两种情况是对称的;只是我们对数据的理解使得第一个数据似乎暗示着损坏的数据。我们期待 emp_id 打破联系并唯一识别员工。请注意,我们所说的意味着,虽然 emp_id 功能上决定 emp_phone , emp_phone 不能从功能上决定 emp_id 。我们有时用这样的符号来表达第二个事实:
EXAMPLE 2.9 例2.9
Following are three tables to investigate for functional dependencies between attributes (note that some of the tables break the unique row rule, but we accept them as valid tables for purposes of illustration). In these tables we assume that it is the intent of the designer that exactly this set of rows should lie in each table—no changes will ever occur in the tables. Thus, we can determine what functional dependencies exist by examining the data. This is a very unusual situation. Normally we determine functional dependencies from understanding the data items and rules of the enterprise (e.g., each employee has a single phone number, employees can share a phone, etc.), as in Example 2.8. These rules exist before any data have been placed in the tables.
以下是用于调查属性之间功能依赖关系的三个表(请注意,某些表违反了唯一行规则,但出于说明目的,我们接受它们作为有效表)。在这些表中,我们假设设计者的意图是这组行应该准确地位于每个表中,表中永远不会发生任何更改。因此,我们可以通过检查数据来确定存在哪些函数依赖关系。这是一个非常不寻常的情况。通常我们通过了解企业的数据项和规则来确定功能依赖性(例如,每个员工都有一个电话号码,员工可以共享一部电话等),如示例 2.8所示。这些规则在任何数据放入表之前就存在。
T1 T2 T3 Row # 排 # A B A B A B 1 x1 y1 x1 y1 x1 y1 2 x2 y2 x2 y4 x2 y4 3 x3 y1 x1 y1 x1 y1 4 x4 y1 x3 y2 X3 y2 5 x5 y2 x2 y4 X2 y4 6 x6 y2 x4 y3 X4 y4 In table T1 we can easily see that A → B; we merely need to check that for every pair of rows r1 and r2, if r1(A) = r2(A), then r1(B) = r2(B). However, there is no pair of rows in T1 with equal values for column A, so the condition is trivially satisfied. At the same time, in T1, B ↛ A (read “column B does not functionally determine column A”), since, for example, if r1 is row 1 and r2 is row 3, then r1(B) = r2(B) = y1, but r1(A) = x1 ≠ r2(A) = x3. In table T2, we have A → B (we just need to check that rows 1 and 3, which have matching pairs of A values, also have matching B values, and similarly check rows 2 and 5), and B → A. Finally, in table T3, A → B but B ↛ A (note that if r1 is row 2 and r2 is row 6, then r1(B) = r2(B) = y4, but r1(A) = x2 ≠ r2(A) = x4).
在表T1中我们很容易看出A→B;我们只需要检查每对行 r 1和 r 2 ,如果 r 1 (A) = r 2 (A),则 r 1 (B) = r 2 (B)。然而,T1 中不存在 A 列具有相同值的行对,因此条件基本满足。同时,在 T1 中,B ↛ A(读作“B 列在功能上不能确定 A 列”),因为,例如,如果 r 1是第 1 行,r 2是第 3 行,则 r 1 (B) = r 2 (B) = y1,但 r 1 (A) = x1 ≠ r 2 (A) = x3。在表 T2 中,我们有 A → B(我们只需要检查第 1 行和第 3 行是否有匹配的 A 值对,也有匹配的 B 值,并类似地检查第 2 行和第 5 行),以及 B → A。 ,在表 T3 中,A → B 但 B ↛ A(请注意,如果 r 1为第 2 行且 r 2为第 6 行,则 r 1 (B) = r 2 (B) = y4,但 r 1 (A) = x2 ≠ r 2 (A) = x4)。
It is obvious how to extend the definition for functional dependency to its full generality, dealing with sets of attributes.
显然如何将函数依赖的定义扩展至其完整的通用性,处理属性集。
Definition. 定义。
We are given a table T with two sets of attributes, designated by X = A1 A2, …, Ak and Y = B1 B