Chapter 6: Database Design Using the ER Model

In this chapter, we focus on conceptual design.

Overview of Design Process

Design process consists of two phases:

Analyzing of requirements
- What data should be stored
- What operations/transaction are conducted such as insert, delete, and update
Designing of DB schemas, with the three-level data abstract
- conceptual design
- logical design
- physical design

flowchart TD
    A[Application problems in real words] -->|Requirements analysis| B[Specification of user requirements]
    B -->|<span style="color:red;">Conceptual DB design</span>| C[DB conceptual schema, i.e. ER diagram]
    C -->|Logical DB design| D[DB logical schema, i.e. relational data schema]
    D -->|Physical DB design| E[DB physical schema, e.g. physical storage structure]

Figure 1: DB design phases

The lifecycle of Database Application System (DBAS) design consists of five phases: project planning, requirements analysis, system design, implementation and deployment, and operation management and maintenance.

Based on the composition and functionality of the software, the design is divided into three main lines: data organization and storage design, data access and processing design, and application design, which corresponds to the design of the database, database transactions, and applications, respectively.

According to the three-level schema structure of the database system, the design phase is divided into three steps: conceptual design, logical design, and physical design.

The Entity-Relationship Model

ER Model employs three concepts: entity sets, relationship sets, and attributes. ER model has an associated diagrammatic representation, the ER diagram, expressing the overall logical structure of DB graphically.

Entity Sets

Def. An entity is an object that exists distinguishable from other objects. e.g. a specific person.

Def. An entity set is a set of entities of the same type, sharing the same properties. e.g. group of persons

Note: an entity is represented by a set of attributes (descriptive properties of an entity set).

Def. A subset of the attributes form a primary key

Note: an entity consists of all values of its all attributes.

Relationship Sets

Def. A relationship is an association among several entities.

Def. A relationship set is a mathematic relation among \(n\) entities (\(n \geq 2\)) taken from entity sets \(\{(e_1, e_2, \cdots, e_n) \mid e_1 \in E_1, \cdots, e_n \in E_n\}\)

Def. A relationship instance represents an association between the named entities in the real-word enterprise.

Def. An attribute can also be associated with a relationship set.

For example, the advisor relationship set between instructor and student has an attribute date when student started being associated with the advisor

Def. A role of entity is a function that an entity plays in a relationship.

Note: entity sets of a relationship need not be distinct.

Def. Binary relationship involves two entity sets (degree two)

Complex Attributes

Def. The domain or values set of the attribute is the set of permitted values.

Attribute types:

simple and composite attributes
single-valued and multi-valued attributes
derived attributes (can be computed from other attributes, e.g. age, given date_of_birth)

Null value for a attribute means

the attribute not applicable for the entity, not existing
the value for the attribute exists, but is unknown

Composite attributes allow us to divide attributes into subparts. For example

flowchart TD
    A[name] --> B[first_name]
    A --> C[middle_name]
    A --> D[last_name]

Figure 2: Composite attributes example

Redundant Attributes

Suppose we have entity sets:

instructor, with attributes: ID, name, dept_name, salary
department, with attributes: dept_name, building, budget

Model: each instructor is in a department using relationship set inst_dept.

The attribute dept_name appears in both entity sets. Since it is the primary key for the entity set department, it replicates information present in the relationship. It is therefore redundant in the entity set instructor.

Mapping Cardinalities Constraints

Semantic constraints, as integrity constraints, to maintain consistency
Integrity constraints on E or R
- Mapping cardinalities
- Participation constraints
- Cardinality limits
- Keys

Mapping Cardinalities

Def. Mapping Cardinalities are the number of entities to which another entity can be associated via a relationship set.

Def. For a binary relationship set \(R\), from \(A\) to \(B\), the mapping cardinality must be: one to one, one to many, many to one, and many to many.

One to One

A student is associated with at most one instructor via the relationship advisor.

Figure 3: One to one example

One to Many

An instructor is associated with several (including \(0\)) students via advisor. A student is associated with at most one instructor via advisor.

Figure 4: One to many example

Participate Constraints

Def. The participation of an entity \(E\) in a relationship \(R\) is total, if every entity in \(E\) participates in at least one relationship in \(R\).

Def. The participation of an entity \(E\) in a relationship \(R\) is partial, if some entities in \(E\) may not participate in any relationship in \(R\).

Cardinality Limits for Participation

Def. Cardinality limits express quantitative constraints on participation.

Explicit, recommended!

Figure 5: Cardinality limits example

A maximum value * indicates no limit

Primary Keys

Def. Key is a set of attributes (from an entity set or a relationship set). The values of these attributes in one entity to uniquely distinguish this entity from others and the values of these attributes in one relationship to uniquely identify the relationship.

Keys for Entity Set

Def. A super key of an entity set is a set of one or more attributes, whose values uniquely determine each entity in the entity set. The super key may contain extraneous attributes.

Def. A candidate key is the minimal super key (non-redundant super key).

Def. The primary key is a candidate key chosen as the principal means of identifying entities within an entity set. Although several candidate keys may exists, one of the candidate keys is selected to be the primary key.

Keys for Relationship Sets

Def. Keys for relationship sets \(R\) on entities \(E_1\), \(E_2\), \(\cdots\), \(E_n\), uniquely distinguish each relationship instances.

Def. The super keys for \(R =\) \(E_1\)’s primary key \(\cup \cdots \cup E_n\)’s primary key.

Note: if the attribute names of primary keys are not unique, the attributes with the same names should be renamed.

Def. The candidate keys for \(R\) (minimal, non-redundant super keys).

Choice of Primary Key for Binary Relationship

The candidate keys or primary key for \(R\) among entity sets \(A\) amd \(B\) can be decided, in accordance with the mapping cardinality of \(R\), as follows

If \(R\) is many to many, \(R\)’s primary key \(=\) \(A\)’s primary key \(\cup\) \(B\)’s primary key
If \(R\) is many to one from \(A\) to \(B\), \(R\)’s primary key \(=\) \(A\)’s primary key (many side)
If \(R\) is one to one, \(R\)’s primary key \(=\) \(A\)’s primary key or \(B\)’s primary key

Weak Entity Sets

Def. An entity set that does not have a primary key is a weak entity set

Consider section entity set, identified by course_id, semester, year, sec_id. Entities in section are related to entities in course.

Suppose a relationship set sec_course between entity sets section and course
The information in sec_course is redundant, since section already has attribute course_id, which identifies the course with which the section is related
To deal with redundancy is to get rid of the relationship sec_course; however, relationship between section and course becomes implicit

We can introduce weak entity set

Not store the attribute course_id in the section entity, but only store the remaining attributes section_id, year, and semester
However, the entity set section does not have enough attributes to identify a particular section entity uniquely
To deal with this problem, we treat relationship sec_course as a special relationship that provides extra information (section is modeled as a weak entity set).

Figure 6: Weak entity set example

Def. A weak entity set called identifying entity is one, whose existence is dependent on another entity.

Def. Identifying relationship is the relationship associating the weak entity set with the identifying entity set.

A weak entity set is depicted via a double rectangle. Underline the discriminator of a weak entity set with a dashed line. The relationship set connecting weak entity set to identifying string entity set is depicted by a double diamond

Def. The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all those entities in weak entity set that depending on a particular strong entity.

Note: use identifying entity, along with extra attributes called discriminator to uniquely identify a weak entity, instead of a primary key.

Every weak entity must be associated with an identifying entity; weak entity set is existence dependent on identifying entity set.

Removing Redundant Attributes

Suppose we have entity sets:

student, with attributes: ID, name, tot_cred, dept_name
department, with attributes: dept_name, building, budget
Each student has an associated department using a relation set stud_dept

The attribute dept_name in student replicates information present in relationship. It is redundant and needs to be removed. But when converting back to tables, the attribute should be reintroduced.

Reducing ER Diagrams to Relation Schemas

Logical database design with steps: generation of initial relational schema from the ER diagram, and normalization of relational schema. Converting an ER diagram to a table format is the basis for deriving a relational database design (conceptual design to logical design).

A database which conforms to an ER diagram can be represented by a collection of schemas/tables

Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database
- there is a unique schema corresponding to either entity set or relationship set
Each table has a number of columns, corresponding to their attributes

A strong entity set reduces to a schema with the same attributes; a weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set

Composite attributes are flattened by separate attributes.

Representing Relationship Sets

Reduction of a relationship set into tables are strongly dependent on the mapping cardinality constraint and total/partial constraints.

A many to many relationship sets is represented as an independent table with columns for the primary keys of two participating entity sets, and any descriptive attributes of the relationship set
Many to one and one to many relationship sets that total on the many side can be represented by adding an extra attribute (the primary key of the one side) to the many side.
Many to one relationship sets with partial participation on the many side. The relationship is reduced to an independent table, avoiding null values
For one to one relationship sets, either side can be chosen to act as the many side

Extended ER Features

Specialization
- Top-down design process
- We designate sub-groupings within an entity set that are distinctive from other entities
- These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do apply to the higher-level entity set
- Depicted by a triangle component labeled ISA
Attribute inheritance
- A lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked
- A higher-level entity’s attributes and relationship can apply to all of its lower-level entities
Generalization
- Bottom-up design process
  - Combine a number of entity sets that share the same features into a higher-level entity set

Specialization and generalization are simple inversion of each other. They are represented in an ER diagram in the same way. The terms specialization and generalization are used interchangeably.

Superclass and Subclass

The ISA relationship, or containment, is also referred to as superclass-subclass relationship.

For the entity sets \(A\) and \(B\), if \(A\) is the generalization of \(B\), then \(A\) is the superclass of \(B\), and \(B\) is the subclass of \(A\).

Completeness Constraint

Completeness constraint specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.

Total: an entity must belong to one of the lower-level entity sets
Partial: an entity need not belong to one of the lower-level entity sets

We can specify total generalization by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head (for a total generalization), or to the set of hollow arrow-heads (for an overlapping generalization).

What’s more, constraints on whether or not a higher-level entity may belong to more than one lower-level entity set within a single specialization

Disjoint: a higher-level entity can belong to only one lower-level entity set
Overlapping: a higher-level entity can belong to more than one lower-level entity sets

Representing Specialization via Schemas

Method 1
- Form a schema for the higher-level entity
- Form a schema for each lower-level entity set, including primary key of higher-level entity set and local attributes.
Method 2
- Form a schema for each entity set with all local and inherited attributes
Method 3
- If specialization is disjoint and total, creating two tables
- \(T_1(a_1, a_2, \cdots, a_m, c_1, c_2, \cdots, c_k, d_1, d_2, \cdots, d_j)\)
- \(T_2(b_1, b_2, \cdots, b_n, c_1, c_2, \cdots, c_k, d_1, d_2, \cdots, d_j)\)
- \(a_1\), \(a_2\), \(\cdots\), \(a_m\) are attributes of one lower-level entity set, \(b_1\), \(b_2\), \(\cdots\), \(b_n\) are attributes of the other lower-level entity set, and \(c_1\), \(c_2\), \(\cdots\), \(c_k\), \(d_1\), \(d_2\), \(\cdots\), \(d_j\) are all attributes of higher-level entity set.

ER Design Issues

Not to use the primary key of an entity set as an attribute of another entity set (to represent implicitly association between two entity sets), it is better to use an relationship set to explicitly show this association.
If \(E\) participate in \(R\), not to designate the primary key (\(E\)) as attributes of \(R\), to avoid information redundancy
Possible guideline is to designate a relationship set to describe an action that occurs between entities
Prefer to using the binary relationship set, replace a non-binary relationship set by a number of binary relationship set

Converting Non-Binary Relationships to Binary Form

In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set \(E\).

For example, a 3-ary relationship set \(R = \{(a_i, b_i, c_i) \mid a_i \in A, b_i \in B, c_i \in C\}\) among entity sets \(A\), \(B\), \(C\), replace \(R\) by an entity set \(E = \{e_{i}\}\), and three binary relationship set \(R_a\), \(R_b\), \(R_c\).

\(E = \{e_i\}\), \(|E| = |R|\), i.e. each \((a_i, b_i, c_i)\) in \(R\) corresponds to one \(e_i\) in \(E\)
\(R_a = \{(e_i, a_i) \mid e_i \in E, a_i \in A\}\), relating \(E\) and \(A\)
\(R_b = \{(e_i, b_i) \mid e_i \in E, a_i \in B\}\), relating \(E\) and \(B\)
\(R_c = \{(e_i, c_i) \mid e_i \in E, a_i \in C\}\), relating \(E\) and \(C\)
\(E\) has an identifying attribute \(e\) (candidate key) to distinguish each \(e_i\) in \(E\)
All attributes of \(R\) are assigned to \(E\).

Summary of Symbols Used in ER Notation

Figure 7: Summary 1

Figure 8: Summary 2

Figure 9: Summary 3

Figure 10: Alternative ER notations

例题

A database used in an ordering system contains information about customers, products, and orders. The following information is to be included:

for each customer
- customer number, ship to address (several per customer), name, balance, discount
for each order
- order number, customer number, ship to address (one per order), date of ordering (including year, month, and day), products (several kinds per order), quantity ordered of each_product
for each kind of product
- product number, name, manufacturing plant, quantity on hand
entity sets: customer, order, product, and their attribute
- in customer, ship to address is a multi-value attribute, but in order, ship to address is a single-value attribute
- in order, date is a composite attribute

Some constraints about this ordering system are as follows

a customer may take one or more orders, and some customers may take no orders
each order must be for one (and only one) customer
each order must contain at least one kind of products
- a kind of product may appear in several orders, but some kinds of products may not be ordered by any customers
relationship sets
- make between customer and order
  - customer partly participates make, and order totally participates make
  - in relationship set make, mapping cardinality from customer to order is one to many
- contain between order and product, and descriptive attributes quantity of ordered for relationship set contain
  - product partly participates contain, and order totally participates contain
  - in relationship set contain, mapping cardinality from order to product is many to many

ER diagram of the ordering DB is shown as follows:

Figure 11: ER diagram

The ER diagram is then reduced to some relational tables. For entity set customer and its multi-value attribute ship to address, there are two tables

customer(C-number, c-name, discount, balance)
ship-to-address(C-number, ship-to-address)

For entity set order and its composite attribute date which should be decomposed, there is a table

order1(O-number, ship-to-address, year, month, day)

For entity set product, there is a table

product(P-number, p-name, plant, quantity-on-hand)

For relationship set make, there no descriptive attributes, mapping cardinality from customer to order is one to many, and order totally participates make. So, make should not be reduced into a separate table, and can be represented by adding the primary key C-number of customer, that is C-number into the table order

order2(O-number, C-number, ship-to-address, year, month, day)

product at the many side only partly participate relationship set contain, contain should also be represented as a separate table

contain(O-number, P-number, quantity of ordered)

数据库系统原理

Chapter 6: Database Design Using the ER Model

https://ddccffq.github.io/2025/12/20/数据库系统原理/Database_Design_Using_the_ER_Model/

作者

ddccffq

发布于

2025年12月20日

许可协议

Chapter 16: Query Optimization 上一篇

Chapter 7: Relational Database Design: Schema Normalization 下一篇