Chapter 7: Relational Database Design: Schema Normalization

*Features of Good Relational Design

flowchart TD
    A[Application problems in real words] -->|Requirements analysis| B[Specification of user requirements]
    B -->|Conceptual DB design| C[DB conceptual schema, i.e. E-R diagram]
    C -->|<span style="color:red;">Logical DB design</span>| D[DB logical schema, i.e. relational data schema]
    D -->|Physical DB design| E[DB physical schema, e.g. physical storage structure]

Figure 1: DB design phases

Example: combined schemas with pitfalls. Suppose to combine two tables, many to one mapping

  • instructor(ID, name, salary, dept_name)
  • department(dept_name, building, budget)

into

  • ins_dept(ID, name, salary, dept_name, building, budget)

Pitfalls when

  • adding a new instructor. Data for some attributes may be repeated for the instructors (building), making space being wasted
  • if the department Comp.Sci is canceled, all instructors in it should then be removed. This means every tuples containing the Comp.Sci department should be deleted
  • information redundancy complicates updating (introducing possibility of inconsistency of the budget values)
    • e.g. change budget of Comp.Sci department from \(100000\) to \(120000\) and every tuples belonging to Comp.Sci has to be updated
  • to represent directly information about a new opened department in which there no exists instructors, a tuple containing null values, such as (null, null, null, Soft.Eng, Taylor, 100000). But null values in DB will complicate data handling in DBS

The above example introduces two problems: when should we combine relations and when should we decompose relations.

Assume if there is a schema (dept_name, building, budget), then dept_name would be a candidate key. We denote dept_name \(\to\) building, budget as a functional dependency (FD).

Not all decompositions are good, for example, suppose we decompose employee(ID, name, street, city, salary) into employee1(ID, name) and employee2(name, street, city, salary). This is a lossy decomposition. If there exists two persons with the same name, obviously when we use NATURAL JOIN will generate additional tuples.

*Lossless Decomposition

Def. Let \(R\) be a relation schema, and \(R_1\) and \(R_2\) form a decomposition of \(R\). That is \(R = R_1 \cup R_2\). The decomposition is a lossless decomposition, if there is no loss of information by replacing the schema \(R\) with two relation schemas \(R_1 \cup R_2\).

Def. The decomposition is a lossy decomposition if \(r(R) \subset \Pi_{R_1} (r(R)) \Join \Pi_{R_2} (r(R))\)

When certain decomposition are lossless? For \(R = (R_1, R_2)\), require that for all possible relations \(r\) on schema \(R\). \(r(R) = \Pi_{R_{1}} (r(R)) \Join \Pi_{R_2}(r(R))\)

Def. A decomposition of \(R\) into \(R_1\) and \(R_2\) is lossless if at least one of the following dependencies is in \(F^{+}\): \(R_1 \cap R_2 \to R_1\) and \(R_1 \cap R_2 \to R_2\).

If only \(R_1 \cap R_2 \to R_1\) holds, then \(R_1 \cap R_2\) is the primary key of \(R_1\), and the foreign key of \(R_2\)

Example, consider the schema: in_dep(ID, name, salary, dept_name, building, budget). Decompose it into the instructor and department schemas: instructor(ID, name, dept_name, salary) and department(dept_name, building, budget).

Consider the intersection of these two schemas, which is dept_name. Because of dept_name \(\to\) building, budget, the lossless decomposition is satisfied.

*Functional Dependencies

Def. An instance of a relation that satisfies all such real world constraints is called a legal instance of relation.

Def. Functional dependency (FD). Constraints on the legal relations that the value for a certain set of attributes determines uniquely the values for another set of attributes.

A functional dependency is a generalization of the notion of a key.

Def. For a schema \(R\) and \(\alpha \subseteq R\), \(\beta \subseteq R\), a relation instance \(r(R)\) satisfies a FD \(\alpha \to \beta\), if for pairs of tuples \(t_i\), \(t_j\) \(\in r(R)\) such that \(t_{i}[\alpha] = t_{j}[\alpha] \to t_{i}[\beta] = t_{j}[\beta]\).

Def. \(r(R)\) is legal under FD \(\alpha \to \beta\), if \(\alpha \to \beta\) is satisfied by \(r(R)\)

For example, given relation \(r(R)\) shown below, which FD is satisfied by \(r\)? (according to \(t_{i}[\alpha] = t_{j}[\alpha] \to t_{i}[\beta] = t_{j}[\beta]\), the answer is C)

  • A. \(A \to B\)
  • B. \(AC \to B\)
  • C. \(BC \to A\)
  • D. \(B \to C\)
\(t\) \(A\) \(B\) \(C\)
\(t_1\) \(1\) \(4\) \(2\)
\(t_2\) \(3\) \(5\) \(6\)
\(t_3\) \(3\) \(4\) \(6\)
\(t_4\) \(7\) \(3\) \(8\)
\(t_5\) \(9\) \(1\) \(0\)

Def. Let \(R\) be a relation schema \(\alpha \subseteq R\) and \(\beta \subseteq R\). The FD holds on \(R\), if every legal instance \(r(R)\) satisfies \(\alpha \to \beta\)

If \(\alpha \to \beta\) holds on \(R\), then every legal \(r_i(R)\) satisfies this \(R\), but for schema \(R\) and \(\alpha \to \beta\), if only some \(r_i(R)\) satisfies \(R\), \(\alpha \to \beta\) may not holds on \(R\)

Example of modeling FD. Consider $R = $(employee_ID, turnover_per_day, department_name, manager). It is assumed that

  • every day, each employee has only one turnover per day
  • each employee works at only one department
  • each department is managed by only one manager

All the functional dependencies on \(R\) are

  • employee_ID, date \(\to\) turnover_per_day
  • employee_ID \(\to\) department__name
  • department_name \(\to\) manager

*Keys and Functional Dependencies

Def. \(K\) is a super key for relation schema \(R\) if and only if \(K \to R\).

Def. \(K\) is a candidate key for \(R\) if and only if \(K \to R\) and for no \(\alpha \subset K\), \(\alpha \to R\).

For example, consider the schema: in_dep(ID, name, salary, dept_name, building, budget). We expect FD to hold: dept_name \(\to\) building, or ID \(\to\) building, but not to hold dept_name \(\to\) salary.

FD as Integrity Constraints in DB

While ER diagram describe objects and associations among them, FD illustrates relationships among attributes of objects.

How to guarantee FD in DBS? Integrity mechanisms in DBMS, i.e. keys, checks, and triggers

Use of Functional Dependencies

  1. Test relations to see if they are legal. If a relation \(r\) is legal under an FD set \(F\), we say that \(r\) satisfies \(F\)
  2. Specify constraints on the set of legal relations. \(F\) holds on \(R\) if all legal relations on \(R\) satisfy the FD set \(F\).

Note: A specific instance of a relation schema may satisfy a FD in \(F\) even if the FD does not hold on all legal instances.

Trivial Functional Dependencies

Def. A functional dependency (FD) \(\alpha \to \beta\) is trivial if \(\beta \subseteq \alpha\).

This means that the attributes on the right-hand side of the FD are already included in the attributes on the left-hand side. As a result, \(\alpha \to \beta\) is satisfied by all possible instances of the relation schema, regardless of the data.

Def. Primary attribute (compared to primary key) is the attribute appears in at least one candidate key.

Transitive Dependency

Def. A functional dependency \(\alpha \to \gamma\) is transitive, if

  • \(\alpha \to \beta\) holds on, but \(\beta \to \alpha\) does not hold
  • \(\beta \to \gamma\) holds on, and \(\gamma \cap \alpha = \phi\)
  • \(\gamma\) is called transitive dependent on \(\alpha\)

*Partial Dependency

Def. A FD \(\alpha \to \beta\) is partial or \(\beta\) is partially dependent on \(\alpha\) if there is a subset \(\gamma\) of \(\alpha\), i.e. \(\gamma \subset \alpha\), such that \(\gamma \to \beta\)

For example, consider Student(sno, sname, address, depart). Partial dependency (sno, sname) \(\to\) address.

*Closure of FD Set

Def. Use the notation \(F^{+}\) to denote the closure of the FD set \(F\). The set of all FDs that can be inferred given the FD set \(F\). \(F^{+}\) contains all of FDs in \(F\).

For example, \(F^{+} = \{A \to B, B \to C\}^{+} = \{A \to B, B \to C, A \to C, AB \to B, AC \to BC, \cdots\}\)

Normal Forms

关系模式规范化处理的基本要求为:

  • 静态关系具有第一范式形式
  • 动态关系最好具有 3NF 和 BCNF 形式
flowchart TD
   A[1NF] --> |消除非主属性对键的部分函数依赖| B[2NF]
   B --> |消除非主属性对键的传递函数依赖| C[3NF]
   C --> |消除主属性对键的部分和传递函数依赖| D[BCNF]
Figure 2: The relations between each norm

Goals of Normalization

Let \(R\) be a relation schema with a FD set \(F\) (decide whether a relation schema \(R\) is in good form). When a relation schema \(R\) is not in good form, decompose it into \(\{R_1, R_2, \cdots, R_n\}\) such that

  • each relation schema is in good form, e.g. in 2NF, 3NF, BCNF
  • the decomposition is a lossless decomposition (mentioned above)
  • the decomposition should be dependency preserving

Atomic Domains and First Normal Form

Def. A relational schema \(R\) is in first normal form if the domains of all attributes of \(R\) are atomic.

Domain is atomic if its elements are indivisible units.

Examples of non-atomic domains: names and composite attributes

Def. Atomicity is actually a property of how the elements of the domain are used. For example, strings would normally be considered indivisible.

Students are given roll numbers which are strings of CS0012 or EE1127. If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.

*Second Normal Form

Def. A relation schema \(R\) is in 2NF with respect to a FD set \(F\), if \(R\) is in 1NF, and each attribute \(A\) meets one of the criteria

  • It appears in a candidate key (a primary attribute)
  • It is (not partially) dependent on a candidate key

Here is an example about 2NF recognition. Consider SLC(S#, SDpt, SLocation, C#, Grade). The candidate key is {S#, C# }. The FDs are

  • (S#, C#) \(\to\) Grade
  • (S#, C#) \(\to\) SDpt
  • (S#, C#) \(\to\) SLocation
  • SDpt \(\to\) SLocation
  • S# \(\to\) SDpt
  • S# \(\to\) SLocation

This schema is not in 2NF, because

  • For non-primary attribute SDpt, there exists S# \(\to\) SDpt, so SDpt is partially dependent on key
  • For non-primary attribute SLocation, there exists S# \(\to\) SLocation, so SLocation is partially dependent on key
  • For non-primary attributes SDpt and SLocation, there exists SDpt \(\to\) SLocation

*BCNF

Def. A relation schema \(R\) is in BCNF with respect to a FD set \(F\), if for all functional dependencies in \(F^{+}\) of the form \(\alpha \to \beta\) where \(\alpha \subseteq R\) and \(\beta \subseteq R\), holds at least one:

  • \(\alpha \to \beta\) is trivial
  • \(\alpha\) is a super key

Example schema that is not in BCNF. Consider in_dept(ID, name, salary, dept_name, budget). Because dept_name \(\to\) building, budget holds on in_dep but dept_name is not a super key. Decompose in_dep into instructor and department as follow are both BCNF.

  • instructor(ID, name, salary, dept_name)
  • department(dept_name, building, budgets)

Decompose a Schema into BCNF

Let \(R\) be a schema \(R\) that is not in BCNF. Let \(\alpha \to \beta\) that causes violation of BCNF.

We decompose \(R\) into:

  • \((\alpha \cup \beta)\)
  • \((R - (\beta - \alpha))\)

*BCNF and FD Preservation

Constraints on FDs, are costly to check in practice, unless they pertain to only one relation.

If test only those FDs on each individual relation of a decomposition to ensure all FDs hold, then that decomposition is dependency preserving.

Note: because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as the 3NF.

Consider a schema dept_advisor(s_ID, i_ID, dept_name) with \(F\): i_ID \(\to\) dept_name and s_ID, dept_name \(\to\) i_ID. Two candidate keys are {s_ID, dept_name} and {s_ID, i_ID}. dept_advisor is not in BCNF because i_ID is not a super key. If we need to decompose dept_advisor into two BCNF, any decomposition will not include all the attributes in s_ID, dept_name \(\to\) i_ID. Thus, the decomposition not be dependency preserving.

The FD can only be checked by the join of the decomposed relations.

For example, consider \(R = (A, B, C)\) and \(F = \{A \to B, B \to C\}\).

  • Decomposition1: \(R_1 = (A, B)\), \(R_2 = (B, C)\)
    • Lossless join decomposition \(R_1 \cap R_2 \to R_2 \in F^{+}\)
    • Dependency preserving
  • Decomposition2: \(R_1 = (A, B)\), \(R_2 = (A, C)\)
    • Lossless join decomposition \(R_1 \cap R_2 \to R_1 \in F^{+}\)
    • Not dependency preserving

*Third Normal Form

Def. A relation schema \(R\) is in third normal form (3NF) if for all: \(\alpha \to \beta\) in \(F^{+}\) at least one of the following holds:

  • \(\alpha \to \beta\) is trivial
  • \(\alpha\) is a super key for \(R\)
  • Each attribute in \(\beta - \alpha\) is contained in a candidate key

If a schema is in BCNF, it is also in 3NF.

Def. A schema \(R\) is in 3NF with respect to a FD set, if there are no non-primary attributes \(A\) for which \(A\) is transitive dependent on a key

Consider a schema: dept_advisor(s_ID, i_ID, dept_name) with FDs: i_ID \(\to\) dept_name and s_ID, dept_name \(\to\) i_ID. Two candidate keys are {s_ID, dept_name} and {s_ID, i_ID}. dept_advisor is not in BCNF, however, \(R\) is in 3NF.

  • s_ID, dept_name is a super key
  • i_ID \(\to\) dept_name and i_ID is not a super key, but
    • {dept_name} - {i_ID} = {dept_name} and dept_name is contained in a candidate key

Comparison of BCNF and 3NF

  • Advantages to 3NF over BCNF. It is always possible to obtain a 3NF design without sacrificing losslessness or dependency preservation.
  • Disadvantage to 3NF
    • We may have to use null values to represent some of the possible meaningful relationships among data items
    • There is the problem of repetition of information

Functional Dependency (FD) Theory

  • The theory shows which FDs are implied logically by a given set of FDs. (Armstrong’s Axioms)
  • Algorithms generating lossless decompositions into BCNF and 3NF.
  • Algorithms testing if decomposition is dependency-preserving.

给定关系模式 \(R(A, B, C)\) 和关系 \(r(R)\),利用 SQL 语句判断关系 \(r(R)\) 是否满足函数依赖 \(B \to C\)

要求:如果 \(r(R)\) 不满足该函数依赖,找出导致不满足的 \(r(R)\) 中的元组;设计一个断言,以保证在 \(R(A, B, C)\) 上成立 \(B \to C\)

方法:1. 使用元组变量,2. 使用 GROUP BY。如果查询结果为 nullnot exist,则函数依赖成立。

参考 SQL 语句:

1
2
CREATE ASSERTION B_to_C CHECK
(NOT EXISTS (SELECT B FROM R GROUP BY B HAVING COUNT (DISTINCT C) > 1))

What’s logical implication between FDs? Given an FD set \(F\), there are other FDs \(f\) that logically implied by \(F\)

  • Example: if \(A \to B\) and \(B \to C\), then infer that \(A \to C\).

Def. Given a schema \(R\), a FD \(f\) is logically implied by a FD set \(F\), if every instance \(r(R)\) that satisfies \(F\) also satisfies \(f\)

  • Example: \(\{A \to B, B \to C\}^{+} = \{A \to B, B \to C, A \to C, \cdots\}\)

Def. The set of all FDs logically implied by \(F\) is the closure of \(F\).

  • Denote the closure of \(F\) by \(F^{+}\).
  • \(F^{+} = \{f \mid f \text{ is logically implied by } F\}\)

*Closure of FDs

Compute \(F^{+}\), the closure of \(F\), by repeatedly applying Armstrong’s Axioms:

  • Reflexive rule: if \(\beta \subseteq \alpha\), then \(\alpha \to \beta\).
  • Augmentation rule: if \(\alpha \to \beta\), then \(\gamma \alpha \to \gamma \beta\).
  • Transitivity rule: if \(\alpha \to \beta\), and \(\beta \to \gamma\), then \(\alpha \to \gamma\).

These rules are

  • Sound: generate only functional dependencies that actually hold.
  • Complete: generate all functional dependencies that hold.

Additional rules:

  • Union rule: if \(\alpha \to \beta\) holds and \(\alpha \to \gamma\) holds, then \(\alpha \to \beta \gamma\) holds.
  • Decomposition rule: if \(\alpha \to \beta \gamma\) holds, then \(\alpha \to \beta\) holds and \(\alpha \to \gamma\) holds.
  • Pseudo-Transitivity rule: if \(\alpha \to \beta\) holds and \(\gamma \beta \to \delta\) holds, then \(\alpha \gamma \to \delta\) holds.

“holds” 理解为”满足“

Here is an example of computing \(F^{+}\) with using rules mentioned above: \(R = (A, B, C, G, H, I)\), \(F = \{A \to B, A \to C, B \to H, CG \to H, CG \to I\}\).

Some members of \(F^{+}\). \(\{A \to H\} \subseteq F^{+}\) (transitivity rule), \(\{AG \to I\} \subseteq F^{+}\) (augmentation rule and transitivity rule), \(\{CG \to HI\}\) (augmentation rule).

*Closure of Attribute Sets

Def. An attribute \(B\) is functionally determined by \(\alpha\) under \(F\) if \(\alpha \to B\) under \(F\)

  • \(\alpha | F \to B\)

Def. Given a set of attributes \(\alpha\), define the closure of \(\alpha\) under \(F\) (denoted by \(\alpha^{+}\)) as the set of attributes that are functionally determined by \(\alpha\) under \(F\).

  • \(\alpha^{+} = \{\beta \mid \beta \text{ is functionally determined by } \alpha \text{ under } F\} = \{\beta \mid \alpha | F \to \beta\}\)

Algorithm: compute \(\alpha^{+}\), the closure of \(\alpha\), under \(F\).

  • Input: \(\alpha\), \(F\)
  • Output: \(\alpha^{+}\)
  • Result \(:= \alpha\)

The process is as follow:

1
2
3
4
5
while (changes to result) do
for each beta_to_gamma in F do
begin
if beta subseteq result then result := result cup gamma
end

\(:=\) 在数学定义里表示“被定义为”

Here is an example of computing attribute set closure. We suppose \(R = (A, B, C, G, H, I)\), \(F = \{A \to B, A \to C, CG \to H, CG \to I, B \to H\}\). Compute \((AG)^{+}\).

  1. \(res = AG\)
  2. \(res = ABCG\) (\((A \to C)\) and \((A \to B)\))
  3. \(res = ABCGH\) (\((CG \subseteq ABCG)\) and \((CG \to H)\))
  4. \(res = ABCGHI\) (\((CG \subseteq ABCGH)\) and \((CG \to I)\))

So \((AG)^{+} = R\), \(AG\) is a superkey of \(R\).

We can use attribute closure algorithm to

  1. Test for super key
    • To test if \(\alpha\) is a superkey, we compute \(\alpha^{+}\) and check if \(\alpha^{+} = R\)
  2. Test functional dependencies
    • To check if FD \(\alpha \to \beta\) holds (or, in other words, is in \(F^{+}\)), just check if \(\beta \subseteq \alpha^{+}\)

Def. For two FDs \(F\) and \(G\), if \(F^{+} = G^{+}\) then \(F\) and \(G\) are equivalent.

*Canonical Cover

Def. Extraneous

  1. Sets of FDs may have redundant FDs that can be inferred from others.
    • \(A \to C\) is redundant in \(\{A \to B, B \to C, A \to C\}\).
  2. Some attributes in the left or right sides of a FD may be extraneous
    • \(\{A \to B, B \to C, A \to CD\}\) can be simplified to \(\{A \to B, B \to C, A \to D\}\).
  3. An attribute of FD in \(F\) is extraneous if we can remove it without changing \(F^{+}\).

Extraneous attributes:

  1. Removing an attribute from the left side of a FD could make it a stronger constraint. If we have \(AB \to C\), we may remove \(A\) or \(B\). The choice depends on what FD set \(F\) happens to be. Suppose that \(F = \{AB \to C, A \to D, D \to C\}\), then we can show that \(F\) logically implies \(A \to C\), making extraneous \(B\) in \(AB \to C\).

  2. Removing an attribute from the right side of a FD could make it a weaker constraint. If we have \(AB \to CD\) and remove \(C\), we get possibly weaker result \(AB \to D\). Because using just \(AB \to D\), we can no longer infer \(AB \to C\). But, depending on what FD set \(F\) happens to be, we may remove \(C\) from \(AB \to CD\) safely. Suppose that \(F = \{AB \to CD, A \to C\}\). Even after replacing \(AB \to CD\) by \(AB \to D\), we can still infer \(AB \to C\) and thus \(AB \to CD\).

  3. Consider a set \(F\) of FDs and \(\alpha \to \beta\) in F.

    • Attribute \(A\) is extraneous in \(\alpha\), if \(A \in \alpha\), and \(F\) implies \((F - \{\alpha \to \beta\}) \cup {(\alpha - A) \to \beta} = F'\) (\(F' \subseteq F^{+}\)).
    • Attribute \(A\) is extraneous in \(\beta\), if \(A \in \beta\), and the set of FDs \(F' = (F - \{\alpha \to \beta\}) \cup \{\alpha \to (\beta - A)\}\) implies \(F\) (\(F \subseteq (F')^{+}\)).

How to test if an attribute is extraneous? Here are two ways to make it.

Let \(R\) be a relation schema and let \(F\) be a set of FDs that hold on \(R\). Consider an attribute \(B\) in the FD. For \(\alpha \to \beta\)

  1. Testing if attribute \(B \in \beta\) is extraneous in \(\beta\) under \(F\)?
    • Consider the set \(F' = (F - \{\alpha \to \beta\}) \cup \{\alpha \to (\beta - B)\}\).
    • Compute \(\alpha^{+}\) under \(F'\) and check if \(B \in \alpha^{+}\) under \(F'\)? If it does, \(B\) is extraneous in \(\beta\).
  2. Testing if attribute \(A \in \alpha\) is extraneous in \(\alpha\) under \(F\)?
    • Let \(\gamma = \alpha - \{A\}\). Compute \(\gamma^{+}\) using the FDs in \(F\).
    • If \(\gamma^{+}\) includes all attributes in \(\beta\), then \(A\) is extraneous in \(\alpha\).

Examples

  • Given \(F = \{A \to C, AB \to C\}\), \(B\) is extraneous in \(AB \to C\), because \(\{A \to C, AB \to C\}\) logically implies \(\{A \to C\}\)
  • Given \(F = \{A \to C, AB \to CD\}\), \(C\) is extraneous in \(AB \to CD\) since \(AB \to C\) can be inferred even after deleting \(C\)
  • Given \(F = \{AB \to CD, A \to E, E \to C\}\), \(C\) is extraneous in \(AB \to CD\) since the closure includes \(C\)

Def. A canonical cover of \(F\), denoted as \(F_{c}\), is a minimal set of FDs equivalent to \(F\) without redundant FDs or attribute.

A canonical cover for \(F\) is a set of FDs \(F_c\) such that

  • \(F\) logically implies all dependencies in \(F_c\)
  • \(F_c\) logically implies all dependencies in \(F\)
  • No FD in \(F_c\) contains an extraneous attribute and each left side of FD in \(F_c\) is unique.
    • There are no two FDs in \(F_c\), \(\alpha_1 \to \beta_1\) and \(\alpha_2 \to \beta_2\) such that \(\alpha_1 = \alpha_2\).

How to compute a canonical cover for \(F\)? We repeat using the union rule to replace any FDs in \(F\) of the form \(\alpha_1 \to \beta_1\) and \(\alpha_1 \to \beta_2\) with \(\alpha_1 \to \beta_1 \beta_2\), finding a FD \(\alpha \to \beta\) in \(F_c\) with an extraneous attribute either in \(\alpha\) or in \(\beta\), until \(F_c\) not change.

Note

  1. Test for extraneous attributes done using \(F_c\) not \(F\). If an extraneous attribute is found, delete it from \(\alpha \to \beta\)

  2. Union rule may become applicable after some extraneous attributes have been deleted, so it has to be reapplied.

  3. There may be several \(F_c\) for a set \(F\) of FDs.

Here is an example of computing a canonical cover. Suppose \(R = (A, B, C)\), \(F = \{A \to BC, B \to C, A \to B, AB \to C\}\).

  1. Use union rules to combine \(A \to BC\) and \(A \to B\) into \(A \to BC\).

  2. \(A\) is extraneous in \(AB \to C\).

  3. \(C\) is extraneous in \(A \to BC\).

  4. The canonical cover is \(\{A \to B, B \to C\}\).

Lossless-join Decomposition

Def. For \(R = (R_1, R_2)\), we require that for all possible relations \(r\) on schema \(R\), \(r = \Pi_{R_1}(r) \Join \Pi_{R_2}(r)\).

Def. Decomposition of \(R\) into \(R_1\) and \(R_2\) is lossless join if at least one of the following dependencies is in \(F^{+}\).

  • \(R_1 \cap R_2 \to R_1\) or \(R_1 \cap R_2 \to R_2\)

Note: The above FDs are a sufficient condition for lossless join decomposition; The dependencies are a necessary condition only if all constraints are FDs.

Def. Lossy decomposition, \(r \neq \Pi_{R_1}(r) \Join \Pi_{R_2}(r) \cdots \Join \Pi_{R_n}(r)\), also known as lossy-join decomposition.

*Functional Dependency Preservation

Def. For a schema \(R\), \(F\) holds on \(R\), and decomposition \(\{R_1, R_2, \cdots, R_n\}\) of \(R\), the restriction of \(F\) to \(R_i\), denoted as \(F_i\) is defined as

\[ F_i = \{\alpha \to \beta \mid \alpha \to \beta \in F^{+} \And \alpha \beta \subseteq R_i\} \]

The set of FDs in \(F^{+}\) that include only attributes in \(R_i\).

Def. Let \(F_i\) be restriction of \(F\) to \(R_i\). A decomposition is dependency preserving, if \((F_1 \cup F_2 \cup \cdots \cup F_n)^{+} = F^{+}\)

Algorithm. How to test for dependency preservation? Check if a FD \(\alpha \to \beta\) is preserved in decomposition \(R\) of \(R_1\), \(R_2\), \(\cdots\), \(R_n\). We set \(res = \alpha\) and repeat updating \(t = (res \cap R_i)^{+} \cap R_i\), \(res = res \cup t\) for each \(R_i\) in the decomposition until \(res\) does not change. If \(res\) contains all attributes in \(\beta\), the FD \(\alpha \to \beta\) preserved.

Test on all FDs in \(F\) to check if a decomposition is dependency preserving

  • The decomposition is preserved, if and only if all \(\alpha \to \beta\) in \(F\) are preserved.
  • This procedure takes polynomial time.

Here is an example. Consider Student(sno, dept, head), \(F = \{\text{sno} \to \text{dept}, \text{dept} \to \text{head}\}\).

  • Decomposition1:
    • \(R_1(\text{sno, dept})\), \(F_1 = \{\text{sno} \to \text{dept}\}\)
    • \(R_2(\text{sno, head})\), \(F_2 = \{\text{sno} \to \text{head}\}\)
    • lossless decomposition because \(R_1 \cap R_2 \to R_1\)
    • non-dependency preservation
      • Using definition, because \(\{\{\text{sno} \to \text{dept}\} \cup \{\text{sno} \to \text{head}\}\}^{+} \neq F^{+}\)
      • Using algorithm and checking if \(\text{dept} \to \text{head}\) is preserved
        • result = \(\{\text{dept}\}\)
        • \((\text{dept} \cap R_1)^{+} \cap R_1 = \{\text{dept}\}\)
        • \((\text{dept} \cap R_2)^{+} \cap R_2 = \phi\)
        • so th FD is not preserved

Algorithms for Decomposition Using FD

*Testing for BCNF

Algorithm 1. To check if a non-trivial FD \(\alpha \to \beta\) causes a violation of BCNF

  1. Compute \(\alpha^{+}\) (the attribute closure of \(\alpha\)).
  2. Verify that it includes all attributes of \(R\), that is, it is a super key of \(R\).

Simplified test: To check if a relation schema \(R\) is in BCNF, suffices to check only the FDs in \(F\) for violation, rather than checking all FDs in \(F^{+}\). If none of the FDs in \(F\) causes a violation of BCNF, none of the FDs in \(F^{+}\) will cause a violation of BCNF either.

However, simplified test using only \(F\) is incorrect when testing a relation in a decomposition of \(R\).

Consider \(R = (A, B, C, D, E)\), with \(F = \{A \to B, BC \to D\}\). We decompose \(R\) into \(R_1 = (A, B)\) and \(R_2 = (A, C, D, E)\). Neither of the dependencies in \(F\) contain only attributes from \((A, C, D, E)\), so we might be mislead into thinking \(R_2\) satisfies BCNF. In fact, dependency \(AC \to D\) in \(F^{+}\) shows \(R_2\) is not in BCNF.

Algorithm 2. Check if a relation \(R_i\) in a decomposition of \(R\) is in BCNF.

Either test \(R_i\) for BCNF with respect to the restriction of \(F^{+}\) to \(R_i\) or use the original set of FDs \(F\) that hold on \(R\), but with test:

  • For every set of attributes \(\alpha \subseteq R_{i}\), check that \(\alpha^{+}\) either includes no attribute of \(R_i - \alpha\) or includes all attributes of \(R_i\).

If the condition is violated by some \(\alpha \to \beta\) in \(F^{+}\), the dependency \(\alpha \to (\alpha^{+} - \alpha) \cap R_i\) can be shown to hold on \(R_i\) and \(R_i\) violates BCNF.

Algorithm 3. Decomposition algorithm for lossless schema in BCNF.

  • \(F^{+}\) is computed at first to determine whether or not \(\alpha\) is the super key of sub-schema \(R_i\).
  • If other methods can be used to determine whether or not \(\alpha\) is the super key of sub-schema \(R_i\), then \(F^{+}\) need not be computed.

Given \(R\), \(F\) holding on \(R\). We set \(res := \{R\}\). If there is a non-BCNF sub schema \(R_i\) in \(res\), then let \(\alpha \to \beta\) be a non-trivial FD that holds on \(R_i\), such that (1). \(\alpha\) is not the super key for \(R_i\); (2). \(\alpha \cap \beta = \phi\). Then update \(res := \{(res - R_{i})\} \cup \{\{(R_i - \beta)\} \cup \{(\alpha, \beta)\}\}\)

Note

  • To determine whether or not \(R_i\) is in BCNF, the restriction of \(F\) to \(R_i\) and the candidate keys of \(R_i\) should be computed.

  • \(R_i\) is not in BCNF, because of \(\alpha \to \beta\) on \(R_i\), \(\alpha\) is not the super key of \(R_i\).

  • The algorithm replaces non-BCNF \(R_i\) with \((R_i - \beta)\) and \((\alpha, \beta)\).

    • \(R_i\) is decomposed into \((R_i - \beta)\) and BCNF \((\alpha, \beta)\).
  • The restriction of \(F\) to the schema \((\alpha, \beta)\) is \(\alpha \to \beta\), and \(\alpha\) is the super key for \((\alpha, \beta)\). so \((\alpha, \beta)\) is in BCNF.

Here are three examples

(1). Consider \(R = (A, B, C)\), \(F = \{A \to B, B \to C\}\), Key $ = {A}$. \(R\) is not in BCNF because \(B \to C\) but \(B\) is not super key, so we decompose it. \(R_1 = (B, C)\), \(F_1 = \{B \to C\}\) and \(R_2 = (A, B)\), \(F_2 = \{A \to B\}\).

(2). Consider class(course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id). FDs are

  • course_id \(\to\) title, dept_name, credits
  • building, room_number \(\to\) capacity
  • course_id, sec_id, semester, year \(\to\) building, room_number, time_slot_id

A candidate key is {course_id, sec_id, semester, year}. Obviously, class is not BCNF. Firstly, focusing on course_id \(\to\) title, dept_name, credits, we use algorithm 3 to decompose class into course(course_id, title, dept_name, credits) and class1(course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id). Then focusing on building, room_number \(\to\) capacity, still using algorithm 3, we get final schemas: classroom(building, room_number, capacity) and section(course_id, sec_id, semester, year, building, room_number, time_slot_id).

(3). Consider \(R = (J, K, L)\), \(F = \{JK \to L, L \to K\}\), candidate keys are \(JK\) and \(JL\). \(R\) is not in BCNF because \(L \to K\). A BCNF decomposition is \(R_1 = (L, K)\) with \(F_1 = \{L \to K\}\) and \(R_2 = (J, L)\) with no FD, \(JK \to L\) being lost. Any decomposition of \(R\) will fail to preserve \(JK \to L\). This implies that testing for \(JK \to L\) requires a join.

Testing for 3NF

Why we decompose 3NF?

  • There are some situations where
    • BCNF is not dependency preserving.
    • Whereas checking for FD violation is important.
  • Solution: Define a weaker normal form, called the Third Normal Form (3NF)
    • Allows some redundancy (with resultant problems).
    • But check FDs on individual relations without computing a join.
    • There is a lossless-join, dependency-preserving decomposition into 3NF.

Algorithm 1. A decomposition algorithm that gives lossless and dependency preserving schema in 3NF.

Note

  • All candidate keys should be founded out.
  • Only one \(F_c\) should be computed at first.
  1. Find out all candidate keys for \(R\).

  2. Find out a canonical over \(F_c\) for F.

  3. Do for each FD \(\alpha \to \beta\) in \(F_c\). If none of the schema \(R_i\), \(1 \leq j \leq i\) contains \(\alpha \beta\), then \(i := i + 1\), \(R_i := \alpha \beta\).

  4. If none of the schemas \(R_j\), \(1 \leq j \leq i\) contains a candidate key for \(R\), then \(i := i + 1\), \(R_{i} :=\) any candidate key for \(R\).

  5. Return \((R_1, R_2, \cdots, R_i)\).

Here are two examples

(1). Consider schema \(R(X, Y, Z, W)\) and \(F = \{X \to Z, Z \to X, Y \to XZ, W \to XZ\}\) holds on \(R\). Give the lossless, dependency preserved decomposition into 3NF.

  1. \(\{YW\}\) is the candidate key
  2. \(F_c = \{X \to Z, Z \to X, Y \to Z, W \to X\}\) (not only one)
  3. On the basis of 3NF decomposition algorithm, the sub-schema responding to each functional dependency in \(F_c\) are \(R_1 = (XZ)\), \(R_2 = (YZ)\), \(R_3 = (WX)\)
  4. \(R_4 = (YW)\)

(2). Consider \(R\) about student and FD \(F\)

  • student(stNo, Name, Telno, stID, ClassNo, CNo, CName, Grade)
  • \(F =\)
    • stNo \(\to\) Name, Telno, stID, ClassNo
    • stID \(\to\) stNo
    • stNo, CNo \(\to\) Grade
    • CNo \(\to\) CName
  1. Candidate keys are {stNo, CNo}, {stID, CNo}
  2. \(F_c = F\)
  3. \(R_1\)(stNo, Name, Telno, stID, ClassNo), \(R_2\)(stID, CNo, Grade), \(R_3\)(CNo, CName)
  4. no operations

例题

(1). Consider the following relation schema \(R\) and the functional dependency \(F\) holds on \(R = (X, Y, Z, W, Q)\), \(F = \{XY \to ZQ, Q \to XY, Z \to W, Q \to Z\}\).

  1. Compute \((XZ)^{+}\).

    \(XWZ\)

  2. Is the decomposition \(\rho = \{R_1(X, Y, Z), R_2(Z, W, Q)\}\) on \(R\) a lossless join? Why?

    No. Because \(R_1 \cap R_2\) can not infer both \(R_1\) and \(R_2\).

  3. What is the highest normal form of \(R\)? Why?

    Of course \(R\) satisfies 1NF. The candidate keys are \(\{Q\}\), \(\{X, Y\}\), so the primary attributes are \(Q\), \(X\), \(Y\). \(Z\) and \(W\) can be inferred by candidate key. And \(Z \to W\), \(Z\) is not a super key, this FD is not trivial and \(W\) is not in a candidate key. So the highest level is 2NF.

  4. Compute the canonical cover \(F_c\).

    \(F_{c} = \{Q \to XY, Z \to W, XY \to ZQ\}\)

  5. Give a 3NF decomposition.

    \(R_1 = (QXYZ)\), \(R_2 = (ZW)\)

(2). The functional dependency set \(F = \{A \to C, C \to A, B \to A, D \to AC, B \to E\}\) holds on the relation schema \(R = (A, B, C, D, E)\).

  1. Compute \((AB)^{+}\)

    \(ABCE\)

  2. List all the candidate keys of \(R\).

    \(\{D, B\}\)

  3. What is the highest normal form of \(R\), and why?

    Not BCNF. Is 1NF. Not 2NF. Not 3NF.

  4. Compute the canonical cover \(F_c\).

    \(F_{c} = \{A \to C, C \to A, B \to AE, D \to C\}\)

  5. Give a 3NF decomposition.

    \(R_1 = (AC)\), \(R_2 = (ABE)\), \(R_3 = (CD)\), \(R_4 = (BD)\).


Chapter 7: Relational Database Design: Schema Normalization
https://ddccffq.github.io/2025/12/20/数据库系统原理/Relational_Database_Design_Schema_Normalization/
作者
ddccffq
发布于
2025年12月20日
许可协议