Chapter 16: Query Optimization

Based on 12 rules to optimize relational algebra expression

Overview

A query statement corresponds to several equivalent expressions. Different costs for equivalent evaluation plans. What query optimization does is to choose an efficient evaluation plan with lower costs.

Query optimization procedure:

Generating the equivalent query trees/expressions, by transforming of relational algebra expressions using equivalence rules
Generating the alternative evaluation plans, by annotating the resultant equivalent expressions with algorithms for each operations
Choosing the optimal or near-optimal plan based on the estimated cost, by heuristic optimization or cost-based optimization

Transformation of Relational Expressions

Def. Equivalent expression means two relational algebra expressions are equivalent, if two expressions generate the same set of tuples on all legal database instances.

*Equivalence Rules

Conjunctive selection operations can be deconstructed into a sequence of individual selections

\[ \sigma_{\theta_1 \land \theta_2} (E) = \sigma_{\theta_1} (\sigma_{\theta_2} (E)) \]
Selection operations are commutative

\[ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) \]
Only the final operation in a sequence of project operations is needed

\[ \Pi_{L_1} (\Pi_{L2} (\cdots (\Pi_{L_n} (E)))) = \Pi_{L_1}(E) \]

Required that \(L_1 \subseteq L_2 \subseteq \cdots \subseteq L_n\)

e.g. \(\Pi_{\text{ID}} (\Pi_{\text{ID}. \text{name}} (\text{instructor})) = \Pi_{\text{ID}} (\text{instructor})\)
Selections can be combined with Cartesian products and theta joins

\[ \sigma_{\theta_1} (E_1 \Join_{\theta_2} E_2) = E_1 \Join_{\theta_1 \land \theta_2} E_2 \]

Costs of the right hand are less than that of the left hand.
- Definition of \(\theta\) join: \(\sigma_{\theta}(E_1 \times E_2) = E_1 \Join_{\theta} E_2\)
e.g. \(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID}} \text{teaches}) = \text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID} \land \text{dept\_name} = \text{'Music'}} \text{teaches}\)
Theta join operations are commutative

\[ E_1 \Join_{\theta} E_2 = E_2 \Join_{\theta} E_1 \]

The expression with smaller size should be arranged as the left one in join operation
Associative
- Natural join operations are associative: \((E_1 \Join E_2) \Join E_3 = E_1 \Join (E_2 \Join E_3)\)
- Theta joins are associative: \((E_1 \Join_{\theta_1} E_2) \Join_{\theta_2 \land \theta_3} E_3 = E_1 \Join_{\theta_1 \land \theta_3} (E_2 \Join_{\theta_2} E_3)\)
  - where \(\theta_2\) involves attributes from only \(E_2\) and \(E_3\)
Selection operation distributes over theta join
- Attributes in \(\theta_0\) involve only the attributes of one of the expressions, e.g. \(E_1\), being joined \(\sigma_{\theta_0} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_0} (E_1)) \Join_{\theta} E_2\)
- Attributes in \(\theta_1\) involves only attributes of \(E_1\), attributes in \(\theta_2\) involves only attributes of \(E_2\), \(\sigma_{\theta_1 \land \theta_2} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_1}(E_1) \Join_{\theta} (\sigma_{\theta_2}(E_2)))\)
- Note: When 7 is applied, the size of relations participating in theta join operation is reduced, thus evaluation costs are reduced
Projection operation distributes over theta join
- \(L_1\) and \(L_2\) are the attributes of \(E_1\) and \(E_2\), respectively, if join condition \(\theta\) involves only attributes in \(L_1 \cup L_2\), then
\[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = (\Pi_{L_1}(E_1)) \Join_{\theta} (\Pi_{L_2}(E_2)) \]
- Consider a join \(E_1 \Join_{\theta} E_2\)
  - Let \(L_1\) and \(L_2\) be sets of attributes from \(E_1\) and \(E_2\), respectively
  - Let \(L_3\) be attributes of \(E_1\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)
  - Let \(L_4\) be attributes of \(E_2\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)
\[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = \Pi_{L_1 \cup L_2} ((\Pi_{L_1 \cup L_3}(E_1)) \Join_{\theta} (\Pi_{L_2 \cup L_4} (E_2))) \]
The set operations union and intersection are commutative
Set union and intersection are associative
The selection operation distributes over \(\cup\), \(\cap\) and \(-\). \(\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)\)
The projection operation distributes over union. \(\Pi_{L}(E_1 \cup E_2) = (\Pi_{L} (E_1)) \cup (\Pi_{L} (E_2))\)

Here are some examples

(1). Find the names of all instructors in the department Music, along with titles of the courses that they teach. The algebra expression is:

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})))) \]

Transformation using rule 7.

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}}(\text{instructor})) \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course}))) \]

(2). Find names of all instructors in Music department who taught in 2019, along with titles of the course that they taught. The algebra expression is:

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'} \land (\text{year} = 2019)} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})))) \]

Transformation using rule 7.

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor})) \Join (\sigma_{\text{year} = 2019} (\text{teaches})) \Join (\Pi_{\text{course\_id}, \text{title}} (\text{course}))) \]

(3). Why projection as early as possible reduces size of the relation to be joined?

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})) \]

Computing \((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches})\) resulting in a schema with unneeded attributes (what we need may be course_id and name). We add a projection to eliminate unneeded attributes.

\[ \Pi_{\text{course\_id}, \text{name}}(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \]

Join Ordering

For all relations \(r_1\), \(r_2\) and \(r_3\)

\[ (r_1 \Join r_2) \Join r_3 = r_1 \Join (r_2 \Join r_3) \]

If \(r_2 \Join r_3\) is quite large bu \(r_1 \Join r_2\) is small, choose

\[ (r_1 \Join r_2) \Join r_3 \]

For example , consider the expression

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})) \]

Compute \((\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course}))\) first, and join result with \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor})\) as the result of the first join is likely to be a large relation
If only a small fraction of instructors are likely to be, it is better to first compute \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches}\)

Choice of Evaluation Plans: Query Optimization

Generate an evaluation plan for a relational algebra expression with the lowest or lower cost

Generating the optimized query trees
Selecting
- Implementation algorithms for each operations in the tree
- Pipeline or materialization strategy

Heuristic Optimization

Def. Heuristic optimization

Heuristic (or heuristic rules) to choose a better plan
Suboptimal plan with lower costs can be obtained

Principles: transforms query tree by heuristics to reduce cost

Perform selection early to reduce the number of tuples
Perform projection early to reduce the number of attributes
Perform join to substitute Cartesian product and selection
Perform restrictive selection and join operations before other operations
- The most restrictive operations generate relations with the smallest size

Steps in Heuristic Optimization

将 conjunctive selection 分解为多个单独的选择操作以使单个选择操作尽可能沿查询树下移
根据选择操作的交换律和分配律，将查询树每个选择操作尽可能移向叶节点，以便尽早执行选择操作
根据连接操作的结合律和交换律，重新安排查询树中的叶节点，使得具有 restrictive selection 特征的叶节点先执行
利用以连接操作代替相邻的选择和笛卡尔乘积操作
将查询树上投影操作尽可能下移，早执行投影操作，减少中间计算结果

Here an example

(1). Given

S(sID, sname, age, sex)
C(cID, cname, teacher)
SC(sID, cID, grade)

Find students who are male and get grades more than \(90\), when they learn some course. Firstly, we get SQL statement.

1
2
3

SELECT DISTINCT sname
FROM S, SC
WHERE S.S# = SC.S# AND sex = male AND grade > 90;

Then, get algebra expression of statement above.

\[ \Pi_{\text{sname}} (\sigma_{(\text{S.sID = SC.sID}) \land (\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \times \text{SC})) \]

Transformation using rule 4.

\[ \Pi_{\text{sname}} (\sigma_{(\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \Join_{(\text{S.sID = SC.sID})} \text{SC})) \]

Transformation using rule 7.

\[ \Pi_{\text{sname}} ((\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} (\sigma_{\text{grade} > 90} (\text{SC}))) \]

Transformation using rule 8.

\[ \Pi_{\text{sname}} (\Pi_{\text{sname, sID}} (\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} \Pi_{\text{sID}} (\sigma_{\text{grade} > 90} (\text{SC}))) \]

数据库系统原理

Chapter 16: Query Optimization

https://ddccffq.github.io/2025/12/21/数据库系统原理/Query_Optimization/

作者

ddccffq

发布于

2025年12月21日

许可协议

Chapter 17: Transactions 上一篇

Chapter 6: Database Design Using the ER Model 下一篇