Chapter 16: Query Optimization
Based on 12 rules to optimize relational algebra expression
Overview
A query statement corresponds to several equivalent expressions. Different costs for equivalent evaluation plans. What query optimization does is to choose an efficient evaluation plan with lower costs.
Query optimization procedure:
- Generating the equivalent query trees/expressions, by transforming of relational algebra expressions using equivalence rules
- Generating the alternative evaluation plans, by annotating the resultant equivalent expressions with algorithms for each operations
- Choosing the optimal or near-optimal plan based on the estimated cost, by heuristic optimization or cost-based optimization
Transformation of Relational Expressions
Def. Equivalent expression means two relational algebra expressions are equivalent, if two expressions generate the same set of tuples on all legal database instances.
*Equivalence Rules
Conjunctive selection operations can be deconstructed into a sequence of individual selections
\[ \sigma_{\theta_1 \land \theta_2} (E) = \sigma_{\theta_1} (\sigma_{\theta_2} (E)) \]
Selection operations are commutative
\[ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) \]
Only the final operation in a sequence of project operations is needed
\[ \Pi_{L_1} (\Pi_{L2} (\cdots (\Pi_{L_n} (E)))) = \Pi_{L_1}(E) \]
Required that \(L_1 \subseteq L_2 \subseteq \cdots \subseteq L_n\)
e.g. \(\Pi_{\text{ID}} (\Pi_{\text{ID}. \text{name}} (\text{instructor})) = \Pi_{\text{ID}} (\text{instructor})\)
Selections can be combined with Cartesian products and theta joins
\[ \sigma_{\theta_1} (E_1 \Join_{\theta_2} E_2) = E_1 \Join_{\theta_1 \land \theta_2} E_2 \]
Costs of the right hand are less than that of the left hand.
- Definition of \(\theta\) join: \(\sigma_{\theta}(E_1 \times E_2) = E_1 \Join_{\theta} E_2\)
e.g. \(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID}} \text{teaches}) = \text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID} \land \text{dept\_name} = \text{'Music'}} \text{teaches}\)
Theta join operations are commutative
\[ E_1 \Join_{\theta} E_2 = E_2 \Join_{\theta} E_1 \]
The expression with smaller size should be arranged as the left one in join operation
Associative
- Natural join operations are associative: \((E_1 \Join E_2) \Join E_3 = E_1 \Join (E_2 \Join E_3)\)
- Theta joins are associative: \((E_1 \Join_{\theta_1} E_2) \Join_{\theta_2 \land \theta_3} E_3 = E_1 \Join_{\theta_1 \land \theta_3} (E_2 \Join_{\theta_2} E_3)\)
- where \(\theta_2\) involves attributes from only \(E_2\) and \(E_3\)
Selection operation distributes over theta join
- Attributes in \(\theta_0\) involve only the attributes of one of the expressions, e.g. \(E_1\), being joined \(\sigma_{\theta_0} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_0} (E_1)) \Join_{\theta} E_2\)
- Attributes in \(\theta_1\) involves only attributes of \(E_1\), attributes in \(\theta_2\) involves only attributes of \(E_2\), \(\sigma_{\theta_1 \land \theta_2} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_1}(E_1) \Join_{\theta} (\sigma_{\theta_2}(E_2)))\)
- Note: When 7 is applied, the size of relations participating in theta join operation is reduced, thus evaluation costs are reduced
Projection operation distributes over theta join
- \(L_1\) and \(L_2\) are the attributes of \(E_1\) and \(E_2\), respectively, if join condition \(\theta\) involves only attributes in \(L_1 \cup L_2\), then
\[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = (\Pi_{L_1}(E_1)) \Join_{\theta} (\Pi_{L_2}(E_2)) \]
- Consider a join \(E_1 \Join_{\theta} E_2\)
- Let \(L_1\) and \(L_2\) be sets of attributes from \(E_1\) and \(E_2\), respectively
- Let \(L_3\) be attributes of \(E_1\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)
- Let \(L_4\) be attributes of \(E_2\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)
\[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = \Pi_{L_1 \cup L_2} ((\Pi_{L_1 \cup L_3}(E_1)) \Join_{\theta} (\Pi_{L_2 \cup L_4} (E_2))) \]
The set operations union and intersection are commutative
Set union and intersection are associative
The selection operation distributes over \(\cup\), \(\cap\) and \(-\). \(\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)\)
The projection operation distributes over union. \(\Pi_{L}(E_1 \cup E_2) = (\Pi_{L} (E_1)) \cup (\Pi_{L} (E_2))\)
Here are some examples
(1). Find the names of all instructors in the department Music, along with titles of the courses that they teach. The algebra expression is:
\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})))) \]
Transformation using rule 7.
\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}}(\text{instructor})) \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course}))) \]
(2). Find names of all instructors in Music department who taught in 2019, along with titles of the course that they taught. The algebra expression is:
\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'} \land (\text{year} = 2019)} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})))) \]
Transformation using rule 7.
\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor})) \Join (\sigma_{\text{year} = 2019} (\text{teaches})) \Join (\Pi_{\text{course\_id}, \text{title}} (\text{course}))) \]
(3). Why projection as early as possible reduces size of the relation to be joined?
\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})) \]
Computing \((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches})\) resulting in a schema with unneeded attributes (what we need may be course_id and name). We add a projection to eliminate unneeded attributes.
\[ \Pi_{\text{course\_id}, \text{name}}(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \]
Join Ordering
For all relations \(r_1\), \(r_2\) and \(r_3\)
\[ (r_1 \Join r_2) \Join r_3 = r_1 \Join (r_2 \Join r_3) \]
If \(r_2 \Join r_3\) is quite large bu \(r_1 \Join r_2\) is small, choose
\[ (r_1 \Join r_2) \Join r_3 \]
For example , consider the expression
\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})) \]
Compute \((\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course}))\) first, and join result with \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor})\) as the result of the first join is likely to be a large relation
If only a small fraction of instructors are likely to be, it is better to first compute \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches}\)
Choice of Evaluation Plans: Query Optimization
Generate an evaluation plan for a relational algebra expression with the lowest or lower cost
- Generating the optimized query trees
- Selecting
- Implementation algorithms for each operations in the tree
- Pipeline or materialization strategy
Heuristic Optimization
Def. Heuristic optimization
- Heuristic (or heuristic rules) to choose a better plan
- Suboptimal plan with lower costs can be obtained
Principles: transforms query tree by heuristics to reduce cost
- Perform selection early to reduce the number of tuples
- Perform projection early to reduce the number of attributes
- Perform join to substitute Cartesian product and selection
- Perform restrictive selection and join operations before other operations
- The most restrictive operations generate relations with the smallest size
Steps in Heuristic Optimization
- 将 conjunctive selection 分解为多个单独的选择操作以使单个选择操作尽可能沿查询树下移
- 根据选择操作的交换律和分配律,将查询树每个选择操作尽可能移向叶节点,以便尽早执行选择操作
- 根据连接操作的结合律和交换律,重新安排查询树中的叶节点,使得具有 restrictive selection 特征的叶节点先执行
- 利用以连接操作代替相邻的选择和笛卡尔乘积操作
- 将查询树上投影操作尽可能下移,早执行投影操作,减少中间计算结果
Here an example
(1). Given
- S(sID, sname, age, sex)
- C(cID, cname, teacher)
- SC(sID, cID, grade)
Find students who are male and get grades more than \(90\), when they learn some course. Firstly, we get SQL statement.
1 | |
Then, get algebra expression of statement above.
\[ \Pi_{\text{sname}} (\sigma_{(\text{S.sID = SC.sID}) \land (\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \times \text{SC})) \]
Transformation using rule 4.
\[ \Pi_{\text{sname}} (\sigma_{(\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \Join_{(\text{S.sID = SC.sID})} \text{SC})) \]
Transformation using rule 7.
\[ \Pi_{\text{sname}} ((\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} (\sigma_{\text{grade} > 90} (\text{SC}))) \]
Transformation using rule 8.
\[ \Pi_{\text{sname}} (\Pi_{\text{sname, sID}} (\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} \Pi_{\text{sID}} (\sigma_{\text{grade} > 90} (\text{SC}))) \]