Chapter 16: Query Optimization

Based on 12 rules to optimize relational algebra expression

Overview

A query statement corresponds to several equivalent expressions. Different costs for equivalent evaluation plans. What query optimization does is to choose an efficient evaluation plan with lower costs.

Query optimization procedure:

  1. Generating the equivalent query trees/expressions, by transforming of relational algebra expressions using equivalence rules
  2. Generating the alternative evaluation plans, by annotating the resultant equivalent expressions with algorithms for each operations
  3. Choosing the optimal or near-optimal plan based on the estimated cost, by heuristic optimization or cost-based optimization

Transformation of Relational Expressions

Def. Equivalent expression means two relational algebra expressions are equivalent, if two expressions generate the same set of tuples on all legal database instances.

*Equivalence Rules

  1. Conjunctive selection operations can be deconstructed into a sequence of individual selections

    \[ \sigma_{\theta_1 \land \theta_2} (E) = \sigma_{\theta_1} (\sigma_{\theta_2} (E)) \]

  2. Selection operations are commutative

    \[ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) \]

  3. Only the final operation in a sequence of project operations is needed

    \[ \Pi_{L_1} (\Pi_{L2} (\cdots (\Pi_{L_n} (E)))) = \Pi_{L_1}(E) \]

    Required that \(L_1 \subseteq L_2 \subseteq \cdots \subseteq L_n\)

    e.g. \(\Pi_{\text{ID}} (\Pi_{\text{ID}. \text{name}} (\text{instructor})) = \Pi_{\text{ID}} (\text{instructor})\)

  4. Selections can be combined with Cartesian products and theta joins

    \[ \sigma_{\theta_1} (E_1 \Join_{\theta_2} E_2) = E_1 \Join_{\theta_1 \land \theta_2} E_2 \]

    Costs of the right hand are less than that of the left hand.

    • Definition of \(\theta\) join: \(\sigma_{\theta}(E_1 \times E_2) = E_1 \Join_{\theta} E_2\)

    e.g. \(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID}} \text{teaches}) = \text{instructor} \Join_{\text{instructor.ID} = \text{teaches.ID} \land \text{dept\_name} = \text{'Music'}} \text{teaches}\)

  5. Theta join operations are commutative

    \[ E_1 \Join_{\theta} E_2 = E_2 \Join_{\theta} E_1 \]

    The expression with smaller size should be arranged as the left one in join operation

  6. Associative

    • Natural join operations are associative: \((E_1 \Join E_2) \Join E_3 = E_1 \Join (E_2 \Join E_3)\)
    • Theta joins are associative: \((E_1 \Join_{\theta_1} E_2) \Join_{\theta_2 \land \theta_3} E_3 = E_1 \Join_{\theta_1 \land \theta_3} (E_2 \Join_{\theta_2} E_3)\)
      • where \(\theta_2\) involves attributes from only \(E_2\) and \(E_3\)
  7. Selection operation distributes over theta join

    • Attributes in \(\theta_0\) involve only the attributes of one of the expressions, e.g. \(E_1\), being joined \(\sigma_{\theta_0} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_0} (E_1)) \Join_{\theta} E_2\)
    • Attributes in \(\theta_1\) involves only attributes of \(E_1\), attributes in \(\theta_2\) involves only attributes of \(E_2\), \(\sigma_{\theta_1 \land \theta_2} (E_1 \Join_{\theta} E_2) = (\sigma_{\theta_1}(E_1) \Join_{\theta} (\sigma_{\theta_2}(E_2)))\)
    • Note: When 7 is applied, the size of relations participating in theta join operation is reduced, thus evaluation costs are reduced
  8. Projection operation distributes over theta join

    • \(L_1\) and \(L_2\) are the attributes of \(E_1\) and \(E_2\), respectively, if join condition \(\theta\) involves only attributes in \(L_1 \cup L_2\), then

    \[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = (\Pi_{L_1}(E_1)) \Join_{\theta} (\Pi_{L_2}(E_2)) \]

    • Consider a join \(E_1 \Join_{\theta} E_2\)
      • Let \(L_1\) and \(L_2\) be sets of attributes from \(E_1\) and \(E_2\), respectively
      • Let \(L_3\) be attributes of \(E_1\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)
      • Let \(L_4\) be attributes of \(E_2\) that involved in join condition \(\theta\), but not in \(L_1 \cup L_2\)

    \[ \Pi_{L_1 \cup L_2} (E_1 \Join_{\theta} E_2) = \Pi_{L_1 \cup L_2} ((\Pi_{L_1 \cup L_3}(E_1)) \Join_{\theta} (\Pi_{L_2 \cup L_4} (E_2))) \]

  9. The set operations union and intersection are commutative

  10. Set union and intersection are associative

  11. The selection operation distributes over \(\cup\), \(\cap\) and \(-\). \(\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)\)

  12. The projection operation distributes over union. \(\Pi_{L}(E_1 \cup E_2) = (\Pi_{L} (E_1)) \cup (\Pi_{L} (E_2))\)

Here are some examples

(1). Find the names of all instructors in the department Music, along with titles of the courses that they teach. The algebra expression is:

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})))) \]

Transformation using rule 7.

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}}(\text{instructor})) \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}}(\text{course}))) \]

(2). Find names of all instructors in Music department who taught in 2019, along with titles of the course that they taught. The algebra expression is:

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{'Music'} \land (\text{year} = 2019)} (\text{instructor} \Join (\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})))) \]

Transformation using rule 7.

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor})) \Join (\sigma_{\text{year} = 2019} (\text{teaches})) \Join (\Pi_{\text{course\_id}, \text{title}} (\text{course}))) \]

(3). Why projection as early as possible reduces size of the relation to be joined?

\[ \Pi_{\text{name}, \text{title}} ((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \Join \Pi_{\text{course\_id}, \text{title}}(\text{course})) \]

Computing \((\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches})\) resulting in a schema with unneeded attributes (what we need may be course_id and name). We add a projection to eliminate unneeded attributes.

\[ \Pi_{\text{course\_id}, \text{name}}(\sigma_{\text{dept\_name} = \text{'Music'}} (\text{instructor}) \Join \text{teaches}) \]

Join Ordering

For all relations \(r_1\), \(r_2\) and \(r_3\)

\[ (r_1 \Join r_2) \Join r_3 = r_1 \Join (r_2 \Join r_3) \]

If \(r_2 \Join r_3\) is quite large bu \(r_1 \Join r_2\) is small, choose

\[ (r_1 \Join r_2) \Join r_3 \]

For example , consider the expression

\[ \Pi_{\text{name}, \text{title}} (\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course})) \]

  1. Compute \((\text{teaches} \Join \Pi_{\text{course\_id}, \text{title}} (\text{course}))\) first, and join result with \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor})\) as the result of the first join is likely to be a large relation

  2. If only a small fraction of instructors are likely to be, it is better to first compute \(\sigma_{\text{dept\_name} = \text{Music}} (\text{instructor}) \Join \text{teaches}\)

Choice of Evaluation Plans: Query Optimization

Generate an evaluation plan for a relational algebra expression with the lowest or lower cost

  1. Generating the optimized query trees
  2. Selecting
    • Implementation algorithms for each operations in the tree
    • Pipeline or materialization strategy

Heuristic Optimization

Def. Heuristic optimization

  • Heuristic (or heuristic rules) to choose a better plan
  • Suboptimal plan with lower costs can be obtained

Principles: transforms query tree by heuristics to reduce cost

  • Perform selection early to reduce the number of tuples
  • Perform projection early to reduce the number of attributes
  • Perform join to substitute Cartesian product and selection
  • Perform restrictive selection and join operations before other operations
    • The most restrictive operations generate relations with the smallest size

Steps in Heuristic Optimization

  1. 将 conjunctive selection 分解为多个单独的选择操作以使单个选择操作尽可能沿查询树下移
  2. 根据选择操作的交换律和分配律,将查询树每个选择操作尽可能移向叶节点,以便尽早执行选择操作
  3. 根据连接操作的结合律和交换律,重新安排查询树中的叶节点,使得具有 restrictive selection 特征的叶节点先执行
  4. 利用以连接操作代替相邻的选择和笛卡尔乘积操作
  5. 将查询树上投影操作尽可能下移,早执行投影操作,减少中间计算结果

Here an example

(1). Given

  • S(sID, sname, age, sex)
  • C(cID, cname, teacher)
  • SC(sID, cID, grade)

Find students who are male and get grades more than \(90\), when they learn some course. Firstly, we get SQL statement.

1
2
3
SELECT DISTINCT sname
FROM S, SC
WHERE S.S# = SC.S# AND sex = male AND grade > 90;

Then, get algebra expression of statement above.

\[ \Pi_{\text{sname}} (\sigma_{(\text{S.sID = SC.sID}) \land (\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \times \text{SC})) \]

Transformation using rule 4.

\[ \Pi_{\text{sname}} (\sigma_{(\text{sex = male}) \land (\text{grade} > 90)} (\text{S} \Join_{(\text{S.sID = SC.sID})} \text{SC})) \]

Transformation using rule 7.

\[ \Pi_{\text{sname}} ((\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} (\sigma_{\text{grade} > 90} (\text{SC}))) \]

Transformation using rule 8.

\[ \Pi_{\text{sname}} (\Pi_{\text{sname, sID}} (\sigma_{(\text{sex = male})} (\text{S})) \Join_{(\text{S.sID = SC.sID})} \Pi_{\text{sID}} (\sigma_{\text{grade} > 90} (\text{SC}))) \]


Chapter 16: Query Optimization
https://ddccffq.github.io/2025/12/21/数据库系统原理/Query_Optimization/
作者
ddccffq
发布于
2025年12月21日
许可协议