Chapter 15: Query Processing
Def. Query processing refers to activities (DBMS) extracting data from DB, including
- translation of queries in DB language into expressions that can be used at the physical level of the system
- SQL statement into a initial relational algebra expression.
- query-optimizing transformations. (see chapter 16)
- actual evaluation of queries.
Overview
- Parsing and translation
- Translate query into parser-tree
- Parser checks syntax, verifies the correctness of the relations.
- Parser replace the view with relations on which built.
- The parser-tree is then translated into relational algebra.
- Translate query into parser-tree
- Query optimization
- Choose the lowest cost plan among equivalent query evaluation plans.
- The cost is estimated using statistical information in data dictionary.
- e.g. the number of tuples, size of tuples, etc.
- Evaluation plan execution: the query-execution engine
- Takes an optimized query-evaluation plan.
- Executes the optimized plan, and returns answers to the query.
flowchart TD
A[query] --> |parser and translator| B[relational algebra expression]
B --> |optimizer| C[execution plan]
C --> |evaluation engine| D[query output]
Def. The evaluation-plan is an annotated expression specifying the detailed evaluation strategy, such as index, evaluation algorithms, etc.
A relation algebra expression has equivalent expressions.
- \(\sigma_{\text{salary} < 75000} (\Pi_{\text{salary}(\text{instructor})})\) is equivalent to \(\Pi_{\text{salary}}(\sigma_{\text{salary} < 75000}(\text{instructor}))\).
Each relational algebra operation can be evaluated with algorithms.
- e.g. the select operation can be evaluated using \(A_1\), \(A_2\), \(\cdots\), algorithms.
- e.g. nested-loop join, merge join, hash join operation.
Measures of Query Costs
Def. Cost is measured as the total elapsed time for answering query.
- Factors contribute to time cost.
- disk access, network communication.
Def. Disk access is the predominant cost, measured by
number of seeks \(\times\) average-seek-cost
- 数据定位时间,需要访问的 records 在 DB file 中的位置。
numebr of blocks read \(\times\) average-block-read-cost
number of blocks written \(\times\) average-block-write-cost
- cost to write is greater than cost to read.
- data is read back after being written to be ensured successful.
假设一次业务处理(如SQL 查询)总开销是100%,其中
- 7% 不到,真正处理业务逻辑
- 34%,用于缓冲区管理如缓冲区的加载替换、地址转化等
- 14%,处理对于内存数据结构互斥访问加锁 Latching
- 16%,处理事务并发访问时,事务处理加锁 Locking
- 12%,处理日志 Logging
- 16%,用于对 B 树索引的处理
We suppose that
- number of blocks transfered from disk \(t_T\).
- \(t_{T}\) is time to transfer one block (transfer speed in disk).
- number of seeks \(t_{S}\)
- \(t_{S}\) is time for one seek including 寻道时间和旋转延迟
Time cost for \(b\) blocks transfer plus \(s\) seeks is \(b \times t_{T} + s \times t_{S}\).
Algorithms can reduce disk I/O by using extrs buffer space
- Amount of real memory available to buffer depends on concurrent queries and processes, known during execution.
- using the worst case estimates.
- assuming the minimum amount of memory needed for the operation.
- Required data may be buffer resident already, avoiding disk I/O, hard to take into account for cost estimation.
Selection Operation
Selection operation \(\sigma\) on relation \(r\) by file scan.
- Locating and scanning the file in which \(r\) is stored to retrieving the records satisfying the selection conditions.
File Scan Algorithms.
- Linear search/scan - \(A_1\).
- Selections using indices - \(A_2\), \(A_3\), \(A_4\).
- Selections involving comparisons - \(A_5\), \(A_6\).
- Complex selections - \(A_7\), \(A_8\), \(A_9\), \(A_{10}\)
- \(A_1\) (linear search/scan)
- Scan each block and test all records to see whether satisfy selection condition.
- Cost = \(b_r\) block transfers \(+ 1\) seek = \(b_r \times t_{T} + t_{S}\)
- If selection is on the key attribute, it can stop on finding record equality, average case:
- Cost = \((b_{r} / 2)\) block transfers \(+ 1\) seek = \((b_r / 2) \times t_{T} + t_{S}\) > Linear search can be applied regardless of selection condition, ordering of records, availability of indices, suitable for seeking on a relational table organized as the heap file, or seeking on a non-index attribute (non-search-key attribute) in a relational table organized as the sequential file, e.g. B/B+-tree file.
- \(A_2\) (clustering/primary index, B+-tree index, equality on key )
- Index scan/seek - search algorithms with an index.
- Selection condition on search-key of index.
- With key-attributes or non-key-attributes.
- Retrieve a single record, using B+-tree as the clustering/primary index.
- Cost \(= (h_i + 1) \times (t_{T} + t_{S})\)
- \(h_{i}\): height of the tree.
- \(t_{T}\): time to transfer one block.
- \(t_{S}\): time for one seek.
- Index scan/seek - search algorithms with an index.
- \(A_3\) (primary/clustering index, B+-tree index, equality on non-key)
- Retrieve mutiple records that satisfy select conditions.
- e.g. Seek on non-key attribute branch_name in account and branch_name = ‘Perryride’.
- Records are located on consecutive blocks.
- Cost \(= h_i \times (t_{T} + t_{S}) + t_{S} + t_{T} \times b\).
- $b = $ number of blocks containing matching records.
- Retrieve mutiple records that satisfy select conditions.
- \(A_4\) (secondary index, B+-tree index, equality on key)
- If equality condition is on key, only a single record is retrieved.
- Cost \(= (h_i + 1) \times (t_{T} + t_{S})\).
- If equality condition is on non-key, mutiple records (\(n\) records) are retrieved.
- Each of \(n\) matching records may be resident on a different block, which may result in on I/O operation per retrieved record.
- Cost \(= (h_i + n) \times (t_T + t_S)\).
- If equality condition is on key, only a single record is retrieved.
- \(A_5\) (clustering/primary index, B+tree index, comparison). (sorted by A)
- For \(\sigma_{A \geq V} (r)\) use index to find the first tuple \(\geq v\) and scan relation sequentially.
- For \(\sigma_{A \leq V} (r)\) just scan relation sequentially till the first tuple \(> v\); do not use index.
- Cost \(= h_i \times (t_{T} + t_{S}) + t_{S} + t_{T} \times b\).
- \(b\): number of blocks containing matching records.
- \(A_6\) (secondary index, B+tree index, comparison)
- For \(\sigma_{A \geq V} (r)\) use index to find index entry \(\geq v\) and scan index sequentially, to find pointers to records.
- For \(\sigma_{A \leq V} (r)\) scan leaf page of index finding pointers to records, till first entry \(> v\).
- Cost \(= (h_i + n) \times (t_{T} + t_{S})\)
- In either case, retrieve records that are pointed.
- Requires an I/O for each record.
- Linear file scan may be cheaper.
- \(A_7\) (conjunctive selection using one index)
- Select a combination of \(\theta_i\) and algorithms \(A_1\) through \(A_6\) taht results in the least cost for \(\sigma_{\theta i} (r)\).
- Test other conditions on tuple after fetching it into memory buffer.
- \(A_8\) (conjunctive selection using composite index)
- Use appropriate composite (mutiple-key) index if available.
- \(A_9\) (conjunctive selection by intersection of identifiers)
- Requires indices with record pointers.
- Use corresponding index for each condition, take intersection of all the obtained sets of record pointers.
- If some conditions do not have appropriate indices, apply test in memory.
- Step 1. 利用定义在每个查询条件 \(\theta_{i}\) 的查询属性上的索引,查找满足 \(\theta_{i}\) 的元组集合
- Step 2. 取各个元组集合的交集
- \(A_{10}\) (disjunctive selection by union of identifiers)
- Applicable if conditions have available indices, otherwise use linear scan.
- Use corresponding index for each condition
- Take union of all the obtained sets of record pointers.
- Fetch records from file.
- Negation: \(\sigma_{\lnot \theta} (r)\)
- Use linear scan on file.
- If very few records satisfy \(\not \theta\), and an index is applicable to \(\theta\) (find satisfying records using index and fetch from file).
Sorting
Let \(M\) denote memory size.
- Create sorted runs
- Let \(i\) be \(0\) initially. Repeatedly do the following till the end of the relation:
- Read \(M\) blocks of relation into memory.
- Sorted the in-memory blocks.
- Write sorted data to run \(R_{i}\);
- Increment \(i\). Let the final value of \(i\) be \(N\).
- Let \(i\) be \(0\) initially. Repeatedly do the following till the end of the relation:
- Merge the runs.
Evaluation of Expressions
Evaluate an expression containing multiple evaluation primitives
- Key: how to process intermediate computing results.
Materialization and pipeline are used for expression evaluation.
- Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat.
- Pipelining: pass on tuples to parent operations even as an operation is being executed.
Materialization
Principles:
- Start from the lowest-level (i.e. at the bottom of the tree) evaluate one operation at a time.
- The results of each evaluation (i.e. intermediate computing results) are stored in temporal relations on the disk for subsequent evaluation. (serial evaluating)
Cost of writing results to disk and reading them back can be quite high.
- Overall cost \(=\) sum of costs of individual operations \(+\) cost of writing intermediate results to disk.
Pipelined
Principles of pipelined evaluation: evaluate several operations simultaneously in a pipeline, with the results of one operation passed to the next, without need to store temporary relations in disk. (parallel evaluating)
Much cheaper than materialization: no need to store a temporary relation to disk. Instead, pass tuples directly to the join; similarly, don’t store result of join, pass tuples directly to projection.