Chapter 3: Introduction to SQL
SQL introduction.
Structured Query Language (SQL)
SQL Parts
- Data Manipulation Language (DML)
  - The ability to query (retrieve) information
  - The ability to select, insert, delete, and update tuples
- Data Definition Language (DDL)
  - The ability to define, drop, and modify schemas, views, and indices
- Integrity: commands for specifying integrity constraints
- View definition: commands for defining views
- Transaction control
  - Commands for specifying the beginning and ending of transactions (chapter 17)
- Embedded SQL and dynamic SQL
  - Statements embedded within programming languages (chapter 5)
- Authorization
  - Commands for specifying access rights to relations and views
SQL Data Definition
Def. The DDL allows the specification of information about relations, including:
- The schema for each relation
- The type of values associated with each attribute
- The integrity constraints
- The indices to be maintained for each relation
- The security and authorization information for each relation
- The physical storage structure of each relation on disk
Domain Types in SQL
- char(n). Fixed-length character string, with user-specified length \(n\)
- varchar(n). Variable-length character string, with user-specified maximum length \(n\)
- int. Integer (a finite subset of the integers that is machine-dependent)
- smallint. Small integer (a machine-dependent subset of the integer type)
- numeric(p, d). Fixed-point number with user-specified precision of \(p\) digits, with \(d\) digits to the right of the decimal point
- date. Dates, containing a (\(4\)-digit) year, month, and day
- time. Time of day, in hours, minutes, and seconds
- timestamp. Date plus time of day, e.g. 2001-7-27 09:00:30.75
- interval. Period of time
  - Subtracting a date/time/timestamp value from another gives an interval value
- null. Null values are allowed in all domain types. Declaring an attribute to be not null prohibits null values for that attribute
- create domain. This construct creates user-defined domain types
Basic Schema Definition: Create Table
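A minimal sketch of a table definition, run through Python's built-in sqlite3 module so that it can be executed as-is. The instructor schema (ID, name, dept_name, salary) follows the examples used later in these notes; the column sizes, the constraints, and the sample rows are assumptions made for illustration, and SQLite enforces declared types more loosely than most systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5),
        name      VARCHAR(20) NOT NULL,  -- integrity constraint: no nulls
        dept_name VARCHAR(20),
        salary    NUMERIC(8, 2),
        PRIMARY KEY (ID)                 -- integrity constraint: unique key
    )
""")
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp.Sci', 65000)")

# A tuple that violates NOT NULL is rejected by the database.
try:
    conn.execute("INSERT INTO instructor VALUES ('12121', NULL, 'Finance', 90000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
print(rejected, row_count)
```

The second insert fails, so only the first tuple ends up in the relation.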
Updates to Schemas
- Drop table: DROP TABLE table_name;
- Alter:
  - ALTER TABLE table_name ADD attribute_name domain_type;
    - All existing tuples are assigned null as the value for the added attribute
  - ALTER TABLE table_name DROP attribute_name;
    - Not supported by many databases
Tuples are stored row by row, so adding or dropping an attribute can be costly: every record has to be modified and possibly moved.
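The effect of adding an attribute can be checked with a small sketch, again using Python's sqlite3 module. The two-column instructor table and its rows are assumptions for illustration; `sqlite_master` is SQLite's own catalog table, not standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID CHAR(5), name VARCHAR(20))")
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan')")
conn.execute("INSERT INTO instructor VALUES ('12121', 'Wu')")

# Add an attribute; existing tuples get null (None in Python) for it.
conn.execute("ALTER TABLE instructor ADD salary NUMERIC(8, 2)")
rows = conn.execute("SELECT ID, name, salary FROM instructor ORDER BY ID").fetchall()
print(rows)

# Dropping the table removes both its tuples and its schema.
conn.execute("DROP TABLE instructor")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```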
Basic Structure of SQL Queries
A typical SQL query has the form:
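SELECT A_1, A_2, ..., A_n
FROM r_1, r_2, ..., r_m
WHERE P;
Here each \(A_i\) is an attribute, each \(r_i\) is a relation, and \(P\) is a predicate.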
This query is equivalent to the relational algebra expression:
\[ \Pi_{A_1, A_2, \cdots, A_n} (\sigma_{P} (r_1 \times r_2 \times \cdots \times r_m)) \]
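The correspondence can be traced step by step. The sketch below runs a query through sqlite3 and then repeats the same work by hand in Python as product, selection, and projection. The table contents are assumptions for illustration, and the comparison ignores the fact that SQL keeps duplicates while relational algebra works on sets.

```python
import sqlite3
from itertools import product

instructor = [("10101", "Srinivasan", "Comp.Sci"), ("76766", "Crick", "Biology")]
teaches = [("10101", "CS-101"), ("76766", "BIO-101"), ("76766", "BIO-301")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID, name, dept_name)")
conn.execute("CREATE TABLE teaches (ID, course_id)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", instructor)
conn.executemany("INSERT INTO teaches VALUES (?, ?)", teaches)

sql_rows = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND dept_name = 'Biology'
""").fetchall()

# The same steps by hand: product, then selection (sigma), then projection (Pi).
cartesian = product(instructor, teaches)
selected = [(i, t) for i, t in cartesian if i[0] == t[0] and i[2] == "Biology"]
projected = [(i[1], t[1]) for i, t in selected]

print(sorted(sql_rows), sorted(projected))
```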
The Select Clause
The select clause lists the attributes. For example, find the names of all instructors:
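SELECT name FROM instructor;
Note: SQL names are case insensitive, so Name, NAME, and name all refer to the same attribute.
SQL allows duplicates in relations as well as in query results. To force the elimination of duplicates, insert the keyword distinct after select. For example, find the department names of all instructors, and remove duplicates:
SELECT DISTINCT dept_name FROM instructor;
The keyword all can be used in the same position to state explicitly that duplicates should not be removed, which is also the default.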
- An asterisk (*) in the select clause denotes all attributes
- An attribute can be a literal with no from clause, e.g. SELECT '437';
  - The result is a table with one column and a single row with value 437
  - The column can be given a name using: SELECT '437' AS FOO;
- An attribute can be a literal with a from clause, e.g. SELECT 'A' FROM instructor;
  - The result is a table with one column and \(N\) rows (one per tuple in instructor), each row with value A
  - What about SELECT DISTINCT 'A' FROM instructor;?
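The three literal queries can be compared directly. This is a small sketch using sqlite3; the three-row instructor table is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID, name)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("10101", "Srinivasan"), ("12121", "Wu"), ("15151", "Mozart")])

no_from = conn.execute("SELECT '437' AS FOO").fetchall()                    # one row
per_row = conn.execute("SELECT 'A' FROM instructor").fetchall()             # N rows
deduped = conn.execute("SELECT DISTINCT 'A' FROM instructor").fetchall()    # duplicates removed
print(no_from, per_row, deduped)
```

With distinct, the \(N\) identical rows collapse into a single row, provided instructor is not empty; for an empty relation the result has no rows at all.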
The select clause can contain arithmetic expressions involving the operations \(+\), \(-\), \(*\), and \(/\), applied to constants or attributes of tuples. The query:
SELECT ID, name, dept_name, salary / 12 FROM instructor;
would return a relation that is the same as instructor, except that the value of salary is divided by \(12\).
The Where Clause
The where clause specifies conditions that the result must satisfy.
- Corresponds to the selection predicate of relational algebra
- Allows the logical connectives and, or, and not in the WHERE predicate
- Between comparison operator
  - Find the names of all instructors with salary between \(90000\) and \(100000\) (both endpoints included):
    SELECT name FROM instructor WHERE salary BETWEEN 90000 AND 100000;
- Tuple comparison operator
  - SELECT name, course_id FROM instructor, teaches WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology');
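Both operators can be tried on a few rows. This is a sketch using sqlite3 with assumed sample data; tuple (row value) comparison needs SQLite 3.15 or later and is not available in every database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID, name, dept_name, salary)")
conn.execute("CREATE TABLE teaches (ID, course_id)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", [
    ("10101", "Srinivasan", "Comp.Sci", 65000),
    ("76766", "Crick", "Biology", 90000),
    ("83821", "Brandt", "Comp.Sci", 100000),
])
conn.executemany("INSERT INTO teaches VALUES (?, ?)",
                 [("10101", "CS-101"), ("76766", "BIO-101")])

# BETWEEN includes both endpoints, so 90000 and 100000 qualify.
between = conn.execute(
    "SELECT name FROM instructor WHERE salary BETWEEN 90000 AND 100000 ORDER BY name"
).fetchall()

# Tuple comparison: every component must match.
tuple_cmp = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology')
""").fetchall()
print(between, tuple_cmp)
```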
The From Clause
The from clause lists the relations involved in the query and corresponds to the Cartesian product operation of relational algebra. For example, find the Cartesian product of instructor and teaches:
SELECT * FROM instructor, teaches;
For common attributes, like ID, the attributes in the resulting table are renamed using the relation name, like instructor.ID.
The Cartesian product is not very useful on its own, but it is useful when combined with a where clause condition.
Natural Join
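A natural join can be used in the from clause. For example, find the names of all instructors in the Art department who have taught some course, together with the course_id:
SELECT name, course_id FROM instructor NATURAL JOIN teaches WHERE dept_name = 'Art';
What is the difference compared with:
SELECT name, course_id FROM instructor, teaches WHERE instructor.ID = teaches.ID AND dept_name = 'Art';
The two queries are equivalent, but the second can be more costly if the system actually forms the Cartesian product before applying the where condition.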
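A quick check of the equivalence, sketched with sqlite3. The rows and the instructor names are hypothetical, and the equivalence relies on ID being the only attribute the two relations share, since a natural join matches on every common attribute name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID, name, dept_name)")
conn.execute("CREATE TABLE teaches (ID, course_id)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("10101", "Srinivasan", "Comp.Sci"),
    ("20202", "Kahlo", "Art"),
    ("30303", "Monet", "Art"),  # in Art, but teaches nothing
])
conn.executemany("INSERT INTO teaches VALUES (?, ?)",
                 [("10101", "CS-101"), ("20202", "ART-110"), ("20202", "ART-210")])

natural = conn.execute("""
    SELECT name, course_id
    FROM instructor NATURAL JOIN teaches
    WHERE dept_name = 'Art'
    ORDER BY course_id
""").fetchall()

explicit = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND dept_name = 'Art'
    ORDER BY course_id
""").fetchall()
print(natural, explicit)
```

Both forms return the same two rows; the Art instructor without a teaches tuple appears in neither.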
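Conceptually, the clauses of an SQL query are evaluated in this order: FROM + JOIN \(\to\) WHERE \(\to\) GROUP BY \(\to\) HAVING \(\to\) SELECT \(\to\) ORDER BY \(\to\) LIMIT. The system may execute a different physical plan as long as the result is the same.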
The Rename Operation
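Relations and attributes can be renamed using the as clause. For example, find the names of all instructors who have a higher salary than some instructor in Comp.Sci:
SELECT DISTINCT T.name FROM instructor AS T, instructor AS S WHERE T.salary > S.salary AND S.dept_name = 'Comp.Sci';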
String Operations
SQL provides a string-matching operator for comparisons on character strings. The operator like uses patterns that are described using two special characters:
- Percent (%). The % character matches any substring.
- Underscore (_). The _ character matches any single character.
For example, find the names of all instructors whose name includes the substring “dar”:
SELECT name FROM instructor WHERE name LIKE '%dar%';
To match the literal string “100%”, use LIKE '100\%' ESCAPE '\', which declares backslash (\) as the escape character.
Patterns are case sensitive. Examples:
- 'Intro%' matches any string beginning with “Intro”
- '%Comp%' matches any string containing “Comp” as a substring
- '___' (three underscores) matches any string of exactly three characters
- '___%' matches any string of at least three characters
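The pattern rules can be checked with a small helper, sketched here with sqlite3. The helper name `like` and the test strings are assumptions for illustration. SQLite's LIKE is case insensitive for ASCII letters by default, unlike the standard behaviour described above, so the checks below avoid anything that depends on case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def like(value, pattern):
    """Return True if value LIKE pattern, with backslash as the escape character."""
    row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'", (value, pattern)).fetchone()
    return bool(row[0])

results = {
    "starts_with": like("Introduction", "Intro%"),
    "not_at_start": like("An Intro", "Intro%"),
    "contains": like("Intro to Comp Sci", "%Comp%"),
    "exactly_three": like("abc", "___"),
    "four_is_not_three": like("abcd", "___"),
    "at_least_three": like("abcd", "___%"),
    "too_short": like("ab", "___%"),
    "escaped_percent": like("100%", "100\\%"),
    "escaped_is_literal": like("1000", "100\\%"),
}
print(results)
```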
Ordering the Display of Tuples
- List in alphabetic order the names of all instructors: ORDER BY name
- Specify desc for descending order or asc for ascending order (ascending order is the default), e.g. ORDER BY name DESC, dept_name DESC
Set Operation
Def. The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations \(\cup\), \(\cap\), and \(-\).
- Each of them automatically eliminates duplicates
- Use union all, intersect all, and except all to retain duplicates
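A sketch of the set operations on a small section relation, using sqlite3. The rows are assumptions for illustration. SQLite supports union all but not intersect all or except all, so only the first of the duplicate-retaining variants is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (course_id, semester, year)")
conn.executemany("INSERT INTO section VALUES (?, ?, ?)", [
    ("CS-101", "Fall", 2017), ("CS-101", "Fall", 2017),  # two sections in one term
    ("CS-101", "Spring", 2018),
    ("BIO-101", "Fall", 2017),
    ("FIN-201", "Spring", 2018),
])
fall = "SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2017"
spring = "SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2018"

def run(op):
    """Combine the two queries with the given set operator and sort the result."""
    rows = conn.execute(f"{fall} {op} {spring}").fetchall()
    return sorted(r[0] for r in rows)

union = run("UNION")          # duplicates eliminated
union_all = run("UNION ALL")  # duplicates retained
intersect = run("INTERSECT")
except_ = run("EXCEPT")
print(union, union_all, intersect, except_)
```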
Null Values
Null signifies an unknown value or a value that does not exist. The result of any arithmetic expression involving null is null. The predicate is null can be used to check for null values. For example, find all instructors whose salary is null:
SELECT name FROM instructor WHERE salary IS NULL;
SQL treats the result of any comparison involving a null value as unknown.
The predicate in a where clause can involve Boolean operations (and, or, not), so these operations need to be extended to deal with unknown:
- TRUE AND UNKNOWN = UNKNOWN
- FALSE AND UNKNOWN = FALSE
- UNKNOWN AND UNKNOWN = UNKNOWN
- UNKNOWN OR TRUE = TRUE
- UNKNOWN OR FALSE = UNKNOWN
- UNKNOWN OR UNKNOWN = UNKNOWN
- NOT UNKNOWN = UNKNOWN
The result of a where clause predicate is treated as false if it evaluates to unknown.
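The truth table can be reproduced by asking the database to evaluate each expression. This is a sketch with sqlite3, where SQL null stands for unknown and comes back as Python's None; the helper `evaluate` and the three sample instructors are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a Boolean SQL expression; None stands for unknown."""
    value = conn.execute(f"SELECT {expr}").fetchone()[0]
    return None if value is None else bool(value)

truth = {expr: evaluate(expr) for expr in [
    "1 AND NULL", "0 AND NULL", "NULL AND NULL",
    "NULL OR 1", "NULL OR 0", "NULL OR NULL", "NOT (NULL)",
]}
print(truth)

# A tuple whose predicate is unknown is dropped by WHERE, even under NOT.
conn.execute("CREATE TABLE instructor (name, salary)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("Wu", 90000), ("Mozart", 40000), ("Gold", None)])
high = conn.execute("SELECT name FROM instructor WHERE salary > 50000").fetchall()
not_high = conn.execute("SELECT name FROM instructor WHERE NOT (salary > 50000)").fetchall()
print(high, not_high)
```

The instructor with a null salary appears in neither result, because both predicates evaluate to unknown for that tuple.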
Aggregate Functions
Aggregate functions operate on the multiset of values of a column and return a single value.
- AVG: average value
- MIN: minimum value
- MAX: maximum value
- SUM: sum of values
- COUNT: number of values
Examples:
Find the average salary of instructors in the Computer Science department:
SELECT AVG(salary)
FROM instructor
WHERE dept_name = 'Comp.Sci';
Find the total number of instructors who teach a course in Spring 2018:
SELECT COUNT(DISTINCT ID)
FROM teaches
WHERE semester = 'Spring' AND year = 2018;
Find the number of tuples in the course relation:
SELECT COUNT(*)
FROM course;
Group By
Find the average salary of instructors in each department:
SELECT dept_name, AVG(salary) AS avg_salary FROM instructor GROUP BY dept_name;
Attributes in the select clause outside of aggregate functions must appear in the group by list (here, dept_name).
All aggregate operations except COUNT(*) ignore tuples with null values on the aggregated attributes.
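How nulls interact with aggregation can be seen on a four-row table. This is a sketch with sqlite3; the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name, dept_name, salary)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Wu", "Finance", 90000),
    ("Singh", "Finance", 80000),
    ("Gold", "Physics", None),      # salary unknown
    ("Einstein", "Physics", 95000),
])

# COUNT(*) counts tuples; COUNT(salary) and AVG(salary) skip the null.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary) FROM instructor"
).fetchone()

by_dept = conn.execute("""
    SELECT dept_name, AVG(salary) AS avg_salary
    FROM instructor
    GROUP BY dept_name
    ORDER BY dept_name
""").fetchall()
print(totals, by_dept)
```

The Physics average is computed over the single non-null salary, so it equals 95000 rather than half of it.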
Having Clause
Find the names and average salaries of all departments whose average salary is more than \(42000\).
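SELECT dept_name, AVG(salary) AS avg_salary FROM instructor GROUP BY dept_name HAVING AVG(salary) > 42000;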
Note: predicates in having clause are applied after the formation of groups whereas predicates in where clause are applied before forming groups.
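The difference between the two clauses shows up when the same threshold is applied before and after grouping. This is a sketch with sqlite3 and assumed sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name, dept_name, salary)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Wu", "Finance", 90000),
    ("Singh", "Finance", 30000),
    ("Mozart", "Music", 40000),
    ("Crick", "Biology", 72000),
])

# HAVING filters whole groups, after the averages are computed.
having_rows = conn.execute("""
    SELECT dept_name, AVG(salary) FROM instructor
    GROUP BY dept_name HAVING AVG(salary) > 42000
    ORDER BY dept_name
""").fetchall()

# WHERE filters individual tuples, before the groups are formed.
where_rows = conn.execute("""
    SELECT dept_name, AVG(salary) FROM instructor
    WHERE salary > 42000
    GROUP BY dept_name
    ORDER BY dept_name
""").fetchall()
print(having_rows, where_rows)
```

Finance appears in both results, but with different averages: the where version discards the 30000 salary before averaging.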
Nested Subqueries
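A subquery is a select-from-where expression that is nested within another query. For example, find courses offered in Fall 2009 and in Spring 2010:
SELECT DISTINCT course_id FROM section WHERE semester = 'Fall' AND year = 2009 AND course_id IN (SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010);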
This nested form is not recommended.
Some and All Clause
Here <comp> is one of \(<\), \(\le\), \(>\), \(\ge\), \(=\), \(\ne\).
- Some: F <comp> SOME r \(\Leftrightarrow \exists t \in r\) such that (F <comp> t)
- All: F <comp> ALL r \(\Leftrightarrow \forall t \in r\) (F <comp> t)
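The two definitions translate almost word for word into Python's `any` and `all`. The sketch below models them directly instead of running SQL, since SQLite does not support some and all; the helper names `some` and `all_` are assumptions, and null values are left out of the model.

```python
import operator

def some(f, comp, r):
    """F <comp> SOME r: the comparison holds for at least one t in r."""
    return any(comp(f, t) for t in r)

def all_(f, comp, r):
    """F <comp> ALL r: the comparison holds for every t in r."""
    return all(comp(f, t) for t in r)

checks = {
    "5 < some {0, 5, 6}": some(5, operator.lt, [0, 5, 6]),  # holds because 5 < 6
    "5 < some {0, 5}": some(5, operator.lt, [0, 5]),
    "5 < all {6, 10}": all_(5, operator.lt, [6, 10]),
    "5 < all {0, 5, 6}": all_(5, operator.lt, [0, 5, 6]),
    "5 = some {0, 5}": some(5, operator.eq, [0, 5]),        # = some behaves like IN
    "5 <> all {4, 6}": all_(5, operator.ne, [4, 6]),        # <> all behaves like NOT IN
}
print(checks)
```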
Exists and Not Exists Clause
Exists: find all courses taught in both Fall 2017 and Spring 2018:
SELECT course_id
FROM section AS S
WHERE semester = 'Fall' AND year = 2017 AND EXISTS
(SELECT * FROM section AS T WHERE semester = 'Spring' AND year = 2018 AND S.course_id = T.course_id);
Not exists: find all students who have taken all courses offered in the Biology department:
SELECT DISTINCT S.ID, S.name
FROM student AS S
WHERE NOT EXISTS ((SELECT course_id FROM course WHERE dept_name = 'Biology') EXCEPT (SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID));
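The not exists query reads as "there is no Biology course that the student has not taken". A small sketch with sqlite3 makes this concrete; the student, course, and takes rows are assumptions for illustration, and the parentheses around the two operands of except are dropped because SQLite does not accept them there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ID, name);
    CREATE TABLE course (course_id, dept_name);
    CREATE TABLE takes (ID, course_id);
    INSERT INTO student VALUES ('1', 'Zhang'), ('2', 'Shankar');
    INSERT INTO course VALUES
        ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp.Sci');
    INSERT INTO takes VALUES
        ('1', 'BIO-101'), ('1', 'BIO-301'),   -- Zhang: every Biology course
        ('2', 'BIO-101'), ('2', 'CS-101');    -- Shankar: misses BIO-301
""")

rows = conn.execute("""
    SELECT DISTINCT S.ID, S.name
    FROM student AS S
    WHERE NOT EXISTS (
        SELECT course_id FROM course WHERE dept_name = 'Biology'
        EXCEPT
        SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID
    )
""").fetchall()
print(rows)
```

For each student, the except computes the Biology courses still missing; the student qualifies only when that set is empty.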
Unique Clause
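The unique construct tests whether a subquery has any duplicate tuples in its result; it evaluates to true if the result contains no duplicates. For example, find all courses that were offered at most once in 2017:
SELECT T.course_id
FROM course AS T
WHERE UNIQUE (SELECT R.course_id FROM section AS R WHERE T.course_id = R.course_id AND R.year = 2017);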
Subqueries in From Clause
Find the average instructors’ salaries of those departments where the average salary is greater than 42000:
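SELECT dept_name, avg_salary
FROM (SELECT dept_name, AVG(salary) AS avg_salary FROM instructor GROUP BY dept_name) AS dept_avg
WHERE avg_salary > 42000;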
With Clause
The with clause provides a way of defining a temporary relation available only to the query in which the with clause occurs. For example, find all departments with the maximum budget:
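WITH max_budget (value) AS (SELECT MAX(budget) FROM department)
SELECT department.dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;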
Scalar Subquery
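A scalar subquery is a subquery used where a single value is expected. For example, list all departments along with the number of instructors in each department:
SELECT dept_name, (SELECT COUNT(*) FROM instructor WHERE department.dept_name = instructor.dept_name) AS num_instructors
FROM department;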
Modification of Database
Deletion
- Delete all instructors: DELETE FROM instructor;
- Delete all instructors from the Finance department: DELETE FROM instructor WHERE dept_name = 'Finance';
- Delete all tuples in instructor for those instructors associated with a department located in the Watson building: DELETE FROM instructor WHERE dept_name IN (SELECT dept_name FROM department WHERE building = 'Watson');
- Delete all instructors whose salary is less than the average salary of instructors: DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor);
  - Problem: as tuples are deleted from instructor, the average salary changes. Solution:
    - First, compute the average salary and find all tuples to delete
    - Then, delete all tuples found above (without recomputing the average or retesting the tuples)
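The two-step solution can be spelled out and compared with the single statement. This is a sketch with sqlite3; the helpers `fresh` and `names` and the three sample salaries are assumptions for illustration. With salaries 40000, 60000, and 80000 the average is 60000, so only one tuple should go; recomputing the average after that deletion would give 70000 and wrongly remove a second tuple.

```python
import sqlite3

def fresh():
    """Build a new in-memory instructor table with three sample tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (name, salary)")
    conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                     [("Mozart", 40000), ("Wu", 60000), ("Einstein", 80000)])
    return conn

def names(conn):
    return sorted(r[0] for r in conn.execute("SELECT name FROM instructor"))

# Step 1: compute the average once. Step 2: delete against that fixed value.
conn = fresh()
avg_salary = conn.execute("SELECT AVG(salary) FROM instructor").fetchone()[0]
conn.execute("DELETE FROM instructor WHERE salary < ?", (avg_salary,))
two_step = names(conn)

# The single statement gives the same result.
conn = fresh()
conn.execute("DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor)")
one_statement = names(conn)
print(avg_salary, two_step, one_statement)
```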
Insertion
Add a new tuple to course:
INSERT INTO course VALUES ('CS-437', 'Database Systems', 'Comp.Sci', 4);
Add a new tuple to student with total_creds set to null:
INSERT INTO student VALUES ('3003', 'Green', 'Finance', null);
Make each student in the Music department who has earned more than \(144\) credit hours an instructor in the Music department with a salary of \(18000\):
INSERT INTO instructor
SELECT ID, name, dept_name, 18000
FROM student
WHERE dept_name = 'Music' AND total_creds > 144;
Updates
Give a \(5\%\) salary raise to all instructors:
UPDATE instructor
SET salary = salary * 1.05;
Give a \(5\%\) salary raise to those instructors who earn less than \(70000\):
UPDATE instructor
SET salary = salary * 1.05
WHERE salary < 70000;
Give a \(5\%\) salary raise to instructors whose salary is less than average:
UPDATE instructor
SET salary = salary * 1.05
WHERE salary < (SELECT AVG(salary) FROM instructor);
Increase salaries of instructors whose salary is over \(100000\) by \(3\%\), and all others by \(5\%\):
UPDATE instructor
SET salary = CASE
WHEN salary <= 100000 THEN salary * 1.05 ELSE salary * 1.03
END;
Recompute and update the total_creds value for all students:
UPDATE student S
SET total_creds =
(SELECT SUM(credits) FROM takes, course WHERE takes.course_id = course.course_id AND S.ID = takes.ID AND takes.grade <> 'F' AND takes.grade IS NOT NULL);
This sets total_creds to null for students who have not taken any course. To set it to \(0\) instead, replace SUM(credits) in the subquery with a case expression:
UPDATE student S
SET total_creds =
(SELECT CASE WHEN SUM(credits) IS NOT NULL THEN SUM(credits) ELSE 0 END FROM takes, course WHERE takes.course_id = course.course_id AND S.ID = takes.ID AND takes.grade <> 'F' AND takes.grade IS NOT NULL);
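The effect of the case expression on students without any courses can be checked with a short sketch using sqlite3. The rows are assumptions for illustration, and the subquery refers to the outer table as student.ID instead of through an alias, which keeps the statement portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ID, name, total_creds);
    CREATE TABLE course (course_id, credits);
    CREATE TABLE takes (ID, course_id, grade);
    INSERT INTO student VALUES ('1', 'Zhang', NULL), ('2', 'Green', NULL);
    INSERT INTO course VALUES ('CS-101', 4), ('CS-347', 3);
    INSERT INTO takes VALUES ('1', 'CS-101', 'A'), ('1', 'CS-347', 'F');
""")

update = """
    UPDATE student
    SET total_creds = (
        SELECT {expr}
        FROM takes, course
        WHERE takes.course_id = course.course_id
          AND student.ID = takes.ID
          AND takes.grade <> 'F' AND takes.grade IS NOT NULL
    )
"""
read = "SELECT ID, total_creds FROM student ORDER BY ID"

# Plain SUM: a student with no qualifying courses gets null.
conn.execute(update.format(expr="SUM(credits)"))
plain = conn.execute(read).fetchall()

# CASE around SUM: the same student gets 0 instead.
conn.execute(update.format(
    expr="CASE WHEN SUM(credits) IS NOT NULL THEN SUM(credits) ELSE 0 END"))
with_case = conn.execute(read).fetchall()
print(plain, with_case)
```

The failed course is excluded in both runs, so the first student ends up with 4 credits either way; only the second student's value changes from null to 0.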