Chapter 3: Introduction to SQL

SQL introduction.

Structured Query Language (SQL)

SQL Parts

  1. Data Manipulation Language (DML)
    • The ability to query/retrieve information
    • The ability to select, insert, delete, and update tuples
  2. Data Definition Language (DDL)
    • The ability to define, drop, and modify schemas, views, and indexes
    • Integrity: commands for specifying integrity constraints
    • View definition: commands for defining views
  3. Transaction control
    • Commands for specifying the beginning and ending of transactions (chapter 17)
  4. Embedded SQL and dynamic SQL
    • Statements embedded within programming languages (chapter 5)
  5. Authorization
    • Commands for specifying access rights to relations and views

SQL Data Definition

Def. The DDL allows the specification of information about relations, including:

  • The schema for each relation
  • The type of values associated with each attribute
  • The integrity constraints
  • The indices to be maintained for each relation
  • The security and authorization information for each relation
  • The physical storage structure of each relation on disk

Domain Types in SQL

  • char(n). Fixed length character string, user-specified length \(n\)
  • varchar(n). Variable length character string, user-specified length \(n\)
  • int. Integer (a finite subset of the integers that is machine-dependent)
  • smallint. Small integer (a machine-dependent subset of the integer domain type)
  • numeric(p, d). Fixed-point number with user-specified precision of \(p\) digits, with \(d\) digits to the right of the decimal point
  • date. Dates, containing a (\(4\)-digit) year, month, and day
  • time. Time of day, in hours, minutes and seconds
  • timestamp. Date plus time of day, e.g. 2001-7-27 09:00:30.75
  • interval. Period of time
    • Subtracting a date/time/timestamp value from another gives an interval value
  • null. Null values are allowed in all domain types. Declaring an attribute to be not null prohibits null values for that attribute
  • create domain. The create domain construct creates user-defined domain types

Basic Schema Definition: Create Table

create table takes (
    ID varchar(5) not null,
    course_id varchar(8) not null,
    year numeric(4,0) not null,
    grade varchar(2),
    primary key (ID, course_id, year),
    foreign key (ID) references student(ID),
    foreign key (course_id, year) references section(course_id, year)
);
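To see these constraints in action, here is a minimal sketch that runs the definition through SQLite via Python's standard sqlite3 module (the engine is an assumption; the SQL is the point). A foreign key must reference a primary key or unique columns, so the hypothetical section table below is simplified to be keyed on (course_id, year); the student table and all values are made up as well. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE student (
    ID varchar(5) PRIMARY KEY,
    name varchar(20) NOT NULL
);
-- Hypothetical simplified section table, keyed on (course_id, year)
-- so that the composite foreign key in takes has something to reference.
CREATE TABLE section (
    course_id varchar(8),
    year numeric(4,0),
    PRIMARY KEY (course_id, year)
);
CREATE TABLE takes (
    ID varchar(5) NOT NULL,
    course_id varchar(8) NOT NULL,
    year numeric(4,0) NOT NULL,
    grade varchar(2),
    PRIMARY KEY (ID, course_id, year),
    FOREIGN KEY (ID) REFERENCES student(ID),
    FOREIGN KEY (course_id, year) REFERENCES section(course_id, year)
);
INSERT INTO student VALUES ('00128', 'Zhang');
INSERT INTO section VALUES ('CS-101', 2017);
INSERT INTO takes VALUES ('00128', 'CS-101', 2017, 'A');
""")


def accepts(row):
    """Try to insert a takes tuple; return False if any constraint rejects it."""
    try:
        conn.execute("INSERT INTO takes VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts(("00128", "CS-101", 2017, "B")))  # False: duplicate primary key
print(accepts(("99999", "CS-101", 2017, "A")))  # False: no such student
print(accepts(("00128", "CS-999", 2017, "A")))  # False: no such section
print(accepts(("00128", "CS-101", None, "A")))  # False: year is not null
```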

Updates to Schemas

  • Drop table: DROP TABLE table_name;
  • Alter:
    • ALTER TABLE table_name ADD attribute_name domain_type;
      • All existing tuples are assigned null as the value for the added attribute
    • ALTER TABLE table_name DROP attribute_name;
      • Not supported by many databases

Tuples in a table are stored row by row, so adding or dropping an attribute may require modifying every stored record and then moving these records, which is costly.
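A minimal sketch of the ALTER TABLE ... ADD rule, again assuming SQLite through Python's sqlite3 module: a hypothetical phone attribute is added to a table that already holds tuples, and those tuples read back null (None in Python). The sample instructors are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID varchar(5) PRIMARY KEY, name varchar(20))")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?)",
    [("10101", "Srinivasan"), ("12121", "Wu")],
)

# Add a (hypothetical) attribute after the table already holds tuples.
conn.execute("ALTER TABLE instructor ADD phone varchar(15)")

rows = conn.execute("SELECT ID, name, phone FROM instructor ORDER BY ID").fetchall()
print(rows)  # [('10101', 'Srinivasan', None), ('12121', 'Wu', None)]
```

Some systems avoid the costly rewrite described above; the SQLite documentation, for example, states that adding a column only edits the stored schema text and leaves the table content unchanged.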

Basic Structure of SQL Queries

A typical SQL query has the form:

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P;

This query is equivalent to the following relational algebra expression, except that SQL keeps duplicates in the result unless distinct is specified:

\[ \Pi_{A_1, A_2, \cdots, A_n} (\sigma_{P} (r_1 \times r_2 \times \cdots \times r_m)) \]
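To make the correspondence concrete, here is a deliberately naive Python sketch that evaluates a query in the order the expression reads: form the Cartesian product, apply the selection \(\sigma_P\), then project. The function select_from_where and the sample tuples are hypothetical, and real systems never materialize the full product. Unlike \(\Pi\) in relational algebra, the projection here keeps duplicates, which matches SQL's default.

```python
from itertools import product


def select_from_where(attrs, relations, predicate):
    """Naive reading of SELECT attrs FROM r1, ..., rm WHERE predicate.

    `relations` maps each relation name to a list of tuples (dicts). Attributes
    are qualified as "relation.attribute" so shared names such as ID stay apart.
    """
    names = list(relations)
    result = []
    # r1 x r2 x ... x rm: every combination of one tuple from each relation
    for combo in product(*(relations[n] for n in names)):
        row = {f"{n}.{a}": v for n, t in zip(names, combo) for a, v in t.items()}
        if predicate(row):  # sigma_P
            result.append(tuple(row[a] for a in attrs))  # Pi_{A1..An}, duplicates kept
    return result


# Hypothetical sample tuples
instructor = [{"ID": "10101", "name": "Srinivasan"}, {"ID": "45565", "name": "Katz"}]
teaches = [
    {"ID": "10101", "course_id": "CS-101"},
    {"ID": "10101", "course_id": "CS-315"},
    {"ID": "45565", "course_id": "CS-101"},
]
same_id = lambda r: r["instructor.ID"] == r["teaches.ID"]
rels = {"instructor": instructor, "teaches": teaches}

pairs = select_from_where(["instructor.name", "teaches.course_id"], rels, same_id)
names_only = select_from_where(["instructor.name"], rels, same_id)
print(pairs)       # [('Srinivasan', 'CS-101'), ('Srinivasan', 'CS-315'), ('Katz', 'CS-101')]
print(names_only)  # [('Srinivasan',), ('Srinivasan',), ('Katz',)]
```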

The Select Clause

The select clause lists the attributes desired in the result of a query. For example, find the names of all instructors:

SELECT name
FROM instructor;

Note: SQL names are case insensitive, so Name, NAME, and name refer to the same thing.

SQL allows duplicates in relations as well as in query results. To force the elimination of duplicates, insert the keyword distinct after select. For example, find the department names of all instructors, and remove duplicates:

SELECT DISTINCT dept_name
FROM instructor;

The keyword all can be used to specify explicitly that duplicates should not be removed (this is the default).

  • An asterisk (*) in the select clause denotes all attributes
  • An attribute can be a literal with no from clause, e.g. SELECT '437';
    • Result is a table with one column and a single row with value 437
    • Can give the column a name using: SELECT '437' AS FOO;
  • An attribute can be a literal with from clause, e.g. SELECT 'A' FROM instructor;
    • Result is a table with one column and \(N\) rows (one for each tuple in instructor), each row with value A.
    • What about SELECT DISTINCT 'A' FROM instructor;?
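A minimal sketch of these select-clause variants, assuming SQLite via Python's sqlite3 module and a made-up three-tuple instructor table. It also answers the question above: SELECT DISTINCT 'A' FROM instructor collapses the \(N\) identical rows into a single row with value A (assuming instructor is nonempty).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("10101", "Srinivasan", "Comp.Sci"), ("45565", "Katz", "Comp.Sci"), ("12121", "Wu", "Finance")],
)


def q(sql):
    return conn.execute(sql).fetchall()


all_depts = q("SELECT ALL dept_name FROM instructor")            # duplicates kept (the default)
distinct_depts = q("SELECT DISTINCT dept_name FROM instructor")  # duplicates removed
no_from = q("SELECT '437' AS FOO")                               # one row, one column
literal_rows = q("SELECT 'A' FROM instructor")                   # one row per instructor tuple
distinct_literal = q("SELECT DISTINCT 'A' FROM instructor")      # identical rows collapse

print(sorted(all_depts))       # [('Comp.Sci',), ('Comp.Sci',), ('Finance',)]
print(sorted(distinct_depts))  # [('Comp.Sci',), ('Finance',)]
print(no_from, literal_rows, distinct_literal)
# [('437',)] [('A',), ('A',), ('A',)] [('A',)]
```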

The select clause can contain arithmetic expressions involving the operations \(+\), \(-\), \(*\), and \(/\), operating on constants or attributes of tuples. The query:

SELECT ID, name, salary/12 AS monthly_salary
FROM instructor;

would return a relation containing the ID and name of each instructor, together with the value of salary divided by \(12\) under the name monthly_salary.
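A small check of the monthly-salary query, assuming SQLite via Python's sqlite3 module and two made-up instructor tuples. One dialect detail matters here: in SQLite, dividing an integer by an integer truncates, so the sketch divides by 12.0 to keep the fractional part and shows the truncated value for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("10101", "Srinivasan", 65000), ("12121", "Wu", 90000)],
)

cur = conn.execute(
    "SELECT ID, name, salary / 12.0 AS monthly_salary FROM instructor ORDER BY ID"
)
columns = [d[0] for d in cur.description]  # the AS clause names the computed column
rows = cur.fetchall()

# These salaries are stored as integers, so integer / integer truncates in SQLite.
truncated = conn.execute(
    "SELECT salary / 12 FROM instructor WHERE ID = '10101'"
).fetchone()[0]

print(columns)    # ['ID', 'name', 'monthly_salary']
print(truncated)  # 5416
```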

The Where Clause

The where clause specifies conditions that the result must satisfy.

  • Corresponds to the selection predicate of relational algebra
  • Comparison results can be combined using the logical connectives and, or, and not
  • Between comparison operator
    • Find names of all instructors with salary between \(90000\) and \(100000\)
    • SELECT name FROM instructor WHERE salary BETWEEN 90000 AND 100000;
  • Tuple comparison operator
    • SELECT name, course_id FROM instructor, teaches WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology');
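A minimal sketch of both operators, assuming SQLite via Python's sqlite3 module and made-up tuples. BETWEEN includes its endpoints, and the row-value (tuple) comparison needs SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC);
CREATE TABLE teaches (ID TEXT, course_id TEXT);
INSERT INTO instructor VALUES
    ('12121', 'Wu', 'Finance', 90000),
    ('22222', 'Einstein', 'Physics', 95000),
    ('58583', 'Gold', 'Physics', 87000),
    ('76766', 'Crick', 'Biology', 72000),
    ('83821', 'Brandt', 'Comp.Sci', 92000);
INSERT INTO teaches VALUES
    ('76766', 'BIO-101'), ('76766', 'BIO-301'), ('22222', 'PHY-101');
""")

# BETWEEN is inclusive: salary >= 90000 AND salary <= 100000
in_range = conn.execute(
    "SELECT name FROM instructor WHERE salary BETWEEN 90000 AND 100000 ORDER BY name"
).fetchall()

# Tuple comparison: both components must be equal
biology = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology')
    ORDER BY course_id
""").fetchall()

print(in_range)  # [('Brandt',), ('Einstein',), ('Wu',)]
print(biology)   # [('Crick', 'BIO-101'), ('Crick', 'BIO-301')]
```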

The From Clause

The from clause lists the relations involved in the query; it corresponds to the Cartesian product operation of relational algebra. For example, find the Cartesian product of instructor and teaches:

SELECT * 
FROM instructor, teaches;

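For common attributes such as ID, the attributes in the resulting table are distinguished by prefixing the relation name, e.g. instructor.ID and teaches.ID.

The Cartesian product is not very useful on its own, but it becomes useful when combined with a where clause condition (the selection operation of relational algebra).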
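A quick count makes the point, assuming SQLite via Python's sqlite3 module and hypothetical data: the product of a 3-tuple and a 4-tuple relation has \(3 \times 4 = 12\) rows, and the where condition keeps only the pairs that describe the same instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (ID TEXT, name TEXT);
CREATE TABLE teaches (ID TEXT, course_id TEXT);
INSERT INTO instructor VALUES ('10101', 'Srinivasan'), ('45565', 'Katz'), ('12121', 'Wu');
INSERT INTO teaches VALUES
    ('10101', 'CS-101'), ('10101', 'CS-315'), ('45565', 'CS-101'), ('12121', 'FIN-201');
""")


def count(sql):
    return conn.execute(sql).fetchone()[0]


# Every instructor tuple paired with every teaches tuple
product_rows = count("SELECT COUNT(*) FROM instructor, teaches")
# Only the pairs whose IDs agree
matching_rows = count(
    "SELECT COUNT(*) FROM instructor, teaches WHERE instructor.ID = teaches.ID"
)
print(product_rows, matching_rows)  # 12 4
```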

Natural Join

Natural join can be used in the from clause. For example, find the names of all instructors in the Art department who have taught some course, together with the course_id.

SELECT name, course_id
FROM instructor NATURAL JOIN teaches
WHERE instructor.dept_name = 'Art';

What is the difference from the following query?

SELECT name, course_id
FROM instructor, teaches
WHERE instructor.ID = teaches.ID AND instructor.dept_name = 'Art';

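The two queries are equivalent, because instructor and teaches share only the attribute ID, which the natural join equates automatically. Evaluated literally, the second query builds the full Cartesian product before filtering, which is more costly, although query optimizers usually execute both as the same join.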
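A minimal check on made-up data that both forms return the same rows, assuming SQLite via Python's sqlite3 module (ORDER BY is added only to make the comparison deterministic). The natural join equates every attribute name the two relations share; in this hypothetical schema that is only ID, which is why the two queries agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC);
CREATE TABLE teaches (ID TEXT, course_id TEXT, semester TEXT, year INTEGER);
INSERT INTO instructor VALUES
    ('40001', 'Monet', 'Art', 70000),
    ('40002', 'Kahlo', 'Art', 72000),
    ('15151', 'Mozart', 'Music', 40000);
INSERT INTO teaches VALUES
    ('40001', 'ART-101', 'Fall', 2017),
    ('40001', 'ART-210', 'Spring', 2018),
    ('15151', 'MU-199', 'Spring', 2018);
""")

natural = conn.execute("""
    SELECT name, course_id
    FROM instructor NATURAL JOIN teaches
    WHERE instructor.dept_name = 'Art'
    ORDER BY course_id
""").fetchall()

explicit = conn.execute("""
    SELECT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND instructor.dept_name = 'Art'
    ORDER BY course_id
""").fetchall()

print(natural)              # [('Monet', 'ART-101'), ('Monet', 'ART-210')]
print(natural == explicit)  # True
```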

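Logically, the clauses of an SQL query are evaluated in this order: FROM + JOIN \(\to\) WHERE \(\to\) GROUP BY \(\to\) HAVING \(\to\) SELECT \(\to\) ORDER BY \(\to\) LIMIT. This is a conceptual order; the optimizer may execute a query differently as long as the result is the same.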

The Rename Operation

Relations and attributes can be renamed using the as clause. For example, find the names of all instructors who have a higher salary than some instructor in Comp.Sci.

SELECT DISTINCT T.name AS TeacherName
FROM instructor AS T, instructor AS S /*tuple variables*/
WHERE T.salary > S.salary and S.dept_name = 'Comp.Sci';
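A minimal sketch of the query on made-up salaries, assuming SQLite via Python's sqlite3 module. Because S ranges over all Comp.Sci instructors, an instructor qualifies as soon as their salary exceeds the lowest Comp.Sci salary; ORDER BY is added for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp.Sci", 65000),
        ("45565", "Katz", "Comp.Sci", 75000),
        ("83821", "Brandt", "Comp.Sci", 92000),
        ("12121", "Wu", "Finance", 90000),
        ("15151", "Mozart", "Music", 40000),
    ],
)

rows = conn.execute("""
    SELECT DISTINCT T.name AS TeacherName
    FROM instructor AS T, instructor AS S  -- two tuple variables over one relation
    WHERE T.salary > S.salary AND S.dept_name = 'Comp.Sci'
    ORDER BY TeacherName
""").fetchall()
print(rows)  # [('Brandt',), ('Katz',), ('Wu',)]
```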

String Operations

SQL provides the string-matching operator like for comparisons on character strings. Its patterns use two special characters:

  • Percent (%). The % character matches any substring.
  • Underscore (_). The _ character matches any single character.

For example, find the names of all instructors whose name includes the substring “dar”.

SELECT name
FROM instructor
WHERE name LIKE '%dar%';

To match the string “100%” itself, write LIKE '100\%' ESCAPE '\', which declares backslash (\) as the escape character.

Patterns are case sensitive in standard SQL (some systems match case-insensitively by default). Examples:

  • Intro% matches any string beginning with “Intro”
  • %Comp% matches any string containing “Comp” as a substring
  • ___ (three underscores) matches any string of exactly three characters
  • ___% matches any string of at least three characters
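The patterns above can be tried directly, assuming SQLite via Python's sqlite3 module; the helper like is hypothetical and simply evaluates s LIKE pattern [ESCAPE escape], returning 1 or 0. One caveat: SQLite's LIKE ignores case for ASCII letters by default, so the last line matches even though standard SQL patterns are case sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(s, pattern, escape=None):
    """Evaluate `s LIKE pattern [ESCAPE escape]` in SQLite; returns 1 or 0."""
    if escape is None:
        return conn.execute("SELECT ? LIKE ?", (s, pattern)).fetchone()[0]
    return conn.execute("SELECT ? LIKE ? ESCAPE ?", (s, pattern, escape)).fetchone()[0]


print(like("Sudarshan", "%dar%"))                    # 1
print(like("Intro. to Computer Science", "Intro%"))  # 1
print(like("Advanced Comp. Arch.", "%Comp%"))        # 1
print(like("abc", "___"), like("abcd", "___"))       # 1 0
print(like("abcd", "___%"), like("ab", "___%"))      # 1 0
print(like("100%", r"100\%", "\\"))                 # 1: \% is a literal percent sign
print(like("1000", r"100\%", "\\"))                 # 0
print(like("intro to sql", "Intro%"))               # 1 in SQLite (ASCII case ignored)
```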

Ordering the Display of Tuples

  • List in alphabetic order the names of all instructors: SELECT DISTINCT name FROM instructor ORDER BY name;
  • Specify desc for descending order or asc for ascending order on each attribute (ascending order is the default); sorting can use several attributes, e.g. ORDER BY name DESC, dept_name DESC
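A minimal sketch, assuming SQLite via Python's sqlite3 module and four made-up instructors, showing a plain alphabetical listing and a two-key sort where dept_name is descending and ties are broken by name in the default ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?)",
    [("Katz", "Comp.Sci"), ("Wu", "Finance"), ("Brandt", "Comp.Sci"), ("Singh", "Finance")],
)

by_name = conn.execute("SELECT DISTINCT name FROM instructor ORDER BY name").fetchall()
# dept_name descending first; within a department, name ascending (the default)
by_dept = conn.execute(
    "SELECT dept_name, name FROM instructor ORDER BY dept_name DESC, name"
).fetchall()

print(by_name)  # [('Brandt',), ('Katz',), ('Singh',), ('Wu',)]
print(by_dept)  # [('Finance', 'Singh'), ('Finance', 'Wu'), ('Comp.Sci', 'Brandt'), ('Comp.Sci', 'Katz')]
```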

Set Operations

Def. The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations \(\cup\), \(\cap\), and \(-\).

  • Each of these operations automatically eliminates duplicates
  • To retain duplicates, use the multiset versions union all, intersect all, and except all
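SQLite implements UNION ALL but not INTERSECT ALL or EXCEPT ALL, so this sketch models the multiset semantics with Python's collections.Counter instead. The rule it illustrates: if a tuple occurs \(m\) times in r and \(n\) times in s, it occurs \(m + n\) times in r union all s, \(\min(m, n)\) times in r intersect all s, and \(\max(0, m - n)\) times in r except all s. The course_id values are made up.

```python
from collections import Counter

# Hypothetical course_id values returned by two queries (duplicates included)
fall = Counter(["CS-101", "CS-101", "CS-347", "PHY-101"])
spring = Counter(["CS-101", "CS-315", "CS-319", "CS-319"])

# Set versions: duplicates are eliminated
union = set(fall) | set(spring)
intersect = set(fall) & set(spring)
except_ = set(fall) - set(spring)

# Multiset versions: a value occurring m times in fall and n times in spring
union_all = fall + spring      # appears m + n times
intersect_all = fall & spring  # appears min(m, n) times
except_all = fall - spring     # appears max(0, m - n) times

print(sorted(union))  # ['CS-101', 'CS-315', 'CS-319', 'CS-347', 'PHY-101']
print(sorted(intersect), sorted(except_))  # ['CS-101'] ['CS-347', 'PHY-101']
print(union_all["CS-101"], intersect_all["CS-101"], except_all["CS-101"])  # 3 1 1
```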

Null Values

Null signifies an unknown value or that a value does not exist. The result of any arithmetic expression involving null is null, e.g. 5 + null returns null. The predicate is null can be used to check for null values. For example, find all instructors whose salary is null:

SELECT name
FROM instructor
WHERE salary IS NULL;

SQL treats as unknown the result of any comparison involving a null value, e.g. 5 < null, null <> null, or null = null.

The predicate in a where clause can involve Boolean operations (and, or, not); thus the Boolean operations need to be extended to deal with the value unknown.

  • TRUE AND UNKNOWN = UNKNOWN
  • FALSE AND UNKNOWN = FALSE
  • UNKNOWN AND UNKNOWN = UNKNOWN
  • UNKNOWN OR TRUE = TRUE
  • UNKNOWN OR FALSE = UNKNOWN
  • UNKNOWN OR UNKNOWN = UNKNOWN
  • NOT UNKNOWN = UNKNOWN

The result of a where clause predicate is treated as false if it evaluates to unknown.
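A minimal sketch of the three-valued logic in Python, using None as a stand-in for unknown (an assumption of this sketch, not something SQL prescribes), followed by a cross-check against SQLite via the sqlite3 module, which reports unknown as NULL (None).

```python
import sqlite3
from typing import Optional

Truth = Optional[bool]  # None stands in for UNKNOWN in this sketch


def and3(a: Truth, b: Truth) -> Truth:
    if a is False or b is False:
        return False  # FALSE AND anything is FALSE
    if a is None or b is None:
        return None
    return True


def or3(a: Truth, b: Truth) -> Truth:
    if a is True or b is True:
        return True  # TRUE OR anything is TRUE
    if a is None or b is None:
        return None
    return False


def not3(a: Truth) -> Truth:
    return None if a is None else not a


def where_keeps(p: Truth) -> bool:
    """A where clause keeps a tuple only if its predicate is TRUE."""
    return p is True


print(and3(True, None), and3(False, None), or3(None, True), or3(None, False), not3(None))
# None False True None None

# Cross-check with SQLite, which reports UNKNOWN as NULL (None in Python)
conn = sqlite3.connect(":memory:")
checks = conn.execute(
    "SELECT NULL = NULL, 5 + NULL, (1 < NULL) OR 1, (1 < NULL) AND 0"
).fetchone()
kept = conn.execute("SELECT COUNT(*) FROM (SELECT 1) WHERE NULL = NULL").fetchone()[0]
print(checks, kept)  # (None, None, 1, 0) 0
```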

Aggregate Functions

Aggregate functions operate on the multiset of values of a column of a relation and return a single value.

  • AVG: average value
  • MIN: minimum value
  • MAX: maximum value
  • SUM: sum of values
  • COUNT: number of values

Examples:

  1. Find the average salary of instructors in the Computer Science department

    SELECT AVG(salary)
    FROM instructor
    WHERE dept_name = 'Comp.Sci';
  2. Find the total number of instructors who teach a course in Spring 2018

    SELECT COUNT(DISTINCT ID)
    FROM teaches
    WHERE semester = 'Spring' AND year = 2018;
  3. Find the number of tuples in the course relation

    SELECT COUNT(*)
    FROM course;
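The three examples can be run against a tiny made-up database, assuming SQLite via Python's sqlite3 module. The teaches table here is a hypothetical simplification with semester and year columns; instructor 10101 teaches two Spring 2018 courses, which is why COUNT(DISTINCT ID) and COUNT(ID) differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC);
CREATE TABLE teaches (ID TEXT, course_id TEXT, semester TEXT, year INTEGER);
CREATE TABLE course (course_id TEXT, title TEXT);
INSERT INTO instructor VALUES
    ('10101', 'Srinivasan', 'Comp.Sci', 65000),
    ('45565', 'Katz', 'Comp.Sci', 75000),
    ('12121', 'Wu', 'Finance', 90000);
INSERT INTO teaches VALUES
    ('10101', 'CS-101', 'Spring', 2018),
    ('10101', 'CS-315', 'Spring', 2018),  -- same instructor, second course
    ('45565', 'CS-319', 'Spring', 2018),
    ('12121', 'FIN-201', 'Fall', 2017);
INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science'), ('CS-315', 'Robotics');
""")


def scalar(sql):
    return conn.execute(sql).fetchone()[0]


avg_cs = scalar("SELECT AVG(salary) FROM instructor WHERE dept_name = 'Comp.Sci'")
teachers_spring = scalar(
    "SELECT COUNT(DISTINCT ID) FROM teaches WHERE semester = 'Spring' AND year = 2018"
)
rows_spring = scalar("SELECT COUNT(ID) FROM teaches WHERE semester = 'Spring' AND year = 2018")
course_count = scalar("SELECT COUNT(*) FROM course")
print(avg_cs, teachers_spring, rows_spring, course_count)  # 70000.0 2 3 2
```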

Group By

Find the average salary of instructors in each department:

SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name;

Attributes in the select clause outside of aggregate functions must appear in the group by list (here, dept_name).

All aggregate operations except COUNT(*) ignore tuples with null values on the aggregated attributes.
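A minimal sketch of grouping with nulls present, assuming SQLite via Python's sqlite3 module and made-up tuples. AVG and COUNT(salary) skip the null salaries while COUNT(*) does not, and a group holding only nulls gets a null average and a count of 0. SQLite does not enforce the group by rule above (it accepts bare columns), so the query is written to follow it by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [
        ("10101", "Comp.Sci", 65000),
        ("45565", "Comp.Sci", 75000),
        ("12121", "Finance", 90000),
        ("76543", "Finance", None),  # salary unknown
        ("99999", "Music", None),    # the only Music salary is null
    ],
)

rows = conn.execute("""
    SELECT dept_name, AVG(salary) AS avg_salary,
           COUNT(*) AS n_rows, COUNT(salary) AS n_salaries
    FROM instructor
    GROUP BY dept_name
    ORDER BY dept_name
""").fetchall()
for r in rows:
    print(r)
# ('Comp.Sci', 70000.0, 2, 2)
# ('Finance', 90000.0, 2, 1)
# ('Music', None, 1, 0)
```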

Having Clause

Find the names and average salaries of all departments whose average salary is more than \(42000\).

SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name
HAVING AVG(salary) > 42000;

Note: predicates in having clause are applied after the formation of groups whereas predicates in where clause are applied before forming groups.
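To see why the distinction matters, this sketch (assuming SQLite via Python's sqlite3 module and made-up salaries) runs the having query next to a variant that filters with where instead. The first drops whole departments whose average is too low; the second drops individual low salaries before averaging, so the answers differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [
        ("Mozart", "Music", 40000), ("Bach", "Music", 50000),  # average 45000
        ("Monet", "Art", 30000), ("Kahlo", "Art", 50000),      # average 40000
        ("Crick", "Biology", 72000),
    ],
)

# HAVING: groups are formed first, then whole groups are filtered
by_having = conn.execute("""
    SELECT dept_name, AVG(salary) FROM instructor
    GROUP BY dept_name
    HAVING AVG(salary) > 42000
    ORDER BY dept_name
""").fetchall()

# WHERE: individual tuples are filtered before groups are formed
by_where = conn.execute("""
    SELECT dept_name, AVG(salary) FROM instructor
    WHERE salary > 42000
    GROUP BY dept_name
    ORDER BY dept_name
""").fetchall()

print(by_having)  # [('Biology', 72000.0), ('Music', 45000.0)]
print(by_where)   # [('Art', 50000.0), ('Biology', 72000.0), ('Music', 50000.0)]
```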

Nested Subqueries

A subquery is a select-from-where expression nested within another query. For example, find courses offered in both Fall 2009 and Spring 2010:

SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall' AND year = 2009 AND course_id IN
(SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010);

This nested form is not recommended.
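One alternative to the nested form is the intersect set operation. A minimal check on made-up sections that both return the same courses, assuming SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (course_id TEXT, semester TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?)",
    [
        ("CS-101", "Fall", 2009),
        ("CS-347", "Fall", 2009),
        ("CS-101", "Spring", 2010),
        ("CS-315", "Spring", 2010),
    ],
)

nested = conn.execute("""
    SELECT DISTINCT course_id
    FROM section
    WHERE semester = 'Fall' AND year = 2009 AND course_id IN
        (SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010)
    ORDER BY course_id
""").fetchall()

# intersect also removes duplicates, so no DISTINCT is needed
with_intersect = conn.execute("""
    SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009
    INTERSECT
    SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010
    ORDER BY course_id
""").fetchall()

print(nested, with_intersect)  # [('CS-101',)] [('CS-101',)]
```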

Some and All Clause

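  • Some: F <comp> SOME r \(\Leftrightarrow \exists t \in r\) such that (F <comp> t)
  • All: F <comp> ALL r \(\Leftrightarrow \forall t \in r\) (F <comp> t)

Here <comp> can be \(<\), \(\le\), \(>\), \(\ge\), \(=\), or \(\ne\). In particular, = SOME is equivalent to IN, and \(\ne\) ALL is equivalent to NOT IN.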
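The two definitions translate almost literally into Python's any and all. This is a sketch under one assumption: r contains no nulls (with nulls, SQL can yield unknown, which is not modeled here). The operator names follow SQL, with <> for \(\ne\). Note how the empty set behaves: SOME over it is false, ALL over it is true.

```python
import operator

# Comparison operators allowed as <comp>; "<>" is SQL's "not equal"
COMP = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "=": operator.eq, "<>": operator.ne,
}


def some(f, comp, r):
    """F <comp> SOME r: true iff there exists t in r with F <comp> t."""
    return any(COMP[comp](f, t) for t in r)


def all_(f, comp, r):
    """F <comp> ALL r: true iff F <comp> t holds for every t in r."""
    return all(COMP[comp](f, t) for t in r)


print(some(5, "<", [0, 5, 6]))  # True
print(some(5, "<", [0, 5]))     # False
print(some(5, "=", [0, 5]))     # True: = SOME behaves like IN
print(all_(5, "<", [6, 10]))    # True
print(all_(5, "<>", [4, 6]))    # True: <> ALL behaves like NOT IN
print(some(5, ">", []), all_(5, ">", []))  # False True: empty r
```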

Exists and Not Exists Clause

  • Exists: find all courses taught in both Fall 2017 and Spring 2018

    SELECT course_id
    FROM section AS S
    WHERE semester = 'Fall' AND year = 2017 AND EXISTS
    (SELECT * FROM section AS T WHERE semester = 'Spring' AND year = 2018 AND S.course_id = T.course_id);
  • Not exists: find all students who have taken all courses offered in the Biology department

    SELECT DISTINCT S.ID, S.name
    FROM student AS S
    WHERE NOT EXISTS ((SELECT course_id FROM course WHERE dept_name = 'Biology') EXCEPT (SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID));
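Both queries can be tried on a tiny made-up database, assuming SQLite via Python's sqlite3 module. The not exists query works because \(X - Y = \emptyset \Leftrightarrow X \subseteq Y\): a student qualifies exactly when the Biology courses minus the courses they took is empty. The EXCEPT operands are written without the extra parentheses, which SQLite's grammar does not accept there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE section (course_id TEXT, semester TEXT, year INTEGER);
CREATE TABLE course (course_id TEXT, dept_name TEXT);
CREATE TABLE student (ID TEXT, name TEXT);
CREATE TABLE takes (ID TEXT, course_id TEXT);
INSERT INTO section VALUES ('CS-101', 'Fall', 2017), ('CS-347', 'Fall', 2017),
                           ('CS-101', 'Spring', 2018), ('CS-315', 'Spring', 2018);
INSERT INTO course VALUES ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp.Sci');
INSERT INTO student VALUES ('98988', 'Tanaka'), ('00128', 'Zhang'), ('54321', 'Williams');
INSERT INTO takes VALUES ('98988', 'BIO-101'), ('98988', 'BIO-301'),
                         ('00128', 'BIO-101'), ('00128', 'CS-101');
""")

both_terms = conn.execute("""
    SELECT course_id
    FROM section AS S
    WHERE semester = 'Fall' AND year = 2017 AND EXISTS
        (SELECT * FROM section AS T
         WHERE semester = 'Spring' AND year = 2018 AND S.course_id = T.course_id)
""").fetchall()

# Biology courses EXCEPT the student's courses is empty iff the student took them all
all_biology = conn.execute("""
    SELECT DISTINCT S.ID, S.name
    FROM student AS S
    WHERE NOT EXISTS (
        SELECT course_id FROM course WHERE dept_name = 'Biology'
        EXCEPT
        SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID)
""").fetchall()

print(both_terms)   # [('CS-101',)]
print(all_biology)  # [('98988', 'Tanaka')]
```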

Unique Clause

The unique construct tests whether a subquery has any duplicate tuples in its result; it evaluates to true if there are none, including when the result is empty. For example, find all courses that were offered at most once in 2017:

SELECT T.course_id
FROM course AS T
WHERE UNIQUE (SELECT R.course_id FROM section AS R WHERE T.course_id = R.course_id AND R.year = 2017);
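SQLite, like several other systems, does not implement the unique predicate. An equivalent for this query is to require that at most one matching tuple exists, as in this sketch on made-up sections (assuming SQLite via Python's sqlite3 module). CS-347 is returned because it was not offered in 2017 at all: an empty subquery result has no duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT);
CREATE TABLE section (course_id TEXT, sec_id TEXT, year INTEGER);
INSERT INTO course VALUES ('CS-101'), ('CS-190'), ('CS-347');
INSERT INTO section VALUES
    ('CS-101', '1', 2017),
    ('CS-190', '1', 2017), ('CS-190', '2', 2017),  -- offered twice in 2017
    ('CS-347', '1', 2018);                         -- not offered in 2017
""")

# UNIQUE (subquery) rewritten as "at most one matching tuple"
at_most_once = conn.execute("""
    SELECT T.course_id
    FROM course AS T
    WHERE 1 >= (SELECT COUNT(R.course_id)
                FROM section AS R
                WHERE T.course_id = R.course_id AND R.year = 2017)
    ORDER BY T.course_id
""").fetchall()
print(at_most_once)  # [('CS-101',), ('CS-347',)]
```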

Subqueries in From Clause

Find the average instructor salary of those departments where the average salary is greater than 42000:

SELECT dept_name, avg_salary
FROM (SELECT dept_name, AVG(salary) FROM instructor GROUP BY dept_name) AS dept_avg(dept_name, avg_salary)
WHERE avg_salary > 42000;
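A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up salaries, that runs the subquery form next to the equivalent having query shown earlier. SQLite does not accept a column list after a subquery alias, so the sketch names avg_salary inside the subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("Mozart", "Music", 40000), ("Crick", "Biology", 72000),
     ("El Said", "History", 60000), ("Califieri", "History", 62000)],
)

from_subquery = conn.execute("""
    SELECT dept_name, avg_salary
    FROM (SELECT dept_name, AVG(salary) AS avg_salary
          FROM instructor GROUP BY dept_name) AS dept_avg
    WHERE avg_salary > 42000
    ORDER BY dept_name
""").fetchall()

with_having = conn.execute("""
    SELECT dept_name, AVG(salary) FROM instructor
    GROUP BY dept_name HAVING AVG(salary) > 42000
    ORDER BY dept_name
""").fetchall()

print(from_subquery)                 # [('Biology', 72000.0), ('History', 61000.0)]
print(from_subquery == with_having)  # True
```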

With Clause

The with clause provides a way of defining a temporary relation available only to the query in which the with clause occurs. For example, find all departments with the maximum budget:

WITH max_budget(value) AS (SELECT MAX(budget) FROM department)
SELECT department.dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;
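A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up budgets, in which the temporary relation max_budget holds one column named value and is joined back to department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT, budget NUMERIC)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [("Biology", "Watson", 90000), ("Comp.Sci", "Taylor", 100000),
     ("Finance", "Painter", 120000), ("Physics", "Watson", 70000)],
)

rows = conn.execute("""
    WITH max_budget(value) AS (SELECT MAX(budget) FROM department)
    SELECT department.dept_name
    FROM department, max_budget
    WHERE department.budget = max_budget.value
""").fetchall()
print(rows)  # [('Finance',)]
```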

Scalar Subquery

A scalar subquery is used where a single value is expected; it must return a single tuple with a single attribute, otherwise a runtime error occurs. For example, list all departments along with the number of instructors in each department:

SELECT dept_name, (SELECT COUNT(*) FROM instructor WHERE department.dept_name = instructor.dept_name) AS num_instructors
FROM department;
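A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up data. The subquery is evaluated for each department tuple, so Music, which has no instructors, gets a count of 0. Be aware that SQLite does not raise an error when a scalar subquery returns several rows; it silently uses the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_name TEXT);
CREATE TABLE instructor (name TEXT, dept_name TEXT);
INSERT INTO department VALUES ('Biology'), ('Comp.Sci'), ('Music');
INSERT INTO instructor VALUES ('Crick', 'Biology'), ('Katz', 'Comp.Sci'), ('Brandt', 'Comp.Sci');
""")

rows = conn.execute("""
    SELECT dept_name,
           (SELECT COUNT(*) FROM instructor
            WHERE department.dept_name = instructor.dept_name) AS num_instructors
    FROM department
    ORDER BY dept_name
""").fetchall()
print(rows)  # [('Biology', 1), ('Comp.Sci', 2), ('Music', 0)]
```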

Modification of the Database

Deletion

  • Delete all instructors: DELETE FROM instructor;
  • Delete all instructors from Finance department: DELETE FROM instructor WHERE dept_name = 'Finance';
  • Delete all tuples in instructor for those instructors associated with a department located in the Watson building: DELETE FROM instructor WHERE dept_name IN (SELECT dept_name FROM department WHERE building = 'Watson');
  • Delete all instructors whose salary is less than the average salary of instructors: DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor);
    • Problem: as tuples are deleted from instructor, the average salary changes. Solution used in SQL:
      • First, compute the average salary and find all tuples to delete
      • Next, delete all tuples found above (without recomputing the average or retesting the tuples)
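The sketch below, assuming SQLite via Python's sqlite3 module and four made-up salaries, contrasts the SQL semantics with a hypothetical naive_delete that recomputes the average after every single deletion. With salaries 40000, 60000, 80000, and 100000, the average is 70000, so SQL deletes exactly the first two; the naive loop keeps chasing a rising average and removes three.

```python
import sqlite3

# Hypothetical (ID, salary) tuples
SALARIES = [("A", 40000), ("B", 60000), ("C", 80000), ("D", 100000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, salary NUMERIC)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)", SALARIES)

# SQL semantics: the average is computed once, before any tuple is deleted
conn.execute("DELETE FROM instructor WHERE salary < (SELECT AVG(salary) FROM instructor)")
kept_by_sql = [id_ for (id_,) in conn.execute("SELECT ID FROM instructor ORDER BY ID")]


def naive_delete(rows):
    """Hypothetical strategy: re-test each tuple against the current average."""
    rows = list(rows)
    i = 0
    while i < len(rows):
        avg = sum(salary for _, salary in rows) / len(rows)
        if rows[i][1] < avg:
            del rows[i]  # the average rises, so later tuples face a stricter test
        else:
            i += 1
    return [id_ for id_, _ in rows]


kept_by_naive = naive_delete(SALARIES)
print(kept_by_sql)    # ['C', 'D']
print(kept_by_naive)  # ['D']
```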

Insertion

  • Add a new tuple to course: INSERT INTO course VALUES ('CS-437', 'Database Systems', 'Comp.Sci', 4);

  • Add a new tuple to student with total_creds set to null: INSERT INTO student VALUES('3003', 'Green', 'Finance', null);

  • Make each student in the Music department who has earned more than \(144\) credit hours an instructor in the Music department, with a salary of \(18000\):

    INSERT INTO instructor
    SELECT ID, name, dept_name, 18000
    FROM student
    WHERE dept_name = 'Music' AND total_creds > 144;
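A minimal sketch of both insertion styles, assuming SQLite via Python's sqlite3 module; the students and their credit totals are made up. Only the Music student with more than 144 credits becomes an instructor, and Green's total_creds stays null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (ID TEXT, name TEXT, dept_name TEXT, total_creds INTEGER);
CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC);
INSERT INTO student VALUES ('55739', 'Sanchez', 'Music', 150), ('76543', 'Brown', 'Music', 100);
INSERT INTO student VALUES ('3003', 'Green', 'Finance', null);  -- total_creds unknown
""")

# Insert the result of a query: the select is evaluated before any tuple is inserted
conn.execute("""
    INSERT INTO instructor
    SELECT ID, name, dept_name, 18000
    FROM student
    WHERE dept_name = 'Music' AND total_creds > 144
""")

instructors = conn.execute("SELECT * FROM instructor").fetchall()
green = conn.execute("SELECT total_creds FROM student WHERE ID = '3003'").fetchone()
print(instructors)  # [('55739', 'Sanchez', 'Music', 18000)]
print(green)        # (None,)
```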

Updates

  • Give a \(5\%\) salary raise to all instructors:

    UPDATE instructor
    SET salary = salary * 1.05;
  • Give a \(5\%\) salary raise to those instructors who earn less than \(70000\):

    UPDATE instructor
    SET salary = salary * 1.05
    WHERE salary < 70000;
  • Give a \(5\%\) salary raise to instructors whose salary is less than average:

    UPDATE instructor
    SET salary = salary * 1.05
    WHERE salary < (SELECT AVG(salary) FROM instructor);
  • Increase salaries of instructors whose salary is over \(100000\) by \(3\%\), and all others by \(5\%\):

    UPDATE instructor
    SET salary = CASE
    WHEN salary <= 100000 THEN salary * 1.05 ELSE salary * 1.03
    END;
  • Recompute and update the total_creds value for all students:

    UPDATE student S
    SET total_creds =
    (SELECT SUM(credits) FROM takes, course WHERE takes.course_id = course.course_id AND S.ID = takes.ID AND takes.grade <> 'F' AND takes.grade IS NOT NULL);
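  • To set total_creds to 0 instead of null for students who have not taken any course, replace SUM(credits) in the subquery with a case expression:

    UPDATE student S
    SET total_creds = (SELECT CASE WHEN SUM(credits) IS NOT NULL THEN SUM(credits) ELSE 0 END
    FROM takes, course
    WHERE takes.course_id = course.course_id AND S.ID = takes.ID AND takes.grade <> 'F' AND takes.grade IS NOT NULL);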
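A minimal sketch of two of the updates above, assuming SQLite via Python's sqlite3 module and made-up data. The first part shows why a single update with case is used for the 3%/5% raise: running the 5% update before the 3% one raises instructor B twice, because B's salary crosses 100000 after the first raise. The second part recomputes total_creds with the case expression so that a student with no passed courses gets 0; it correlates through student.ID rather than the alias S.

```python
import sqlite3

# Part 1: the 3% / 5% raise. Hypothetical salaries; B crosses 100000 after a 5% raise.
SALARIES = [("A", 60000.0), ("B", 98000.0), ("C", 120000.0)]


def fresh_instructors():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (ID TEXT, salary REAL)")
    conn.executemany("INSERT INTO instructor VALUES (?, ?)", SALARIES)
    return conn


def salaries(conn):
    return [round(s, 2) for (s,) in conn.execute("SELECT salary FROM instructor ORDER BY ID")]


with_case = fresh_instructors()
with_case.execute("""
    UPDATE instructor
    SET salary = CASE WHEN salary <= 100000 THEN salary * 1.05 ELSE salary * 1.03 END
""")

two_steps = fresh_instructors()  # 5% first, then 3%: the wrong order
two_steps.execute("UPDATE instructor SET salary = salary * 1.05 WHERE salary <= 100000")
two_steps.execute("UPDATE instructor SET salary = salary * 1.03 WHERE salary > 100000")

print(salaries(with_case))  # [63000.0, 102900.0, 123600.0]
print(salaries(two_steps))  # [63000.0, 105987.0, 123600.0]  (B raised twice)

# Part 2: recompute total_creds, using 0 instead of null when nothing was passed
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE student (ID TEXT, total_creds INTEGER);
CREATE TABLE course (course_id TEXT, credits INTEGER);
CREATE TABLE takes (ID TEXT, course_id TEXT, grade TEXT);
INSERT INTO student VALUES ('1', NULL), ('2', NULL), ('3', NULL);
INSERT INTO course VALUES ('CS-101', 4), ('CS-347', 3);
INSERT INTO takes VALUES ('1', 'CS-101', 'A'), ('1', 'CS-347', 'B'),
                         ('2', 'CS-101', 'F'), ('2', 'CS-347', NULL);
""")
db.execute("""
    UPDATE student
    SET total_creds = (
        SELECT CASE WHEN SUM(credits) IS NOT NULL THEN SUM(credits) ELSE 0 END
        FROM takes, course
        WHERE takes.course_id = course.course_id AND student.ID = takes.ID
          AND takes.grade <> 'F' AND takes.grade IS NOT NULL)
""")
creds = db.execute("SELECT ID, total_creds FROM student ORDER BY ID").fetchall()
print(creds)  # [('1', 7), ('2', 0), ('3', 0)]
```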

https://ddccffq.github.io/2025/10/20/数据库系统原理/Introduction_to_SQL/
Author: ddccffq
Published on: October 20, 2025