1. Join Overview
Relational databases store data across multiple tables to avoid redundancy. JOINs let you combine rows from two or more tables based on a related column, reconstructing the complete picture at query time.
π‘ Why not store everything in one big table? Duplicated data wastes storage, makes updates error-prone, and violates normalization principles. JOINs give you the best of both worlds: normalized storage with flexible querying.
We'll use these sample tables throughout this chapter:
CREATE TABLE departments (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL
);
CREATE TABLE employees (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
dept_id INT,
manager_id INT,
salary DECIMAL(10,2),
FOREIGN KEY (dept_id) REFERENCES departments(id),
FOREIGN KEY (manager_id) REFERENCES employees(id)
);
CREATE TABLE projects (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(200) NOT NULL,
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES departments(id)
);
-- Sample data
INSERT INTO departments (id, name) VALUES
(1, 'Engineering'), (2, 'Marketing'), (3, 'Sales'), (4, 'HR');
INSERT INTO employees (id, name, dept_id, manager_id, salary) VALUES
(1, 'Alice', 1, NULL, 95000),
(2, 'Bob', 1, 1, 85000),
(3, 'Charlie', 2, 1, 72000),
(4, 'Diana', 2, 3, 68000),
(5, 'Eve', NULL, 1, 78000),
(6, 'Frank', 3, NULL, 82000);
INSERT INTO projects (id, title, dept_id) VALUES
(1, 'Website Redesign', 1),
(2, 'Mobile App', 1),
(3, 'Ad Campaign', 2);
Note that Eve has no department (dept_id = NULL), and the HR department has no employees. These edge cases will help illustrate different join behaviors.
2. INNER JOIN
Returns only rows that have matching values in both tables. This is the most common join type.
Basic Syntax
SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
Result (Eve is excluded because her dept_id is NULL, HR is excluded because no employee belongs to it):
| employee | department |
|---|---|
| Alice | Engineering |
| Bob | Engineering |
| Charlie | Marketing |
| Diana | Marketing |
| Frank | Sales |
Multi-Table Join
You can chain multiple joins to connect three or more tables:
SELECT e.name AS employee, d.name AS department, p.title AS project
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id
INNER JOIN projects p ON d.id = p.dept_id;
| employee | department | project |
|---|---|---|
| Alice | Engineering | Website Redesign |
| Alice | Engineering | Mobile App |
| Bob | Engineering | Website Redesign |
| Bob | Engineering | Mobile App |
| Charlie | Marketing | Ad Campaign |
| Diana | Marketing | Ad Campaign |
Table Aliases
Aliases (e, d, p) make queries shorter and more readable. They're especially important when a column name exists in multiple tables:
-- Without alias: ambiguous 'name' column
SELECT employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.id;
-- With alias: clear and concise
SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
π‘ JOIN without a prefix keyword defaults to INNER JOIN. Writing INNER JOIN explicitly is recommended for clarity.
3. LEFT JOIN / RIGHT JOIN
Outer joins keep all rows from one side, filling in NULLs where there's no match on the other side.
LEFT JOIN
Returns all rows from the left table, with matching rows from the right. If no match, right-side columns are NULL.
SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
| employee | department |
|---|---|
| Alice | Engineering |
| Bob | Engineering |
| Charlie | Marketing |
| Diana | Marketing |
| Eve | NULL |
| Frank | Sales |
RIGHT JOIN
Returns all rows from the right table, with matching rows from the left.
SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
| employee | department |
|---|---|
| Alice | Engineering |
| Bob | Engineering |
| Charlie | Marketing |
| Diana | Marketing |
| Frank | Sales |
| NULL | HR |
Finding Unmatched Rows
A common pattern: use a LEFT JOIN + WHERE ... IS NULL to find rows without a match.
-- Employees without a department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id
WHERE d.id IS NULL;
-- Departments with no employees
SELECT d.name
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
WHERE e.id IS NULL;
π FULL OUTER JOIN:
- β’ PostgreSQL, SQL Server, and Oracle support
FULL OUTER JOINnatively β returns all rows from both sides - β’ MySQL does not support FULL OUTER JOIN. Emulate it with a UNION of LEFT and RIGHT joins:
SELECT e.name, d.name FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
UNION
SELECT e.name, d.name FROM employees e RIGHT JOIN departments d ON e.dept_id = d.id;
4. CROSS JOIN
Returns the Cartesian product β every row from the left table combined with every row from the right table. If table A has M rows and table B has N rows, the result has M Γ N rows.
-- Generate all possible employee-department combinations
SELECT e.name AS employee, d.name AS department
FROM employees e
CROSS JOIN departments d;
With 6 employees and 4 departments, this returns 24 rows.
Practical Use Cases
CROSS JOIN is useful for generating combinations:
-- Generate a calendar grid: all dates Γ all time slots
SELECT d.date_val, t.slot_name
FROM (
SELECT '2025-01-01' AS date_val
UNION ALL SELECT '2025-01-02'
UNION ALL SELECT '2025-01-03'
) d
CROSS JOIN (
SELECT 'Morning' AS slot_name
UNION ALL SELECT 'Afternoon'
UNION ALL SELECT 'Evening'
) t;
π‘ Caution: CROSS JOIN can produce enormous result sets. A cross join of two tables with 1,000 rows each yields 1,000,000 rows. Always use it intentionally and on small datasets.
5. Self Join
A self join joins a table with itself. This is essential when rows in the same table have a hierarchical or peer relationship β for example, employees and their managers.
Employee-Manager Example
-- Find each employee's manager
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
| employee | manager |
|---|---|
| Alice | NULL |
| Bob | Alice |
| Charlie | Alice |
| Diana | Charlie |
| Eve | Alice |
| Frank | NULL |
Employees Who Earn More Than Their Manager
SELECT
e.name AS employee,
e.salary AS emp_salary,
m.name AS manager,
m.salary AS mgr_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
Finding Colleagues (Same Department)
SELECT
e1.name AS employee1,
e2.name AS employee2,
d.name AS department
FROM employees e1
INNER JOIN employees e2 ON e1.dept_id = e2.dept_id AND e1.id < e2.id
INNER JOIN departments d ON e1.dept_id = d.id;
π‘ The condition e1.id < e2.id prevents duplicate pairs (Alice-Bob and Bob-Alice) and self-pairs (Alice-Alice).
6. Subqueries
A subquery is a SELECT statement nested inside another query. Subqueries can appear in the WHERE clause, FROM clause, or SELECT list.
Scalar Subquery
Returns a single value. Can be used anywhere a single value is expected.
-- Employees who earn above the average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Show each employee's salary and the company average
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;
IN Subquery
Returns a list of values. Use with the IN operator to filter rows matching any value in the list.
-- Employees in departments that have projects
SELECT name, dept_id
FROM employees
WHERE dept_id IN (SELECT DISTINCT dept_id FROM projects);
-- Employees NOT in any department with projects
SELECT name, dept_id
FROM employees
WHERE dept_id NOT IN (SELECT DISTINCT dept_id FROM projects WHERE dept_id IS NOT NULL);
π‘ Watch out for NULL with NOT IN: If the subquery returns any NULL value, NOT IN will return no rows at all. Always add WHERE column IS NOT NULL inside the subquery, or use NOT EXISTS instead.
EXISTS Subquery
Returns TRUE if the subquery returns at least one row. Often more efficient than IN for large datasets.
-- Departments that have at least one employee
SELECT d.name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.id
);
-- Departments that have NO employees
SELECT d.name
FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.id
);
FROM Subquery (Derived Table)
A subquery in the FROM clause creates a temporary result set (derived table) that you can query like a regular table.
-- Department salary statistics
SELECT
dept_summary.department,
dept_summary.emp_count,
dept_summary.avg_salary,
dept_summary.max_salary
FROM (
SELECT
d.name AS department,
COUNT(e.id) AS emp_count,
ROUND(AVG(e.salary), 2) AS avg_salary,
MAX(e.salary) AS max_salary
FROM departments d
INNER JOIN employees e ON d.id = e.dept_id
GROUP BY d.id, d.name
) AS dept_summary
WHERE dept_summary.emp_count >= 2;
π Subquery vs JOIN performance:
- β’ In many cases, JOINs and subqueries produce the same results. The optimizer may rewrite one into the other
- β’
EXISTSis usually faster thanINfor correlated subqueries on large tables - β’ Derived tables (FROM subqueries) can sometimes be replaced with
WITH (CTE)for better readability (MySQL 8.0+) - β’ Always check with
EXPLAINto verify the actual execution plan
7. UNION & UNION ALL
UNION combines the result sets of two or more SELECT statements into a single result. The columns must match in number and be compatible in type.
UNION (Removes Duplicates)
-- All names from both employees and departments
SELECT name, 'employee' AS type FROM employees
UNION
SELECT name, 'department' AS type FROM departments;
UNION ALL (Keeps Duplicates)
-- Combine recent orders and archived orders
SELECT id, user_id, total_amount, created_at FROM orders
UNION ALL
SELECT id, user_id, total_amount, created_at FROM orders_archive;
π‘ UNION vs UNION ALL performance: UNION performs a de-duplication step (like DISTINCT), which requires sorting or hashing. If you know there are no duplicates, or you want to keep them, use UNION ALL β it's significantly faster.
ORDER BY with UNION
Place ORDER BY at the end to sort the combined result. It applies to the entire UNION, not individual SELECTs.
SELECT name, salary, 'high' AS tier
FROM employees WHERE salary >= 80000
UNION ALL
SELECT name, salary, 'standard' AS tier
FROM employees WHERE salary < 80000
ORDER BY salary DESC;
Practical Example: Search Across Tables
-- Search for "Engineering" across multiple tables
SELECT 'department' AS source, name AS result FROM departments WHERE name LIKE '%Engineering%'
UNION ALL
SELECT 'project' AS source, title AS result FROM projects WHERE title LIKE '%Engineering%'
UNION ALL
SELECT 'employee' AS source, name AS result FROM employees WHERE name LIKE '%Engineering%';
π UNION rules across databases:
- β’
UNIONandUNION ALLwork the same in MySQL, PostgreSQL, SQL Server, and SQLite - β’ PostgreSQL and SQL Server also support
INTERSECT(rows in both) andEXCEPT(rows in first but not second) - β’ MySQL 8.0.31+ added support for
INTERSECTandEXCEPT
8. Chapter Summary
Join Types
- β’ INNER JOIN β only matching rows from both tables
- β’ LEFT JOIN β all left rows + matched right rows
- β’ RIGHT JOIN β all right rows + matched left rows
- β’ CROSS JOIN β Cartesian product (M Γ N rows)
- β’ Self Join β a table joined with itself
Subqueries
- β’ Scalar β returns a single value
- β’ IN / NOT IN β returns a list of values
- β’ EXISTS / NOT EXISTS β checks for row existence
- β’ Derived Table β subquery in FROM clause
UNION Operations
- β’ UNION β combine + remove duplicates
- β’ UNION ALL β combine + keep duplicates (faster)
- β’ Columns must match in number and type
- β’ ORDER BY applies to the final combined result
Key Takeaways
- β’ Use INNER JOIN for strict matches
- β’ Use LEFT JOIN to preserve all left-side rows
- β’ LEFT JOIN + IS NULL finds orphan records
- β’ Prefer EXISTS over IN for correlated checks
- β’ Use UNION ALL when duplicates aren't an issue