JOIN clause
Back to DuckDB Data Engineering Glossary
Overview
A JOIN
clause combines rows from two or more tables based on a related column between them. It's one of the most fundamental operations in SQL, allowing you to bring together data that has been normalized across different tables. Think of it like connecting puzzle pieces - each table holds part of the information, and the join brings those pieces together into a complete picture.
NOTEThis is an overview. If you want to understand the details of how joins work in DuckDB, check out Chapter 3 in the free DuckDB in Action book.
Basic Syntax
The most common join syntax in DuckDB follows this pattern:
Copy code
SELECT * FROM table1
JOIN table2 ON table1.column = table2.column
Join Types in DuckDB SQL
Inner Join (default)
An inner join combines rows from two or more tables based on a matching condition, keeping only the rows where there is a match in both tables. Think of it like a Venn diagram - it returns just the overlapping part.
Copy code
SELECT orders.id, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
Left Outer Join
A left outer join returns all records from the left table (the first one mentioned) and only matching records from the right table. If there's no match in the right table, the result will contain NULL values for those columns.
The Left Outer Join (LEFT JOIN
) is one of the most common types of joins in data analytics.
Copy code
SELECT products.name, reviews.rating
FROM products
LEFT JOIN reviews ON products.id = reviews.product_id
Right Outer Join
A right outer join is essentially the mirror image of a left outer join - it returns all records from the right table (the second one mentioned) and only matching records from the left table. If there's no match in the left table, NULL values are used for those columns.
Copy code
SELECT employees.name, departments.name
FROM employees
RIGHT JOIN departments ON employees.dept_id = departments.id
Full Outer Join
A full outer join (also called a full join) combines both left and right outer joins - it returns ALL records from BOTH tables. Where there are matches, it joins the data; where there's no match on either side, it fills in NULL values for the missing data.
Copy code
SELECT students.name, classes.name
FROM students
FULL OUTER JOIN classes ON students.class_id = classes.id
Cross Join (creates Cartesian product)
A cross join (also called a Cartesian join) creates a result set that combines every row from the first table with every row from the second table. It creates all possible combinations without any matching condition.
Copy code
SELECT employees.name, shifts.time
FROM employees
CROSS JOIN shifts
DuckDB-Specific Features
DuckDB includes special join types not commonly found in other databases:
ASOF Join
An ASOF (As-Of) join in DuckDB is a special type of join designed for time-series data that matches records based on the closest previous timestamp. It's particularly useful when working with data streams that have different sampling rates or when you need to align events with the most recent prior state.
Copy code
SELECT trades.price, quotes.bid
FROM trades
ASOF JOIN quotes ON trades.timestamp >= quotes.timestamp
Joining against files and subqueries
Additionally, DuckDB supports joining directly against files and subqueries without needing to create temporary tables first. For example:
Copy code
SELECT t1.*, t2.revenue
FROM 'sales.parquet' t1
JOIN (SELECT * FROM 'revenue.csv') t2 ON t1.id = t2.id
Performance Considerations
DuckDB automatically optimizes join operations using various strategies including hash joins and merge joins. Unlike some databases that require explicit join hints, DuckDB's query optimizer typically selects the most efficient join strategy based on the data characteristics and available indexes.