*New* The MotherDuck Native Integration is Live on Vercel Marketplace for Embedded Analytics and Data AppsLearn more

sub-query

Back to DuckDB Data Engineering Glossary

Overview

A sub-query (also called a nested query) is a SQL query embedded within another SQL query. It allows you to use the results from one query as input for another query, enabling more complex data analysis and transformations. Sub-queries can appear in the SELECT, FROM, WHERE, and HAVING clauses of a main query.

Types and Examples

DuckDB supports several types of sub-queries:

A scalar sub-query returns exactly one value and can be used anywhere a single value is expected:

Copy code

SELECT name, salary, (SELECT AVG(salary) FROM employees) as company_average FROM employees;

A row sub-query returns a single row with multiple columns:

Copy code

SELECT * FROM employees WHERE (department_id, salary) = (SELECT department_id, MAX(salary) FROM employees GROUP BY department_id);

A table sub-query returns multiple rows and can be used in the FROM clause:

Copy code

SELECT dept_name, avg_salary FROM ( SELECT department_id, AVG(salary) as avg_salary FROM employees GROUP BY department_id ) dept_averages;

A correlated sub-query references columns from the outer query. DuckDB automatically optimizes these to run efficiently:

Copy code

SELECT employee_name, (SELECT COUNT(*) FROM projects WHERE projects.owner_id = employees.id) as project_count FROM employees;

DuckDB-Specific Features

Unlike some databases that struggle with correlated sub-queries, DuckDB uses advanced optimization techniques to "decorrelate" these queries automatically, making them perform well without manual rewriting. DuckDB also allows sub-queries in more places than traditional databases, such as in UPDATE and DELETE statements, and supports sub-queries within window function definitions.

Best Practices

While sub-queries are powerful, they should be used judiciously. For simpler operations, JOINs often provide better readability. However, sub-queries are excellent for complex calculations, especially when you need to reference aggregated results or when you want to break down a complex problem into smaller, more manageable pieces.