GROUP BY clause
Back to DuckDB Data Engineering Glossary
Overview
The GROUP BY clause is a fundamental SQL operation that allows you to organize rows into groups based on one or more columns, enabling aggregate calculations on each group rather than the entire dataset. When you use GROUP BY, each group will contain all rows that share the same values in the specified grouping columns.
Basic Usage
The most common use of GROUP BY involves aggregating data by a single column. For example, to count orders by country:
Copy code
SELECT country, COUNT(*) as order_count
FROM orders
GROUP BY country;
Multiple Column Groups
You can group by multiple columns to create more specific groupings. The order of columns in the GROUP BY doesn't affect the results:
Copy code
SELECT country, product_category, SUM(revenue) as total_revenue
FROM sales
GROUP BY country, product_category;
DuckDB-Specific Features
DuckDB extends the standard GROUP BY functionality with helpful features like GROUP BY ALL, which automatically groups by all non-aggregated columns in the SELECT clause:
Copy code
-- These queries are equivalent in DuckDB
SELECT country, region, SUM(sales)
FROM orders
GROUP BY country, region;
SELECT country, region, SUM(sales)
FROM orders
GROUP BY ALL;
Common Pitfalls
When using GROUP BY, all columns in the SELECT clause must either be included in the GROUP BY clause or be wrapped in an aggregate function (like SUM, COUNT, AVG). DuckDB will return an error if this rule is violated, helping prevent accidental misuse of grouping operations.
Related Concepts
The GROUP BY clause is often used with:
HAVINGclause to filter grouped results- Window functions for more complex aggregations
ORDER BYto sort the grouped results- Aggregate functions like
COUNT,SUM,AVG,MAX, andMIN
For more advanced grouping operations in DuckDB, look into GROUPING SETS, ROLLUP, and CUBE clauses which provide additional ways to aggregate data at multiple levels simultaneously.

