aggregate functions
Back to DuckDB Data Engineering Glossary
Overview
Aggregate functions are SQL operations that perform calculations across multiple rows of data to produce a single summary value. These functions help analysts understand patterns and trends by computing statistics like totals, averages, counts and other numerical summaries from groups of records.
Common Aggregate Functions
The most frequently used aggregate functions in DuckDB include:
COUNT()
- Returns the number of rows in a group
Copy code
-- Count total orders
SELECT COUNT(*) FROM orders;
-- Count distinct customers
SELECT COUNT(DISTINCT customer_id) FROM orders;
SUM()
- Adds up all values in a numeric column
Copy code
-- Total revenue
SELECT SUM(order_amount) FROM orders;
-- Revenue by product category
SELECT category, SUM(order_amount)
FROM orders
GROUP BY category;
AVG()
- Calculates the arithmetic mean
Copy code
-- Average order value
SELECT AVG(order_amount) FROM orders;
MIN()
and MAX()
- Find the smallest and largest values
Copy code
-- Price range by category
SELECT category,
MIN(price) as lowest_price,
MAX(price) as highest_price
FROM products
GROUP BY category;
Advanced Aggregates
DuckDB provides several advanced aggregate functions beyond the basics:
STDDEV()
and VARIANCE()
- Calculate statistical measures of spread
Copy code
SELECT category,
STDDEV(price) as price_std_dev,
VARIANCE(price) as price_variance
FROM products
GROUP BY category;
ARRAY_AGG()
- Collects values into an array
Copy code
-- Get array of order dates by customer
SELECT customer_id,
ARRAY_AGG(order_date ORDER BY order_date) as order_dates
FROM orders
GROUP BY customer_id;
Filtering Aggregates
DuckDB supports the FILTER
clause to conditionally include rows in aggregations:
Copy code
SELECT
COUNT(*) as total_orders,
COUNT(*) FILTER (WHERE status = 'completed') as completed_orders
FROM orders;
This provides more readable syntax compared to using CASE
statements or WHERE
clauses for conditional aggregation.
Null Handling
By default, aggregate functions ignore null values. The COUNT(*)
function counts all rows, while COUNT(column_name)
only counts non-null values. You can use COALESCE()
or IFNULL()
to handle nulls differently if needed.
Copy code
-- Count including nulls converted to zero
SELECT SUM(COALESCE(amount, 0))
FROM transactions;