aggregate functions

Back to DuckDB Data Engineering Glossary

Overview

Aggregate functions are SQL operations that perform calculations across multiple rows of data to produce a single summary value. These functions help analysts understand patterns and trends by computing statistics like totals, averages, counts and other numerical summaries from groups of records.

Common Aggregate Functions

The most frequently used aggregate functions in DuckDB include:

COUNT() - Returns the number of rows in a group

Copy code

-- Count total orders SELECT COUNT(*) FROM orders; -- Count distinct customers SELECT COUNT(DISTINCT customer_id) FROM orders;

SUM() - Adds up all values in a numeric column

Copy code

-- Total revenue SELECT SUM(order_amount) FROM orders; -- Revenue by product category SELECT category, SUM(order_amount) FROM orders GROUP BY category;

AVG() - Calculates the arithmetic mean

Copy code

-- Average order value SELECT AVG(order_amount) FROM orders;

MIN() and MAX() - Find the smallest and largest values

Copy code

-- Price range by category SELECT category, MIN(price) as lowest_price, MAX(price) as highest_price FROM products GROUP BY category;

Advanced Aggregates

DuckDB provides several advanced aggregate functions beyond the basics:

STDDEV() and VARIANCE() - Calculate statistical measures of spread

Copy code

SELECT category, STDDEV(price) as price_std_dev, VARIANCE(price) as price_variance FROM products GROUP BY category;

ARRAY_AGG() - Collects values into an array

Copy code

-- Get array of order dates by customer SELECT customer_id, ARRAY_AGG(order_date ORDER BY order_date) as order_dates FROM orders GROUP BY customer_id;

Filtering Aggregates

DuckDB supports the FILTER clause to conditionally include rows in aggregations:

Copy code

SELECT COUNT(*) as total_orders, COUNT(*) FILTER (WHERE status = 'completed') as completed_orders FROM orders;

This provides more readable syntax compared to using CASE statements or WHERE clauses for conditional aggregation.

Null Handling

By default, aggregate functions ignore null values. The COUNT(*) function counts all rows, while COUNT(column_name) only counts non-null values. You can use COALESCE() or IFNULL() to handle nulls differently if needed.

Copy code

-- Count including nulls converted to zero SELECT SUM(COALESCE(amount, 0)) FROM transactions;