Friendly SQL with DuckDB
2024/01/05

For decades, SQL has been the undisputed language of data. Its power and ubiquity are undeniable, yet so are its frustrations. Standard SQL can be verbose and rigid, often requiring repetitive boilerplate code, nested subqueries, and tedious workarounds for common analytical tasks. This increases the risk of errors, slows down exploratory analysis, and can feel counter-intuitive for modern data practitioners accustomed to the flexibility of data frame libraries.
What if SQL could be more intuitive, less error-prone, and more productive? This question is at the heart of DuckDB's approach to the language. DuckDB is pioneering practical, common-sense syntax improvements that address these long-standing pain points, making SQL more powerful and approachable. This article explores these features, drawing insights from a conversation with Alex, a forward-deployed software engineer at MotherDuck, who uses DuckDB's modern syntax to solve complex data challenges daily.
The features discussed are demonstrated in the accompanying video using a Google Colab notebook. Analysts can achieve this setup seamlessly by setting a MotherDuck service token and using the md: connection string, which automatically installs and configures the necessary DuckDB extension.
Accelerate Exploration with a Flexible Query Structure
A typical data analysis workflow begins with a simple question: what does the data look like? Standard SQL forces analysts to start with a SELECT clause, even when the first logical step is to identify the source table. DuckDB improves this workflow by allowing for FROM-first queries.
This syntax lets an analyst start by specifying the table, which is often the first thing they want to do. The video demonstrates this by running a query that begins with FROM sample_data.hacker_news.posts followed by a LIMIT clause to immediately inspect the table's contents. This small change aligns the query structure with the natural thought process of exploration.
Building on this concept, DuckDB also supports a more logical ordering of clauses. Instead of the standard SELECT ... FROM ... WHERE order, users can write queries in the order of execution: FROM ... WHERE ... GROUP BY ... SELECT. This not only improves human readability but also enhances the performance of AI-powered code completion tools. As Alex notes, providing the table context upfront gives AI assistants a much easier job of suggesting relevant columns and functions, leading to a faster and more accurate coding experience.
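As a sketch of both ideas, here is a FROM-first inspection query followed by one written in the execution order described above (the table and column names follow the video's Hacker News example; the exact clause ordering accepted may vary by DuckDB version):

```sql
-- start exploration from the table itself
FROM sample_data.hacker_news.posts
LIMIT 10;

-- clauses in execution order: FROM, then WHERE, then GROUP BY, then SELECT
FROM sample_data.hacker_news.posts
WHERE title IS NOT NULL
GROUP BY type
SELECT type, count(*) AS n;
```

Because the table appears first, an AI completion tool (or a human reader) knows which columns are in scope before the projection is even written.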
Eliminate Boilerplate with Smarter Aggregations
Just as DuckDB rethinks the overall query structure, it also streamlines one of the most repetitive parts of any analysis: aggregations. A frequent source of friction in analytical SQL is the GROUP BY clause. Standard SQL requires users to manually list every non-aggregated column from the SELECT statement, a repetitive task that is prone to error, especially in queries with many grouping dimensions.
DuckDB solves this with GROUP BY ALL. This simple command instructs the engine to automatically group by all non-aggregated columns present in the SELECT list. The video illustrates this with a query that counts Hacker News posts mentioning "DuckDB" by year and month. Instead of writing GROUP BY year, month, the query simply uses GROUP BY ALL, and DuckDB correctly infers the grouping columns. This eliminates redundant code and reduces the chance of syntax errors.
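A sketch of that query, assuming the posts table has timestamp and title columns (the exact schema of the sample dataset may differ):

```sql
SELECT
    year(timestamp)  AS year,
    month(timestamp) AS month,
    count(*)         AS mentions
FROM sample_data.hacker_news.posts
WHERE title ILIKE '%duckdb%'
GROUP BY ALL;  -- DuckDB infers GROUP BY year, month
```

Adding or removing a dimension in the SELECT list no longer requires touching the GROUP BY clause.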
Similarly, DuckDB offers ORDER BY ALL, which sorts the results by every column in the SELECT list in ascending order. This mirrors the intuitive sorting of a pivot table in Excel, where data is automatically ordered column-by-column. This feature is a convenient alternative to listing columns by their numeric position (e.g., ORDER BY 1, 2, 3), a practice that can easily break when columns are added or removed during query development.
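For example, combining both shortcuts on a hypothetical sales table:

```sql
SELECT region, product, sum(amount) AS total_sales
FROM sales
GROUP BY ALL
ORDER BY ALL;  -- sorts by region, then product, then total_sales, ascending
```

Unlike ORDER BY 1, 2, 3, this stays correct if columns are later reordered or added during query development.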
Simplify Complex Logic with Reusable Aliases
Beyond simplifying aggregations, DuckDB also addresses a common source of complexity: the rigid rules around column aliases. Standard SQL imposes a frustrating limitation on column aliases: an alias defined in a SELECT statement cannot be used in a subsequent WHERE, GROUP BY, or HAVING clause within the same query level. This forces analysts to either repeat the entire expression or wrap the logic in a subquery or a Common Table Expression (CTE), adding unnecessary complexity.
DuckDB removes this barrier by allowing aliases to be reused in subsequent clauses. If an analyst creates an alias like EXTRACT(YEAR FROM post_date) AS year, that year alias can be used directly in the WHERE and GROUP BY clauses. DuckDB intelligently resolves the alias as long as there is no ambiguity with an existing column name. This results in cleaner, more readable, and more maintainable queries.
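A minimal sketch of alias reuse, assuming a posts table with a post_date column:

```sql
SELECT
    extract(year FROM post_date) AS year,
    count(*)                     AS posts
FROM posts
WHERE year >= 2020   -- alias reused instead of repeating the EXTRACT expression
GROUP BY year;       -- and reused again here
```

In standard SQL, the WHERE clause above would require repeating extract(year FROM post_date) or wrapping the query in a subquery or CTE.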
The flexibility extends even further with sequential aliasing within the SELECT clause itself. Users can define an alias and immediately reference it to build a chained calculation in the next column. This allows for complex, multi-step logic to be expressed clearly in a single SELECT statement without the need for nested functions or subqueries.
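Sequential aliasing can be sketched like this, with hypothetical price and quantity columns:

```sql
SELECT
    price * quantity AS gross,
    gross * 0.2      AS vat,    -- references the alias defined one line up
    gross + vat      AS total   -- chains both earlier aliases
FROM orders;
```

Each step of the calculation reads top to bottom, with no nested expressions or intermediate subqueries.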
Select Columns Dynamically with Data Frame-Style Syntax
While reusable aliases help manage complexity within a query's logic, another challenge arises when dealing with the structure of the data itself, especially in wide tables. Working with tables containing hundreds of columns often requires manually listing each one to select, exclude, or transform a subset. DuckDB brings data frame-style flexibility to this problem with the powerful COLUMNS function.
The COLUMNS function allows for dynamic column selection using expressions. Key capabilities include:
- Excluding columns: COLUMNS(* EXCLUDE (col_a, col_b)) selects all columns except for the ones specified.
- Replacing columns: COLUMNS(* REPLACE (col_a * 2 AS col_a)) allows for in-place modification of a column's value.
- Pattern matching: Users can select columns based on patterns using regular expressions or lambda functions. The video demonstrates selecting all columns that start with "pop" from a dataset, a task that would be cumbersome in standard SQL.
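The three capabilities above can be sketched against a hypothetical wide cities table (column names are illustrative):

```sql
-- everything except two internal columns
SELECT COLUMNS(* EXCLUDE (internal_id, raw_payload)) FROM cities;

-- rescale one column in place, keeping the rest untouched
SELECT COLUMNS(* REPLACE (population / 1000 AS population)) FROM cities;

-- regular expression: every column whose name starts with 'pop'
SELECT COLUMNS('pop.*') FROM cities;
```

The same COLUMNS expression can also be fed to a function, so an aggregate or transformation can be applied across many columns at once instead of being written out per column.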
This functionality dramatically simplifies data preparation and feature engineering workflows that were previously the exclusive domain of programmatic tools like pandas.
Improve Daily Workflows with Quality-of-Life Enhancements
Beyond these major structural improvements, DuckDB includes several smaller features that enhance the daily user experience.
- Function Chaining: To avoid deeply nested function calls, DuckDB supports dot notation. A series of string manipulations can be written as string.upper().split() instead of split(upper(string)), which improves readability by presenting operations in a clear, sequential order.
- Trailing Commas: DuckDB resolves the long-standing debate over leading versus trailing commas by supporting trailing commas. This makes it easier to reorder or comment out lines in a SELECT statement without causing syntax errors.
- String Slicing: For users familiar with Python, DuckDB provides convenient bracket notation for string slicing. This offers a familiar and concise method for text manipulation to those coming from programming backgrounds.
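All three conveniences can appear in a single query; a sketch against a hypothetical posts table with a title column (using string_split, one of DuckDB's split function names, since the splitting function takes an explicit separator):

```sql
SELECT
    title.upper().string_split(' ') AS words,  -- dot-notation chaining, read left to right
    title[1:4]                      AS prefix, -- bracket slicing (1-based, inclusive)
FROM posts;                                    -- trailing comma above is valid
```

The chained form reads in the order the operations happen, rather than inside-out.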
The Future of SQL is Friendlier
The SQL enhancements in DuckDB are more than just syntactic sugar. They represent fundamental improvements that reduce cognitive load, prevent common errors, and make data analysis more productive and enjoyable. By listening to the needs of its users, DuckDB is actively evolving SQL to meet the demands of modern data work.
Features like GROUP BY ALL and the COLUMNS function demonstrate a commitment to developer experience, proving that a powerful analytical engine can also be user-friendly. As Alex concluded in the conversation, this is a community-driven effort. Users are encouraged to share their own SQL frustrations and ideas on the DuckDB GitHub repository, helping to collectively build a better, more intuitive future for the language of data.