The team at DuckDB Labs and the DuckDB community have released DuckDB 1.5! It is full of all kinds of goodies - every new release has me feeling like a kid in a candy store. Be sure to check out the DuckDB blog Announcing DuckDB 1.5.0! All across MotherDuck, we are very excited for 1.5 and we will be releasing support for DuckDB 1.5 within the next few weeks. Thank you to the folks at DuckDB Labs and all who contributed!
The DuckDB team launch post describes a lot of the big new features in 1.5 (seriously, go read the post!), but I want to share a bit about why I am so excited about those features and why you should be too. It’s always more fun to have someone else compliment your code, after all! Here is a sampling of new functionality that I think is worth an extra shout out.
Faster JSON Queries with the VARIANT Type

Now Ducks can shred JSON and shred on the guitar!
With a VARIANT, you can automatically store data with different data types on each row and still get tremendous performance when querying that data. This comes in handiest when working with JSON data with different structures. There are so many cases where data is mostly the same shape, but not exactly (creatively called semi-structured data…). This can come from observability data where each service logs keys that are important in its specific domain, API outputs that change over time, or just plain old messy data.
In the age of AI, I doubt data gets more structured… I am confident this will be tremendously useful in so many places!
So, that is what a VARIANT can be used for, but why use it? In a word: speed. In more than one word: 10-100x kind of speed. DuckDB’s existing functionality for semi-structured data, the JSON type, stores data as text for flexibility. However, the VARIANT type automatically “shreds” the JSON data into separate columns (and since it is automatic, it keeps that flexibility!). So, if you only need a few pieces of your JSON data, you only need to grab those pieces off of disk. Plus, they will already be in the perfect data type instead of always a VARCHAR. In internal benchmarking, we have seen over 100x improvements in those types of queries.
Even in fast-improving DuckDB, you don’t get 100x boosts every release, let alone for workloads that are this ubiquitous! Huge.
Using a VARIANT feels much like using JSON. If you create a VARIANT column, you can query individual keys like this:
```sql
CREATE TEMP TABLE go_ducks AS
SELECT {duck: 42, goose: -1}::VARIANT AS my_variant;

SELECT my_variant.duck
FROM go_ducks;
```
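To see the per-row flexibility in action, here is a sketch (table, column, and key names are invented for illustration) where every row of the VARIANT column holds a different shape:

```sql
-- Hypothetical example: each row stores a different structure
CREATE TEMP TABLE mixed_logs AS
SELECT unnest([
    {service: 'api', latency_ms: 12}::VARIANT,
    {service: 'worker', retries: 3, queue: 'email'}::VARIANT,
    'plain text fallback'::VARIANT
]) AS payload;

-- Keys missing on a given row simply come back NULL
SELECT payload.service, payload.latency_ms
FROM mixed_logs;
```

Because the shredding happens automatically, you get columnar access to `service` and `latency_ms` without declaring a schema up front.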
Faster Real-World Queries
The real world of SQL is messy. If you have been in this game for a bit, you’ve seen the 3000 line behemoths just like I have. You may have even written some of them; I certainly have! And it’s rarely the short queries that need optimizing. Benchmarks are useful, but they just can’t cover the full gamut, so real-world performance often comes down to how much optimization work the database does on our behalf.
Your real-life database performance depends a lot on how friendly and helpful your database is. Doubly so in the era of Agents! SQL code is only going to get more complex in 2026…
DuckDB 1.5 has a ton of impactful features in this area.
Basic Min / Max Queries are 6 - 18x Faster
There are many cases where I need to know the min or max of an entire table: I need the latest timestamp to know if my cache is up to date, or I want the max customer_id so I can generate a new one. When I’m exploring new data, I want to know things like how far back a dataset goes, and that is just a quick:
```sql
SELECT min(event_date)
FROM shipments;
```
In DuckDB 1.5, this can be between 6 and 18x faster!
How? DuckDB’s storage format automatically breaks your data into chunks of rows, 122,880 by default. For each chunk, it stores statistics about every column in that chunk, including the minimum and maximum value present. With this new feature, DuckDB no longer needs to check every row to know the min or max of a table - it can just check the statistics! As you would expect, it is quite a lot faster to check one value instead of 122,880!
And this feature is not limited to DuckDB files; it works for Parquet files as well! Parquet uses the same kind of row group concept (a PAX layout in hardcore database lingo), so it has similar statistics. Any query that is simple enough automatically gets this speedup with both file types - no changes needed!
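For instance, the same statistics-only shortcut should kick in for a query like this over a folder of Parquet files (the bucket and path here are placeholders):

```sql
-- Answered from row group statistics instead of scanning every row
SELECT min(event_date), max(event_date)
FROM read_parquet('s3://my-bucket/shipments/*.parquet');
```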
More Complex Joins can be Much Faster
If only every database schema were a beautiful and pristine star shape… But out here in the real world of analytics, joins can get complicated. DuckDB has always been able to handle those kinds of joins, but in the past, it sometimes fell back to the “lowest common denominator” join algorithm: the trusty blockwise nested loop join. That may sound like it is pretty deep in the database-land weeds, but practically it meant that complex joins in DuckDB were often substantially slower than basic joins.
DuckDB 1.5 can now detect more cases where it can use its incredibly fast, industry-leading Hash Join algorithm. In practice, this can easily be an over 10x performance improvement. If there is at least one equality condition, even joins with complex expressions can do a fast hash join, then apply the complex expression as a residual predicate (a filter step that happens after the join). More speed for messy joins! This is another benefit that you won’t see in benchmarks, but you will really feel in your day to day job.
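As a sketch of the pattern (table and column names invented): the equality condition drives the hash join, while the more complex condition is evaluated afterwards as a residual predicate:

```sql
-- o.customer_id = c.customer_id lets DuckDB build a hash join;
-- the date comparison is applied after the join as a residual filter.
SELECT c.name, o.order_id
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id
 AND o.order_date > c.signup_date + INTERVAL 30 DAY;
```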
Up to 70x Speedups for Top N by Group
There are many cases where it is important to retrieve the top N items within a group. That sounds a bit abstract, but it includes queries like: the top 10 products by category, the last 5 shipments from each supplier, or the 100 most recent logs for each microservice. The most common use case, though, is deduplicating intelligently: situations like showing the most up-to-date value per customer.
There are two standard approaches to calculating this: using a row_number() filter, or using the max_by aggregate function (also known as arg_max). You may have seen a deduplication query like this before:
```sql
WITH row_number_added AS (
    SELECT
        *,
        row_number() OVER (
            PARTITION BY group_col
            ORDER BY update_date DESC
        ) AS rn
    FROM tbl
)
SELECT *
FROM row_number_added
WHERE rn = 1;
```
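The alternative max_by / arg_max spelling of the same deduplication looks roughly like this (assuming the same `tbl` with a `group_col` and `update_date` as above):

```sql
-- max_by(tbl, update_date) keeps the whole row with the latest
-- update_date for each group, returned as a struct
SELECT
    group_col,
    max_by(tbl, update_date) AS latest_row
FROM tbl
GROUP BY group_col;
```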
Now, DuckDB automatically chooses the optimal algorithm, regardless of your syntax. Not only that, by cutting out intermediate calculations it goes far faster than either approach could manually. In some cases, it can be up to 70x faster! Not 70% faster, 70 times faster.
This one is personal for me! I have coached at least 10 different customers about how to manually tune their queries to use the arg_max / max_by approach instead of row_number(). I even wrote a blog about it, complete with a microbenchmark. I have been replaced!! I am more than happy to hand this one over to the machines though… Thank you to the humans who made that possible!
Don’t Repeat Yourself, but for the Database…
Gnarly SQL queries often reuse the same pieces of data in different ways, frequently via CTEs (Common Table Expressions), which use the WITH clause. DuckDB 1.5 is now even more creative in how it can detect reusable pieces of analysis. The technical term for this is “Common Subplan Elimination”, where calculations that are reused in multiple places get calculated once and materialized (stored in memory or on local disk) for reuse later in the same query. That means DuckDB has less work to do and your most complex queries can go faster! Even fuzzy matches are supported: where CTEs are similar to one another, the superset of their analysis can be calculated once and reused in both places.
Queries in TPC-DS and TPC-H that fit this pattern can be up to 80% faster!
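A query shape that can benefit looks roughly like this (names invented): the CTE is referenced twice, and with common subplan elimination DuckDB can compute it a single time:

```sql
WITH daily AS (
    SELECT order_date, sum(amount) AS total
    FROM orders
    GROUP BY order_date
)
-- daily is referenced twice; DuckDB can materialize it once and reuse it
SELECT d1.order_date, d1.total
FROM daily d1
JOIN daily d2 ON d1.order_date = d2.order_date + 1
WHERE d1.total > d2.total;
```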
Read Whole Folders of DuckDB Files
One nice property of working with Parquet files in DuckDB is that you can query an entire folder structure of them as if it were a single table in a SQL statement. Now, DuckDB files have that same capability! You can read a whole folder with:
```sql
SELECT * FROM read_duckdb('*.duckdb');
```
The key benefit of this is that DuckDB becomes a lot more convenient to use as a file format on cloud object stores like AWS S3 and others. You now have the option to build up an archive of your data in many individual DuckDB files, which can have both read performance and compression benefits over Parquet.
Writing to Azure Blob and ADLSv2 Storage
DuckDB’s COPY statement can now write directly to Azure Blob Storage and ADLSv2 storage. This really unlocks Azure as a place where you can manage files with DuckDB, not just query them. Azure was the last remaining major cloud object store awaiting write support, so DuckDB is now a fully multi-cloud technology! That is a pretty amazing milestone.
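A minimal sketch of what that enables (the container and path are placeholders; this assumes the azure extension is loaded and credentials are already configured):

```sql
-- Write query results straight to Azure Blob Storage
COPY (SELECT * FROM shipments)
TO 'az://my-container/exports/shipments.parquet'
(FORMAT parquet);
```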
DuckLake 0.4 Launches with Macros, Sorting, and Fixes
DuckLake is completely changing what it means to be a table lakehouse format. It is dramatically simpler to use and has significantly faster read query performance - up to 10x lower latency than other formats! It does this all with an at once traditional and radical approach: use a SQL database for the lakehouse catalog and all lakehouse metadata instead of thousands of metadata files on object storage. Your full dataset still lives in Parquet on object storage (in the same format as Iceberg!), but queries often save seconds by using a DB for the metadata.
DuckLake 0.4 has a variety of great new features like macros, sorted / clustered tables, and deletion inlining. We will cover them in depth in future posts.
DuckLake 1.0 is coming in April, so definitely stay tuned for that as well!
Iceberg and Delta Lake
Both Iceberg and Delta Lake continue to receive significant focus in DuckDB. For Delta Lake, write support through Unity Catalog has been improved. DuckDB’s Iceberg extension can also write tables when used with the AWS Glue catalog, thanks to table properties in the CREATE TABLE statement. There is more to come for Iceberg in the 1.5.1 release as well!
Non-Blocking Checkpointing
I saved this section for last because it is so significant. This work marks the next leap forward in DuckDB’s handling of concurrency. DuckDB may have begun with a vision for amazing single-player analytics, but it has grown to be invaluable in all kinds of use cases, including ones with heavy read/write concurrency. Now, whenever you checkpoint a DuckDB file, you can read, write, and delete at the same time. That removes a lot of variability in performance and increases the throughput of the already heavily optimized TPC-H workload by 17%! This took some hardcore engineering: DuckDB now has multiple write-ahead log (WAL) files, so it can push the contents of one WAL file into the DuckDB file while still modifying the database through another WAL file, completely in parallel.
Come Learn More!
However you use DuckDB, version 1.5 has some serious benefits waiting for you! Give it a spin, and MotherDuck will be flying forward to DuckDB 1.5 in just a matter of weeks.
Join us live on March 19th at 9am Pacific to learn even more! Bring your questions!
Start using MotherDuck now!



