5 Hidden gems in DuckDB 1.1

2024/09/27 - 4 min read

DuckDB 1.1 was released on September 9, and we recently had a bug fix release, 1.1.1, out on September 23. MotherDuck supported 1.1.1 just two days after its release, and we continue to work closely with the DuckDB Labs team to bring a smooth upgrade experience for all users.
But as things are moving fast, what did you miss in the 1.1 features? DuckDB Labs released their usual blog, but I have my own preferred picks that didn't make that list, so let's dive in.

1. Custom HTTP headers: your database can do API call

The DuckDB extension mechanism is powerful. Most of them are pre-loaded in the background, and you can't see the magic happening. In a previous blog post, I show how we could query an API with a single line statement and return it as a DuckDB table :

Copy code
CREATE TABLE poi_france AS SELECT * FROM read_json_auto('https://my-endpoint/api')

What is happening here :

The httpfs extension is loaded to get the data from an HTTP endpoint.
read_json_auto will parse directly the JSON response in a table

But what if our API is not public and requires authentication and other headers?

This is where the new HTTP headers come into play. You can now create http secret.

Copy code
CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer sk_test_VePHdqKTYQjKNInc7u56JBrQ'
    }
); 

select unnest(data) as customers 
from read_json('https://api.stripe.com/v1/customers');

Snippet courtesy of Archie on duckdbsnippets.com.

2. More data types to optimize memory: VARINT

VARINT type refers to a variable-length integer data type. Unlike fixed-size integers (like INT or BIGINT), which allocate a fixed number of bytes regardless of the size of the value stored, VARINT optimizes the storage by using fewer bytes for smaller numbers and more bytes for larger numbers.

This is particularly useful when dealing with datasets that contain a wide range of integer values, including many small numbers and some large numbers.

Did you know? You can list all data types from the CLI using :

Copy code
D SELECT * FROM (DESCRIBE SELECT * FROM test_all_types()) ;
┌────────────────────────────┬─────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│        column_name         │             column_type             │  null   │   key   │ default │  extra  │
│          varchar           │               varchar               │ varchar │ varchar │ varchar │ varchar │
├────────────────────────────┼─────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ bool                       │ BOOLEAN                             │ YES     │         │         │         │
│ tinyint                    │ TINYINT                             │ YES     │         │         │         │
│ smallint                   │ SMALLINT                            │ YES     │         │         │         │
│ int                        │ INTEGER                             │ YES     │         │         │         │
│ bigint                     │ BIGINT                              │ YES        │         │         │         │
│     ·                      │     ·                               │  ·      │    ·    │    ·    │    ·    │
│     ·                      │     ·                               │  ·      │    ·    │    ·    │    ·    │
│     ·                      │     ·                               │  ·      │    ·    │    ·    │    ·    │

3. More DuckDB in the browser: Pyodide support

DuckDB is already heavily used in the browser through Wasm. This runs entirely on the client side, enabling you to leverage your local computing and avoid network traffic. Pyodide is a port of CPython to WebAssembly. In short, it enables a Python environment that runs in the browser, again on the client side. This is currently really useful for learning platforms like Datacamp. It's a better experience for the user as things run on the client, and it reduces server-side cost 💸.

DuckDB now supports Pyodide, which means you can install the duckdb package directly there (through micropip - meaning any import statement will install the package). Check the demo using the REPL of Pyodide :

Note : It doesn't support yet extensions - so pretty limited but a big path forward.

4. ORDER BY + LIMIT get faster

Before this fix, DuckDB would not apply the Top-N optimization if the ORDER BY and LIMIT clauses were used in different parts of the query, such as within a CTE. So typically, this will be faster on 1.1 release :

Copy code
WITH CTE AS (SELECT * FROM tbl ORDER BY col) SELECT * FROM cte LIMIT N

5. More insights from EXPLAIN - easier debugging

The DuckDB team added a neat feature to export your EXPLAIN as HTML.

Usage :

Copy code
EXPLAIN (FORMAT HTML) SELECT ...

You can easily navigate through complex plans as you can also collapse/expand children. And that's not all, when using a Jupyter notebook, the explain() method of the DuckDBPyRelation will automatically use the HTML format and render the result using IPython.display.HTML.

Note that the team also re-work the documentation around EXPLAIN and EXPLAIN ANALYZE. Make sure to check this one; it's really helpful whenever you have an issue or performance slowdown to better understand what's going on.

That's it for the new feature on 1.1! In the meantime, keep coding and keep quacking.

☁️🦆 Start using DuckDB in the Cloud for FREE with MotherDuck : https://hubs.la/Q02QnFR40