DuckDB 1.1 was released on September 9, and we recently had a bug fix release, 1.1.1, out on September 23. MotherDuck supported 1.1.1
just two days after its release, and we continue to work closely with the DuckDB Labs team to bring a smooth upgrade experience for all users.
But as things are moving fast, what did you miss in the 1.1 features?
DuckDB Labs released their usual blog, but I have my own preferred picks that didn't make that list, so let's dive in.
1. Custom HTTP headers: your database can do API call
The DuckDB extension mechanism is powerful. Most of them are pre-loaded in the background, and you can't see the magic happening. In a previous blog post, I show how we could query an API with a single line statement and return it as a DuckDB table :
Copy code
CREATE TABLE poi_france AS SELECT * FROM read_json_auto('https://my-endpoint/api')
What is happening here :
- The
httpfs
extension is loaded to get the data from an HTTP endpoint. read_json_auto
will parse directly the JSON response in a table
But what if our API is not public and requires authentication and other headers?
This is where the new HTTP headers come into play. You can now create http
secret.
Copy code
CREATE SECRET http (
TYPE HTTP,
EXTRA_HTTP_HEADERS MAP {
'Authorization': 'Bearer sk_test_VePHdqKTYQjKNInc7u56JBrQ'
}
);
select unnest(data) as customers
from read_json('https://api.stripe.com/v1/customers');
Snippet courtesy of Archie on duckdbsnippets.com.
2. More data types to optimize memory: VARINT
VARINT
type refers to a variable-length integer data type. Unlike fixed-size integers (like INT
or BIGINT
), which allocate a fixed number of bytes regardless of the size of the value stored, VARINT
optimizes the storage by using fewer bytes for smaller numbers and more bytes for larger numbers.
This is particularly useful when dealing with datasets that contain a wide range of integer values, including many small numbers and some large numbers.
Did you know? You can list all data types from the CLI using :
Copy code
D SELECT * FROM (DESCRIBE SELECT * FROM test_all_types()) ;
┌────────────────────────────┬─────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├────────────────────────────┼─────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ bool │ BOOLEAN │ YES │ │ │ │
│ tinyint │ TINYINT │ YES │ │ │ │
│ smallint │ SMALLINT │ YES │ │ │ │
│ int │ INTEGER │ YES │ │ │ │
│ bigint │ BIGINT │ YES │ │ │ │
│ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │
3. More DuckDB in the browser: Pyodide support
DuckDB is already heavily used in the browser through Wasm. This runs entirely on the client side, enabling you to leverage your local computing and avoid network traffic. Pyodide is a port of CPython to WebAssembly. In short, it enables a Python environment that runs in the browser, again on the client side. This is currently really useful for learning platforms like Datacamp. It's a better experience for the user as things run on the client, and it reduces server-side cost 💸.
DuckDB now supports Pyodide, which means you can install the duckdb package directly there (through micropip
- meaning any import statement will install the package).
Check the demo using the REPL of Pyodide :
Note : It doesn't support yet extensions - so pretty limited but a big path forward.
4. ORDER BY + LIMIT get faster
Before this fix, DuckDB would not apply the Top-N optimization if the ORDER BY
and LIMIT
clauses were used in different parts of the query, such as within a CTE.
So typically, this will be faster on 1.1 release :
Copy code
WITH CTE AS (SELECT * FROM tbl ORDER BY col) SELECT * FROM cte LIMIT N
5. More insights from EXPLAIN - easier debugging
The DuckDB team added a neat feature to export your EXPLAIN as HTML.
Usage :
Copy code
EXPLAIN (FORMAT HTML) SELECT ...
You can easily navigate through complex plans as you can also collapse/expand children.
And that's not all, when using a Jupyter notebook, the explain()
method of the DuckDBPyRelation
will automatically use the HTML format and render the result using IPython.display.HTML
.
Note that the team also re-work the documentation around EXPLAIN
and EXPLAIN ANALYZE
. Make sure to check this one; it's really helpful whenever you have an issue or performance slowdown to better understand what's going on.
That's it for the new feature on 1.1! In the meantime, keep coding and keep quacking.
☁️🦆 Start using DuckDB in the Cloud for FREE with MotherDuck : https://hubs.la/Q02QnFR40
CONTENT
- 1. Custom HTTP headers: your database can do API call
- 2. More data types to optimize memory: VARINT
- 3. More DuckDB in the browser: Pyodide support
- 4. ORDER BY + LIMIT get faster
- 5. More insights from EXPLAIN - easier debugging
Start using MotherDuck now!