Running dual (or hybrid) queries
MotherDuck can use local data and remote data in the same query.
Example:
- CLI
- UI
Run the DuckDB CLI.
duckdb
Connect to MotherDuck. You may be prompted to sign in if you aren't already.
ATTACH 'md:';
Run the following in a MotherDuck notebook.
Create a local database in memory.
ATTACH ':memory:' AS local_db;
CREATE TABLE local_db.pricing AS
FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23))
pricing(item, price);
FROM local_db.pricing;
┌─────────┬──────────────┐
│ item │ price │
│ varchar │ decimal(4,3) │
├─────────┼──────────────┤
│ A │ 1.400 │
│ B │ 1.120 │
│ C │ 2.552 │
│ D │ 5.230 │
└─────────┴──────────────┘
Create a remote database in MotherDuck.
CREATE OR REPLACE DATABASE remote_db;
CREATE TABLE remote_db.sales AS
SELECT
'ABCD'[floor(random() * 3.999)::int + 1] AS item,
current_date() - interval (random() * 100) days AS dt,
floor(random() * 50)::int AS tally
FROM generate_series(1000);
FROM remote_db.sales LIMIT 10;
┌─────────┬─────────────────────┬───────┐
│ item │ dt │ tally │
│ varchar │ timestamp │ int32 │
├─────────┼─────────────────────┼───────┤
│ D │ 2024-11-29 00:00:00 │ 0 │
│ A │ 2024-10-04 00:00:00 │ 17 │
│ A │ 2024-10-13 00:00:00 │ 0 │
│ C │ 2024-11-05 00:00:00 │ 49 │
│ A │ 2024-09-30 00:00:00 │ 12 │
│ B │ 2024-09-27 00:00:00 │ 47 │
│ C │ 2024-11-23 00:00:00 │ 47 │
│ B │ 2024-09-18 00:00:00 │ 13 │
│ A │ 2024-11-18 00:00:00 │ 40 │
│ C │ 2024-09-18 00:00:00 │ 4 │
├─────────┴─────────────────────┴───────┤
│ 10 rows 3 columns │
└───────────────────────────────────────┘
Join the remote sales table to our local pricing data to get revenue by month.
SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;
┌────────────┬───────────────┐
│ mo │ rev │
│ date │ decimal(38,2) │
├────────────┼───────────────┤
│ 2024-09-01 │ 9241.39 │
│ 2024-10-01 │ 14226.12 │
│ 2024-11-01 │ 13136.55 │
│ 2024-12-01 │ 7783.26 │
└────────────┴───────────────┘
To see what is running locally and remotely, you can use EXPLAIN:
EXPLAIN
SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;
In each operator of the plan, (L)
indicates local while (R)
indicates remote. Data is transferred using sinks and sources.
┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ DOWNLOAD_SOURCE (L) │
│ ──────────────────── │
│ bridge_id: 1 │
└─────────────┬─────────────┘
┌───── ────────┴─────────────┐
│ BATCH_DOWNLOAD_SINK (R) │
│ ──────────────────── │
│ bridge_id: 1 │
│ parallel: true │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ ORDER_BY (R) │
│ ──────────────────── │
│ date_trunc('month', sales │
│ .dt) ASC │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ 0 │
│ rev │
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY (R) │
│ ──────────────────── │
│ Groups: #0 │
│ Aggregates: sum(#1) │
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ mo │
│ (CAST(price AS DECIMAL(14 │
│ ,3)) * CAST(tally AS │
│ DECIMAL(14,0))) │
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ #0 │
│ #1 │
│ #2 │
│__internal_compress_string_│
│ utinyint(#3) │
│ #4 │
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN (R) │
│ ──────────────────── │
│ Join Type: INNER │
│ │
│ Conditions: ├──────────────┐
│ item = item │ │
│ │ │
│ ~250 Rows │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN (R) ││ UPLOAD_SOURCE (R) │
│ ──────────────────── ││ ──────────────────── │
│ sales ││ bridge_id: 2 │
│ ││ │
│ Projections: ││ │
│ item ││ │
│ dt ││ │
│ tally ││ │
│ ││ │
│ ~1001 Rows ││ │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BATCH_UPLOAD_SINK (L) │
│ ──────────────────── │
│ bridge_id: 2 │
│ parallel: true │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (L) │
│ ──────────────────── │
│ item │
│ price │
│ │
│ ~1 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN (L) │
│ ──────────────────── │
│ pricing │
│ │
│ Projections: │
│ price │
│ item │
│ │
│ Filters: │
│ price>2.000 AND price IS │
│ NOT NULL │
│ │
│ ~1 Rows │
└───────────────────────────┘
A dual (or hybrid) query can be run on any database format supported by DuckDB, including sqlite, postgres and many others.