Live Demo: Getting Started with MotherDuck and the DuckDB UIRegister Now

Skip to main content

Running dual (or hybrid) queries

MotherDuck can use local data and remote data in the same query.

Example:

Run the DuckDB CLI.

duckdb

Connect to MotherDuck. You may be prompted to sign in if you aren't already.

ATTACH 'md:';

Create a local database in memory.

ATTACH ':memory:' AS local_db;

CREATE TABLE local_db.pricing AS
FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23))
pricing(item, price);

FROM local_db.pricing;
┌─────────┬──────────────┐
│ item │ price │
│ varchar │ decimal(4,3)
├─────────┼──────────────┤
│ A │ 1.400
│ B │ 1.120
│ C │ 2.552
│ D │ 5.230
└─────────┴──────────────┘

Create a remote database in MotherDuck.

CREATE OR REPLACE DATABASE remote_db;

CREATE TABLE remote_db.sales AS
SELECT
'ABCD'[floor(random() * 3.999)::int + 1] AS item,
current_date() - interval (random() * 100) days AS dt,
floor(random() * 50)::int AS tally
FROM generate_series(1000);

FROM remote_db.sales LIMIT 10;
┌─────────┬─────────────────────┬───────┐
│ item │ dt │ tally │
│ varchar │ timestamp │ int32 │
├─────────┼─────────────────────┼───────┤
│ D │ 2024-11-29 00:00:00 │ 0
│ A │ 2024-10-04 00:00:00 │ 17
│ A │ 2024-10-13 00:00:00 │ 0
│ C │ 2024-11-05 00:00:00 │ 49
│ A │ 2024-09-30 00:00:00 │ 12
│ B │ 2024-09-27 00:00:00 │ 47
│ C │ 2024-11-23 00:00:00 │ 47
│ B │ 2024-09-18 00:00:00 │ 13
│ A │ 2024-11-18 00:00:00 │ 40
│ C │ 2024-09-18 00:00:00 │ 4
├─────────┴─────────────────────┴───────┤
10 rows 3 columns │
└───────────────────────────────────────┘

Join the remote sales table to our local pricing data to get revenue by month.

SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;
┌────────────┬───────────────┐
│ mo │ rev
date │ decimal(38,2)
├────────────┼───────────────┤
2024-09-01 │ 9241.39
2024-10-01 │ 14226.12
2024-11-01 │ 13136.55
2024-12-01 │ 7783.26
└────────────┴───────────────┘

To see what is running locally and remotely, you can use EXPLAIN:

EXPLAIN
SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;

In each operator of the plan, (L) indicates local while (R) indicates remote. Data is transferred using sinks and sources.

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ DOWNLOAD_SOURCE (L)
│ ──────────────────── │
│ bridge_id: 1
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BATCH_DOWNLOAD_SINK (R)
│ ──────────────────── │
│ bridge_id: 1
│ parallel: true
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ ORDER_BY (R)
│ ──────────────────── │
│ date_trunc('month', sales │
│ .dt) ASC │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R)
│ ──────────────────── │
0
rev
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY (R)
│ ──────────────────── │
│ Groups: #0 │
│ Aggregates: sum(#1) │
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R)
│ ──────────────────── │
│ mo │
(CAST(price AS DECIMAL(14
│ ,3)) * CAST(tally AS │
│ DECIMAL(14,0)))
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R)
│ ──────────────────── │
#0 │
#1 │
#2 │
│__internal_compress_string_│
│ utinyint(#3) │
#4 │
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN (R)
│ ──────────────────── │
│ Join Type: INNER │
│ │
│ Conditions: ├──────────────┐
│ item = item │ │
│ │ │
│ ~250 Rows │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN (R) ││ UPLOAD_SOURCE (R)
│ ──────────────────── ││ ──────────────────── │
│ sales ││ bridge_id: 2
│ ││ │
│ Projections: ││ │
│ item ││ │
│ dt ││ │
│ tally ││ │
│ ││ │
│ ~1001 Rows ││ │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BATCH_UPLOAD_SINK (L)
│ ──────────────────── │
│ bridge_id: 2
│ parallel: true
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (L)
│ ──────────────────── │
│ item │
│ price │
│ │
│ ~1 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN (L)
│ ──────────────────── │
│ pricing │
│ │
│ Projections: │
│ price │
│ item │
│ │
│ Filters: │
│ price>2.000 AND price IS │
│ NOT NULL │
│ │
│ ~1 Rows │
└───────────────────────────┘

A dual (or hybrid) query can be run on any database format supported by DuckDB, including sqlite, postgres and many others.