Running dual execution (or hybrid) queries

MotherDuck can use local data and remote data in the same query.

Example:

Run the DuckDB CLI.

duckdb

Connect to MotherDuck. You may be prompted to sign in if you aren't already.

ATTACH 'md:';

Create a local database in memory.

ATTACH ':memory:' AS local_db;

CREATE TABLE local_db.pricing AS
    FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23))
    pricing(item, price);

FROM local_db.pricing;

┌─────────┬──────────────┐
│  item   │    price     │
│ varchar │ decimal(4,3) │
├─────────┼──────────────┤
│ A       │        1.400 │
│ B       │        1.120 │
│ C       │        2.552 │
│ D       │        5.230 │
└─────────┴──────────────┘

Create a remote database in MotherDuck.

CREATE OR REPLACE DATABASE remote_db;

CREATE TABLE remote_db.sales AS
    SELECT 
    'ABCD'[floor(random() * 3.999)::int + 1] AS item,
    current_date() - interval (random() * 100) days AS dt,
    floor(random() * 50)::int AS tally
    FROM generate_series(1000);

FROM remote_db.sales LIMIT 10;

┌─────────┬─────────────────────┬───────┐
│  item   │         dt          │ tally │
│ varchar │      timestamp      │ int32 │
├─────────┼─────────────────────┼───────┤
│ D       │ 2024-11-29 00:00:00 │     0 │
│ A       │ 2024-10-04 00:00:00 │    17 │
│ A       │ 2024-10-13 00:00:00 │     0 │
│ C       │ 2024-11-05 00:00:00 │    49 │
│ A       │ 2024-09-30 00:00:00 │    12 │
│ B       │ 2024-09-27 00:00:00 │    47 │
│ C       │ 2024-11-23 00:00:00 │    47 │
│ B       │ 2024-09-18 00:00:00 │    13 │
│ A       │ 2024-11-18 00:00:00 │    40 │
│ C       │ 2024-09-18 00:00:00 │     4 │
├─────────┴─────────────────────┴───────┤
│ 10 rows                     3 columns │
└───────────────────────────────────────┘

Join the remote sales table to our local pricing data to get revenue by month.

SELECT
    date_trunc('month', dt) AS mo, 
    round(sum(price * tally),2) AS rev 
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing 
    ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;

┌────────────┬───────────────┐
│     mo     │      rev      │
│    date    │ decimal(38,2) │
├────────────┼───────────────┤
│ 2024-09-01 │       9241.39 │
│ 2024-10-01 │      14226.12 │
│ 2024-11-01 │      13136.55 │
│ 2024-12-01 │       7783.26 │
└────────────┴───────────────┘

To see what is running locally and remotely, you can use EXPLAIN:

EXPLAIN
SELECT
    date_trunc('month', dt) AS mo, 
    round(sum(price * tally),2) AS rev 
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing 
    ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;

In each operator of the plan, (L) indicates local while (R) indicates remote. Data is transferred using sinks and sources.

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│    DOWNLOAD_SOURCE (L)    │
│    ────────────────────   │
│        bridge_id: 1       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│  BATCH_DOWNLOAD_SINK (R)  │
│    ────────────────────   │
│        bridge_id: 1       │
│       parallel: true      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│        ORDER_BY (R)       │
│    ────────────────────   │
│ date_trunc('month', sales │
│          .dt) ASC         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       PROJECTION (R)      │
│    ────────────────────   │
│             0             │
│            rev            │
│                           │
│         ~125 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     HASH_GROUP_BY (R)     │
│    ────────────────────   │
│         Groups: #0        │
│    Aggregates: sum(#1)    │
│                           │
│         ~125 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       PROJECTION (R)      │
│    ────────────────────   │
│             mo            │
│ (CAST(price AS DECIMAL(14 │
│    ,3)) * CAST(tally AS   │
│       DECIMAL(14,0)))     │
│                           │
│         ~250 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       PROJECTION (R)      │
│    ────────────────────   │
│             #0            │
│             #1            │
│             #2            │
│__internal_compress_string_│
│        utinyint(#3)       │
│             #4            │
│                           │
│         ~250 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       HASH_JOIN (R)       │
│    ────────────────────   │
│      Join Type: INNER     │
│                           │
│        Conditions:        ├──────────────┐
│        item = item        │              │
│                           │              │
│         ~250 Rows         │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│       SEQ_SCAN  (R)       ││     UPLOAD_SOURCE (R)     │
│    ────────────────────   ││    ────────────────────   │
│           sales           ││        bridge_id: 2       │
│                           ││                           │
│        Projections:       ││                           │
│            item           ││                           │
│             dt            ││                           │
│           tally           ││                           │
│                           ││                           │
│         ~1001 Rows        ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │   BATCH_UPLOAD_SINK (L)   │
                             │    ────────────────────   │
                             │        bridge_id: 2       │
                             │       parallel: true      │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │       PROJECTION (L)      │
                             │    ────────────────────   │
                             │            item           │
                             │           price           │
                             │                           │
                             │          ~1 Rows          │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │       SEQ_SCAN  (L)       │
                             │    ────────────────────   │
                             │          pricing          │
                             │                           │
                             │        Projections:       │
                             │           price           │
                             │            item           │
                             │                           │
                             │          Filters:         │
                             │  price>2.000 AND price IS │
                             │          NOT NULL         │
                             │                           │
                             │          ~1 Rows          │
                             └───────────────────────────┘

A dual execution (or hybrid) query can be run on any database format supported by DuckDB, including sqlite, postgres and many others.