Skip to main content

Running hybrid queries

note

Running hybrid queries only works when you have access to the local filesystem, and thus, it only operates through the CLI, not the web UI.

MotherDuck can query data in your local DuckDB databases alongside data in MotherDuck:

Connect to MotherDuck

ATTACH 'md:';
USE my_db;

Check the current context

SELECT current_database(), current_schema();

should result in :

┌────────────────────┬──────────────────┐
│ current_database() │ current_schema()
│ varchar │ varchar │
├────────────────────┼──────────────────┤
│ my_db │ main │
└────────────────────┴──────────────────┘

Attach your local database

ATTACH 'local.db';

-- Query across local and MotherDuck
SELECT * FROM table1 JOIN local.local_table USING(column_name);

A hybrid query can be run on any database format supported by DuckDB. Including sqlite, postgres and many others.

ATTACH 'local.sqlite';

-- Query across local and motherduck
SELECT * FROM table1 JOIN local.local_sqlite_table USING(column_name);

Join A Local CSV Or Parquet File To Data In MotherDuck

Given the csv file `pricing.csv`` as follows

> cat pricing.csv
item,price
A,1.4
B,1.12
C,2.552
D,5.23

Let's connect to motherduck (you can also use ./duckdb md:)

./duckdb motherduck:

Let's run the following queries

-- note we can select from a local file
SELECT * FROM './pricing.csv';

will return the following result

┌─────────┬────────┐
│ item │ price │
│ varchar │ double │
├─────────┼────────┤
│ A │ 1.4
│ B │ 1.12
│ C │ 2.552
│ D │ 5.23
└─────────┴────────┘

Let's make some fake sales data in MotherDuck : item, dt, tally. Note: This table is created in MotherDuck in the cloud

CREATE OR REPLACE TABLE sales AS
SELECT
'ABCD'[floor(random() * 3.999)::int + 1] AS item,
current_date() - interval (random() * 100) days AS dt,
floor(random() * 50)::int AS tally
FROM generate_series(1000);
SELECT * FROM sales LIMIT 10;

will give us the following result

┌─────────┬────────────┬───────┐
│ item │ dt │ tally │
│ varchar │ date │ int32 │
├─────────┼────────────┼───────┤
│ B │ 2022-11-07 │ 13
│ D │ 2022-09-30 │ 32
│ C │ 2022-10-08 │ 39
...

Let's join it to our pricing csv to get revenue by day Note: this is a hybrid join, joing cloud data to data on your computer

SELECT date_trunc('month', dt) AS mo, 
round(sum(price * tally),2) AS rev
FROM sales
JOIN (SELECT * FROM './pricing.csv' WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;