Running hybrid queries
Running hybrid queries only works when you have access to the local filesystem, and thus, it only operates through the CLI, not the web UI.
MotherDuck can query data in your local DuckDB databases alongside data in MotherDuck:
- CLI
Connect to MotherDuck
ATTACH 'md:';
USE my_db;
Check the current context
SELECT current_database(), current_schema();
should result in :
┌────────────────────┬──────────────────┐
│ current_database() │ current_schema() │
│ varchar │ varchar │
├────────────────────┼──────────────────┤
│ my_db │ main │
└────────────────────┴──────────────────┘
Attach your local database
ATTACH 'local.db';
-- Query across local and MotherDuck
SELECT * FROM table1 JOIN local.local_table USING(column_name);
A hybrid query can be run on any database format supported by DuckDB. Including sqlite, postgres and many others.
ATTACH 'local.sqlite';
-- Query across local and motherduck
SELECT * FROM table1 JOIN local.local_sqlite_table USING(column_name);
Join A Local CSV Or Parquet File To Data In MotherDuck
- CLI
Given the csv
file `pricing.csv`` as follows
$ cat pricing.csv
item,price
A,1.4
B,1.12
C,2.552
D,5.23
Let's connect to motherduck (you can also use ./duckdb md:
)
./duckdb motherduck:
Let's run the following queries
-- note we can select from a local file
SELECT * FROM './pricing.csv';
will return the following result
┌─────────┬────────┐
│ item │ price │
│ varchar │ double │
├─────────┼────────┤
│ A │ 1.4 │
│ B │ 1.12 │
│ C │ 2.552 │
│ D │ 5.23 │
└─────────┴────────┘
Let's make some fake sales data in MotherDuck : item, dt, tally. Note: This table is created in MotherDuck in the cloud
CREATE OR REPLACE TABLE sales AS
SELECT
'ABCD'[floor(random() * 3.999)::int + 1] AS item,
current_date() - interval (random() * 100) days AS dt,
floor(random() * 50)::int AS tally
FROM generate_series(1000);
SELECT * FROM sales LIMIT 10;
will give us the following result
┌─────────┬────────────┬───────┐
│ item │ dt │ tally │
│ varchar │ date │ int32 │
├─────────┼────────────┼───────┤
│ B │ 2022-11-07 │ 13 │
│ D │ 2022-09-30 │ 32 │
│ C │ 2022-10-08 │ 39 │
...
Let's join it to our pricing csv to get revenue by day Note: this is a hybrid join, joing cloud data to data on your computer
SELECT date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM sales
JOIN (SELECT * FROM './pricing.csv' WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;