
Using SQL in Your Data Lake with DuckDB, Iceberg, dbt, and MotherDuck

2025/01/17
Featuring: Jacob Matson (MotherDuck)

Hey there data folks! Jacob Matson from MotherDuck's DevRel team here. Let me tell you about my recent obsession: using SQL more effectively across your entire data workflow. If you're like me, you've probably wondered why our data stacks often force us to switch between different tools and languages when SQL is the one language we all speak. What if we could just stick with SQL across the board? That's exactly what I've been exploring by combining DuckDB, MotherDuck, Apache Iceberg, and dbt.

The Dynamic Duo: DuckDB and MotherDuck

For those who haven't waddled into DuckDB's waters yet, it's an in-process OLAP query engine – think of it as SQLite's analytically-minded cousin. It's remarkably lightweight – a single binary that runs practically anywhere: Mac, Linux, Windows, and even in browsers via WASM. What makes DuckDB particularly useful is its built-in parallelism – write one SQL query, and it efficiently uses all your CPU cores to process it. Plus, its SQL dialect is based on PostgreSQL, so it feels familiar from the start.

One of DuckDB's standout features is its extension ecosystem. These extensions let DuckDB read from and write to numerous data sources and formats – Parquet, CSVs, JSON, and as we'll see later, even Google Sheets and Apache Iceberg tables.
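As a quick taste of what that looks like, each of those common formats is a one-liner to read – a minimal sketch (the file names here are placeholders, not files from this post):

-- Hypothetical local files, shown only to illustrate the reader functions
SELECT * FROM read_parquet('events.parquet');
SELECT * FROM read_csv_auto('customers.csv');
SELECT * FROM read_json_auto('orders.json');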

So what's MotherDuck? We've taken the open-source DuckDB and built a serverless cloud service around it. Unlike distributed systems that scale out, we scale up on single nodes. This approach makes practical sense when you consider that single-node hardware capabilities (AWS instances with up to 32TB RAM and nearly 900 vCPUs) are growing faster than most datasets. With MotherDuck, you get DuckDB's local speed plus remote querying, persistent storage, and collaboration in the cloud.

Here's an interesting tidbit: in a recent survey, 23% of DuckDB users identified as software engineers. That's telling – we've found a SQL dialect that developers actually like using! When everyone in your organization speaks the same data language, those handoffs between business analysts, data analysts, data engineers, and software engineers become much smoother.

Adding Apache Iceberg to the Mix

For modern data architectures handling large datasets in data lakes, an open table format like Apache Iceberg proves essential. It offers schema evolution (table structures can change without breaking downstream consumers), time travel (access to historical versions of your data), and ACID transaction guarantees directly on data lake storage like S3 or Google Cloud Storage.

The neat part is that DuckDB and MotherDuck integrate with Iceberg through a simple extension. Here's how easily you can get started from your DuckDB CLI:

-- Install and load the Iceberg extension (if not autoloaded by MotherDuck)
INSTALL 'iceberg';
LOAD 'iceberg';

-- (Optional) Install and load the HTTPFS extension to access remote filesystems like S3
INSTALL 'httpfs';
LOAD 'httpfs';

-- Query an Iceberg table stored on S3 directly
FROM iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/tpcds/default.db/call_center/', ALLOW_MOVED_PATHS=TRUE);

The key function here is iceberg_scan, which points to an S3 path where our Iceberg table resides. The ALLOW_MOVED_PATHS=TRUE parameter helps DuckDB resolve paths correctly if files have been moved or restructured.
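If you want to peek at the version history mentioned earlier, the extension also exposes metadata table functions. A quick sketch against the same demo table (the exact output columns depend on your extension version):

-- List the snapshots (versions) recorded for the Iceberg table
SELECT *
FROM iceberg_snapshots('s3://us-prd-motherduck-open-datasets/iceberg/tpcds/default.db/call_center/');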

The practical benefit? Anyone on your team – analyst, data engineer, or developer – can tap into the same governed, version-controlled dataset in your data lake using straightforward SQL. No context switching or learning complex APIs just to read data.

Local-to-Cloud Integration with MotherDuck

This is where things get particularly interesting. While you can query Iceberg tables directly from your local DuckDB instance, for large datasets, you'll often want the compute to happen closer to the data in the cloud. MotherDuck makes this seamless.

From your local DuckDB shell, you simply "attach" to your MotherDuck account:

-- This command connects your local DuckDB session to your MotherDuck service
ATTACH 'md:';

Once attached, your local DuckDB client works as a lightweight interface to your MotherDuck environment. You can execute queries that run within MotherDuck, right next to your data in AWS (assuming your S3 buckets are in the same region as MotherDuck).
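A quick sanity check after attaching is to list what you can now see; your local in-memory database shows up alongside your MotherDuck databases:

-- After ATTACH 'md:', both local and MotherDuck databases are listed
SHOW DATABASES;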

For example, creating a table in MotherDuck sourced directly from an Iceberg table in S3:

-- Create a table in your MotherDuck database from an Iceberg source
CREATE OR REPLACE TABLE my_db.main.call_center AS
FROM iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/tpcds/default.db/call_center/', ALLOW_MOVED_PATHS=TRUE);

The qualified name my_db.main.call_center refers to the my_db database in your MotherDuck account, its main schema, and call_center as the new table name.

When you run this statement, the data is read from Iceberg in S3 and written into a native DuckDB table within MotherDuck, all happening efficiently in the cloud while your local machine just orchestrates it. This minimizes network overhead for large data transfers. In my testing, even with the small 30-row example table (imagine millions in a real scenario), the data loaded noticeably faster because the heavy lifting happened in AWS rather than pulling data to my laptop.

You can then query this table either from your local CLI or directly within the MotherDuck UI (which runs DuckDB in your browser using WASM – pretty cool, right?).
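For example, from the CLI you can switch into the cloud database and poke at the new table just as you would a local one (names match the example above):

USE my_db;
SHOW TABLES;
FROM main.call_center LIMIT 10;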

Transforming Data with dbt: DataOps Best Practices

Reading data is just the beginning – most real-world data workflows involve transformations. This is where dbt (data build tool) comes in, allowing you to define data transformations using SQL models that promote version control, modularity, and testing for your pipelines.

With DuckDB and MotherDuck, you can build an efficient dbt workflow:

Raw Layer (Views on Iceberg): Define dbt sources pointing to your Iceberg tables in S3, then create dbt models materialized as views. These views in MotherDuck will directly query the underlying Iceberg data upon access – ideal for your "bronze" or raw data layer.

In your dbt project, a source YAML might look like:

# models/sources.yml
version: 2

sources:
  - name: raw_iceberg_data
    schema: my_iceberg_sources  # A conceptual schema
    tables:
      - name: call_center_iceberg
        meta:
          external_location: "iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/tpcds/default.db/call_center/', ALLOW_MOVED_PATHS=TRUE)"

And a model materialized as a view:

-- models/raw/raw_call_center.sql
{{ config(materialized='view') }}

SELECT *
FROM {{ source('raw_iceberg_data', 'call_center_iceberg') }}

Transformations (SQL Models): Write your business logic and transformations as standard dbt SQL models, which dbt will compile into appropriate SQL queries for DuckDB/MotherDuck.

Gold Layer (Materialized Tables in MotherDuck): For your final, curated datasets (your "gold" layer), materialize these dbt models as tables in MotherDuck. This persists the transformed data in MotherDuck's native format, offering excellent performance for downstream analytics and applications.

-- models/gold/dim_call_center_summary.sql
{{ config(materialized='table') }}

SELECT
    c_call_center_sk,
    c_first_name,
    COUNT(*) AS total_records_in_source  -- Example transformation
FROM {{ ref('raw_call_center') }}
GROUP BY ALL

Running dbt build executes these models. The views pointing to Iceberg will be created in MotherDuck, and the queries for materialized tables will pull data from Iceberg (via the views), transform it, and store the results as optimized tables within MotherDuck. I was able to build a dbt project based on the TPC-DS dataset (about 40GB in Iceberg) in MotherDuck in about 12-15 minutes, which was a pleasant surprise.
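In case you're wondering how dbt gets pointed at MotherDuck at all: with the dbt-duckdb adapter it's just a profile whose path is an md: URL. A minimal sketch (the project name, database name, and thread count are placeholders; authentication typically comes from the motherduck_token environment variable):

# profiles.yml
my_dbt_project:
  target: prod
  outputs:
    prod:
      type: duckdb
      path: "md:my_db"   # run models in MotherDuck
      threads: 8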

This approach gives you a clean separation: Iceberg for your large, evolving data lake storage, and MotherDuck for performant querying, transformation execution, and serving refined data.

Beyond the Database: Extension Power

The SQL-centric approach extends beyond querying and transforming. DuckDB's extension ecosystem lets you integrate your data with other tools and systems using just a few lines of SQL.

Serving Data Over HTTP

Need to quickly expose some data via an API? DuckDB has an httpserver extension:

-- In your DuckDB CLI
INSTALL httpserver FROM community;
LOAD httpserver;

-- Start a simple HTTP server from within DuckDB
SELECT httpserve_start('0.0.0.0', 8080, '');

Once the server is running, you can curl it to get query results as JSON:

curl -X POST -d "SELECT 1" "http://localhost:8080"

curl -X POST -d "FROM my_db.call_center"

This queries the call_center table we created in MotherDuck and returns the rows as JSON. Imagine creating quick analytics endpoints or data feeds for other services with just SQL!
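As a rough sketch, an "endpoint" here is nothing more than the query in the request body, so an aggregate works the same way as the row dump above (the table reference is simply the example table from earlier; adjust it to your own database):

curl -X POST -d "SELECT COUNT(*) AS total_rows FROM my_db.call_center" "http://localhost:8080"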

Exporting Data to Google Sheets

How often have you been asked for data in a spreadsheet? The gsheets extension makes this remarkably straightforward:

-- In your DuckDB CLI
INSTALL gsheets FROM community;
LOAD gsheets;

-- Create a secret for OAuth with Google Sheets
-- (you'll usually be prompted for credentials or a token)
CREATE SECRET (TYPE gsheet);

-- Write data from a MotherDuck table to a Google Sheet
COPY (FROM my_db.call_center) TO 'YOUR_GOOGLE_SHEET_ID' (FORMAT gsheet);

Just like that, data from your MotherDuck tables can be piped directly into a Google Sheet. That CFO report due every Monday? Now you can automate it with a few quacks of SQL.

Real-World Impact & Use Cases

So what does all this mean in practice?

Enhanced Cross-Team Collaboration: With SQL as the common denominator, analysts, data engineers, and software developers can finally speak the same language. Handoffs become smoother, and everyone can contribute to and understand the data pipeline.

Scalability and Flexibility: DuckDB's in-process speed works great for local development and smaller tasks, while MotherDuck's serverless architecture lets you scale up effortlessly for massive datasets and complex queries in the cloud. You get the right tool for the job without changing your SQL approach.

Practical Extensibility: The rich extension ecosystem (HTTP server, Google Sheets, Excel, spatial analysis, and many more) means you can connect DuckDB and MotherDuck to virtually anything, automating workflows in creative ways.

Getting Started

Ready to try this out yourself?

  1. Install DuckDB: Head over to the DuckDB Official Website to install it locally. It's as simple as pip install duckdb or using brew.

  2. Sign Up for MotherDuck: Explore the MotherDuck Documentation and sign up for a free account to experience the serverless cloud capabilities.

  3. Learn About Apache Iceberg: Check out the Apache Iceberg Documentation to understand its features and how to set up Iceberg tables.

  4. Explore dbt Integration: If you're new to dbt, the dbt Docs are a great place to start learning how to build SQL-based data transformations.

  5. Experiment with Extensions: The DuckDB Extensions page lists available extensions. Try a few out!

As you get more comfortable, consider how you can automate your data pipelines fully – perhaps by scheduling dbt runs or creating dynamic analytics endpoints using the HTTP extension.
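As one low-tech sketch of that kind of automation, a cron entry that rebuilds the project is often enough to start (this assumes dbt and your profile are available on the machine running cron; the paths are placeholders):

# Rebuild the dbt models in MotherDuck every Monday at 06:00
0 6 * * 1  cd /path/to/dbt_project && dbt build >> /var/log/dbt_build.log 2>&1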

Conclusion: SQL Simplicity

By combining DuckDB's local speed and rich SQL dialect, MotherDuck's serverless deployment and collaborative features, Iceberg's robust open table format, and dbt's transformation capabilities, teams can create a streamlined, SQL-centric workflow. This stack works for everything from local ad-hoc analytics to cloud-scale data processing and innovative integrations like direct exports to Google Sheets.

Using SQL across different roles and parts of the data stack doesn't just simplify individual tasks; it removes friction and fosters collaboration. It shows that sometimes, the most practical tools, elegantly combined, can solve complex data challenges most effectively. Give it a try – I think you'll find this approach surprisingly powerful for your data workflows.

Transcript

0:04 I'm Jacob Matson. I work here at MotherDuck – welcome to our great office. I work in DevRel, so I get to kind of do marketing without saying I work in marketing. My talk today is about why we should use SQL more, and about how we can use Iceberg and SQL together

0:23 to do lots of fun things. First, a quick primer on what DuckDB is. How many of you here have used DuckDB before? Okay, about a third, maybe a quarter. So, DuckDB is an in-process

0:46 OLAP query engine. It's like SQLite: you can put it anywhere. It's super lightweight, a single binary that runs on Mac, Linux, Windows, etc. Where

0:59 it really excels is that it has built-in parallelism everywhere: you write one single query and it will saturate the CPU. It also has a really nice SQL parser based on the Postgres spec, so it's Postgres-compatible as well. For those of you who have experience

1:16 using Postgres, it's very seamless: just plug in your queries and they just work. One thing that I love about it, and that we're going to talk about a lot today, is that it has this notion of supporting extensions. A lot of the challenges that come with SQL are that you're rewriting the same things over and over again; DuckDB supports extensions written in C++, SQL, and other languages. What we're doing at MotherDuck is taking this really great open-source library that is DuckDB and wrapping a cloud service around it. It's serverless, and it's single-node scale – we scale up,

1:51 not out. One thing we've come to realize as we build software for single nodes is that single nodes continue to get really big; they've outscaled our ability to create data. The biggest node on AWS, for example, is 32 terabytes of RAM right now and 896

2:11 vCPUs. It's very large. I think you can only get those for SAP HANA, by the way, so you can't use DuckDB on them yet. But we've got a bunch of cool stuff like that that we're building out, and that's the MotherDuck story. One thing I

2:31 want to talk about, and why I say "let's use SQL more," is the old paradigm I think about: these groups of people all use different tools. Maybe the business user is in Excel, the data scientists are using R and Python, and the data engineers are in

2:49 the core of what they're building, maybe using Spark or their specific SQL engine. And software engineers aren't really in the picture, generally, when you're talking about data. One thing that's happening with DuckDB – when they surveyed their users – is that 23% of DuckDB

3:06 users self-identify as software engineers. So something interesting is happening here: we actually have a SQL dialect that software developers don't hate, which I think speaks a lot to the quality of DuckDB. If

3:23 we use SQL more, what it means is that when we get really complex problems, everyone can talk to each other. As a business analyst I can write a query, or frame out the notion of one, that someone else can take from the data analyst perspective to a

3:40 data engineer, and then maybe even to a software engineer. I think that's a very powerful pattern, because a lot of the expense of solving problems with data is the handoffs between all of these groups using different languages and different paradigms. So what I love about this, again, is that with DuckDB and SQL we can solve epic business problems together. So,

4:05 without further ado, I'm going to jump into a demo of the DuckDB

4:11 command line. Let me find the right window... give me one second... I'll share my terminal... here we

4:54 go. Okay, so we're in the terminal. You can install

5:01 DuckDB with brew in your terminal if you're on a Mac. You can also download the binaries using wget or whatever on Linux, and on Windows you can download an exe installer. So I'm going to jump into DuckDB, and I'm going to turn the timer on, why not. Okay, so like I mentioned earlier,

5:28 there are extensions for DuckDB. By default, if I try to read an Iceberg table right now, I can't, because I need to install the iceberg

5:40 extension and load it. Now I have access to it, and I'm going to do the same thing with httpfs – if I can type – so that I can hit S3 as well.

5:54 And again, this is all SQL, so if you're running something programmatically you can just do a prepared statement and execute it; it all just executes in DuckDB. Okay, I think I

6:06 have a query here that I can grab. I'm going to use what we call iceberg_scan. I went through the pain of getting Spark running on my laptop – don't recommend that –

6:20 and took some Parquet files that represent the TPC-DS dataset at scale factor 100, which is about 100 gigabytes of data. It's not a huge dataset, but enough to get things working. So this is what the SQL query looks like: we just

6:39 pass the S3 path in, and we need to make sure to pass in this ALLOW_MOVED_PATHS parameter too. Then we can just look at what's in this table. It only has 30 rows, so it should come right back... maybe... okay, great. So there's some latency there, probably just going out to S3

6:57 and getting the data, honestly. So that's the first thing we can do. What we want to do next is: okay, this is all running locally, with latency back to my laptop – that's why it took about two and a half seconds. We can just attach MotherDuck to this and then use MotherDuck to

7:17 query it, and MotherDuck is in AWS, right next to S3, so it'll be much faster. So we can do something like ATTACH... oops, let's see if I can type it right. Okay, this is basically going to take my local DuckDB session and authenticate to MotherDuck. I've just done that,

7:39 so I can actually just type SHOW DATABASES, for example, and you'll see that I have databases attached now. This "memory" database here represents the database that I have in memory with DuckDB right now on my laptop. So I have all these other databases available in

7:57 the cloud, and then I have this one available locally. When you see me pass an abbreviated path – just a single table reference – that's going to use the memory database. All right, so let's really quickly put this

8:15 into the cloud. I'm just going to go back up and type CREATE OR REPLACE TABLE; we'll just call it call_center. Okay, if I do it like this, it's going to create the table in my local database; I actually want to reference my

8:36 cloud database, which is called my_db (or any of these, really). Now that I have a fully qualified name, it's going to insert this – if I can remember the AS keyword... where did my cursor go... okay.

8:53 Okay, cool. So now we have... oh, a catalog error: call_center is not a table. That's because I created a view with that name earlier today. Let's just rename

9:08 it. Okay, great. So now we've taken that data, and you'll notice it took less time, because there's no network traffic coming back to my local machine – it's all happening in AWS, which just says "hey, that query worked." So if I look in my_db I can see I have a bunch of

9:23 tables. I can do USE my_db like this, and then SHOW TABLES, and

9:29 now it shows... here we go, here are the ones I threw in there. So again, this is just interacting like a classic shell session – like using psql or whatever. But we also have a really powerful database that we can use locally too.

9:48 And because this is all SQL, one thing that's really cool is we can use dbt as well.

Audience: I didn't see exactly where you're running this – is it in the same account, the same region? Shouldn't we expect different

10:02 performance if the S3 bucket is somewhere else?

Good question. My experience with the S3 interconnects is that they're pretty good, but yes, if your S3 bucket is outside of us-east-1 it might be slower,

10:21 though not materially in my experience. And yes, MotherDuck's region is us-east-1 right now.

Audience: Is there a

10:31 timeline for more regions?

Not that I'm aware of – it's definitely on our roadmap to add multi-region support. Okay, so one thing that's cool is that because we're just using pure SQL, we can use tools like dbt to manage our data

10:50 transformation layer. I'm going to stop sharing for a second and hop into a dbt project that has all my sources defined; I'll show you what that looks like.

11:10 My guess is this audience is not super familiar with dbt – how many of you here know it? Okay, great. So dbt is a layer for managing transformations. Here's how I have this project set up: I can hit a bunch of different

11:35 data sources using variables in dbt. For example, I can really quickly import my Iceberg catalog by doing dbt build against just these raw tables.

11:51 Okay, what this is doing is running right now and passing the SQL into MotherDuck. Actually, hang on, let me run this a little faster – this will run eight threads at a

12:07 time and just jam views across into MotherDuck. So now we've basically put all of the TPC-DS data that's in Iceberg into MotherDuck. That means that if I go to MotherDuck

12:29 and reshare my screen...

12:38 here we go. Okay, so this is the MotherDuck web UI, and here's a list of the tables that I have – I'm just filtering on database name. And there are all the queries – 99 queries are part of

12:55 TPC-DS – that I don't care about right now. These are all in here as views. What's really cool is that the MotherDuck web UI is, again, running DuckDB, but DuckDB WASM, so this is in the browser. If I click on this,

13:10 you'll notice my catalog takes a second to load. That's because this is an Iceberg table behind a view – we've basically made this schema-on-read. If you squint, it's a materialized view; it's not a materialized view, but if you squint it kind of is. What this means is

13:26 that as more data is inserted into that Iceberg table, it just shows up here.

Audience: Sorry – are you saying

13:36 that you have connected to

13:41 the same databases? Yes, these are running against the same databases, but the DuckDB WASM engine is connected through MotherDuck directly, and I have a separate session open connected through my command line. They

13:55 are not on the same backend – they'll each be in their own separate compute instance (we call it a "duckling")... no, not in this case. All of this data went into MotherDuck on the back

14:09 end, and then I'm pulling the catalog into WASM. Yes, exactly. Okay.

14:16 So we have data in here – these are views – and we can look at this data. Here's a 300-million-row table, and we're just getting a summary of it: sales by day, basically. I'm not

14:35 joining anything into it – joins work, and they're very fast – but for the sake of this demo I didn't add one. One thing I can do here that's really cool, though, is CREATE TEMPORARY TABLE. That table now lives in WASM, so when I execute things against this temporary

14:52 table it's super fast. I'm just having some fun here – here's a linear regression on this dataset. We've taken 300 million rows, summarized them down to about 2K rows, and now we can run a linear regression on what that looks like

15:10 on the sales-by-day dataset.

Audience: Are you exporting the temporary table – or, in the same vein, can you keep it local in WASM?

Absolutely, yeah –

15:25 when I do CREATE TEMP TABLE here and I don't specify where I'm putting it, it goes locally. That's right. Which means that if I do subsequent queries on it, it's free – it's not increasing any compute. Okay, now I'm going back

15:43 to dbt, so I'm going to stop sharing and present something else. Okay, so this is probably

15:57 most relevant to people who know their way around dbt. In my dbt project

16:12 I've defined my raw tables as materialized as views, and the queries as tables. You can think of this as bronze and gold: bronze is just views sitting on top of Iceberg – there's no data persisted in the database – and since we're

16:28 materializing our queries as tables, that's our gold layer. We take those out of Iceberg and put them in the MotherDuck database so that we get really nice performance when querying them, and for users we can do things like auditing and checks to make sure they're all good before

16:41 they get to the analytical layer, which is kind of the fun part of dbt. This project is about 40 gigs of data in Iceberg, and if I build all of these queries – again, the TPC-DS benchmarking dataset – it takes about 12 to 15 minutes to run in MotherDuck. Yeah, go

17:01 ahead.

Audience: A while ago you said data added to Iceberg just shows up – what is it actually doing?

So the view that's built in MotherDuck is just a pointer: when you query the view, it goes out to Iceberg and pulls that data into MotherDuck.

17:24 Yeah, there are trade-offs. There's no streaming occurring; it happens when you execute the query. Okay, so that's what's happening in dbt. I can do a full run here, and there are lots of fun ways to optimize this: I can either make the loading fast and the

17:46 subsequent queries slow, or I can load all of it into MotherDuck so the subsequent queries are much faster. Generally, what we see with DuckDB performance over Iceberg is that the DuckDB native format is roughly 5x faster. Okay, I'm going to stop sharing

18:06 again and jump to the next part. Here's another fun thing we can do, in the same vein of "let's just use SQL in silly places." There's an HTTP server that someone built as

18:33 an extension for DuckDB, so we can install that and then load httpserver. So now I can

18:43 do this... hang on, I need to fix the quotes –

18:55 I put these in the Notes app and it changed the quotes; okay, fixed. So I can run a SELECT query, and now I have an HTTP server sitting on top of DuckDB. What's cool here is

19:12 that I can make a curl request against it. So I do this, and I just get back the result of SELECT 1 as a JSON line. What I can also do is, for example, query a table... oh, this has the smart quotes again... hang on...

19:42 okay, all right. So we just got back 100 rows from that table as JSON lines, and it's very fast. This is taking data that's in MotherDuck and returning it via HTTP to my console. I can also go back and even hit an Iceberg table the same way – something that's just sitting in Iceberg. Okay, I

20:02 don't know what's going on there, but this call_center table – again, it's only 30 rows – when I make this request, it goes right from my local DuckDB to MotherDuck to Iceberg and then comes back as JSON lines. So now, whatever endpoint I want to expose my data through, anything that can

20:21 parse JSON or make a curl request can get data out of MotherDuck and into wherever I want it to go. But a lot of times – I come from the finance side of the house – people like spreadsheets, and what tends to happen

20:38 is that people want their spreadsheets. Your CFO says, "Hey, can I get that – but can I get it in a spreadsheet, every Monday?" And you say, "uh, let me get back to you on that." But what we can do is, I can

20:53 stop it again with a SELECT query – it's very silly. Then, someone else in the community made a gsheets extension (there's also an Excel one). Okay, perfect: INSTALL gsheets,

21:11 LOAD gsheets. So now I have the gsheets extension installed, which means I can read from Google Sheets directly with OAuth, and I can also write back to them. So I'm going to create a secret – DuckDB has a secret manager – and say TYPE gsheet, and I'm

21:33 going to pop into an OAuth flow that you can't see right now (that's okay), and copy my token. Okay, so I can see the sheet now. I'll share my whole screen... okay, so we're going to go

22:00 into Sheets and create a new spreadsheet – we'll call it "Iceberg demo," sure. Then I'm going to grab this ID right here, go back to my

22:24 terminal, and use COPY syntax, Postgres-style: we'll do, I don't know, query 23 to start (again, this is materialized in MotherDuck), TO the sheet ID, and then we use this little format clause just like we would for a CSV, except the format is GSHEET.

23:08 Hmm, what did I do here... TO the ID, FORMAT GSHEET...

23:20 oh, I don't think I need quotes on it – such is the way things work, I believe. There we go. Okay, so you'll see right behind me that this data just popped into my spreadsheet; I wrote it straight out. Of course, this also means I can materialize an Iceberg table directly in

23:45 Google Sheets as well, by simply going over here

23:53 and changing this to, say, call_center – again, this is a view, and it only has 30 rows. Run that, and now we've taken that Iceberg data and written it into the sheet we defined. Let's see, do I have anything else I want to talk about? That's all I had for

24:12 now, I think. That's the way we think about connecting all of these pieces together: being able to interact with Iceberg as both tables and views, with MotherDuck or without MotherDuck, with DuckDB in memory, materialized, etc., and then writing it out to all the different places you might need it. That's all I've got –

24:30 thanks, everybody. [Applause]
