
Building AI Applications That Need Analytics

2025/09/10

Lightning Talk at AI Native Summit, Hosted by Zetta Venture Partners

TRANSCRIPT

Good afternoon everybody. I'm Jordan Tigani, CEO and co-founder of MotherDuck, and I'm going to talk about building AI applications that need analytics, which is something that I think is very important but people have not quite wrapped their heads around yet.

First of all, I think probably everybody here believes that AI is magical, is changing the world—unless there's any AI skeptics that are here just to heckle and talk about how this is all hype. I didn't see any hands, but I can't see all that well, so that's probably good. But for people who believe that the decisions you make have to be actually based on truth, LLMs can be problematic. There are some things that they are just not great at. Sometimes they make things up. Everybody knows about hallucinations. They don't know about recent events. The Perplexity founder earlier on was talking about how Larry Ellison being the new richest person in the world is something that's hard for an LLM to understand if it hasn't been trained since that new information came about. They don't know my information. They don't know my company's information—things that are private, things that aren't necessarily able to be in the crawl. And so there's a variety of techniques like RAG, letting you encode information in a separate database to enhance what you can do with an LLM.

But even with RAG, there's a bunch of questions that are hard for these kinds of databases to answer. For example, everybody knows that Oracle is the most valuable database company. That is probably encoded as a fact somewhere, but what are the valuations of database companies that were founded outside of the Bay Area? What are the top 10 most recent database companies that were sold to private equity companies? These kinds of questions can be very hard to answer with typical RAG solutions.

RAG doesn't work as well if you can't encode something as a fact. Typically, if I want to know things about my customers (who's spending the most money, which ones are costing more money than they're paying, or even complicated things like which of my customers are going to churn), that's probably not a fact that can be encoded in some database somewhere. You need to be able to scan over perhaps large amounts of data, slice and dice it, and compute over it in various ways. Generally, you want an analytical database.
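
To make the contrast with fact lookup concrete, here is a minimal sketch of the first two customer questions written as aggregate queries. The table `usage_events`, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for an analytical engine only so that the sketch runs without installing anything; the SQL itself is standard and not specific to any one database.

```python
import sqlite3

# Stand-in engine: sqlite3 keeps the sketch runnable with only the standard library.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usage_events (customer TEXT, revenue REAL, compute_cost REAL);
    INSERT INTO usage_events VALUES
        ('acme',    500.0, 120.0),
        ('acme',    300.0,  90.0),
        ('globex',  100.0, 260.0),
        ('globex',   50.0, 140.0),
        ('initech', 200.0,  40.0);
""")


def top_spenders(limit):
    """Who is spending the most money? Computed by grouping all events."""
    return conn.execute(
        """
        SELECT customer, SUM(revenue) AS total_revenue
        FROM usage_events
        GROUP BY customer
        ORDER BY total_revenue DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def unprofitable_customers():
    """Which customers cost more to serve than they pay?"""
    rows = conn.execute(
        """
        SELECT customer
        FROM usage_events
        GROUP BY customer
        HAVING SUM(compute_cost) > SUM(revenue)
        ORDER BY customer
        """
    ).fetchall()
    return [row[0] for row in rows]


print(top_spenders(2))
print(unprofitable_customers())
```

Neither answer is stored anywhere as a single row. Each one is computed by scanning and grouping the events, which is the kind of work a retrieval index is not designed for.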

What would the analytical database look like? I've heard some people say, "Well, I could just write a SQL query and send this off to Snowflake." But there's a bunch of reasons why you may not want to send it to a monolithic database like your data warehouse. First of all, you're building an app. Your app may sometimes have only a couple of users, maybe in the middle of the night, maybe over the weekend. At other times it may be on the front page of Hacker News and have thousands of concurrent users. You want to be able to scale it up, scale it down, provision for peak. You also don't want users to be able to clobber each other. If one user is using it a lot and another user is using it less, you don't want the heavy user to give the other user a poorer experience. And you also want to make sure that cost scales with usage, so that you don't have to overprovision.

DuckDB is a great database that can solve some of these problems. DuckDB is an open source analytical database built by a couple of academics in Amsterdam that has really been taking the world by storm. It's super easy to use. This is like four lines of Python code that shows going from not having DuckDB installed to actually using it for real-world things. Using DuckDB is great because it is an embedded database, which means that it runs in process. So if you're building an application, you just link DuckDB in and it runs; you don't have to provision anything. And it's free. It's open source.

On the other hand, one of the tricky things about analytics is that it uses a highly variable amount of resources. It can go from using essentially no CPU and no memory to all of your CPU and all of your memory very quickly. So you may not want to run this in the same process as your application. You may want to have it run somewhere else. But if you're running it somewhere else, then you have to manage a fleet of DuckDB instances.

So what you probably want is something like hyper-tenancy, which is a new made-up word for being able to have a single tenant: a single database, a single DuckDB instance running somewhere else that's not in your process. You can scale it up. It scales down to zero when it's not being used. You can get thousands of them. You only pay for the ones that are currently running, and you have full isolation.
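
One way to picture this model is a pool that starts an isolated database the first time a tenant shows up and drops it when the tenant goes idle. This is a toy sketch, not how MotherDuck is implemented: `TenantPool` and its methods are hypothetical names, the tenants and order amounts are invented, and in-memory `sqlite3` connections stand in for separate DuckDB instances.

```python
import sqlite3


class TenantPool:
    """One isolated database per tenant, created on demand and dropped when idle."""

    def __init__(self):
        self._instances = {}

    def connection(self, tenant_id):
        # Scale up: start an instance the first time a tenant needs one.
        if tenant_id not in self._instances:
            self._instances[tenant_id] = sqlite3.connect(":memory:")
        return self._instances[tenant_id]

    def release(self, tenant_id):
        # Scale to zero: an idle tenant holds no running instance.
        conn = self._instances.pop(tenant_id, None)
        if conn is not None:
            conn.close()

    def running(self):
        # Only running instances would be billed in a pay-per-use model.
        return len(self._instances)


pool = TenantPool()
for tenant, amounts in {"alice": [10, 20], "bob": [5]}.items():
    db = pool.connection(tenant)
    # Both tenants use the same table name without seeing each other's rows.
    db.execute("CREATE TABLE orders (amount INTEGER)")
    db.executemany("INSERT INTO orders VALUES (?)", [(a,) for a in amounts])


def order_total(tenant_id):
    """Sum the orders visible to one tenant only."""
    cursor = pool.connection(tenant_id).execute("SELECT SUM(amount) FROM orders")
    return cursor.fetchone()[0]


print(order_total("alice"), order_total("bob"), pool.running())
```

In this toy version, releasing a tenant discards its in-memory data. A real service would need to keep each tenant's data somewhere durable, separate from the running instance.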

MotherDuck is a service that we're building (Zetta-backed), and we are running DuckDB in the cloud. We can run lots of them. It scales up, scales down, scales to zero. But one of the cool things about it is that the only difference between the code to run MotherDuck and the code to run DuckDB is that I just changed the database name: I just added the "md:" prefix in front of the database name. So it's also super easy to use and get started with. It's serverless. You can scale out to many instances and scale them up. They're all isolated and independent. We can also run some simpler things locally, so you don't even need to call out to the cloud.
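
As a rough sketch of how small that change is, the hypothetical helper below builds the string passed to `duckdb.connect()`. The helper names and the database name `analytics` are made up for illustration. Actually connecting with the "md:" prefix also assumes the `duckdb` package is installed and MotherDuck credentials are configured, which this sketch does not cover.

```python
def connection_target(database, use_motherduck):
    """Return the database string to hand to duckdb.connect().

    The only difference between the local and the cloud case is the "md:" prefix.
    """
    return f"md:{database}" if use_motherduck else database


def open_database(database, use_motherduck=False):
    """Open a local DuckDB database or, with the prefix, a MotherDuck one."""
    import duckdb  # third-party package, imported lazily so the helper above needs nothing

    return duckdb.connect(connection_target(database, use_motherduck))


print(connection_target("analytics", use_motherduck=False))
print(connection_target("analytics", use_motherduck=True))
```

Keeping the prefix in one place means the rest of the application code can stay the same whichever target it runs against.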

Using MotherDuck for your app, you get the speed and flexibility of DuckDB. It spins up in milliseconds. We provision in sub-200 milliseconds. You only pay for what you use. We also have a bunch of built-in AI features, so we can call out to OpenAI models to do embeddings and to do per-row lookups. On the other hand, you do miss out on building and managing your own infrastructure. Some people like to build their own infrastructure. I think it's kind of fun; that's why I'm building an infrastructure company. But it may not be for everybody.

And then lastly, I bet everybody has been thinking, "Well, that's all well and good, but all the cool kids are doing agents." And we're all here to talk about and think about agents. So what would an analytics agent look like? Well, the question I asked earlier was, which of my customers are going to churn or are at risk of churning? That's not something you can just turn into a SQL query, fire off, and get a reasonable answer back. You might give that to an analyst, and they would think about it, pull a bunch of data together, and run a bunch of repeated queries. The kind of database you would want for that might be something where you could have a separate tenant per agent, or per agent instance, so that they could go off, make some changes, and figure a bunch of things out. You pay for the resources that they use, and if a branch didn't work out, it just goes away and you follow a branch that actually did work out.
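
The branching idea can be sketched with throwaway copies of a database. This is an illustration under stated assumptions, not MotherDuck's mechanism: the `logins` table, the inactivity threshold as a churn signal, and the function names are all hypothetical, and in-memory `sqlite3` databases copied with `Connection.backup` stand in for per-agent tenants.

```python
import sqlite3

# Shared starting point that every agent branch is copied from.
base = sqlite3.connect(":memory:")
base.executescript("""
    CREATE TABLE logins (customer TEXT, days_since_last_login INTEGER);
    INSERT INTO logins VALUES ('acme', 2), ('globex', 45), ('initech', 31);
""")


def fork(source):
    """Copy a database into a fresh instance that one agent can change freely."""
    branch = sqlite3.connect(":memory:")
    source.backup(branch)
    return branch


def explore(threshold_days):
    """Try one hypothesis about churn risk inside an isolated branch."""
    branch = fork(base)
    # The agent can create scratch tables without touching the base data.
    branch.execute("CREATE TABLE at_risk (customer TEXT)")
    branch.execute(
        "INSERT INTO at_risk SELECT customer FROM logins "
        "WHERE days_since_last_login > ?",
        (threshold_days,),
    )
    rows = branch.execute("SELECT customer FROM at_risk ORDER BY customer").fetchall()
    branch.close()  # the branch, and every change made in it, goes away
    return [row[0] for row in rows]


def base_tables():
    """List the tables in the base database, to show it was left untouched."""
    rows = base.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


print(explore(30))
print(explore(40))
print(base_tables())
```

Each call to `explore` is one branch. The caller keeps whichever result looks useful and loses nothing by abandoning the rest, because the base database never sees the scratch table.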

In summary, analytics databases can help you build applications that answer richer types of questions. I think we haven't heard a lot about how people can use analytics in their AI applications and I think hopefully we'll be hearing more about that in the future. Thank you so much.

Transcript

0:05 Good afternoon everybody. I'm uh Jordan Tigani uh CEO and co-founder of uh of MotherDuck. Uh and I'm going to talk about uh building uh uh AI applications

0:17 that need that need analytics which is um something that I think is very important and that people uh but people have not quite wrapped their heads around yet. Um, so first of all, I think probably everybody here believes that, you know, AI is magical, is changing the world. Um, unless there's any any AI skeptics that are here just to heckle

0:37 and like talk about uh how this is all hype. Um, I didn't see any hands, but I can't see all that well, so that's that's probably good. But you know for people who uh you know believe in like you know the uh that the things that the decisions you make have to be actually based on on on truth like LLMs can be

0:57 problematic. There are some things that that they are just just not great at. Um sometimes they make they make things up.

1:04 Uh everybody knows about hallucinations. Um they don't know about recent events. You know, the uh the Perplexity uh founder uh earlier on was talking about um how, you know, Larry Ellison being the new richest person in the world is sort of is um something that's hard for an LLM to uh to to understand if it if it hasn't been been trained since that

1:27 that new information came about. Um they don't know my information. They don't know my company's information. Um and uh you know things that are things that are private, things that aren't necessarily um you know able to be in the uh in the crawl. Uh and so there's there's a you know variety of techniques you know RAG um you know letting you encode

1:50 information in a separate database to uh to enhance what you can do with um with

1:57 an LLM. Um, but even with RAG, there's a bunch of questions that are hard for these kinds of these kinds of databases to answer. For example, like okay, everybody knows that Oracle is the most valuable uh database company. That is probably encoded as a fact somewhere, but like what is the valuations of companies of database companies that

2:18 were founded outside of the Bay Area? Uh what are this the you know top 10 um most recent database companies that were uh sold to private equity companies?

2:29 These kinds of questions can be can be very hard to answer with typical uh RAG solutions.

2:38 Um so RAG doesn't work as well if you can't encode something as a fact. Um, you know, typically if I want to say, you know, if I want to know things about my customers, who's who's spending the most money, um, which ones are costing more money than they're paying, or even complicated things like, you know, which of my customers are going to churn, um,

2:58 that's probably not a fact that's sitting, um, that's encodable in some database somewhere. You need to be able to sort of scan over perhaps large amounts of data, uh, slice and dice it, compute over it in various different ways. Uh generally you want an analytical database.

3:15 So what would the analytical database look like? So I've heard some people say like well I could just you know send this off to you know write a SQL query send this off to Snowflake. Um but there's a bunch of reasons why you may not want uh to send to a monolithic

3:30 database like your like your your uh your your data warehouse. I mean first of all you know you're building an app.

3:37 So your app may sometimes have only a couple of users, maybe in the middle of the night, maybe over the weekend. Uh maybe sometimes, you know, it may be on the front page of Hacker News and have thousands of users and thousands of concurrent users. Uh you want to be able to sort of scale it up, scale it down,

3:53 provision for peak. You also don't want users to be able to um to clobber each other. So if one user is is using it a lot and another user is using it less uh you don't want the user that's using it a lot to give the other the other uh other user a poorer experience and you also want to

4:10 make sure that cost is you know scalable with usage you don't want to have to overprovision. Oops. So DuckDB is a great database that can solve some of these problems. DuckDB is an open source database uh it's an analytical database um built by a couple of academics in uh in Amsterdam that has really been taking the world by storm.

4:34 It's super easy to use. You know, this is like, you know, four lines of Python code that shows like actually being able to go from not having DuckDB installed to actually being able to use it for for real world things. Um, so using DuckDB is great because DuckDB is an embedded database which means that it runs in

4:52 process and um, and so if you're building an application, DuckDB you just link it in and it can it can just run and uh, you don't have to provision anything. You don't have to um, and and it's free. It's open source. On the other hand, some of the things that are tricky about analytics are that uh, it

5:10 uses a very very variable amount of resources. It can go from using essentially no CPU, no memory to all of your CPU and all of your memory very very very very quickly. So you may not want to run this actually in the same process. You may want to then have it run somewhere else. Okay, if you're running somewhere else then you have to

5:27 manage a fleet of uh of DuckDB instances. Um, so what you probably want uh is something like hyper-tenancy, which is um a new made-up word for being

5:41 able to have uh a single tenant. So a single database, a single DuckDB instance running somewhere else that's not in your process. Um uh and then you can scale it up. It scales down to zero when you're not being used. You can get thousands you can get thousands of them.

5:55 Um you only pay for the ones that are currently running. Um and um and you

6:01 have full isolation. Um so MotherDuck is it's a service that we're um Zetta-backed um and we are running uh we run DuckDB in the cloud and we run it um we we can run you know lots of them um it's uh it scales up scales down scales to zero but one of the cool things about it is the only

6:23 difference between the code to run MotherDuck and the code to run DuckDB is I just changed the database name I just added the "md:" prefix uh in front of the database name so it's also super easy to use uh and uh and get started with. So it's serverless um you can you know scale out to many many scale them up. They're

6:42 all isolated independent. Um and we also have some cool things about running uh we can actually uh run some simpler things locally. Um so you don't even need to uh to call out to the cloud. So you know using MotherDuck for your app um you get the speed and flexibility of DuckDB. Um it spins up um in

7:01 milliseconds. So we we you know provision in in sub 200 milliseconds. Um you only pay for what you use. We have also a bunch of built-in AI features. So we can call out to um we can call out to uh OpenAI um models to do embeddings to do to do you know per row lookups for things uh as well. And on the other hand

7:23 like you do miss out on building and managing your own infrastructure. Um you know some people like to build their own infrastructure. I think it's you know kind of fun. That's why I'm, you know, building infrastructure company. But, um, you know, but it may not be for everybody.

7:38 Uh, and then lastly, you know, I bet, you know, everybody has been thinking, well, that's all well and good, but like all the cool kids are doing agents. Uh, and then we're all here to talk about and think about agents. So, what would an analytics agent look like? Well, you know, the question I asked earlier is like, which of my customers are going to

7:55 churn or are at risk of churning? That's not something you can just turn into a SQL query and fire off and and get a reasonable answer back. You might give that to an analyst and they would think about it and they'd pull a bunch of data together and they do a bunch of repeated queries. Um, the kind of database you

8:11 would want to do that, you know, might be something like something where you could have a separate tenant per agent per agent instance so that you they could go off and they could make some changes and and figure figure a bunch of things out. Uh you pay for the resources that they use and then you know maybe

8:26 you know maybe this was a branch that kind of didn't didn't work out you know they just they just sort of go away and you follow a branch that actually actually did work out.

8:36 So in summary um analytics databases can uh can help you build applications that have that answer richer types of questions. Uh I think you know we haven't heard a lot about you know how people can use analytics in their AI applications and I think um you know we hopefully we'll be hearing more about that in the future. Um thank thank you

8:53 so much.

FAQS

Why do AI applications need analytical databases instead of just RAG?

RAG works well for encoding specific facts, but many business questions require scanning large amounts of data, slicing and dicing, and computing aggregates. These are things that cannot be encoded as individual facts in a vector database. Questions like "which customers are likely to churn" or "what are the top 10 database companies sold to private equity" require an analytical database that can process and aggregate data at scale rather than just retrieve pre-stored facts.

What is hyper-tenancy and why does it matter for AI analytics?

Hyper-tenancy is a model where each user or agent gets a single, isolated DuckDB instance running in the cloud that scales up and down independently. This matters for AI applications because workloads can spike unpredictably, from a few users to thousands. You need cost-efficient scaling, full isolation so one user does not degrade another's performance, and the ability to scale to zero when not in use. MotherDuck provides this with sub-200 millisecond provisioning.

How does MotherDuck differ from using DuckDB directly for AI applications?

DuckDB is an embedded, in-process database that is free and easy to link into your application, but it uses highly variable resources and may not be ideal for running in your main application process. MotherDuck runs DuckDB in the cloud with serverless scaling, isolation between tenants, and pay-per-use pricing. Switching requires only adding an "md:" prefix to your database name, making it simple to move from local DuckDB to cloud-hosted analytics.

What would an analytics agent look like for AI applications?

An analytics agent could tackle complex questions like predicting customer churn by running repeated queries, pulling data together, and iterating on analysis, similar to what a human analyst would do. The ideal infrastructure would provide a separate database tenant per agent instance so each can make changes independently, with costs scaling to the resources used. Branches that do not work out simply go away, while successful branches are followed.