Building AI Applications That Need Analytics
2025/09/10
Lightning Talk at AI Native Summit, Hosted by Zetta Venture Partners
TRANSCRIPT
Good afternoon everybody. I'm Jordan Tigani, CEO and co-founder of MotherDuck, and I'm going to talk about building AI applications that need analytics, which is something that I think is very important but people have not quite wrapped their heads around yet.
First of all, I think probably everybody here believes that AI is magical and is changing the world, unless there are any AI skeptics here just to heckle and talk about how this is all hype. I didn't see any hands, but I can't see all that well, so that's probably good. But for people who believe that the decisions you make have to be actually based on truth, LLMs can be problematic. There are some things that they are just not great at. Sometimes they make things up. Everybody knows about hallucinations. They don't know about recent events. The Perplexity founder earlier on was talking about how Larry Ellison being the new richest person in the world is something that's hard for an LLM to understand if it hasn't been trained since that new information came about. They don't know my information. They don't know my company's information: things that are private, things that aren't necessarily able to be in the crawl. And so there's a variety of techniques like RAG, letting you encode information in a separate database to enhance what you can do with an LLM.
But even with RAG, there's a bunch of questions that are hard for these kinds of databases to answer. For example, everybody knows that Oracle is the most valuable database company. That is probably encoded as a fact somewhere, but what are the valuations of database companies that were founded outside of the Bay Area? What are the top 10 most recent database companies that were sold to private equity companies? These kinds of questions can be very hard to answer with typical RAG solutions.
RAG doesn't work as well if you can't encode something as a fact. Typically if I want to know things about my customers—who's spending the most money, which ones are costing more money than they're paying, or even complicated things like which of my customers are going to churn—that's probably not a fact that's sitting encodable in some database somewhere. You need to be able to sort of scan over perhaps large amounts of data, slice and dice it, compute over it in various different ways. Generally you want an analytical database.
What would the analytical database look like? I've heard some people say, "Well, I could just write a SQL query and send this off to Snowflake." But there are a bunch of reasons why you may not want to send it to a monolithic database like your data warehouse. First of all, you're building an app. Your app may sometimes have only a couple of users, maybe in the middle of the night, maybe over the weekend. Other times it may be on the front page of Hacker News and have thousands of concurrent users. You want to be able to scale it up, scale it down, provision for peak. You also don't want users to be able to clobber each other. So if one user is using it a lot and another user is using it less, you don't want the user that's using it a lot to give the other user a poorer experience. And you also want to make sure that cost scales with usage; you don't want to have to overprovision.
DuckDB is a great database that can solve some of these problems. DuckDB is an open source analytical database built by a couple of academics in Amsterdam that has really been taking the world by storm. It's super easy to use. This is like four lines of Python code that go from not having DuckDB installed to actually using it for real-world things. Using DuckDB is great because it is an embedded database, which means that it runs in process. So if you're building an application, you just link DuckDB in and it runs, and you don't have to provision anything. And it's free. It's open source.
On the other hand, one of the things that is tricky about analytics is that it uses a highly variable amount of resources. It can go from using essentially no CPU and no memory to all of your CPU and all of your memory very quickly. So you may not actually want to run this in the same process as your application. You may want to have it run somewhere else. But if you're running it somewhere else, then you have to manage a fleet of DuckDB instances.
So what you probably want is something like hyper-tenancy, which is a new made-up word for being able to have a single tenant per database: a single DuckDB instance running somewhere else that's not in your process. Then you can scale it up. It scales down to zero when it's not being used. You can get thousands of them. You only pay for the ones that are currently running, and you have full isolation.
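To make the idea concrete, here is a toy sketch of per-tenant instances that are created on first use and closed when idle. It is not how MotherDuck or any real service implements this. `TenantPool`, `reap_idle`, and the idle timeout are hypothetical names chosen for illustration, and the default connector assumes the `duckdb` Python package is installed.

```python
import time
from typing import Any, Callable, Dict, Tuple


def _duckdb_connect(tenant_id: str) -> Any:
    """Default connector (assumption): one private in-memory DuckDB per tenant."""
    import duckdb  # third-party package, imported lazily: pip install duckdb
    return duckdb.connect()


class TenantPool:
    """Hypothetical pool: one isolated database instance per tenant."""

    def __init__(
        self,
        connect: Callable[[str], Any] = _duckdb_connect,
        idle_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._idle_seconds = idle_seconds
        self._clock = clock
        # tenant_id -> (connection, time of last use)
        self._open: Dict[str, Tuple[Any, float]] = {}

    def get(self, tenant_id: str) -> Any:
        """Return the tenant's own instance, starting one on first use."""
        entry = self._open.get(tenant_id)
        con = entry[0] if entry else self._connect(tenant_id)
        self._open[tenant_id] = (con, self._clock())
        return con

    def reap_idle(self) -> int:
        """Close instances unused for idle_seconds; return how many were closed."""
        now = self._clock()
        idle = [
            tenant
            for tenant, (_, last_used) in self._open.items()
            if now - last_used >= self._idle_seconds
        ]
        for tenant in idle:
            con, _ = self._open.pop(tenant)
            con.close()
        return len(idle)

    def running(self) -> int:
        """Number of instances currently open (the ones you would pay for)."""
        return len(self._open)
```

Calling `pool.get("customer-42")` gives that tenant a database nobody else touches, and calling `reap_idle()` periodically drops the count of running instances back toward zero when nobody is querying. In this sketch the instances still live in the caller's process; a real deployment would run them elsewhere, as described above.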
MotherDuck is a service that we're building, backed by Zetta, and we are running DuckDB in the cloud. We can run lots of them. It scales up, scales down, scales to zero. But one of the cool things about it is that the only difference between the code to run MotherDuck and the code to run DuckDB is the database name: I just added the "md:" prefix in front of it. So it's also super easy to use and get started with. It's serverless. You can scale out to many, scale them up. They're all isolated, independent. And there are also some cool things about how it runs: we can actually run some simpler things locally, so you don't even need to call out to the cloud.
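The "md:" prefix change can be sketched as follows. This assumes the `duckdb` Python package is installed and that MotherDuck credentials are already configured in the environment. The helper names `connection_target` and `open_database`, and the local `.duckdb` file naming, are hypothetical choices for this example.

```python
def connection_target(database: str, use_motherduck: bool) -> str:
    """Build the string passed to duckdb.connect().

    The only difference between the two targets is the "md:" prefix.
    The local file name "<database>.duckdb" is an assumption of this sketch.
    """
    return f"md:{database}" if use_motherduck else f"{database}.duckdb"


def open_database(database: str, use_motherduck: bool = False):
    """Open a local DuckDB file or, with the prefix, a MotherDuck database.

    Connecting to MotherDuck needs network access and valid credentials,
    so this function is not exercised here.
    """
    import duckdb  # third-party package, imported lazily: pip install duckdb
    return duckdb.connect(connection_target(database, use_motherduck))


print(connection_target("analytics", use_motherduck=False))
print(connection_target("analytics", use_motherduck=True))
```

Keeping the target string in one place means the rest of the application code can stay the same whichever backend is in use.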
Using MotherDuck for your app, you get the speed and flexibility of DuckDB. It spins up in milliseconds. We provision in sub-200 milliseconds. You only pay for what you use. We also have a bunch of built-in AI features, so we can call out to OpenAI models to do embeddings and to do per-row lookups as well. On the other hand, you do miss out on building and managing your own infrastructure. Some people like to build their own infrastructure. I think it's kind of fun; that's why I'm building an infrastructure company. But it may not be for everybody.
And then lastly, I bet everybody has been thinking, "Well, that's all well and good, but all the cool kids are doing agents." And we're all here to talk about and think about agents. So what would an analytics agent look like? Well, the question I asked earlier was which of my customers are going to churn or are at risk of churning. That's not something you can just turn into a SQL query, fire off, and get a reasonable answer back. You might give that to an analyst, and they would think about it, pull a bunch of data together, and run a bunch of repeated queries. The kind of database you would want for that might be something where you could have a separate tenant per agent, per agent instance, so that they could go off, make some changes, and figure a bunch of things out. You pay for the resources that they use, and then if a branch didn't work out, it just sort of goes away and you follow a branch that actually did work out.
In summary, analytics databases can help you build applications that answer richer types of questions. I think we haven't heard a lot about how people can use analytics in their AI applications and I think hopefully we'll be hearing more about that in the future. Thank you so much.