Empowering Data Teams: Smarter AI Workflows with Hex & MotherDuck

2025/11/14
Featuring: Jacob Matson (MotherDuck) and Armen (Hex)

TL;DR: Learn how to build AI-powered analytics workflows using MotherDuck and Hex—including generating column descriptions with AI, semantic modeling, and creating a "compounding context engine" that improves over time.

The Key Insight: Context is King

LLMs don't know your business data. The most important thing for accurate AI analytics is building context:

  1. Column descriptions in your database
  2. Semantic models defining business metrics
  3. Rules files with business logic and SQL styling
  4. Endorsed assets marking trusted tables

Generating Column Descriptions with AI

MotherDuck's prompt() function generates descriptions directly in SQL. The workflow involves:

  • Getting a table summary for AI context
  • Generating AI descriptions focused on business purpose
  • Applying descriptions to your schema with COMMENT statements

Pro tip: Tell the AI to focus on business purpose, not statistics. Avoid descriptions like "contains integers from 1-100"—that's already in the schema.
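The steps above can be sketched in MotherDuck SQL roughly like this (the table and column names are illustrative, not from the webinar, so treat it as a sketch rather than copy-paste):

```sql
-- Generate a business-focused description with MotherDuck's prompt() function.
-- The table 'orders' and the prompt wording are hypothetical.
SELECT prompt(
    'Write a one-sentence business description of a table named orders ' ||
    'with columns order_id, customer_id, order_date, and total_amount. ' ||
    'Focus on business purpose, not statistics.'
) AS table_description;

-- Apply the generated text to the schema with COMMENT statements.
COMMENT ON TABLE orders IS 'Customer purchase transactions; one row per completed order.';
COMMENT ON COLUMN orders.customer_id IS 'Foreign key to customers.customer_id; use this for joins.';
```

Note the pro tip in action: the prompt explicitly steers the model toward business purpose rather than data statistics.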

Read Scaling for Concurrent Users

MotherDuck's read scaling gives each Hex user their own DuckDB instance:

  • Configure in Settings → Instance Size
  • Choose fleet size (up to 16 ducklings)
  • Eliminates noisy neighbor problems
  • Pay only for what you use

Hex's Three Pillars for AI Workflows

1. Agentic Notebook Agent

For data teams (SQL/Python savvy):

  • Cursor-like experience in notebooks
  • Creates, edits, modifies cells automatically
  • Builds execution plans for complex analyses

2. Conversational Self-Serve (Threads)

For business users:

  • Chat interface—no code visible
  • References existing assets (dashboards, models)
  • Everything is notebook-backed (can convert threads to projects)

3. Semantic Modeling Workbench

Build trusted metrics and dimensions:

  • AI-assisted semantic model creation
  • Define joins, relationships, business logic
  • Models become context for LLMs

Demo: From Question to Dashboard

User prompt: "Break down our marketing opportunities"

What happens automatically:

  1. Fuzzy search finds related dashboards
  2. Loads semantic model definitions
  3. Runs 20+ iterative SQL queries
  4. Creates charts, pivot tables, insights
  5. Generates a complete notebook

Converting to production:

  • Thread → Project (for data team review)
  • Project → Dashboard (drag-and-drop app builder)
  • Dashboard → Semantic Model (close the loop)

Context Studio: Monitoring & Improvement

Track how users interact with AI:

  • Conversation volumes and top users
  • Actual prompts being submitted
  • Which queries hit semantic models vs raw tables
  • Identify hallucinations and gaps

Rules files let you capture business definitions (like what "Revenue" or "Active customer" means) and SQL style preferences in markdown format.
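A minimal rules file might look something like this (the specific definitions below are invented for illustration, not taken from the webinar):

```markdown
# Analytics rules

## Business definitions
- **Revenue**: sum of closed-won opportunity amounts, net of refunds.
- **Active customer**: an account with at least one order in the trailing 90 days.

## SQL style
- Always schema-qualify table names.
- Prefer explicit JOIN ... ON over comma joins.
- Use snake_case for column aliases.
```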

Best Practices

  • Add column descriptions first: easiest win, about 5 minutes of work
  • Note join keys in descriptions: LLMs struggle with joins
  • Build semantic models by department: Sales, Marketing, Operations
  • Keep 6-10 tables per semantic model: avoid monolithic models
  • Start with gold tables, add rails later: iterate based on user questions
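Since MotherDuck runs DuckDB under the hood, one way to track progress on the first practice is to list columns that still lack comments using DuckDB's metadata functions (this assumes a recent DuckDB version where duckdb_columns() exposes a comment field; the schema name is illustrative):

```sql
-- Audit sketch: find columns with no description yet.
SELECT table_name, column_name
FROM duckdb_columns()
WHERE comment IS NULL
  AND schema_name = 'main'
ORDER BY table_name, column_name;
```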

0:00Hi everyone. Good morning, good afternoon, good evening, wherever you're at. Welcome to our webinar today on empowering data teams: smarter AI workflows with MotherDuck and Hex. My name is Gerald. I am on the marketing team here at MotherDuck. And I'm joined today by Jacob Matson from MotherDuck, one of our developer advocates, and Armen from Hex.

0:23He's a partner engineer at Hex. A couple housekeeping items: we are recording this, and we will share out a link if you registered for it. Otherwise you can find the recording on our website afterwards. We'll have time for Q&A at the end, so if you have any questions

0:40you can put them in the chat and we will get to them after the demo time. And then, yeah, just the quick agenda for today: Jacob will do a little bit of an intro on what MotherDuck is, if you haven't heard of it before, plus a brief little demo and a few features

0:56that we have that are related to Hex, and then we'll hand it over to Armen for a brief introduction in terms of what Hex is, and then he'll dive into the meat of the demo, showing us smarter AI workflows with MotherDuck and Hex. And with that, let me hand it over to Jacob for MotherDuck.

1:12All right. Thank you, Gerald. Um, yeah, thanks everyone for joining us. So I'm just going to talk through a little bit about what MotherDuck is if you don't know. And the main thing

1:25that you should really take away about MotherDuck is that it's a cloud data warehouse that makes big data feel small.

1:32We are always thinking about how to build a user experience that's really great and lets you get at the heart of what problems you're trying to solve.

1:43In that way I actually really love the way that it fits in with Hex. It is powered by DuckDB on the back end.

1:51So for those of you that are familiar with it, it's a super fast, vectorized, lightweight execution engine. We're building all of the things that you need to make it work really, really well in the cloud reliably, including some other pieces in there. One thing I'll call out too: it is indeed fully managed and

2:11serverless, so you pay by the second. If you're not using it, you don't pay for it. And that's a really nice feature of something like this.

2:22Of course, a cloud data warehouse is only as good as all the pieces that it fits in with, and we fit in with a lot of pieces. We've built native integrations with a good number of partners, including Hex.

2:37And we continue to build out more. One thing that's really awesome about MotherDuck is it has a bunch of different surfaces you can integrate with, whether that's Java or C++ or Rust or Python. There are lots of different surfaces in addition to using all of these tools that have native integrations through a bunch of

2:57different interfaces. I'm going to switch over to a quick MotherDuck demo just to show off a few things that I think are really awesome in terms of this Hex and AI workflow stuff in particular. So, I'm logged into MotherDuck here. The first thing I'm going to actually show is my

3:19settings. Where did that go? Perfect. All right. So, I clicked on my instance size over here. I'm running on a jumbo. I also have this notion of read scaling. And why I want to call this out is we have recently implemented this for Hex. And so, what this does is for each user that's connected to a

3:39dashboard or to a notebook uh or an app

3:44uh in Hex, they all get their own instance. We can choose how big we want our fleet to be. We can choose what instance sizes we want. So, this is our smallest one that it's set to right now.

3:54We can choose standard or jumbo. And then we can get up to 16 ducklings. If you need more, you can just ask us for them, which is really great. And again, you just get your own token for this. And then MotherDuck handles the rotation between the different read-only instances. Where this

4:14really matters is for those of you who are familiar with DuckDB and have maybe used it in a notebook when you're sharing data sets: DuckDB is kind of dedicated to running those queries as fast as possible for a single user, and so we give you the ability to have that same shared experience across multiple users by giving each user their

4:29own duckling, which is just an instance running MotherDuck. All right, so that was the first thing I wanted to show. The second thing I wanted to show is one thing that's really great about using these AI workflows: they can take context from your database, right?

4:48And so there's actually a pretty old feature called column comments that exists in databases and um I really have

4:56never used them until I needed them for AI, because they just make the quality of the queries you get from your AI agents much, much higher. And so I want to show a couple things off. The first one is, I actually use this prompt function in MotherDuck to write a bunch of descriptions

5:16for my data sets. And so you can see I can just run it like this. This is actually making a call to OpenAI with this information and then it's returning text for me. So you can see here is some text that describes this table. This is actually not using a great model at the moment. I

5:32should have it use GPT-5; I just don't have it using it right here in this example. But this is using an older model. You can see it's returning a nice description of what this data set is, right? And so because we have AI in our database, we can

5:47do things like pass a summarized version of the table, right? Which looks like this. It tells us all the columns, what information is in them, statistical information about it. We can pass that as a string into our prompt and say, "Hey, tell me what this is. Write it in a way for other AI to use it." And so

6:10that's what I'm doing here. Um, and then of course we can throw these all into a big table that looks like this. So I actually generated some AI descriptions.

6:19And you can see that I have them all in here for both tables and columns, which is pretty cool. And we can again see what they are for each of these, which is very helpful as we think about writing queries with AI. And then of course the last step is once we have these comments

6:36generated by our AI engine, we can then add these comments to tables. So here's what it looks like to add a comment on a table. Here's what it looks like to add a comment on a column. You'll notice these are atomic operations. They also only

6:53exist at the moment in time that you submit them. There's no history on them.

6:57You write a comment, it's good, it exists. If someone else submits a new comment, it overwrites the existing comment. It doesn't have the same type of affordances you would expect from proper data in a database. So the metadata handling, you know, exists only at the moment that you submit it. And then we can

7:16actually see them, right? So this is actually what, when we use AI to query these tables, it will see about this taxi data set, for example, right? It will show this comment. So now it has some context around what this data actually means, not just whatever's in the table, in the columns and the

7:33rows. And again, we can look at that from a column perspective too. Here's a bunch of columns we have added for a Hacker News data set. And again, this is actually really helpful. Like, what does this column dead mean? Oh, this actually means it's been withdrawn or disabled from normal visibility.

7:49That's really helpful. AI might not know what that means; this is a column named dead in a Hacker News data set, right? So these are the types of things we can do as a data prep step before we load data into our notebook tools or our BI tools, to make it more useful for us to consume

8:08downstream. Okay, so with that I'm going to stop sharing and hand it off to Armen. Thank you. Give it a moment to bring up our screen. There we go. So hi folks. Thank you for that intro. For folks that aren't aware of what Hex is, haven't heard of it, very short and

8:26sweet: Hex is an AI-optimized platform for analytics. It brings together the best of breed of notebooking, dashboards, and infusing LLMs for your copilot experiences, your chat-with-your-data, text-to-SQL sort of workflows, and also authoring semantic models, which we'll touch on in this kind of idea of building context for these LLM systems.

8:43So, to jump right into it, Hex is trusted by many different data leaders. We operate across a vast number of different industries, from very small companies to very large ones. And if we jump into kind of the ethos of what we're really trying to do here is

9:01ultimately make a complete set of AI data workflows that can cater to a wide variety of practitioners. These could be folks that are say on the data team.

9:09These are your data analysts, embedded analysts, data scientists, data engineers, but also catering to folks on the other side of the business.

9:17These are your conversational self-serve analytics needs. Think of your marketers, your CFOs, your sales folks, any leaders in the business. And the core takeaway here that I want you all to have when we're at the wrap-up of this is that the most important thing about building these kinds of AI workflows is this idea of a compounding

9:35context engine, or some sort of virtuous cycle, where over time you actually build out a very robust platform that allows you to answer all sorts of questions. This idea that you can have some sort of benchmark or eval sets, synthesizing some natural language questions into SQL queries, and just trying to have a

9:55chatbot that can perform say up to some x number of accuracy right off the rip.

9:59That's not an approach that's sustainable. It may work for a small set of use cases, but it's not something that's living and breathing over time. And this is where we try to differentiate ourselves, again having this idea of a compounding context engine. So keep that in the back of your mind as we go through the demo. Now I

10:15want to take a moment and kind of highlight the big pillars of Hex before we jump into the demo. The one being this idea of agentic AI workflows catering to the data team, right? These are folks that are SQL savvy. They're Python savvy. They maybe can just read or understand SQL, and they're typically going to be

10:31comfortable with some form of notebooking framework. So in Hex, we have a world-class notebook experience, and paired with it comes this notebook agent that actually understands all the different types of capabilities in your Hex notebooks: our pivot cells, chart cells, obviously any sort of markdown or text cells, SQL and Python cells, and the like. And

10:50it can actually go create, edit, modify, and build out an entire plan for you in kind of this Cursor-like workflow for specific data teams. The next big piece of this is around the conversational self-serve components. This is your classical chat interface. Think of your ChatGPTs of the world. You're not necessarily highlighting any code. It's not

11:11clicking. There's no cells stacked on top of each other. But the big kind of component or differentiator is everything in Hex that you create is always notebook backed. In fact, the kind of ethos of Hex is that the building block of everything we do is always this cell component, where everything can always be tied back to some sort of

11:28DAG or notebook creation. So we'll see how we can actually convert these chatbots into, you know, data products, notebooking experiences, and the like.

11:37And the last piece here, the most important in my opinion, is this idea around this context engine, some form of virtuous cycle where you can build out the metadata that is required for these LLMs to perform in a trusted and accurate way. So this includes our semantic modeling workbench, where we have a kind of AI/LLM-assisted copilot experience to

11:55build out semantic models. You can build these out even on top of your existing projects for additional kind of business-savvy metadata.

12:02And then there's the idea around observability, or monitoring the agent usage, right? You need to have the data team at the center. They're driving the spaceship forward, and they need to have the understanding of which users are asking certain types of questions, and what sorts of metrics they're interested in. And we'll touch on this

12:18context studio and the creation that we're building. So with that said, let's jump into a demo. I'll jump over to our shared Hex/MotherDuck org here. And we'll first start with this conversational piece and walk through a life cycle of how you can have this malleable experience

12:37with just a natural language question. So maybe I ask something like: break down our marketing opportunities. Rather open-ended, but I'll go ahead and trigger this. It's going to look at my prod MotherDuck instance and it's going to do a bunch of reasoning. Right now, by design, it's agentic in nature. We're going to iterate over many different tool calls

13:01and allow this to try to mimic the approaches that say a traditional data scientist or data analyst would perform.

13:07So by nature it's going to take a little bit of time to run a bunch of iterative processes behind the scenes. But before we get to actually making all these tool calls, we're doing a few things right off the rip. The first is we're doing a fuzzy search under the hood to identify any existing assets

13:22that you may have within your Hex org. So in this case it found one particular Hex dashboard. This is our sales and marketing dashboard, which may be relevant to the user. So you're reusing these assets that have already been baked by the business, right? Some trusted assets. The next thing we're going to look through is any sort of model data. So

13:40this is where this idea of semantic modeling comes in where you're taking these core business logics. Think of your definitions for profit, for revenue, for customer. These are all things that need to be trusted and repeatable by the rest of the business.

13:51And one way to do that is by building out a semantic model. We've seen this with many different flavors in the BI world. And now it's becoming very prominent in the age of AI. So as I hover over this, we see what it's actually looking at. We can see we have this opportunities funnel, right? It's fighting me as it's working here,

14:07but we can hover this quickly. Maybe we look at our opportunity count. We can see there's this actual expression behind this. And this is all in an effort to try to make these LLMs much more repeatable or deterministic in nature, given their nondeterministic tendencies. It will then start reasoning and understanding the sort of information it has. It can

14:26start going through actually starting to create some of these charts for us. So it has access to all of Hex's bells and whistles from our single value cells, our table cells, pivot cells, the charting capabilities, and it will actually build out a notebook under the hood for a particular user.

14:41So let's let it reason. We'll leave it going for a little bit, and I want to touch on a key component that is very important to getting started. A lot of folks will come into Hex, they'll, you know, attach their MotherDuck instance. They'll point AI to it and just expect that it kind of

14:55works. And while that is a dream state, it's not necessarily one that is viable. And I don't think it'll be viable in the future, even if these models get better. The biggest thing here is: context is king. And one of the simplest ways to do this is something that Jacob highlighted. We're actually bringing in column descriptions

15:12into MotherDuck. So, I've gone ahead and I'll just showcase this very quickly, but here's kind of a live example of a working data set we've actually built out. I had our agent build this out for me, but here are all those column descriptions for our tables. You can see I have three different tables here and they're quite

15:29verbose in nature. And this is all in an effort to build out the metadata for this LLM system. So, I can see here my unique identifiers, my foreign keys to account links, the name of the associated account, so on and so forth.

15:41It's very important to build out these foundational building blocks as you're starting to work with LLMs, and this honestly applies to other systems outside of Hex. If it's a homegrown solution, if it's a competitor's, I strongly advise you all to add this metadata as you're getting started. Okay, so we'll give this a little

15:59bit. Okay, here we go. It's finished. And you'll notice it's pretty verbose in nature. And this is because it's actually built out an entire notebook behind the scenes that's totally abstracted to the end user. We have our marketing source performance here. We can look at our top five marketing sources that it's done and it's doing a lot of iteration with these different

16:16tool calls, or different types of cells that you can work in, and their side effects.

16:21We have a nice chart where we can actually do additional exploration. So I can open this up

16:28and now you've entered kind of our editor or explorability feature, where I have all these measures and dimensions that are coming from my semantic model.

16:36So I know it's trusted and governed. And I can go ahead and slice and dice this.

16:40I can look at the underlying data if I want in this table view. And I can actually then style and change any sort of X or Y axes, change different charting types, um, so on and so forth for added flexibility and exploration.

16:53You'll see there are additional chart cells that come in. Closed-won ARR, it's done. And it'll give this opportunity distribution with a bunch of text, so on and so forth. I can ask follow-up questions. And here I have this related project. Now, inevitably, what's going to happen is you're going to hit some sort of ceiling with

17:10this chatbot. These are things that are kind of living and breathing and they grow and get better over time, but they may hallucinate. Maybe you are a marketer who actually wants to then bring in a data scientist to do some maybe seasonality assessment on this.

17:22It's not something that the AI can actually do for you in a way that you'd like. Maybe you don't have full confidence in it. How do we actually try to circumvent that problem? Well, one of the things that we really pride ourselves on is, again, everything is notebook backed. So I can take this raw

17:35thread or this conversation. I can continue this in an actual project. And this serves two main purposes for the data scientists, the data teams. They can actually see and have full observability or monitoring aspects on top of this particular

17:53project here. We'll let this run. Give this a moment. Maybe there are some issues, classic. But what I can do is actually understand... let's see here.

18:06Maybe some beta. So maybe I want to bring in opps by source. But I can actually bring out and understand all of the context from the particular thread as the data team member. What's neat: I have this ability to ask further questions. So I can add, you know, any specific tables or cells inside the

18:27notebook. So as this is running, maybe I want to look at closed-won ARR, um, can you break down...

18:42I could do something like this where, if I wanted to extend this analysis, it will build out a plan for me and start working against the context in this specific notebook. So very powerful. We can come back and revisit this in a little bit. But it's some way to actually give a lot of control

18:57back to the data scientist or extend your analyses. One neat thing if I jump over to kind of a finished project or one of those reports. If I want to edit this particular project, let's jump into kind of the behind the scenes of how hex projects are made. You'll notice everything is notebook backed. So we'll run this top to bottom. And I have all

19:15sorts of bells and whistles to highlight in a notebook. So whether I'm extending from a project, from a thread, excuse me, or I'm starting a net new project, I have these different cell types that are available to me. I can start building out my analysis here. I'm looking at this revenue information. I'm looking at our opportunities table.

19:31If I minimize this, we have our marketing tab as well. So I can start curating very very dense reporting that I can share out. And actually, if I want to share this out, it's not something that's in a format that is very intuitive for say some end stakeholders.

19:45What I can do is, if I command-click here, I'll jump into my app builder. Let me clean up my screen a little bit. And here what we see, we have this side-by-side comparison where on the left side we have this notebook and on the right we have this app builder. So if I wanted to drag and drop

19:59or bring any of these high-level insights or SQL cells, chart cells, table cells into some form of dashboard, I can very quickly just select this. So, if we kind of set this up to show you guys, if I want to add or remove this on my revenue tab on the app side, it's as simple as just clicking a button to add

20:17it. And then from here, I can drag and drop this. Very similar to your classical BI approaches, where you can style and curate and build out these apps. From here, everything is always going to be paralleled between the notebook and the app builder. So, if I were to come in and, say, add some text, this text will get created. Let's just

20:35drop in an example like so. This will be replicated back into the notebook with respect to its position.

20:44Once I'm happy with my changes and I say, okay, this is actually a dashboard that I like. I want to publish. There's a publishing workflow we can go into.

20:51You can have kind of native GitHub capabilities in the platform. And then ultimately, you'll end up with this final approach or this final dashboard that you can share out.

21:01I'll give this a second to load. It's shareable with just a link. You can come in and edit it as an admin or an editor in Hex. And this is the way to kind of share your insights with the rest of the business. Now, how does this all tie back into this idea that we're

21:17building this virtuous cycle? How do we actually build out something where it compounds over time? These are basically business assets that are very, very useful to LLMs. And as we saw in that first thread I created, we were able to index or reference this and use it as kind of fuel for the LLM. One thing we can do is actually then convert this

21:35into a semantic model. So let's say I have some very detailed business metrics. They're very powerful for my business and I want to create some form of metadata, or validate this as metadata, for the LLM. If I jump into this, I can show you guys kind of our semantic modeling workbench. So let's start there quick.

21:52If I go into my settings tab, go to my data sources here, you can see we have our MotherDuck connection, but additionally we have this semantic model project. And remember, if I go back to this thread, if I scroll up, we had this model data. So this model data is actually coming from a semantic model created inside of Hex.

22:15And that semantic model was this B2B SaaS model here. And I'll showcase kind of from scratch how we can actually build out a semantic model internally. So let's go ahead and we'll just edit this one, a testing one, in this workbench, which I have right here. I'll showcase the kind of thread history that was done. First off, I actually added

22:35the particular dashboard in question. So you can imagine as you're maybe doing a migration from a different tool into Hex, as you're building out these assets for the business, you're going to have maybe top five, top 20, top 50 dashboards that your company uses. How can you start stitching that together and providing some form of metadata to the

22:50LLM? One very quick way to do this is you can just @ a specific project. So if I wanted to recreate this, we'll say @ sales and marketing dashboard, and it will trigger this kind of reporting; for the sake of time I won't rerun this query here. But you can see it's actually picked up the contents. So it

23:06knows all of the different SQL joins that are happening, the tables that were referenced, the different cell types that were utilized in the project that are of interest, and will agentically go through and kind of ask: okay, I'm detecting, you know, three particular tables here. Do you want me to actually build out a semantic model? Here are

23:21some of the recommendations it has. So, it's starting to work with me and plan out the sort of metrics and dimensions that I want to have in my semantic model. And at the end, right, I can say: okay, yes, this looks good, can you go ahead and create it? And it'll actually spit out the Hex representative

23:36format for our semantic modeling that can be tied back into the model. And this is kind of that first approach of a virtuous cycle, something that kind of lives and breathes over time. You can notice it's actually gone and created this. And it's really neat and powerful here because it understands the relationship types. So down here I can

23:55actually see that it's, you know, doing a many-to-one relationship with my accounts. This is actually correct, and I can go ahead and go through a publishing workflow again here where I promote this broadly. I can have reviewers. I have a nice diff. Interestingly enough, I can also preview this, which is really helpful. So I want to look at, um, I don't

24:11know, let's add, like, total employees per account segment type, something like this. I can build out these charts on the fly before I promote them to a productionized state where the LLMs use them and they're accessible to the rest of the users inside of Hex. So from here I can publish this version. Looks good. And now I have this

24:33test semantic model project that I can weave back into my LLM systems. So if I go back into Hex, we'll start this from scratch.

24:42If I click on MotherDuck here, you'll notice we have this toggle: use semantic models only. Now, as you're building this out, you may want to start with just saying, "Hey, I just want you to hit the gold tables in MotherDuck.

24:52Just go and start querying it. Start building out the types of questions and an understanding of the types of needs from our business." As you start curating and building style, you may want to elect into a more on-rails experience. And this is a very simple way you can do this. We can specify: okay, if you are, say, a marketer or someone that is not

25:08necessarily in the weeds with the data, you probably should fall into this bucket of only being allowed analyses in an on-rails experience, where you're only allowed what's curated by the semantic model.

25:20There may be other users, say an embedded analyst or data scientist, that do actually have good familiarity, and they want to be able to have this off-rails experience. So there are different toggles that allow kind of a switchable approach here by Hex admins, dependent on the user persona that is interacting with threads. So let's fire off the same

25:37question. You know, break down our marketing

25:44opps by, let's say, first-touch source this time, something like this. And we'll wait and finish this off. But the big thing to call out here, now that I've toggled this, is it will only be scoped to the semantic modeling that I've built out already. Now, as users are building this out, you're starting to get more adoption. You're building

26:04out more assets. it's getting more curated. You need a way to be able to observe this. What sort of monitoring capabilities do we have? And this is that third pillar that I touched on, the context studio. So, I'll show you how to get there. If we go from our home as an editor, admin of hex, I have a context

26:20studio available to me. And here, right off the bat, I have this dashboard. It's an internal dashboard in our product that allows you to understand and see the usage patterns that you're getting outside of um or getting from the users using all agents. So you'll notice we have our semantic modeling workbench that I touched on. We have our

26:36notebooking agent and we have threads. You can just get high level metrics, conver uh conversation volumes, top users by conversation volumes. And you can see the actual prompts that are being created. So if I wanted to dive in deep and see, okay, did this hallucinate? Is it running? What's going on? I have the ability to always jump

26:51back into any thread that has been created. I can go ahead and actually convert them to projects for further analysis and understand what's going on.

27:00 Now, the future of this (it was just recently released) is providing some form of LLM-as-a-judge approach, where we're actually detecting and segmenting the types of questions your users are asking, trying to build heuristics on user behaviors for which

27:15 patterns we should match as data teams. Also understanding: are users hitting a semantic model or hitting the raw tables? Does data maybe need to be modeled upstream in, say, dbt? Those are all questions we're currently working through with some alpha partners on how we can make this a little bit

27:31 more robust. But in the current state, we have these context sources. One of the initial steps, after you've done the metadata on the warehouse side, is building out a rules file. This is a markdown file; you can edit it live if you want. This is where you can start baking in any sort

27:46 of business logic, business definitions, data quality guardrails, and any SQL styling you may want. This is all curated at a workspace level, and we also have it available at a user level on the notebook agent side. This is very important in practice: we see these files be 400-500 lines long. They can be

28:05 very curated and specific to the business, and the LLMs do a really good job of taking this context and applying it. The next phase, a very simple win, is endorsing your assets. This is ensuring that your specific tables, databases, and columns are accessible to the LLMs. You

28:23 may have some source tables that aren't worthwhile to expose; they'll just add noise to the LLM.

28:30 So you can exclude those. And if I open up this data browser, you can see we have these accounts and campaign members.

28:35 This is all trusted data that has been endorsed by our Hex admins. So, another very easy way to improve LLM accuracy. And then last, we touched on it here, but here are the semantic projects, where in the near future you can imagine we'll have some form of recommendation engine that can say, "Hey, we think these metrics

28:54 should be modeled; the LLM is doing some kind of unstructured or off-rails work here. This is something you may want to look into." So again, that virtuous-cycle approach, where you're building and curating LLM context over time.
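The workspace rules file described earlier is plain markdown. A short illustrative sketch; every definition and rule below is invented for the example, not taken from the demo:

```markdown
# Analytics rules

## Business definitions
- "Active account": an account with at least one order in the last 90 days.
- The fiscal year starts February 1.

## Data quality guardrails
- Exclude rows where `accounts.is_test = true` from all metrics.
- Never report revenue from unclosed opportunities.

## SQL style
- Always qualify columns with table aliases.
- Use `snake_case` CTE names; one CTE per logical step.
```

In practice these files grow with the business; the speakers note that hundreds of lines of curated rules are common and that the models apply them well.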

29:07 Okay, that's all I had for the demo. Are there any questions? I'd love to hear from you all.

29:17 >> Hey there. Thanks, Armen. Yeah, we have a couple of questions. Let me hide this screen here. We'll go through them now, and if anyone else has questions, pop them in the chat and we will get to them.

29:31 >> We'll take questions in the chat. Yeah, absolutely. >> Yep. Yep. Where was that one I saw?

29:36 I think we'll start off with, and I think you showed this a little earlier, Jacob, but just for people who maybe came in a little later: how are the column descriptions generated and then sent to MotherDuck?

29:49 So this is both a MotherDuck and a Hex question, because I think we can both do this. You can absolutely do it in a Hex notebook with all the context and load it back in, and the descriptions are absolutely written back to your database. So the database is

30:08 the source of truth for those descriptions; that's what the COMMENT ON statement does. From my perspective, what I did is an initial pass with AI, and then I just hand-curated it from there. The descriptions were pretty good, though they can tend to hallucinate at some point. I did a

30:27 little bit of tuning on my prompt as well. The first version I wrote was not as good: it was giving me statistical information in the column descriptions, which doesn't make a lot of sense. So I said, "Hey, let's talk about business purpose in this description. Let's not talk

30:40 about stats, because those are in the database. We know what they are, and they change over time." So yeah, that was my take on it. I don't know, Armen, if you wanted to add anything else to that.
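The initial pass described here can be done entirely in SQL with MotherDuck's `prompt()` function and DuckDB's `COMMENT ON`. A minimal sketch, assuming a hypothetical `accounts` table; the prompt wording, column names, and final description are all illustrative:

```sql
-- 1. Build a small sample of values and ask the model for a
--    business-purpose description (prompt() runs in MotherDuck).
SELECT prompt(
         'Write a one-sentence business description of the column '
         || '"segment" in an "accounts" table. Focus on business '
         || 'purpose, not statistics. Sample values: '
         || string_agg(segment, ', ')
       ) AS suggested_description
FROM (SELECT segment FROM accounts USING SAMPLE 50);

-- 2. Hand-review the suggestion, then write it back so the
--    database stays the source of truth for metadata.
COMMENT ON COLUMN accounts.segment IS
  'Market segment the account belongs to (e.g. SMB, mid-market, enterprise).';
```

The hand-review step is the point: as the speakers say, the AI draft is a starting point that you curate, not something you apply blindly.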

30:50 >> Yeah, I second that. I think it's really simple to do, and it's something I see a lot of users almost glance over, or maybe they're just not aware of it. You can vibe-code this. You could do it in any IDE you want. You can do it in MotherDuck. You

31:04 can do it in Hex. It is so easy to build out these descriptions. It took me maybe five minutes to build out, I don't know, maybe a hundred different comments, just using the notebook agent, and frankly there are other tools that could do it too. But if you are not doing this, why are you trying to

31:19 build out any other context without doing this first? >> Yeah, one thing I'll add is that OLAP databases generally don't perform great with the primary-key / foreign-key relations you'd typically use in an OLTP database, so you can get around that by adding a comment that says, "Hey, this is the join key for this table," as an

31:39 example. That is helpful when you can't inspect the relationship between tables and say, "Oh, I join this on this ID." Getting AI to write joins, or rather the correct joins, is actually one of the harder problems to solve, so being able to give a

32:00 column description that says, "Hey, here's the join key," is very, very helpful.
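One way to encode the join-key hint mentioned here, assuming hypothetical `accounts` and `campaign_members` tables:

```sql
-- OLAP engines rarely declare or enforce PK/FK constraints, so
-- state the relationship where the LLM will see it: in the schema.
COMMENT ON COLUMN campaign_members.account_id IS
  'Join key: matches accounts.account_id (many campaign members per account).';
```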

32:04 >> Yep. And on that, the joins point is a great call-out. That's where I think semantic modeling can really help, because you're baking in these relationships over time, and you may not know exactly which user questions are going to require which joins if you have, say, a hundred different tables. So you may

32:20 think, "Well, it sounds great to have this mega-monolithic semantic model. Where do I start?" and just throw everything at it. I think that's a fool's errand. Things are going to change, your data is going to change, your user requirements are going to change. You're better off just adding that kind of metadata,

32:35 providing rules files on which key tables you want to join initially, and then iterating on that over time with that context engine, Context Studio idea, where you're reviewing the questions users are asking and the SQL patterns the LLMs are producing.

32:52 >> That's absolutely right. Absolutely right. >> Cool. Thank you. There's a question that came in early in the Hex demo, about reproducibility and related projects: is it something they can embed in their own apps?

33:08 >> Yeah. So we just recently released an MCP server. The MCP server offers a

33:15 few tool calls, the main one being a threads tool call, which is that conversational analytics piece you can bring in if you have an MCP client. It can also tap into project search, and it will link back to projects. So you can have a maybe white-labeled or very

33:32 custom experience that still mimics Threads' patterns. The other thing in terms of embedding is that you can embed Hex apps as well. So if you wanted something entirely white-label embedded, not necessarily a chat experience but from a dashboarding or BI-embedding perspective, we offer that too.

33:52 >> Good question. Honestly, it's a very hot topic right now. >> Yeah. Another question from David: does the chat-with-your-data experience support adding attachments, and how?

34:05 >> Yeah. So currently with Hex you can add CSV attachments. Some things we're thinking about are acting as an MCP client here, where you can point to, say, a Notion doc, look at Linear tickets, or bring in Google Docs. Today, like I said, it supports CSV uploads. So if you

34:22 have external data sets you want to bring in, for whatever reason, you can, and the agent can reason over and join that with your existing data. Soon to come, and it's definitely a hot topic we're seeing a lot of customers ask for, is bringing in maybe even

34:36 unstructured data: how do you work with PDFs, how do you work with images? So it's ever-evolving, and we're very keen on bringing that as soon as possible.

34:47 >> Okay, what's the best practice for semantic layer projects? They're looking to answer how they should think about what each of the projects represents.

34:56 >> Yeah, that's a great question. I think a lot of folks are pondering and also struggling with this. The biggest pattern I have seen work well, from our joint customers or just from talking to folks in the industry, is breaking down semantic models by business department. So you'll have a sales one, a

35:15 marketing one, an operations one. It can be a little bit hit or miss.

35:20 You definitely want to practice this over time. I think a fine rule of thumb is maybe up to six to ten tables in one semantic model project at most, or, like I showcased, three different YAML files. I really wouldn't push it

35:37 beyond that. Maybe six is the sweet spot, six to ten. And I would break those down by business unit to start.
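Semantic model definitions in this style are typically YAML files. The exact schema depends on the tool, so treat the sketch below as an illustrative shape (table, measure, and dimension names are invented), not Hex's actual format:

```yaml
# sales.yml -- one semantic model per business unit, ~6-10 tables
model: sales
tables:
  - name: accounts
    primary_key: account_id
  - name: orders
    joins:
      - to: accounts
        on: orders.account_id = accounts.account_id
measures:
  - name: revenue
    sql: sum(orders.amount)
  - name: customer_count
    sql: count(distinct orders.account_id)
dimensions:
  - name: segment
    sql: accounts.segment
```

Defining metrics like `revenue` with exact SQL expressions is what makes the LLM's generated queries deterministic and repeatable.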

35:45 >> Yeah, I would echo that. There's actually a lot of prior art here, in the form of books like The Data Warehouse Toolkit and so on from the '90s. Those are very relevant here: they talk a lot about data marts, and I have

36:01 found marts to be a very effective mental model to transition into thinking about semantic layers as well.

36:12 >> Awesome. Let me see if there are any more questions in here. There's one, Jacob, that you answered earlier around, let's see, what was it? Writing custom functions for DuckDB in Rust. I think you shared... >> Oh yeah, sure. Definitely take a look at Query Farm if you are

36:36 curious about C++ and Rust extensions. There are a lot of good example extensions showing how to tie those things together. If you've got a specific type of data structure or query you're looking at, the extensibility of DuckDB is super powerful for tying those things together.

36:56 >> I have a question, maybe for you, Jacob. >> Yeah, sure. >> All these LLMs, and admittedly I haven't seen too much concern about this yet, though I think it will show sooner rather than later: they're naturally generating a bunch of SQL queries under the hood. Even when they're

37:12 using some SQL API to query the semantic model, it always compiles down to SQL, and we have this vast majority of net-new users now "writing" SQL that the agents generate for them. Why do you think MotherDuck is best suited for that approach from a cost

37:28 perspective? >> Yeah, I think the cost model is really interesting. One thing MotherDuck can do is provision you tens or hundreds of very small instances you can run queries against, and they're very fast. Which is counterintuitive, because you think "small" and say, "Oh, that can't be fast." But the reality

37:48 is it's actually pretty scalable at that level. And so that lets us charge you, you know, pennies per

37:59 hour, and you can just have users hammering it. More importantly, users who are maybe non-technical, interacting with an MCP, whether it's the Hex MCP or the MotherDuck MCP, or maybe just using Claude or ChatGPT or whatever, and just getting SQL out of it. All of

38:18 those things can happen in a very safe way, and there are two reasons. First, I think because MotherDuck's architecture is more constrained on the compute side, you're not going to blow up your cost by doing that. And the second thing I would add is a thought that had

38:35 just escaped me: it's trivial to scale out read-only very quickly, both locally and in the cloud with MotherDuck, because it's backed by DuckDB. That read-only notion lets you safely share a data set.

38:49 Obviously you can also do this with roles and grants; it's not totally groundbreaking. But if you combine that with the fact that these queries are so cheap and cost-effective to run, I expect we'll see a 10x or 100x increase in the

39:07 number of queries written. So how do we do that in a way that lets our users get to the answer quickly, iterating quickly and correctly? We want both of those things, and I think our model is very well suited to that. Good question. Thanks.

39:25 >> Makes sense. >> Awesome. We've got a few more minutes left, so if anyone else has any questions, send them in; otherwise we can end a little early.

39:35 Again, we're recording this, so we'll send out a link afterwards, or you can find the recording on our website.

39:43 Last call for questions.

39:56 All right, sounds like that's all for now. Well, thanks, everyone. Thanks for showing up, listening, and for all the great questions. Everyone have a great rest of your day. Thank you.

40:06 >> All right, thanks. Bye, Jerel. Bye. Bye, everyone. Bye, Jacob.

FAQs

How do Hex and MotherDuck work together for AI-powered analytics?

Hex connects natively to MotherDuck as a data source and uses AI agents to generate SQL queries, build charts, and create full notebook-based analyses from natural language questions. MotherDuck provides the fast, serverless compute backend. Hex adds a semantic modeling layer, a conversational chat interface for business users, and a notebook agent for data teams. Together, they let business users ask questions in plain language while the system generates accurate, governed analytics backed by MotherDuck data.

Why are column descriptions and metadata important for AI-generated SQL?

Adding column descriptions and table comments to your MotherDuck database dramatically improves the accuracy of AI-generated SQL queries. Without metadata, LLMs have to guess what columns like "dead" in a Hacker News dataset mean. With a description, the model knows it refers to content that has been withdrawn or disabled. You can generate initial descriptions using AI (via MotherDuck's built-in prompt() function), then hand-curate them. This is the single easiest step to improve LLM accuracy and should be done before investing in more complex semantic modeling.

What is a semantic model in Hex and how does it improve AI analytics?

A semantic model in Hex defines trusted business metrics, dimensions, relationships between tables, and calculated measures in a structured format. It acts as a governed metadata layer that LLMs use to generate more deterministic and repeatable SQL queries. For example, you can define "revenue" or "customer count" with exact SQL expressions, specify table join relationships, and build these models iteratively using Hex's AI-assisted workbench. Semantic models can be scoped by business department and typically work best with 6-10 tables per project.

How does MotherDuck's read scaling work with tools like Hex?

MotherDuck's read scaling gives each connected user their own isolated compute instance (called a "duckling"). When multiple users access a Hex dashboard or notebook at the same time, each gets their own MotherDuck instance rather than sharing resources. Admins can configure fleet size (up to 16+ ducklings), choose instance sizes, and manage authentication tokens. This prevents users from stepping on each other's queries and keeps performance predictable for interactive analytics.
