YouTube

4 Lightning Talks on Practical AI Workflows from Notion, 1Password, MotherDuck & Evidence

2025/04/10

How Data Teams Are Using AI to Transform Their Workflows

Four data professionals from MotherDuck, Notion, 1Password, and Evidence shared practical approaches to integrating AI into their daily workflows, demonstrating how artificial intelligence is reshaping the modern data stack.

Using Cursor IDE for Rapid BI Development

Archie from Evidence demonstrated how Cursor, an AI-powered IDE built on VS Code, dramatically accelerates the development of data applications. Unlike traditional chat-based interfaces, Cursor provides comprehensive context about your entire codebase, enabling more accurate code generation.

The key advantages include:

  • Automatic awareness of all files and dependencies in your project
  • Integration with documentation (like Evidence docs) for enhanced context
  • Real-time code generation with diff-style visualization
  • Natural language commands for complex tasks

During the demonstration, Cursor successfully generated a complete deep-dive analytics page with multiple components on the first attempt, showcasing its ability to understand both the codebase structure and the specific requirements of BI tools.

Enriching CRM Data with LLMs in Snowflake

Nate from 1Password tackled a common go-to-market challenge: incomplete CRM data. Historically, teams kept Salesforce records current by hand, with each team member updating 20 accounts every morning, a time-consuming and error-prone process.

Using Snowflake's LLM integration, Nate developed an automated approach to classify companies by industry:

Key Implementation Details:

  • Model Selection: Llama models provided the best results for industry classification
  • Prescriptive Boundaries: Defining 10-15 specific industries rather than letting the LLM choose freely
  • Prompt Engineering: Including industry descriptions and definitions for accuracy
  • Data Enrichment: Passing company names, domains, and notes to provide context

The solution achieved over 90% accuracy in returning single-word industry classifications, dramatically reducing manual data entry while improving data quality for territory planning and lead routing.
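
The core of the approach fits in a single SQL statement. The sketch below is illustrative rather than Nate's exact query: the table and column names (`crm_accounts`, `account_name`, `account_domain`, `account_notes`) are hypothetical, and the model and industry list should be chosen through your own evaluation; only the `SNOWFLAKE.CORTEX.COMPLETE` function is the real Snowflake Cortex API.

```sql
-- Illustrative sketch of LLM-based industry classification in Snowflake Cortex.
-- Table and column names are hypothetical; swap in your own CRM fields.
SELECT
    account_name,
    TRIM(SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-70b',  -- Llama models worked best in Nate's testing; evaluate several
        'Classify this company into exactly one of these industries: '
        || 'Technology, Manufacturing, Construction Materials, Hospitality, Healthcare. '
        || 'Definitions: Construction Materials means producers and distributors of building products; '
        || 'Hospitality means hotels, restaurants, and travel services. '
        || 'Respond with the industry name only, no explanation. '
        || 'Company name: ' || account_name
        || '. Domain: ' || COALESCE(account_domain, 'unknown')
        || '. Notes: ' || COALESCE(account_notes, 'none')
    )) AS predicted_industry
FROM crm_accounts;
```

In practice the full prompt would enumerate all 10-15 industries with their definitions, and a small post-processing step cleans up the handful of responses that still come back as sentences rather than a single label.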

Automating Data Catalog Documentation at Notion

Evelyn from Notion addressed the perpetual challenge of maintaining data catalog documentation. Despite significant investments in data catalog tools, many organizations struggle with incomplete metadata, rendering these tools less effective.

The Documentation Generation Process:

  1. Context Gathering: Providing SQL definitions, upstream schemas, data types, and internal documentation
  2. Lineage Awareness: Using generated upstream descriptions to ensure consistency across tables
  3. Human Review: All AI-generated descriptions undergo review before publication
  4. Feedback Loop: Table owners can suggest improvements that the LLM incorporates

The system successfully generates table descriptions, column definitions, and example queries, though human oversight remains crucial—especially for nuanced details like date partitioning that could lead to expensive query mistakes.
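
To make that last caveat concrete, here is the shape of the failure mode described in the talk: a generated example query that looks fine but ignores the table's date partitioning. The table and column names below are illustrative, not Notion's actual schema.

```sql
-- Looks reasonable, but a cohort query like this scans every date partition of the table.
SELECT workspace_id, COUNT(DISTINCT user_id) AS active_members
FROM demo_user
GROUP BY workspace_id;

-- The reviewed version filters on the partition column so only the needed partitions are read.
SELECT workspace_id, COUNT(DISTINCT user_id) AS active_members
FROM demo_user
WHERE ds BETWEEN DATE '2025-03-01' AND DATE '2025-03-31'  -- 'ds' is a hypothetical partition column
GROUP BY workspace_id;
```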

Streamlining Data Pipeline Development with MCP

Mehdi from MotherDuck showcased how the Model Context Protocol (MCP) revolutionizes data pipeline development. Traditional data engineering involves slow feedback loops between writing, testing, and debugging code against actual data sources.

MCP enables LLMs to:

  • Execute queries directly against data sources
  • Validate schemas and data types in real-time
  • Generate and test DBT models automatically
  • Provision data directly in cloud warehouses

The demonstration showed an LLM independently:

  1. Querying S3 files to understand data structure
  2. Handling errors (like type mismatches) through iterative testing
  3. Creating validated DBT staging models
  4. Loading processed data into MotherDuck

This approach significantly reduces the traditional back-and-forth between code generation and testing, though it requires guidance to follow senior engineering best practices rather than brute-force solutions.
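
For a sense of what the generated staging model might look like, here is a rough sketch in the style of a dbt model running on DuckDB/MotherDuck, including the kind of explicit cast that resolved the type-mismatch error from the demo. The S3 path and column names are assumptions, not the ones used in the talk.

```sql
-- models/staging/stg_events.sql  (hypothetical dbt model; path and columns are illustrative)
-- DuckDB and MotherDuck can read Parquet straight from S3 with read_parquet().
with source as (

    select *
    from read_parquet('s3://example-bucket/raw/events/*.parquet')

),

renamed as (

    select
        id                            as event_id,
        cast("year" as integer)       as event_year,   -- the raw files store year as a string
        cast(score as bigint)         as score
    from source

)

select * from renamed
```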

Common Challenges and Best Practices

Trust and Validation

All panelists emphasized the importance of skepticism when reviewing AI-generated outputs. Results often appear reasonable but may contain subtle errors that only domain expertise can catch. The recommendation: always implement human review processes, especially for production systems.

Model Selection Matters

Different models excel at different tasks. GPT-4 may be the better fit for drafting a product specification, while Claude often performs better for code implementation. Mistral's Codestral model specifically targets code generation without unnecessary markdown explanations. Teams should evaluate multiple models for their specific use cases.

Shifting Skill Requirements

AI tools are changing how data professionals spend their time:

  • Less time writing boilerplate code: AI handles routine coding tasks
  • More time reviewing and validating: Engineers become code reviewers rather than writers
  • Focus on patterns and architecture: Understanding the "why" becomes more important than the "how"
  • Reduced interruptions: Fewer requests to senior engineers for basic questions

The Junior Engineer Challenge

A surprising challenge emerged around supporting junior team members. As senior engineers become more self-sufficient with AI tools, they may inadvertently provide less mentorship. Teams need to actively ensure junior members receive adequate support and aren't just relying on AI without understanding fundamentals.

Key Takeaways for Implementation

  1. Start with Sandboxed Environments: Test AI workflows in controlled settings before production deployment
  2. Provide Rich Context: The quality of AI outputs directly correlates with the metadata and context provided
  3. Maintain Human Oversight: AI accelerates workflows but doesn't replace the need for expert validation
  4. Document AI Boundaries: Clearly define what AI should and shouldn't do in your workflows
  5. Iterate on Prompts: Invest time in crafting effective prompts rather than accepting first results

The consensus among panelists: AI tools are transforming data workflows by eliminating routine tasks and accelerating development cycles. However, success requires thoughtful implementation, continuous validation, and a clear understanding that these tools augment rather than replace human expertise. As data teams adopt these technologies, the focus shifts from manual execution to strategic thinking and quality assurance—ultimately enabling teams to deliver more value in less time.

0:00All right, let's get started here. I'm going to do So, what we'll do is I'm going to share a quick couple slides here on Mother Duck and then we will um uh we will hop into some lightning talks and then we'll have a panel discussion. So, I'm really really excited to see uh to see the crew here.

0:19I'm going to do a quick screen share um about what Mother Duck is and how we think about ourselves here uh at Mother Duck. and uh we will do an intro for our

0:32panelists. So um let's get into it. Thanks everybody. All right, cool. Um so quick intro about what we're up to at Motherduck.

0:45Motherduck is a uh cloud data warehouse built for um the post big data era. And

0:53so I want to quickly talk about what that means. Um well, as it turns out, when a lot of the things that are currently used, that you've all used and know and love, like BigQuery for example, were built, uh in 2006, the top laptop had one core and 2 gigs of RAM. Uh today on a MacBook Pro, you can get 40 cores and

1:12 36 gigs of RAM. And of course our EC2 instances are even bigger, like 448 cores and 24 TB of RAM. So um what

1:23this means is that now we have a lot of power available on one machine and the

1:30amount of data that we thought we were going to see uh ended up being much smaller for analytical data sets than than what uh uh uh the projections looked like in about 2006. And so we've built a data warehouse built on top of DuckDB that has uh uh built in that new

1:49paradigm. Um there's a couple of proof points here, right? Um there's some research. Um but really what we're thinking about is building data warehousing and analytics for 99% of businesses. Um it's serverless. It accelerates business value. It's built on DuckDB which is an amazing analytics research-driven uh analytics library.

2:10So, with that said, let's do some quick intros. Um, I am Jacob Matson and I am

2:16the uh or one of the dev advocates here uh at MotherDuck along with Mehdi who is here too. Mehdi, do you want to say hi and quickly introduce yourself?

2:30Yes. Uh yes, I'm also a data engineer and developer advocate at MotherDuck and I've been doing data engineering for almost, well, more than a decade actually now, time flies.

2:41That's funny. Thanks. Uh Evelyn, you want to do a quick intro? Yeah. Hi, I'm Evelyn. I'm a data engineer at Notion.

2:51Awesome. Thanks, Evelyn. Um Archie, you want to quickly quickly run through here? Hi. Um yeah, I um run the go to

2:59market team at Evidence. Um but before that, I um ran the BI team at a company called Patch in uh in London.

3:08Excellent. Thank you, Archie. And last, we have Nate here on on the panel. Hey, yeah, I'm Nate. I work at 1Password.

3:14Just joined about a month ago to manage the revenue operations analytics squad. So, we call ourselves the insights team.

3:20Previously was at Smartsheet for several years running the analytics engineering team and other pieces there.

3:24So, been all over the data stack. Cool. Thank you, Nate. Really appreciate it. Um, all right. So, with that being said, we're going to hop into the first part of our session today, which is lightning talks. And so, uh, we're going to really talk about how we find AI to be useful today. And, uh, what this means for kind

3:44of us as we think about different personas. Um, and I think up first we have Archie.

3:53Thanks, Jacob. Uh yeah, really really excited to be here today. Um I'm actually personally uh you know uh the MotherDuck team reached out to us to say can you do a lightning talk and uh I you know said of course but personally actually really excited to hear some of the other ones um uh that are coming up as well. I think I think they sound

4:11really interesting. So um today I'm going to talk about um how I use Cursor um to improve like my speed of iteration uh when I'm working on on data projects and in particular on on data apps.

4:27Um it's been a pretty transformational tool for me. I've probably been I guess I downloaded it uh sometime in 2024 having you know until then been primarily using you know uh like hosted chat uh as my main interface with LLMs or you know like ChatGPT or Anthropic's um chat-based uh chat-based interfaces um

4:51and I've been I guess like continually surprised uh by how how good this is.

4:57So, uh, this is going to be a screen share. This is cursor. Um, if you're familiar with VS Code, it's going to feel very similar. It's actually, um, you know, built on top of, uh, on top of VS Code. Um, and the the kind of the difference here is that you have um you just have more uh the uh you just have more context

5:18that you're able to give uh the LLMs than if you're using like a chat based uh a chatbased solution where you kind of have to every time you start a chat you have to say you know this is the code I'm working on and you know this this is what I've got from it. um quick intro to like I guess what evidence is

5:34and the reason for that is I'm going to talk about how I use uh uh cursor with evidence but cursor is honestly a great tool for um uh for writing uh any kind of code. Um so evidence is a uh BI tool that allows you to version control your uh your reports. Um so you write kind of

5:54markdown uh with these special components um and then you get uh you get sort of BI based output. So you know what I actually need to share my whole screen so you can see my uh browser as well. Um here we go. Okay. So here I

6:11have got um this page I go home uh and

6:16this markdown is is producing this uh this page here. So I've got you know like a title um this like details component here um which is name populated here. Anyway I'm not really here to talk about Evidence. I'm only here to talk about Cursor. So let me let me do that. So the thing that I think um

6:34makes this really unique is that it already knows about everything in your codebase. So I can do something. I'm going to um pop open the the chat window. I'm going to create a new one uh and say okay uh I want to create a deep

6:52dive page uh about orders and uh you know what I'm going to change that and because I forgot to include the docs. So the other thing that's kind of nice here is that you can include like extra context and one of the things I've done here is I've loaded the Evidence docs uh into um into Cursor and you can do that

7:20really easily just you just point the URL uh where your docs are hosted and it indexes the the website for you. Um so I just added that clear and revert. Okay, so now it's going to have another go.

7:35Um, and if I just make this a little bigger for a

7:42bit, so it starts by like finding uh

7:46finding what's in your codebase and it's kind of going to do like a different mission every time. Um, in this case, it's generated me a new a new page here. Um, and it's created some components. So this is the page that it has generated and um it also kind of uses this sort of like diff style

8:04um visualization to help you understand like what it's changed and you you know you can have a test of it before you. So let's go see like if the page um that was generated is working or deep dive. I mean okay so it generated all of this code to make this page uh on its own um

8:24and it worked first time which I was actually kind of surprised about. I was hoping that it would throw some errors so that I could then uh use uh some of the tools to show how um fixing errors, but I guess um I can do some other stuff here. So, it's like um in natural language um okay, so this chart's kind

8:42of screwed up um because I can't see any of the um any of the item labels on the state names. So, let me see if I can correct that. Um, I can't see any of the

8:57items of uh on the state

9:05axis. Uh, can we put it on the

9:11uh y-axis instead?

9:20So, it's had a go there. Nope. Not done that very well. Let's go have a look at what it's done.

9:27So, I'm going to accept where it is at the moment. So, I can then see what it's changing. Sales sales.

9:45Check your spelling on sales. Oh yeah. Yeah. Yeah. Okay. Um. Oh yeah. Okay. Do

9:55evidence. Uh

10:02syax orientation.

10:17What? Well, this doesn't want to time. I actually have to know what the right syntax is for. But that's what it is.

10:24Swap

10:32XY. It's bad.

10:40Okay. So, I managed to fix this myself. There you go. All right. Well, um so that's kind of uh the main interface of Cursor. So, you're using this um this kind of chat interface. There's also this command-K interface where you go um here and you go add uh isol to

10:58this chart. Awesome. Um, and then it's just generated that for me. So, those are kind of the the two main things. All right.

11:12Thank you. I think I'm out of time. Um,

11:17that was uh that was really cool. Thanks for thanks for showing us kind of how uh how you can use a AI assistant kind of within within a uh coding workflow there as it relates to BI. That's really cool to see. Uh, all right, Nate, we have you up next here on what you're up to.

11:35Sounds great. Yeah, I love seeing that, Archie, the Cursor stuff. I I right now I'm going to ChatGPT over here and doing other pieces. So, it's inspiring me to go take a look at that I think a little bit later. So, that's really cool to see. Uh, my use case. So, I have worked for many years on go to market teams

11:50both on the go to market side as a customer success rep and now I'm in the analytics side. So, on go to market, it's a unique set of analytics problems.

11:59A lot of times CRM hygiene like Salesforce records is a big deal for those go to market squads. The data is hard to get. There's a lot of tools people buy and and integrate in to try to get I don't know headquarters information, uh employee counts, revenue, like all the kind of typical ways you'd cut an account for territory

12:19creation, for lead routing, all that kind of all that kind of stuff. So, historically that's been really manual with a clean. My first time at Smartsheet several years ago, we just all split it up and said, "Hey, everyone, take 20 accounts at a time. Every morning when you come in, drink your coffee, update 20 records, and we'll

12:35keep our records updated that way." Was was quite literally the way we did it.

12:39Um, recently with LLMs, now I had a chance to experiment with updating records using logic with an LLM. And so, this is a use case coming from an actual like real life. I got to use it within Snowflake. So, I was actually inspired by Jacob, who tweeted out showing uh in MotherDuck and DuckDB that you could use LLMs

12:59in SQL and I said, "Wait, Snowflake has this, I think." And I had access to it.

13:03So, I went over there and it was a chance to so I'll give kind of the like the world of how you do it. And I have a couple I'll actually share kind of the end queries I put together. Um, this is a preview; I'll go more in depth on it at Data Council here in a couple

13:17weeks down in Oakland. So, this is kind of a preview of a talk there.

13:21So within CRMs, industry is the example that that I got to actually apply this to. It was just, you know, where where should we go in the market? What industry should we tackle? Where do we sell well? That was kind of the general question. And looking at the data, if you bar charted it out, the nulls were 90% of the accounts and

13:40there was a bunch of I don't know 50 different industries with little bits of it. So really not usable. It couldn't really make a analysis for anybody to go use.

13:49So, interestingly enough, Snowflake has the ability to choose all sorts of LLMs to go and leverage to fill in that data.

13:58So, I had to experiment with a bunch of them and and come up with some of the better ones to use. A Llama model in particular was the one that seemed to have the best results. Um, in terms of a workflow, I'll kind of map out. It's not like you go in say, "Hey, here's a bunch of accounts that I have in my CRM.

14:12Return industries." Like, it just doesn't work as as easily as you'd want it there. Um the first piece of advice, and this is typical within the sales world, is you want to first define what industries you want; with the LLM you need to be pretty descriptive to it, prescriptive. So it's not like say hey return industries; it will choose, it will

14:32choose an industry but it won't be necessarily the ones that are usable for your team. So having a mind towards making it giving it bars to fit industries inside was crucial. there's 10 or 15 or so that I said, "Hey, let's go with these 10 or 15 industries would be a great chance to to start." Um, and

14:49so that gave it kind of boundaries to to stay within. um when I first attempted to then say I the query literally is you you concatenate in whatever you want in your request the the natural language and then you pass in I don't know the name of the account the domain of the account you pass in maybe there's some

15:09notes about the account that that would be relevant um things like that that would help the LLM go and find information about it somewhere on the internet so when you do that initially even with a limited set of industries it will return a sentence back to you. I'm sure all of you have used LLMs and and they come back with a five paragraph

15:26response and in that it'll say construction, like, I found an industry for it; interestingly enough, it's like impossible to to parse it out.

15:34So there was a bit of prompt engineering required and experimentation to have it return one word which was great.

15:42Eventually got down to like one word or phrase for over 90% of them which is totally fine to clean up a few stragglers. The problem was it wasn't as accurate as I wanted. So, uh, what's really nice is it was really quick to iterate and and I could really quickly say, okay, the the key at the end was to

15:59bring in the industry description and say for hospitality, this is what I mean, for construction materials, this is what I mean. And once I did that, the coin dropped in the slot and all sorts of account like the accounts came in like accurately and how I wanted it to to be. Um, it was one

16:17word and like the accuracy was pretty dang good. So I'll show kind of the the actual query that I wrote uh as a preview as well as a chance to see kind of the results of it. I did a test data set from I live a little north of Seattle so of some different companies up around here that that's sharable and

16:33it was essentially the same accuracy as I saw when doing it in reality. So this is the query here. Um, again, it's a little small for for some screens here or if you're on a phone, but I concatenated in like here's the list of industries, like here's the definition of each, and then I just passed in company name for the test here. But you

16:53can pass in all sorts of CRM fields. Really valuable to be able to do that.

16:57And it returned back the company name and the industry from the LLM over here.

17:01Um, and for accuracy sake, I went and checked through, hey, like how accurate are these? So this is an example of the the 25 or 30 that I have here. And this is essentially the same as I saw using um a couple thousand accounts in in a different scenario in the for a company.

17:16Uh and you can see there's some that are wrong like Achilles USA was not technology was manufacturing industrial.

17:21So you can go through and find those. Some of them are like close. Uh but many of them were correct. And these are these companies in particular are just small ones up here um north of Seattle.

17:30So there's not really a lot of data online for them. So, um, it gets more and more accurate I found the bigger you get because there's just more information on the internet. But that's just a quick example of being able to use an LLM. There's a lot of ways I want to apply it. Uh, a lot of different ways

17:43to do it. So, I'm exploring that. Um, but I'm really excited to the CRM enrichment side of things has always been super manual and so it's exciting to see this this use case. Awesome.

17:53Thank you, Nate. That was uh, perfect and right on time. So much appreciated. Um, obviously, um, I linked a few things, uh, in the chat there, including an article that Nate did about this. And again, I will repeat, he will be at Data Council doing a longer in-depth version of this, kind of really getting into the weeds on it. Highly encourage you to,

18:13uh, make it out there if you are, uh, in the Bay Area. Um, all right. So, next up

18:19here, uh, we have Evelyn. Are you ready to go? Yeah, I'm ready. Okay, let's do it.

18:27Okay. So, um yeah, today I'll be talking about leveraging LLMs for your data catalog.

18:34Um we do this currently at Notion. Uh and I want to show you how. So, um you're probably familiar with this problem. You've onboarded your data catalog. You spent a lot of time setting it up. Uh maybe you've entered like some uh very engaged negotiations with uh your third party vendor uh and you have a great contract, but uh when you

19:04actually come to spin up your data catalog and users are using it, uh they don't find a lot of value from it. And one of the main reasons is because a lot of the metadata isn't filled out. So this may be familiar to you where you have no table descriptions. You can see here "no documentation yet," and your

19:25column descriptions are all missing. And then we have a very sad duck there uh because uh stakeholders can't self-serve their data needs. And this is kind of like a like a feedback loop. So stakeholders can't self-serve the data needs and then they go to the analyst or data team to try and uh you know create

19:44the reports and then the data scientist or the data analyst or whoever owns that data set uh doesn't have time to fill in the missing descriptions because they're busy with stakeholder requests. So uh basically the key takeaway here is that a catalog is only as useful as metadata and LLMs can really automate that uh kind of tedious process out that

20:07no one wants to do. Uh so in terms of the description generation process itself, this is quite a simplified diagram of course. Uh but the key thing here is that when you're uh supplying the metadata, you need to provide as much context as possible. So at Notion we provide the SQL, we provide uh the upstream kind of definitions that we

20:28call the JSON schema. Uh we actually ingest internal notion docs uh into the LLM as well. Uh we give data types and another big thing here is that uh we actually use generated upstream descriptions and we make sure to ingest that in uh to the requirements prompt.

20:51uh and that way uh you don't you kind of reduce the chance that uh one table has uh a given description for a column and then a downstream table uses the same column. It's just a select star type of thing uh and it should have the same description but uh it's changed and uh providing that kind of metadata of the

21:11lineage uh is super important to prevent that. And then the final thing here is to review feedback. And this is quite important because uh the worst thing you can do is have a description that is incorrect and stakeholders uh use that uh description for their queries and it turns out they're using it incorrectly.

21:29Uh so before we sync any descriptions, they go in for human review and we have an automated process for that to tag the table owner. Uh and they can make changes or suggestions to the LLM and then the LLM can regenerate based on their feedback.

21:44uh and incorporate that uh into its memory. Um so yeah, that's a very quick summary of how it works. There's a lot of details there, but you can uh view the blog uh which I'll link uh at the end of this chat uh if you want more details. So here are some real world examples. Uh you can see on the left

22:03here is a column description. Uh is first active space member is quite a uh

22:11cryptic kind of column name. Uh it doesn't mean like the date that this member was first active, but it's a boolean. What does this mean? And you can see the LLM actually parsed it out. It's a boolean flag indicating if the workspace is the first one where the member had an active event based on cross-workspace activity.

22:29So, it's basically uh is this workspace uh the first workspace that had any activity? And that clears things up uh great and uh is actually correct. Uh you can see here in the middle here, it's a table description. Uh it's an excerpt from one of our biggest tables, but you can see here it provides a lot of uh

22:47useful and correct examples of how you would want to use this table. And finally, as a counter example, unfortunately in this uh generated description uh for an example query I provided, uh it looks good uh on the surface. It does some cohort analysis and the SQL looks good. But uh if you actually know about the table demo user, it's actually date

23:10partitioned. So this will actually scan all date partitions and uh it will be a very expensive query. Uh so that's kind of why we need the human review process to kind of prevent stuff like this. So yeah, that's a very quick summary. Uh but uh if you want to check out more details, scan this QR code on

23:28the left and we have a blog post about it. Uh we're also hiring. If you're interested in an infrastructure position on our team, uh you can scan this one.

23:38And then if you want to connect on LinkedIn as kind of the last one here.

23:41So yeah, thanks for listening. Awesome. Thank you, Evelyn. That's really cool. I I love that. Um, you know, you laid out an overview of the process there.

23:51You know, I think I think a lot of this is is uh you know, no silver bullets, right? Like uh we still need the humans to do the things that only humans can do. Um, and I think that's you know, some of the promise of of how we can leverage AI. Really cool. Um, do you want to drop the link in the chat as

24:09well, Evelyn? That would be really cool. Thank you. Um, all right, Mehdi, are you ready to go? So, yeah, I wanted to talk about, uh, specifically developing data pipelines for data engineers. Um, so, uh,

24:25the challenge basically is that, uh, the feedback loop for building data pipelines, uh, can be actually pretty slow. This is a reference from the great book. If you haven't read it as a data engineer, I would recommend it, from Joe Reis and Matt Housley, Fundamentals of Data Engineering. But basically, as you can see, the data engineering life cycle,

24:46you have these boxes here, uh, the black boxes, and you have three steps. Um, and those steps can be really slow and complicated because those can be different tools and also you're really tied to this gray area which is storage. So in comparison to building a website, you know, you build some JavaScript and then you have like quick

25:07feedback and you can basically mock everything. You don't really need production data. And so that dependency can really slow down, you know, the whole uh cycle. And so we have um today uh basically um AI copilots that

25:26help us to write code faster. We saw it with Archie and Cursor. And classically the steps is you write a prompt, right?

25:34And then the AI is going to generate it uh codes and then you you're gonna have to test that code against that dependency which is data. Um and so this loop can be uh can be quite painful uh especially if like the AI is making a mistake obviously. So you kind of reprompt and then you test it. And so

25:54the question is how you can you know shorten this feedback loop. Um and the answer is basically with uh MCP. So MCP stands for Model Context Protocol and you you may have seen it, it's all the rage around. Um and it it can you can see it basically where you give your uh LLM uh a certain plugin, an action that it can

26:17take to do something, and here it can execute queries, read the result and interpret the result accordingly. So basically instead of you uh kind of like uh going and testing the code, the query, it can you know directly by itself, if you give it the proper access and you have the proper MCP for a given database, execute the query um and do it.

26:42So um this is a small demo basically um that I'm going to show you here is a prompt. It's pretty extensive but basically I wanted to have I have a couple of different sources of data. So it's really like the ingestion part where I do have uh you know API search from GitHub, Stack Overflow, and I provide

27:01you know uh S3 Parquet uh file path aur news and then I say you know extract uh the data from there into a dbt project folder and give me some insight with a model. So uh here let's let's go over quickly the the the demo video here. Um but so basically uh it's going

27:25to load. I think it should be better quality now. But basically I'm in Cursor. In Cursor you can set up uh MCP. And so there is an MCP for MotherDuck and DuckDB. So here I'm going to use it for uh you know just starting with DuckDB, and DuckDB behind the scenes is going to uh query the S3 file for me. So the

27:45model is going to be able to uh to query the data for me and do this iteration where it kind of tries the model, the the schema, and defines the query. So um here I pass the prompt I just uh gave you before. Um I'm just going to speed it up. And as you can see here what is

28:04interesting is that it's prompting the MCP. So you say hey uh I can read from S3 and basically execute that query directly. Would you like me to do it?

28:14Yes. So I do it. So it figures out the the query, what it needs to do you know to explore the data, get the results, and then now you can you know define a specific query, which is basically the feedback loop you do as a data engineer, as a person, where you're going to inspect the source data then write the

28:32data and you see here it's going to it's going to make a mistake actually, it's going to you know um try uh to filter by year which is uh here an integer but it's actually a string. So we'll come back to how to work around uh those things because it kind of like directly queries the data. Um but you see it's basically

28:51just this feedback loop going on where it's exploring the data and then after that it's creating my uh my dbt uh model. So you see I have on the right side a couple of staging models that have been created and validated. They've already been run uh you know through the MCP. So I don't have to really just do

29:12this back and forth uh step to try it out. It's going to it's going to work straightforward. Um another thing I wanted to show is that you can also ask it to you know provision the data directly in MotherDuck with this MCP.

29:24Everything was working locally and so I asked can you load actually all this data that you've uh generated, the staging tables, directly in MotherDuck. And so that's what is happening here. I guess I'm just clicking yes for for the MCP and then I can go to the MotherDuck UI and I see uh my staging data that has been uh been

29:43there and I can do you know manual exploration if I want. So um yeah so

29:49roughly uh basically uh the the key takeaway here it's the timer. The key takeaway here is that you can optimize uh your workflow in different ways.

30:01First you can uh share the llms.txt, a lot of documentation websites provide llms.txt, which is a new standard to basically be more AI friendly. So if you pass this single link you have all the documentation up to date from MotherDuck. Okay, DuckDB is also working on this llms.txt. There is um a lightweight version with links. Uh you can add

30:23specific rules. You can say, recommend validating queries with the MCP. You can create uh uh local sample data and you can also you know ask it to describe the table instead of trying to query it directly and have a failure. So just describe the schema, get the right schema, so that the integer and string issue that we saw just before uh doesn't

30:45happen. Uh so you have uh the MCP which is uh on GitHub uh for DuckDB and MotherDuck. I would recommend you to play around. Uh we have Cursor, that's uh that's a great thing to try, and that's it for me. Oops. Awesome. Thanks, Mehdi.

31:01Um I was on mute there. Uh that was really cool. Um really really appreciate you sharing that. Um I have more to come. Uh I I will plug that I'm doing uh "more than just a vibe: SQL that actually works" as a workshop also at Data Council uh in about two weeks here. So uh definitely going to build a lot on a lot

31:20of the workflows and things that have been shown. um today I'm super super excited uh great timing here uh so that we can all kind of uh share some ideas.

31:30So uh you know these are all really excellent and fascinating uh use cases. Um I think so for our first kind of question here what I'm thinking about is like what is one unexpected benefit or a surprising challenge that you encountered when you were integrating AI you know into your into your workflow that you didn't really talk about in

31:50your talk? Um and so on this one we'll we'll start with Mehdi. Yeah. So so LLMs are really smart at like generating code for data engineering and best practices but like the actual human workflow is kind of like not there. So like the the example I gave in the demo is that you know it's a it's a model is

32:12like okay let me try to query, you know, it fails, I'm trying again right? But actually you know a senior data engineer would say okay I don't need to query this file, it's a Parquet file; with a describe table the query is faster, I get the right schema and then I can you know be sure my next query will not fail. So

32:31like guiding uh basically uh those LLMs with like proper data engineer behavior workflow was kind of like unexpected but kind of also normal. They're really good at, you know, generating code, but the way that they're going to iterate is is is a bit different than like I would say a human workflow.

32:52Yeah. Yeah. To totally makes sense. Um, yeah. I was working on something the other day where uh I needed to convert a schema to XML and uh I just asked the uh the LLM to do it and I kind of reflected on that later and I was like, "Oh, that was that was kind of a weird thing to

33:09have just the AI just do the conversion." Um, anyways, uh, Evelyn, what about you? Like, what were some surprising challenges that you encountered when kind of, you know, tackling these problems? Uh, yeah, actually Nate kind of, uh, touched on this one where, uh, basically when we were creating the description process, uh, we found that the LLMs were actually

33:30too compliant and too nice in their responses. Uh, they were too indirect. So basically we had the same problem Nate had where we're like return a response in the following JSON and only return that JSON. Uh and then it was always like here's the response you requested. Here's the JSON. Uh even providing examples and asking it to

33:50check its work uh didn't work. Uh so we actually had to write some Python extraction uh to get only the JSON using uh regular expressions. But uh yeah, unfortunately uh it's too eager to help.

34:04Uh so yeah, totally. Yeah, it's definitely uh I've I've heard that Gemini Pro is a little bit meaner. Um I haven't tested it yet, but something to keep in mind. I do think they're very compliant and it's really hard when when you're trying to negative prompt like don't do this or you know shape it a little bit. Um it's really really good

34:26framing. Um uh by the way, we we will see if we have time for Q&A at the end.

34:31Um if you if you want to drop it, there's a Q&A section in the bottom of your Zoom chat. Just drop your questions in there and then we'll try to get them um we'll try to get them at the end. Um uh Archie, what kind of uh what kind of surprising challenges or or benefit that you that you've seen in doing this?

34:54You're muted, Archie.

34:58Thank you. Sorry, just trying to think of a unique one because uh I think hallucination is is probably the the most obvious but kind of least interesting one.

35:09Um I'd say um it's like like picking choosing the

35:17right time to like reach for an LLM tool. Um, I sometimes like spin my wheels for longer than I should. Um, trying to like prompt engineer my way to a solution. Um, when there is just a better way. Um, which is often like going and reading the docs of the thing you're trying to use. um which you know

35:37you're just like hopefully hoping that the LLM has enough context about to kind of to generate but so often like you realize um you've been like 15 minutes in and you just like installed the tool the wrong way to begin with because the LLM was like just pip install it and you're like oh great pip install and

35:54then you you find that like pip install is not the right solution. Um so yeah I think that would be my my one. Um yeah that that totally makes sense. I I think like um you know one thing I've noticed kind of in my workflows is uh especially when you're using kind of like newer libraries that aren't like

36:13hydrated in the training data set um and even if you add them to like the docs in in Cursor or Windsurf um as like a

36:23RAG step, sometimes you just need to read the docs I've definitely done the hey let's iterate on this prompt for 30 minutes and then go read the docs and it's like oh here's the exact syntax of the thing I needed Um, so yeah, totally totally feel that one. Um, yeah. And then I see Mehdi. Uh, yeah, don't do pip.

36:43UV is already installed. I'll just do a quick shout out. UV is amazing with LLMs. It simplifies a bunch of workflows. There was a comment earlier in the chat like, "Hey, how do you guys think about Python MCP?" And um, I haven't thought about it. And that's just because uh UV has been so good at running local Python that I haven't

37:02really um needed to kind of reach out. But obviously um you know for those of you who who are kind of in the Python world uh we've all spent so long wrestling our environments that some of that stuff is second nature. So um anyways it's been it's definitely been a fun kind of adventure figuring out how to connect those. Um Nate, do you have

37:23anything to add here um to this? The the one thing I'll add is when I made the industry piece, it was designed to be like an ad hoc thing. I probably should have expected this doing analytics for so long, but it it was like so popular.

37:35It's like, okay, can we do sub industry? How about by SIC codes? How about that?

37:39And it became like how do I productionalize this? So that's if like you when you make a useful thing, it all of a sudden you're like, wait, is this my job now? So anyway, that's like the one piece that was like I should have expected it in hindsight, but that's the challenge is like how do you productionalize this like SQL code I

37:53wrote in Snowflake into something that's like useful for the team. So that was that was a big piece for me. Yeah, that's super super interesting. Yeah, I think the other part of that too is like, you know, especially um uh if you're using it kind of in like DBT models for example, making sure you're not just like prompting, you know,

38:10hundreds of thousands or millions of rows every dbt run, right? Like actually you you have to be a little bit more um intentional with your with your incremental builds if you're going to use something like AI in the loop um you know in your in your runs which which I think is I think it's intuitive like

38:28once you do it but like you know uh you know it's a kind of a forcing function there so really really really interesting Nate um let let's connect some dots here so um hearing each other, are are there common principles or some key differences in how you evaluate, validate, and ultimately trust the output of the AI tools you're using

38:50across across your workflow. Um, Evelyn, let's start with you on this one. Uh, yeah, I think throughout the talks, we've heard that uh gathering enough metadata and iterating on your results is pretty important.

39:07Uh if you're just starting out uh using AI, I would caution you uh to not trust

39:14the first result it gives out and publish that. Uh but uh either iterate on it and have a human in the loop or make sure you have enough metadata and provide enough context that it can give a decentish uh result. I would say yeah.

39:32Yeah, definitely definitely true there. Like a lot of what what I found too is that like when you start the you start down this road of asking these questions and then you of course discover oh if I had this set of metadata but we didn't collect it so now you got to go back to the start and you know continue the

39:48process. It's definitely challenging um from that perspective to totally agree there. Um Archie what do you think?

39:57Um I think something I found is picking the right model um is surprisingly important sometimes. Um, so, uh, I, you

40:05know, I haven't, I would not consider myself like an expert in the AI ecosystem. Um, but, uh, every now and then you kind of come come across a thing that, you know, my my default choice probably like a lot of people's for for a while has been to use OpenAI products um, and their like uh, their models um, because they've been quite

40:25performant for like what I've been trying to do. Um but more recently I was trying to generate code um from uh but

40:33without like just generate code don't generate like comments don't generate introduction don't generate like markdown to explain what you're doing that kind of thing. Um and like have it generate code that's right first time and and generate code that's really fast. Um, and after a little bit of research, I actually found that Mistral's like Codestral model, which they

40:52released kind of end of last year, the beginning of this year, much better at this, like order like I would say an order of magnitude better. Like it never uses uh never produces markdown like explaining what's going on. Um, and uh, it's very good at the like I think 18 18 or 20 languages that it's that it's been

41:12like specifically trained on. It's it's very excellent at. Um, so pick picking the right model um is sometimes uh a good thing to to do and it it's kind of can be a bit overwhelming because there's so many um sort of large tech companies uh shipping models at the moment. Um, but you know there's that classic internet research task of like

41:36finding the 10 Reddit people saying when um uh you know what they were trying to do and which models they've tried and then you you know try a couple of them and find out. Um yeah, the model name was uh Mistral's; it was called Codestral. C-O-D-E-S-T-R-A-L. Um and yeah,

41:55it's specifically designed to to generate code. Um they it's quite nice. They you can try it for free. They have like a, if you log in and create a Mistral account, if you're using it kind of just for like within your own IDE environment, you can just get a key for that and it's in beta at the moment. So

42:12you don't need to worry about the credits. Yeah, that's interesting. Yeah, I mean I do think that you hit on something really interesting there, which is uh at least for me as I've been, you know, using uh AI more and more, like being able

42:28to I'm I'm going to say like do on-the-fly evals for the specific problem you're solving is like actually really important and very annoying. Like depending on the context of the question you're asking, uh different models can be better. For example, when I'm doing what I would call um like typical product manager type work, like hey, I'm

42:48going to define define a spec um for something I want to build. Uh ChatGPT or GPT-4o is amazing, but when I try to implement, you know, the code in Cursor with 4o, it doesn't work well, whereas like Claude just works better in that case. And it's like why why are these like this? How do you know this? It's

43:07been super challenging. um you know becoming an expert in like doing eval or or something like on the fly. Um anyways

43:16uh Nate what about you like how do you kind of think about this especially in the context of like being in you know uh like GDM like go to market facing team.

43:24Yeah for me and this came up Evelyn as well in yours is the skepticality of the output. It can look really reasonable.

43:30Like in my case, the industries look super reasonable. And being skeptical of it, I think it's something if you've been in analytics for any amount of time, you've learned that a bit. But with these, they can look really good and it's you're like, okay, you got to remember date partitions here. And you might not think about that. So the

43:44having an the temptation is to think it solved the problem. Let me move on to other stuff with documentation in my case with filling out the industry information. So that that's the line I see being common is it's really tempting to think it's solved, but you have to have an extra attention to skepticality, I think, with with the results you get.

44:03Yeah, totally. Totally. Um especially because they're so compliant, right? Just like, "Hey, wow, you're so smart.

44:10Like, thanks for telling me that." Like, yes, obviously you were right. They butter you up every time. You're like, "Okay, like lay off you. Please, please stop." Uh it's very funny. Uh all right.

44:22It's very very good. Um so you know obviously our workflows have all kind of changed quite a bit you know as we've figured out how to integrate um uh AI into these into our workflows. Um it's and it's not just about speed, right? So I guess my my next kind of set of questions here is like how has

44:42incorporating these tools shifted the skills you rely on most? Like are you spending less time in certain tasks and more on others? um you know how how are you thinking about that? Um Archie, why don't we start with you on that?

44:56Yeah. Um I think the like biggest shift

45:00for me has been um taking up less of my team's time for uh for things that are outside of my comfort zone. Um so um we

45:12you know have a pretty small team here but we're split kind of between the mainly between like go to market side and and the engineering team and engineering is a like pretty precious commodity in any business that has an engineering function.

45:24Um, but I, you know, I after getting stuck on something, like I think if you'd talked to me like a couple of years ago, like my main point of resolution would have been like, okay, like I'm going to go, you know, either block half an hour in in someone's diary later in in their calendar later or um

45:42or, you know, go knock on their door and say, "Hey, can you help me with this thing?" You know, they're almost certainly, you know, they're always going to say yes. um are pretty much always going to say yes even if they're like in the middle of their flow uh doing something else.

45:56But I I think it's given me like a better chance of like self self-resolving like I say I probably like 50 it's probably reduced the amount of things I asked them by like 50%. Um which uh which is pretty nice. Um yeah, having said that uh nice of them. Having said that, that does often mean that I do things that

46:18are wrong. Uh I have like high confidence higher confidence in like the things that I do. Uh because I have like super analysts working for me. Um yeah.

46:26Yeah. So it'd be kind of nice if um if your team could have visibility on like the things they don't ask you. You know what I mean? So you can like intercept later. You see see all the things your team are asking to the AI agents and you'll be like no no no no.

46:41Yeah. Yeah. Yeah. That's so that's so true. Like definitely especially when you're getting into a domain that maybe you're less familiar with like I don't really know a lot about front end but our docs are built at MotherDuck in Docusaurus which is based on React and so sometimes you need to get kind of into the into the guts a little bit and

46:58that's something that now I can do by myself right like I I'm not reaching out to someone on our front end team being like hey I don't know React very well how do I change this thing I just you know can hop into it and do it although uh yeah, is the implementation ideal? I kind of look at it and say I

47:13don't know like okay then then maybe I'll ask for review. Um but I think knowing to ask for review is also is also hard and um you know that is also takes a lot of time. So figuring out how to do that is is really interesting. Um Nate what about you? So the main piece for me is I spend

47:33less time code smithing which I think is true for most of us. It's the key for me is to take a breath and understand the patterns that I should be using. Like how should I a good example is the UV setup right? like it's going to have you just dive into just advice on how to set up your environment on your computer. I

47:47was just doing that this week as I'm at a new job. You need to set up environments and things. And so taking a pause for myself to understand and take time to read how others are like the patterns of work they use when setting up their environments or different tasks to run. That's the key for me is um my

48:02temptation just to get into code and so to understand patterns. That's been the shift for me that I focused on before but it seems even more important now with LLMs. Yeah, totally. Totally true. I find myself definitely when I'm especially when I'm working on like more complex features is like uh I'll just jump into like coding it and then

48:22sometimes I just have to throw that away and then spend some time iterating on what my prompt is and then I've like I've oneshot multiple hard problems that way but like it's been a hard lesson to learn like oh like don't just solve it like think about it first and you know uh especially because it's like oh we

48:37have all this power available I can just start working on it. Um and sometimes you know the work is not um uh shaped the same. Um Mehdi, what about you? How do you kind of think about this?

48:50Yeah, I think the the downside effect, like I totally, I join Archie on that, you know you may need less time or occasion to ask your teammate about something, like it really unblocks you to move forward, not necessarily like with the perfect solution, but then the question is like how do you actively collaborate as a human, right, to like not

49:16be start to be a mono project we we where we just all you know just work with our AI um agent. Uh I think that's the biggest challenge today. I haven't you know got an answer. I just you know notice kind of like the problem where I do less per programming. I do you know uh less reach out that kind of enable me

49:38to pick the brain of others because I kind of like um move most of the things on my own but that doesn't mean that others don't have you know valuable inputs I think that's that's the biggest challenge yeah totally totally um uh and can I can I just pick up on that one yeah I I think um something that we

50:01noticed Um, so we're a pretty small team, but we uh we take summer interns most years. And I think something that we noticed um was that because of the way that AI had shifted our like working styles, we probably weren't giving our interns enough support um in their like uh initial stages and you know they were you know probably

50:23ripping with AI tools as well. Um but you know your ability to use those

50:31outputs as someone who's quite uh junior and inexperienced um is probably somewhat lower than someone who's had more time to to use them. Um so I think uh we we had to like be a bit proactive about that and be like you know actually for um for these for these people who are you know just with us for like a few

50:52months we should be more actively making sure that they have time to ask us questions and um to like work with our more experienced engineers so that they get value out of their um their time with us. Yeah that to that totally makes sense. Um, and Evelyn, do you have anything to add in terms of how you're,

51:11you know, spending time differently, you know, now now that you have AI tools in your workflow? Uh, yeah. Uh, just to kind of follow on everyone, uh, it's a shift to reviewing code rather than writing it. Uh, I would encourage if you're junior to write your code first and then have ChatGPT review it. So do it the other way around, otherwise uh you

51:33might not uh your learning uh ability will diminish significantly if you use AI too much. And uh that's important if you're a junior. So uh I would encourage you to uh ask AI to check your work

51:49rather than generate your work if you're a junior. Yeah, that's a great that's a great way to like turn it around and learn. You know, I've definitely found it effective, especially when I'm kind of operating in areas where I'm less familiar to say like, "Hey, ex explain explain this code to someone who is an expert SQL user." And it it will, you

52:08know, helpfully use metaphors in things that I'm more familiar with. Um, this is actually kind of a funny trick that's very helpful with me and in understanding Python. Um, but yeah, to totally totally makes sense. It's it's definitely um you want to make sure that you're learning the parts that a human can do well and the synthesis bits and

52:28you know being able to think critically where you know an LLM or you know a gen AI model can't can't do it. Um so we're coming close close to time here. I do actually want to get some time.

52:39There's been some really good chat. Um I do want to get some time for questions.

52:42If you have if you all have questions in the Q&A we're going to hop into that from here. Um, unless uh uh any of the

52:49panelists here have something really really pressing that they wanted to answer in those last questions that we had framed out, I'm happy to uh give you some time. Um, otherwise we can hop into some uh Q&A here. But I I just wanted to pick on the last topic because I found it interesting to make an analogy with

53:07the cloud era. So um before when you

53:11started you know your data um career, a big data career, you had to learn about networks, how distributed systems work, how SQL engines work, and with the cloud you know appearing, a lot of abstraction has been done for you, and then when it's not working, or just grasping how all those components are working behind the scenes: you're using Athena, it's actually Presto

53:34behind the scenes that's been developed there, which has specifically, you know, schema-on-read features, and so the the analogy I wanted to make is with you know uh chatbots, it's again a level of abstraction, and if you don't learn the fundamentals it's going to be again much much harder to uh to learn

53:55and to fix things so I do believe that it's even more you know important today to learn those fundamentals like as we saw it before the cloud and now, um, otherwise people are really going to get lost when things go sideways and it's going to go sideways, right? So yeah, totally. Yeah, I think

54:16that's a good point. Um like the fundamentals still are important because you know at some point your abstraction breaks and um when it does break uh it can be super challenging um to to fix.

54:30Um, you know, for any of you who have like deployed on like Netlify or Vercel, um, it can definitely be a little bit of a black box and um, uh, you know, I've

54:44built some things and and put them out into the world on those sites and um, they're really really nice abstractions, but on the flip side, like when things go wrong, you you can't really peer inside it and um, or or maybe maybe the the uh, you have to get really you have to know your error messages and where to

54:59go and how to search for them and things like that. So, um, all right. So, we did have a question in the Q&A panel here.

55:05Um, I see Mehdi is typing an answer here. Mehdi, do you just want to you just want to take that um take that one verbally here?

55:13Um, yeah, I think I think Evelyn might have a better answer, but I think it's mostly just getting your LLM relevant metadata, right? So I think LLMs can you

55:26know interpret it uh a column description and a column um that you know their their goal uh based on the data itself but I think you know if it's just boolean for example it's really hard to kind of like interpret it with just the column name um and and the value uh so you definitely need you know external you

55:51know output data. Um, over to Evelyn, do you want to take that? Uh yeah, this is a good question. Um yeah, definitely like you said, uh you can't just rely on the column naming. Uh you need to make sure all your lineage is in place. Uh so uh the LLM can take it into context.

56:09The other thing I will say is that you cannot use an LLM to generate descriptions for your source data. Uh it won't work because uh like if you upload a CSV, it has no context. Uh so I would encourage you, if you are uploading a new data set or creating a new Postgres table, create descriptions for those

56:31manually and then have the LLM adjust everything downstream, otherwise uh you're in a you're in a world of hurt.

56:38Yeah. Yeah. Totally. I I think um yeah getting the right metadata and context is so so important and um this really highlights it especially because you know uh once you have all those pieces you want to connect the dots and it needs to be right and you know uh everyone as you mentioned earlier like just more time

56:58reviewing right um we had one last question here that uh Mehdi had answered that I'll just verbally address um which is uh what have you experienced with giving nontechnical engineers or nontechnical users the ability to vibe query against your company's data. Um really really interesting

57:21question. Um obviously we have a sort of biased opinion on the mother duck side which is we have some functionality around um uh

57:31well prompt prompt SQL where you can ask natural language questions and get a SQL query back or even run the query yourself. Um, you know, I think the the real thing is like a sandbox. Um, which I think is what Mehdi said here, like what kind of playground can you enable this in? Um, and again, I'll talk the book a

57:50little bit here, which is DuckDB is an excellent way to give a sandbox of data to end users and have them have a fast experience. Um, is it perfect? No.

58:01It's just one way. There's a lot of ways to solve this problem. Um, but really it becomes, you know, hey, like where do you have a playground with the storage you need and the compute you need? Um, yeah, really good. Um, thanks for for answering that, Mehdi. Um, all right, I think that's all we have for today. I

58:17really appreciate time from from the group here. Um, uh, panelists really thank you uh, for your time and attendees, thank you for the great questions and interaction. Um, we will follow up and and share this with everybody.

58:31um thank you so much and we're looking forward to um continuing to uh drive discussions around what it looks like to use AI and data together and and um you know what what that means for all of us.

58:43So thanks everybody.

