
Instant SQL Mode - Real-Time Feedback to Make SQL Data Exploration Fly

2025/04/23

Featuring:

Hamilton Ulmer shares insights from MotherDuck's Instant SQL Mode, exploring how real-time query result previews eliminate the traditional write-run-debug cycle through client-side parsing and DuckDB-WASM caching.

0:00 So, this is a talk about SQL, but really this talk is about what it takes to make complex software that requires mastery but is fun to use and enables a flow state. Kind of like playing an instrument. First, I want to talk about recording studios. In my previous life, I was a musician, and I worked

0:27 with artists big and small in recording studios all around the United States, and I loved it. Now, at the center of most modern recording studios is a piece of software called the DAW, or digital audio workstation. The job of the DAW is to orchestrate input signals from microphones and pieces of equipment into recorded digital audio tracks. It helps

0:51 you play it back and also mix it to a near-final product. And working in a DAW is awesome. Every interaction you have, you hear an instant preview of the results.

1:03 You can solo individual tracks. You can loop little bits of audio. You can open plugins and tweak them. And you can hear their results as you're working. This is really, really cool.

1:15 It makes getting good at music production actually really fun, because that feedback loop is so tight, right?

1:21 It's a really magical experience if you've ever done it. It reminds me of this quote from John Culkin: "We shape our tools and thereafter our tools shape us." A number of years ago, I moved into the tech industry and began working on a different type of complex tool: data analysis frameworks. But I haven't forgotten that feeling of using the DAW.

1:46 That instant feedback loop was just really, really magical, and I've been missing it in a lot of analysis tools. So, let's talk about SQL. Groovy, right? Of all the ancient programming languages, SQL has had probably the most interesting history of them all. Despite decades, yes, decades, of criticism of the language, SQL has both endured and in fact thrived

2:11 despite it all, right? Many of us here owe our careers to SQL. And databases have also evolved and thrived in interesting ways, scaling up to meet the modern needs of the internet, mobile phones, and now this moment in AI. Yet the interaction model of SQL is still kind of stuck in the 70s, right? You still type your query, you

2:34 hit a run button, you wait, and then you respond to what it gives back. Now, as somebody who is used to instant feedback at work, my initial impressions of SQL were not very good, right?

2:49 It breaks flow. This is not fun. I don't like waiting. It's really hard to debug, right? I have one tool to do it, which is to write more SQL, and it does not spark joy.

3:02 So, everyone knows the experience of writing a new ad hoc query from scratch, right? Draft a query, hit the run button, get an error, a syntax error, fix it, run it,

3:14 hit another error, and so on and so forth until finally you wait, okay, and get a result set. And now your analytical brain turns on. You're wondering, what's going on with this column? It's a bunch of negatives and a huge number. You realize you got the order of operations wrong. So you fix that.

3:34 Wait, and finally get a result set. So this is the loop, right? Write, wait, debug. Now, writing and debugging are fun, right? That's using our active brains. It's kind of what we're here for. But waiting? That's not fun, right?

3:49 That is a really passive activity. And every time we have to wait for a SQL query to run, we're context switching to something much more boring. That would be fine if we were writing and running queries just a few times a day, but when we're drafting a new query, we're doing it hundreds, maybe even thousands of times a day, right? And yes, that

4:07 wait time does compound. But more importantly, I think, it's exhausting, right? We did not evolve to wait for SQL queries to run. Now, a lot of database engine makers have focused on this wait part and have tried to make it as short as possible, but even a few seconds is just enough time for me to switch to Twitter.

4:28 Lose my train of thought. No good. Okay, so that's drafting a query from scratch.

4:33 Let's talk about inheriting a massive query with lots of CTEs. Imagine your colleague just left the company. You are now in charge of maintaining this SQL query, and all of a sudden one of the columns looks totally busted and you have to debug it. So you might try to do your job and read through every part of

4:51 the query piece by piece, until you just give up and start commenting out whole parts of the query just to isolate a CTE, right? Debugging CTEs is hard. Why?

5:03 It doesn't make sense. The popular BI tool Hex actually has a whole blog post about how you shouldn't write CTEs anymore. Hex has this really cool feature where you can chain together SQL in a bunch of notebook cells. Then you can edit certain parts and debug. That's great if you use Hex, but it kind of makes me sad, because there's

5:22 nothing wrong with our CTEs. We love our CTEs. The problem is that it's hard to debug them. Harder to debug them than it is to write them in the first place.

5:33 SQL's observability problems go deeper, obviously. How many times have you had this situation: you have a column, it's nothing but nulls, and then you realize that the complex expression you prototyped did not work out the way you hoped. To debug this, you have to break apart the expression piece by piece to figure out if it's your logic

5:54 or the underlying data. SQL seems to be one of the only popular programming languages that, for some reason, has almost no debugging capabilities, right? The solution always seems to be to just write more SQL. So that's it for human workflows. When you throw an AI assistant into the mix, things get even worse, right? You might prompt it in

6:20 natural language to change something in your query, and then it gives you a suggestion which you can accept, then run the query and then attempt to debug whether or not it even made sense. Same problem, right? Is it the logic or the underlying data? We're basically asking for non-deterministic output to be shoved into our extremely fragile, unobservable SQL system. So, it's no wonder that

6:45 people are terrified of throwing AI into SQL, right? There's something wrong with this interaction model if we can't make the best of this moment in AI with the tools we have today. Good luck with the vibe coding. So, I'm ragging on SQL, but I love it, and I think we can do better than that. So, the question is, where do

7:07 we even begin with this, right? A number of years ago, Bret Victor gave an influential talk called Inventing on Principle. In it he says: "Creators need an immediate connection to what they create. What I mean by that is, when you're making something, if you make a change or you make a decision, you need to see the effect of that immediately.

7:28 There can't be a delay. There can't be anything hidden." And that's great, right? DAWs have this down pat. And it makes sense that they do, right?

7:37 Because the digital audio workstation has to model a physical system: the recording studio. Recording studios have to be near real time, because music is real time, right? So here we have software that has to model a real-world environment. You don't really get that with SQL, right? What are we modeling exactly? It's something abstract.

7:57 And when software is meant for abstract purposes, it's easy to fall into the pattern of thinking that, just because there's nothing to model it after, it's okay if it's crappy, right? I don't think that's the way it should be.

8:10 So despite all of that, why hasn't anyone done this in SQL

8:18 yet? So let's talk about delay. How do we fix that problem? Well, we need to start by getting sharper about what we mean. So here's my definition: I want a result set per keystroke in my SQL editor. That is immediate, right? That is like the lower bound of immediate.

8:37 To make that sharper, what I mean is I need to get a result set in basically a handful of milliseconds.

8:46 Now, your database engine can be incredibly fast, but this requirement already rules out a ton of systems today, right? Because if I have to send my query to us-east-1, I'm going to pay a network cost. No good. That breaks my latency requirement. You need a local database that's incredibly performant.

9:04 It also needs really expressive SQL. And so, yes, this is a DuckDB talk, obviously. DuckDB, if you're not familiar with it, is an in-process database engine that's super fast. Actually, it's just fast, right? That's the first thing people say about it. They think it's broken, because your queries run locally faster than you can even respond. A good

9:24 example of this: I don't know if you can see it, but this is a really simple query, and it's running and returning results in a handful of milliseconds. Awesome. So we know that DuckDB can do the thing, in theory, that we want it to do. But most real-world queries that we care about don't look like

9:40 that, right? They look something more like this: a bunch of CTEs and joins. And DuckDB is really impressive at this too, right? But it's still a little too slow for keystroke-fast. So we have to do something even more radical. What we need to do is find a way to rewrite this query to target a much smaller data set than the one that

10:00 we're actually querying right now. So, it turns out DuckDB has a solution for that as well. There's a little scalar function tucked away in the JSON extension that lets you use DuckDB's internal parser to get an abstract syntax tree of any query. And this is super fast to generate, right? This is really cool. This means

10:22 that we actually have the ability to analyze our query and then potentially rewrite it to target something much smaller. I'm a statistician by trade, so I'm always looking for ways to do more with less data. Awesome. It turns out we're going to need this anyway, because DuckDB can query data sources far away. So here I'm querying an S3 bucket, probably in

10:44 some data center at Amazon, and this would incur some network cost, right? So I think we now have all the pieces together. We can take our original query. We can parse it and analyze it. We can generate a local temp table with some subset of the data, whether it's a limit or a sample. DuckDB is

11:04 good at sampling. And then from there, we can rewrite our query on every keystroke and make it super fast. This works great if you're editing basically any part of the query other than the FROM clause. And even then, DuckDB is pretty fast, so you can make it work.

11:20 So that's how you transform a big complex CTE into something that feels a little bit more like this, right? Okay. So that's it for delays.

11:29 That kind of makes sense, right? What I just described. I'm hand-waving a lot.

11:32 If you're interested in the technical details, come to my office hours. There's a lot more there than meets the eye. But Bret Victor has this other important point: there can't be anything hidden. So what does that mean for SQL?

11:49 How do we add observability to the system? My background is in data visualization, and there's a researcher who's well known in the industry by the name of Ben Shneiderman. He wrote a paper called "The Eyes Have It." It's an academic paper, and the opening looks like this. He summarizes his visual information seeking mantra, and now I

12:12 want us all to chant it together. I'll start and then you'll repeat. Overview first, zoom and filter, then details on demand. Okay, now you do it. Overview first, zoom and filter, then details on demand. Again: overview first, zoom and filter, then details on demand.

12:34 Excellent. Okay, so now you are all world-class data viz experts, just because you've recited the mantra.

12:40 So, what does this mantra actually mean for SQL? How do we enable zooming and filtering and details on demand? Well, it turns out the most reliable proxy for a human's attention on a SQL query is probably the cursor, right? Where they move the cursor is where they're most interested.

12:59 And it turns out you can use DuckDB's parser and the AST it generates, along with a bare minimum of positional information about expressions and table references, pair it with the tokenizer, hand-wave, do a lot of work, and actually generate a path through the abstract syntax tree. This allows you to understand exactly where you are in the query at any time, at any

13:21 level of fidelity, which means you can take actions depending on where the user's cursor is. This is actually a big deal in a lot of ways. I'm going to show you some demos of this idea in action. This first one is a little hard to see with the lighting, but this is semantic highlighting. I basically know

13:38 exactly where I am in the query, so I can highlight any expression or table reference as my cursor moves. And as you can see, it's parsing the query every time I move the cursor. It's extremely fast, right? And on the right side, you can actually see that path through the AST. Cool stuff.

13:55 This one is totally a parlor trick: I'm just shuffling the select list in place. Not useful at all; I don't think you'd actually want to use this. But notice that all the stuff above, and the little bit below, the FROM aggregated_data clause, stays in place. We actually have complete control over the character boundaries of the entire

14:14 AST. This one is a little more useful. You can pop up a menu if you're in a column expression and have it generate an alias for your column using an AI model. Super easy to do, but only because we know where we are in the query, right?

14:27 That's what makes it possible. Similarly, we can pop up a little menu and transform our select statement into a CTE. Same idea, right?

14:36 The sky's the limit when you have this kind of parser-powered query intelligence. It really makes the process of working with SQL that much

14:46 richer. Fantastic. So those are all the roadblocks to satisfying Bret Victor's requirements: no delay, nothing hidden. So where does that leave us? My name is Hamilton, and I'm the UI lead at MotherDuck, and today we want to show you something we've been working on.

15:12 It's a new interaction model for SQL that encourages a flow

15:20 state, is easy to debug, and may require mastery, but is actually fun to use. We're calling it Instant SQL. So, I want to show you a few examples. And for that, I'd like to invite my colleague Jacob Matson onto the stage to be my fingers.

15:48 Okay. All right. So, we are in the DuckDB UI, and I'm not actually sure what we're going to do, but I think we're going to query the New York City taxi data set, beloved among data practitioners everywhere. And Jacob's just going to start typing some column expressions.

16:07 What's going on as he's typing is exactly this loop: we parse the query, we analyze its dependency structure, and we figure out what to cache to render a preview of wherever he is.
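That per-keystroke loop can be sketched roughly like this. This is pure pseudocode in Python syntax; every helper name here is my invention for naming the steps, not MotherDuck's actual code:

```python
# Hypothetical reconstruction of the Instant SQL editor loop.
def on_keystroke(sql_text, cursor_pos, cache):
    ast = parse(sql_text)                      # DuckDB parser, via json_serialize_sql
    node = node_at_cursor(ast, cursor_pos)     # CTE / select node under the cursor
    for source in table_dependencies(node):
        if source not in cache:                # stage a sample once, in the background
            cache[source] = create_sampled_temp_table(source)
    preview_sql = rewrite_to_use(node, cache)  # retarget FROM clauses at the samples
    return run_locally(preview_sql)            # fast, against the small cached tables
```

The cache step is what keeps the loop hot: only the first keystroke against a new source pays for sampling, and every subsequent keystroke is a pure local rewrite-and-run.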

16:18 We do that in the background, and then once it's cached, we rewrite the query, creating a really hot loop that's less than 50 milliseconds. So, what you get out of this is a pretty novel interaction pattern, right? It's querying as you type, which is really, really cool. So, oh, we already have a bunch of columns. Great. I see this fare

16:38 amount divided by trip distance as cost per mile. Now, Jacob's Canadian, so I'm going to ask him to actually make a cost per kilometer column instead.

16:48 So, I

17:02 type. Wonderful. Okay. So, we've got two columns: cost per mile and cost per kilometer. This is excellent. So, Jacob, can you click on the cost per mile column? What just happened is that he decomposed this column expression to reveal the next level of children in the expression tree. Part of how we're rewriting with Instant SQL is

17:23 that we're taking the query and not just returning a result set; we're actually returning a list of expression trees. So we have full observability into all of the data, all the way through all of the expressions.

17:36 This makes debugging really easy. So can we do the same with the cost per kilometer column, and then the next one? Trip distance divided by... yeah.

17:43 Wonderful. Okay. So we see this trip distance column, and then we see a constant. Now, Jacob, can you just mess with the numbers here? As you type, the changes to your query propagate completely through the tree, right? It's really, really cool.

17:58 Essentially, we've transformed SQL

18:02 queries into a gigantic Excel function that outputs a list of trees. And then we're doing something in the UI to make it easy to explore. So, awesome. We're on

18:14 the right track, right? We now have a way to zoom in, filter, and get details on demand. Good stuff. So like any good little data engineers, we're going to wrap this query in a CTE and then aggregate, do something more interesting with it.

18:36 All right. So, what are we going to do? What's good? I think... can we do something with cost per mile? Yeah.

18:46 Excellent. Oh, good idea. Pickup day and cost per mile. I like that. Okay. So, Jacob's going to find the average cost per mile, because he's an accountant. And I'm going to ask him to do the median cost per mile as well, because I'm a statistician. You should always use medians. And then we're going to group by

19:06 all. Wonderful stuff. Okay. And this is why you use the median, right? Average cost: infinity and not-a-number. Those are duck values.

19:16 Great. So we need to fix this, right? So Jacob is going to click back into the CTE. The result set changes.

19:23 That's because we can render a preview of any CTE or select node in the query in real time. So now let's debug this.

19:29 Let's start by just figuring out: do we have a divide-by-zero problem? Probably.

19:33 Yeah, there we go. See, the trip distance column has got some zeros in it. So let's filter those out so Jacob can have his nice average. There we go. How nice. As you can see, the changes we just made in the CTE propagate to all of its children really, really quickly. It's a very cool way to debug the

19:50 flow of your data. And also, obviously, his average cost is skewed compared to the median, as you can see. That's another statistician joke. Okay.

20:00 So I think at this point maybe we'll write another CTE. Okay, great. We're going to write a classifier CTE that basically just figures out whether we're on a weekday or a weekend.

20:13 Something like that. Yeah. Let's see.

20:27 CASE statements take so long to write.

20:36 All right. Oh gosh, there we go. Nice. Can we actually break that apart? Can you click on the datetime? Yeah, right. I think you can go right here. Yeah, there we go.

20:46 Okay, nice. I think we can click one more level down. Oh, very nice. Yeah, very, very cool. Okay, so now we have... can we scroll down to the actual final select node? Cool. Okay, so we've got this pickup date. I think we need to replace the trip metrics with the date classifier and use that. Oh, we

21:01 need a star there, too.

21:06 Excellent. Excellent. And then we have day type. Awesome. Okay. All right. Well, you get the idea. You can keep going. We haven't hit the run query button once in prototyping this, right?

21:18 We're just using a subsample of the data to actually get the modeling right.

21:22 It's really, really cool. Now, at any point you can still run your query, because running queries has value, right? The problem isn't that running queries isn't useful; it's just that we have to do it too many times, and it's a real drag. So, at any point you can run the query. Did you do that? Excellent.

21:36 Thank you. And get an actual result set with the full data set. Now, this one took 607 milliseconds, which is really fast, right? DuckDB is very fast. It's still not keystroke-fast for this use case.

21:50 Great. So, we're going to show you something else now. Let's query some other data sources. So, that was a DuckDB table, right? And that's on my local file system. It's super fast. DuckDB is really well optimized for this sort of thing. But we want to show you that

22:09 Instant SQL really works for almost any data source that DuckDB can query. So we're going to query a Parquet file that's on the local file system. Piece of cake. As soon as it's cached, you're basically modeling it the same way as if it were a DuckDB table. Super easy.

22:25 So this isn't the most exciting query. Let's actually do something different. Let's query Postgres, right? DuckDB can query data lakes. It can query S3 files in Parquet. It can query tons of external data sources. So I want to query Postgres. And to appease the demo gods, we've actually got Docker running on this computer.

22:44 Otherwise, we'd do it live. Full screen. Right. There we go.

22:55 Wait, wait, wait. Yeah. Oh, yep. Got to switch it to SQL. Excellent. And that's it. This is actually querying Postgres. And this is really cool, right? As long as DuckDB can query it, you can actually explore it and model it in real time with Instant

23:16 SQL. So I think at this point we're going to show you one more demo. We're going to switch to the MotherDuck UI for this one. Fantastic. So I mentioned

23:28 earlier that AI assistants with SQL are terrifying. It makes every data practitioner extremely stressed when an AI model hallucinates a comma in the wrong place. So now we're going to show you something kind of cool and use an AI assistant, prompting it with natural language. So let's take a look. We've got this complaint type column. There's a dash and a bunch

23:50 of text in it. And we want to filter out everything after the dash. So, right, let's remove the text after the dash. And we'll trim the whitespace. Yep. Trim the whitespace. And we'll give it a nice name.

24:03 Okay, let's see what it does. Cool. Notice the result set actually applied that suggestion in real time, right? So, this is super cool. This basically means that you can figure out whether or not the suggestion you just got is doing the thing you care about. You can validate it yourself before you even hit apply. It's a really

24:21 cool feature. Basically, every AI-powered suggestion feature is about to get really powerful and really fun with Instant SQL.

24:29 Cool. Thank you, Jacob. Okay. I'll let you switch back.

24:41 Sure. Fantastic. So that last demo, if

24:45 you were paying attention, we were demoing in production in MotherDuck. That's because today we're announcing that Instant SQL is available for all DuckDB UI users and anyone using MotherDuck as well.

25:08 So I want to go back to this quote, and how we got to this place, how we got to Instant SQL. In the late '60s, a piece of technology called the multi-track recorder became viable for recording studios. This was an amazing productivity enhancer in music studios, right? You could track different input signals separately and

25:33 mix them differently. You could track certain musicians the next day. It's really awesome for recording sound. But artists and producers also saw the creative possibilities in this, right? And they began experimenting with tape machines in really new and novel ways, doing things that could not be done in a recording studio designed just to capture the live sound of a bunch of

25:59 performing musicians. And all that experimentation that happened in the 70s and 80s completely changed the nature of music, right? It's easy for us to take that for granted, because almost all the music we listen to now owes its creative debt to that experimentation, right? It's just part of the fabric of music. It reminds me of that quote, right?

26:19 We shape our tools, and thereafter our tools shape us. I think that's what that quote is about. Now, back to data analysis. I don't know what comes after something like Instant SQL, but I hope you go and try it and tell us what you think. And maybe you can also push out the boundaries. We have no idea what people

26:39 are going to use it for, but we're really excited. So, thank you. [Applause]
