AI is relearning everything databases already knew

June 26, 2026 · 47:44

Hosted by Mehdi Ouazza, Jacob Matson · With Stephanie Wang (Staff Engineer, MongoDB (ex-MotherDuck))

AI is moving inside the database, and it's hitting the same problems databases already solved. Stephanie Wang (staff engineer at MongoDB, ex-MotherDuck founding engineer) joins Mehdi and Jacob to unpack Snowflake's Cortex AISQL, why AI inside SQL is really a query-optimization and cost problem, the compute crunch pushing RAM and inference prices up, multi-agent systems and model routing, and going slow to go fast when you build with AI.

Chapters
  • 0:00 Intro and guest intro (Stephanie Wang)
  • 1:35 AI inside SQL: Snowflake's Cortex AISQL
  • 10:14 SpaceX and AI: new business models for idle compute
  • 12:11 Why efficient compute is the new bottleneck
  • 13:43 DRAM and storage price hikes, and database margins
  • 18:40 Local AI models and sandboxing databases
  • 22:39 RadAgents: multi-agent reasoning for medical imaging
  • 25:29 Domain-specific vs general models
  • 27:46 Sakana Fugu and dynamic model routing
  • 30:12 Open weights vs open development
  • 32:47 Designing effective AI workflows
  • 33:43 The elephant and goldfish model
  • 38:08 Formal specs and keeping AI projects cohesive
  • 42:07 MPP: machine-to-machine payments
  • 46:56 Wrap-up

Show notes

Mehdi and Jacob Matson are joined by Stephanie Wang, a staff engineer on data infrastructure and storage at MongoDB and a founding engineer at MotherDuck, for a systems-first look at where AI and data are colliding.

The through-line: AI is moving inside the database, and as it does, it keeps running into problems databases already solved. It starts with Snowflake's Cortex AISQL paper, which makes AI operations native to SQL. Stephanie's key point is that the interesting part is not the new syntax, it's that an AI predicate is a black-box, expensive UDF, so "AI inside SQL becomes a query optimization problem." The optimizer now has to reason about how selective and how costly an LLM call is, and lean on adaptive model cascades (cheap models for the easy rows, escalate only the hard ones).

From there the compute thread: SpaceX renting out idle data-center capacity for inference, DRAM and storage prices spiking (Jacob's RAM tripled between buying it and gifting it), and Stephanie's framing that "compute is becoming the new database engine layer for AI." Databases already know memory pressure, caching, and data movement, so AI is relearning those lessons. Then local models and the push toward sandboxed, isolated execution for agents.

The back half is about building with AI well: RadAgents (a multi-agent radiology workflow with an orchestrator and synthesizer), Sakana's Fugu model router, open weights vs open development, and the "elephant and goldfish" method: use a long-context session (the elephant) to hammer out a design doc, then hand it to a fresh, memory-less session (the goldfish) to check whether the design actually stands on its own. Stephanie makes the case for formal specs (TLA+) and "go slow to go fast," because a vague design now produces a lot of wrong code very quickly. We close on MPP, a machine-to-machine payments protocol for agents (and whether it's just crypto rebranded).

Key takeaways

  • AI inside SQL is a query-optimization problem, not just new syntax. AI predicates are expensive black-box UDFs, and a single one can be orders of magnitude more costly than a normal predicate, so the optimizer has to price and plan around it.
  • Adaptive model cascades matter. Use small, cheap models for the obvious rows and only escalate the hard cases to the big models. An AI filter in SQL is a natural place to control which model runs.
  • Compute is the new database engine layer for AI. The wins are in scheduling, batching, memory bandwidth, caching, and cost control, not just picking a model.
  • AI is relearning what databases already knew. Memory pressure, data layout, and data movement are old database problems now hitting AI infrastructure, on top of real DRAM, storage, and electricity price hikes.
  • Go slow to go fast. A vague design makes AI generate a lot of wrong code that still compiles and passes tests. The "elephant and goldfish" method and formal specs (TLA+) force the design to be explicit before you write code.
  • The best way to use a cheaper model is to be the domain expert. Your context and prompts get a small model on the rails; a smart model just brute-forces missing context with a lot of tokens.

Transcript

0:01 Mehdi: Welcome to another episode of Explain Analyze, where we basically rewatch and reprint all the news around data and AI with data experts. I have with me today Jacob Matson. Jacob, how are you doing?

0:17 Jacob: I'm doing great, Mehdi. Nice and cool here in Seattle, by the way. It's sixty degrees.

0:21 Mehdi: Yes, okay, sixty. Here it's around ninety-eight Fahrenheit, something like this, in my room right now. So if you don't know him, Jacob is, of course, DevRel at MotherDuck. If you haven't watched any of MotherDuck's content, then maybe you don't know him, but he's been around webinars and podcasts for a long time. And of course we have a special guest today, who also used to work at MotherDuck: Steph, Stephanie Wang.

0:25 Jacob: Man, it is hot.

0:50 Mehdi: Steph, how are you doing?

0:51 Stephanie: Hi, I'm doing well. Thanks for having me.

0:54 Mehdi: So Steph, where are you working right now?

0:57 Stephanie: Currently I'm a staff engineer working on data infrastructure and storage systems at MongoDB. But as you said, I was a founding engineer at MotherDuck. I had a great time at MotherDuck, so I'm so happy to be back.

1:12 Mehdi: Yeah, so it's great to have your perspective now that you've gone to another database company, and we'll get to that. Do you want to start on the links? You brought one about a paper from yet another database company, which is Cortex, from, I believe, looking at all the names, Snowflake. So why did you pick this link? What resonated with you there?

1:35 Stephanie: Yeah, so to prepare for this podcast I shared three links. Not because they're the three most recent links I looked at, but because I do think there is some kind of connection among them. I usually think about AI from more of an infrastructure and systems perspective, probably because of my background, which is mostly in databases, distributed systems, and analytical data platforms. So this first paper is along those lines. Maybe I can quickly explain what the paper is about. It is, I think, going to be presented at SIGMOD later this year, in September, and it's Cortex AISQL, invented by the Snowflake team. Essentially, the core idea is that Snowflake is trying to make AI operations native to SQL. So instead of exporting data out of the warehouse, writing scripts or calling LLM APIs row by row on your data, and then loading the results back again, they just expose AI functions directly inside SQL. So you can write queries that do things like, and these are examples they provide, classify customer feedback, filter rows where the customer sounds frustrated, summarize support tickets by product, and join a transcript to a product catalog based on some type of semantic reasoning rather than just string equality. I think this is a very cool idea, and it's really compelling for user experience because it allows people to combine structured and unstructured data in one declarative SQL query. But I think the most interesting part of this paper is not just the syntax change. It's that AI is now inside SQL, and it becomes a query optimization problem. I think that's really interesting.

4:01 Mehdi: Hmm. Yeah. So I think Jacob ticked when you said, you know, context layer, because he's been busy on that. Were you aware of this paper, Jacob? Do you have a take on it, on this kind of interface that you put there to provide context on specific queries?

4:10 Jacob: Mm-hmm. I mean, I think the architecture broadly makes a lot of sense, right? We got really far in the modern data era by decoupling all of these pieces and then integrating them, choosing best of breed. And, you know, the pendulum swings back on bundling versus unbundling. So we're bundling these things, and I think it makes a lot of sense. Honestly, especially for inference at scale, where you need to use a database like Snowflake, it makes a lot of sense to put this somewhere where you're not going to hit the rate limits you would on a REST API, where you can't just hammer a public endpoint with a million rows at once. So I think the specific implementation makes a ton of sense. Really cool to see that this is how they're pushing it. Obviously MotherDuck has had alternative implementations of this, where we're using third-party LLM providers instead of doing it first party, but

5:36 Mehdi: Yeah. And we do have a prompt function, similar, but there is no

5:39 Jacob: Yeah, I think we had it first, actually, so, you know.

5:42 Mehdi: Yeah. I do think, to give credit to the OGs as well, BigQuery had stuff to run ML pipelines before the LLM era, where you could do it within the SQL syntax. So for the listeners, the syntax I'm displaying is basically: you have a SELECT of transcript ID and sales agent FROM sales transcripts, then a WHERE, and then there is an AI filter, a custom SQL function, wrapping a prompt with a question: in this sales transcript, does the customer become irritated? Like Stephanie explained in the example. My concern here, aside from what's being built on the other side, Cortex with the API service and so on, is that we've spawned yet another SQL dialect, and there doesn't seem to be much of a standard around this. Do you have any concern about that, Stephanie? I'm curious specifically because you're looking at SQL, and MongoDB has been known for its NoSQL interface. You do have a SQL interface, as far as I know. Maybe you can bring me up to date on those topics, and on how challenging it is to build new SQL syntax that is off standard.

7:06 Stephanie: So I think there are two parts to it. One part is the syntax: they introduced new SQL syntax to support it. But on the other hand, what's even more fundamental is that, as they pointed out, these are AI operators under the hood, and AI operators are essentially expensive UDFs, or they can be. In a normal, traditional database, the optimizer would usually reason about predicates, joins, indexes, statistics, and cost. But for an AI predicate that's slightly different. The database doesn't naturally know how selective that predicate really is. If AI comes in and runs some LLM query, maybe 1% of the rows match, or maybe 80% of the rows match. And it doesn't really know how expensive each of those invocations can be, especially if it goes to a large model, and that can really drive up the cost. So the argument here is that you get this black-box operator sitting in your query plan, and it may not be just slightly more expensive than a normal predicate. It can be orders of magnitude more expensive. So the argument is really to change the optimizer's job to take that into account. Thinking from that perspective, I think this really goes beyond a syntax change.
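
To make the optimizer point concrete, here is a minimal Python sketch of predicate ordering when one predicate is an expensive AI call. It is not how Cortex AISQL plans queries. The `Predicate` class, the cost units, and the selectivity numbers are all made up for illustration, and the predicates are assumed to be independent.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Predicate:
    name: str
    cost_per_row: float   # hypothetical cost units per evaluated row
    selectivity: float    # estimated fraction of rows that pass


def expected_cost(order, n_rows):
    """Expected total cost of applying predicates in the given order."""
    total, surviving = 0.0, float(n_rows)
    for p in order:
        total += surviving * p.cost_per_row  # pay for every row still alive
        surviving *= p.selectivity           # only matching rows move on
    return total


def best_order(preds, n_rows):
    """Brute-force the cheapest ordering (fine for a handful of predicates)."""
    return min(permutations(preds), key=lambda o: expected_cost(o, n_rows))


# Illustrative numbers only: a cheap date filter and an LLM-backed filter
# assumed to be four orders of magnitude more expensive per row.
date_filter = Predicate("call_date >= 2026-01-01", cost_per_row=1.0, selectivity=0.10)
ai_filter = Predicate("AI: customer sounds irritated", cost_per_row=10_000.0, selectivity=0.20)

n = 1_000_000
ai_first = expected_cost((ai_filter, date_filter), n)
cheap_first = expected_cost((date_filter, ai_filter), n)
plan = best_order([ai_filter, date_filter], n)

print(f"AI filter first:    {ai_first:,.0f}")
print(f"cheap filter first: {cheap_first:,.0f}")
print("chosen order:", [p.name for p in plan])
```

With these assumed numbers, running the cheap filter first means the AI predicate only sees the rows that survive it. The catch Stephanie describes is that the real selectivity and per-call cost of the AI predicate are not known up front, so the optimizer has to estimate them.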

8:41 Mehdi: Yeah, no, that's true. I think it's because today we are mostly interacting. So my question behind this is: who is going to write this? Is it a human, or is it yet another agent? What's your thinking on that?

8:55 Stephanie: I think it could be either. I don't think this has to be written by a human. It most certainly can be written by an agent. In fact, agents might write more SQL, or more verbose SQL, or just generate more volume of it. And it's potentially more dangerous for those types of queries if you don't have this kind of handling within your query planning. So I think it's really a way to help with cost control. Another thing they mention is that it's helpful to use adaptive model cascades, which means you don't always need to use the most expensive model for every row. You can potentially use much smaller and cheaper models for certain obvious cases and only escalate complex things to the bigger models. So that could be helpful there as well.
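
The cascade idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `small_model` and `large_model` are hypothetical keyword-based stand-ins for real model calls, and the 0.8 confidence threshold is arbitrary.

```python
from typing import Callable, Tuple

# A "model" here is any callable returning (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[bool, float]]


def small_model(text: str) -> Tuple[bool, float]:
    """Hypothetical cheap classifier: confident only on obvious rows."""
    lowered = text.lower()
    if "furious" in lowered or "unacceptable" in lowered:
        return True, 0.95
    if "thank" in lowered or "great" in lowered:
        return False, 0.90
    return False, 0.50  # unsure, so the cascade should escalate


def large_model(text: str) -> Tuple[bool, float]:
    """Hypothetical expensive model; a stand-in for a real LLM call."""
    lowered = text.lower()
    return ("again" in lowered or "still" in lowered), 0.99


def cascade(rows, cheap: Model, expensive: Model, threshold: float = 0.8):
    """Label every row, escalating only low-confidence rows."""
    labels, escalated = [], 0
    for text in rows:
        label, conf = cheap(text)
        if conf < threshold:
            label, _ = expensive(text)
            escalated += 1
        labels.append(label)
    return labels, escalated


rows = [
    "This is unacceptable, I am furious.",
    "Thanks, that was great help.",
    "I have called about this again and it is still broken.",
    "Can you resend the invoice?",
]
labels, escalated = cascade(rows, small_model, large_model)
print(labels, f"escalated {escalated} of {len(rows)} rows")
```

In this toy run only the two ambiguous rows reach the expensive model. The threshold is the knob that trades cost against accuracy.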

9:57 Mehdi: Yeah, this is something I didn't think about: if you have another entry point for an AI agent, like with this AI filter, then you can balance what kind of model you're using there, instead of shooting Opus 4.8 or even Fabble at just a SELECT * LIMIT 5 query. All right, next. I think this one is yours, I believe, Jacob?

10:26 Jacob: Yes, absolutely. Okay, so this is about SpaceX, which has been in the news for a couple of reasons. The first is obviously that they went public. Congratulations to them. The second is that, as it turns out, they're making a lot of money on things that seem not core to the mission, right? Number one here is this article about the open-source AI startup Reflection signing a deal to serve inference on SpaceX-owned data centers. That's worth, what is that? Eight six six billion dollars. First of all, that's a lot of money, and they've signed a lot of these deals. I called this out because I think it's interesting. A lot of people have made pretty big bets on AI from a business perspective, and what we've seen SpaceX do is make a big bet on hardware. They also made a big bet on AI with xAI, or Grok, I can't remember what their model is called, which is not going well. So they had a lot of idle hardware that I think they were intending to use for their own inference, and now they're like, you know what? Other people can use this. I think it's interesting because for a while we were kind of feeling like the only winners are OpenAI and Anthropic, but there are other ways. And we'll see if this holds longer term, right? Is this a durable advantage? Is it just that SpaceX is really great at building and deploying data centers at scale? I have no idea. But certainly it's interesting to see that lots of big companies are signing lots of big deals with SpaceX for compute.

12:11 Mehdi: How do you think about compute these days, Steph, and the challenges for your database? We just talked about the query plan and how we optimize at the database level, but if we go a layer down, what are the challenges you see in that context?

12:32 Stephanie: I think that compute is not just, you know, the more the merrier. The focus really should be on how efficiently we can turn expensive hardware into effective, low-latency work. So I definitely think that compute is becoming kind of the new database engine layer for AI, almost. People love talking about models, for sure, but we do need the ability to execute workloads efficiently, and that goes beyond just the model part. It means scheduling, batching, memory bandwidth, networking, storage locality, caching, utilization. And on top of that, knowing your cost and controlling it. So I think the increased demand is also going to make infrastructure-level and systems thinking more important.

13:43 Mehdi: Yeah, and actually I have a good segue on this one. Everything is connected today. And just as a reminder, we didn't read each other's links, right? Everybody's bringing their own links, and apparently we're just speaking about hardware and databases today. So this was from The Register early this month: Expect more of those DRAM price hikes as memory shortage continues to bite. What made it a revelation for me is that there's a Prime Day going on on Amazon, and I was like, okay, let me buy, actually just an NVMe. I think it's not just DRAM, it's also storage. I wanted to buy another NVMe drive of about four terabytes, and I realized the same one is roughly forty percent more expensive. Maybe there's more bandwidth and so on, but I was like, what the hell is happening? These are supposed to be discount days. Then I looked into this a bit, and they said we should expect another sixty-three percent increase this quarter. And I'm curious, because at MotherDuck, to be honest, I haven't thought too much about it, because there are so many things to think about when you have a database, right? And now it's like, wait, how are you going to deal with your margin? You were talking about efficiency stuff around your database. Are there specific projects you can call out, or technical things you did, to reduce those costs as hardware expenses skyrocket?

15:28 Stephanie: Yeah, to be honest, I'm not surprised by this article at all. I guess none of us was really surprised, because we're in databases and we know that memory pressure is not new to database systems: cache behavior, data layout, data movement, all of that. So this just feels like, okay, AI is learning the same lessons databases already knew. I think it's really about building software that can work efficiently within your hardware constraints, knowing what your limits are, and making sure things are bounded so you don't run into unexpected situations and can plan ahead.
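
One small, concrete instance of "making sure things are bounded" is a cache with a hard memory budget. The sketch below is a generic LRU cache in Python, not anything from MongoDB or MotherDuck. `BoundedCache` is a hypothetical name and the sizes are caller-supplied estimates.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard byte budget (sizes are caller-supplied estimates)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> (value, size), oldest first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size: int) -> bool:
        if size > self.max_bytes:
            return False  # refuse rather than blow the budget
        if key in self._items:
            self.used -= self._items.pop(key)[1]
        while self.used + size > self.max_bytes:
            _, (_, old_size) = self._items.popitem(last=False)  # evict LRU
            self.used -= old_size
        self._items[key] = (value, size)
        self.used += size
        return True


cache = BoundedCache(max_bytes=100)
cache.put("page_a", "A", 60)
cache.put("page_b", "B", 30)
cache.get("page_a")            # touch A, so B becomes least recently used
cache.put("page_c", "C", 40)   # over budget: B is evicted to make room
print(cache.used, cache.get("page_b"))
```

The point of the design is that memory use can never exceed `max_bytes`, whatever the workload does, which is what lets you plan capacity ahead of time.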

16:26 Mehdi: Jacob, what's your feeling there? Have you done some pragmatic things to keep your workloads bounded, or are you not really looking at that at the moment?

16:34 Jacob: I mean, it's so funny. Most of the stuff I work on is not running into those types of problems in general. But I do think it's interesting to frame this within the context of DuckDB, right? If you think about why it exists and how it came to be, it's totally just an arbitrage of the fact that we had a bunch of idle hardware. And now we're in the situation where we have more work to do on our computers than we have hardware available, which is new. That was not something that existed five years ago, right? It was like, all right, we've got all this idle compute, great, we don't really have anything to do with it. Now it's like, all idle compute goes to AI labs. So it's really interesting to think about. I don't know what the implication is from a DuckDB perspective, or just a database perspective, but certainly it means that everything gets more expensive, because we got to take advantage of the fact that RAM was cheap. I'll tell a funny story about this. Last summer I was building a mini PC, and I bought some RAM and it didn't fit, and I was like, this is annoying. I had 96 gigs of RAM on two sticks, and I just had it sitting there. Then my daughter was like, I would like a laptop for my birthday, so I built her one, and we put that RAM in there. But in the time between when I bought it and her birthday, I think the price of that RAM tripled. So now she has a Framework laptop with 96 gigs of RAM in it, and she's, you know, watching YouTube videos or whatever.

18:06 Mehdi: So it's an expensive laptop now. Yeah.

18:18 Stephanie: And honestly, to that point, it's not just RAM that's getting more expensive, right? Even electricity itself is getting more expensive. I live in New York City, and it feels like the electricity price goes up every few months or so because of the data centers and all these new needs created by AI.

18:40 Mehdi: Yeah. Okay, right. So this is another link. It's actually not a promotion for me, it's just that there were interesting comments. It's related to cost, from another angle: I said that AI labs will stop selling you cloud access. That was my bold prediction for next year: they will provide some kind of license with a model that you can run on your side, maybe less powerful, but able to do everyday tasks. The two reasons I mentioned were that open-source models are getting good enough, and the hardware we have under our desks is getting better. We mentioned DuckDB. I'm curious, Steph, to hear your take on that. We're talking about local AI, and we know DuckDB can run locally. Do you think we can expect other databases to run more locally, in combination with a server approach?

19:51 Stephanie: I think there is definitely more of a push towards that, but more like running databases in a containerized, isolated environment, so it can run different experiments and make sure it doesn't affect other... yes, exactly, sandboxing. I've seen a lot more of that, not just within databases but also in AI infrastructure. There are lots of companies now building different types of sandboxes with different levels of isolation. Some of them go all the way down to the physical hardware, and the sandbox can even reach out to credit card or financial services while making sure everything is absolutely sandboxed. So I've definitely seen a lot of that, and I think there's a real need for it, just because we need to be able to control AI behavior and run lots of fast iterations using AI. So, yeah.

20:54 Mehdi: Yeah, that's a good take. I've been thinking about it, but I didn't know that labs or services are going to need to sandbox stuff that used to be server-side only. Jacob, do you see that happening with other services too, or are you not really believing in that trend?

21:16 Jacob: I don't actually know if I can really speak to it. I really hope we do get a local model like you talk about here, one we can run on our laptops to help us with some of this stuff, mostly because then we unlock all of the smaller pieces of compute, including sandboxes. But so far I haven't really seen anything that's close yet at a reasonable number of tokens per second. The flip side of this, and this is not the danger but the loop on AI that's really interesting, is that once you figure out how to run something in a harness, you just keep zooming out. It's like, well, now I need a sandbox for that one, and then I'll have an orchestrator that runs all of that, and then I'll have an orchestrator for that orchestrator. I'll just give business goals to that and it's going to cascade them down to everything else. So in general you end up needing way more tokens per second than you would expect, because once you build an autonomous system, you can just start automating everything. I think that explains some of the demand. But on the flip side, as a human user, I'm really, really hoping that we do get local models that are powerful enough to help us with tasks on our machines.

22:39 Mehdi: Right, next link. What did you bring? RadAgents, Steph. Can you talk us through that? From Oracle. Interesting. I didn't know they were still in the game.

22:51 Stephanie: Yes. This is a very interesting paper. I saw it at the first ACM conference for agentic systems recently, in May. It's a paper about using multiple agents for chest X-ray interpretation. Obviously I'm not a radiologist, so I can't evaluate the medical side of this. But as an infrastructure engineer, I do think the architecture proposed here is very interesting. The key idea is that they use different agents structured around a real expert's workflow. They have five different domain-specific agents doing very domain-specific things in the medical sense. Then they have an orchestrator that decides what needs to be done, and a synthesizer that combines the outputs, checks the context, and resolves conflicts among these agents. The interesting thing is that it also logs intermediate artifacts as these agents run. So rather than using a single model that just goes end to end and says, okay, let me perform these steps, here's the answer, and you can trust me on it, you actually get traces of what was inspected, what measurements were taken, what tools were called during the process, and how the final answer was assembled. I think that's a nice thing we can learn from as we build other types of systems.
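
The shape Stephanie describes (specialist agents, an orchestrator, a synthesizer, and a logged trace) can be sketched as below. This is a toy illustration, not the RadAgents implementation. It uses two made-up specialists instead of the paper's five, and every name, rule, and threshold in it is hypothetical and has no clinical meaning.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Intermediate artifacts logged as the agents run."""
    steps: list = field(default_factory=list)

    def log(self, agent: str, output: str) -> None:
        self.steps.append((agent, output))


# Hypothetical specialist agents; real ones would call models and tools.
def lung_agent(study: dict) -> str:
    return "opacity in right lower lobe" if study.get("opacity") else "lungs clear"


def heart_agent(study: dict) -> str:
    ratio = study.get("cardiothoracic_ratio", 0.0)
    return "enlarged cardiac silhouette" if ratio > 0.5 else "heart size normal"


AGENTS = {"lungs": lung_agent, "heart": heart_agent}


def orchestrate(study: dict, requested: list, trace: Trace) -> dict:
    """Run the requested specialists and log each intermediate output."""
    findings = {}
    for name in requested:
        findings[name] = AGENTS[name](study)
        trace.log(name, findings[name])
    return findings


def synthesize(findings: dict, trace: Trace) -> str:
    """Combine specialist outputs into one report and log the final step."""
    report = "; ".join(f"{k}: {v}" for k, v in sorted(findings.items()))
    trace.log("synthesizer", report)
    return report


trace = Trace()
study = {"opacity": True, "cardiothoracic_ratio": 0.45}
report = synthesize(orchestrate(study, ["lungs", "heart"], trace), trace)

for agent, output in trace.steps:
    print(f"{agent}: {output}")
```

The useful part is the `Trace`: the final report comes with a record of which agent said what, so a reviewer can see how the answer was assembled instead of having to trust an end-to-end output.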

24:36 Mehdi: Yeah. What I like here is that today we often think about solving a problem with one large language model and not specific domain experts. I feel it's not in the incentive of the big AI labs to have us use more dedicated domain models, or maybe they'll do B2B contracts, like, okay, let's build a specific domain model for a specific large company. But for me, day to day, I'm curious how I could apply this to smaller tasks today. I don't know if you had some inspiration there, Jacob, on having smaller models for small stuff or narrow things. I do it indirectly with images and so on. I'm not using Claude for images, for example. But what's your take on that, Jacob?

25:38 Jacob: Yeah, I mean, I've been doing a bunch of eval work related to text-to-SQL, if you think about it that way. But really it's more: can we get answers from a natural-language question, and SQL is just the substrate for that. I've been doing a lot of work along those lines, and I've definitely found that for specific tasks you may want to train a model to be excellent at that task. And when I say train, that's the wrong word. What I really mean is tune your prompts so that that specific low-cost agent has all of the context it needs to be really effective at that job. The hard part about this, honestly, is that the first thing you need to bring to it is being a domain expert. The domain expertise is actually super important in honing the model. The best way to use a dumber model is to be a domain expert in the domain you're in, because that means you can really get it on the rails and use your expertise to add the context it's missing. Whereas a really smart model just overcomes the fact that it doesn't have context by being really smart. That's fine, but it's also really expensive. It uses a ton of tokens. So I think using a hybrid approach is clearly effective. Actually, we have a related article to this, which is one I submitted, so we can talk more about it then. But I think this type of framework is proving effective not only in the academic, or in this case healthcare, environment, but also in business, and in academia too. So it's really cool.

27:29 Mehdi: Did you submit it for this session, or did you mean

27:32 Jacob: It is. It's the Sakana Fugu article.

27:34 Mehdi: Okay. Tell us all about it. I think it's a good segue.

27:38 Jacob: Sure, perfect. So this is an AI lab that is in Japan, I believe, and they just dropped this model.

27:46 Mehdi: I love the animation. If you're listening, you need to come on YouTube just to watch this animation. It's pretty beautiful.

27:53 Jacob: So the website is Sakana.ai, S-A-K-A-N-A.ai, and they just launched this model. I don't know what they trained on or fine-tuned here, but they basically said, what if we use a model to orchestrate multiple models? So they built their own model router as the entry point into the model. I haven't used it yet. I'm excited to try it out and do a little work on it. But I think this model from Sakana is the packaged version of the paper, or part of that, right? So it's been really cool to see. Supposedly it gets really good scores on the benchmarks. Who knows how much of that is benchmaxing versus real performance, but it seems like a really cool approach. It makes sense. I remember, I think it was about 18 months ago, DeepSeek launched reasoning, and everyone was like, we just need this, and everyone implemented it immediately. I think this is probably the next thing we're going to see all the labs put into place: dynamic model routing behind a single endpoint, figuring out how to make it work, how to make the caching work, and all the things you need to do to actually run this at scale. So I think this is really cool. It's available now. I need to play with it on OpenRouter. I think it's pretty affordable. Well, affordable relative to Opus. But anyways.

29:21 Stephanie: This is very interesting, and I can't believe it's basically along the same idea as the RadAgents paper. It looks like something that's sort of a general-purpose orchestrator, responsible for routing your prompt to specific models for the job. Then it would verify intermediate outputs and also synthesize the final answer, similar to what RadAgents is doing. I think it's most definitely a very interesting development angle moving forward, especially given that this type of orchestrated framework can produce even better results and potentially cut down the cost. So, yeah.
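
The "single endpoint that routes to the right model, with caching" idea Jacob and Stephanie discuss can be sketched as follows. This says nothing about how Sakana's router actually works. The model names, costs, and keyword-based routing rule are all invented for illustration, and a real router would itself be a learned model.

```python
from functools import lru_cache

# Hypothetical model registry: name -> (cost per call, handler).
MODELS = {
    "code-small": (1, lambda p: f"[code-small] {p}"),
    "math-small": (1, lambda p: f"[math-small] {p}"),
    "general-large": (20, lambda p: f"[general-large] {p}"),
}


def pick_model(prompt: str) -> str:
    """Toy routing rule standing in for a learned router."""
    lowered = prompt.lower()
    if "def " in lowered or "sql" in lowered:
        return "code-small"
    if any(ch.isdigit() for ch in lowered):
        return "math-small"
    return "general-large"


calls = []  # records which backend actually ran, to make caching visible


@lru_cache(maxsize=1024)
def complete(prompt: str) -> str:
    """Single entry point: route, call the chosen backend, cache by prompt."""
    name = pick_model(prompt)
    calls.append(name)
    _cost, handler = MODELS[name]
    return handler(prompt)


first = complete("Write SQL to count orders per day")
second = complete("Write SQL to count orders per day")  # served from cache
third = complete("What is 17 * 23?")
fourth = complete("Summarize this support ticket")

total_cost = sum(MODELS[name][0] for name in calls)
print(calls, "total cost:", total_cost)
```

Only one of the four requests reaches the expensive backend, and the repeated prompt never reaches a backend at all. Exact-match caching like this is the simplest case, and running it at scale is the hard part Jacob points to.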

30:12Mehdi: I'm just a bit pessimistic about big AI labs wanting to own everything. It's always the same story, right? Claude Code, for example, was really open in the sense that you could use an API token, and then they changed the pricing multiple times, because they basically want people to stay in their environments. And the reality here is that we have different models that are good at different things, right? So yeah, I'm a bit pessimistic: they have no incentive to do that. Maybe they're going to be forced to kind of open up. But what's your take on that, Steph? I'm also curious, in your daily work, do you actually use various AI models, or do you mostly stick to one large one?

31:03Stephanie: Yeah, so first, on open models and open development. I think there are different levels of openness when it comes to AI models as well, right? The way we describe open source models today is mostly open weights, but we're really not seeing the rest, like how the model was trained on its data and how inference is done, all of that stuff. To actually come up with a model, you have to go through a lot of different steps and you end up making lots of mistakes, right? But I do think that it's somewhat analogous to open source projects and, just in general, how software evolved in the last few decades. I'm hoping that AI models will follow a similar trend, where open development is going to drive more innovation in this domain. And in terms of domain-specific models versus more general-purpose ones, I think it's still unclear which one is gonna end up winning, but I think that for different use cases you probably have a need for different types of models. So we might just still see a plethora of different types of models in the end. And day to day, I use different types of models as well, depending on the task, right? If it's something straightforward, I just use Claude Sonnet models. For a lot of my performance analysis, for example, I don't even use the more advanced models, because I don't feel the need to. The model does relatively basic things, and you've already come up with the knowledge it needs to perform its task, so that reduces the need to use a super complex model. And half the time you can't even just rely on the model to figure it out anyway, right? You still have to really provide it with the design, the key ideas, and the context for it to perform well.
Actually, this really segues well into my last paper, the Google Research one on the elephant and the goldfish. Look at us. It's just a smooth transition. Yeah, so this is something that recently came out of Google Research. So, you know, essentially

33:28Mehdi: Okay. Do you wanna talk about it?

33:43Stephanie: this is an article talking about the elephant and goldfish model for working with AI. The key idea is that AI can generate code very quickly, but that's not automatically good, no matter what kind of model you're using. In fact, really fast code generation can make things even worse if the design itself is vague. So the idea here is that the elephant is the long-context AI session, and you use that session to explore the problem, debate with the AI, ask questions, and figure out all of the assumptions, edge cases, trade-offs, etc. Then you produce a detailed design document at the end, before writing code. The goldfish is a brand new AI session with no memory whatsoever. You give the goldfish only this design document and you ask, can you understand what we're trying to build? If the goldfish is unable to understand the design from the document alone, that means the design is just not good enough, so you have to go back and revise your design before jumping into coding. I think this is also intuitive, because this is how human engineering teams work today anyway, right? We write design documents and share them with other engineers, and the design document is good only if someone who was not in every meeting with you can still understand the system, the constraints, the invariants, the rollout plan, disaster recovery, etc., right? So I think this is just a nice summary of how to work effectively with AI.
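
The goldfish step can be sketched as a small gate that runs before any code is written. Everything here is an assumption made for illustration: the `FreshSession` callable stands in for a brand-new AI session with no memory, the UNDERSTOOD/QUESTIONS reply convention is invented for this sketch, and the topic list is taken from the items mentioned in the conversation.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a brand-new, stateless AI session is any callable
# that maps one prompt to one reply.
FreshSession = Callable[[str], str]

# Topics a reader should be able to recover from the document alone.
REQUIRED_TOPICS = ["constraints", "invariants", "rollout plan", "disaster recovery"]


def goldfish_prompt(design_doc: str) -> str:
    """Build the only input the goldfish gets: instructions plus the document."""
    return (
        "You have no other context. Using only the design document below, "
        "reply with UNDERSTOOD on the first line if you can explain what we "
        "are building, otherwise reply with QUESTIONS and list them.\n\n"
        + design_doc
    )


def missing_topics(design_doc: str) -> List[str]:
    """Crude textual check for topics the document never mentions."""
    lowered = design_doc.lower()
    return [topic for topic in REQUIRED_TOPICS if topic not in lowered]


def goldfish_review(design_doc: str, session: FreshSession) -> Dict[str, object]:
    """Ask a memoryless session to read the document and report whether to proceed."""
    reply = session(goldfish_prompt(design_doc))
    understood = reply.strip().upper().startswith("UNDERSTOOD")
    missing = missing_topics(design_doc)
    return {"ready_to_code": understood and not missing, "missing": missing, "reply": reply}


def stub_session(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs offline.
    return "UNDERSTOOD\nA retry-safe job queue."


doc = "Constraints: ... Invariants: ... Rollout plan: ... Disaster recovery: ..."
print(goldfish_review(doc, stub_session)["ready_to_code"])
print(goldfish_review("Build a queue.", stub_session)["missing"])
```

If `ready_to_code` comes back false, the loop goes back to the elephant session to revise the document rather than forward into implementation.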

35:34Mehdi: Yeah, I've actually also used GitHub Spec Kit, right? Where they basically force you to build this heavy design document and then you do the tasks one by one. But I feel like there is no clear separation like you mentioned with the goldfish and the elephant; basically it's just one single big model. Jacob, have you done such a workflow, with a clean new session? I mean, Claude now supports subagents, but what always annoys me with such a harness, and I understand it's there to make it easier for the user, is that you don't really know what is happening, what has been launched. Claude Code is a bit limited in that respect. Do you have experience with this type of workflow, Jacob?

36:24Jacob: Yeah, I've done a little bit of this. The thing that I think about is I always wanna leave my project better than I found it. What that actually means is, if I have a long, detailed plan and I'm going to implement it in phases, what I'm really doing is writing my CLAUDE.md or my AGENTS.md file as the evolving doc that partners with that plan, along with whatever memory might be in the harness, let's say Claude Code. I've found that really, really effective in terms of maintaining context as we work on something that may take multiple days or weeks to deliver. I mean, again, I work in DevRel. The scope of the work that I'm doing is not production software, broadly, although there's stuff that we produce that lots of people use. But I'm not working on database internals. So I've found that really effective: you want to keep it focused on what you're trying to solve, but you also need to give a bunch of other context, because the challenge with these LLMs is they're really good at solving a specific thing, but they're really bad at zooming out into the architecture. Right? If you just tell an LLM to build something for you, you get a really gross architecture. None of the decisions are cohesive with each other. So thinking about how you make it cohesive is important, and I think there's still lots of room for human design in that too. But yeah, that's what I've found is effective.

38:08Mehdi: Do you spend a lot of time, Steph, on designing skills for cohesive usage across teammates, or other things like this?

38:17Stephanie: Yeah, absolutely. I think so. Because if you had a vague design before AI, that would most likely just slow down your implementation, because your coworkers are gonna ask follow-up questions, right? They clarify, you get to clarity, and then you go forward with actually writing the code. But the thing with AI is that a vague design just produces a lot of wrong code very quickly. It's actually quite dangerous, because the output is gonna look real, right? It's gonna compile, it's gonna have tests, and it's gonna give you a pretty confident explanation of how well it's accomplished its task. However, it might have violated an invariant that was just never written down. I think that's also why I've started to realize that TLA+ and formal specification can really help with coming up with a more explicit design at the right level of abstraction. Because ahead of jumping into implementation, and even before finalizing your design and making it very clear, you're thinking very specifically about: what is my invariant? What happens during a partial failure? Who owns this state? Is this thing safe to retry? Is this thing idempotent? How do we handle disaster: if data loss happens, how do we recover from it? Things like that. So it really forces you to think about the system states, the transitions that are allowed, and the properties that have to hold throughout. So that's another thing that I'm excited about: integrating more formal specification into my day-to-day design and thinking.
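
Two of the questions raised here, "what is my invariant?" and "is this safe to retry?", can be written down explicitly even without TLA+. The following is a small Python sketch, not a formal specification: the `Ledger` class, its idempotency keys, and the two invariants (balances never go negative, total money is conserved) are hypothetical examples picked to show invariants stated in code rather than left implicit.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Ledger:
    balances: Dict[str, int]
    # Idempotency keys of requests that have already been applied.
    applied: Set[str] = field(default_factory=set)

    def total(self) -> int:
        return sum(self.balances.values())

    def transfer(self, key: str, src: str, dst: str, amount: int) -> bool:
        """Apply a transfer once; a retry with the same key is a no-op.

        Returns True if the transfer was applied, False if it was a repeat.
        """
        if key in self.applied:
            return False  # safe to retry: the request already took effect
        if amount <= 0 or self.balances.get(src, 0) < amount:
            raise ValueError("transfer would violate the non-negative balance invariant")

        before = self.total()
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self.applied.add(key)

        # Invariants that must hold after every allowed transition.
        assert self.total() == before, "money is neither created nor destroyed"
        assert all(value >= 0 for value in self.balances.values())
        return True


ledger = Ledger({"alice": 100, "bob": 0})
print(ledger.transfer("req-1", "alice", "bob", 30))  # first attempt applies
print(ledger.transfer("req-1", "alice", "bob", 30))  # retry changes nothing
print(ledger.balances)
```

A formal spec would state the same properties over all reachable states instead of checking them at run time, which is where a model checker goes beyond assertions like these.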

40:08Mehdi: Yeah, no, I think the point you mentioned is true: it's going so fast when you generate the code that it sometimes feels like you don't need to think about all those questions you mentioned. And I think the danger is that the LLM often takes that as a default, right? It's not asking any questions. I feel like the plan mode of Claude Code, I don't know what the system prompt around planning is, but I feel like it just says: ask at least three questions. It's often like this, maybe. And that's already doing much better, right? But it's exactly as you said: it needs to think about various things before jumping towards a solution, which will still be green, but without a lot of cases taken into account.

40:52Stephanie: I would definitely go slow to go fast here: really spend the extra time during the design phase until you can even say, we are good with the design now. It's not just letting the agent drive that, right? It's you working with the agent to come up with a design that is not vague and that is correct with respect to your requirements.

41:22Mehdi: Yeah. And you can afford that now, I would say. Because before, you'd have human conversations over and over and then say, let's just implement this, and the implementation was taking a lot of time. So you wanted to cut those discussions to go as quickly as possible, because you had like fifty percent discussion and fifty percent implementation. But if the implementation is actually just twenty percent, then you can still spend time on the discussion, yeah.

41:50Stephanie: Yes, in fact, when you've spent the time on this design, you can use that as your execution plan almost, right? It's almost a very natural next step to execute on this and just write the code. That becomes way more trivial compared to knowing what to write at all.

42:07Mehdi: Cool. I have another link here, which is a bit of a side topic, but I found it super interesting. I was at a conference where Stripe talked about MPP. Have you heard about it, Steph?

42:23Stephanie: I have not. Is this an acronym for machine...

42:25Mehdi: Okay. So yeah, it's a machine-to-machine payments protocol. The idea, basically, is that they're defining a new kind of protocol so that an agent can do payments on your behalf. You fund a wallet, right? You say, this is your pocket money, my agent. And it can do whatever it wants with any service that enables that. Right now you need to enter your credit card and your information; here you don't need to do that. So that's the idea. They also have a website called mppscan.com, I believe. Let me show you. So this shows some known services and what volume there is at the moment of agents doing machine-to-machine payments using that protocol. And I just found it fascinating, because I wonder what it looks like in the database world, where it's a lot of B2B. You sign a contract to say, hey, I'm gonna take this monthly subscription to MongoDB or MotherDuck and pay that. But what happens here is it's actually per request, and the agent just goes there and gets charged directly for what is needed. So yeah, I was curious to hear. You see DeepSeek is also enabling that, and others. So what does that make you think, Steph? What's your take?

44:05Stephanie: This is very interesting. So is the basic idea here that agents can now spend money? Okay.

44:13Mehdi: Yeah, yeah. So a human will fund a wallet, right? And you give your agent the wallet. But then you don't need to go to Amazon and create an account or whatever. It does the payment for you, and with any service that supports that, it will be able to do this payment. You don't need to create the account or fill in the payment information yourself.
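
The "pocket money" idea can be illustrated with a toy wallet that a human funds once and an agent then spends from, one request at a time. This is not the MPP protocol or any Stripe API: `AgentWallet`, its per-request cap, and the service name are hypothetical, and the cap in particular is an assumption added for the sketch rather than something mentioned in the conversation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AgentWallet:
    budget_cents: int            # funded once by the human
    per_request_cap_cents: int   # assumed guardrail, not from the source
    history: List[Tuple[str, int]] = field(default_factory=list)

    def pay(self, service: str, amount_cents: int) -> bool:
        """Charge one request to the wallet; refuse instead of overspending."""
        if amount_cents <= 0:
            return False
        if amount_cents > self.per_request_cap_cents:
            return False  # single charge is larger than the human allowed
        if amount_cents > self.budget_cents:
            return False  # pocket money has run out
        self.budget_cents -= amount_cents
        self.history.append((service, amount_cents))
        return True


wallet = AgentWallet(budget_cents=500, per_request_cap_cents=200)
print(wallet.pay("query-service", 150))  # within cap and budget
print(wallet.pay("query-service", 300))  # rejected by the per-request cap
print(wallet.budget_cents, wallet.history)
```

The contrast with a monthly B2B subscription is that each `pay` call corresponds to a single request, so spend tracks usage directly and stops when the budget does.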

44:36Stephanie: I see. And so MPP is sort of like the orchestration layer that ensures correctness for this

44:43Mehdi: Yeah, the payment protocol, basically, between machine and machine, because today it's not like that. You always need to fill in your credit card as a person, and then have the payment go to your bank and so on. Here it's a new payments protocol.

44:57Jacob: Here's what... is this just crypto? Is this crypto rebranded?

45:04Mehdi: Okay, it is true there is some crypto in there. But I don't know. That's a good take. It's just that I'm wondering what would happen in a world where I go to Claude and update my wallet to say, that's your pocket money for any services. And then it just asks me for permission, but I don't need to fill in any payment details, and, you know, MotherDuck, MongoDB, you get the bill: yeah, these agents were charged for two queries over there. I see the number of transactions is pretty high and the volume is pretty low in comparison, right? But I think that's expected. Yeah, I don't know. Jacob, what are you thinking there?

45:45Jacob: I just pulled the docs up because I'm curious. Yeah, it looks like it's using USDC, which is a cryptocurrency, to facilitate this, which makes sense. Putting the crypto speculation aside, I do think this makes a lot of sense for whatever this emerging paradigm is gonna be for agentic commerce. Doing it in a way that is secure and safe, but also appropriately experimental, feels like a nice fit for crypto, actually. And I think it lets us see if this is something that works in a pretty low-risk way. So I think it's pretty cool. I could see it being really cool in a place where you're building applications or running them and you need to be able to just have your agent kind of build everything out, right? It reminds me, do you remember Brave in the early days had this crypto play called Basic Attention Token? It reminds me a lot of that, the BAT token stuff.

46:56Mehdi: Yes. Yes. Okay. Cool. We're already out of time. We had other links, but we'll put them in the description if you want to get up to date on the latest in data and AI. Stephanie, thanks again for joining us on this episode. It was super nice to have your take, your insight, and the papers that you brought in, and I wish you a great day. Hopefully it's cooler on your side of the world, at least cooler than mine. All the notes of the episode are on motherduck.com/podcast, so you can go there and get the notes, all the links, and the transcript if you like. Thank you very much, and I'll see you in the next one.

47:44Jacob: Amazing. Thanks, Mehdi. Thanks, Steph.