Beyond the Benchmarks: A BigQuery Co-Founder's Guide to Evaluating Data Warehouse Performance

2025/10/31Featuring:

TL;DR

Big Data is Dead: Industry benchmarks obsess over petabyte scale, but 95% of real-world workloads are under a terabyte. Evaluate for the size of data you actually have.
Lived Experience Matters: Performance depends on Dual Execution minimizing driver latency and Per-User Tenancy eliminating noisy neighbors, not just raw server speed.
Billing Granularity: True speed saves money. MotherDuck's 1-second billing minimum ensures faster interactive queries directly lower your bill.
Time-to-Insight (TTI): Prioritize workflow optimization, like no servers or partitions to manage, over theoretical maximums.
Real-World Impact: Case studies like FinQore demonstrate that optimizing TTI can reduce an 8-hour pipeline to 8 minutes, a 60x improvement that benchmarks miss.

Vendors frequently reduce data warehouse selection to a single, flawed metric: raw speed. They present charts proving they are the fastest at petabyte scale, making the selection process both expensive and intimidating. This approach ignores the core thesis of MotherDuck CEO and BigQuery co-founder Jordan Tigani: "Big Data is Dead."

The predicted data explosion never materialized for most companies. 95% of users operate with less than a terabyte of data. Relying on benchmarks designed for petabyte-scale problems to evaluate "small" or "medium" data workloads is fundamentally flawed. Focus instead on value, usability, and performance on actual workloads rather than misleading benchmark wars.

Why Are Traditional Data Warehouse Benchmarks Deceiving?

Vendor benchmarks are unreliable because vendors design them to favor their own products. This disconnect between test results and a product's real-world "lived experience" represents a critical flaw in how organizations evaluate data warehouses.

Evaluation Criteria	Traditional Vendor Benchmarks	MotherDuck "Lived Experience" Approach
Primary Metric	Theoretical raw query speed	Total Time-to-Insight (TTI)
Workload Type	Cherry-picked, artificial queries	Real-world, user-specific workloads (<1 TB)
Tenancy Model	Shared resources (Noisy Neighbors)	Per-User Tenancy (Isolated "Ducklings")
Latency Focus	Server-side execution time only	End-to-end latency (Hybrid Execution)
Optimization Goal	Winning the benchmark chart	Reducing developer friction and billing costs

The "Lived Experience" Gap: Server Speed vs. Driver Latency

External benchmarks often misrepresent performance by measuring isolated server time while ignoring the network and driver latency that defines your actual workflow.

This disconnect plagued the early days of BigQuery. The team knew the system was fast, yet external benchmarks often showed competitors winning because they measured isolated server time.

MotherDuck addresses this through Dual Execution. This hybrid engine allows you to attach your local DuckDB instance to the cloud. Queries on local data (like a CSV on your laptop) are processed locally, while cloud data is processed in MotherDuck.

This routing eliminates network latency for local steps entirely. The result is a responsive developer experience that server-side benchmarks cannot capture.

Consistency Matters: The "Noisy Neighbor" Problem

Consistency defines your experience as well. In traditional shared warehouses, performance can degrade when a colleague runs a massive, complex query. This is the "noisy neighbor" problem.

MotherDuck provisions isolated compute instances ("ducklings") for each user. Your interactive dashboard is never slowed down by someone else's heavy batch process. Your analysis remains fluid regardless of what the rest of the organization is doing.

The "Faster Ducks" Phenomenon: The Danger of Cherry-Picking

Vendors create benchmarks where their system wins. By cherry-picking specific workloads, they build tests that play to their strengths, a satirical "Faster Ducks" graph. A system excelling at joins is tested on join-heavy workloads, while competitors face their weakest queries. This selective reporting distorts performance comparisons.

Gaming the System: When Vendors Disable Safety Gear

Vendors "game" these tests in ways that do not benefit customers. They often build special code paths to accelerate specific benchmark queries or remove "safety gear" like overflow checks or ACID guarantees to appear faster. The ecosystem incentivizes optimizing for artificial tests over solving real-world complexity. Your own workload remains the only benchmark that matters.

Beyond Raw Speed: Why Prioritize Time-to-Insight (TTI)?

Query speed alone is misleading. True performance is "Time-to-Insight" (TTI), the total duration from asking a question to delivering a data-driven answer. This perspective accounts for the end-to-end developer workflow, where the most significant bottlenecks occur.

Real-World Validation: The FinQore Case Study

FinQore, a fintech company, reduced a critical pipeline from 8 hours to 8 minutes by migrating from Postgres to MotherDuck. This 60x improvement was not just about raw query speed. It transformed the entire data processing workflow, enabling real-time AI products that were previously impossible.

Optimizing the Workflow: "No Servers, Clusters, or Partitions"

Ease of use is a performance feature. In many "Big Data" systems, developers waste hours on configuration. MotherDuck is designed for "No servers, clusters, or partitions to manage."

Eliminating this operational overhead allows teams to focus on building data models and shipping features rather than tuning infrastructure. Optimizing the entire workflow reduces time to act on intelligence, the ultimate measure of a data warehouse's value.

The Economic Reality: Scale-Up Architecture and Billing Granularity

A performance claim without a price tag is a marketing distraction. A faster data warehouse should produce the same result for less money. Otherwise, speed becomes an expensive luxury.

Architecture Type	Scaling Behavior	Billing Minimum	Performance-per-Dollar
Traditional Distributed Warehouse	Diminishing Returns (Scale-Out complexity)	60 Seconds	Decreases at scale
MotherDuck	Linear Scaling (Scale-Up simplicity)	1 Second	Stable and Optimized

Linear Scaling via "Scale-Up" Architecture

Scaling should be linear: paying twice as much must deliver twice the performance. Many distributed, "scale-out" systems exhibit diminishing returns because of coordination overhead.

MotherDuck uses a "scale-up" architecture (vertical scaling). Moving from a Standard to a Jumbo instance adds more CPU and RAM to your isolated container, providing predictable, linear performance gains without the complexity of managing distributed clusters.

The 1-Second vs. 60-Second Difference

Speed must translate to savings. In the "small data" world of interactive analytics, queries often run in milliseconds. Major providers like Snowflake typically charge a 60-second minimum per query or cluster spin-up. MotherDuck charges a 1-second minimum.

If you run a query that takes 200ms, a traditional vendor bills you for 60 seconds.
MotherDuck bills you for 1 second.

This 60x difference in billing granularity means that as the engine becomes faster, your bill decreases proportionally.

Evidence in Production: How Optimization Should Impact Your Bill

Performance claims require real-world validation. Following recent engine optimizations, we analyzed approximately 100 million queries to measure actual results. The average query became 19% faster. The most revealing metric, however, appeared in the billing data.

Billing growth slowed immediately following the release, despite high query volumes. Because pricing ties to compute time with a 1-second minimum, this efficiency gain translated directly into lower costs.

Metric Category	Performance Change	User Impact
Average Query Speed	19% Faster	Direct reduction in usage-based billing costs.
Tail Latency (99.99th %)	45% Faster	Eliminates long waits on complex queries.
Median Query Time	Stable (11ms)	Maintains brisk performance without over-engineering.
"Instant" Queries (<200ms)	Increased to 96%	Reduces workflow interruptions and friction.

Targeting Tail Latency to Reduce Friction

We fixed the queries causing the most user friction. The slowest queries, those in the 99.99th percentile, became 45% faster. Shaving minutes off a long-running query transforms a developer's workflow and their "lived experience."

The Stability of the Median: Why 11ms is Fast Enough

The median query time remained stable at 11 milliseconds. This stability is a feature. You cannot perceive the difference between 11ms and 9ms. Rather than investing engineering effort in imperceptible gains, focus remained on slower queries where users experience the most friction.

Conclusion

Selecting a data warehouse based on vendor benchmarks is unreliable, especially when those benchmarks test data scales you do not have. Effective evaluation rests on:

Test against your own data: For the 95% of us with "small data," big data benchmarks are irrelevant.
Analyze price-performance: Ensure speed translates directly into cost savings via granular, 1-second billing.
Measure total time-to-insight: Look for workflow wins like Dual Execution and Zero Maintenance over raw server speed.

MotherDuck and DuckDB support this approach.

Move beyond vendor charts and start a free MotherDuck trial to measure performance on the workload that matters most: yours.

TABLE OF CONTENTS

TL;DR

Why Are Traditional Data Warehouse Benchmarks Deceiving?

Beyond Raw Speed: Why Prioritize Time-to-Insight ?

The Economic Reality: Scale-Up Architecture and Billing Granularity

Evidence in Production: How Optimization Should Impact Your Bill

Conclusion

Transcript

0:00Welcome everybody and happy Halloween. Um, if you see the hand, Eminem hands. No, you didn't. Uh, I've got with me Jordan who's CEO here at Motherduck. Um, my name is Garrett and today we're going to be doing a little impromptu live stream. This looks weird. Uh, about performance. Um, just a quick reminder, quick housekeeping thing. Um, this is

0:23going to be AMA style. So, I've got some questions for Jordan that I know he wants to to chat about, but we'd also love to hear from you. So, as you as you listen and have things that you you want to nerd out on, anything performance related related to duct or mother duck, please put them in the stream chat and

0:40we will uh weave those in. Uh, one other reminder, next week, >> it's Halloween today. I thought it was just aquatic bioluminescent [laughter]

0:51form awareness day. >> Yes. why my wife laid this out for me when I was going to work.

0:56>> Can you just be for the audience like what what are you exactly? >> I'm an angler fish. [laughter] >> Uh I'm not sure. Yeah. In case it wasn't entire entirely clear.

1:08>> Yeah. The only you know the only uh to my knowledge the only living being with an actual light bulb on their head >> which is pretty [clears throat] pretty cool. Um, one last plug, next week is Small Data SF, which is the conference sponsored by Motherduck, all about um building more efficient, effective systems in data. Uh, if you

1:30find find yourself in the Bay Area, please join us. Uh, we'll have a link there in the comment. Thanks, Gerald.

1:36And we'd love to see you there. Um, let's get into it, shall we? >> Sounds good.

1:42>> Okay, Jordan. Uh, one of the things I'd I'd kind of love to start with, um, just a bit of a history recap. You were one of the founding engineers at BigQuery.

1:51Can you just like walk us through how you thought about performance there? Like were you benchmarking constantly?

1:58Was it public facing or mostly internal? Like how how was the early team thinking about that?

2:03>> Uh, actually we didn't do a whole lot of benchmarking. Um, we sort of we it

2:09sounds silly, but we kind of knew that it was fast. It was like we had experience uh internally at Google that it was that it was quite fast and like customers that were using it, they felt that it was fast. And the funny thing was that actually when um a couple people came out with some benchmarks and

2:25they showed BigQuery not going very you know not doing very well compared to compared to competitors um it was it was quite a surprise to us because it was like hey you know like this doesn't necessarily fit our lived our lived experience and lived experience of customers and that was I think that was one of the first signs that I had that

2:46benchmarking is dangerous that you kind of actually need to um you know and this is what we would encourage customers to do is sort of like is you know edge marks are interesting they're useful uh directional um but they are a particular workload and they're you know they are um what you really need to do is you

3:07really need to to look at how how does your own workload um actually measure up on on these various different different systems. It's a lot more work. you know, it' be great if you could just sort of like, hey, like X is fast, Y is slow.

3:21I'm gonna choose X. Um, but it turns out that, you know, the real world is a lot more complicated. And, you know, I think one of the other kind of key things was, um, is, you know, often, um, there will be many queries that, you know, you need to run or many things you need to run. And

3:41if some of those are you know on one system some of those might be super fast but like one work one of those will be like really slow because it's just a different ar you know that like compared to something else it's a different architecture and uh and then that also tends to sort of dominate people's kind

3:58of lived experience of performance or at least like their tolerance for for performance. Uh because if there's sort of one thing that you you need to do, you need to run this every day and it's slow um you know, you will you'll tend to notice that even if everything else is a bit is a bit faster.

4:15>> It it's funny. It seems like we're uh as I hear you talking about this, it's like a lot of the echoes of the current discourse around AI evals seem to be following the same path where we we started with like you know very uh market centric benchmarks that everyone is chasing and you know of course there's the training data stuff as well

4:33and now people have kind of split into the camp of you know actually your evals need to be tailored to the specific product you're building or a specific workload right.

4:42>> Yeah. Just because it can like, you know, do PhD level math doesn't mean it can add.

4:47>> Yeah, exactly. >> Count or count the uh the bees and strawberry. [laughter] >> Yes. We I guess we we've just recently eclipsed that one with uh GPT4. Um this

5:00brings up something that you shared in the uh the faster ducks blog post, which I think was a satirical graph that honest and Mark published. Um, and if I'm remembering it correctly, you know, it's basically two bars and it's uh on the y- axis is speed and it shows their system uh or the y- axis is is time.

5:21Their system is very slow, our system is very fast, right? That's figure figure one like can you just explain the joke here?

5:29>> Yeah, I think the joke is that every like every system can come up with a benchmark like that. Everybody can can basically come up with a workload where you know this is faster than than this other thing. And um uh you know it might

5:48be by you know cherrypicking you know you know certain types of workloads and certain types of certain types of data.

5:55Um and you know I think the thing is like but that may not actually measure up to to people's to people's real world real world experience. And I think very often people who are you know database companies, data companies, they have a vested interest in making their system look good.

6:12>> And so they will come up with a benchmark that makes their system look good. Uh you know by you know running the types of queries you know if their query if their database is great at joins then there'll be a lot of joins.

6:25If it's not great at joins there won't be any joins. if you know if it is good at you know has a really fast um regular expression parser there will be like a bunch of you know some regular expressions in there and and >> um if your workload looks like that then you know maybe it's actually a

6:42reasonable benchmark >> um you know I think there was >> in the early days of BigQuery there was sort of a benchmark gate where um a [snorts] little bit of a scandal where we kind of we came out with the bench like I think on stage some executive showed you know bigquery beating I don't know if it was snowflake or red shift by

7:06a lot on some on some like TPC which is you know the standard standard benchmark >> um but it was one query and we ran like there was like one query that it was really fast um and then kind of most of the other queries it didn't it didn't do as well but we kind of had cherrypicked

7:23that one that one benchmark um so I think that just sort goes to show the the danger of of of of benchmarks. Um you know over time big got much much better at the other at th those other those other queries and the TPC benchmark and also the TPC benchmark you know a lot of people have said is

7:40actually not a great benchmark because it doesn't actually represent the kinds of queries that people actually run these days. Um, so yeah, I think the, you know, the the the summary is like benchmarking is, you know, benchmarking is hard and I think that's what Hanis and Mark were trying to get across when they when they wrote that paper, which

7:59is like actually doing a a a fair benchmark against, you know, multiple systems is hard. And um, most people

8:09they don't set out to do a shitty job. They don't set out to do to like >> to do to have a skewed a skewed benchmark. They they you know legitimately try to because people evolve and they build their systems >> to suit the kinds of workloads that they're seeing and they will make their system fast on the workloads that

8:27they're seeing. So sometimes even actually just sort of tilting towards the kind of workloads that they see make it seem like they are tilting towards their own, you know, their own system.

8:40Um, especially for outsiders, you're like, you're like, "Oh, well, this is only, you know, it's not using multiple tables. It doesn't do any joins. Like, how silly, how silly is that?" Or it's not it's over a smaller data size, you know, how silly is that? Like, it's going to fall, you know, like um that's not real world. But in the real world,

8:57actually, you know, most I don't know the um the numbers. I would I would suspect uh you know, 98% of queries use

9:07a single table. And um you know 99% of

9:11queries are you know under a couple of gigabytes. And um and so you know that benchmark you know that you know may may be uh may be better than kind of the database you know like because the database nerds that are trying to like actually figure out okay which which one has a better optimizer and which one can

9:30you know can can optimize these kinds of things um >> than they would like to believe. I mean, I think that's what like the TPC benchmarks have been. You know, there's a TPCH, TPCDS. TPCH has 25 queries.

9:42TPCDS is um uh 99 99 queries and they

9:47tend to be very joinheavy and they tend to sort of be kind of targeted at things that are hard for databases. And so kind of database teams, they'll look at each one of these and they'll kind of like they'll come up with a trick to like okay well how do we okay well this one that the the default you know most of

10:07the time the query plan that you naively generate is going to have to do these like a million times or it's going to this is going to cause a nested loop join or it's going to do like these goofy things if you don't have great cardality estimation and like and so somebody will figure out a way to okay

10:24well we can you we can do this, you know, whatever. Um, >> I'm I'm I'm um I'm handwaving here, but there are there are a lot of uh you know, these kinds of things where you come up with like, okay, here's a trick you can do to solve query number nine, and then and then you get much better on

10:38query number nine, and then you come up with another trick for query number 13.

10:41And um uh and so over, you know, so I

10:45think a good optimizer tends to be sort of a collection of these tricks. Sometimes these tricks, you know, actually evolve into um into really, you

10:55know, actually good performance and and and sometimes those tricks, you know, I have I have heard, you know, rumors of uh certain database vendors having, you know, basically detection of is this one of the TPC benchmarks running? If it is, then they have sort of special like cacheed query plans that they use uh in order to uh in order to do do better on

11:19those. Um I won't say any any names though. >> Okay, I'll say save say that one for the followup. Um uh so what I what I'm hearing you say is that if there were a duck duck bench for example that we you know we we solely measured duck themed databases that we would be pretty well positioned.

11:40Yeah, I think if you know uh you know duck facts, you know, duck facts uh benchmark uh you know we we would we would crush that. Um but I'm not sure that you know people would consider that you know kind of authorative for performance comparison.

11:57>> Certainly any benchmark with um CEOs dressed as anglerfish would be for um

12:04Okay, cool. Let's uh let's go a quick deeper on some of the duct DB stuff because you know you in the blog you talked about sort of the impacts of ductb 1.4 and the improvements that arrived in that release. Um I want to get to the mother duck side of this and what we measured in terms of our

12:23benchmarking. Um but could you just take us through what those improvements were like? What came in in 1.4 that made it so much faster?

12:30>> Sure. So you know I mean know you know mother duck we have a uh you know a usagebased billing product and so you know we see a lot of people they you know have they run motherduck uh which runs ductb instances in the cloud and we charge you for you know the length of time that those instances run

12:47and um you know I think we take we we follow the metrics very carefully and we kind of watch how things watch how things have been have been growing and we released support for DB 1.4 before and then all of a sudden the metrics stopped, you know, stopped growing the way they've been growing are like, wait, this is like um what's going on here?

13:08And it turns out that duct TB was running a lot faster. Uh and you know, in particular, DUTDTB, the things that used to be the slowest, you know, got the most faster >> and sometimes, you know, and also things running on the biggest machines on the biggest instances, which we call ducklings, um you know, got got the

13:27most, you know, faster. So kind of it was uh um uh it was a very interesting

13:34kind of experience to just sort of see like hey you know the we're get giving more value to customers we're running things we're running things faster uh and then we sort we dug into it and it looks like you know duct TV um has gotten you know so we I did you know crunch crunch the numbers we looked at

13:49sort of queries before and after um we looked at sort of customers with stable workloads um uh we looked you know at uh

13:58you know kind of across the various like quantiles of of of usage and saw that like on average or the average you know per query time it was almost 20 almost 20% um performance improvement.

14:11[clears throat] >> Um but that was very much tilted towards the you know larger queries larger you know the very largest queries or the small or the queries that that were slowest um you know got you know some you know twice twice as fast. We even kind of I you know did a a breakdown by by customer and and like which of our

14:29customers um you know saw various performance improvements. I think that's also one of the interesting things about um about databases and performance is that the performance improvements are really all over the map. um virtually every customer, you know, saw some improvement um from like, you know, 4% to uh 1,200%.

14:53And um but there were a couple that a couple that couple that actually slowed down um but it may be that their their their workload changed or it's just you know the the the kinds of improvements that were made in in the ductb 1.4 may not have been relevant for them. Like so like duct DB implemented a new sorting a

15:13new sorting algorithm and so a bunch of things got faster and memory usage got got a lot better and so um uh you know

15:21typical typical you know many many things uh you know many things have to run have to run a sort at some point and um and so those got those got a lot faster. So there was a bunch of window function improvements that um so people were using window functions those those got you know significantly faster. There was also um they did some work for high

15:45core count machines. So some of the machines that we that we use in uh in motherduck um have a lot of cores and um

15:53you know kind of sometimes typical if you just sort of naively write software um you know with locks and you know and and stuff like the uh uh

16:06uh when you add when you get to more than a certain number of cores say like 16 or something the um you start to see a like um you stop to you stop seeing kind of linear performance as you >> improvements as you as you add more as you add more resources and you know there's more like you know things

16:25bouncing back and forth between caches and there's like you know there's numa effects where there's kind of basically different regions of the of the machine.

16:33Uh, and so if you kind of can write things a little bit more carefully so that they know how to that they know how to know how to deal with that and prevent things from kind of ping ponging back and forth um it should it it it helps a lot. One of the nice things, the lucky things is, you know, duct DB was

16:49sort of written the uh sort of the morselbased parallelism can work really nicely in some of these machines because basically you kind of each each core gets to work on a subset of a subset of some data and then each one can work on its own hash table and the hash tables then get merged, you know, and and so um

17:08uh it can work really nicely, but it's just like they hadn't put in the work and they finally put in, you know, a bunch of the work and so, you know, high core machines uh got a lot a lot faster.

17:19So, kind of running on a bigger machine uh you know whereas it used to not help as much uh now it helps it helps a lot more and I think there's still a bunch of bunch more work to do there and like >> um but they they you know they got a bunch of lowhanging fruit in this last

17:33uh in this last release. One of the things you shared in the results in the blog posts was that the median query stayed put I think right about 11 milliseconds. Um maybe a naive question like wouldn't it be better if fast queries were even faster?

17:51>> Well so first of all I mean to some extent yes. Um but first of all like one of the things is like just our like measuring things that are super fast um can sometimes be sometimes be hard depending on which kind of timers you use in the system. Um you know they may have a a resolution of only of only a

18:12few a few milliseconds and so all we're storing is is milliseconds. I don't even I don't even know if we have a if the timers are you know what the granularity are under that is. So like um there may be just sort of some some measurement uh >> improvements that we can do so that they they might have gotten faster. We just

18:31don't necessarily know. >> Um but also like you know if you consider the you know the average human reaction time is you know 200 200 milliseconds. Anything under 200 milliseconds appears instantaneous. So you know if you drop from 100 milliseconds to 50 milliseconds you probably don't necessarily notice the difference. um you drop from 50 milliseconds to 25 you probably don't

18:53notice a difference. Now I think what some you know what can be you know what can help is you know or where you know making fast queries even quite faster is like sometimes you have a dashboard and the dashboard has you know 20 different charts and like bunch of different elements and uh and you want that to

19:08render super quickly or you kind of you want things to recalculate really quickly um or you're running a bunch of things you know you have a bunch of concurrent users all hitting something and that's when uh you know the faster those queries run the better because because then they they running in the get out of the way and you can run you

19:26know you can run 50 queries per second instead of you know 25 queries per second and so that's um that's helpful but I think just in from a from a perspective of like a user running a query and waiting for an answer you know making an 11 millisecond query 9 milliseconds is you know is really not going to [clears throat] um is not going

19:46to help that much kind of the other thing is you know when you get to sort of performance that it's sort of at that at that level at that you know at those speeds. there's a lot of um there's a lot of fixed overhead, a lot of just sort of like >> things that's it's less about like how

20:03fast can we how fast can we scan and aggregate and like and do these joins and you know uh you know optimize this query and it's just more about like okay what's the sort of the the the length of the code path the shortest code path that you know to get to get through the system um I think luckily you know I

20:22think duct db can run you know single millisecond queries it can run incredibly fast in incredibly fast queries depending on you know what types what types of queries there there are but you know I think once you start getting down in the you know in the singledigit milliseconds uh you know probably most of the improvement is is going to be on you

20:43know on query routing because the other thing is uh depends on where you're sending your query from and where that you're sending your query to um the difference between a 10 millisecond query 11 millisecond query and a 5 millisecond query may be entirely negligible. For example, if you are, you know, on the US West Coast and you're

21:01sending to the US East Coast, uh, and that's where your analytics server is, >> um, it doesn't really matter like like the just physics, it's going to be like a 100 millisecond ping time. So like just the speed of light is going to take, you know, roughly 100 milliseconds. And so if you kind of if everything works perfectly and like as

21:23fast as is physically possible, you know, uh it's still going to be 100 milliseconds. And so if you make it five milliseconds faster, you know, in general, that is uh that's not something that anybody is going to uh to typically notice.

21:39>> Yeah. I mean the the query engine doesn't exist in the vacuum, right? It's like part of the the entire system. The stack like latency can be any piece of that. Mhm.

21:49>> Um I want to come back to benchmarking for a second and um just for the folks joining us uh if you have questions please pop them in the chat and we'll uh we'll weave them in AMA style. But one thing that we did in addition to kind of this internal piece is we updated ClickBench.

22:09Could you tell me what happened? >> So ClickBench is is an industry benchmark. It's a vendor benchmark uh that was created by ClickHouse. Um it doesn't do it doesn't do joins. It's a single table. Um but it is based on workloads that they saw real web workloads. Um and so it's a pretty good it's a pretty good benchmark. It's not

22:28huge. Uh the data is is you know uh I think it's 30 gigabytes or something and um it's got a bunch of columns and a handful of string columns but it looks like sort of web web traffic data. So, and web logs. And so, if you're querying over web logs, it's actually a pretty a pretty decent uh a pretty decent

22:47benchmark. And one of the nice things also about it is, you know, they put it on GitHub. Uh anybody can see the code for it. Anybody can submit an additional uh you know, run for for kind of their database. And so, a lot of vendors have have submitted uh submitted their own their own uh their own benchmarks. And

23:08you know, I think a lot of vendors actually have started to take ClickBench pretty seriously because, you know, it's um it's it's out there. It's easy to reproduce. It's easy to see. Anybody can run it on their own. Anybody can see whether the you know, it's open source.

23:22Anybody can try to reproduce uh things on things on their own. And um and so

23:29yeah, we we we updated the uh the benchmarks and we finally added our you know, a larger instance size, which is our mega instance size. And um we were briefly at at the very top of ClickBench. Um uh I think they um had

23:43made some recent changes to penalize some things that they thought were cheating and so other things went down.

23:49But then then quickly after they added some larger instance that some larger machines that were even larger than the ones that we were using and so we got you know barely barely pipped by um by some other other systems. But if you look at just the data warehouse vendors, so if you look at Red Shift, BigQuery, um, Snowflake,

24:11uh, you know, the, you know, you know, Mother Duck did pretty well in in, uh, in in comparison. And um even to their you know our smaller instances were actually considerably faster than several of the larger their their larger instances that you know cost you know up to 100 times as much. And um so it was kind of an exciting validation for us to

24:35see that like hey we can we can compete with uh and we can beat things uh that cost 100 times as much or if we for our instances you know if we compare you know like for like in terms of um uh in

24:51terms of cost uh you know we can be you know many times faster. Um so that's you know it's also um kind of a validation of our whole like you know uh architectural uh theme of you know of building building a you know a scale up scale up system focusing on latency uh versus versus throughput um and you can you can

25:16get throughput by by running multiple instances. [snorts] >> That's a good one. Uh [laughter] you you've talked about cost a little bit. I want to dig in here because I think it's um you know, you can think about like the benchmarking world. It's all pure speed and to my knowledge there's not um the the ClickBench UI doesn't explicitly talk about costs and

25:38it's in the repo somewhere potentially. Um can you just explain a little bit more about the relationship between performance and cost for like a system like Snowflake or Redshift, you know, a classic data warehouse system? Well, I just think, you know, typically, you know, data data warehouse vendors and database vendors give you ability to pay more to have things to have to have

26:01things run on bigger hardware and ideally you pay more to have things run faster. Um, and kind of the,

26:10you know, the asterisk is usually you pay pay more to have things run faster because you have more throughput. um very often it doesn't actually help it doesn't actually help latency uh especially for smaller things um just because what you're doing is typically you know when you run you know more snowflake instances more red shift instances more bigquery slots uh just

26:34the the communication overhead gets uh gets gets bigger it gets harder and you know you have um you know if you want to ditch to dispatch a query you uh you know you send to multiple nodes, you have to you have to coordinate all those nodes and then like if there's high cardality aggregations or joins or or

26:54kind of multi-stages, you end up having to ship all of this data between all of these nodes over the network which is slow. Uh you know there's there's a whole bunch of just sort of overhead things that happen once you start to add these um these kind of these distributed systems uh and as you add more nodes to

27:12the distributed system. So ideally you get twice the you know you get twice the performance as you as you scale up. Um but very often you kind of you see you know you double the hardware and you get 50% more performance and then you double it again and you get 40 you double it again and you get 20 20% more more

27:32performance. And I think that was the other nice thing that we saw is you know you kind of see well you see like for example snowflake kind of there's a point at which you know increasing uh increasing the snowflake size uh doesn't actually doesn't actually make it go make it go faster and you know even even slower. I think the 3XL was

27:55faster than the 4XL. Uh but the kind of the the you know once you got to the XL they were kind of pretty pretty close in uh in performance and which means that you spend more and you don't really get more and um and the nice thing about you know the mother duck results were you know the

28:15uh you spend more and you kind of get you know a linear a linear improvement at least at least pretty close you know because you you spend you know 2x you get 80% more uh you spend 2x again you get you know 75% more uh instead of you know 20 20% more. So kind of we did we

28:34did a run we did a um I did a benchmark where or I did I used the results of the benchmark um and there was cold runs and hot runs and so I figured cold runs you know not are infrequency you run I added up the cold runs with you know I said okay we run the hot run 99 times so the

28:51cold run was 99 hot runs >> um and how much would that cost and on various you know various instance sizes and the um >> kind of the the you know if if you have perfect linear scaling then like uh the

29:10instance size like the basically the cost for different instance sizes should should stay flat because as you increase the uh as you increase the size it it's

29:22faster so it runs for less time so you don't have to keep it up and running for for for for as much. So like actually ideally you pay more or actually ideally you you pay to run it on a larger instance but you don't actually pay more because it only runs for you know it runs for for a shorter a shorter amount

29:39of time. >> And why why isn't the relationship perfectly linear? Is it all to do with you know >> a lot of it the communication the communication overhead the you know like the and like you're you know you're kind of dealing with uh uh you're dealing

30:00with more machines and machines that are really I mean like typically larger instances are designed for kind of throughput and handling more handling more data rather than having handling the same amount of data their data faster. Um there are also just a bunch of nonlinear things in um in how in how you know database execution works. So if

30:21you you know if you can store everything in memory then that's that's awesome. Um if you can't store everything in memory then you have to start you know doing a lot more IO and IO tends to be a lot slower and you have to page things to either you page things to disk or you have to reread things from from source

30:37and like and so that is typically

30:41uh is typically slower and that can lead to sort of non nonlinearities or can lead to a point where like hey I actually doubled I doubled the per I doubled the size and it got 10 times faster >> and then you double the size again and it gets 10% faster and like um because when you doubled it the first time it

31:00then everything was was in memory and then when you doubled it again it was already in memory so it actually didn't didn't help help you that much.

31:07>> So there there's like a lot of uh I I don't want to say it depends but it depends right >> it's uh yeah it's it's complicated. We don't say it depends it's complicated.

31:18>> That's right. Um just coming back to benchmarks for a second like uh why isn't in your in your mind you know you've been in the industry a long time like why isn't cost a bigger component of how we talk about benchmarks as an industry? Obviously you can dig in as as you did and sort of run the numbers for

31:37a given workload. Why is um why isn't that front and center alongside performance because it matters right for the system you choose? I think one of the things is you know cost and performance are really two sides of the same coin you know because you know I can choose if I like if what I care about is performance I can choose to

31:57spend more and you know run bigger instances and it generally runs it generally runs faster. Um but I can also if what I care about is uh is cost I can

32:11I can for the same uh you know I can I can lower I can lower the cost and a faster and a faster engine will um you know will still be um

32:24you know for for kind of a given price point um or sorry a given performance point I can you know a faster system will cost you less. So >> performance is sometimes a way of talking about is sometimes a way of talking about cost. I think there's also just a little bit of like braggadocio uh involved of like hey my system is

32:44faster, this system is faster than this system. Um versus like when you say this system is less expensive. Uh you know people like oh is this the is this the discount system? Is this the bargain bargain bin database? Uh like [laughter] I want the I want the premium you know the premium database the craft the craft database. um even if it costs a little

33:06bit more >> the the the duck themed database maybe um if you're an animal lover. Um so if you like you a few I think a few years ago now there was a a previous blog post you shared um called Perf is not enough if I remember the the title correctly.

33:23So, you know, if I was a a customer looking for a new data management system, a new database, like, yes, I could probably go to ClickBench, pick off the top, you know, five or 10 systems, do some testing, and maybe pick the fastest one for my workload. Like, why is that not the ideal way to shop for a database?

33:44So, you know, I think there's, you know, a bunch of different reasons and a bunch of different reasons that I, you know, I outlined in the post, but um I think one of them is, you know, when you when you choose a database, it's often like it's often a long-term relationship with uh with a vendor and with a database. And

34:02so you want to bet on, you know, which one which one is going to be the the the

34:09best database, you know, not just today, but a year from now, three years from now, five years from now, you know, three years from now, are you going to look like the, you know, the genius for bringing in this, you know, awesome database that made everybody's lives simpler?

34:24>> Um, and, you know, is now the the the hot new cool thing that everybody's using? or are you gonna like be like it's like oh well we have to migrate off of this because you know that you know that you know clown uh you know had us use this this crazy this crazy database.

34:40So it's a it's a big it's a bet that people are making with their with their careers and um and so you know I think

34:50trajectory is is important and you know it's one of the things I'm you know most attracted me to to to duct DB when we started uh when we bu started building the company around it was like the improvement rate of duct DB was incredibly fast and we talked a lot about that and I talk about that in that

35:06blog post which is from a couple of years ago and it was exciting to see is like is that the the improvement rate is still very fast. I mean, the fact that they're still, you know, they're kind of at, you know, towards the top of benchmarks and they're doing doing quite quite well. They're, you know, very very

35:22fast already and that they can then they can, you know, make make big jumps, you know, 20% version over version, which they do several times a year, uh, is, uh, is is pretty amazing. Um, you know, I think there's also, you know, that um,

35:41performance on a benchmark is typically not, you know, is is not a great reason to choose a database. Performance on your workload might be a better better way. So, you kind of have to you have to validate against against your own workloads. Is this going to make my job better? Um, and then there's a bunch of features and stuff that, you know, one

35:59database has versus the other. Although I think you know in the fullness of time kind of those things those things all all converge. It's like oh this database doesn't have this sort of governance feature and this one does. Um okay you know right now that may be the case but you know I think in general kind of all

36:18of those things all of those check boxes end up end up getting getting checked.

36:23Um but kind of key architectural things are the ones that are you know are much more difficult. you know, ease of use uh is also very a very important one. You know, some databases, some data warehouses, I won't name which ones those are are kind of notorious for being kind of requiring a lot of like uh

36:43administration overhead uh and tweaking and and work to kind of get them to run to run well and to run at their run at their best. And so, you know, if you have a database that um is just easier, you know, or it's easier to get data in, it's, you know, I think one of the the things that I like also about duct DB is

37:03they take seriously the whole problem of of data management, not just returning query results fast. So um you know I kind of like to use the analogy of like a of like a burger like is like where some people just focus on the patty of the burger other you know ductt labs and duct TV folks focus on and mother duck

37:24we focus on sort of what's the whole what's the whole experience the whole experience is like how do you make it fast from the time when I have a question to the time that I get an answer which sometimes means I have to like ingest this like really wonky goofy CSV file that has these bizarre null characters somewhere in the middle and

37:41you know it's the kind of thing that can cause you to wrestle for hours uh you know over like how do I get this damn thing in and you have to do surgery on it and it's it's it can be it can be a pain. Um you know DuckTb has like the world's um the world's best CSV parser.

37:57It's an you know they they write research papers on it. It's like it's incredibly incredibly well done. all kinds of like bizarre things and they can you know do um uh you know schema inference and they can figure out the schema even when you know there's you know bizarre things bizarre things in it. um which just makes it easier you know makes it faster

38:19because if let's say let's say let's say the query was like you know 10 seconds instead of 1 second but you saved an hour getting the data in you know that's certainly >> uh that's certainly a worthwhile a worthwhile trade-off so I think you know you kind of have to look at the overall >> um ability to solve problems and solving

38:38problems faster versus just the pure pure speed uh even on your workloads the pure speed in in benchmarks.

38:47I I was just thinking back my very first professional experience with a database in an office setting was Vertica deployed on prem access through a SAS client and um you know you're smiling because it's obviously painful but it was not just the query time it was the entire workflow that was hard and then you know in contrast using DuctTV feels

39:10magical and fast and just um incredibly ergonomic. Do you think that um you know obviously 20% uh roughly performance improvement in a release is pretty awesome? Um should we expect jumps like that going forward if you had a crystal ball? What do you think? I so I wouldn't have expected uh

39:35but they did it this time and they did it last time and they did it the time before and like and so um you know a kind of at some point yes like you know

39:45uh all exponential systems you know deca decay to kind of linear or or or something else but um

39:56you know they've they've kept it up they've kept it up so far and I think they have a bunch more tricks up their sleeve and I and I Um uh you know I think there's a bunch of things actually in IO that's going to get a lot faster with asynchronous IO coming up and um you know and they've got some super

40:12smart people kind of the other cool thing that you know the advantage that DuctTV has is because it started as a you know in the research community um a lot of people who are you know getting their PhD or kind of on their master's um track they uh you know they they demonst you know they they build this

40:32sort of like the new thing that they're that they're you know their their research is is in making you know improved improved joins improved optimization improved you know cardality estimation include improved all this other stuff and they do it in ductb because it's open source and because it's it's kind of well known in in academia and then so the best stuff gets

40:54merged back into ductb and so ductb keeps sort of getting better kind of automatically because like the newest best um you know best stuff is going into going into into ductb and actually the ductb team doesn't have to build all of it them all of it themselves. I think sometimes you know they you know there'll be something that will

41:17be built in in another database uh and they're like hey that's actually pretty clever and they will they'll implement it themselves. Um, but they also stay very very much up on like what is the best way to do string compression? Like for a long time people thought floats weren't really compressible and they kind of they you know have a way to do

41:33kind of floating point um compression and and in a vectorzed you know in a vectorzed way and like um some really really neat stuff that's coming out that's sort of just making it continues to make it faster.

41:47Um I saw a comment in the chat that I want to bring up on the screen here from Ignasio who points out that it's not as you say it's not just performance right and I think even beyond kind of the workflow you just you described it's the values shared the community the commitment from maintainers to build something incredible and um you know it

42:10seems like duct DB has that in spades. Yeah. Well, I love the value shared thing because like I think one of the things that we try to, you know, we try to talk a bit about, you know, kind of the values and the beliefs that we have and like and a lot of people a lot of our customers come and they're like, you

42:26know, we believe that, you know, the size of your data is not the most important important thing about, you know, how to how to how to use the data and um or people like the ductb community. people like you know they just get this you know they get good vibes from duct DB and like um and you

42:44know we try to we try to reflect you know reflect that as well. Um so so yeah I think those are all those are all important things and also just like you know what is the commitment to the vendor of of actually helping helping you and solving your serving solving your problems and like can you get the

43:01bandwidth that um you know that you that you need either from the people that are building it or the people that are supporting it or the community um you know do you get do you get what you need or is it is everything purely transactional?

43:16It's quite quite different. I mean, there are plenty of um databases and and vendors out there who want to be the biggest and fastest. Uh and it seems like quite a a crowded um message to to

43:29sort of hawk. >> Yeah. Yeah. And I think just as a database, I never wanted I never want to rest. I never want to like develop have my sense of worth be based on we're the

43:42fastest because like because it's like there's always somebody gunning for you and there's always like there's like and then if okay now if you're the second fastest does that mean you're worthless?

43:53Does that mean you suck? Um it's like you know I think there's so many other ways of sort of measuring um you know the value that you're that you're creating and and I think also if all you're focused on per is performance then it's going to lead you to do you know unnatural things and kind of I've

44:11seen things and I've seen things in code bases and like you know with you know handcoded assembly optimizations and like all this stuff that actually just slows you down and it makes life harder you know it may slide harder in the future. Maybe yeah, you get this sort of great, you know, performance improvement, this performance tweak, but

44:30in the long term, it's going to it's going to slow you down. And um and I think if if really what your value your self-worth is all about performance, then it leads you to do those kinds of things. But if it's if it's not it's sort of like you know we will make it as fast as you know as we can and while

44:47also m you know providing a maintainable system because when we provide a maintainable system then we're going to keep being able to improve it and expand on it and make it you know work well in other in other in other ways and in other you know in other areas and it's going to be more extensible and more

45:03people are going to be able to understand how to how to do it and improve it. Uh, and those those things are all um they're harder to measure than just, you know, performance on a benchmark, but I think those are even more important.

45:16>> As I've heard someone say around the office, it's uh collectively the vibes. Um, maybe it's a little unscientific.

45:23Um, two more and then I think we'll wrap up. Uh, sort of lightning round. If you had to pick one benchmark to evaluate a database system on with all of the caveats that you've just spent the last 45 minutes describing, what would you pick and why?

45:41>> Duckbench. No. Um, [laughter] >> no. I I don't think there's a good I don't think there's a great one. like you know I think like there's everybody complains you know there's a there was a paper I think it was in VLDB uh you know database you know academic publication uh about you know the TPC benchmarks weren't great but those those are ones

46:00everybody uses you know click house the clickbench is is is okay is good you know is good has has some problems you know there's a JSON bench there's an H2O AI benchmark there's like um usually when people come up with a new benchmark they have an axe to grind like they have or they have a thing that they're trying

46:18to do. Whereas was like in BigQuery, you know, I remember we wanted to create our own benchmark because um we're like, hey, well, BigQuery, our customers are getting a lot of value.

46:27They're finding it's incredibly fast, but it doesn't show up fast on this benchmark. Maybe we should create our own benchmark that that like can show what we actually do well, but then nobody's going to take that seriously.

46:37Nobody's gonna nobody's going to believe it. Um uh there was a benchmark actually from the University of Washington uh that I saw that was they measured

46:50the uh you know was trying to show kind of like you know web um web- based analytics you know for for doing for doing data visualization which is what a lot of the times you know is is the a lot of the load on on on database systems and and the interesting thing was that the metric that they used was

47:09the the number or the percentage of queries that finish in less than a second which I thought was interesting instead of like trying to sum up the whole time of the benchmark or taking the geo mean of the benchmark is sort of like it's like if it's less than a second then it's interactive if it's more than a second it's not and um and I

47:27think that's also a pretty interesting way of I so that was a pretty a pretty good a pretty good benchmark but I may just be saying that because I think that duct would do really well on that benchmark >> you you uh you mentioned duckbench.

47:39Would we ever make a duck bench? What do you think? I don't I don't want to pin you to anything. Of course, I think >> I don't I don't know like I I I um you know just I'm skeptical of of vendor benchmarks and you know we're a vendor and so I would be skeptical I would encourage you to be skeptical of any any

47:56benchmark that we that we put out. >> There it is. Um Jordan, thanks for doing this. One last quick plug for everybody.

48:04If you are find if you will find yourself in the Bay Area next week, um we would love to have you stop by Small Data SF which is a conference sponsored by Motherdoc for all things efficient, fast and cost effective data. Um and with that I think we will leave it there.

48:21If you believe if you believe that the you know the size of your data is not the most important thing about it uh you know you know small small data may be a good a good place to find some like-minded like-minded folks. Uh and you can even see maybe what I look like on you know without this angler fish.

48:37[laughter] >> You'll have to come to see if he leaves it at home. He might just bring it.

48:42>> Yes. >> All right. Thanks for joining everybody. Have a great weekend. >> Thank you.

48:47>> Thanks Jordan.

FAQS

How much faster did DuckDB 1.4 make queries on MotherDuck?

After upgrading to DuckDB 1.4, MotherDuck observed an average 20% per-query performance improvement across customers with stable workloads. The improvements were tilted toward the largest and slowest queries, with some seeing up to 2x speedups. Key improvements came from a new sorting algorithm, window function optimizations, and better performance on high-core-count machines.

Why are database benchmarks misleading?

Every vendor can cherry-pick workloads that make their system look best. If a database excels at joins, its benchmark will be join-heavy; if it has a fast regex parser, the benchmark will include regex operations. CEO Jordan Tigani noted that even at BigQuery, a benchmark was once presented using a single cherry-picked query that performed well while most other queries did not. The real-world experience of users rarely matches any single benchmark.

How did MotherDuck perform on ClickBench compared to other data warehouses?

MotherDuck performed well on ClickBench, briefly reaching the top overall and showing strong results compared to traditional data warehouse vendors like Redshift, BigQuery, and Snowflake. Even MotherDuck's smaller instances were considerably faster than several competitors' larger instances that cost up to 100x more. This validated MotherDuck's architectural approach of scale-up, single-node performance focused on latency rather than distributed throughput.

Why is performance not enough when choosing a database?

Choosing a database is a long-term commitment, so trajectory and improvement rate matter more than a single benchmark result. Jordan Tigani pointed out that DuckDB's rapid improvement rate, ease of use, and focus on the entire data workflow are just as important as raw query speed. DuckDB has the world's best CSV parser for handling messy files, and a database that saves you an hour of data ingestion hassle is worth more than one that's a few seconds faster on queries. Learn more about taming CSVs with DuckDB.

What is the relationship between database performance and cost?

Performance and cost are two sides of the same coin. A faster engine lets you finish queries sooner, which means you pay for less compute time. However, scaling up distributed data warehouses often yields diminishing returns: doubling hardware might only give 50% more performance, then 20% more the next time. MotherDuck showed near-linear cost-performance scaling on ClickBench, meaning you get proportionally more performance as you increase spending. Some competitors' larger instances barely improved speed.

Beyond the Benchmarks: A BigQuery Co-Founder's Guide to Evaluating Data Warehouse Performance

TL;DR

Why Are Traditional Data Warehouse Benchmarks Deceiving?

The "Lived Experience" Gap: Server Speed vs. Driver Latency

Consistency Matters: The "Noisy Neighbor" Problem

The "Faster Ducks" Phenomenon: The Danger of Cherry-Picking

Gaming the System: When Vendors Disable Safety Gear

Beyond Raw Speed: Why Prioritize Time-to-Insight (TTI)?

Real-World Validation: The FinQore Case Study

Optimizing the Workflow: "No Servers, Clusters, or Partitions"

The Economic Reality: Scale-Up Architecture and Billing Granularity

Linear Scaling via "Scale-Up" Architecture

The 1-Second vs. 60-Second Difference

Evidence in Production: How Optimization Should Impact Your Bill

Targeting Tail Latency to Reduce Friction

The Stability of the Median: Why 11ms is Fast Enough

Conclusion

Transcript

FAQS

How much faster did DuckDB 1.4 make queries on MotherDuck?

Why are database benchmarks misleading?

How did MotherDuck perform on ClickBench compared to other data warehouses?

Why is performance not enough when choosing a database?

What is the relationship between database performance and cost?

Related Videos

Preparing Your Data Warehouse for AI: Let Your Agents Cook

The MCP Sessions - Vol 2: Supply Chain Analytics

Can DuckDB replace your data stack?