
Is BI Too Big for Small Data?

2024/11/17

If you’ve ever sat through a demo for a Business Intelligence (BI) tool, you know the story. A key metric on a dashboard suddenly dips. An "intrepid analyst" dives in, slicing and dicing the data with a few clicks. They join a few tables, write a few queries, and—voila!—they uncover a previously unimagined insight that saves the day.

As Benn Stancil, founder of the BI tool Mode, explained in a recent talk, "This sells! This story actually really works." It’s the narrative that built a multi-billion dollar industry. But Stancil, who has given this demo hundreds of times, reveals the conflict at the heart of the BI world: the story sells, but for most companies, it doesn't actually work.

If this narrative were true, why are so many teams unhappy with their BI tools? Why do our dashboards become "trashboards" that nobody uses? The answer, according to Stancil, is that the entire "intrepid analyst" fantasy is built on a lie. Not a malicious lie, but a foundational misunderstanding of the data most of us actually have. It’s the Big Data Lie.

The Myth of "Data is the New Oil"

To understand why the story fails, we have to go back to the early 2010s, when the "big data" hype train left the station at full speed. A series of iconic stories cemented a powerful idea in our collective consciousness: that data, like oil, is a raw material just waiting to be drilled into to produce immense value. We heard how Target’s data science team used purchasing data to predict a teenage girl’s pregnancy, suggesting data had prophetic powers. We saw it in Moneyball, where the Oakland Athletics used decades of historical baseball data to build a winning team on a shoestring budget. We witnessed it when Nate Silver, hailed as a "witch," predicted the 2012 US election with stunning accuracy by analyzing millions of polling records. And we learned it from Facebook, which data-scienced its way to the magic "7 friends in 10 days" formula for growth.

These examples, and the "Data Scientist: Sexiest Job of the 21st Century" headlines that followed, created a powerful belief system. Stancil summarizes it perfectly: "A lot of people came to very much believe that data just contains value... And all it takes is us to have the right tools to get that insight out." This belief spawned a generation of tools promising to "unlock the power of your data." The problem? The premise was flawed for most of us from the start.

Target had $73 billion in revenue. The Moneyball team had nearly 12 million historical at-bats. Nate Silver had 650 million votes. Facebook had over a billion users.

As Stancil states bluntly, "This is big data." Most of us don't have that. We have a few thousand customers, not a billion users. Our charts don't look like the smooth, predictable curves from a dataset of millions. They look like a sparse, jagged line that wobbles vaguely upward.

When faced with a chart like that, you don't "slice and dice." You don't "drill down." As Stancil hilariously puts it, "You squint at it and you're like, it's up-ish." This is the disconnect. We were promised a treasure map, but we got a squiggly line and a shrug.

So if the heroic "big data" playbook is a fantasy for most of us, what are we supposed to do? If our reality is more "up-ish" than insightful, it's clear we need a different approach—a new playbook designed not for finding treasure in petabytes, but for finding meaning in ambiguity.

A New Playbook for the "Small Data" Reality

If the old playbook is broken, what does the new one look like? Stancil proposes a new set of principles for finding value in the "up-ish" reality that most of us live in. This new playbook shifts the focus from heroic exploration to pragmatic interpretation.

Principle 1: Shift from Exploration to Interpretation

"The hard part is not creating this chart," Stancil argues. "The hard part is interpreting it."

Most BI tools are built for exploration. They give you endless options to filter, pivot, and visualize. But when your data is sparse, these features don't lead to clarity; they just create more confusing charts. The real bottleneck isn't getting the data; it's figuring out what, if anything, it means. Stancil's insight is that "interpretation of data is often a lot harder than exploration."

This is where speed and interactivity become critical. To interpret an ambiguous chart, you need to form and test hypotheses rapidly. Is this dip because of the holiday? Let me pull last year's data. Is it a specific user segment? Let me filter by plan type. Is it a bug? Let me look at error logs from the same period.

If each query takes minutes to run, you lose your train of thought. The friction of waiting kills the iterative cycle of questioning that is essential for interpretation. When you can test ideas as fast as you can think of them, you shrink the gap between question and answer, making the hard work of interpretation just a little bit easier.
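
To make this concrete, here is a minimal sketch of one such hypothesis test in DuckDB-flavored SQL. It assumes a hypothetical events table with event_timestamp, event_type, and plan_type columns (the 'signup' event name is also an assumption), and checks whether a dip in signups is concentrated in one plan:

```sql
-- Hypothesis: the signup dip is concentrated in one plan type.
-- Assumes a hypothetical `events` table with event_timestamp,
-- event_type, and plan_type columns.
SELECT
    plan_type,
    COUNT(*) FILTER (
        WHERE event_timestamp >= NOW() - INTERVAL '7 days'
    ) AS this_week,
    COUNT(*) FILTER (
        WHERE event_timestamp >= NOW() - INTERVAL '1 year' - INTERVAL '7 days'
          AND event_timestamp < NOW() - INTERVAL '1 year'
    ) AS same_week_last_year
FROM events
WHERE event_type = 'signup'
GROUP BY plan_type
ORDER BY this_week DESC;
```

If the numbers come back flat, that hypothesis is dead and you move on to the next one. The point is that each round trip takes seconds, not minutes, so the questioning never stalls.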

Principle 2: Embrace Unscalable Work

This might sound like heresy to data professionals, but it’s Stancil’s most powerful point. He tells the story of a friend tasked with analyzing the sentiment of articles on a specific topic. She started building a complex AI model, only to realize there were just seven articles.

"Why am I building a tool to look at seven articles?" Stancil asks. "Go read them." It takes 20 minutes and yields a far richer understanding than any model could. This "unscalable" approach is incredibly effective for customer data. Instead of trying to find trends across thousands of users in a noisy dataset, go look at the raw activity of a single user.

With a tool like MotherDuck, you don't need a complex pipeline to do this. You can query your raw event data directly to "read the story" of an individual user's journey. For example, let's say you want to understand what your most active user from the last week was actually doing.


```sql
-- Stancil's point: sometimes the best insight comes from looking at one customer.
-- With MotherDuck, you don't need a complex pipeline to do this.
-- Just query your raw data directly.

-- First, find our most active user this week
WITH user_activity AS (
    SELECT
        user_id,
        COUNT(event_id) AS event_count
    FROM events
    WHERE event_timestamp >= NOW() - INTERVAL '7 day'
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1
)
-- Now, let's pull their entire event stream to "read the story" of their session
SELECT
    e.event_timestamp,
    e.event_type,
    e.properties
FROM events e
JOIN user_activity ua ON e.user_id = ua.user_id
ORDER BY e.event_timestamp;
```

The result of this query isn't a high-level chart; it's a narrative. You can see every click, every page view, every action this power user took, step-by-step. This is the "unscalable" insight Stancil talks about. It doesn't tell you what all users are doing, but it gives you a deep, qualitative understanding of what an engaged user's journey looks like. That's often far more valuable than another "up-ish" chart.

Principle 3: Honestly Assess Your Data's Scale

Stancil's final piece of advice is to be honest about the scale of your data. Don't use a "big data" sledgehammer for a "small data" nail. Big data problems are real, and they require big data tools like Snowflake or BigQuery. If you're managing petabytes of data for a global enterprise, those are the right tools for the job.

But MotherDuck and DuckDB were built for the other 99% of us. We excel in the vast space below that massive threshold, where most companies operate and where the challenges are different. It's not about wrangling petabytes; it's about getting fast, reliable insights from datasets that fit on your laptop or a modest server. It's about using the right tool for the job.
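
If you're not sure which side of that line you're on, measuring is cheap. Here's a rough sketch, assuming your raw events sit in local Parquet files (the path below is hypothetical):

```sql
-- Sanity-check your actual scale before reaching for a big data stack.
-- The Parquet path is hypothetical; point it at your own files.
SELECT
    COUNT(*)                AS total_events,
    COUNT(DISTINCT user_id) AS distinct_users,
    MIN(event_timestamp)    AS first_event,
    MAX(event_timestamp)    AS last_event
FROM read_parquet('data/events/*.parquet');
```

If those counts come back in the thousands or millions rather than the billions, you're squarely in the territory this playbook was written for.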

Conclusion: Embrace Your "Up-ish" Data

The "big data" dream set an unrealistic expectation for many of us. We were told our data was a gold mine, and we felt like failures when we couldn't find the gold. Benn Stancil's message is a liberating one: your data isn't the problem. The "small data" reality isn't a failure; it just requires a different, more pragmatic approach.

Stop chasing the "intrepid analyst" fantasy and start embracing the messy, ambiguous, "up-ish" world you actually live in. Shift your focus from exploration to interpretation, embrace the unscalable work of looking at individual examples, and choose tools built for the scale of data you actually have. The best way to begin is to try a more direct, interpretation-focused approach yourself. You can start with a local DuckDB instance or sign up for a free MotherDuck account and see what stories your "small data" can tell.

