Moving Forward from Ad Hoc Reports with DuckDB and MotherDuck
2024/08/21
Featuring: Jacob Matson
TL;DR: Transform ad-hoc reporting from one-off CSV exports into reproducible pipelines using DuckDB and MotherDuck, with practical examples of persisting results to track trends over time.
What is Ad-Hoc Reporting?
Jacob Matson, a developer advocate at MotherDuck with an accounting and computer science background, defines ad-hoc reporting as:
- One-time asks that need answers right now
- Questions that are hard to answer with existing dashboards
- Often driven by executives noticing unexpected changes ("Why did churn change from X% to Y%?")
The key insight: if a request reaches you, it's both important and difficult—easy, unimportant questions don't escalate.
The Old Way (Excel + CSV)
Traditional ad-hoc analysis meant:
- Export data from CRM, QuickBooks, or other sources to CSV
- Mash up data in Excel using Power Query
- Build pivot tables and charts
- Deliver results... then get asked to do it again every Monday
The trap: Making analysis too good turns it into an ongoing pipeline you're now responsible for maintaining.
The DuckDB Approach
DuckDB changes the equation:
- Same workflow: Still pulling from multiple sources, but with SQL
- Reproducible by default: SQL files and Python scripts can be version-controlled (see the sketch after this list)
- Low lift for users: Tools like DBeaver and Harlequin provide familiar IDE experiences
- Scale yourself: Automation is built-in, not bolted on
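A minimal sketch of what that reproducibility looks like in practice, using the churn question from above; the file name, export path, and columns are all hypothetical:

-- churn_check.sql: lives in version control, so the next "why did churn change?"
-- ask is a re-run rather than a rebuild from scratch
WITH revenue AS (
    SELECT customer_id, period, SUM(amount) AS revenue
    FROM read_csv_auto('exports/revenue_by_customer.csv')
    GROUP BY customer_id, period
)
SELECT curr.customer_id
FROM revenue AS curr
JOIN revenue AS prev USING (customer_id)
WHERE prev.period = '2024-06' AND curr.period = '2024-07'
  AND prev.revenue > 0 AND curr.revenue = 0;   -- revenue last period, none this period

Re-running it is a single command (for example, duckdb < churn_check.sql), which is also what makes scheduling it later trivial.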
Doing Better Analysis: Persist, Run Again, Trend
The key to better analysis isn't just speed—it's persistence:
- Persist results: Save your ad-hoc query results somewhere (MotherDuck works great)
- Run it again: Schedule the same analysis to run daily/weekly
- Put it on a trend: A single number is meaningless; trends reveal insights
Example: Instead of reporting "pipeline value is $1M today," track it over time. When you see it jump to $1.2M, you can identify when the change happened and investigate why.
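A minimal sketch of that persist-and-trend loop in SQL, assuming the snapshots live in a MotherDuck database named reporting and the pipeline numbers come from a hypothetical crm.opportunities table:

CREATE TABLE IF NOT EXISTS reporting.pipeline_snapshots (
    snapshot_date  DATE,
    pipeline_value DOUBLE
);

-- Run on a schedule: one row per day, never overwritten
INSERT INTO reporting.pipeline_snapshots
SELECT current_date, SUM(amount)
FROM crm.opportunities
WHERE stage NOT IN ('closed_won', 'closed_lost');

-- The trend: when did the pipeline jump from $1M to $1.2M?
SELECT snapshot_date, pipeline_value
FROM reporting.pipeline_snapshots
ORDER BY snapshot_date;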
Real Example: NBA Predictions
Jacob demonstrates with his MDS-in-a-Box project (NBA forecasting):
- Each model run produces new predictions that replace the old ones
- By persisting daily predictions to MotherDuck, he can track forecast changes over time
- The Milwaukee Bucks chart shows a sudden drop in predicted wins in January—investigation reveals they fired their coach
- The Philadelphia 76ers chart shows two distribution changes, both corresponding to their MVP's injuries
Technical Implementation
Using a dbt post-hook to persist each run's results (the S3 path shown here is illustrative):
{{ config(
post_hook="COPY {{ this }} TO 's3://data-lake/{{ this.name }}.parquet'"
) }}
Then load into MotherDuck with simple SQL:
CREATE TABLE predictions AS
SELECT * FROM read_parquet('s3://data-lake/*.parquet');
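If each scheduled run writes its own Parquet file, keeping the source file name on every row makes the snapshots distinguishable after the fact; a sketch using read_parquet's filename option (the column rename is only for clarity):

CREATE TABLE predictions AS
SELECT
    * EXCLUDE (filename),
    filename AS snapshot_file   -- which run each row came from; a run date can be parsed out later
FROM read_parquet('s3://data-lake/*.parquet', filename = true);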
The Future Vision
Imagine if every BI tool had DuckDB-Wasm built in:
- Press a hotkey to see all data on the current dashboard
- Join data from Tableau + Evidence + HubSpot without CSV exports
- Attach data sources directly, something like ATTACH tableau_dataset TO evidence_dataset (hypothetical syntax)
CLI Bonus: Plotting in Terminal
Using uplot with DuckDB CLI for quick visualizations:
duckdb -c "SELECT extension_name, install_count FROM 'stats.json'" \
| uplot bar
Coming in DuckDB 1.1: Built-in bar chart rendering in the terminal!
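In the meantime, DuckDB's bar() scalar function already renders a proportional text band, which covers the simple cases; a quick sketch against the same stats.json:

SELECT
    extension_name,
    bar(install_count, 0, MAX(install_count) OVER (), 40) AS downloads
FROM 'stats.json'
ORDER BY install_count DESC;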
Transcript
2:10 Hello everybody and welcome to another episode of Quack & Code. For this episode we're going to talk about ad hoc reporting. We're going to discuss
2:24 first the definition, just so everybody is on the same page, and I have a really, I would say,
2:34 fun and interesting guest, because he's also a new teammate at MotherDuck, Jacob Matson, and he's going to talk us through it. Jacob, welcome. Hi, thanks, how are you doing? I'm good, I'm good, it's good to be here. Yeah, so, first couple of weeks and I already threw you in with a
3:04live stream so you know just to get to warm up uh how do you how do you feel about about the topic but uh but first introduce yourself and um and a
3:20bit about your your background where you where you exactly yeah that sounds great
3:26yeah um so my my back background is uh I
3:31actually have a uh degree in accounting and computer
3:37science and so my career started on the accounting side um I kind of like to make jokes that like I've always worked on data pipelines they just ran once a month um and you know the data Piece of It kind of came very naturally to me uh over time and so uh you know a lot of um
4:00my experience kind of is much more kind of in the domain than it is um kind of in the kind of more hardcore technology space um so you know my perspective is is uh you know we need we need better tools to help us enable to do um awesome stuff so it's part of why I'm excited about mother duck um yeah so that that's
4:23 the quick intro. Anything else I should hit on? Yeah, I'm curious, when was the first time you played with DuckDB? It was probably fall of 2021 is
4:42 my guess. I'd been working with dbt for a few years; my background is mostly in the Microsoft stack, so SQL Server, and there's this gap: once you get to more than a million rows it's very difficult to use Excel in a scalable way, and I had some data sets
5:07 that were larger than a million rows and I needed to do some work with them. DuckDB just had an awesome CSV reader at the time (the spatial extension can now read Excel, so I could skip some steps today), so I could export the data I was looking at to CSV, or, I actually had some
5:27 fun files that were tab-partitioned Excel files, one million rows per tab in one Excel file, very fun data sets to deal with. So ripping those out one tab at a time to CSV and then combining them in DuckDB was kind of how it started,
5:48 very much out of a need. From there it's expanded and become more useful, partially for two reasons: one, I've gotten a lot better at Python, and the second piece is that the functionality of DuckDB has grown since
6:06 the first version I installed, which was probably 0.4 or 0.5, I don't know. Yeah, it's a bit of a different world compared to when you started. So if I understand correctly, you're an Excel user converted to Duck
6:25DB is that correct I'm still an Excel user but um you know duck DB is a great tool in the toolbox um you know I like to think about um Excel is my favorite tool in the world for consumption of data um it
6:45is not good for creation right and that includes like pipelines and Transformations but a lot of people use it that way and so if you kind of think about you know those two different modalities um you know duct B fits in really nicely to rep a lot of that workload um that used to be creation and transformation in Excel and now I just
7:06do that you know in dtb and Python and then you know can still do exploration and pivot tables and so on you know on the final data set using Excel yeah know but that's that's fair I just ask uh let let us know in the in the audience if uh you're both uh Excel
7:25and Doug DB user I think it's a it's a kind of a niche uh profile if you if your if your data stack is basically that right I'm not saying I do use Google sheet from time to time uh pretty rarely but it's part of my stack right but if your so your stack is Excel and DV uh maybe there is
7:48something to to dig in and uh let us know in the in the in the comment in the chat um but so moving on to the ad
7:57reporting which is um our top today how how do you define your um hadock reporting by itself yeah so I think about um ad hoc reporting really as about you know you have a you have a need that you need an answer for right now um it's a one-time it's a onetime ask um it's hard to get with the
8:22existing with the existing data set right so uh a lot of times you know those types of asks are really driven by you know uh the executive team right it's like hey like uh I saw that our turn changed from X perent to Y percent you know why right um and those those
8:42questions can be difficult to answer if you don't have the right Telemetry in place um but but what they're really telling you is when you when you get an ad hoc request is like hey this is something important that's hard right because if it was easy unimportant it wouldn't make its way to you um and so you know how what I
9:03like to think about you know and kind of um what I really think we can do with ducky be and mother duck is like uh turn that into something that's much easier and repeatable so yeah yeah I think there is also the kind of like a a storytelling what I've seen on ad reporting is like which you would
9:24spend a bit more time as you mentioned on a problem which is a bit more complex to try to NE Nuance the answer like
9:34exactly as you mentioned like oh this is down you know why oh because we didn't do any sales no it's you know it's a bit more than that and so there there there is the story around this um and I think this is um in my in my experience that was hard to kind of like uh reproduce
9:53systematically with like standard bi tool like I'm just asking you like yeah based on your based on your experience like dougb you know and mod out of the picture you have a classic dashboarding tool how do you use to do those like adoc uh you know requests before yeah it's a good question um uh you're kind of getting at like
10:20what like how do I how would I think about that the you know answering ad hoc ad hoc reports question reporting questions is that what your question is yeah yeah yeah and also just like on the on the stack like before like what was like your tool set I mean you mention Exel and I'm sure you're going to
10:38mention it again but yeah yeah I mean you know the core the core of that workflow for me and what I got pretty good at was definitely doing that in Excel right so um you know exporting data into CSV or if I was fortunate enough for it to be in a in a SQL database that I could reach from Excel
10:57you know using power query pulling that data in you know mashing it up kind of tweaking it a little bit um and then and then kind of producing an end result yeah I mean all mo mostly in s or mostly in Excel um you know but via you know lots of tools right um if someone's asking a churn
11:14question I might you know go to the CRM and pull down a list of active customers right um and then you know put that into to excel I might go to QuickBooks or netsuite or whatever to get you know Revenue by customer over the last you know number of periods and compare that so I can just say okay I can see this
11:33customer had zero Revenue this period and they had you know $10,000 last period like there's a churn example right um uh which which brings its own challenges but um you know uh the move forward on that
11:50is you know we can glue a lot of that stuff together with duck DB and python right yeah so uh but yeah you me you mentioned something interesting here is that I believe like a lot of bi Tool uh you know advertise to say you can do um
12:06you know addock reports made it easy you know because you you like anyc level for objective business people you know they the any rates from dashboard from uh from the data team which are pretty you know static and limited and usually you say yeah you can change Dimension there and build a new dashboard but I feel often when you have zok request you need
12:30a specific data source which is often you know not there or that you need you know as you mentioned go to uh to the QuickBooks to run something uh so you basically kind of play a detective and you are not sure where to look at the data and then usually if there is manful repeatable Insight you end over that to
12:55the data team which you know pass it in the dashboard but I feel like that like that's what happened to me it's like some hadock report um you know has been transformed later on on full data project right um but I feel like this is where dark DB shine is that there is an easy way to ingest and to um to pull
13:17from from various sours and I guess that that's where you we we have like there there is that and there is like someone commented uh something Jack I
13:29I had to process huge XL 5 with many tabs to end up processing so I think apart from the size there is like the flexibility of the of the ingestion maybe yeah I I think one thing that I would say is um I'm a very pragmatic
13:46data engineer like um I would rather not do any data engineering um but uh I have
13:54to do it to get the answers I need right to solve to solve the questions the business is asking and so in the past right um when you're using things like Excel a lot of conventions uh you prioritize results over convention um and I think one thing that duct TB lets us do is it lets us you know take a reproducible approach
14:16right respect the convention of how we do data analysis without um without
14:21trading off on you know the ability to deliver results right um that that's the part that excites me the most about it is like the It just fits in your workflow um I don't have to like you know change it uh you know it's very easy to train other people on there's a bunch of really great um other other
14:39projects you know um that use duct DB as well right to kind of make this stuff easier right um one thing that I'll that I'll share that I that I did at previous company was um we were all SQL Server uh users
14:53 and so everyone's familiar with the SQL Server IDE, which is called SQL Server Management Studio; pretty much everyone across the operational and finance functions knew that tool and was using it regularly. So being able to have something like DBeaver, which integrates really nicely
15:15 with DuckDB and can give that IDE-type experience, same with Harlequin, which is another one I like to use in the command line, means the lift to get users starting with SQL from a CSV is much, much lighter
15:32now too right it's like all right you know there there's there's um there's analoges for the tools you already know right um to get started which is really cool too yeah that's that's totally fair I think back then you other needed if you wanted something local to parse a CSV like Custom Tool the easy way was python
15:52but then you need you know Library uh coding uh a bit of coding language a python environment and even like before container locally and so on that's was pretty much hard to just install python um uh so so yeah the the barrier to entry to just spse and read a CSV was uh
16:13was hard I think Google Google cheat did a pretty good job uh after to integrate like import CSV and so on I see uh someone comment that my data stack is one person big query and I yeah go sh right that's sound right that's about that's about right um but uh yeah so that's that's fantastic um moving on uh so you have a couple of
16:39things to uh to show us uh so maybe
16:43maybe we can go uh to that and I'll probably uh ask the the domain question and if you have any question uh in the audience uh please uh let us know uh
16:56we'll we'll stop and uh and answer them with pleasure so I'll put your slide there cool yeah we'll just go through some slides uh I'm not going to go through the whole thing but we'll we'll kind of Bounce Around um y so you know
17:12uh this is just kind of like putting us in context like what kind of stuff can we do uh Alex wrote a great article about doing some very crazy things that are kind of unconventional with SQL for example right uh we can use it inside of of tools like like hex for example you know we can do lots of visualizations
17:30really fast with something like Mosaic right we have the evidence guys building this stuff out uh Universal sequel uh you know which I which I've used in NDS in the box which is project that I run uh and of course you know you can do this you can also do things like this this is Jake Thomas he did a great
17:47talk we're taking we're taking a look at about you know crazy scale uh you know during doing a serverless kind of like parallel processing over at OCTA so uh right and it's fast right as we talked about you know someone someone called out previously like you take all this Excel stuff that's you know uh all these different tabs all these different
18:08formats and you can just pipe it right into duck DB and do all of your um do all of your work you know I think one of the first things I did um I was working on this uh this is you know 2021 probably I was working on um uh just a fun like Wordle solver right so you
18:24could put in whatever your status was in the in the in the game Wordle and you could you could put in the letters and it would tell you which which word uh to guess next uh anyways so I took a pretty
18:37 gross, unoptimized CTE and it took 487 seconds in Postgres and six seconds in DuckDB, probably even faster now; it's getting way faster across all these benchmarks. And lots of things: pandas takes 4,500
18:56 seconds versus DuckDB's 48, for example, on this 50 GB data set. So, lots of great things. I would be curious how these benchmarks perform versus Excel; the answer is probably "doesn't run" or "out of memory", but that is
19:15really interesting we haven't done we everybody's fighting against the database you know yeah yeah and and the reality is that people are using also Excel as their as their tooling right they're using Excel in data break and XL and snowflake so uh it would be still interesting to see how far you could do you know uh some some some comp common
19:39things over there but just ring the queries I think you need extra plugging or whatsoever yeah I I'm not sure how much RAM I would need to do that um okay
19:50so uh what's really cool is because our
19:55tools are way better right we get to do a bunch of other stuff too um uh um uh ad hoc analysis becomes trivial
20:08right and so we can do more faster which is great more faster is a great outcome but my favorite outcome is we also can do it better um so uh well I'll talk
20:18more about kind of what my what of my Approach is there in a second I think I see a question here um in the comments yeah it's really it's h it's mostly uh related to to the behavior so we'll we'll come back to that uh that later okay very cool we'll come back to it um so what does it look like to do
20:39better analysis um you know I think uh a
20:45lot of this kind of comes from my experience you know running running accounting and and operational teams um persist the results right that's already baked into like most accounting um software right so uh it's really easy to go back and see what was my what was my profit you know last year versus this year but that's like intentional right
21:03those systems are designed to do that um a lot of our you know systems that are don't have a financial backing don't do that necessarily out of the box right what was my Salesforce pipeline yesterday versus today uh that can be a difficult question to answer um so doing better analysis means persisting persisting results running it again and then you know can we put it on
21:24a trend right if we can just do those three things you're probably in like the top 10% of data anal analysts right um so like you know here on the left you can see this is just a single point doesn't mean anything I don't know it means that the value was about you know uh 300 on September 11th
21:43right um if we put it on a trend we can see something a little different which is like you know what is exceptional what is not exceptional you know normally we're around 200 so maybe 300 was slightly higher than usual uh we had a spike you know sometime in Middle September you know that tells us that's a jumping off point why do we have that
22:01there you know um and I think that that is the interesting piece that we can unlock you know just thinking about you know taking these ad hoc requests and then putting them into a um uh putting them into you know Trends over time so it's really about the the the what I was mentioning the repeatable process right because I feel like it doesn't
22:28even need to be repeatable right like a financial month-end process that's generating your uh you know financial statements is like repeatable in the sense that humans do stuff but it's not like someone clicks a button and runs a runs a pipeline right um so what we really need
22:49to is is a way to take whatever the result of that and save it somewhere right and so that's the piece Let's see we get to the next page okay um right so
23:00 my proposal here is that we can use MotherDuck as a layer to store the results of the ad hoc analysis. So if I'm doing an analysis that tells me my sales pipeline was a million dollars last week, let's just write those results somewhere, let's get them into
23:17 MotherDuck, and then I can look later, write another number, and see it's 1.2 million this week, and I can see that there's a change. I don't necessarily need the granularity to know exactly why things changed; the granularity is nice, but it also adds a bunch of complexity. That's something that
23:36like accounting systems understand kind of existentially right is uh something one thing that's good about them is they're conformed on a certain set of Dimensions right we have like Gap in the US um and uh that also like begs the
23:53question of like why did things change you have to actually look at the data and kind of interpret it to know um um the granularity that you need is not necessarily in the data right and that's okay that's an okay tradeoff um so uh yeah I mean this is kind of a
24:10 little bit of a pitch, I don't want to lean too hard on it, but I'll show a couple of examples of something I did. I run this project called MDS-in-a-Box; it's an NBA forecast (it forecasts some other stuff too), and one thing it does is every time it runs it
24:26 produces a new forecast and the old forecast is fully replaced, so it's a classic ad hoc report in that sense.
24:37 So what I've done is build a little loop to actually persist results. Here's an example of
24:49 what I'm doing here: this is dbt, I don't know if anyone's familiar with dbt, but it's saving basically
24:57 probabilities of certain events occurring for specific teams, and I'm using a post-hook to drop that data set as Parquet into my local data lake. And
25:13 then what I can do is set a var in my dbt run so I can actually run point in time, go back in time. This is a kind of unique thing, probably a misuse of how those dbt variables were intended to be used, but you can
25:28 basically back-calculate the effectiveness of the model and then run it in a loop with make. So once I have all that data, I can just use a little bit of Python and mostly SQL to load the data into MotherDuck. These are just SQL statements, basically a
25:47 stored procedure; there's nothing too complex happening here. The thing we're doing is making the choice to load the results of our ad hoc analysis somewhere for analysis later. So what's really
26:03cool is this lets us kind of see look look back at Trends right um so for example what I would see right now in my ad hoc report results um if I were to look at the Milwaukee Bucks for example I would see this information they won 49 games because the season's over their seed is the third seed
26:23they're predicted wins 49 that's matches the actual wins of course because the season's over um and they underperformed versus what their expectation was at the beginning of the season right but like what can we see by looking at this over time right and so what I did is I said every time every uh every day that I run the model
26:42I'm going to just look at what their um predicted wins are for each day right so I can see that in January something happened right and so now this tells me hey something happened at that event let me go look in the data and figure out what's what's going on right and and what's what's usually your process into like those kind of steps
27:05 Yeah, I think this leans heavily on Deming and statistical process control, which is really about looking at things over time and identifying when trends change. So in this case we can look at that line and see the trend goes down, outside of the expected ranges, around
27:26 that time in January. And there's actually a set of rules in the Deming/Wheeler space of statistical process control, where one of them is: if you have, I think it's seven out of eight dots above the average in
27:48 a row, it tells you that your distribution has changed. So you can actually see we had two different distributions, one before January and one after January, and we can say, all right, here's where this thing occurred, what happened? Oh, they fired their coach, and they got
28:07 worse. They fired their coach and they got worse; that's probably not the outcome they were expecting, but at least we can look back and say, hey, by tracking this ad hoc metric, which is predicted wins, we can actually see changes in trends. So we've got a couple more I'm
28:23 not going to talk too much about, but this one's one of my favorites: this is the Philadelphia 76ers. We can see two distribution changes, one right here and one right here; their MVP, their best player, was injured and then he was injured again. It's actually kind of funny,
28:41 because you can tell pretty much exactly that this player was worth almost nine extra wins: they were trending to about 55 wins and they ended up around 47, 48. So we can actually learn implied things about how their team works.
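The run-length rule described here can be checked directly in SQL over the persisted snapshots. A minimal sketch, assuming a predictions table with run_date, team, and predicted_wins columns (all hypothetical names) and using the overall average as the center line:

WITH scored AS (
    SELECT
        run_date,
        predicted_wins,
        predicted_wins > AVG(predicted_wins) OVER () AS above_avg
    FROM predictions
    WHERE team = 'MIL'
)
SELECT
    run_date,
    predicted_wins,
    -- how many of the last 8 snapshots sat above the overall average
    SUM(CASE WHEN above_avg THEN 1 ELSE 0 END)
        OVER (ORDER BY run_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS above_avg_in_last_8
FROM scored
ORDER BY run_date;

Rows where above_avg_in_last_8 reaches 7 or more are the candidate "the distribution has shifted" points to go investigate.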
29:04I'm going to stop sharing um because it's just more NBA whatever but like I think the point is made which is uh you know by by tracking these these outputs right you could think about this as like imagine that this number instead of NBA wins was you know total number of um you know prospects in your pipeline right uh
29:24well now now all of a sudden you have something a really interesting narrative that like tell you know that leads you to a jumping off point to say hey things are things are getting better or they getting worse why right um and and really kind of get to the heart of of why we do this data analysis in the
29:38first place yeah no it makes uh makes total
29:44sense um I think um it's it's always
29:49like a a challenge uh may maybe we can talk about the the downfall of like the adoc uh reporting voice you've seen being like kind of like anti-art turn or like maybe slowing you more down than providing you Insight you have you have some it seems laughing so I think it's a it's a it's a good question it's a
30:13really good it's a really good question um you know I think
30:19uh especially if you've worked like directly for an executive team there's a real balance on ad hoc reporting okay what I mean by that is
30:31there's kind of two outcomes you're aiming for right well there's one outcome you're aiming for which is you do the ad hoc analysis they say hey Jacob great job um this is this is awesome you know this will really help us you know make a decision right um the bad outcome is hey Jacob really great job I need this every Monday right
30:52because now all of a sudden I've turned an ad hoc report into a pipeline right so so in the past that meant like oh crap like I have to pull all this data down and I and in order to kind of make this work what it also did though was it meant that hey I'm not going to actually do the extra work to
31:10make this pipeline you know uh or to turn this into a pipeline because it was too hard right now it's way easier so you can build you know I would almost say like by default my my Approach on uh ad hoc reporting would be to think about it from a pipeline perspective right I was almost preventing because of
31:28previous tools to think about it from a pipeline perspective because if it was too good I would then get you know stuck doing that analysis every week right um and I didn't have the tools to actually scale myself right or be like hey I'm gonna you know I need to add more people if you want that every week um yeah yeah
31:50I I I have a a mem about that uh that um
31:54but I cannot see it's like uh find it back where you have basically a bridge which is you know starting so you think as a pipeline it's like an adok and then you had like some duck uh some tape around to you know to to to make it you know more um more frequently and it's just I feel like
32:16it's it's sometimes hard to find the balance to say I I do agree that the tools are getting easier to think in term of pipelines directly that I think we we should all do that but uh some sometimes you have to take shortcuts right and sometime there is a a point where you need to slow down to say okay
32:34let's refactor this code because initially I I did this query for you know a one um a one day hadock
32:43report which was one hour before the sea l altic so there is kind of like a different you know uh time pressure versus okay how do I make this robust so that it's regenerated every day and read by you know you know 100 people which is completely a different um you know requirement I feel so yeah but yeah
33:08knowing your requirements for adch reports and finding the right balance to say sometimes no pushing it back to yeah do you actually really need that um it's great but I really like the the point of the mindset of uh um see uh think it uh
33:28Pipeline and by this thing I have actually another small thing I wanted to show I show you just just before I already SP Jacob um but with Doug DB CLI
33:40if you think um you can do a lot of things just with the the CLI and AD up reports if you're looking at some data the only thing missing is basically a bit of plotting in the terminal right because you can quickly pull uh data in data out with a CLI um but uh you can display numbers um but
34:02 sometimes having just a bar chart helps you visualize what's going on, or a line chart. And by the way, in the next release, DuckDB 1.1, do you know what's happening, Jacob? No, what's happening? Okay, so there will be a function built into DuckDB to display the results as a bar
34:30 chart in the terminal, so that would be pretty cool. Charting, that's awesome. Yeah, just simple bar charts, but you can already do that today by using uplot,
34:47 so uplot, if you're on macOS you can just brew install it; there is also a guide, a YouPlot guide, let me share that with the audience right now, so DuckDB has
35:06 a guide over there if you want to know more. But basically I did some black magic here, and you see I'm doing just a one-time command: I'm invoking the DuckDB CLI and
35:24 getting data from, this is the extension download stats from DuckDB,
35:31 so there is always a JSON file hosted there with the last week's downloads, and basically I put it out on stdout as CSV and pipe it to uplot, which as mentioned lets you plot in the terminal. And what happens when I hit that command: I'm piping the results and I
35:56 can directly display it, and look how beautiful. In terms of pipeline, this is an ad hoc report, watching how many downloads each DuckDB extension has been getting, and it's just one command, so it's really easily reproducible compared to, I would say, the dinosaur
36:22tools where you have a server and you're pretty limited to what you can install locally in yada um so yeah think it's a it's pretty uh neat example to also illustrate what what you can do and you're going to be able to do a bit more chart building in and ddb but I'm curious if you're using the CLI let of s uh if if you have
36:45creative ways of displaying things with your plots U I think there is a lot to play um anyway now you need to write those results into mother duck so you can track it over time there you go yeah exactly so again it's just it's just one command and actually I could uh query instead of like calling the data source
37:08just query a database from mother deck and also display it in the terminal so uh so that would be that would be fun actually not anymore needed a B tool just my terminal um pushing the fix to the extreme um I I wanted to um ask you also
37:31regarding the um the adoc uh reports uh
37:37what what do you think is missing in the CL classic bi tools uh today to enable
37:46you know more um reactive adoc
37:55reports that's such a good question you know I think um I think the reality is is I would like to see way more SQL everywhere um even in my dashboard right
38:08um like evidence not I mean like evidence sure but like what I really want to do is if I'm on like a evidence page is be able to like you know press a combination of keys and see all the data that's presented on that page and just use like duct bwom to mash it up to maybe show a
38:27different a different view right I don't even need to necessarily chart it but the problem what ends up inevitably happening is like you know I need to get uh data from some Tableau dashboard over here maybe an Evidence dashboard over here hub spot over here I pull them all into CSV and then I you know I still can
38:49use duct B but I still have to do that step right imagine if those all had like Duck wasm front ends that you could connect to each other right like attach Tableau Data set to you know my evidence data set right um obviously like no notionally that is like very far away but like um you know I I think that we
39:10have managed to figure out a clever way to separate our transactional workloads from our analytical workloads and so the next question is how do I get all of my analytical workloads together in one place right uh I think that um it's a
39:25great talking point to put on a deck to say we're going to talk about a single source of Truth um but the reality is uh it's
39:34messy and I see uh you know one of the
39:39things that's really cool we can do internally at mother duck is uh I can I can be inside of the mother duck tool and I can uh hit our data warehouse which is also in mother duck and then add a CSV to my wasum local and do reporting on it right uh or hit a Google sheet and I can say hey take the list of
39:57these customers in this Google sheet join that into my data warehouse and give me a list of all the ones that have run you know X number of queries in the last 10 days right yeah um those are things that like are Out Of Reach right of like uh most most tools today but like if you think about what the what the ex
40:17extensibility is with something like duct Tob kind of backing all of that you can start pulling together all these kind of loose threads without having to necessarily worry about doing data engineering yeah so instead of having this Pon export your data as CSV which is you know already one step further you could open a Deb shell and directly
40:40query the data over there right yes and if you're if you're a bit more smart then you may have um you know a way to access that because yeah today a lot of like uh tools to internal tools when you
40:56want to to create those data set Eder is an API so there is an extra work to do or even like you know exporting the CSV is you know one extra step that you need to do so yeah that makes a lot of sense to a lot of sense to be um it's uh I think it's it's still we're still not there but uh
41:20that's it's a creative it's the future that I would like to imagine yes yeah and I think the the other point you mentioned about um bi tool and I I mean
41:33I thinking adoc report as pipeline is one thing but I think also uh thinking you know uh that you said it's not necessarily plotting thing that's true but I think there is also thinking plot and any dashboard as code too because that's much more reproducible in tunable right and I think that's what Miss in the previous uh bi Bo where people do
42:00adoc report they create new manually reports and then they need to refactor things or you know improve things and it's really hard because it's it's just an UI so they and I think uh
42:14people underestimate you know the effort to maintain um you know dashboarding assets and I think they they should be treated as good it's a software asset so there is no need there's no reason why it should not be treated as code related to your need of having a SQL um you know interface for digging into your data so
42:37uh so yeah I think the two two together uh would be would be pretty neat and I think there is tools like we mentioned evidence which are actually uh they were actually in the in the chat so hello evidence people um but tool like evidence also and other bi tools as a code are inra in kind of those best
42:58 practices where you can easily reproduce your ad hoc reports. We had a couple of questions and we have a couple of minutes left; they're completely random questions, but let's take some minutes to answer them. Someone asked me about a challenge with DBeaver, and I think you've been using DBeaver, I'm not a
43:23 user. I mentioned that you can also turn to the MotherDuck Slack for that, but do you have any solution off the top of your mind? Yeah, so in your connection manager in DuckDB, sorry, in DBeaver, you can set it to match the path of your database and it should be able to
43:44 read that. Keep in mind that you can only have one reader attached to a database at a time, so you can run into some conflicts there if you're not careful. The other thing you can do in DBeaver is, as your connection string, you can put md: and it will actually just pull in
44:02 your MotherDuck databases as well. So if you are using MotherDuck, you just do md: in DBeaver and all of your databases are available. Yeah, sure, that connects directly to multiple databases with one connection.
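For reference, the md: trick mentioned here works from any DuckDB client, not just DBeaver; a minimal sketch, assuming a MotherDuck token is set in the environment:

-- with MOTHERDUCK_TOKEN available in the environment
ATTACH 'md:';     -- attaches your MotherDuck databases alongside any local ones
SHOW DATABASES;   -- local and MotherDuck databases now show up in the same session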
44:27 Another random question, I can answer this one, I like this question: I'm an aspiring data engineer, and one question is how to grab a DE internship, because I've heard they are pretty rare. We can both answer this one. For me, it's
44:41all about first your your network you need to you know uh kind of connect to people uh in the industry like you know following those uh those live stream as you do but also inperson meetups um and finding people that would give you this opportunity because they they had contact before with you right um not just to call email to say hey I'm
45:06looking at an internship so I think having discussion or learning with people that's the first point and the second one is just sharing your project sharing your uh your site project and one thing under it it that people the biggest mistake I've seen ins side project is that people have a get a preo and that's it read me setup no a full uh
45:31uh I would say side project is something that you that someone can use and you know uh try without going to a read me
45:40setup project no one is going to do you know installation of running your project locally so have something to show off and even if you are a data engineer and you're not good at reporting noways it's pretty easy to do um to do something so you did something with uh uh MDS in the box right Jacob I think
46:03the other things I put recently um is Du DB stats so those are
46:09for example two projects you can look and get inspired but there is multiple way to highlight your project so that it's it's visible to the outside what's your take on that uh jaob um I think I think my my take is a little bit maybe more contrarian but um I think that uh I read recently that uh Insight
46:37comes from action not analysis so if you think about what the point is of working in data right it's a lot about that how what kind of insights can we drive um so don't be afraid to get into the domain right uh I started in the domain I did not in here in like get here uh or I did not start you know in
47:01the domain to to eventually get into into Data but here I am um I think that uh there's data all around us you know one one conversation I remember with with a guy who had just graduated from school and was working at a brewery kind of running the front of house and he was like I'm trying to get into Data but we
47:20don't have any data here and like you know how do I how do I break in right he had just gotten like an undergrad degree and like data science and I was like what are you talking about like there's data all around you like think about what is in your point of sales system right how do you get that out and turn
47:35that into something useful um you know you guys are bu making your own beer that's like a very data intensive process right to to improve it um there's very precise chemistry happening you know in those processes um that you can use data to improve and so you know um that that was the first bit was advice the second part
47:55was really interesting is like about six months later he's like hey I'm still running the front of house but what I was able to do is use our Point of Sales system to see when we were getting the most sales and then use that to to schedule you know um uh folks when when
48:10folks worked right when when we were staffed and I said exactly right like that like it's not that you have to be able to see it like you know I I think um and that that can be a little hard but like um you know don't be afraid of the domain there's lots of opportunities you know especially in um smaller
48:28companies that are going to let you just you know pip install whatever you want on your laptop um to to kind of build your own pipelines you don't necessarily need to be you know reporting into a data function you know finance and Ops are great great places to start too yeah no that's very that's very true I think I had an example where
48:49someone I know was working in uh fintech
48:54uh and had a lot of knowledge about how you know online payments is working and the protocol and so on and the partners that required and U was mostly working as a product person but less Technical and he wanted to go into Data but for him going into data analyst is was actually pretty um makes sense because he had a lot of knowledge on the
49:18domain knowledge to you know make um valuable insight and sense from you know a fintech perspectives business business um so so yeah can definitely relate that um nowadays with data engineer you have so much technical tools to master and we often forget that if you are good within actually u a couple of domains then uh you can also stand out um cool that uh
49:47was the the last uh the last one and uh
49:51on that I think we can uh close on this do you have any uh closing thoughts or thinks you are looking forward Jacob uh duck DB 1.1 right coming out in
50:03September so lots of new tools to play with I'm very excited that is that is very true and the bar the bar uh bar Shard display right we didn't the terminal y indeed so lots of fun stuff happening there yeah I'm excited cool yeah that's that's will be a couple of weeks and actually the team of mod will be also in Europe uh by that
50:25 time, around that release, so we can probably do something as well. Yeah, so that was it, and good. You can check out the MotherDuck events page for any other online events,
50:37 both from MotherDuck and outside MotherDuck, all duck-related, also all about ducks, I'm joking, technical DuckDB-related, and
50:49 yeah, I'll see you soon online. Thank you Jacob for joining. All right, we'll chat later.