YouTube | BI & Visualization | Tutorial

DuckDB & dataviz | End-To-End Data Engineering Project (3/3)

2024/06/27

In this part 3 of the project, @mehdio explores how to build a dashboard with Evidence using MotherDuck/DuckDB as a data source.

0:00 When building an open source project, it's important to be able to track usage metrics. While GitHub stars are rather vanity metrics, there are other ways to measure how people are using what you've built. Specifically, when building a Python library and pushing it to PyPI, it's actually possible to get a ton of information about how many downloads you are getting, but also which Python

0:20 versions and CPU architectures your users have, or even where the people downloading the Python library are located. In this video we'll build this beautiful dashboard that helps you get insights about a given Python package. We'll be using DuckDB and MotherDuck for storing and processing the data, and a BI-as-code tool called Evidence for the dashboard. We'll focus

0:43 on insights about the DuckDB Python package, but the code will be flexible so you can get insights on any Python package.

0:52 As usual, every resource, including the full source code, is in the video's description. This video is part of a series, and if you are just joining us here, well, welcome. We assume that you already have the data you need to start building, as we will only be focusing on building the dashboard, not on ingesting and pre-processing the data. However,

1:13 don't worry if you didn't do that first; the data will be there for you, ready to query. But let's understand the overall architecture and do a quick recap of what happened in seasons one and two. A not-so-long time ago, Mehdi covered an end-to-end data engineering project where, in the first part, he covered the ingestion of raw PyPI data.

1:35 This was done using a Python pipeline ingesting data with DuckDB, writing the results either to object storage like S3 or directly into MotherDuck. In the second video we used dbt to define a simple model and transform our raw data into an actionable dataset that would be used for our dashboard. The

1:59 output of this dbt pipeline landed in MotherDuck. But why? What on earth is MotherDuck? All right, I'll take it from there. When building an analytics dashboard you typically have two prerequisites: first, the data, and second, a query engine, typically an OLAP database. The dashboard will often create queries behind the scenes, for instance as soon as you change a filter

2:22 to display the results, and these queries need to be performed somewhere. They are typically analytical queries involving GROUP BY, SUM, etc., so OLAP databases like BigQuery, Snowflake, or MotherDuck fit these use cases best. Business intelligence tools like Tableau, Power BI, and others all rely on an external query engine to fetch the data and then display it in the

2:48 dashboard. Well, what about MotherDuck? MotherDuck is a serverless analytics platform powered by DuckDB. It means that anywhere you can run DuckDB, you can run MotherDuck and leverage the power of the cloud. To build this dashboard, we created a share in MotherDuck that you can use directly. A share is a database that you

3:06 can easily share, or rather attach, from any DuckDB client. To access a share you need a MotherDuck account; we have a free tier that you can use, so go ahead, log in, and grab the share URL of the database. Once you're connected, you can attach the share with the following command: ATTACH with the share URL. And

3:26 of course you can create a database share from any other DuckDB client, be it Python, Go, or whatever programming language you have; there is always a duck somewhere. Let's take a quick look at the DuckDB CLI, for example, to see how to connect to MotherDuck and read a share. The only extra thing I need here, assuming I have my MotherDuck account and my

3:47 share URL, is the MotherDuck token to authenticate. You can find it in the MotherDuck UI, and I've put my token in an environment variable, so I can connect to MotherDuck using the ATTACH 'md:' command. As I already attached the share earlier in the UI, you can see that the shared database shows up directly and I can start querying.
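As a concrete sketch of that CLI session (the share URL is a placeholder, and the table name anticipates the duckdb_stats share used later in the video):

```sql
-- DuckDB CLI: authenticate to MotherDuck and attach a share.
-- The motherduck_token environment variable is picked up automatically.
ATTACH 'md:';                               -- connect to MotherDuck
ATTACH 'md:_share/duckdb_stats/<share-id>'; -- attach the share (placeholder URL)
SHOW DATABASES;                             -- the shared database now appears
SELECT count(*) FROM duckdb_stats.main.pypi_daily_stats;
```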

4:07 The BI-as-code tool we are going to use, Evidence, uses the same mechanism to authenticate to MotherDuck: technically, the Node.js app will connect to MotherDuck, and as you may have guessed by now, the only thing you'll need to configure in Evidence for MotherDuck is the MotherDuck token. Note that MotherDuck supports multiple other dashboarding tools, and I'll let you

4:29 check the full list on our documentation website, link in the description. All right, let's build the dashboard. First, let me give you a quick overview of Evidence. Evidence is a Node.js framework to build dashboards using Markdown and SQL. At the end you get a Node.js JavaScript application you can deploy anywhere, or to Evidence Cloud. It's great because it

4:50 helps you enforce software engineering best practices: our dashboard will be versioned, we have a clear view of the source queries, and we can deploy it to multiple environments if we want, mainly development, staging, and production. To build a dashboard you can start from a template provided by Evidence, so let's do a quick walkthrough of this one, and then we'll dive into the specifics of

5:13 our dashboard, which is getting insights on PyPI data for the DuckDB Python package. All right, so the first thing you want to do is git clone the Evidence MotherDuck template; you can find the link in the description. I have it already cloned, so just open your favorite editor. I'll be

5:38 using VS Code, and we're going to open

5:42 a new dev container. As I mentioned earlier, this is a Node.js application, so we're going to go ahead and pick Node.js 20. If

5:56 you're not familiar with dev containers, it's simply a VS Code feature to quickly create a container for your development environment from a template. So now it's loading, building the container, and reopening VS Code. Now I am in my dev container, I have the Node.js dependencies, and we'll quickly cover the structure of

6:21 an Evidence project, which includes pages, sources, and the Evidence plugins, and, if you're familiar with Node.js, you'll recognize the node dependencies with package.json. Let's start with the most important part, which is the pages. You can create multiple pages; we'll have only one page for our dashboard, and here you have an

6:45 index.md. It's a Markdown file where you can write any Markdown that you would like, but you also have specific syntax to do if/else conditions and to include components. If you've been playing with React or other frameworks, you'll see it's kind of the same idea, where you include a

7:07 dashboard component, which we'll go into later, and write the SQL

7:13 query that you're going to use. Here you see a SQL query, daily_service_requests, querying from motherduck, which is the source connection (we'll see that in a moment), and the name daily_service_requests is what's used in the chart: it's the data I'm feeding it, with created_date on the x-axis.
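In outline, the template page looks something like this (a sketch; the column names are assumptions):

````markdown
```sql daily_service_requests
-- `motherduck` is the source folder; the rest is the source query file name
select * from motherduck.new_york_city_service_request_volume
```

<LineChart
    data={daily_service_requests}
    x=created_date
    y=number_of_requests
/>
````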

7:37 Initially I mentioned that the only thing you need to do is configure your MotherDuck token. What we can do first is just install the dependencies, so you do an npm install and grab a coffee. Once you've installed the dependencies, you can use npm run dev to start the local

8:01 server. As you can see, we don't have any graph displayed; that's normal, we didn't provide our service token. So coming back to the MotherDuck UI, you can get your service token just by going to Settings and copying your token, which gets copied to your clipboard. Then in Evidence you just go to the settings

8:24 page, and you see you have a data source here. You click Edit, and here you have the value of the token that you're going to pass, and you can test the connection. To relate this to what we discussed earlier: we basically have a DuckDB client in JavaScript running in the Evidence framework, and it is

8:47 leveraging the MotherDuck extension and connecting to MotherDuck with the token, which is required to authenticate. The connection is successful; don't forget to confirm the changes and save. Now you see the MotherDuck source gets updated and it says it managed to fetch the data, so if we go back to home, we see

9:09 that we have the data displayed, and we can also play with the filters, the slider here, and get some more information about the New York City service requests. All right, coming back to our repository: we talked about the main pages where we define SQL queries and

9:31 components, which are graphs, or charts if you prefer. You can head over to Evidence's documentation to see all the charts they have. Here, for example, you have an area map, and you see it's just a component: you feed it the data from a SQL query that you define in Markdown, and you have your chart.

9:52 So it's a really quick way to iterate, because when you have your server running locally, which we do, you can adapt any query or any graph and see the result in your dashboard directly. The other thing I want to cover is the sources. As you see in the sources folder we have motherduck, which

10:11 stands for the MotherDuck source. We have

10:16 the connection options file, which is generated; this is where our MotherDuck token is stored. You should not commit this one to git, by the way; it's just a configuration file to save the token locally, and when you deploy to the cloud or any other service you will store it as an environment variable. Then

10:35 you have the connection.yaml, which specifies the type of the connection for Evidence, which is motherduck.
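The two files look roughly like this; the options file is generated for you by the Evidence settings UI, so treat its exact keys as an assumption (and keep it out of git):

```yaml
# sources/motherduck/connection.yaml -- declares the source and its type
name: motherduck
type: motherduck

# sources/motherduck/connection.options.yaml -- generated locally, holds the token
token: <your-motherduck-token>
```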

10:48 Then you have the source query, and the source query is really your data warehouse query in MotherDuck. You see that we are querying the sample data's New York City service requests; this one is named new_york_city_service_request_volume. If you go back to our graph and look at the FROM statement, we have FROM motherduck, which is the sources folder here, and then the name of

11:07 the query, which is the name of my SQL file. So you define the source query here, and then you can reference this table by the name corresponding to the SQL file name.
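As a sketch, the source query file might look like this, assuming MotherDuck's default sample_data database and its NYC service requests table:

```sql
-- sources/motherduck/new_york_city_service_request_volume.sql
-- The file name becomes the table name referenced from pages
-- (motherduck.new_york_city_service_request_volume).
SELECT
    created_date::DATE AS created_date,
    count(*) AS number_of_requests
FROM sample_data.nyc.service_requests
GROUP BY ALL
ORDER BY created_date;
```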

11:28 By the way, I can also run those queries directly in MotherDuck if I want, so if I run this query I'll have the result here, which is the one feeding the dashboard. The sample database is a database that is attached by default for any MotherDuck user and contains a series of sample datasets to play with; that's why we didn't have to run any ATTACH command. But the specific database that we are going

11:46 to use for PyPI data is a dedicated database, and I'll provide the share URL so that you can attach it on your side; that's why we had to attach it earlier to be able to query this database. The last thing I want to cover is the Evidence plugins. You see that you need to have some plugins defined

12:05 depending on the source that you are querying; here it's MotherDuck, it's included in the template, and it will download the specific packages and dependencies required. That's it, that's pretty much everything. So in pages you define your SQL and Markdown and use reusable components to define charts, which you can find in

12:27 the Evidence documentation. We have the connection, which specifies the type of the source, and we defined the token through the UI, which created a connection.options.yaml that contains the token locally. And then we have a source query, which is referenced in my Markdown file. All right, so let's build our own dashboard

12:49 now, using the DuckDB PyPI stats data. For that, we're going to define a source query, and then in our Markdown file we'll play around with three kinds of charts. So now, the main repository, as a reminder, is a monorepo setup: we have the ingestion pipeline, the transformation pipeline, and the dashboard in the same

13:09 repository, which is pypi-duck-flow, again link in the description. We created a dashboard folder, which is basically the structure that we saw just earlier with the MotherDuck template from Evidence. If you open this project you can also reopen it in a dev container, and the beauty of it is that we have two dev

13:31 container definitions: one for the ingestion and transformation, which is Python, and you see that's what it proposes when I'm reopening the container, and the other, which is Node.js, for the dashboard. I'm going to click that one and go into the dashboard folder, which is where the Evidence dashboard is. You can run the same local server again, add

13:52 your token so that you're ready to connect to MotherDuck, and don't forget to run the ATTACH command for the shared database duckdb_stats so that you can access this shared database. You only have to do it once, and you can do it using the MotherDuck UI, as we did before, or the DuckDB CLI. The first thing we want

14:12 to do is write the source query, and to define this source query we can have a look at the source data. The raw dataset here is PyPI file downloads, where one row represents one download, so this is really granular and detailed; there is a lot of information regarding CPU architecture and so on.

14:34 You see that we computed pypi_daily_stats, grouped by a series of information: we have the country code and the Python version. This already reduces our dataset drastically. The raw dataset contains more than 44 million rows; yes, that means that DuckDB has been downloaded more than 44 million times since its release in 2019,

14:57 at least the Python package. And if you look at pypi_daily_stats, we have about 1.7 million rows

15:06 when it's aggregated per day. This thought process of how much aggregation you need is really important, because you want to optimize your dashboard to fetch as little data as possible. So if you don't need that much granularity, or a specific daily view, or maybe you just need a daily view for the last seven days, you

15:29 may want to create a dedicated query or a dedicated table to speed up your dashboard and avoid pumping too much data to the client. Coming back to our source query, I want to define a weekly download view based on the daily download table that I have. I'm also selecting just

15:49 a couple of columns that I'm interested in, which are the DuckDB version, country code, and Python version, doing a sum to get a weekly download sum, and I'm getting the week start date based on the download date. This gives me the date of the Monday for every week, so if the download date falls on

16:09 a Friday, it will refer to the Monday of that week. Then I can GROUP BY and have a weekly view, but I can also easily see the growth over months and over years with that. And again, feel free to use the UI to prototype your query.
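Here's a sketch of that weekly source query; the table and column names are assumptions based on what's described:

```sql
-- sources/motherduck/weekly_download.sql (hypothetical file name)
SELECT
    version AS duckdb_version,
    country_code,
    python_version,
    -- DuckDB's DATE_TRUNC('week', ...) returns the Monday of that week
    DATE_TRUNC('week', download_date) AS week_start_date,
    SUM(daily_download_sum) AS weekly_download_sum
FROM duckdb_stats.main.pypi_daily_stats
GROUP BY ALL;
```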

16:31 You see here that I still have around 300,000 rows, but that's still much, much less than the 1.7 million rows. OK, so we're going to start by getting a view of last week's downloads, then we're going to do a line chart of downloads over time, and we're going to do a bar chart of the versions people use. For the downloads of

16:50 last week, I can just use this, which is a BigValue. You see there are a couple of things I could even add, to compare versus last week or last month; I'm just going to keep a big number here to keep things simple. One of the first things I want to show you is that you can

17:08 chain SQL queries. Here, for example, what I'm computing is the last four weeks before the current day when this is running, so that I have a view of the past 30 days but can also easily filter on the last week.
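Chaining looks like this in the page's Markdown: one query block can reference another with ${...} (names here are illustrative, building on the weekly_download source sketched above):

````markdown
```sql last_weeks
select distinct week_start_date
from motherduck.weekly_download
order by week_start_date desc
limit 4
```

```sql weekly_totals
-- chains on the query above via ${last_weeks}
select week_start_date, sum(weekly_download_sum) as weekly_downloads
from motherduck.weekly_download
where week_start_date in (select week_start_date from ${last_weeks})
group by 1
order by 1
```
````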

17:29 Whenever I save this while I have my local server running, I can actually inspect the result of this query and check that everything is going well in the Evidence dashboard. By the way, if you don't see those query results and the query definitions, just go to the settings; maybe they're hidden. If I hide them, they're

17:46 hidden by default, so when you see the final dashboard you don't actually see the queries. Now that I have a result giving me the last four weeks, we can start with a line chart to see the weekly downloads. As you can see, I'm chaining the result from the last weeks, taking everything except

18:06 the last week, which might be the current, incomplete week; otherwise, as we saw, the downloads are growing and you're always going to see a drop while the week is still completing. So now we have the results here, and again I have my query results, so we can just pick a line chart. Going to

18:23 the documentation, we have the line chart and how to define it. In my line chart you see I'm specifying the query, with the weekly downloads and the week start, which are the two columns for the y and x axes, and that's it, we have my graph of downloads per week.
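A sketch of that chart, chaining once more to drop the possibly incomplete current week:

````markdown
```sql complete_weeks
select * from ${weekly_totals}
where week_start_date < (select max(week_start_date) from ${weekly_totals})
```

<LineChart
    data={complete_weeks}
    x=week_start_date
    y=weekly_downloads
/>
````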

18:45 All right, let's do another one; let's do the BigValue one and the bar chart. Here is my query to get the downloads for last week; again, I'm filtering by an offset to take a full previous week. The BigValue is basically another component: you see I'm referencing the data, which is this query, specifying the value, and here is the formatting. And yes, it's

19:06 a big value: for DuckDB we have to count in millions of weekly downloads. If you have a smaller project, maybe you want to fine-tune the format. And that's it, we see that the last week was 1.10 million, which is in the same ballpark as what we can find when cross-checking (it's 1.95 there, actually). You start to get the grip.
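A sketch of that BigValue, offset by one row to take the last complete week; the Excel-style fmt string, which scales the count to millions, is an assumption:

````markdown
```sql last_complete_week
select weekly_downloads
from ${weekly_totals}
order by week_start_date desc
limit 1 offset 1   -- skip the current, incomplete week
```

<BigValue
    data={last_complete_week}
    value=weekly_downloads
    fmt='#,##0.00,,"M"'
/>
````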

19:27 Let's do one more: a bar chart to display the number of

19:31 downloads per DuckDB version. Here I'm actually looking at the past four weeks, because DuckDB is evolving, and it's the same if you want to look at the Python version: I don't think it makes much sense to look at the full time period. I think the last 30 days give you a good insight into who has been upgrading their DuckDB

19:51 client or not. Let's go again to the documentation and look for a bar chart, like this horizontal bar chart. I added the bar chart, specifying the query, the DuckDB version and total downloads, and I'm swapping the axes to get it horizontal. Run it, and yeah, it's working.
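A sketch of that horizontal bar chart, with swapXY=true doing the axis swap:

````markdown
```sql downloads_by_version
select duckdb_version, sum(weekly_download_sum) as total_downloads
from motherduck.weekly_download
where week_start_date in (select week_start_date from ${last_weeks})
group by 1
order by 2 desc
```

<BarChart
    data={downloads_by_version}
    x=duckdb_version
    y=total_downloads
    swapXY=true
/>
````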

20:11 You see that over the past 30 days we had mostly 1.0, whose release was a big event, and for the rest we still have a lot of clients on 0.10.3 and 0.10.2. All right, one more thing I want to cover is the layout. You can customize your layout a bit, colors and everything. You see that for me, I added a simple logo, I set it to not show the

20:31 queries by default, and I added the sidebar; the sidebar is for when you have multiple pages. So you can have a look and fine-tune the design of your dashboard too. All right, once you're done, you can deploy it to Vercel or Netlify, as it's just a JavaScript application, and Evidence has its own cloud to make it easy to publish. The only

20:50 thing you need to specify when publishing is your environment variable for the MotherDuck token. There are multiple other things we haven't covered in Evidence; you can of course add filters to dynamically change your queries and your dashboards, but at least you get a grasp of how things work together. All right, that's it for this video and

21:08 for this end-to-end data engineering project series. In this last part, we built the dashboard using a BI-as-code tool called Evidence, leveraging MotherDuck as our central data repository. Note that while the series might be finished, we could still cover plenty of other things: how to orchestrate the pipelines, how to do data observability and data quality. So

21:29 let me know what you would like to see in the comments, and in the meantime, keep coding and keep quacking. [Music]
