The Future of BI: Exploring the Impact of BI-as-Code Tools with DuckDB

2023/12/07 - 9 min read

BY

An analytics dashboard is a software asset. It must undergo testing, be appropriately versioned, and exist in various environments, including staging and production, during its development stages.

The emergence of BI-as-code tools addresses long-standing challenges in this field.

DuckDB is an excellent tool for quick analytics, but what if you need a more sustainable visualization? In an earlier YouTube video, we explored integrations of DuckDB/MotherDuck with Preset and Hex. However, the world of data visualization offers many more tools beyond standard BI dashboarding.

This blog post discusses three Bi-as-code tools - Evidence, Rill, and Streamlit. For each tool, we’ll go over :

  • Setup
  • Project structure and connection to DuckDB/MotherDuck
  • Creating Data visualizations
  • Deployment

And of course, you can follow along with the full repository code.

Plus, instead of using boring demo data, we'll dive into PyPI statistic insights from the DuckDB project.

If are too lazy to read, I also made a video for this tutorial.

BI-as-code ?

Before diving into the different tools, it's important to understand why such tools are emerging. Data engineering has seen significant advancements, yet the rest of the analytics chain often hasn't kept pace.

Typically, business and analytics users extract data from a data warehouse and build their dashboards using WYSIWYG tools such as Tableau, PowerBi, or Excel.

What's the issue with these tools? They were designed with a user interface-first focus to lower the technical barrier to entry. However, this approach can lead to increased technical debt.

For instance, how do you roll back a UI dashboard or prevent it from breaking? Ultimately, the dashboard that presents your Key Performance Indicators (KPIs) is a software asset.

Let's consider what some tools offer today, along with their advantages (remember, the choice is yours). They all share some common features:

  • Open-source nature
  • Paid or managed services for hosting
  • BI as a code approach, allowing for versioning and testing through standard CI pipelines
  • Compatibility with both DuckDB and MotherDuck

The New Kids in Town

Evidence : SQL + Markdown

Evidence is a lightweight JS framework designed for building data apps using Markdown and SQL. You simply construct your dashboard using existing components, incorporate them using SQL within your markdown, and you're set! The end product is a static website that can be hosted anywhere: Vercel, Netlify, or Evidence Cloud.

Rill : SQL + YAML

Rill, by Rilldata, allows you to create dashboards using only SQL and YAML files. They offer a convenient CLI for running it locally, using a local web UI to draft queries/dashboards, or for deploying on their Cloud. Here's a fun fact: Rill is built using DuckDB.

Streamlit : Python

Streamlit has been in the market for a few years and was acquired by Snowflake in Q1 2022. The primary advantage (and possibly disadvantage) of Streamlit is that it allows you to stay within your Python data workflow and use the same ecosystem to develop your data apps. Hence, you build your data apps with Python and need a Python runtime for hosting.

So how are these quacking ? 🦆

Let's come back to our use case to analyze PyPi statistics on the Python package of duckdb. I won't go into the complete code example, but I'll put some beautiful screenshots and code snippets to grasp how each one works. Feel free to follow along with the source code for the dashboards below.

Evidence

evidence_dashboard.png

Setup

To quickly start an evidence project, you essentially need to copy a JS template. You can do this either by using Node.js and degit package or their container image using the devcontainer feature from VSCode.

According to their documentation, here's how you can do it:

Copy code

npx degit evidence-dev/template my-project cd my-project npm install npm run dev

Project structure & Connection to DuckDB

There are 3 main important part in the projects

Copy code

.evidence // evidence configurations pages/index.md // Where we write .md & SQL evidence.plugins.yaml // configure evidence plugins

Once the local server is running, you can connect to DuckDB through the UI settings page.

evidence_setting.png

For a local DuckDB database, you will just need to provide the path and the extension.

evidence_duckdb-ce2f0e109d901cb779742d3d554044f6.png

For a MotherDuck database, you will need to specify your MotherDuck Token. More information about how to retrieve this token and work in production on our documentation.

Creating Visualizations

Evidence renders markdown files into web pages. When developing, the markdown file /pages/example.md is rendered at localhost:3000/example.

Evidence has a collection of components that you can use for your visualization. You then defined your SQL query attached to this component.

Copy code

<BigValue title='Total download past 2 years' data={total_count} value='download_count' fmt='#,##0.00,,"M"' /> ```total_count SELECT SUM(daily_download_count) AS download_count FROM daily_stats WHERE timestamp_day BETWEEN DATE_TRUNC('month', CURRENT_DATE) AND CURRENT_DATE;

This would display :

Screenshot 2023-12-07 at 10.44.13.png

That sums up how you can create your components. Of course, there's more to it, such as reusing SQL queries, filters, etc.

Deployment

When you generate a static website, you have the flexibility to host it anywhere that supports JS static websites. Evidence also provides its own cloud service to streamline the deployment process from local to production.

Rill

rill_dashboard.png

Setup

Rill provides a Command Line Interface (CLI), essentially a binary (written in Golang) for installation. There's a single command line that executes the installation for you.

Copy code

curl -s <https://cdn.rilldata.com/install.sh> | bash

rill should then be then available in your terminal

Copy code

Usage: rill [command] Available Commands: start Build project and start web app docs Open docs.rilldata.com version Show Rill version upgrade Upgrade Rill to the latest version whoami Show current user org Manage organisations project Manage projects deploy Deploy project to Rill Cloud user Manage users env Manage variables for a project login Authenticate with the Rill API logout Logout of the Rill API help Help about any command Flags: -h, --help Print usage --interactive Prompt for missing required parameters (default true) -v, --version Show rill version Use "rill [command] --help" for more information about a command.

Project structure & Connection to DuckDB

To start a fresh new project, you can do :

Copy code

rill start my-rill-project

When you start a project, it launches a local server. When you browse to the page, you will see the following:

Screenshot 2023-12-07 at 11.25.04.png

If you click on any of these examples, it will generate .yaml and .sql files and there are 3 main folders :

Copy code

├── dashboards │   └── customer_margin_dash.yaml ├── models │   └── metrics_margin_model.sql ├── rill.yaml ├── sources │   └── metrics_margin_monitoring.yaml

In sources, you define any supported sources using YAML. Models contain SQL queries that will be used in your dashboard, and dashboard is where you specify your metrics, in YAML. You can also edit these through the Rill UI.

rill_ui.png

Coming back to our PyPI stats use case, since we have a DuckDB database containing the data, we can proceed as follows:

Copy code

rill start rill/my-rill-project --db rill/data/duckdb_stats.db

To connect to MotherDuck, you would need to export motherduck_token as an environment variable. More information on their documentation here.

There's a small workaround to do : it won't be displayed as a source table in the Rill UI.

However, we can create a model in the models folder as described below or directly use it in the dashboard.

Copy code

select * from duckdb_stats.main.daily_stats

We can now define some metrics in our model_dashboard.yaml

The overarching goal of Rill is to provide you with a tailored dashboard based on the metrics you want to see, rather than offering an endless collection of charts that you have to construct yourself.

Copy code

title: Pypi Download Stats model: model timeseries: timestamp_day measures: - label: Total Downloads expression: sum(daily_download_count) name: total_records description: Total number of records present format_preset: humanize valid_percent_of_total: true dimensions: - name: python_version label: Python_version column: python_version description: "" - name: system_name label: System_name column: system_name description: "" - name: cpu label: Cpu column: cpu description: "" - name: file_version label: File_version column: file_version description: "" - name: country label: Country column: country description: ""

Deployment

Deployment is done through Rill’s Cloud offering. It’s worth to mention that compared to the other solutions, Rill offer users access management out of the box.

Streamlit

streamlit_dashboard.png

Setup

For someone familiar with Python, getting started with Streamlit is quite simple. All you need is a Python environment and common packages like streamlit, duckdb, pandas, and matplotlib for data visualization.

Running the app, as per our example is as simple as :

Copy code

streamlit run app.py

Project structure & Connection to DuckDB/MotherDuck

You have the freedom to structure your Python app as you see fit. However, for beginners, everything can be contained in a single script.

Connecting to DuckDB/MotherDuck is the standard Python way.

For a local connect to our PyPi stats DuckDB database, use the following:

Copy code

import duckdb con = duckdb.connect(database='duckdb_stats.db', read_only=True)

To connect to MotherDuck, you will have to provide your token :

Copy code

import duckdb con = duckdb.connect('md:?motherduck_token=<token>')

Creating Visualizations

Streamlit offers a vast array of components, including interactive features for audio/video or LLMs, among others. We're only scratching the surface here.

The primary strategy involves using a Pandas dataframe and using the built-in charts that Streamlit provides.

Let’s build the Pandas dataframe first :

Copy code

# Query for filtered data query = """ SELECT DATE_TRUNC('month', timestamp_day) AS month, SUM(daily_download_count) AS total_downloads, python_version, cpu FROM duckdb_stats.main.daily_stats WHERE timestamp_day BETWEEN ? AND ? GROUP BY month, python_version, cpu ORDER BY month """ df = con.execute(query, (start_date, end_date)).df()

DuckDB supports natively conversion of results to Pandas dataframe.

And our first chart :

Copy code

# Line Graph of Downloads Over Time st.subheader("Monthly Downloads Over Time") df_monthly = df.groupby('month')['total_downloads'].sum().reset_index() st.line_chart(df_monthly.set_index('month'))
Screenshot 2023-12-07 at 11.41.09.png

Deployment

Streamlit offers a community Cloud where you can deploy your app for free. Since it's a python app, it can work on any python runtime that allows you to expose a web service.

The future of BI

Through the blog, we've explored three different tools that each offer a unique approach to BI-as-code.

You can conduct all tests locally and use Git for version control and CI/CD. Your dashboard can be easily deployed or rolled back, all while embracing software best practices. BI doesn't have to be a tedious click-through expensive UI. It's refreshing to see new perspectives. Even though some of these tools are in their early stages, they show great promise.

Stay tuned for our next blog post, where we'll dive into how to efficiently collect data from PyPI into DuckDB. This will enable you to easily build your own PyPI stats dashboard for your Python project!

Keep coding, keep quacking.

CONTENT
  1. BI-as-code ?
  2. The New Kids in Town
  3. So how are these quacking ? 🦆
  4. Rill
  5. Streamlit
  6. The future of BI

Start using MotherDuck now!

blog subscription icon

Subscribe to motherduck blog