Chapter 9


Building and Deploying Data Apps

This is a summary of a chapter from DuckDB in Action, published by Manning. Download the book for free to read the full chapter.

'DuckDB In Action' book cover

9.1 Building a custom data app with Streamlit

Streamlit is a popular open-source Python library that allows data scientists and engineers to build interactive web applications without needing knowledge of HTML, CSS, or JavaScript. It is particularly well-suited for DuckDB because both tools run efficiently within a local Python environment.

9.1.1 What is Streamlit?

Streamlit is a "code-first" tool designed for rapid prototyping. Unlike drag-and-drop BI tools, Streamlit applications are Python scripts. This allows developers to leverage their existing knowledge of libraries like pandas, scikit-learn, and DuckDB to build frontends for data pipelines and machine learning models.

9.1.2 Building our app

The foundation of a DuckDB-powered Streamlit app is the database connection. The book emphasizes a critical setup step: connecting to the atp.duckdb database (which can be downloaded from the book’s repository) in read-only mode (read_only=True).

  • Embedded Architecture: Because DuckDB runs in-process, there is no need for complex client-server authentication.
  • Security: The chapter demonstrates how to handle user inputs securely. When building a search function (e.g., searching for tennis players), the app uses prepared statements (parameterized queries) to prevent SQL injection attacks, ensuring the app remains secure even when accepting raw user text.

9.1.3 Using Streamlit components

Streamlit uses a component-based architecture to handle interactivity. The text highlights the use of the streamlit-searchbox library to create dynamic input fields with autocomplete features.

  • Layouts: Using st.columns, developers can create responsive layouts (e.g., placing two player search boxes side-by-side).
  • Data Rendering: DuckDB query results are fetched as Pandas DataFrames (.fetchdf()) and rendered immediately as interactive tables using st.dataframe, allowing users to sort and explore the data natively.

9.1.4 Visualizing data using Plotly

For visualization beyond standard tables, the chapter integrates Plotly.

  • Data Prep with SQL: The book shows how to use DuckDB’s SQL capabilities—specifically the strftime function—to pre-process and sort date fields before they reach the visualization layer.
  • Interactive Charts: The example builds a scatterplot timeline of matches between players. It demonstrates how to customize Plotly charts (e.g., adding vertical grid lines for years using fig.add_shape) to make historical trends clearer, all rendered within the Streamlit interface using st.plotly_chart.

9.1.5 Deploying our app on the Community Cloud

Deployment is often a hurdle for data apps, but Streamlit simplifies this via the Streamlit Community Cloud.

  • Source Code: The complete source code for the app built in this chapter is available in the mneedham/atp-head-to-head GitHub repository.
  • Live Demo: You can view the deployed application running on the Community Cloud at atp-head-to-head.streamlit.app.
  • Git Integration: By pushing the code to a GitHub repository, users can trigger an automatic deployment. The Community Cloud hosts the app and provides a public URL, making the local DuckDB analysis instantly accessible to the world.

9.2 Building a BI dashboard with Apache Superset

For scenarios requiring standardized reporting rather than custom application logic, Apache Superset provides a robust, enterprise-ready BI platform. It supports a vast array of visualizations and integrates smoothly with DuckDB via SQLAlchemy.

9.2.1 What is Apache Superset?

Apache Superset is a data exploration and visualization platform known for its ability to handle large datasets.

  • Installation: The chapter details the setup of Superset locally using pip following the official PyPI instructions.
  • Drivers: To enable communication between Superset and DuckDB, the duckdb-engine (DuckDB's SQLAlchemy driver) must be installed.
  • Setup: Key initialization steps include setting a SUPERSET_SECRET_KEY, initializing the database (superset db upgrade), and creating an admin user.
  • Hosted Options: For those who prefer a managed service over self-hosting, the book mentions Preset.

9.2.2 Superset’s workflow

Understanding Superset requires learning its specific hierarchy of data objects. The book outlines the workflow for creating dashboards:

  1. Database: The connection to the DuckDB file.
  2. Dataset: A representation of a table or a saved SQL query.
  3. Chart: A visualization configured from a dataset.
  4. Dashboard: A collection of charts arranged for presentation.

9.2.3 Creating our first dashboard

The chapter walks through connecting Superset to a local DuckDB file using the connection string duckdb:///atp.duckdb.

  • Dataset Configuration: Users select tables (like the matches table) to create datasets.
  • Visualizing: The text demonstrates creating a "Bar Chart" to analyze the number of matches per year, configuring the X-Axis (tourney_date) and Metrics (COUNT(*)).

9.2.4 Creating a dataset from an SQL query

Superset is not limited to raw tables. The SQL Lab feature allows users to write complex custom SQL queries and save them as virtual datasets.

  • Advanced Analytics: The book provides an example of using Window Functions in DuckDB (COUNT(*) OVER ...) to calculate cumulative statistics (e.g., Grand Slam winners aged 30 and over).
  • Custom Metrics: These SQL-defined datasets act just like physical tables, allowing for the creation of advanced visualizations like "Big Number with Trendline" charts based on calculated metrics.

9.2.5 Exporting and importing dashboards

To support collaboration and version control, Superset offers export functionality.

  • YAML Configuration: Dashboards, charts, and datasets can be exported as ZIP files containing YAML definitions.
  • Reusability: This feature allows developers to move dashboards between environments. The example dashboard ZIP file can be found in the ch09 folder of the DuckDB in Action examples repository.

Summary

In this chapter, we explored the spectrum of options available for building user interfaces on top of DuckDB. Whether you require a highly customized application or a standard business intelligence dashboard, DuckDB’s flexible architecture supports both workflows efficiently.

  • Streamlit as a Code-First Solution: Streamlit offers a powerful low-code environment for Python developers. It allows for the rapid creation of web applications by leveraging reusable components. Its strength lies in its ability to solve recurring frontend tasks without requiring web development expertise.
  • Deep Python Integration: Streamlit connects natively with DuckDB’s Python ecosystem. Developers can interact with the database using the standard DB-API 2.0, the Relational API, or by exchanging data directly as Pandas or Polars DataFrames.
  • Custom Logic vs. Declarative Design: Unlike purely declarative BI tools, Streamlit allows you to write custom Python logic. This enables complex data transformations and interactivity that go beyond standard reporting.
  • Interactive Visualization with Plotly: The chapter demonstrated how Plotly complements Streamlit by providing a similar low-code approach to generating high-quality, interactive visualizations that render seamlessly within the app.
  • Apache Superset for Enterprise BI: On the other end of the spectrum, Apache Superset provides a comprehensive no-code, drag-and-drop environment. It is ideal for creating visually appealing dashboards where the primary goal is data exploration rather than custom application logic.
  • SQL-Centric Configuration: While Superset handles the visualization logic, DuckDB remains the analytical engine. The only code typically required in a Superset workflow is custom SQL queries used to shape datasets and feed specific visualizations.