Free "DuckDB in Action" book

MotherDuck is pleased to offer this free PDF of the Manning "DuckDB in Action" book by Mark Needham, Michael Hunger and Michael Simons. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB—all through hands-on examples.

"DuckDB in Action" includes

  • Chapter 1: An introduction to DuckDB (summary)
    • Why DuckDB, a single-node in-memory database, emerged in the era of big data
    • DuckDB’s capabilities
    • How DuckDB works and fits into your data pipeline
  • Chapter 2: Getting started with DuckDB (summary)
    • Installing and learning how to use the DuckDB CLI
    • Executing commands in the DuckDB CLI
    • Querying remote files
  • Chapter 3: Executing SQL queries (summary)
    • The different categories of SQL statements and their fundamental structure
    • Creating tables and structures for ingesting a real-world dataset
    • Laying the foundations for analyzing a huge dataset in detail
    • Exploring DuckDB-specific extensions to SQL
  • Chapter 4: Advanced aggregation and analysis of data (summary)
    • Preparing, cleaning and aggregating data while ingesting
    • Using window functions to create new aggregates over different partitions of any dataset
    • Understanding the different types of subqueries
    • Using Common Table Expressions (CTEs)
    • Applying filters to any aggregate
  • Chapter 5: Exploring data without persistence (summary)
    • Converting CSV files to Parquet
    • Auto-inferring file type and data schema
    • Creating views to simplify the querying of nested JSON documents
    • Exploring the metadata of Parquet files
    • Querying other databases like SQLite
  • Chapter 6: Integrating with the Python ecosystem (summary)
    • The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
    • Ingesting data from pandas DataFrames, Apache Arrow Tables and more via the Python API
    • Querying pandas DataFrames with DuckDB methods
    • Exporting data to various DataFrame formats and Apache Arrow Tables
    • Using DuckDB’s relational API to compose queries
  • Chapter 7: DuckDB in the Cloud with MotherDuck (summary)
    • The idea behind MotherDuck
    • Understanding how the architecture works under the hood
    • Use cases for serverless SQL analytics
    • Creating, managing, and sharing MotherDuck databases
    • Tips for optimizing your MotherDuck usage
  • Chapter 8: Building data pipelines with DuckDB (summary)
    • The meaning and relevance of data pipelines
    • What roles DuckDB can have as part of a pipeline
    • How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
    • Orchestrating pipelines with Dagster
  • Chapter 9: Building and Deploying Data Apps (summary)
    • Building an interactive web application with Streamlit
    • Deploying Streamlit applications with Streamlit Community Cloud
    • Rendering interactive charts with Plot.ly
    • Creating a dashboard for Business Intelligence (BI) with Apache Superset
    • Creating charts from a custom SQL query with Apache Superset
  • Chapter 10: Performance considerations for large datasets (summary)
    • Preparing large volumes of data to be imported into DuckDB
    • Querying metadata and running exploratory data analysis (EDA) queries on the large datasets
    • Exporting full databases concurrently to Parquet
    • Using aggregations on multiple columns to speed up statistical analysis
    • Using EXPLAIN and EXPLAIN ANALYZE to understand query plans
  • Chapter 11: Conclusion (summary)