Building data-driven components and applications doesn't have to be so ducking hard

Free "DuckDB in Action" early access book


MotherDuck is pleased to offer this free early access PDF of the Manning "DuckDB in Action" book by Mark Needham, Michael Hunger and Michael Simons. The authors will be adding new chapters over time, which will be sent to you for free.

"DuckDB in Action" includes

  • Chapter 1: An introduction to DuckDB
    • Why DuckDB, a single-node in-memory database, emerged in the era of big data
    • DuckDB’s capabilities
    • How DuckDB works and fits into your data pipeline
  • Chapter 2: Getting started with DuckDB
    • Installing and learning how to use the DuckDB CLI
    • Executing commands in the DuckDB CLI
    • Querying remote files
  • Chapter 3: Executing SQL queries
    • The different categories of SQL statements and their fundamental structure
    • Creating tables and structures for ingesting a real-world dataset
    • Laying the fundamentals for analyzing a huge dataset in detail
    • Exploring DuckDB-specific extensions to SQL
  • Chapter 4: Advanced aggregation and analysis of data
    • Preparing, cleaning and aggregating data while ingesting
    • Using window functions to create new aggregates over different partitions of any dataset
    • Understanding the different types of subqueries
    • Using Common Table Expressions (CTEs)
    • Applying filters to any aggregate
  • Chapter 5: Exploring data without persistence
    • Converting CSV files to Parquet
    • Auto-inferring file types and data schemas
    • Creating views to simplify the querying of nested JSON documents
    • Exploring the metadata of Parquet files
    • Querying other databases like SQLite
  • Chapter 6: Integrating with the Python ecosystem
    • The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
    • Ingesting data from pandas DataFrames, Apache Arrow Tables and more via the Python API
    • Querying pandas DataFrames with DuckDB methods
    • Exporting data to various DataFrame formats and Apache Arrow Tables
    • Using DuckDB’s relational API to compose queries
  • Chapter 7: DuckDB in the Cloud with MotherDuck
    • The idea behind MotherDuck
    • Understanding how the architecture works under the hood
    • Use cases for serverless SQL analytics
    • Creating, managing, and sharing MotherDuck databases
    • Tips for optimizing your MotherDuck usage
  • Chapter 8: Building data pipelines with DuckDB
    • The meaning and relevance of data pipelines
    • The roles DuckDB can play as part of a pipeline
    • How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
    • Orchestrating pipelines with Dagster