Free "DuckDB in Action" Early Access Book

MotherDuck is pleased to offer this free early access PDF of the Manning book “DuckDB in Action” by Mark Needham, Michael Hunger, and Michael Simons. The first few chapters cover the basics of DuckDB. The authors will add new chapters over time on topics such as Python integration, WASM, data pipelines, and MotherDuck, and these will be sent to you for free.

Sneak peek into the chapters

We are honored to include a foreword by the co-creators of DuckDB, Hannes Mühleisen and Mark Raasveldt. Their work in data management has been groundbreaking, and their insights provide valuable context for the discussions in the book.

Chapter 1

An introduction to DuckDB

  • Why DuckDB, a single-node, in-memory database, emerged in the era of big data
  • DuckDB’s capabilities
  • How DuckDB works and fits into your data pipeline
Chapter 2

Getting started with DuckDB

  • Installing and learning how to use the DuckDB CLI
  • Executing commands in the DuckDB CLI
  • Querying remote files
Chapter 3

Executing SQL queries

  • The different categories of SQL statements and their fundamental structure
  • Creating tables and structures for ingesting a real-world dataset
  • Laying the foundations for analyzing a huge dataset in detail
  • Exploring DuckDB-specific extensions to SQL
Chapter 4

Advanced aggregation and analysis of data

  • Preparing, cleaning and aggregating data while ingesting
  • Using window functions to create new aggregates over different partitions of any dataset
  • Understanding the different types of subqueries
  • Using Common Table Expressions (CTEs)
  • Applying filters to any aggregate
Chapter 5

Exploring data without persistence

  • Converting CSV files to Parquet
  • Automatically inferring file type and data schema
  • Creating views to simplify the querying of nested JSON documents
  • Exploring the metadata of Parquet files
  • Querying other databases like SQLite
Chapter 6

Integrating with the Python ecosystem

  • The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
  • Ingesting data via the Python API
  • Querying pandas DataFrames with DuckDB methods
  • Exporting data to different formats
  • Using DuckDB’s relational API to compose queries
Chapter 7

DuckDB in the Cloud with MotherDuck

  • The idea behind MotherDuck
  • Understanding how the architecture works under the hood
  • Use cases for serverless SQL analytics
  • Creating, managing, and sharing MotherDuck databases
  • Tips for optimizing your MotherDuck usage
Chapter 8

Building data pipelines with DuckDB

  • The meaning and relevance of data pipelines
  • What roles DuckDB can have as part of a pipeline
  • How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
  • Orchestrating pipelines with Dagster
Chapter 9

Building and deploying data apps

  • Building an interactive web application with Streamlit
  • Deploying Streamlit applications with Streamlit Community Cloud
  • Rendering interactive charts with Plot.ly
  • Creating a dashboard for Business Intelligence (BI) with Apache Superset
  • Creating charts from a custom SQL query with Apache Superset
Chapter 10

Performance considerations for large datasets

  • Preparing large volumes of data to be imported into DuckDB
  • Querying metadata and running exploratory data analysis (EDA) queries on large datasets
  • Exporting full databases concurrently to Parquet
  • Using aggregations to speed up statistical analysis
  • Using EXPLAIN and EXPLAIN ANALYZE to understand query plans
Chapter 11

Conclusion