Free "DuckDB in Action" Book

"DuckDB in Action" includes

MotherDuck is pleased to offer this free PDF of the Manning book “DuckDB in Action” by Mark Needham, Michael Hunger, and Michael Simons. This is the complete text. The first few chapters help you understand the basics of DuckDB and get started with SQL. The book also covers advanced data analytics with SQL, building data engineering pipelines with DuckDB, DuckDB's Python APIs and integration with DataFrames, building data apps with WebAssembly (Wasm), and more.

Sneak peek into the chapters

In this book, we are honored to include a foreword by the co-creators of DuckDB, Hannes and Mark. Their innovative work in the field of data management has been groundbreaking, and their insights provide invaluable context for the discussions within these pages.

Don’t miss

Chapter 7 on MotherDuck

Chapter 1

An introduction to DuckDB

Why DuckDB, a single-node in-memory database, emerged in the era of big data
DuckDB’s capabilities
How DuckDB works and fits into your data pipeline

Chapter 2

Getting started with DuckDB

Installing and learning how to use the DuckDB CLI
Executing commands in the DuckDB CLI
Querying remote files

Chapter 3

Executing SQL queries

The different categories of SQL statements and their fundamental structure
Creating tables and structures for ingesting a real-world dataset
Laying the foundation for analyzing a huge dataset in detail
Exploring DuckDB-specific extensions to SQL

Chapter 4

Advanced aggregation and analysis of data

Preparing, cleaning and aggregating data while ingesting
Using window functions to create new aggregates over different partitions of any dataset
Understanding the different types of subqueries
Using Common Table Expressions (CTEs)
Applying filters to any aggregate

Chapter 5

Exploring data without persistence

Converting CSV files to Parquet
Automatically inferring file types and data schemas
Creating views to simplify the querying of nested JSON documents
Exploring the metadata of Parquet files
Querying other databases like SQLite

Chapter 6

Integrating with the Python ecosystem

The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
Ingesting data via the Python API
Querying pandas DataFrames with DuckDB methods
Exporting data to different formats
Using DuckDB’s relational API to compose queries

Chapter 7

DuckDB in the Cloud with MotherDuck

The idea behind MotherDuck
Understanding how the architecture works under the hood
Use cases for serverless SQL analytics
Creating, managing, and sharing MotherDuck databases
Tips for optimizing your MotherDuck usage

Chapter 8

Building data pipelines with DuckDB

The meaning and relevance of data pipelines
What roles DuckDB can have as part of a pipeline
How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
Orchestrating pipelines with Dagster

Chapter 9

Building and deploying data apps

Building an interactive web application with Streamlit
Deploying Streamlit applications with Streamlit Community Cloud
Rendering interactive charts with Plot.ly
Creating a dashboard for Business Intelligence (BI) with Apache Superset
Creating charts from a custom SQL query with Apache Superset

Chapter 10

Performance considerations for large datasets

Preparing large volumes of data to be imported into DuckDB
Querying metadata and running exploratory data analysis (EDA) queries on large datasets
Exporting full databases concurrently to Parquet
Using aggregations to speed up statistical analysis
Using EXPLAIN and EXPLAIN ANALYZE to understand query plans

Chapter 11

Conclusion
