Free "DuckDB in Action" Book
"DuckDB in Action" includes
MotherDuck is pleased to offer this free PDF of the Manning book “DuckDB in Action” by Mark Needham, Michael Hunger and Michael Simons. This is the complete text. The first few chapters help you understand the basics of DuckDB and get started with SQL. The book also covers advanced data analytics with SQL, building data engineering pipelines with DuckDB, DuckDB's Python APIs and integration with DataFrames, building data apps with WebAssembly (Wasm), and more.
Sneak peek into the chapters
The book includes a foreword by the co-creators of DuckDB, Hannes Mühleisen and Mark Raasveldt. Their work in data management has been groundbreaking, and their insights provide valuable context for the discussions within these pages.
Don’t miss
Chapter 7 on MotherDuck
An introduction to DuckDB
- Why DuckDB, a single-node, in-memory database, emerged in the era of big data
- DuckDB’s capabilities
- How DuckDB works and fits into your data pipeline
Getting started with DuckDB
- Installing and learning how to use the DuckDB CLI
- Executing commands in the DuckDB CLI
- Querying remote files
Executing SQL queries
- The different categories of SQL statements and their fundamental structure
- Creating tables and structures for ingesting a real-world dataset
- Laying the fundamentals for analyzing a huge dataset in detail
- Exploring DuckDB-specific extensions to SQL
Advanced aggregation and analysis of data
- Preparing, cleaning and aggregating data while ingesting
- Using window functions to create new aggregates over different partitions of any dataset
- Understanding the different types of subqueries
- Using Common Table Expressions (CTEs)
- Applying filters to any aggregate
Exploring data without persistence
- Converting CSV files to Parquet
- Automatically inferring file type and data schema
- Creating views to simplify the querying of nested JSON documents
- Exploring the metadata of Parquet files
- Querying other databases like SQLite
Integrating with the Python ecosystem
- The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
- Ingesting data via the Python API
- Querying pandas DataFrames with DuckDB methods
- Exporting data to different formats
- Using DuckDB’s relational API to compose queries
DuckDB in the Cloud with MotherDuck
- The idea behind MotherDuck
- Understanding how the architecture works under the hood
- Use cases for serverless SQL analytics
- Creating, managing, and sharing MotherDuck databases
- Tips for optimizing your MotherDuck usage
Building data pipelines with DuckDB
- The meaning and relevance of data pipelines
- What roles DuckDB can have as part of a pipeline
- How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
- Orchestrating pipelines with Dagster
Building and deploying data apps
- Building an interactive web application with Streamlit
- Deploying Streamlit applications with Streamlit Community Cloud
- Rendering interactive charts with Plotly
- Creating a dashboard for Business Intelligence (BI) with Apache Superset
- Creating charts from a custom SQL query with Apache Superset
Performance considerations for large datasets
- Preparing large volumes of data to be imported into DuckDB
- Querying metadata and running exploratory data analysis (EDA) queries on large datasets
- Exporting full databases concurrently to Parquet
- Using aggregations to speed up statistical analysis
- Using EXPLAIN and EXPLAIN ANALYZE to understand query plans