Chapter 2

6 min read

Getting Started with DuckDB

This is a summary of a book chapter from DuckDB in Action, published by Manning. Download the complete book for free to read the complete chapter.

'DuckDB In Action' book cover

DuckDB has risen to prominence in the early 2020s as a powerful, embedded OLAP database. While the database can be used from within almost any programming environment, the quickest way to get familiar with its capabilities is through the Command-Line Interface (CLI). This chapter focuses on installing the tool, mastering its built-in commands, and building efficient data pipelines without leaving the terminal.

2.1 Supported Environments

DuckDB is designed to be highly portable and dependency-free. It is available for a wide range of programming languages and operating systems, supporting both Intel/AMD and ARM architectures.

At the time of writing, DuckDB supports:

  • Operating Systems: Linux, Windows, and macOS.
  • Languages & APIs: Python, R, Java, JavaScript, Go, Rust, Node.js, Julia, C/C++, ODBC, JDBC, WASM, and Swift.

While DuckDB is deeply integrated into these languages, this chapter focuses exclusively on the Command Line Interface (CLI), which allows users to execute SQL interactively or as part of shell scripts.

2.2 Installing the DuckDB CLI

The DuckDB CLI is an "embedded database" in the truest sense. There is no server to install, no background daemon to configure, and no client-server connection to manage. The installation is a "copy-to" installation—it consists of a single binary executable named duckdb. You can find the full list of packages on the installation page.

2.2.1 macOS

For macOS users, the official recommendation is to use the Homebrew package manager, which handles updates automatically.

Copy code

brew install duckdb

2.2.2 Linux and Windows

For Linux and Windows, the CLI is distributed as a zip file via GitHub Releases. The process involves:

  1. Visiting the DuckDB GitHub releases page.
  2. Downloading the package appropriate for your architecture (e.g., duckdb_cli-linux-amd64.zip).
  3. Unzipping the file.
  4. Running the binary directly.

Example for Linux (AMD64):

Copy code

wget https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip unzip duckdb_cli-linux-amd64.zip ./duckdb -version

2.3 Using the DuckDB CLI

Launching the DuckDB CLI is incredibly simple, requiring just the execution of the duckdb command. By default, the database operates in transient mode, storing all data in memory, which is lost upon exiting the CLI using .quit or .exit.

DuckDB CLI interface running SQL queries on a terminal

For persistent storage, you can launch the tool with a filename (e.g., duckdb my_data.db) or use the .open command within the interface.

2.3.1 SQL statements

The CLI efficiently handles standard SQL statements. To execute a command, you must end it with a semicolon (;) and a newline. The interface supports multi-line input, allowing you to paste complex queries comfortably.

2.3.2 Dot commands

In addition to SQL, the CLI supports "Dot Commands"—special internal instructions identified by a leading period (.). These do not require a semicolon.

  • .help: Displays a list of available commands.
  • .tables: Lists all available tables and views.
  • .mode: Changes how results are displayed (e.g., duckbox, line, json).
  • .timer on/off: Toggles the display of execution time for queries.
  • .read: Reads and executes SQL from an external file.
  • .open: Closes the current database and opens a new file.

Pro Tip: The .duckdbrc Config File You can configure the CLI to run specific commands every time it launches by creating a .duckdbrc file in your home directory. This is perfect for setting a custom prompt or enabling timers by default.

2.3.3 CLI arguments

The CLI is also a powerful tool for automation. You can pass flags to control behavior without entering interactive mode:

  • -json: Outputs results as JSON strings.
  • -readonly: Opens the database in read-only mode.
  • -c or -s: Runs a specific command and exits immediately.

For example, to output a query directly to JSON:

Copy code

duckdb --json -c 'SELECT * FROM my_table'

2.4 DuckDB’s extension system

To keep the core binary lightweight, DuckDB utilizes a modular extension system. Extensions are packages that add functionality—such as support for specific file formats (Parquet, Excel, JSON), geospatial data, or full-text search—only when you need them.

You can inspect the current state of your extensions using the duckdb_extensions() function:

Copy code

SELECT extension_name, loaded, installed FROM duckdb_extensions();

Installing and Loading

The lifecycle of an extension involves two steps:

  1. INSTALL extension_name; – Downloads the extension to your local environment.
  2. LOAD extension_name; – Activates the extension for the current session.

Note: Since version 0.8, DuckDB often autoloads known extensions (like httpfs or parquet) if it detects they are needed by your query.

2.5 Analyzing a CSV file with the DuckDB CLI

DuckDB excels at "Data Engineering in a Box." It can query remote files directly from HTTP servers, S3, Google Cloud Storage, or HDFS without a manual download step. We can demonstrate this using a dataset containing the total population figures for several countries (https://mng.bz/KZKZ).

Direct Remote Querying

Using the httpfs extension, you can query a remote CSV file simply by providing its URL. If the URL ends in a known extension like .csv, DuckDB infers the format automatically. For URLs without extensions (like shortened links), use read_csv_auto:

Copy code

SELECT count(*) FROM read_csv_auto('https://bit.ly/3KoiZR0');

2.5.1 Result modes

When exploring datasets with many columns, the default table view (duckbox) might truncate data. This chapter recommends switching to Line Mode for initial exploration. In this mode, every column is printed on its own line.

Copy code

.mode line SELECT * FROM read_csv_auto('...') LIMIT 1;

Other useful modes include json, csv, list, and noheader.

Building Data Pipelines

The CLI can function as a robust step in a data pipeline. You can read data from a remote source, apply SQL transformations, and write the result to a local file in a single command.

Writing to CSV:

Copy code

duckdb -csv -s "SELECT * FROM remote_file WHERE region='Europe'" > western_europe.csv

Writing to Parquet: DuckDB can also export directly to the compressed Parquet format using the COPY ... TO syntax:

Copy code

duckdb -s "COPY (SELECT * FROM remote_file) TO 'output.parquet' (FORMAT PARQUET)"

2.6 Summary

  • Availability: DuckDB is available as a library for Python, R, Java, C++, and many others, but the CLI is a standalone, dependency-free entry point.
  • CLI Control: The CLI uses dot commands (.mode, .timer, .read) to control the environment and output formats.
  • Display Modes: You can visualize data in various formats, including duckbox (tables) and line (row-oriented).
  • Remote Data: The httpfs extension allows direct SQL querying of files hosted on HTTP servers or S3.
  • Pipelines: The CLI can act as a processing engine, querying external datasets and writing results to standard output or files (CSV, Parquet) without creating persistent tables.
'DuckDB In Action' book cover