6 min read
Getting Started with DuckDB
This is a summary of a book chapter from DuckDB in Action, published by Manning. Download the complete book for free to read the complete chapter.

DuckDB has risen to prominence in the early 2020s as a powerful, embedded OLAP database. While the database can be used from within almost any programming environment, the quickest way to get familiar with its capabilities is through the Command-Line Interface (CLI). This chapter focuses on installing the tool, mastering its built-in commands, and building efficient data pipelines without leaving the terminal.
2.1 Supported Environments
DuckDB is designed to be highly portable and dependency-free. It is available for a wide range of programming languages and operating systems, supporting both Intel/AMD and ARM architectures.
At the time of writing, DuckDB supports:
- Operating Systems: Linux, Windows, and macOS.
- Languages & APIs: Python, R, Java, JavaScript, Go, Rust, Node.js, Julia, C/C++, ODBC, JDBC, WASM, and Swift.
While DuckDB is deeply integrated into these languages, this chapter focuses exclusively on the Command Line Interface (CLI), which allows users to execute SQL interactively or as part of shell scripts.
2.2 Installing the DuckDB CLI
The DuckDB CLI is an "embedded database" in the truest sense. There is no server to install, no background daemon to configure, and no client-server connection to manage. The installation is a "copy-to" installation—it consists of a single binary executable named duckdb. You can find the full list of packages on the installation page.
2.2.1 macOS
For macOS users, the official recommendation is to use the Homebrew package manager, which handles updates automatically.
Copy code
brew install duckdb
2.2.2 Linux and Windows
For Linux and Windows, the CLI is distributed as a zip file via GitHub Releases. The process involves:
- Visiting the DuckDB GitHub releases page.
- Downloading the package appropriate for your architecture (e.g.,
duckdb_cli-linux-amd64.zip). - Unzipping the file.
- Running the binary directly.
Example for Linux (AMD64):
Copy code
wget https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip unzip duckdb_cli-linux-amd64.zip ./duckdb -version
2.3 Using the DuckDB CLI
Launching the DuckDB CLI is incredibly simple, requiring just the execution of the duckdb command. By default, the database operates in transient mode, storing all data in memory, which is lost upon exiting the CLI using .quit or .exit.

For persistent storage, you can launch the tool with a filename (e.g., duckdb my_data.db) or use the .open command within the interface.
2.3.1 SQL statements
The CLI efficiently handles standard SQL statements. To execute a command, you must end it with a semicolon (;) and a newline. The interface supports multi-line input, allowing you to paste complex queries comfortably.
2.3.2 Dot commands
In addition to SQL, the CLI supports "Dot Commands"—special internal instructions identified by a leading period (.). These do not require a semicolon.
.help: Displays a list of available commands..tables: Lists all available tables and views..mode: Changes how results are displayed (e.g.,duckbox,line,json)..timer on/off: Toggles the display of execution time for queries..read: Reads and executes SQL from an external file..open: Closes the current database and opens a new file.
Pro Tip: The .duckdbrc Config File
You can configure the CLI to run specific commands every time it launches by creating a .duckdbrc file in your home directory. This is perfect for setting a custom prompt or enabling timers by default.
2.3.3 CLI arguments
The CLI is also a powerful tool for automation. You can pass flags to control behavior without entering interactive mode:
-json: Outputs results as JSON strings.-readonly: Opens the database in read-only mode.-cor-s: Runs a specific command and exits immediately.
For example, to output a query directly to JSON:
Copy code
duckdb --json -c 'SELECT * FROM my_table'
2.4 DuckDB’s extension system
To keep the core binary lightweight, DuckDB utilizes a modular extension system. Extensions are packages that add functionality—such as support for specific file formats (Parquet, Excel, JSON), geospatial data, or full-text search—only when you need them.
You can inspect the current state of your extensions using the duckdb_extensions() function:
Copy code
SELECT extension_name, loaded, installed
FROM duckdb_extensions();
Installing and Loading
The lifecycle of an extension involves two steps:
INSTALL extension_name;– Downloads the extension to your local environment.LOAD extension_name;– Activates the extension for the current session.
Note: Since version 0.8, DuckDB often autoloads known extensions (like httpfs or parquet) if it detects they are needed by your query.
2.5 Analyzing a CSV file with the DuckDB CLI
DuckDB excels at "Data Engineering in a Box." It can query remote files directly from HTTP servers, S3, Google Cloud Storage, or HDFS without a manual download step. We can demonstrate this using a dataset containing the total population figures for several countries (https://mng.bz/KZKZ).
Direct Remote Querying
Using the httpfs extension, you can query a remote CSV file simply by providing its URL. If the URL ends in a known extension like .csv, DuckDB infers the format automatically. For URLs without extensions (like shortened links), use read_csv_auto:
Copy code
SELECT count(*)
FROM read_csv_auto('https://bit.ly/3KoiZR0');
2.5.1 Result modes
When exploring datasets with many columns, the default table view (duckbox) might truncate data. This chapter recommends switching to Line Mode for initial exploration. In this mode, every column is printed on its own line.
Copy code
.mode line
SELECT * FROM read_csv_auto('...') LIMIT 1;
Other useful modes include json, csv, list, and noheader.
Building Data Pipelines
The CLI can function as a robust step in a data pipeline. You can read data from a remote source, apply SQL transformations, and write the result to a local file in a single command.
Writing to CSV:
Copy code
duckdb -csv -s "SELECT * FROM remote_file WHERE region='Europe'" > western_europe.csv
Writing to Parquet:
DuckDB can also export directly to the compressed Parquet format using the COPY ... TO syntax:
Copy code
duckdb -s "COPY (SELECT * FROM remote_file) TO 'output.parquet' (FORMAT PARQUET)"
2.6 Summary
- Availability: DuckDB is available as a library for Python, R, Java, C++, and many others, but the CLI is a standalone, dependency-free entry point.
- CLI Control: The CLI uses dot commands (
.mode,.timer,.read) to control the environment and output formats. - Display Modes: You can visualize data in various formats, including
duckbox(tables) andline(row-oriented). - Remote Data: The
httpfsextension allows direct SQL querying of files hosted on HTTP servers or S3. - Pipelines: The CLI can act as a processing engine, querying external datasets and writing results to standard output or files (CSV, Parquet) without creating persistent tables.


