8 min read
Getting Started with DuckDB
INFOThis is a summary of a book chapter from DuckDB in Action, published by Manning. Download the complete book for free to read the complete chapter.
DuckDB has risen to prominence in the early 2020s as a powerful, embedded OLAP database. While the database can be used from within almost any programming environment, the quickest way to get familiar with its capabilities is through the Command-Line Interface (CLI). This chapter focuses on installing the tool, mastering its built-in commands, and building efficient data pipelines without leaving the terminal.
2.1 Supported Environments
DuckDB is designed to be highly portable and dependency-free. It is available for a wide range of programming languages and operating systems, supporting both Intel/AMD and ARM architectures.
DuckDB Language & Platform Support
| Category | Supported Options |
|---|---|
| Operating Systems | Linux, Windows, macOS (Intel & ARM) |
| Languages | Python, R, Java, JavaScript, Go, Rust, Node.js, Julia, C/C++, Swift |
| Database Interfaces | ODBC, JDBC, WASM |
While DuckDB is deeply integrated into these languages, this chapter focuses exclusively on the Command Line Interface (CLI), which allows users to execute SQL interactively or as part of shell scripts.
2.2 Installing the DuckDB CLI
The DuckDB CLI is an "embedded database" in the truest sense. There is no server to install, no background daemon to configure, and no client-server connection to manage. The installation is a "copy-to" installation—it consists of a single binary executable named duckdb. You can find the full list of packages on the installation page.
DuckDB CLI vs Other Database CLIs
| Feature | DuckDB CLI | sqlite3 | psql (PostgreSQL) | mysql CLI |
|---|---|---|---|---|
| Server Required | No | No | Yes | Yes |
| Installation | Single binary | Single binary | Requires server | Requires server |
| Query Remote Files | Yes (HTTP, S3, GCS) | No | No | No |
| In-Process Analytics | Yes | Limited | No | No |
| Direct Parquet/CSV | Yes | No | No | No |
| Memory Mode | Default | Optional | N/A | N/A |
2.2.1 macOS
For macOS users, the official recommendation is to use the Homebrew package manager, which handles updates automatically.
Copy code
brew install duckdb
2.2.2 Linux and Windows
For Linux and Windows, the CLI is distributed as a zip file via GitHub Releases. The process involves:
- Visiting the DuckDB GitHub releases page.
- Downloading the package appropriate for your architecture (e.g.,
duckdb_cli-linux-amd64.zip). - Unzipping the file.
- Running the binary directly.
Example for Linux (AMD64):
Copy code
wget https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip unzip duckdb_cli-linux-amd64.zip ./duckdb -version
2.3 Using the DuckDB CLI
Launching the DuckDB CLI is incredibly simple, requiring just the execution of the duckdb command. By default, the database operates in transient mode, storing all data in memory, which is lost upon exiting the CLI using .quit or .exit.

For persistent storage, you can launch the tool with a filename (e.g., duckdb my_data.db) or use the .open command within the interface.
2.3.1 SQL statements
The CLI efficiently handles standard SQL statements. To execute a command, you must end it with a semicolon (;) and a newline. The interface supports multi-line input, allowing you to paste complex queries comfortably.
2.3.2 Dot commands
In addition to SQL, the CLI supports "Dot Commands"—special internal instructions identified by a leading period (.). These do not require a semicolon.
Essential DuckDB Dot Commands
| Command | Description |
|---|---|
.help | Display a list of all available commands |
.open FILENAME | Close current database and open a new file |
.read FILENAME | Read and execute SQL from an external file |
.tables | List all available tables and views |
.timer on/off | Toggle display of query execution time |
.mode MODE | Change output format (duckbox, line, json, csv) |
.maxrows N | Set maximum rows to display (duckbox format) |
.excel | Open next query result in a spreadsheet |
.quit or .exit | Exit the CLI (also Ctrl+D) |
Pro Tip: The .duckdbrc Config File
You can configure the CLI to run specific commands every time it launches by creating a .duckdbrc file in your home directory. This is perfect for setting a custom prompt or enabling timers by default.
2.3.3 CLI arguments
The CLI is also a powerful tool for automation. You can pass flags to control behavior without entering interactive mode.
DuckDB CLI Arguments Reference
| Argument | Description |
|---|---|
-readonly | Open the database in read-only mode |
-json | Set output mode to JSON |
-line | Set output mode to line-by-line |
-unsigned | Allow loading of unsigned extensions |
-s COMMAND or -c COMMAND | Run a command and exit immediately |
For example, to output a query directly to JSON:
Copy code
duckdb --json -c 'SELECT * FROM my_table'
2.4 DuckDB's Extension System
To keep the core binary lightweight, DuckDB utilizes a modular extension system. Extensions are packages that add functionality—such as support for specific file formats (Parquet, Excel, JSON), geospatial data, or full-text search—only when you need them.
Popular DuckDB Extensions
| Extension | Purpose | Pre-installed |
|---|---|---|
parquet | Read/write Parquet columnar files | Yes |
json | JSON file support and functions | Yes |
icu | International character handling | Yes |
fts | Full-text search capabilities | Yes |
httpfs | Query files from HTTP/S3/GCS | No (auto-loads) |
postgres_scanner | Query PostgreSQL databases directly | No |
sqlite_scanner | Query SQLite databases directly | No |
spatial | Geospatial data types and functions | No |
excel | Read/write Excel files | No |
motherduck | Connect to MotherDuck cloud service | No |
You can inspect the current state of your extensions using the duckdb_extensions() function:
Copy code
SELECT extension_name, loaded, installed
FROM duckdb_extensions();
Installing and Loading
The lifecycle of an extension involves two steps:
INSTALL extension_name;– Downloads the extension to your local environment.LOAD extension_name;– Activates the extension for the current session.
Note: Since version 0.8, DuckDB often autoloads known extensions (like httpfs or parquet) if it detects they are needed by your query.
2.5 Analyzing a CSV File with the DuckDB CLI
DuckDB excels at "Data Engineering in a Box." It can query remote files directly from HTTP servers, S3, Google Cloud Storage, or HDFS without a manual download step. We can demonstrate this using a dataset containing the total population figures for several countries (https://mng.bz/KZKZ).
Common Use Cases for DuckDB CLI
| Use Case | Description | Example |
|---|---|---|
| Data Exploration | Quickly analyze CSV/Parquet files without setup | SELECT * FROM 'data.csv' LIMIT 10; |
| ETL Pipelines | Transform data between formats in shell scripts | duckdb -s "COPY (...) TO 'out.parquet'" |
| File Format Conversion | Convert CSV to Parquet or vice versa | COPY (FROM 'input.csv') TO 'output.parquet' |
| Remote Data Analysis | Query files directly from S3/HTTP | FROM 's3://bucket/data.parquet' |
| Ad-hoc Reporting | Generate quick reports to CSV/JSON | duckdb -csv -s "SELECT ..." > report.csv |
Direct Remote Querying
Using the httpfs extension, you can query a remote CSV file simply by providing its URL. If the URL ends in a known extension like .csv, DuckDB infers the format automatically. For URLs without extensions (like shortened links), use read_csv_auto:
Copy code
SELECT count(*)
FROM read_csv_auto('https://bit.ly/3KoiZR0');
2.5.1 Result Modes
When exploring datasets with many columns, the default table view (duckbox) might truncate data. DuckDB provides multiple output modes to suit different use cases.
DuckDB Output Modes Comparison
| Mode Type | Modes | Best For |
|---|---|---|
| Table-based | duckbox, box, ascii, table, list, column, csv | Few columns, structured display |
| Line-based | line, json, jsonlines | Many columns, initial data exploration |
| Special | html, insert, trash (no output) | Export, debugging |
Switch modes using .mode:
Copy code
.mode line
SELECT * FROM read_csv_auto('...') LIMIT 1;
The line mode displays each column on its own row—ideal for exploring datasets with 10+ columns that would otherwise be truncated.
Real-World Example: Western Europe Population Data
The book demonstrates querying a countries dataset and filtering for Western Europe. Here's the actual output from the example:
Sample Output: Western Europe Demographics
| Country | Population | Birthrate | Deathrate |
|---|---|---|---|
| Andorra | 71,201 | 8.71 | 6.25 |
| Austria | 8,192,880 | 8.74 | 9.76 |
| Belgium | 10,379,067 | 10.38 | 10.27 |
| Denmark | 5,450,661 | 11.13 | 10.36 |
| Faroe Islands | 47,246 | 14.05 | 8.70 |
This data was extracted with a single command—no tables created, no data imported:
Copy code
duckdb -csv \
-s "SELECT Country, Population, Birthrate, Deathrate
FROM read_csv_auto('https://bit.ly/3KoiZR0')
WHERE trim(region) = 'WESTERN EUROPE'" \
> western_europe.csv
Building Data Pipelines
The CLI can function as a robust step in a data pipeline. You can read data from a remote source, apply SQL transformations, and write the result to a local file in a single command.
Writing to CSV:
Copy code
duckdb -csv -s "SELECT * FROM remote_file WHERE region='Europe'" > western_europe.csv
Writing to Parquet:
DuckDB can also export directly to the compressed Parquet format using the COPY ... TO syntax:
Copy code
duckdb -s "COPY (SELECT * FROM remote_file) TO 'output.parquet' (FORMAT PARQUET)"
2.6 Summary
- Availability: DuckDB is available as a library for Python, R, Java, C++, and many others, but the CLI is a standalone, dependency-free entry point.
- CLI Control: The CLI uses dot commands (
.mode,.timer,.read) to control the environment and output formats. - Display Modes: You can visualize data in various formats, including
duckbox(tables) andline(row-oriented). - Remote Data: The
httpfsextension allows direct SQL querying of files hosted on HTTP servers or S3. - Pipelines: The CLI can act as a processing engine, querying external datasets and writing results to standard output or files (CSV, Parquet) without creating persistent tables.
Continue Reading: Chapter 1: Why DuckDB | Chapter 3: Executing SQL Queries
Related: Learn how MotherDuck extends DuckDB to the cloud with serverless analytics.
