Getting Started with DuckDB
INFO: This is a summary of a book chapter from DuckDB in Action, published by Manning. Download the book for free to read the complete chapter.
DuckDB has risen to prominence in the early 2020s as a powerful, embedded OLAP database. While the database can be used from within almost any programming environment, the quickest way to get familiar with its capabilities is through the Command-Line Interface (CLI). This chapter focuses on installing the tool, mastering its built-in commands, and building efficient data pipelines without leaving the terminal.
2.1 Supported Environments
DuckDB is designed to be highly portable and dependency-free. It is available for a wide range of programming languages and operating systems, supporting both Intel/AMD and ARM architectures.
DuckDB Language & Platform Support
| Category | Supported Options |
|---|---|
| Operating Systems | Linux, Windows, macOS (Intel & ARM) |
| Languages | Python, R, Java, JavaScript, Go, Rust, Node.js, Julia, C/C++, Swift |
| Database Interfaces | ODBC, JDBC, WASM |
While DuckDB is deeply integrated into these languages, this chapter focuses exclusively on the CLI, which allows users to execute SQL interactively or as part of shell scripts.
2.2 Installing the DuckDB CLI
The DuckDB CLI is an "embedded database" in the truest sense. There is no server to install, no background daemon to configure, and no client-server connection to manage. The installation is a "copy-to" installation—it consists of a single binary executable named duckdb. You can find the full list of packages on the installation page.
DuckDB CLI vs Other Database CLIs
| Feature | DuckDB CLI | sqlite3 | psql (PostgreSQL) | mysql CLI |
|---|---|---|---|---|
| Server Required | No | No | Yes | Yes |
| Installation | Single binary | Single binary | Requires server | Requires server |
| Query Remote Files | Yes (HTTP, S3, GCS) | No | No | No |
| In-Process Analytics | Yes | Limited | No | No |
| Direct Parquet/CSV | Yes | No | No | No |
| Memory Mode | Default | Optional | N/A | N/A |
2.2.1 macOS
For macOS users, the official recommendation is to use the Homebrew package manager, which handles updates automatically.
```bash
brew install duckdb
```
2.2.2 Linux and Windows
For Linux and Windows, the CLI is distributed as a zip file via GitHub Releases. The process involves:
- Visiting the DuckDB GitHub releases page.
- Downloading the package appropriate for your architecture (e.g., duckdb_cli-linux-amd64.zip).
- Unzipping the file.
- Running the binary directly.
Example for Linux (AMD64):
```bash
wget https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
./duckdb -version
```
2.3 Using the DuckDB CLI
Launching the DuckDB CLI is incredibly simple, requiring just the execution of the duckdb command. By default, the database operates in transient mode, storing all data in memory, which is lost upon exiting the CLI using .quit or .exit.

For persistent storage, you can launch the tool with a filename (e.g., duckdb my_data.db) or use the .open command within the interface.
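A minimal sketch of both modes (my_data.db is just a placeholder filename):
```bash
# Transient session: everything lives in memory and disappears on exit
duckdb

# Persistent session: data is stored in (and reloaded from) my_data.db
duckdb my_data.db
```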
2.3.1 SQL statements
The CLI efficiently handles standard SQL statements. To execute a command, you must end it with a semicolon (;) and a newline. The interface supports multi-line input, allowing you to paste complex queries comfortably.
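For instance, a statement only executes once the closing semicolon is entered, so a query can be typed or pasted across several lines; the prices table below is purely illustrative:
```sql
CREATE TABLE prices (item VARCHAR, amount DECIMAL(8, 2));
INSERT INTO prices VALUES ('coffee', 3.50), ('tea', 2.80);

SELECT item,
       amount
FROM prices
WHERE amount > 3;
```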
2.3.2 Dot commands
In addition to SQL, the CLI supports "Dot Commands"—special internal instructions identified by a leading period (.). These do not require a semicolon.
Essential DuckDB Dot Commands
| Command | Description |
|---|---|
| .help | Display a list of all available commands |
| .open FILENAME | Close current database and open a new file |
| .read FILENAME | Read and execute SQL from an external file |
| .tables | List all available tables and views |
| .timer on/off | Toggle display of query execution time |
| .mode MODE | Change output format (duckbox, line, json, csv) |
| .maxrows N | Set maximum rows to display (duckbox format) |
| .excel | Open next query result in a spreadsheet |
| .quit or .exit | Exit the CLI (also Ctrl+D) |
Pro Tip: The .duckdbrc Config File
You can configure the CLI to run specific commands every time it launches by creating a .duckdbrc file in your home directory. This is perfect for setting a custom prompt or enabling timers by default.
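For example, a .duckdbrc along these lines turns on the timer and sets a custom prompt on every start (the prompt text is just an illustration):
```
.timer on
.prompt 'DuckDB> '
```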
2.3.3 CLI arguments
The CLI is also a powerful tool for automation. You can pass flags to control behavior without entering interactive mode.
DuckDB CLI Arguments Reference
| Argument | Description |
|---|---|
| -readonly | Open the database in read-only mode |
| -json | Set output mode to JSON |
| -line | Set output mode to line-by-line |
| -unsigned | Allow loading of unsigned extensions |
| -s COMMAND or -c COMMAND | Run a command and exit immediately |
For example, to output a query directly to JSON:
```bash
duckdb --json -c 'SELECT * FROM my_table'
```
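Two more illustrative invocations, assuming an existing database file named my_data.db:
```bash
# Open an existing database file interactively, without write access
duckdb -readonly my_data.db

# Run a one-off query against the default in-memory database and print CSV
duckdb -csv -s "SELECT 42 AS answer"
```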
2.4 DuckDB's Extension System
To keep the core binary lightweight, DuckDB utilizes a modular extension system. Extensions are packages that add functionality—such as support for specific file formats (Parquet, Excel, JSON), geospatial data, or full-text search—only when you need them.
Popular DuckDB Extensions
| Extension | Purpose | Pre-installed |
|---|---|---|
| parquet | Read/write Parquet columnar files | Yes |
| json | JSON file support and functions | Yes |
| icu | International character handling | Yes |
| fts | Full-text search capabilities | Yes |
| httpfs | Query files from HTTP/S3/GCS | No (auto-loads) |
| postgres_scanner | Query PostgreSQL databases directly | No |
| sqlite_scanner | Query SQLite databases directly | No |
| spatial | Geospatial data types and functions | No |
| excel | Read/write Excel files | No |
| motherduck | Connect to MotherDuck cloud service | No |
You can inspect the current state of your extensions using the duckdb_extensions() function:
```sql
SELECT extension_name, loaded, installed
FROM duckdb_extensions();
```
Installing and Loading
The lifecycle of an extension involves two steps:
- INSTALL extension_name; – Downloads the extension to your local environment.
- LOAD extension_name; – Activates the extension for the current session.
Note: Since version 0.8, DuckDB often autoloads known extensions (like httpfs or parquet) if it detects they are needed by your query.
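As a sketch, the lifecycle for the spatial extension from the table above looks like this:
```sql
INSTALL spatial;  -- download the extension into the local extension directory
LOAD spatial;     -- activate it for the current session

-- confirm the new state
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'spatial';
```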
2.5 Analyzing a CSV File with the DuckDB CLI
DuckDB excels at "Data Engineering in a Box." It can query remote files directly from HTTP servers, S3, Google Cloud Storage, or HDFS without a manual download step. We can demonstrate this using a dataset containing the total population figures for several countries (https://mng.bz/KZKZ).
Common Use Cases for DuckDB CLI
| Use Case | Description | Example |
|---|---|---|
| Data Exploration | Quickly analyze CSV/Parquet files without setup | SELECT * FROM 'data.csv' LIMIT 10; |
| ETL Pipelines | Transform data between formats in shell scripts | duckdb -s "COPY (...) TO 'out.parquet'" |
| File Format Conversion | Convert CSV to Parquet or vice versa | COPY (FROM 'input.csv') TO 'output.parquet' |
| Remote Data Analysis | Query files directly from S3/HTTP | FROM 's3://bucket/data.parquet' |
| Ad-hoc Reporting | Generate quick reports to CSV/JSON | duckdb -csv -s "SELECT ..." > report.csv |
Direct Remote Querying
Using the httpfs extension, you can query a remote CSV file simply by providing its URL. If the URL ends in a known extension like .csv, DuckDB infers the format automatically. For URLs without extensions (like shortened links), use read_csv_auto:
```sql
SELECT count(*)
FROM read_csv_auto('https://bit.ly/3KoiZR0');
```
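When the URL does end in .csv, the reader function can be dropped and the file queried directly; the URL below is only a placeholder:
```sql
SELECT count(*)
FROM 'https://example.com/countries.csv';
```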
2.5.1 Result Modes
When exploring datasets with many columns, the default table view (duckbox) might truncate data. DuckDB provides multiple output modes to suit different use cases.
DuckDB Output Modes Comparison
| Mode Type | Modes | Best For |
|---|---|---|
| Table-based | duckbox, box, ascii, table, list, column, csv | Few columns, structured display |
| Line-based | line, json, jsonlines | Many columns, initial data exploration |
| Special | html, insert, trash (no output) | Export, debugging |
Switch modes using .mode:
```
.mode line
SELECT * FROM read_csv_auto('...') LIMIT 1;
```
The line mode displays each column on its own row—ideal for exploring datasets with 10+ columns that would otherwise be truncated.
Real-World Example: Western Europe Population Data
The book demonstrates querying a countries dataset and filtering for Western Europe. Here's the actual output from the example:
Sample Output: Western Europe Demographics
| Country | Population | Birthrate | Deathrate |
|---|---|---|---|
| Andorra | 71,201 | 8.71 | 6.25 |
| Austria | 8,192,880 | 8.74 | 9.76 |
| Belgium | 10,379,067 | 10.38 | 10.27 |
| Denmark | 5,450,661 | 11.13 | 10.36 |
| Faroe Islands | 47,246 | 14.05 | 8.70 |
This data was extracted with a single command—no tables created, no data imported:
```bash
duckdb -csv \
  -s "SELECT Country, Population, Birthrate, Deathrate
      FROM read_csv_auto('https://bit.ly/3KoiZR0')
      WHERE trim(region) = 'WESTERN EUROPE'" \
  > western_europe.csv
```
Building Data Pipelines
The CLI can function as a robust step in a data pipeline. You can read data from a remote source, apply SQL transformations, and write the result to a local file in a single command.
Writing to CSV:
```bash
duckdb -csv -s "SELECT * FROM remote_file WHERE region='Europe'" > western_europe.csv
```
Writing to Parquet:
DuckDB can also export directly to the compressed Parquet format using the COPY ... TO syntax:
```bash
duckdb -s "COPY (SELECT * FROM remote_file) TO 'output.parquet' (FORMAT PARQUET)"
```
2.6 Summary
- Availability: DuckDB is available as a library for Python, R, Java, C++, and many others, but the CLI is a standalone, dependency-free entry point.
- CLI Control: The CLI uses dot commands (.mode, .timer, .read) to control the environment and output formats.
- Display Modes: You can visualize data in various formats, including duckbox (tables) and line (row-oriented).
- Remote Data: The httpfs extension allows direct SQL querying of files hosted on HTTP servers or S3.
- Pipelines: The CLI can act as a processing engine, querying external datasets and writing results to standard output or files (CSV, Parquet) without creating persistent tables.
Continue Reading: Chapter 1: Why DuckDB | Chapter 3: Executing SQL Queries
Related: Learn how MotherDuck extends DuckDB to the cloud with serverless analytics.

FAQs
How do I install the DuckDB CLI?
DuckDB CLI installation is simple—it's a single binary with no dependencies. On macOS, use Homebrew: brew install duckdb. On Linux/Windows, download the appropriate zip file from the GitHub releases page, unzip it, and run the duckdb executable directly. No installers or libraries are required.
What are DuckDB dot commands?
Dot commands are special CLI-only instructions that begin with a period (.) and don't require a semicolon. They control the CLI environment rather than executing SQL. Key commands include .help (list commands), .tables (show tables), .mode (change output format), .timer on/off (show query timing), .read (execute SQL from a file), and .open (switch database files). Type .help for the full list.
How do I query a remote CSV file with DuckDB?
DuckDB can query remote files directly using the httpfs extension. Simply use the URL in your query: SELECT * FROM 'https://example.com/data.csv';. If the URL doesn't end in .csv, use read_csv_auto(): SELECT * FROM read_csv_auto('https://bit.ly/shortlink');. The httpfs extension auto-loads when needed (DuckDB 0.8+) and supports HTTP, HTTPS, S3, and Google Cloud Storage.
What are DuckDB extensions and how do I use them?
Extensions add optional functionality to DuckDB while keeping the core lightweight. Some extensions (like parquet, json, icu) come pre-installed. Others (like httpfs, spatial, postgres_scanner) need to be installed with INSTALL extension_name; and loaded with LOAD extension_name;. Since DuckDB 0.8, many extensions auto-load when needed. Check installed extensions with SELECT * FROM duckdb_extensions();.
How do I change the output format in DuckDB CLI?
Use the .mode command to switch output formats. Table-based modes (good for few columns): duckbox (default), box, ascii, table, csv, list. Line-based modes (good for many columns): line, json, jsonlines. For example, .mode line displays each column on its own row—ideal for exploring wide datasets. You can also use CLI flags like duckdb --json for non-interactive use.
How do I use DuckDB in shell scripts and data pipelines?
DuckDB CLI works seamlessly in pipelines. Use the -s or -c flag to run a command and exit: duckdb -csv -s "SELECT * FROM data.csv WHERE region='Europe'" > output.csv. For Parquet output, use COPY: duckdb -s "COPY (SELECT * FROM data) TO 'output.parquet' (FORMAT PARQUET)". DuckDB reads from stdin and writes to stdout, making it composable with other Unix tools.
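A small sketch of that composability, piping SQL in on stdin and chaining the CSV output through another Unix tool:
```bash
# Read SQL from stdin, emit CSV on stdout, and hand it to the next tool in the pipe
echo "SELECT 'Austria' AS country, 8192880 AS population" | duckdb -csv | tee population.csv
```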
What is the .duckdbrc config file?
The .duckdbrc file (located at $HOME/.duckdbrc) runs automatically when DuckDB CLI starts. Use it to set default configurations like custom prompts, enable timers, or run initialization SQL. Example content:
```
.timer on
.mode duckbox
.prompt 'DuckDB> '
```
This saves you from repeatedly typing the same setup commands each session.
How do I export query results to Parquet format?
Use the COPY ... TO syntax with FORMAT PARQUET: COPY (SELECT * FROM my_table WHERE condition) TO 'output.parquet' (FORMAT PARQUET);. This works from the CLI or scripts. For CSV export, you can pipe output directly: duckdb -csv -s "SELECT * FROM data" > output.csv. Parquet files are compressed and columnar, making them ideal for analytical workloads.
Why does DuckDB say 'Table does not exist' when querying a URL?
This error occurs when DuckDB can't determine the file format from the URL. If your URL doesn't end in a recognized extension (like .csv or .parquet), DuckDB treats it as a table name. Solution: Use the explicit reader function: SELECT * FROM read_csv_auto('https://bit.ly/shortlink'); instead of SELECT * FROM 'https://bit.ly/shortlink';. This tells DuckDB to parse the URL as a CSV file regardless of its extension.
How does DuckDB CLI compare to sqlite3 and psql?
DuckDB CLI is unique among database CLIs: (1) No server required—unlike psql or mysql, DuckDB runs entirely in-process; (2) Direct file querying—query CSV, Parquet, and JSON files without importing them first; (3) Remote file support—query files directly from HTTP, S3, or GCS URLs; (4) Analytical focus—optimized for OLAP workloads with columnar storage and vectorized execution. It's closest to sqlite3 in simplicity but far more powerful for analytics.
What are common use cases for the DuckDB CLI?
The DuckDB CLI excels at: (1) Data exploration—quickly analyze CSV/Parquet files without setup; (2) ETL pipelines—transform data between formats in shell scripts; (3) File format conversion—convert CSV to Parquet with a single command; (4) Remote data analysis—query files on S3 or HTTP without downloading; (5) Ad-hoc reporting—generate quick reports to CSV or JSON. It's particularly valuable when you need SQL power without database infrastructure.


