query
Back to DuckDB Data Engineering Glossary
Definition
A query is a request made to a database to retrieve, modify, or analyze data. In data analytics, queries are most commonly written in SQL and allow you to ask specific questions of your data to get meaningful answers. Think of a query like asking a librarian to find all books published after 2020 that are about data science - you're defining specific criteria to get exactly the information you need.
Structure
A typical SQL query consists of several key components that work together:
- The operation you want to perform (like
SELECT
to retrieve data) - The data you want to work with (specified in the
FROM
clause) - Any conditions or filters (using
WHERE
) - How you want the results organized (with clauses like
GROUP BY
orORDER BY
)
Examples in DuckDB
Here's a simple query that retrieves all columns from a table:
Copy code
SELECT * FROM employees;
A more complex query that filters and aggregates data:
Copy code
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department
ORDER BY avg_salary DESC;
DuckDB supports both standard SQL queries and some extended functionality, like the ability to query directly from Parquet files:
Copy code
SELECT *
FROM 'data/employees.parquet'
WHERE salary > 50000;
Performance
DuckDB is specifically optimized for analytical queries, which typically involve scanning large amounts of data, performing aggregations, and joining tables. It uses vectorized query execution and parallel processing to make these operations particularly fast, especially compared to traditional row-oriented databases like SQLite.
Interactive Usage
Queries can be run through DuckDB's command-line interface, through programming languages like Python or R, or through various data tools that support DuckDB as a backend. The results can then be used for analysis, visualization, or further processing in your data pipeline.