SQL
Back to DuckDB Data Engineering Glossary
Overview
SQL (Structured Query Language) is the standard language for working with relational databases. Originally developed at IBM in the 1970s, SQL allows you to store, retrieve, update, and analyze data using declarative statements - meaning you describe what you want, not how to get it. SQL has become the foundation of most modern data work, from basic data analysis to complex data pipelines.
Key Concepts
SQL operates primarily through statements that begin with commands like SELECT
, INSERT
, UPDATE
, and DELETE
. The most commonly used is SELECT
for querying data. SQL also includes commands for defining and modifying database structures using CREATE
, ALTER
, and DROP
.
DuckDB Examples
DuckDB implements most standard SQL syntax while adding some helpful extensions. Here's a basic query:
Copy code
SELECT city, COUNT(*) as num_customers
FROM customers
GROUP BY city
ORDER BY num_customers DESC;
DuckDB extends standard SQL with convenient shortcuts like SELECT *
replacements:
Copy code
-- Standard SQL requires listing all columns except full_name
SELECT id, email, phone, address
FROM customers;
-- DuckDB allows excluding specific columns
SELECT * EXCLUDE (full_name)
FROM customers;
DuckDB also adds powerful extensions for working with semi-structured data:
Copy code
-- Query nested JSON data
SELECT json_extract_path(data, 'user', 'address', 'city') as city
FROM events;
-- Work with arrays
SELECT unnest(string_split(tags, ',')) as individual_tags
FROM posts;
Common Use Cases
Data analysts use SQL to explore datasets, create reports, and build dashboards. Data engineers use SQL to transform data, build data pipelines, and optimize database performance. SQL's ubiquity means it integrates well with visualization tools like Tableau, analysis notebooks like Jupyter, and modern data warehouses like Snowflake.
Best Practices
Writing clear, maintainable SQL involves consistent formatting, clear table aliases, appropriate comments, and breaking complex queries into manageable CTEs (Common Table Expressions). While SQL is fairly forgiving in terms of formatting, following consistent patterns makes queries easier to understand and modify.