SQL

Back to DuckDB Data Engineering Glossary

Overview

SQL (Structured Query Language) is the standard language for working with relational databases. Originally developed at IBM in the 1970s, SQL allows you to store, retrieve, update, and analyze data using declarative statements - meaning you describe what you want, not how to get it. SQL has become the foundation of most modern data work, from basic data analysis to complex data pipelines.

Key Concepts

SQL operates primarily through statements that begin with commands like SELECT, INSERT, UPDATE, and DELETE. The most commonly used is SELECT for querying data. SQL also includes commands for defining and modifying database structures using CREATE, ALTER, and DROP.

DuckDB Examples

DuckDB implements most standard SQL syntax while adding some helpful extensions. Here's a basic query:

Copy code

SELECT city, COUNT(*) as num_customers FROM customers GROUP BY city ORDER BY num_customers DESC;

DuckDB extends standard SQL with convenient shortcuts like SELECT * replacements:

Copy code

-- Standard SQL requires listing all columns except full_name SELECT id, email, phone, address FROM customers; -- DuckDB allows excluding specific columns SELECT * EXCLUDE (full_name) FROM customers;

DuckDB also adds powerful extensions for working with semi-structured data:

Copy code

-- Query nested JSON data SELECT json_extract_path(data, 'user', 'address', 'city') as city FROM events; -- Work with arrays SELECT unnest(string_split(tags, ',')) as individual_tags FROM posts;

Common Use Cases

Data analysts use SQL to explore datasets, create reports, and build dashboards. Data engineers use SQL to transform data, build data pipelines, and optimize database performance. SQL's ubiquity means it integrates well with visualization tools like Tableau, analysis notebooks like Jupyter, and modern data warehouses like Snowflake.

Best Practices

Writing clear, maintainable SQL involves consistent formatting, clear table aliases, appropriate comments, and breaking complex queries into manageable CTEs (Common Table Expressions). While SQL is fairly forgiving in terms of formatting, following consistent patterns makes queries easier to understand and modify.