What is a Foreign Key?

Back to DuckDB Data Engineering Glossary

Key Takeaways

  • Definition: A Foreign Key (FK) is a column that links two tables together by referencing the Primary Key of another table.
  • Purpose: It maintains Referential Integrity, ensuring relationships between data remain consistent.
  • DuckDB Context: DuckDB supports standard SQL foreign key constraints, which are useful for ensuring data quality in your warehouse, though they incur a small performance cost during write operations.
  • Usage: Essential for joining dimensions and facts in Star Schema data modeling.

Foreign Key Definition

A foreign key is a column (or collection of columns) in a table that links to the primary key in another table. It is a critical component of relational database systems, used to establish and enforce a link between data in two tables. This link ensures referential integrity, meaning the database prevents you from having data in one table that points to non-existent data in another.

Foreign Key Syntax in DuckDB

In DuckDB and other databases, you define a foreign key as part of your table creation using the FOREIGN KEY constraint. This tells the database to enforce the relationship between tables by preventing invalid references. Here's an example:

Copy code

CREATE TABLE orders ( id INTEGER PRIMARY KEY, total DECIMAL(10,2) ); CREATE TABLE order_items ( id INTEGER PRIMARY KEY, order_id INTEGER, product_name VARCHAR(100), quantity INTEGER, FOREIGN KEY (order_id) REFERENCES orders(id) );

In this example, order_id in the order_items table is a foreign key that references the id column in the orders table. DuckDB will prevent you from:

  • Inserting an order item with an order_id that doesn't exist in the orders table
  • Deleting an order from orders if there are still items referencing it in order_items

Primary Key vs. Foreign Key

Understanding the distinction between these two keys is fundamental to relational database design:

FeaturePrimary KeyForeign Key
UniquenessMust be unique for every row.Can accept duplicate values.
NullabilityCannot be NULL.Can be NULL (unless specified otherwise).
FunctionIdentifies the record itself.Identifies a relationship to another table.
CountOnly one per table.A table can have multiple foreign keys.

Composite Foreign Keys

You can also create foreign keys that reference multiple columns, known as composite foreign keys:

Copy code

CREATE TABLE locations ( country_code CHAR(2), region_code CHAR(2), city VARCHAR(100), PRIMARY KEY (country_code, region_code) ); CREATE TABLE stores ( id INTEGER PRIMARY KEY, country_code CHAR(2), region_code CHAR(2), address VARCHAR(200), FOREIGN KEY (country_code, region_code) REFERENCES locations(country_code, region_code) );

Foreign Keys in Data Warehousing

In transactional databases (OLTP), foreign keys are strictly enforced to prevent errors. in Data Warehousing (OLAP) scenarios like those used with MotherDuck, foreign keys play a dual role:

  1. Data Quality: They prevent 'orphaned' records in your fact tables.
  2. Join Optimization: Defining foreign keys helps the query optimizer understand the relationships between tables, potentially speeding up JOIN operations in complex analytical queries.

Performance Considerations

Foreign keys automatically create an index on the referencing columns in DuckDB, which can help with query performance when joining tables. However, they also add overhead during data modifications since the database must check referential integrity with each insert or update.

While DuckDB indexes foreign keys for faster joins, enabling constraints does check every INSERT or UPDATE. For massive bulk loads in a data warehouse context, engineers sometimes temporarily disable constraints to maximize load speed, re-enabling them afterwards to check integrity.

FAQS

What is the difference between a primary key and a foreign key?

A primary key uniquely identifies a record within its own table, while a foreign key is a field that links to the primary key of another table to establish a relationship.

Can a foreign key be null in DuckDB?

Yes, standard foreign keys can contain NULL values, meaning that the record has no relationship to the referenced table, unless the column is explicitly defined as NOT NULL.

Does DuckDB enforce foreign key constraints?

Yes, DuckDB enforces foreign key constraints by default, preventing actions that would destroy links between tables, though this adds some overhead during data loading.