JSON

Back to DuckDB Data Engineering Glossary

JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format that's easy for humans to read and write, and simple for machines to parse and generate. It consists of two primary structures: objects (enclosed in curly braces) containing key-value pairs, and arrays (enclosed in square brackets) of values. JSON supports basic data types like strings, numbers, booleans, and null, making it versatile for representing structured data. While originally derived from JavaScript, JSON is language-independent and widely used for data exchange in web applications, APIs, and configuration files. In the context of data engineering, JSON is often used for storing semi-structured data in databases or as an intermediate format in data pipelines.

DuckDB provides built-in support for working with JSON data. Here's an example of querying JSON data in DuckDB:

Copy code

-- Create a table with a JSON column CREATE TABLE users (id INTEGER, data JSON); -- Insert some JSON data INSERT INTO users VALUES (1, '{"name": "Alice", "age": 30, "hobbies": ["reading", "hiking"]}'), (2, '{"name": "Bob", "age": 25, "hobbies": ["gaming", "cooking"]}'); -- Query JSON data SELECT id, data->>'name' AS name, data->>'age' AS age FROM users WHERE CAST(data->>'age' AS INTEGER) > 25;

This example demonstrates creating a table with a JSON column, inserting JSON data, and then querying specific fields from the JSON structure.