Simon Späti

Technical Author & Data Engineer

Simon is a Data Engineer and Technical Author with 20+ years of experience in the data field. He's the author of the Data Engineering Blog (ssp.sh), curator of the Data Engineering Vault (vault.ssp.sh), and is currently writing a book about Data Engineering Design Patterns (dedp.online). Simon keeps up with open-source data engineering technologies and enjoys sharing his knowledge with the community.

31 BLOG POSTS AND VIDEOS

DuckDB Ecosystem Newsletter – February 2026

2026/02/11 - Simon Späti

MSSQL Extension, Vortex File Format & 2TB Memory Benchmarks

Building an Obsidian RAG with DuckDB and MotherDuck

2026/02/05 - Simon Späti

Build a local-first RAG for your Obsidian notes using DuckDB's vector search, then deploy it as a serverless web app with MotherDuck

This Month in the DuckDB Ecosystem: January 2026

2026/01/17 - Simon Späti

DuckDB news: Iceberg extension adds full DML (INSERT/UPDATE/DELETE). Process 1TB in 30 seconds. Query data via AI agents with MCP server. TypeScript macros for APIs.

This Month in the DuckDB Ecosystem: December 2025

2025/12/10 - Simon Späti

DuckDB news: v1.4 adds AES-256 encryption. DuckLake brings ACID-compliant lakehouse with time-travel queries. Gaggle extension queries Kaggle datasets directly via SQL.

Simplicity of a Database, but the Speed of a Cache: OLAP Caches for DuckDB

2025/12/03 - Simon Späti

Speed up slow dashboards without adding new infrastructure. Learn how DuckDB's caching extensions can drop query times from minutes to seconds.

Branch, Test, Deploy: A Git-Inspired Approach for Data

2025/11/24 - Simon Späti

This article explores how to bring Git-style workflows like branching, testing, and deploying to your data stack. Learn how concepts like zero-copy cloning and metadata pointers can finally give you isolated test environments.

DuckDB Ecosystem: November 2025

2025/11/12 - Simon Späti

DuckDB news: QuackStore caching cuts query time from 49s to 3s. Infera runs ONNX ML models in SQL. 127 community extensions analyzed. DuckLake architecture explained.

4 Senior Data Engineers Answer 10 Top Reddit Questions

2025/10/30 - Simon Späti

A great panel answers the most-voted and most-commented data questions on Reddit.

DuckDB Ecosystem: October 2025

2025/10/07 - Simon Späti

DuckDB news: v1.4.0 LTS brings AES-256 encryption, MERGE statements, and Iceberg writes. 100x faster than Spark on local Parquet. Official Docker images released.

DuckDB Ecosystem: September 2025

2025/09/09 - Simon Späti

DuckDB news: Spatial joins 58x faster via R-tree indexing. pg_duckdb 1.0 adds OLAP analytics to PostgreSQL. One team cut Snowflake costs 79% using DuckDB caching.

Why Semantic Layers Matter — and How to Build One with DuckDB

2025/08/19 - Simon Späti

Learn what a semantic layer is, why it matters, and how to build a simple one with DuckDB and Ibis using just YAML and Python

DuckDB Ecosystem: August 2025

2025/08/07 - Simon Späti

DuckDB news: 50.7% YoY developer growth. DuckLake v0.2 adds credential secrets. BigQuery extension hits 21.7k weekly downloads. Vector search enables RAG applications.

Summer Data Engineering Roadmap

2025/07/21 - Simon Späti

A comprehensive 3-week structured roadmap for learning data engineering fundamentals, from SQL and Git basics to advanced topics like streaming, data quality, and DevOps.

This Month in the DuckDB Ecosystem: July 2025

2025/07/08 - Simon Späti

DuckDB news: Tributary streams Kafka data to SQL queries. SQLRooms enables browser analytics via DuckDB-WASM. pg_duckdb benchmarks show 1,500x TPC-DS speedup.

The Data Engineer Toolkit: Infrastructure, DevOps, and Beyond

2025/07/03 - Simon Späti

Data engineers increasingly own DevOps. This guide breaks down the best DevOps and CI/CD tools, Kubernetes practices, data engineering platforms, workflow orchestration, and infrastructure strategies used in modern data stacks.

DuckDB Ecosystem: June 2025

2025/06/06 - Simon Späti

DuckDB news: DuckLake combines catalog and table format with ACID metadata in SQL. Radio extension adds WebSocket and Redis Pub/Sub. Top CSV benchmark results.

The Open Lakehouse Stack: DuckDB and the Rise of Table Formats

2025/05/23 - Simon Späti

Learn how DuckDB and open table formats like Iceberg power a fast, composable analytics stack on affordable cloud storage

DuckDB Ecosystem: May 2025

2025/05/08 - Simon Späti

DuckDB news: Metabase driver queries Parquet files directly. FlockMTL integrates LLMs into SQL workflows. Doom clone runs in DuckDB-WASM. Spatial wins top honors.

The Data Engineer's Guide to Efficient Log Parsing with DuckDB/MotherDuck

2025/04/18 - Simon Späti

How to Query JSON and Log Files with SQL Using DuckDB and MotherDuck

DuckDB Ecosystem: April 2025

2025/04/05 - Simon Späti

DuckDB news: Streaming support with new remote file caching. Community extensions expand real-time analytics. Event-driven processing patterns for data pipelines.

Vector Technologies for AI: Extending Your Existing Data Stack

2025/03/28 - Simon Späti

Understand when to use a vector database and how it differs from vector search engines.

DuckDB Ecosystem: March 2025

2025/03/07 - Simon Späti

DuckDB news: v1.2 adds Google Sheets extension for SQL on spreadsheets. Duckberg queries Iceberg tables via Python. Smallpond sorts 110TB in 30 min using Ray.

A Beginner’s Guide to Geospatial with DuckDB Spatial and MotherDuck

2025/02/26 - Simon Späti

Unlock the power of geospatial analysis with DuckDB Spatial and MotherDuck, making location-based data processing faster, simpler, and more accessible for data engineers.

DuckDB Ecosystem: February 2025

2025/02/09 - Simon Späti

DuckDB news: DuckCon #6 runs TPC-H SF300 on Raspberry Pi. SQL/PGQ graph queries 10-100x faster than Neo4j. Arrow Flight enables concurrent read/write access.

The Data Engineering Toolkit: Essential Tools for Your Machine

2025/01/22 - Simon Späti

Master the essential data engineering toolkit—Linux commands, Docker, Python, SQL, and developer tools. A practical guide to the tools every DE needs.

DuckDB Ecosystem: January 2025

2025/01/10 - Simon Späti

DuckDB news: PyIceberg enables local Iceberg catalogs in Python. Zero-egress data sharing via Cloudflare R2. SQLFlow streams Kafka and Bluesky data with DuckDB SQL.

This Month in the DuckDB Ecosystem: November 2024

2024/11/04 - Simon Späti

DuckDB news: HTTP extension queries REST APIs in SQL. Unity Catalog integration via dbt. Pivot tables extension. Drug database demo processes 6M records per minute.

The Enterprise Case for DuckDB: 5 Key Categories and Why to Use it

2024/10/16 - Simon Späti

Let's take a closer look at the various enterprise use cases of DuckDB and how they can help on your data and analytics journey.

This Month in the DuckDB Ecosystem: October 2024

2024/10/04 - Simon Späti

DuckDB news: v1.1 hits 6M monthly Python downloads. Spark API compatibility layer added. Build RAG apps with GPT-4o embeddings. Extensions reach 17M monthly downloads.
