Data Engineering Is AI Engineering

2026/05/26

TL;DR: Jacob Matson refactored a real NBA box scores app using Claude Code and agent teams. The process revealed something he didn't expect: data engineering patterns like DAGs, wave-based planning, parallel workers, and constant testing are the same patterns that make AI agent workflows actually work.

From vibe code to production pipeline

About eighteen months ago, Jacob built an NBA box scores app. It worked, but the code was a mess: mixed Python and TypeScript, no incremental loading, raw SQL in the frontend that was open to injection, hard-coded values everywhere, and no tests. He used AI agents to refactor the whole thing into a clean, all-TypeScript project, and the process taught him something unexpected about how data engineering and AI engineering overlap.

How the workflow changed over time

Jacob didn't start with a sophisticated setup. He began in Cursor, editing one file at a time. Then he moved to Claude Code, which was the first time he stepped out of the code review loop entirely. One window became three parallel windows, coordinated through GitHub issues. Eventually he had agent teams running in isolated Git worktrees, each picking up an issue and reporting back. The thing that stuck with him: dispatching waves of parallel work to agents looks exactly like scheduling tasks in a data pipeline DAG.
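To make the DAG comparison concrete, here is a minimal sketch of wave planning: given tasks and their dependencies, group them into waves so that everything in one wave can run in parallel. The `Task` shape, the `planWaves` name, and the example issue ids are hypothetical illustrations, not taken from Jacob's project.

```typescript
// Hypothetical task shape: an id (e.g. a GitHub issue) plus the ids it depends on.
interface Task {
  id: string;
  dependsOn: string[];
}

// Group tasks into waves. Every task in a wave depends only on tasks from
// earlier waves, so all tasks within one wave can be dispatched in parallel.
function planWaves(tasks: Task[]): string[][] {
  const ids = tasks.map((t) => t.id);
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (ids.indexOf(dep) === -1) {
        throw new Error(`Unknown dependency "${dep}" on task "${t.id}"`);
      }
    }
  }

  const done: string[] = [];
  const waves: string[][] = [];
  let pending = tasks.slice();

  while (pending.length > 0) {
    // A task is ready once all of its dependencies are in an earlier wave.
    const ready = pending.filter((t) =>
      t.dependsOn.every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle detected");
    }
    const wave = ready.map((t) => t.id);
    waves.push(wave);
    done.push(...wave);
    pending = pending.filter((t) => wave.indexOf(t.id) === -1);
  }
  return waves;
}

// Example with made-up issue names.
const exampleWaves = planWaves([
  { id: "migrate-to-typescript", dependsOn: [] },
  { id: "rate-limiter", dependsOn: ["migrate-to-typescript"] },
  { id: "quality-detectors", dependsOn: ["migrate-to-typescript"] },
  { id: "frontend-queries", dependsOn: ["rate-limiter", "quality-detectors"] },
]);
console.log(exampleWaves);
```

In this example the migration forms the first wave, the rate limiter and detectors form the second, and the frontend work waits for both. This is the same layering a pipeline scheduler derives from a DAG.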

Architecture: MotherDuck as the full stack

The app pulls data from NBA APIs (Play-by-Play Stats for historical data, NBA CDN for live scores), loads raw JSON into MotherDuck, transforms it with SQL, and serves it to a Next.js frontend using MotherDuck's DuckDB Wasm client. No separate backend, no ORM. Just SQL and a browser-side database connected to the cloud. Jacob also built an adaptive rate limiter to handle the NBA API's unpublished rate limits, plus data quality detectors that caught real edge cases like zero-minute player records.
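A data quality detector of the kind mentioned above can be a small, pure function over loaded rows. The sketch below flags zero-minute player records. The row fields, the `detectZeroMinuteRecords` name, and the two reason strings are assumptions made for illustration, since the source does not describe the actual schema or rules.

```typescript
// Hypothetical shape of one player's line in a box score.
interface PlayerBoxScoreRow {
  gameId: string;
  playerId: string;
  minutes: number;
  points: number;
}

interface QualityFinding {
  gameId: string;
  playerId: string;
  reason: string;
}

// Flag records where a player is listed with zero (or negative) minutes.
// Rows that also carry points are called out separately, since those are
// more likely to be a data error than a player who simply did not play.
function detectZeroMinuteRecords(rows: PlayerBoxScoreRow[]): QualityFinding[] {
  return rows
    .filter((row) => row.minutes <= 0)
    .map((row) => ({
      gameId: row.gameId,
      playerId: row.playerId,
      reason:
        row.points > 0
          ? "zero minutes but points recorded"
          : "zero minutes played",
    }));
}

// Example with made-up rows.
const sampleFindings = detectZeroMinuteRecords([
  { gameId: "g1", playerId: "p1", minutes: 32, points: 20 },
  { gameId: "g1", playerId: "p2", minutes: 0, points: 0 },
  { gameId: "g1", playerId: "p3", minutes: 0, points: 4 },
]);
console.log(sampleFindings);
```

Keeping the detector free of I/O means it can run as a test assertion after every load, which fits the "validate at every step" habit the talk describes.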

Build your own workflow

Jacob's strongest advice: don't copy someone else's AI workflow. He compared it to driving an F1 car — the seat, pedals, and steering wheel need to be custom-fitted because you're moving so much faster than before. Adopting someone else's prompt templates or agent frameworks didn't work for him. What did work was understanding his own thinking patterns, building tooling around them, and then generalizing back to a team workflow.

What carried over from data engineering

Testing constantly. Planning work in dependency-aware waves. Building simple architectures you can reason about. Keeping pipelines as dumb as possible. The tooling changed, but the core skills transferred directly: knowing what to build, breaking problems into a graph, and validating at every step. That's data engineering. It's also how you orchestrate AI agents.

FAQs

Jacob refactored an NBA box scores app using Claude Code, and his process changed quite a bit along the way. He started with single-window Cursor sessions and ended up running parallel agents in isolated Git worktrees, each one picking up a GitHub issue and reporting back when it finished.

What struck him was that coordinating AI agents looks a lot like scheduling tasks in a data pipeline DAG. You have parallel workers, you need to respect dependencies between tasks, and you're running tests constantly to make sure nothing broke.

The app uses MotherDuck as both the data warehouse and serving layer. Raw JSON from NBA APIs gets loaded into MotherDuck, transformed with SQL, and served to a Next.js frontend through the MotherDuck DuckDB Wasm client. There's no separate API server or backend. Jacob also built an adaptive rate limiter to handle unpublished API rate limits, and data quality detectors that caught real edge cases in the NBA data.
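When an API's rate limits are unpublished, one common approach is to adapt the delay between requests based on the responses received. The sketch below backs off multiplicatively on HTTP 429 or 5xx responses and recovers gradually on success. This strategy, the class name, and every option are assumptions for illustration; the source does not describe how Jacob's limiter works.

```typescript
// Hypothetical tuning knobs for the limiter.
interface LimiterOptions {
  minDelayMs: number;     // fastest pace we are willing to run at
  maxDelayMs: number;     // slowest pace, caps the backoff
  backoffFactor: number;  // > 1, applied when the server pushes back
  recoveryFactor: number; // between 0 and 1, applied after a success
}

class AdaptiveRateLimiter {
  private readonly opts: LimiterOptions;
  private delayMs: number;

  constructor(opts: LimiterOptions) {
    this.opts = opts;
    this.delayMs = opts.minDelayMs;
  }

  // Current pause between requests, in milliseconds.
  currentDelay(): number {
    return this.delayMs;
  }

  // Feed back the HTTP status of the latest response.
  record(status: number): void {
    if (status === 429 || status >= 500) {
      // Server is pushing back: slow down, up to the cap.
      this.delayMs = Math.min(
        this.opts.maxDelayMs,
        this.delayMs * this.opts.backoffFactor
      );
    } else {
      // Request went through: speed up again, down to the floor.
      this.delayMs = Math.max(
        this.opts.minDelayMs,
        this.delayMs * this.opts.recoveryFactor
      );
    }
  }

  // Pause for the current delay before sending the next request.
  wait(): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, this.delayMs));
  }
}
```

In a fetch loop, the caller would `await limiter.wait()` before each request and call `limiter.record(response.status)` after it, so the pace settles near whatever the API tolerates.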

Jacob compared building an AI workflow to driving an F1 car: the seat, pedals, and steering wheel are custom-fitted to your body because you're moving so fast. He tried copying other people's prompt templates and agent frameworks, and none of it stuck. What actually helped was paying attention to how he thinks and building tooling around that. Teams still need shared interfaces, but your individual workflow should fit your brain, not someone else's.

Jacob said the split between planning and execution is heading toward 80% planning, 20% execution, though he sits closer to 50/50 because experimentation has gotten so cheap. He organized work into waves, groups of tasks running in parallel, with checkpoints between them to make sure tests still passed. Each agent team worked in its own isolated Git worktree, so merge conflicts were the AI's problem, not the humans'. The real trick was writing tests constantly so issues showed up during merges, not after.
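The wave-and-checkpoint loop can be sketched in a few lines. Each wave's tasks run in parallel, and a checkpoint (for example, running the test suite) gates the next wave. The `runWaves` function and the `Worker` and `Checkpoint` types are hypothetical names. How agents are actually dispatched and how tests are run is left to the caller.

```typescript
// Hypothetical callbacks: how a task gets done, and how a checkpoint is verified.
type Worker = (taskId: string) => Promise<void>;
type Checkpoint = () => Promise<boolean>;

// Run waves in order. Tasks inside a wave run in parallel; after each wave
// the checkpoint must pass before the next wave is dispatched.
async function runWaves(
  waves: string[][],
  worker: Worker,
  checkpoint: Checkpoint
): Promise<string[]> {
  const completed: string[] = [];
  let waveNumber = 0;
  for (const wave of waves) {
    waveNumber += 1;
    await Promise.all(wave.map((taskId) => worker(taskId)));
    completed.push(...wave);
    const passed = await checkpoint();
    if (!passed) {
      throw new Error(
        `Checkpoint failed after wave ${waveNumber}; stopping before the next wave`
      );
    }
  }
  return completed;
}
```

Stopping at the first failed checkpoint keeps a broken merge from becoming the base for the next wave, which is the point of testing between waves instead of only at the end.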

Data engineering skills transfer directly to AI agent work. The core patterns are the same whether you're building a data pipeline or orchestrating a team of AI agents: breaking problems into dependency graphs, scheduling parallel workers, building testable assertions, keeping architectures simple enough to reason about. Jacob thinks demand for data engineering is effectively infinite, since the more data you use, the more data you need. AI can automate the scaffolding work — handling messy APIs, building reliability into connectors — but knowing what to build and how to architect it is still a human skill.
