Cultivating Growth: How Gardyn Scaled its Data Operations with MotherDuck

2025/05/28

From MySQL to Modern Analytics: A Data Platform Transformation Story

Gardyn, the innovative indoor hydroponic gardening company, faced a critical challenge as it scaled its operations. With smart devices generating vast amounts of sensor data, computer vision outputs, and customer interaction metrics, its data infrastructure needed a complete overhaul to keep pace with business growth.

The Starting Point: Executives Running Production Queries

When Rob joined Gardyn three years ago as their first full-time data scientist, the data landscape was minimal. Executives were running queries directly against the production MySQL database, creating risks for system performance and customer experience. The immediate priority was establishing basic analytics capabilities while separating analytical workloads from production systems.

Building the Initial Infrastructure

Rob's first steps involved creating a MySQL replica for analytics and deploying a Kubernetes cluster to run Apache Airflow for orchestration. This homegrown solution included:

  • Custom Python scripts for data ingestion
  • Raw SQL transformations managed through Airflow
  • Jupyter notebooks for ad hoc analysis
  • Plotly Dash applications for dashboards

While this approach initially worked, it quickly became unsustainable. As the data volume grew and transformation complexity increased, the daily pipeline runtime ballooned from one hour to over 24 hours, creating significant operational challenges.

The Modern Data Stack Migration

Facing these scaling limitations, Gardyn embarked on a comprehensive platform modernization. The new architecture centered around several key components:

Data Warehouse: MotherDuck

The migration from MySQL to MotherDuck delivered immediate performance improvements. Pipeline runtime dropped from 24+ hours to just 10 minutes, enabling the team to build date-spined models and perform complex time series analysis that was previously impossible.
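A date-spined model pads sparse time-series data with one row per calendar day, so gaps (days with no readings from a device) show up explicitly instead of silently disappearing. The idea can be sketched in a few lines of stdlib Python; the column values and dates below are hypothetical, not Gardyn's actual data:

```python
from datetime import date, timedelta

def date_spine(start: date, end: date) -> list[date]:
    """One row per calendar day, inclusive of both endpoints."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Hypothetical sparse sensor readings keyed by day (two days have no data).
readings = {
    date(2024, 3, 1): 6.1,
    date(2024, 3, 4): 5.8,
}

# Left-join the readings onto the spine: days without data become None,
# which makes gaps visible and time-series math (rolling averages,
# gap detection) straightforward.
spine = date_spine(date(2024, 3, 1), date(2024, 3, 4))
joined = [(d, readings.get(d)) for d in spine]
```

In a warehouse like MotherDuck the same pattern is typically expressed in SQL by generating the spine with a range function and left-joining the fact table onto it.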

Transformation Layer: dbt

Moving from raw SQL to dbt eliminated manual dependency management: dbt infers the execution order of models from their `ref()` calls, which made the transformation logic more maintainable and scalable.
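The shift from hand-ordered SQL scripts to inferred dependencies is essentially a topological sort over the model graph. A small stdlib sketch of the idea, with hypothetical model names loosely echoing the article (this is an illustration of the ordering problem, not dbt's implementation):

```python
from graphlib import TopologicalSorter

# Each model lists the models it depends on, the way dbt infers
# dependencies from `{{ ref('...') }}` calls inside each model's SQL.
deps = {
    "stg_sensor_readings": [],
    "stg_devices": [],
    "daily_device_health": ["stg_sensor_readings", "stg_devices"],
    "care_score": ["daily_device_health"],
}

# A topological order guarantees every model runs after the models
# it references -- the ordering Airflow tasks had to encode by hand.
run_order = list(TopologicalSorter(deps).static_order())
```

With raw SQL in Airflow, every new model meant editing task dependencies by hand; with dependencies derived from the models themselves, adding a model is just adding a file.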

Orchestration: Dagster

The switch from Airflow to Dagster was driven by Dagster's first-class dbt integration. Its asset-based approach simplified dependency management and provided better visibility into the data pipeline.

Analytics Tools: Hex and Hashboard

These platforms replaced the self-hosted Jupyter notebooks and Dash applications, providing:

  • Hex for Python-based analysis and machine learning workflows
  • Hashboard for self-service BI with a semantic layer that enables non-technical users to explore data safely

Unlocking Business Value Through Integrated Data

The new platform enabled Gardyn to finally integrate data from multiple sources:

  • Device sensor readings
  • Computer vision model outputs detecting plant health
  • Customer app interactions
  • Website engagement metrics

This unified view of the customer journey has powered new features, including a "care score" that gives customers insights into how well they're maintaining their devices. The computer vision system can now detect issues with plants and automatically notify users, as well as celebrate positive milestones like new flowers or ripe vegetables.

Key Lessons for Scaling Data Operations

Rob's advice for other data professionals facing similar challenges emphasizes thinking long-term even when pressured for quick answers. Building generalizable models and scalable infrastructure from the start, even if it takes extra time initially, pays dividends as the business grows.

The transformation also freed Rob to focus on actual data science work rather than infrastructure maintenance. With the Kubernetes cluster retired and managed services handling the heavy lifting, the team can now concentrate on delivering insights that improve the customer experience and drive business growth.

Looking Forward

Gardyn's data team is now focused on modeling customer journeys in greater detail and expanding their computer vision capabilities. The solid foundation built through this migration enables them to tackle increasingly sophisticated analytical challenges while maintaining the performance and reliability their growing business demands.

The journey from production database queries to a modern data platform illustrates how thoughtful architecture choices and the right tool selection can transform a company's ability to leverage its data assets effectively.

CONTENT
  1. From MySQL to Modern Analytics: A Data Platform Transformation Story
  2. The Starting Point: Executives Running Production Queries
  3. Building the Initial Infrastructure
  4. The Modern Data Stack Migration
  5. Unlocking Business Value Through Integrated Data
  6. Key Lessons for Scaling Data Operations
  7. Looking Forward
