Sophomore slump? Never heard of it! For the second year in a row, data practitioners from around the globe joined us for Small Data SF, the hands-on conference for builders creating faster, simpler, more cost-effective systems.
With incredible sessions, dynamite food, and a mighty small data community, there’s so much to unpack from both days. In the spirit of efficiency, let’s give it a shot:
Day one: workshops, workshops, workshops!
Packed rooms, quiet hallways, the faint sounds of keyboards clacking away… That was the scene for day one of Small Data SF, where we welcomed our intrepid presenters for eight hands-on, technical workshops.
Picking a favorite would be like picking a favorite child, but here are a couple of highlights straight from the workshop floor:
Serverless lakehouse from scratch with DuckLake
Ever felt like the complexity of “Big Data” lakehouse tools was just too much? This session, run by Jacob Matson of MotherDuck, featured a step-by-step walkthrough of building a serverless lakehouse on DuckLake, the simplest lakehouse format. Attendees dug into the architecture of DuckLake, got hands-on experience querying DuckLake tables with SQL, and deployed their lakehouse on MotherDuck for a truly serverless experience. Ducks and data lakes, what a combo!
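For a flavor of what the walkthrough covered, here's a sketch based on DuckLake's public SQL interface, not the workshop's exact materials; the file names and paths below are placeholders:

```sql
-- Load the DuckLake extension and attach a lakehouse catalog
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'data/');

-- DuckLake tables then behave like any other DuckDB table
CREATE TABLE my_lake.trips AS
    SELECT * FROM read_parquet('trips.parquet');
SELECT count(*) FROM my_lake.trips;
```

The punchline of the session: that's roughly the whole setup, no cluster required.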

Agents, meet open-source
After lunch, Zain Hasan of Together.ai jumped straight into a hands-on session for the data science-inclined. The workshop showed attendees how to build an AI data science agent from scratch, using open-source models and modern AI tools. Participants got a crash course on agent architectures, implemented the ReAct framework for agent building, and learned how to safely execute code using Together’s Code Interpreter API.
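The core ReAct loop is simple enough to sketch in a few lines. Here's a minimal, framework-free illustration where the "model" is a scripted stub and there's a single calculator tool; the actual workshop used open-source models served via Together, so every name below is a stand-in:

```python
# Minimal ReAct-style loop: the agent alternates Thought -> Action -> Observation
# until it emits a final Answer. A real agent would call an LLM at each step;
# here fake_model scripts its replies so the loop runs standalone.

def fake_model(transcript: str) -> str:
    """Stand-in for an LLM: picks the next step from the transcript so far."""
    if "Observation:" not in transcript:
        return "Thought: I should compute 6 * 7.\nAction: calculate[6 * 7]"
    return "Thought: I have the result.\nAnswer: 42"

def calculate(expression: str) -> str:
    """A single 'tool' the agent can call. eval is unsafe outside toy demos;
    hence the workshop's emphasis on sandboxed execution via an interpreter API."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculate": calculate}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_model(transcript)
        transcript += "\n" + reply
        if "Answer:" in reply:
            return reply.split("Answer:", 1)[1].strip()
        if "Action:" in reply:
            # Parse "Action: tool[argument]" and run the matching tool
            call = reply.split("Action:", 1)[1].strip()
            name, arg = call.split("[", 1)
            observation = TOOLS[name](arg.rstrip("]"))
            transcript += f"\nObservation: {observation}"
    return "No answer within step budget."

print(react("What is 6 * 7?"))  # -> 42
```

Swap `fake_model` for a chat-completion call and grow the `TOOLS` dict, and you have the skeleton the workshop built on.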

Day two: the Small Data movement evolves
As the kids say, Wednesday morning “hit different”. Following Tuesday's deep workshops, data practitioners packed into the main hall ready for something bigger. Or should we say, smaller?
The future of data engineering
Joe Reis kicked us off with The Great Data Engineering Reset, talking about the shift from pipelines to agents and beyond. With agents showing up everywhere, what happens to the data engineering discipline, practices, and teams?
On the way out, attendees told us they felt both the pressure and the excitement of a changing industry, along with a hearty “plus one” to Joe’s message about renewing focus on the fundamentals of data engineering as the world changes rapidly around us.

Small data, revisited
Then Jordan Tigani, whose pen spawned the small data movement in the first place, took a renewed look at the concept with Small Data: The Embiggening. Is it really small data we’ve been talking about, or something different?
Jordan laid out his argument for the crowd: data system design really has two dimensions, the compute a workload requires and the total data an organization holds. Imagine you have a petabyte-scale lakehouse, but 99% of your queries scan a small fraction of your data. You’d be far better served by a system designed for that reality, one with the flexibility to stretch to the last 1% of truly large queries, than by a distributed system built for edge cases from the beginning. Midway through the talk, the whole room chanted "I've got small data" together, and it felt good.

The times, they are a-changing
After lunch, we heard talks from practitioners of all backgrounds, with data of all shapes and sizes. Apache Spark committer and PMC member Holden Karau talked us through When Not to Use Spark, putting inquiring minds at ease that no, you don’t need a Spark cluster if you can load your data into an Excel workbook. An expert perspective if we’ve ever heard one!

Sahil Gupta, senior data engineer at DoSomething.org, shared his story about rebuilding the nonprofit’s digital platform with a focus on efficient, practical design choices that reflected his team’s reality, not the latest vendor hype.

Shelby Heinecke, an AI research leader at Salesforce, shared a peek behind the AI curtain and how the small data ethos shows up in frontier AI research. We’ve all heard about large language models, but doesn’t that imply the existence of small(er) language models?
Yes! Yes it does, and Shelby’s team is building them. With a focus on high-quality, task-specific data, models with names like “TinyGiant” punch far above their weight.

We closed out with the second panel of the day, titled Is the Future Small? Benn Stancil, Joe Reis (deeplearning.ai), Shelby Heinecke (Salesforce), and George Fraser (Fivetran) met on stage to riff on the future of our industry, and how the tools that got us here may not get us where we’re going (agents, anyone?).

Small data, good vibes
From the event space to the coffee bar to the swag shop, Small Data vibes were off the charts. The whole community showed up with warm, curious energy, and it paid off in the post-event surveys. One attendee offered: “Incredible care every step of the way. Check-in flawless, calendar invites were helpful, food delicious, swag on point, vendors were limited and great. Fav conf of the year.” You love to hear it.

And can we get a shout-out for the demo booths? Sometimes you get to these things, and the expo hall feels like a labyrinth of salespeople and cheap swag. Unsurprisingly, Small Data was different. No tower-scale assemblies, just right-sized booths with good people and helpful demos. There’s a data metaphor in there somewhere.

Thank you!
From all of us here at MotherDuck, a very heartfelt “thank you” to everyone who took the time to join us for this year’s event. It’s truly the community that makes the difference, and it was wonderful to put together an experience for community members to meet, learn, and challenge data orthodoxy together.
Most conferences leave you exhausted. This one? Full of energy. Thank you to our wonderful speakers, sponsors, event partners, and everyone else who made year two of Small Data SF a reality. Until next time!
