DuckDB Ecosystem Newsletter: April 2023

2023/04/17 - 6 min read

Hey, friend 👋

It’s Marcos again, your “DuckDB News Reporter”, with another issue of “This Month in the DuckDB Ecosystem” for April 2023. In this issue, we have a lot of great stuff to share with you, especially Jordan Tigani’s conversation with The Register, Mark Litwintschik’s experiments with the DuckDB Spatial extension, and much more. Every single day, we see more and more people using DuckDB in production environments with a very diverse set of use cases. So: It’s time to embrace the 🦆.

Remember: if you have any feedback for the newsletter, feel free to send us an email at duckdbnews@motherduck.com.

-Marcos

Josh Wills

If you have been in the Data Analytics space for a while, you know very well who Josh Wills is. Perhaps you have read his famous quote: “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician”.

Or perhaps you have read the book he co-authored, “Advanced Analytics with Spark”. Or even better: you have used the dbt extension for DuckDB that he created in production. You can find him on Twitter as @josh_wills.

Learn more about Josh here

How We Silently Switched Mode’s In-Memory Data Engine to DuckDB To Boost Visual Data Exploration Speed

This very interesting post from the Mode team explains why they selected DuckDB as their in-memory data engine: speed, one of DuckDB's core features.

DuckDB's Spatial Extension

In this post, Mark Litwintschik walks through some example GIS workflows with the DuckDB Spatial extension. Highly recommended reading!!!

How fast does a compressed file in Part 2

Steven P. Sanderson II, MPH is back with the second part of his series about compressed files, this time using the combination of DuckDB and Apache Arrow.

DuckDB makes SQL a first-class citizen on DataCamp Workspace

In this blog post, Filip Schouwenaars lists all the recent improvements that, thanks to DuckDB, make it seamless and efficient to query data with SQL without leaving the tool.

Use dbt and DuckDB instead of Spark in data pipelines

Niels Claeys makes a bold proposal here: ditch Spark for the combination of dbt and DuckDB. This is a perfect time to explore this approach.

DuckDB Document Loader by Trent Hauck

In this tweet, the LangChain team showcased Trent Hauck's awesome work on the DuckDB Document Loader, with an example of how to use it. If you want to play with it, you can find the docs here.

Ex-BigQuery exec and Motherduck CEO: For some users, the answer is to think small

A very insightful interview with Jordan Tigani, CEO of MotherDuck, where he shared things like:

“DuckDB has been able to kind of strip all that away by being an in-process database, and that means that you basically can marshal data in and out of your application, or your data frames, with the minimum of data movements”.

It’s time to think small first.

Using DuckDB with Your Dremio Data Lakehouse

In this article, Alex Merced from Dremio discusses how you can use technologies like Dremio and DuckDB to create a low-cost, high-performance data lakehouse environment accessible to all your users.

Fixing iMessage search with DuckDB

Apple, perhaps you should listen to Daniel Palma on this. DuckDB could be perfect for this use case. A fix for iMessage search on iOS is one of the most requested features out there, and with DuckDB Apple could actually deliver it easily.

The message is given, Tim.

Upcoming events

Webinar: Doing Analysis in a Post Big Data Era: How industry leaders are driving high-impact decisions with smaller data

April 19, 2023, 10:00 AM PDT

Join us for a conversational webinar between Jordan Tigani, Founder and CEO at MotherDuck, and Benn Stancil, co-founder and CTO at Mode, two industry leaders who’ve called the end of big data (Benn’s take; Jordan’s take).

In this discussion, they'll talk about how the hyped “We have tons of data, and we’re going to change the world with it” narrative of the 2010s looks from today’s vantage point — and how leading companies are navigating a higher impact, faster moving data-informed decision-making process using smaller data.

Webinar: Big Data: Funeral or Renaissance?

April 20, 2023, 12:00 PM

Jordan Tigani, CEO + Founder of MotherDuck and one of the founding engineers on Google BigQuery, recently wrote a blog post called "Big Data is Dead" which took the internet by storm.

Aditya Parameswaran, Co-Founder of Ponder and Associate Professor at UC Berkeley, wrote a rebuttal called "Big Data Is Dead… Long Live Big Data."

This interactive broadcast will be a fun and lively debate answering the question of whether we should host a funeral for big data or if big data is having a renaissance.

The debate will be moderated by Aaron Elmore, Associate Professor at the University of Chicago.

Data + AI Summit Keynote Day 2

June 29, 2023, San Francisco

Data, analytics and AI landscape: Discover what’s driving so much focus on data and why data professionals are zeroing in on new ways to tackle their database challenges. Learn why there is so much interest in LLMs, what is happening across the data, analytics and AI landscape and the future of the market.

Evolution of the lakehouse: Take a look at the larger universe that the lakehouse lives inside of, learn what’s new and explore the evolution with us.

Open source technologies: Hear from the open source community about what’s new and what’s to come for Apache Spark™, Delta Lake and MLflow and learn how this affects the lakehouse and the overall market at large.

Presenters:

  • Hannes Mühleisen, Co-Founder & CEO, DuckDB Labs
  • Lin Qiao, Co-creator of PyTorch, Co-founder and CEO, Fireworks
  • Nat Friedman, Creator of Copilot; Former CEO, GitHub
  • Jitendra Malik, Computer Vision Pioneer, Former Head of Facebook AI Research, University of California at Berkeley
