This Month in the DuckDB Ecosystem: March 2023

2023/03/23 - 6 min read

Hey, friend 👋

Hi, I'm Marcos! I'm a data engineer by day at Riot Games (via X-Team). By night, I create newsletters on a few topics I'm passionate about: helping folks find data gigs, and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.

We hope you enjoy!

-Marcos

Feedback: duckdbnews@motherduck.com

In this issue, we want to share some of the excellent resources that came out in the second half of February and the first half of March. Enjoy!

Elliana May

Elliana is a Sr Software Engineer at Bankwest in Australia, a contributor to DuckDB and a part-time contractor for DuckDB Labs. She's done a lot of work to improve DevEx and exception handling within DuckDB.

She's also building a SQLAlchemy driver for DuckDB, allowing you to use their ORM in Python.

You can find her on Twitter @Mause_me and on GitHub.

Learn more about Elliana

1. The Surprising Birth Of DuckDB ft. Co-creator Hannes Mühleisen

In this first Quack Chat episode, Mehdi interviews Hannes, CEO of DuckDB Labs and co-creator of DuckDB, in Brussels during DuckCon.

2. JupySQL Plotting with DuckDB

This blog post demonstrates an approach to plotting large datasets using JupySQL and DuckDB. If you need to visualize large datasets, DuckDB offers unmatched simplicity and flexibility!

3. Shredding Deeply Nested JSON, One Vector at a Time

The fantastic team at DuckDB Labs recently improved DuckDB’s JSON extension so that JSON files can be queried directly, as if they were tables.

4. Serverless Spatial Analysis with DuckDB and AWS Lambda — Part 1 “Making it work”

Nabil Servais shared a simple, straightforward way to perform spatial analysis with AWS Lambda and DuckDB.

If you're interested in DuckDB geospatial analysis, you'll also want to check out an excellent article by Mark Litwintschik.

5. Exploring the TPC-DS Benchmark Queries with Malloy

Carlin Eng just published a terrific blog post giving an overview of the TPC-DS dataset, with the benchmark queries translated to Malloy and run on DuckDB.

6. From API to scheduled offline copies with DuckDB on Kaggle

Adrien Sales maintains a list of software end-of-life dates and wants to keep a local copy of the data used to publish the website. Kaggle provides a great notebook interface, so Adrien built a solution on top of it and shared more in this article!

7. Unit testing SQL queries with DuckDB

Coste Virgile showed an interesting approach to unit testing SQL queries with DuckDB.

8. Build and deploy apps with DuckDB and Streamlit in under one hour

Octavian Zarzu showed us how to build and deploy an application with the help of DuckDB and Streamlit.

9. Scrooge: Analyzing Yahoo Financial Data In DuckDB

DuckDB contributor Pedro Holanda (past featured community member and speaker) explains how to use Scrooge, a third-party DuckDB extension focused on financial data analysis.

10. Pandas 2.0 and its Ecosystem (Arrow, Polars, DuckDB)

In this article, Simon Späti takes a closer look at Pandas 2.0 and how it integrates with the wider Python ecosystem, especially Arrow, Polars, and DuckDB.

11. Using DuckDB to repartition parquet data in S3

Tobias Müller showed a possible serverless solution that uses DuckDB to repartition data stored in S3 as Parquet files, without the limitations imposed by certain AWS services.

[Okay, we fooled you; we have more than 10 links this month!!]

Upcoming Online Events

Benn Stancil of Mode and Jordan Tigani of MotherDuck discuss the state of Big Data (Online) (Wed, April 19, 2023, 10:00 AM PDT)

Doing analysis in a post big data era? Benn and Jordan will discuss how the industry is trying to navigate making faster decisions with a higher impact using smaller datasets.

Upcoming In-Person Events

QCon London, next week, is a software development conference featuring some of the brightest minds across software. Hannes Mühleisen, co-creator of DuckDB, will present on "In-Process Analytical Data Management with DuckDB."

Data Council Austin (also next week) will feature three days of technical talks on analytics, data engineering, data science and AI. Nicholas Ursa, co-founder and software engineer at MotherDuck, will speak about how "Data Warehouses are Gilded Cages. What Comes Next?" MotherDuck CEO Jordan Tigani is also giving one of the keynotes this year, "Big Data is Dead," based on his blog post that took the internet by storm. While the talk is not directly about DuckDB, some of its ideas are inspired by it.

Data Quality Camp Happy Hour Austin will also take place the night before Data Council. This event has many featured guests who are prominent in the data community.

Let's Talk Data San Francisco on 3 April will feature two talks on the theme "Why is DuckDB all the rage in the Data Community?" with Ryan Boyd (MotherDuck co-founder) and Vino Duraisami (Developer Advocate at lakeFS).

Modern Data Stack Conference (MDS Con) by Fivetran, at the beginning of April, will feature industry leaders such as DJ Patil, George Fraser, Tristan Handy, Ali Ghodsi, renowned analyst Sanjeev Mohan and Data Council founder Pete Soderling. Ryan Boyd, co-founder at MotherDuck, will be on a panel with Gabi Steele (CEO, Preql) and Chetan Sharma (CEO, Eppo).

Utah Data Engineering Meetup Salt Lake City (UDEM) is organized by Joe Reis and Matt Housley of O'Reilly fame. At this meetup on April 19th, Ryan Boyd, co-founder at MotherDuck, will give an introduction to the open source DuckDB project, explain how it’s used, and discuss some of the attributes that have made it take the internet by storm.

Subscribe to the Newsletter

Find something interesting in this newsletter?

Share with your friends and let them know they can subscribe.
