
2023/02/22 - Marcos Ortiz
This Month in the DuckDB Ecosystem: February 2023
DuckDB news: v0.7.0 adds JSON ingestion, partitioned Parquet export, and UPSERT support. Benchmarks show DuckDB running 4-200x faster than Postgres on AWS cost queries.
Hi, I'm Marcos! I'm a data engineer by day at Riot Games (via X-Team). By night, I create newsletters on a few topics I'm passionate about: helping folks find data digs, and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.
We hope you enjoy!
-Marcos
Feedback: duckdbnews@motherduck.com
In this issue, we wanted to share some of the excellent resources that came out in the second half of February and the first half of March. Enjoy!
Elliana is a Senior Software Engineer at Bankwest in Australia, a contributor to DuckDB, and a part-time contractor for DuckDB Labs. She's done a lot of work to improve developer experience (DevEx) and exception handling within DuckDB.
She's also building a SQLAlchemy driver for DuckDB, which lets you use SQLAlchemy's ORM with DuckDB from Python.
In this first Quack Chat episode, Mehdi interviewed Hannes, CEO of DuckDB Labs and co-creator of DuckDB, in Brussels during DuckCon.
This blog post demonstrates a powerful approach to plotting large datasets using JupySQL and DuckDB. If you need to visualize large datasets, DuckDB offers unmatched simplicity and flexibility!
The fantastic team at DuckDB Labs recently improved DuckDB’s JSON extension so JSON files can be queried directly as if they were tables.
Nabil Servais shared a simple, straightforward way to run spatial analysis on AWS Lambda with DuckDB.
If you're interested in DuckDB geospatial analysis, you'll also want to check out an excellent article by Mark Litwintschik.
Carlin Eng just published a terrific blog post in which he gives an overview of the TPC-DS dataset, with its queries translated to DuckDB.
Adrien Sales maintains a list of software end-of-life dates and wants to keep a local copy of the data used to publish the website. Kaggle provides a great notebook interface, so Adrien built a solution on top of it and shared more in this article!
Coste Virgile showed a very interesting approach to unit testing using SQL.
Octavian Zarzu showed us how to build a streaming application with the help of DuckDB and Streamlit.

DuckDB contributor Pedro Holanda (a past featured community member and speaker) explains how to use Scrooge, a third-party DuckDB extension focused on financial data analysis.
In this article, Simon Späti takes a closer look at Pandas 2.0 and how it integrates with the broader Python ecosystem, especially Arrow, Polars, and DuckDB.
Tobias Müller showed a possible serverless solution that uses DuckDB to repartition data stored in S3 as Parquet files, without the limitations imposed by certain AWS services.
[Okay, we fooled you; we have more than 10 links this week!!]
Benn Stancil of Mode and Jordan Tigani of MotherDuck discuss the state of Big Data (Online) (Wed, April 19, 2023, 10:00AM PDT)
Doing analysis in a post-big-data era? Benn and Jordan will discuss how the industry is trying to make faster, higher-impact decisions using smaller datasets.
QCon London, next week, is a software development conference featuring some of the brightest minds across software. Hannes Mühleisen, co-creator of DuckDB, will present on "In-Process Analytical Data Management with DuckDB."
Data Council Austin (also next week) will feature three days of technical talks on analytics, data engineering, data science and AI. Nicholas Ursa, co-founder and software engineer at MotherDuck, will speak on "Data Warehouses are Gilded Cages. What Comes Next?" MotherDuck CEO Jordan Tigani is also giving one of this year's keynotes, on why Big Data is dead, based on his blog post that took the internet by storm. While not directly about DuckDB, some of the ideas in his talk are inspired by it.
Data Quality Camp Happy Hour Austin will also take place the night before Data Council. This event has many featured guests who are prominent in the data community.
Let's Talk Data San Francisco on 3 April will feature two talks around "Why is DuckDB all the rage in the data community?" with Ryan Boyd (MotherDuck co-founder) and Vino Duraisami (Developer Advocate at lakeFS).
Modern Data Stack Conference (MDS Con) by Fivetran at the beginning of April will feature leaders in the industry such as DJ Patil, George Fraser, Tristan Handy, Ali Ghodsi, renowned analyst Sanjeev Mohan and Data Council founder Pete Soderling. Ryan Boyd, co-founder at MotherDuck, will be on a panel with Gabi Steele (CEO, Preql) and Chetan Sharma (CEO, Eppo).
Utah Data Engineering Meetup Salt Lake City (UDEM) is organized by Joe Reis and Matt Housley of O'Reilly fame. At this meetup on April 19th, Ryan Boyd, co-founder at MotherDuck, will give an introduction to the open source DuckDB project, talk about how it’s used and some of the attributes which have made it take the internet by storm.
Find something interesting in this newsletter?
Share with your friends and let them know they can subscribe.