This Month in the DuckDB Ecosystem: January 2023
2023/01/12 - 5 min read
Happy new year, friend 👋
Hi, I'm Marcos! I'm a data engineer by day at X-Team, working for Riot Games. By night, I write newsletters about a few topics I'm passionate about: helping folks find data gigs, and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.
In this first issue of 2023, we want to share some of the incredible work coming out of the global DuckDB community.
-Marcos
Feedback: duckdbnews@motherduck.com
Featured Community Members
Jacob Matson
Jacob is the author of "Modern Data Stack in a Box with DuckDB", which shows how a fast, free, and open-source Modern Data Stack (MDS) can be fully deployed on your laptop or a single machine by combining DuckDB, Meltano, dbt, and Apache Superset.
He currently works as VP of Finance & Operations at Simetric, which brings IoT connectivity data into a single pane of glass. He also does SMB analytics consulting through his agency, Elliot Point LLC.
You can find him on Twitter @matsonj
Mark Needham
Mark is a Developer Advocate at StarTree, where he talks about real-time analytics with Apache Pinot. If you have searched for DuckDB content, you have very likely come across his excellent blog and YouTube channel.
Top 10 DuckDB Links this Month
1. A big DuckDB milestone: 1 million monthly downloads on PyPI
Prof. Peter Boncz shared this tweet with a chart showing some incredible news: the duckdb Python package reached 1M downloads per month in December 2022.
2. Lightning fast aggregations by distributing DuckDB across AWS Lambda functions
In this article, the BoilingData team explains how to use AWS Lambda functions as a distributed system to scale DuckDB queries with a serverless approach.
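The snippet below is not BoilingData's code. It is a minimal, hypothetical sketch of the general idea behind spreading an aggregation over many workers: each worker runs a GROUP BY over its own shard and returns mergeable partial states, and a coordinator merges them before computing the final values. Threads stand in for Lambda invocations, and the names `Partial`, `worker_aggregate`, and `coordinator` are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable partial state for one group: enough to derive SUM, COUNT and AVG."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)


def worker_aggregate(rows):
    """Hypothetical stand-in for one function invocation: GROUP BY key over its shard only."""
    partials = {}
    for key, value in rows:
        p = partials.setdefault(key, Partial())
        p.total += value
        p.count += 1
    return partials


def coordinator(shards):
    """Fan out one task per shard, then merge the per-group partial states."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(worker_aggregate, shards))
    merged = {}
    for partials in results:
        for key, p in partials.items():
            merged[key] = merged.get(key, Partial()).merge(p)
    # Finalize only after merging: averaging per-shard averages would be wrong.
    return {key: (p.total, p.count, p.total / p.count) for key, p in merged.items()}
```

For example, `coordinator([[("a", 1.0), ("b", 4.0)], [("a", 3.0)], [("b", 2.0), ("b", 6.0)]])` returns `(total, count, avg)` per key, with `"a"` at an average of 2.0 and `"b"` at 4.0. Shipping (sum, count) pairs instead of averages is what keeps the merge step correct regardless of how rows are split across workers.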
3. DuckDB in Julia vs pure Julia DataFrames.jl
In this insightful post, Bogumił Kamiński compares native Julia (DataFrames.jl) with DuckDB in Julia on some common exploratory data analysis operations: reading data, writing data, performing JOINs, and computing basic statistics. Worth a read.
4. A complete DuckDB tutorial for beginners
In this 26-minute video tutorial, Marc Lamberti (Head of Customer Education at Astronomer) explains in detail how to get started with DuckDB from scratch, how to perform the most common operations (GROUP BY, DESCRIBE), how to clean data, and more.
The video comes with a Notion page containing the code from the tutorial.
5. How to build a CDC pipeline with Redpanda that streams operational data from PostgreSQL to DuckDB
Need to load data from an operational database into a data lake for analytical workloads? The Redpanda team wrote this insightful post about how to build a CDC pipeline with Redpanda that streams operational data from PostgreSQL to DuckDB for OLAP analytics.
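At the heart of any CDC sink is replaying an ordered stream of change events onto a replica of the source table. The sketch below shows only that replay logic; it does not use Redpanda, PostgreSQL, or DuckDB APIs, and a plain Python dict keyed by primary key stands in for the target table. The event shape (`op`, `key`, `after`, with `"c"`/`"u"`/`"d"` codes) is a simplified assumption for illustration, not the format used in the Redpanda post.

```python
from typing import Any

# Hypothetical, simplified change-event shape (an assumption for this sketch):
# {"op": "c" | "u" | "d", "key": <primary key>, "after": <new row dict> or None}
ChangeEvent = dict[str, Any]


def apply_changes(table: dict, events: list[ChangeEvent]) -> dict:
    """Replay ordered change events onto a keyed replica of the source table."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("c", "u"):
            # Inserts and updates are both upserts of the row's latest image.
            table[key] = event["after"]
        elif op == "d":
            # Deletes remove the row; tolerate replays of an already-applied delete.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return table
```

Applying a create for rows 1 and 2, an update of row 1, and a delete of row 2 leaves only the updated row 1 in the replica. Treating creates and updates as upserts makes replaying the same event idempotent, but the events still have to arrive in source commit order for each key.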
6. DuckDB: Bringing analytical SQL directly to your Python shell
In this technical talk at PyData Eindhoven 2022, Pedro Holanda explains how DuckDB integrates with the rich Python ecosystem, including the Pandas API, and more.
He covers 5 key characteristics of DuckDB:
- Vectorized Execution Engine
- End-to-end Query Optimization
- Automatic Parallelism
- Beyond Memory Execution
- Data Compression
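Of the five characteristics above, vectorized execution is the least self-explanatory. The pure-Python sketch below illustrates only the operator interface of a vectorized engine: operators pass batches ("vectors") of values to each other instead of one row at a time. It is not DuckDB's implementation, which is written in C++, and it will not reproduce the performance benefit, which comes from tight compiled loops over columnar batches. The batch size of 2048 and all function names here are chosen for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator

VECTOR_SIZE = 2048  # illustrative batch size for this sketch


def scan(values: Iterable[int], size: int = VECTOR_SIZE) -> Iterator[list[int]]:
    """Source operator: emit the input as fixed-size batches."""
    it = iter(values)
    while batch := list(islice(it, size)):
        yield batch


def filter_gt(batches: Iterator[list[int]], threshold: int) -> Iterator[list[int]]:
    """Filter operator: each call processes a whole batch, not a single row."""
    for batch in batches:
        yield [v for v in batch if v > threshold]


def sum_agg(batches: Iterator[list[int]]) -> int:
    """Aggregate operator: fold each incoming batch into a running total."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total


def query(values: Iterable[int], threshold: int) -> int:
    # Similar in spirit to: SELECT SUM(v) FROM t WHERE v > threshold
    return sum_agg(filter_gt(scan(values), threshold))
```

Running `query(range(10_000), 9_000)` pushes five batches through the pipeline rather than 10,000 individual rows, so the per-call overhead of each operator is paid once per batch.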
7. Boost Your Cloud Data Applications with DuckDB and Iceberg API
Alon Agmon explains how to combine the Apache Iceberg API with DuckDB to speed up analytical queries on massive Iceberg tables in cloud storage.
8. Learn Data with Mark: CSV to Parquet with Pandas, Polars, DuckDB
Another short but outstanding video tutorial from the one and only Mark Needham, in which he shows how to convert CSV files to Parquet with Pandas, Polars, and DuckDB. If you prefer text, you can read the post on Mark's blog.
Our recommendation? Subscribe to Mark's channel; you will find plenty of gems there.
9. lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI
Oz Katz (co-founder and CTO at Treeverse) shared some insights about how they embedded DuckDB inside the lakeFS UI.
10. DuckDB vs. Porto Buses — A Small Case for a New OLAP Engine
Jose Cabeda explains how to use DuckDB for local analysis with only basic SQL knowledge.
Upcoming Events
Online
State of Data 2023 (January 18, 2023): Benjamin Rogojan, aka Seattle Data Guy, will answer some questions about the current state of data engineering. One of those questions: is everyone switching to DuckDB?
In-Person
Data Day Texas 2023 (January 28, 2023): "Your laptop is faster than your data warehouse," by Ryan Boyd (MotherDuck co-founder). 20% discount available to newsletter subscribers.
DuckCon at FOSDEM (February 3, 2023): the DuckDB team has organized the second DuckCon, a gathering in Brussels right before FOSDEM. Hear from the creators of and contributors to DuckDB, as well as the MotherDuck team. Register on Meetup.