This Month in the DuckDB Ecosystem: January 2023

2023/01/12

Happy new year, friend 👋

Hi, I'm Marcos! I'm a data engineer by day at X-Team, working for Riot Games. By night, I create newsletters on a few topics I'm passionate about: helping folks find data gigs and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.

In this first issue of 2023, we wanted to share some of the incredible stuff coming out of the global DuckDB community.

-Marcos
Feedback: duckdbnews@motherduck.com

Jacob Matson

Jacob is the author of "Modern Data Stack in a Box with DuckDB". A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or on a single machine using the combination of DuckDB, Meltano, dbt, and Apache Superset.

He currently works as the VP of Finance & Operations at Simetric, bringing IoT connectivity data into a single pane of glass. He also does SMB analytics consulting via his agency, Elliot Point LLC.

You can find him on Twitter @matsonj

Learn more about Jacob

Mark Needham

Mark is a Developer Advocate at StarTree, talking about real-time analytics with Apache Pinot. If you have searched for content about DuckDB, you have very likely come across his amazing blog and YouTube channel.

Learn more about Mark

1. DuckDB's big milestone: 1 million downloads per month reached on PyPI

Prof. Peter Boncz shared this tweet, highlighting a chart with some incredible news: the duckdb Python package reached 1M downloads per month in December 2022.

2. Lightning fast aggregations by distributing DuckDB across AWS Lambda functions

In this article, the BoilingData team explains how to use AWS Lambda as a distributed system to scale DuckDB queries with a serverless approach.

3. DuckDB in Julia vs pure Julia DataFrames.jl

In this insightful post, Bogumił Kamiński compares native Julia (DataFrames.jl) with DuckDB in Julia on some common exploratory data analysis operations: accessing data, writing data, performing JOINs, and doing basic computational statistics. Worth a read.

4. A complete DuckDB tutorial for beginners

In this 26-minute video tutorial, Marc Lamberti (Head of Customer Education at Astronomer) explains in great detail how to start working with DuckDB from scratch, how to run the most common operations (GROUP BY, DESCRIBE), how to do data cleansing, and more.

The video is coupled with a Notion page with the code from the tutorial.

5. How to build a CDC pipeline with Redpanda that streams operational data from PostgreSQL to DuckDB

Need to load data from an operational database into a data lake for analytical workloads? The Redpanda team wrote this insightful post about how to build a CDC pipeline with Redpanda that streams operational data from PostgreSQL to DuckDB for OLAP analytics.

6. DuckDB: Bringing analytical SQL directly to your Python shell

In this technical talk at PyData Eindhoven 2023, Pedro Holanda explains how DuckDB integrates with the rich Python ecosystem, the Pandas API, and more.

He covers five key characteristics of DuckDB:

  • Vectorized Execution Engine
  • End-to-end Query Optimization
  • Automatic Parallelism
  • Beyond Memory Execution
  • Data Compression

7. Boost Your Cloud Data Applications with DuckDB and Iceberg API

Alon Agmon explains how to use the Apache Iceberg API with DuckDB to optimize analytics queries on massive Iceberg tables in your cloud storage.

8. Learn Data with Mark: CSV to Parquet with Pandas, Polars, DuckDB

Another short but outstanding video tutorial from the one and only Mark Needham, in which he shows how to convert CSV files to Parquet with Pandas, Polars, and DuckDB. If you prefer a text version, you can read the post on Mark's blog.

Our recommendation? Subscribe to Mark's channel. You will find a lot of great gems there.

9. lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI

Oz Katz (co-founder and CTO at Treeverse) shared some insights about how they embedded DuckDB inside the lakeFS UI.

10. DuckDB vs. Porto Buses — A Small Case for a New OLAP Engine

Jose Cabeda explains how to use DuckDB for local analysis with only some knowledge of SQL.

Upcoming Events

Online

State Of Data 2023 (January 18, 2023): Benjamin Rogojan, aka Seattle Data Guy, will answer some questions about the current state of data engineering. One of those questions: Is everyone switching to DuckDB?

In-Person

Data Day Texas 2023 (January 28, 2023): "Your laptop is faster than your data warehouse," by Ryan Boyd (MotherDuck co-founder). 20% discount available to newsletter subscribers.

DuckCon at FOSDEM (February 3, 2023): the DuckDB team has organized the second DuckCon, a gathering in Brussels right before FOSDEM. Hear from the creators of and contributors to DuckDB as well as the MotherDuck team. Register on Meetup.
