This Month in the DuckDB Ecosystem

2022/12/15 - 4 min read

Hey, friend 👋

Hi, I'm Marcos! I'm a data engineer by day at X-Team, working for Riot Games. By night, I create newsletters on a few topics I'm passionate about: helping folks find data gigs and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.

Marcos

Feedback: duckdbnews@motherduck.com

Featured Community Members

Mark Raasveldt

If we're going to feature members of the community, it makes sense to start off with Mark as one of the co-creators of DuckDB. Mark is now the CTO and Co-Founder of DuckDB Labs as well as a Postdoc in the Database Architectures group within CWI. Oh, and he's still the top committer on DuckDB.

Learn more about Mark

Alex Monahan

If you've spent any time on Twitter or on the DuckDB Discord, you have likely already seen one of Alex's many helpful responses to questions big and small. Alex is a force that keeps the community quacking. He's a data scientist at Intel, but also works on documentation, tutorials and training at DuckDB Labs.

Learn more about Alex

Top 10 DuckDB Links this Month

1. DuckDB Video Series with Mark Needham

In this video series, Mark Needham does an incredible job of explaining, in just 5 minutes, how to handle common data engineering tasks with DuckDB: accessing Parquet files in S3, diffing Parquet schemas, joining CSV files on the fly, and analyzing the data quality of Parquet files. A highly recommended series for anyone starting with DuckDB.

2. Build a Poor Man's Data Lake from Scratch

In this article, Pete Hunt and Sandy Ryza from Dagster built a data lake using:

DuckDB for SQL transformations
Dagster for orchestration
Parquet files on AWS S3 for storage

This is a very interesting resource because it explains the power of DuckDB through a real use case. If you prefer watching over reading, catch their video on YouTube.

3. Common Crawl on Laptop - Extracting Subset of Data

In this article, Chillar Anand analyzed 250 GB of a very popular web crawl dataset locally using DuckDB. He demonstrates how DuckDB's HTTPFS extension lets you query remote files directly.
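For a feel of what querying a remote file looks like, here is a minimal Python sketch. It is not taken from the article: the URL is a hypothetical placeholder, and the helper names `build_remote_query` and `run_remote_query` are made up for illustration. It assumes the `duckdb` Python package is installed and that the machine has network access when the query is actually run.

```python
# Hypothetical URL: a placeholder, not a dataset from the article.
PARQUET_URL = "https://example.com/data/crawl-index.parquet"


def build_remote_query(url: str, limit: int = 10) -> str:
    """Return a SQL string that reads a remote Parquet file via read_parquet."""
    # Double any single quotes so the URL stays a valid SQL string literal.
    safe_url = url.replace("'", "''")
    return f"SELECT * FROM read_parquet('{safe_url}') LIMIT {int(limit)}"


def run_remote_query(url: str, limit: int = 10):
    """Execute the query. Needs the duckdb package and network access."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()  # in-memory database
    con.execute("INSTALL httpfs")  # fetches the extension if not yet installed
    con.execute("LOAD httpfs")
    return con.execute(build_remote_query(url, limit)).fetchall()


if __name__ == "__main__":
    # Only prints the SQL; call run_remote_query() with a real URL to execute it.
    print(build_remote_query(PARQUET_URL))
```

Keeping the SQL builder separate from the execution step makes it easy to inspect the query before pointing it at a large remote dataset.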

4. Using Polars on results from DuckDB's Arrow interface in Rust

Rust is increasing in popularity these days, and this article from Vikram Oberoi is a very interesting exploration of using DuckDB with Rust.

5. DuckDB: Getting Started for Beginners

"DuckDB is an in-process OLAP DBMS written in C++ blah blah blah, too complicated. Let’s start simple, shall we?" If you can see past the ads on the blog, Mark Lambert did an amazing job explaining how to start with DuckDB from scratch.

6. Query Dataset using DuckDB

Another interesting tutorial on how to use DuckDB, the DuckDB shell (WASM), and Tad (tabular data viewer). The author, business analyst Sung Kim, has other interesting articles, including one on using DuckDB with Jupyter Notebooks.

7. Tips to Design a Distributed Architecture for DuckDB [Twitter thread]

Ismael provides great tips on choosing a runtime component: Lambdas, Fargates, or VMs.

8. Observable Loves DuckDB

This interactive notebook demonstrates how to use the Observable DuckDB client, based on WASM.

9. DuckDB Geo Extension

If you're interested in experimenting with geospatial data in DuckDB, you can use this extension, which adds a new GEO type and functionality for basic GIS data analysis.

10. SQL on Python, Part 1: The Simplicity of DuckDB

In this tutorial, Juan Luis Cano explains how to get started with DuckDB in Python to analyze content from Reddit on climate change. He talks about interoperability between DuckDB and pandas DataFrames, NumPy arrays and more.

DuckCon 2023 User Group

Although it isn't until February 3rd, you should plan ahead if you want to join the DuckDB creators and contributors along with the MotherDuck team at this evening of talks, food and drinks. The event is in Brussels and co-located with FOSDEM.

Learn more

Find something interesting in this newsletter? Share with your friends and let them know they can subscribe to receive it via email.