This Month in the DuckDB Ecosystem
Subscribe to MotherDuck Blog
Hey, friend 👋
Hi, I'm Marcos! I'm a data engineer by day at X-Team, working for Riot Games. By night, I create newsletters on a few topics I'm passionate about: helping folks find data gigs and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.
Featured Community Members
If we're going to feature members of the community, it makes sense to start with Mark, one of the co-creators of DuckDB. Mark is now the CTO and Co-Founder of DuckDB Labs as well as a Postdoc in the Database Architectures group at CWI. Oh, and he's still the top committer on DuckDB.
If you've spent any time on Twitter or on the DuckDB Discord, you have likely already seen one of Alex's many helpful responses to questions big and small. Alex is a force that keeps the community quacking. He's a data scientist at Intel, but he also works on documentation, tutorials and training at DuckDB Labs.
Top 10 DuckDB Links this Month
1. DuckDB Video Series with Mark Needham
In this video series, Mark Needham does incredible work explaining, in just 5 minutes each, how to do some common data engineering tasks with DuckDB: accessing Parquet files in S3, diffing Parquet schemas, joining CSV files on the fly, and analyzing the data quality of Parquet files. A highly recommended series for anyone starting with DuckDB.
2. Build a Poor Man's Data Lake from Scratch
- DuckDB for SQL transformations
- Dagster for orchestration
- Parquet files on AWS S3 for storage
This is a very interesting resource because it demonstrates the power of DuckDB with a real use case. If you prefer watching over reading, catch their video on YouTube.
3. Common Crawl on Laptop - Extracting Subset of Data
4. Using Polars on results from DuckDB's Arrow interface in Rust
5. DuckDB: Getting Started for Beginners
"DuckDB is an in-process OLAP DBMS written in C++ blah blah blah, too complicated. Let’s start simple, shall we?" If you can see past the ads on the blog, Mark Lambert did an amazing job explaining how to start with DuckDB from scratch.
6. Query Dataset using DuckDB
Another interesting tutorial on how to use DuckDB, the DuckDB shell (WASM), and Tad (tabular data viewer). The author, business analyst Sung Kim, has other interesting articles, including one on using DuckDB with Jupyter Notebooks.
7. Tips to Design a Distributed Architecture for DuckDB [Twitter thread]
Ismael provides great tips on the topic of which runtime component to use: Lambdas, Fargates, or VMs.
8. Observable Loves DuckDB
This interactive notebook demonstrates how to use the Observable DuckDB client, based on WASM.
9. DuckDB Geo Extension
If you're interested in experimenting with geospatial data in DuckDB, you can use this extension, which adds a new GEO type and functionality for basic GIS data analysis.
10. SQL on Python, Part 1: The Simplicity of DuckDB
In this tutorial, Juan Luis Cano explains how to get started with DuckDB in Python to analyze content from Reddit on climate change. He talks about interoperability between DuckDB and pandas DataFrames, NumPy arrays and more.
DuckCon 2023 User Group
Although it's not until February 3rd, you should plan ahead if you want to join the DuckDB creators and contributors along with the MotherDuck team at this evening of talks, food and drinks. The event is in Brussels and co-located with FOSDEM.
Find something interesting in this newsletter? Share it with your friends and let them know they can subscribe to receive it via email.