
2022/11/11 - Tino Tereshko, Ryan Boyd
Why Use DuckDB for Analytics?
Fast aggregations, excellent SQL support, runs anywhere, provides simplified data access: cloud and local, works with your tools and frameworks.
Hi, I'm Marcos! I'm a data engineer by day at X-Team, working for Riot Games. By night, I create newsletters on a few topics I'm passionate about: helping folks find data gigs and AWS Graviton. After getting involved in the DuckDB community, I saw a great opportunity to partner with the MotherDuck team to share all the amazing things happening in the DuckDB ecosystem.
Marcos
Feedback: duckdbnews@motherduck.com
If we're going to feature members of the community, it makes sense to start off with Mark as one of the co-creators of DuckDB. Mark is now the CTO and Co-Founder of DuckDB Labs as well as a Postdoc in the Database Architectures group within CWI. Oh, and he's still the top committer on DuckDB.
If you've spent any time on Twitter or on the DuckDB Discord, you likely already have seen one of Alex's many helpful responses to questions big and small. Alex is a force that keeps the community quacking. He's a data scientist at Intel, but also works on documentation, tutorials and training at DuckDB Labs.
In this video series, Mark Needham does incredible work explaining in just 5 minutes how to do some common data engineering tasks with DuckDB, like accessing Parquet files in S3, diffing Parquet schemas, joining CSV files on the fly, and analyzing the data quality of Parquet files. Highly recommended series for anyone starting with DuckDB.
In this article, Pete Hunt and Sandy Ryza from Dagster built a data lake using:
This is a very interesting resource because it explains the power of DuckDB through a real use case. If you prefer watching over reading, catch their video on YouTube.
In this article, Chillar Anand analyzes 250 GB of a very popular web crawl dataset locally using DuckDB. He demonstrates the httpfs extension, the DuckDB feature that lets you query remote files over HTTP.
Rust is increasing in popularity these days, and this article from Vikram Oberoi is a very interesting exploration of the topic of DuckDB + Rust.
"DuckDB is an in-process OLAP DBMS written in C++ blah blah blah, too complicated. Let's start simple, shall we?" If you can see past the ads on the blog, Mark Lambert did an amazing job explaining how to start with DuckDB from scratch.
Another interesting tutorial on how to use DuckDB, the DuckDB shell (WASM), and Tad (tabular data viewer). The author, business analyst Sung Kim, has other interesting articles, including one on using DuckDB with Jupyter Notebooks.
Ismael provides great tips on the topic of which runtime component to use: Lambdas, Fargates, or VMs.
This interactive notebook demonstrates how to use the Observable DuckDB client, based on WASM.
If you're interested in experimenting with geospatial data in DuckDB, you can use this extension which adds a new GEO type and functionality for basic GIS data analysis.
In this tutorial, Juan Luis Cano explains how to get started with DuckDB in Python to analyze content from Reddit on climate change. He talks about interoperability between DuckDB and pandas DataFrames, NumPy arrays and more.
Although it's not until February 3rd, you should plan ahead if you want to join the DuckDB creators and contributors along with the MotherDuck team at this evening of talks, food and drinks. The event is in Brussels and co-located with FOSDEM.
Find something interesting in this newsletter? Share it with your friends and let them know they can subscribe to receive it via email.