2024/02/09 - Mehdi Ouazza
DuckDB & Python : end-to-end data engineering project [1/3]
An end-to-end data project to explore DuckDB with Python
This is Ryan from MotherDuck, and I'm excited to present the 15th DuckDB ecosystem newsletter.
I'm even more excited that the DuckDB team just released 0.10.0, which introduces backwards compatibility in the DuckDB storage format and makes many improvements around performance, memory utilization and more.
This issue goes into a bit more depth for each link we share. Let us know what you think or share your news by sending an email to duckdbnews@motherduck.com
Cheers!
-Ryan
Featured Community Member

Based in France, Christophe is a Staff Software Engineer who works on cross-team initiatives scaling data systems. He has written about using DuckDB for data integration and also submitted a small fix to the DuckDB MySQL integration. He has an upcoming talk at the Paris DuckDB and MotherDuck meetup. He maintains an active Medium blog sharing knowledge around BigQuery, dbt and DuckDB, including a recent post on "How DuckDB can be up to 1000x more efficient than BigQuery?"
Top DuckDB Links this Month
The DuckDB team announced Fusca, the latest release of DuckDB. It is named after the velvet scoter (Melanitta fusca), a duck native to Europe.
This version makes the storage format backwards compatible, improves memory utilization, dramatically improves CSV loader performance and introduces improvements throughout the engine.
There are dozens of additional improvements, so we highly encourage you to read the blog post.
The top-rated "DuckDB in Action" book published by Manning has added four new chapters to the MEAP (early access) book.
Chapter 5: Exploring data without persistence
Chapter 6: Integrating with the Python ecosystem
Chapter 7: DuckDB in the Cloud with MotherDuck
Chapter 8: Building data pipelines with DuckDB
You can download the book for free, courtesy of MotherDuck.
The DuckDB team released videos from the talks, with editing courtesy of Mehdi Ouazza.
* State of the Duck [Hannes, Mark]
* Hugging a Duck [Polina Kazakova, Hugging Face]
* Building Data Lakes with DuckDB [Subash Roul, Fivetran]
* Duck Feather in your Parquet Cap [Niger Little-Pool, Prequel]
The Airbyte team released a public beta of PyAirbyte, which packages Airbyte connectors so they are accessible from code, to "bridge the gap between the flexibility of custom Python scripts and the power of a data integration platform." By default, PyAirbyte uses a DuckDB cache (destination), though MotherDuck, Postgres, Snowflake and BigQuery are also available.
Collaborating with MotherDuck, the Numbers Station team announced an LLM specifically tuned for text-to-SQL in the DuckDB dialect, with the ability to execute locally on an M1 laptop. Model weights were open sourced on Hugging Face and the model is available in GGUF format for llama.cpp.
RAG, or retrieval-augmented generation, augments an LLM with additional knowledge before it generates its response. Is the knowledge you want to use to augment your LLM stored in DuckDB or MotherDuck? This article shows you how to build your RAG.
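To make the retrieve-then-augment idea concrete, here is a minimal sketch of that flow. It is not the method from the linked article: the `DOCUMENTS` list, the word-count `embed` function and the prompt wording are all hypothetical stand-ins. In a real setup the snippets would likely be rows fetched from DuckDB or MotherDuck, and the embedding would come from a proper embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge snippets. In practice these might be rows
# fetched from a DuckDB or MotherDuck table (assumption for illustration).
DOCUMENTS = [
    "DuckDB is an in-process analytical database.",
    "MotherDuck is a cloud service built on DuckDB.",
    "Parquet is a columnar file format.",
]


def embed(text):
    """Toy embedding: lowercase word counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is MotherDuck?", DOCUMENTS))
```

The string returned by `build_prompt` is what would be passed to the LLM; the model call itself is left out because it depends on the provider you use.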
It's becoming increasingly common to have DuckDB as the default analytics database used in data engineering tools. Earlier in the newsletter, we talked about how it's the default backend for PyAirbyte. In this article, the Ibis folks talk about how DuckDB became their default to provide a great out-of-the-box experience.
They cite their reasons for choosing DuckDB as:
Tobias gives an overview of how he built sql-workbench.com by leveraging DuckDB running in the browser via WASM (web assembly). He also uses Perspective.js for interactive data visualizations. There's quite a bit of functionality for a static website!
Petrica demonstrates using the spatial extension in DuckDB to plot visualizations of restaurants in the Netherlands. She uses Plotly's choropleth map functionality, which avoids having to acquire a Mapbox API key; Plotly also supports Mapbox-based maps.
Mehdi had a very special guest on his Quack & Code livestream: Josh Wills, author of dbt-duckdb. They discussed how dbt and DuckDB can be used together to accelerate the developer experience by using local resources. They then dived into some code together!
Upcoming Events
13 March, Paris 🇫🇷
MotherDuck, in collaboration with Back Market, is pleased to announce our 4th in-person DuckDB user group meetup in France, held in Paris, to talk about DuckDB, MotherDuck and everything data!
17 May, Pittsburgh, PA, USA 🇺🇸
Alex Monahan of DuckDB Labs and MotherDuck will present a talk on "Python and SQL: Better Together, Powered by @DuckDB." Exact date/time TBD.