This Month in the DuckDB Ecosystem: February 2024

2024/03/01

Hey, friend 👋

This is Ryan from MotherDuck, and I'm excited to present the 15th DuckDB ecosystem newsletter.

I'm even more excited that the DuckDB team just released 0.10.0, which introduces backwards compatibility in the DuckDB storage format and makes many improvements around performance, memory utilization and more. 

This issue goes into a bit more depth for each link we share. Let us know what you think or share your news by sending an email to duckdbnews@motherduck.com.

Cheers!
-Ryan

Christophe Oudar

Based in France, Christophe is a Staff Software Engineer who works on cross-team initiatives scaling data systems. He has written about using DuckDB for data integration and also submitted a small fix to the DuckDB MySQL integration. He has an upcoming talk at the Paris DuckDB and MotherDuck meetup.  He maintains an active Medium blog sharing knowledge around BigQuery, dbt and DuckDB, including a recent post on "How DuckDB can be up to 1000x more efficient than BigQuery?"

DuckDB 0.10.0: Backwards compatible, CSV loader perf, multi-database support, better memory management ++

The DuckDB team announced Fusca, the latest release of DuckDB. It is named after the velvet scoter (Melanitta fusca), a duck native to Europe.

This version of DuckDB makes the storage format backwards compatible, improves memory utilization, dramatically speeds up the CSV loader and introduces improvements throughout the engine.

There are dozens of additional improvements, so we highly encourage you to read the blog post.

"DuckDB in Action" book by Manning adds 4 new chapters

The top-rated "DuckDB in Action" book published by Manning has added four new chapters to the MEAP (early access) book. 

Chapter 5: Exploring data without persistence
Chapter 6: Integrating with the Python ecosystem
Chapter 7: DuckDB in the Cloud with MotherDuck
Chapter 8: Building data pipelines with DuckDB

You can download the book for free, courtesy of MotherDuck.

DuckCon #4 talk videos released

The DuckDB team released videos from the talks, with editing courtesy of Mehdi Ouazza. 

* State of the Duck [Hannes, Mark]
* Hugging a Duck [Polina Kazakova, Hugging Face]
* Building Data Lakes with DuckDB [Subash Roul, Fivetran]
* Duck Feather in your Parquet Cap [Niger Little-Poole, Prequel]

PyAirbyte: pipelines-as-code powered by DuckDB

The Airbyte team released a public beta of PyAirbyte, which packages Airbyte connectors as a Python library so they can be used directly in code to "bridge the gap between the flexibility of custom Python scripts and the power of a data integration platform." By default, PyAirbyte uses a DuckDB cache (destination), though MotherDuck, Postgres, Snowflake and BigQuery are also available.

DuckDB-NSQL-7B LLM for DuckDB SQL released

Collaborating with MotherDuck, the Numbers Station team announced an LLM specifically tuned for text-to-SQL in the DuckDB dialect, with the ability to execute locally on an M1 laptop. Model weights were open sourced on Hugging Face and the model is available in GGUF format for llama.cpp.

Using DuckDB + Ibis for RAG

RAG, or retrieval-augmented generation, augments an LLM with additional knowledge before it generates its response. Is the knowledge you want to use to augment your LLM stored in DuckDB or MotherDuck? This article shows you how to build your RAG.
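
If the retrieve-then-generate idea is new to you, here is a minimal sketch in plain Python. It is not the article's code: the documents, the function names and the bag-of-words "embedding" are all hypothetical stand-ins, and the article itself uses DuckDB and Ibis rather than a Python list. The sketch only shows the shape of the technique: score stored text against the question, pick the best match, and put it in the prompt ahead of the question.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base. In the article's setting these rows would
# live in a DuckDB or MotherDuck table rather than a Python list.
DOCUMENTS = [
    "DuckDB is an in-process analytical database.",
    "Ibis is a Python dataframe library that can use DuckDB as a backend.",
    "MotherDuck is a cloud service built around DuckDB.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Place the retrieved context ahead of the question (the 'augmented' part)."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # The resulting string is what you would send to the LLM of your choice.
    print(build_prompt("What is Ibis?", DOCUMENTS))
```

In a real setup the similarity search would typically run as a query against the stored embeddings, with only the top rows coming back to Python.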

Why is DuckDB the default backend for Ibis?

It's becoming increasingly common to have DuckDB as the default analytics database used in data engineering tools. Earlier in the newsletter, we talked about how it's the default cache for PyAirbyte. In this article, the Ibis folks talk about how DuckDB became their default backend to provide a great out-of-the-box experience.

They cite their reasons for choosing DuckDB as:

  1. Great performance for local data
  2. A thriving open source community
  3. A solid foundation
  4. A large and well-supported feature set

Using DuckDB-WASM for in-browser Data Engineering

Tobias gives an overview of how he built sql-workbench.com by leveraging DuckDB running in the browser via WASM (web assembly). He also uses Perspective.js for interactive data visualizations. There's quite a bit of functionality for a static website!

Plot(ly)ing Geo Data From DuckDB

Petrica demonstrates using the spatial extension in DuckDB to plot visualizations of restaurants in the Netherlands. She uses Plotly's choropleth map functionality to avoid having to acquire a Mapbox API key; Plotly also supports Mapbox-based maps.

DuckDB + dbt: Josh Wills Quacking and Coding

Mehdi had a very special guest on his Quack & Code livestream: Josh Wills, author of dbt-duckdb. They discussed how dbt and DuckDB can be used together to speed up the development workflow by running on local resources. They then dived into some code together!

Upcoming Events

DuckDB Meetup Paris

13 March, Paris 🇫🇷

MotherDuck, in collaboration with Back Market, is pleased to announce our 4th in-person meetup of the DuckDB user groups in France, in Paris, to talk about DuckDB, MotherDuck and everything data!

PyCon US 2024

17 May, Pittsburgh, PA, USA 🇺🇸

Alex Monahan of DuckDB Labs and MotherDuck will present a talk on "Python and SQL: Better Together, Powered by @DuckDB." Exact date/time TBD.
