2023/11/02 - Mehdi Ouazza
Making PySpark Code Faster with DuckDB
It’s Marcos again, aka “DuckDB News Reporter”, with another issue of “This Month in the DuckDB Ecosystem” for November 2023.
As always, this is a two-way conversation: if you have any feedback on this newsletter, feel free to send us an email at duckdbnews@motherduck.com.
Featured Community Member

David Gasquez is a data engineer at Protocol Labs.
David has done the DuckDB community a great service by creating a GitHub project that curates DuckDB libraries, tools, and resources.
Check out Awesome DuckDB on GitHub.
Top DuckDB Links this Month
Simon Willison is using the power of DuckDB to access remote Parquet files, hosted on Hugging Face, to work with 148 TB of image data.
Jie Jenn provides a useful example of how to use Python and DuckDB for data analytics.
Do you really need a big data warehouse service? According to Kieran Healy, you can solve your problems with DuckDB.
The ecosystem for DuckDB is growing exponentially every single day, and this integration with Cube is a good example of it.
If you are a researcher, you need to hear this advice from Dirk Petersen: move to DuckDB now and enjoy the benefits.
This is a very interesting benchmark conducted by Matthew Rocklin.
If you were waiting for a signal to combine Apache Superset and DuckDB, this is it.
This presentation from Gábor Szárnyas provides a lot of insights about in-process analytics using the power of DuckDB.
When, Why, and How You Should Consider Using DuckDB, according to Karen Zhang.
The modern data stack is evolving with time, but we know one thing for sure: DuckDB is part of it, and this article from Rahul Soni proves it.
More people are considering using DuckDB for geospatial apps (and they should), and Jake Gearon shares an interesting example here.
The answer is obvious, according to Julien Hurault: DuckDB.
Upcoming Events
6 December 2023 | Online 🌐
MotherDuck co-founder Ryan Boyd will speak at the Airbyte move(data) conference on "Fixing the Data Engineering Lifecycle", a short talk based on his discussions with the community in preparation for the panel of the same name at Coalesce.
28th-30th November 2023 | Online 🌐
This online conference will feature talks from leaders across Data, Cloud, Blockchain, AI, and Web3. MotherDuck co-founder Ryan Boyd will present on "Data Analytics in the Post-Big-Data Era."
27 January 2024 | Austin, Texas, USA 🇺🇸
In this session, Peter Boncz will discuss the evolution of analytical database systems, starting from the classical relational database systems all the way to DuckDB, the fastest-growing data system today.