Hey, friend 👋
It’s Marcos again, your “DuckDB News Reporter,” with another issue of “This Month in the DuckDB Ecosystem” for April 2023. In this issue, we have a lot of great stuff to share with you, especially Jordan Tigani’s conversation with The Register, Mark Litwintschik’s exploration of the DuckDB Spatial extension, and much more. Every single day, we see more and more people using DuckDB in production environments across a very diverse set of use cases. So it’s time to embrace the 🦆.
Remember: if you have any feedback for the newsletter, feel free to email us at duckdbnews@motherduck.com
-Marcos
Featured Community Member
Josh Wills
If you have been in the data analytics space for a while, you know very well who Josh Wills is. Perhaps you have read his famous quote: “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
Or perhaps you have read “Advanced Analytics with Spark,” the book he co-authored. Or, even better, you have used the dbt adapter for DuckDB that he created in production. You can find him on Twitter as @josh_wills.
Top DuckDB Links this Month
How We Silently Switched Mode’s In-Memory Data Engine to DuckDB To Boost Visual Data Exploration Speed
This very interesting post from the Mode team explains why they selected DuckDB as their in-memory data engine, and it comes down to one of DuckDB’s core features: speed.
DuckDB's Spatial Extension
In this post, Mark Litwintschik walks through some example GIS workflows with the DuckDB Spatial extension. Highly recommended reading!
How fast does a compressed file in Part 2
Steven P. Sanderson II, MPH, is back with the second part of his series about compressed files, this time using DuckDB in combination with Apache Arrow.
DuckDB makes SQL a first-class citizen on DataCamp Workspace
In this blog post, Filip Schouwenaars lists the recent improvements, powered by DuckDB, that make it seamless and efficient to query data with SQL without ever leaving the tool.
Use dbt and DuckDB instead of Spark in data pipelines
Niels Claeys makes a bold proposal here: ditch Spark in favor of the combination of dbt and DuckDB. Now is the perfect time to explore this approach.
DuckDB Document Loader by Trent Hauck
In this tweet, the LangChain team highlighted Trent Hauck’s awesome work with an example of how to use the DuckDB Document Loader. If you want to play with it, you can find the docs here.
Ex-BigQuery exec and Motherduck CEO: For some users, the answer is to think small
A very insightful interview with Jordan Tigani, CEO of MotherDuck, in which he shares thoughts like this one:
“DuckDB has been able to kind of strip all that away by being an in-process database, and that means that you basically can marshal data in and out of your application, or your data frames, with the minimum of data movements”.
It’s time to think small first.
Using DuckDB with Your Dremio Data Lakehouse
In this article, Alex Merced from Dremio discusses how you can use technologies like Dremio and DuckDB to create a low-cost, high-performance data lakehouse environment accessible to all your users.
Fixing iMessage search with DuckDB
Apple, perhaps you should listen to Daniel Palma on this one: DuckDB could be a perfect fit for this use case. Fixing iMessage search on iOS is one of the most requested features out there, and with DuckDB it could actually be fixed easily.
The message has been delivered, Tim.
Upcoming events
Webinar: Doing Analysis in a Post Big Data Era: How industry leaders are driving high-impact decisions with smaller data
April 19, 2023, 10:00 AM PDT
Join us for a conversational webinar between Jordan Tigani, Founder and CEO at MotherDuck, and Benn Stancil, co-founder and CTO at Mode, two industry leaders who’ve called the end of big data (Benn’s take; Jordan’s take).
In this discussion, they'll talk about how the hyped “We have tons of data, and we’re going to change the world with it” narrative of the 2010s looks from today’s vantage point, and how leading companies are navigating a higher-impact, faster-moving, data-informed decision-making process using smaller data.
Webinar: Big Data: Funeral or Renaissance?
April 20, 2023, 12:00 PM
Jordan Tigani, CEO + Founder of MotherDuck and one of the founding engineers on Google BigQuery, recently wrote a blog post called "Big Data is Dead," which took the internet by storm.
Aditya Parameswaran, Co-Founder of Ponder and Associate Professor at UC Berkeley, wrote a rebuttal called "Big Data Is Dead… Long Live Big Data."
This interactive broadcast will be a fun and lively debate on whether we should hold a funeral for big data or whether big data is having a renaissance.
The debate will be moderated by Aaron Elmore, Associate Professor at the University of Chicago.
Data + AI Summit Keynote Day 2
June 29, 2023, San Francisco
Data, analytics and AI landscape: Discover what’s driving so much focus on data and why data professionals are zeroing in on new ways to tackle their database challenges. Learn why there is so much interest in LLMs, what is happening across the data, analytics and AI landscape, and where the market is headed.
Evolution of the lakehouse: Take a look at the larger universe that the lakehouse lives inside of, learn what’s new, and explore the evolution with us.
Open source technologies: Hear from the open source community about what’s new and what’s to come for Apache Spark™, Delta Lake and MLflow, and learn how this affects the lakehouse and the market at large.
Presenters:
- Hannes Mühleisen, Co-Founder & CEO, DuckDB Labs
- Lin Qiao, Co-creator of PyTorch, Co-founder and CEO, Fireworks
- Nat Friedman, Creator of Copilot; Former CEO, GitHub
- Jitendra Malik, Computer Vision Pioneer, Former Head of Facebook AI Research, University of California at Berkeley